HPE Completes $1.3B Cray Deal, Launches DAOS, Nvidia‑Nokia GPU AI, and Exascale Liquid Cooling

TL;DR

  • HPE finalizes $1.3 billion Cray acquisition, positioning itself to deliver world‑class supercomputers
  • GX5000 exascale platform launches, succeeding the Cray Shasta architecture, with the announcement timed ahead of SC25
  • HPE introduces a Distributed Asynchronous Object Storage (DAOS) cluster built around the Intel Aurora architecture
  • Frontier and Fugaku lead the exascale era, setting new physics and climate modeling benchmarks
  • Nvidia collaborates with Nokia to deliver GPU‑powered AI in next‑generation wireless networking
  • HPE advances liquid cooling technologies for exascale GPUs, enhancing density and energy efficiency

HPE’s $1.3 B Cray Acquisition Reshapes AI‑Driven Supercomputing

Strategic Integration

  • Cash purchase of Cray consolidates two leading HPC ecosystems under HPE’s enterprise‑grade platform.
  • Combines HPE’s ProLiant server line with Cray’s “Shasta” interconnect architecture, creating a unified stack for AI‑first workloads.
  • Reduces reliance on third‑party HPC vendors, aligning firmware, cooling, and networking development.

Product Roadmap Enhancements

  • Private Cloud AI (2nd Gen) – ships Dec 2025 on ProLiant DL380 Gen12, delivers ≥3× performance versus 1st‑gen, supports Nvidia Blackwell‑Era GPUs (NVL72, NVL144).
  • Personal Cloud AI – air‑gapped, modular clusters built on Cray‑derived interconnects; prototype stage, early‑2026 target.
  • SuperNIC & DPU Integration – ConnectX‑9 SuperNIC and BlueField‑4 DPU on new ProLiant DL280a platforms (Q4 2025) boost intra‑node bandwidth and offload CPU tasks.
  • Exascale “Solstice” Systems – pilot at Argonne (Q1 2026) uses 100 k Blackwell GPUs for >2 000 exaflops of AI throughput, occupying up to 100 k sq ft facilities (see the sanity‑check sketch after this list).
  • AI and HPC convergence accelerates as Nvidia’s GTC announcements and HPE‑Nvidia joint compute initiatives emphasize AI‑centric architectures.
  • Demand for secure, isolated compute clusters rises in government and defense sectors, driving the Personal Cloud AI line.
  • Rapid scaling of GPU density—Blackwell and upcoming Vera Rubin “Superchip”—creates a supply chain advantage for HPE’s integrated solutions.
  • Consolidated product stack positions HPE to capture a larger share of the enterprise AI‑HPC market, traditionally fragmented among niche vendors.
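
Taken at face value, the Solstice figures above imply roughly 20 PFLOPS of AI throughput per GPU, which is on the order of the low‑precision ratings Nvidia quotes for Blackwell‑class parts. A minimal sanity check in Python, using only the numbers from the roadmap bullet:

```python
# Back-of-envelope check of the quoted Solstice figures.
# Both inputs come from the roadmap bullet above; nothing here is measured data.

gpus = 100_000        # quoted GPU count for the Argonne pilot
ai_exaflops = 2_000   # quoted aggregate AI throughput (low precision)

per_gpu_pflops = ai_exaflops * 1_000 / gpus   # 1 exaflop = 1,000 petaflops
print(f"Implied per-GPU AI throughput: {per_gpu_pflops:.0f} PFLOPS")  # -> 20 PFLOPS
```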

Projected Growth Through 2028

  • Full‑scale Cray‑derived exascale offerings (Solstice, Equinox) slated for market readiness by Q2 2026, supporting >10 k Blackwell GPUs per system.
  • Forecasted market share exceeds 15 % of enterprise AI‑HPC by 2028, based on an annual demand of >500 k GPU units.
  • Incremental revenue from AI‑accelerated supercomputing projected at USD 2.5 B per year, assuming a five‑year adoption curve across government, cloud providers, and research institutions.
  • Compatibility with Nvidia’s Vera Rubin “Superchip” ensures performance scaling without major architectural redesigns, preserving the roadmap’s continuity.

Why HPE DAOS Powered by Intel Aurora Is a Game‑Changer for AI Supercomputing

Architecture at a Glance

  • Compute: 36 × NVIDIA Grace CPUs paired with 72 × NVIDIA Blackwell Ultra GPUs in a single NVLink domain, delivering > 100 PFLOPS per chip.
  • Storage: All‑flash hot‑tier DAOS pool (NVMe‑over‑Fabric) augmented by Intel Aurora Persistent Memory Modules (PMEM) for sub‑microsecond object access.
  • Network: 9 × NVLink switch trays, Quantum‑X800 InfiniBand plus Spectrum‑X Ethernet, ConnectX‑8 SuperNIC providing > 400 GB/s intra‑rack bandwidth.
  • Chassis: 48U liquid‑cooled rack with manifold cooling, achieving > 60 % water‑usage reduction versus air‑cooled equivalents.
  • Orchestration: HPE GreenLake automation delivers zero‑touch onboarding and lifecycle‑management APIs across hybrid clouds.

Integration with Intel Aurora

  • 2 TB/s aggregated memory bandwidth eliminates I/O throttling for DAOS client pathways.
  • Direct load/store access to PMEM removes copy‑on‑write overhead, cutting write amplification by roughly 30 %.
  • Aurora’s Accelerated Storage Instructions (ASI) offload checksum and compression work onto the CPU, reducing GPU stall cycles during data ingestion (a conceptual sketch of the asynchronous ingest path follows below).
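
What the asynchronous object model buys in practice is that storage round‑trips overlap with GPU work rather than serializing with it. The sketch below is purely conceptual: it uses Python's asyncio instead of the real DAOS client API, and put_object, train_step, and their latency figures are illustrative placeholders, not measured values.

```python
import asyncio

async def put_object(key: str, value: bytes) -> None:
    """Stand-in for an asynchronous object write to the hot tier.

    A real deployment would call the DAOS client library here; this sketch
    models a sub-millisecond storage round-trip with a sleep.
    """
    await asyncio.sleep(0.0001)

async def train_step(step: int) -> None:
    """Stand-in for one GPU-bound training step."""
    await asyncio.sleep(0.001)

async def main() -> None:
    pending = None
    for step in range(10):
        await train_step(step)
        if pending is not None:
            await pending  # make sure the previous checkpoint has landed
        # Launch the next write without blocking the training loop.
        pending = asyncio.create_task(put_object(f"ckpt-{step}", b"...weights..."))
    if pending is not None:
        await pending

asyncio.run(main())
```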

Deployment Footprint and Serviceability

  • Modular mix‑and‑match of all‑flash, hybrid SSD/HDD, and NVMe‑over‑Fabric tiers enables workload‑specific tuning without rewiring.
  • Automated firmware, BIOS, and DAOS stack updates complete in a single workflow; hot‑swap replacement of a compute tray takes under 5 minutes.
  • Full rack becomes operational in ≤ 30 minutes, meeting the “30‑day AI rollout” benchmark set by leading cloud providers.
  • AI‑factory scale‑out: racks of 36 Grace CPUs and 72 Blackwell GPUs serve as the building blocks for the exascale AI training systems reported by DOE supercomputing projects.
  • Sustainability: Liquid cooling cuts total‑cost‑of‑ownership by up to 25 % versus air‑cooled systems, aligning with corporate ESG goals.
  • Cross‑domain compatibility: DAOS’s asynchronous object model abstracts storage from compute, supporting seamless migration between on‑prem, edge, and hyperscale clouds.

Performance Outlook

  • IOPS: > 5 M IOPS for 4 KB objects.
  • Sustained throughput: 1.2 TB/s on the all‑flash tier.
  • Median latency: 2.3 µs for read‑modify‑write cycles leveraging PMEM.
  • Result: ~40 % reduction in total training time for models > 1 TB, compared with traditional NFS‑backed clusters.
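
These headline figures are mutually consistent; the short check below uses only the numbers quoted in this list, with a 1 TB checkpoint standing in for the "> 1 TB model" case:

```python
# Cross-check of the quoted DAOS performance figures.

iops = 5_000_000         # > 5 M IOPS for 4 KB objects
object_size = 4 * 1024   # 4 KB, in bytes
print(f"Small-object bandwidth: ~{iops * object_size / 1e9:.0f} GB/s")     # ~20 GB/s

flash_bw = 1.2e12        # 1.2 TB/s sustained on the all-flash tier
ckpt_bytes = 1e12        # 1 TB checkpoint, mirroring the "> 1 TB model" case
print(f"Time to stream a 1 TB checkpoint: {ckpt_bytes / flash_bw:.2f} s")  # ~0.83 s
```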

Adoption Forecast

  • Short‑term (12 months): Deployment in three DOE AI supercomputing sites and two major cloud providers (AWS, Azure) for AI‑factory workloads.
  • Mid‑term (24 months): Expansion into enterprise AI analytics, with ~15 % of Fortune‑500 AI projects selecting the DAOS‑Aurora stack for latency‑sensitive inference.

Nvidia’s $1 Billion Bet on Nokia Accelerates GPU‑Powered AI for Future Wireless Networks

Strategic Investment and Stakeholding

  • Nvidia invested US $1 billion to acquire a 2.9 % equity position in Nokia.
  • The capital infusion targets the integration of Nvidia’s GPU‑centric AI compute into Nokia’s telecom‑infrastructure portfolio, positioning both firms for AI‑native 5G and emerging 6G deployments.

Technical Benchmarks of the AI‑RAN Stack

  • L2 massive‑MIMO processing achieved a 380‑fold speed increase.
  • L1 signal‑processing performance improved by a factor of 40.
  • Cell capacity expanded seven times, while power efficiency rose 3.5‑fold per cell (see the arithmetic after this list).
  • Benchmarks were demonstrated at Nvidia GTC 2025 (27‑29 Oct) and validated by internal performance reports.
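
Read together, the capacity and efficiency figures also bound the change in absolute power draw, assuming "power efficiency" means capacity delivered per watt (an interpretation, not something stated in the benchmarks):

```python
# Relationship between the quoted AI-RAN gains (document values only).
capacity_gain = 7.0      # cell capacity expanded seven times
efficiency_gain = 3.5    # power efficiency (capacity per watt) rose 3.5x

# efficiency = capacity / power  =>  power = capacity / efficiency
print(f"Implied change in per-cell power draw: {capacity_gain / efficiency_gain:.1f}x")  # 2.0x
```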

Market Reaction and Valuation Shifts

  • Nokia’s share price climbed between 20 % and 25 % on the NYSE immediately following the announcement.
  • Nvidia’s market capitalization exceeded US $5 trillion during the same period, underscoring its capacity to fund large‑scale telecom initiatives.

Ecosystem Integration and Timeline

  • Key partners: Booz Allen (AI service orchestration), Cisco (user‑plane & 5G core software), MITRE (real‑time spectrum allocation), ODC (L2/L3 radio‑access processing), T‑Mobile (field validation).
  • Milestones:
    • 29‑30 Oct 2025 – Nvidia announces the $1 bn investment and stake acquisition.
    • Oct 2025 (GTC) – AI‑RAN benchmark demo and AI Aerial platform roadmap release.
    • 1 Jun 2026 – Commercial launch of the Nvidia‑ODC AI‑RAN solution (Q1 2026 announcement).
    • 2026 – T‑Mobile conducts first live‑network field test of the AI‑native stack.
    • 2027 onward – Deployment of AI‑enhanced 6G prototypes leveraging edge GPU inference.
  • GPU‑centric AI is being embedded into traditionally CPU‑driven RAN and core functions, reflected in collaborations with Cisco and ODC.
  • The partnership’s focus on AI‑native architecture signals a strategic shift toward 6G research and standardization.
  • A cross‑industry coalition—spanning consultancy, security, hardware, and operators—reduces integration risk and creates a de‑facto consortium for end‑to‑end AI deployment.
  • Capital‑driven acceleration links Nvidia’s financial commitment directly to Nokia’s R&D pipeline, establishing a reinforcement loop for future investments.

Forward Outlook

  • Analysts project that at least 30 % of new 5G deployments in North America and Europe will incorporate Nvidia‑powered AI‑RAN by 2028, given the documented performance gains and operator interest.
  • Alignment with open‑RAN specifications positions the AI‑native stack for inclusion in upcoming 3GPP releases, potentially making AI‑RAN a baseline feature for subsequent 6G certification.
  • The initial equity stake suggests a scalable investment model; further Nvidia equity positions in additional telecom OEMs are plausible as the AI‑RAN market expands.

Why HPE’s Liquid‑Cooling Strategy Is Shaping the Future of Exascale GPUs

Performance‑Driven Architecture

  • HPE’s ProLiant DL280a and DL380 Gen12 servers ship with Nvidia Blackwell Ultra GPUs, delivering three‑fold performance gains over first‑generation AI platforms.
  • A single 48U rack integrates 72 Blackwell GPUs across 18 compute trays, an 80 % increase in GPU density over legacy air‑cooled designs (quantified in the sketch after this list).
  • The rack’s manifold‑based liquid‑cooling loop reduces GPU temperature rise by up to 65 % under mixed AI workloads, enabling higher power‑density operation.
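
A quick derivation makes the density claim concrete; the per‑tray, per‑U, and legacy‑rack values below are computed from the quoted figures rather than taken from HPE:

```python
# What the quoted rack figures imply (document values only).
gpus_per_rack = 72
trays_per_rack = 18
rack_units = 48
density_gain = 1.80      # "80 % increase" over legacy air-cooled designs

print(f"GPUs per tray:            {gpus_per_rack / trays_per_rack:.0f}")     # 4
print(f"GPUs per rack unit:       {gpus_per_rack / rack_units:.2f}")         # 1.5
print(f"Implied legacy 48U rack: ~{gpus_per_rack / density_gain:.0f} GPUs")  # ~40
```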

Energy Efficiency Gains

  • Reduced thermal gradients lower rack‑level power consumption by an estimated 30 %, improving power‑usage effectiveness (PUE) from ~1.6 to ~1.1 (a worked example follows this list).
  • DOE AI supercomputer deployments attribute a 2 % reduction in overall data‑center power demand to HPE’s liquid‑cooling implementation.
  • Zero‑touch onboarding and automated lifecycle management contribute to a 15 % improvement in mean‑time‑to‑service metrics.
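
The PUE shift quoted above accounts on its own for roughly the stated 30 % saving at the facility level. A one‑line check, assuming a constant IT load (the 1 MW figure is purely illustrative and cancels out):

```python
# How the quoted PUE improvement alone translates to facility power.
it_load_mw = 1.0                  # assumed constant IT load; the value cancels out
pue_before, pue_after = 1.6, 1.1  # figures quoted above

saving = 1 - (it_load_mw * pue_after) / (it_load_mw * pue_before)
print(f"Facility power reduction from PUE alone: {saving:.0%}")  # ~31%
```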

Comparative Cooling Technologies

  • HPE’s manifold approach is production‑ready, MGX‑compliant, and integrates with existing rack standards.
  • Microsoft‑Corintis micro‑fluidic cooling, featuring AI‑guided coolant routing, achieves up to 75 % temperature reduction in laboratory tests but remains at prototype stage and requires custom silicon.
  • Both approaches demonstrate superior heat extraction versus conventional cold plates; however, HPE’s solution offers immediate scalability for exascale deployments.
  • Standardization of liquid‑cooling interfaces, evidenced by MGX compliance and NVLink domain integration, is consolidating the ecosystem.
  • AI‑driven thermal management, exemplified by Corintis flow‑path optimization, is gaining traction as a differentiator for dynamic cooling.
  • Co‑design of hardware and cooling subsystems, as shown by HPE’s simultaneous launch of GPU‑dense servers and associated cooling infrastructure, marks a shift from retrofitted solutions.

Forecast Through 2028

  • By 2027, HPE’s liquid‑cooled racks combined with Nvidia’s Blackwell and upcoming Vera Rubin GPUs are projected to sustain ≥5 exaflops of AI training per data‑center module.
  • Within two years, liquid‑cooled architectures are expected to be specified in at least 30 % of new exascale purchases, driven by documented density and efficiency benefits.
  • Cumulative energy savings from widespread liquid‑cooling adoption are projected to lower total cost of ownership for AI workloads by approximately 12 % relative to air‑cooled baselines.