HPE Deploys Exascale Cray Systems; NVIDIA, AMD, Qualcomm Power Global AI; Space AI Data Centers and Energy‑Efficient HPC Rise


TL;DR

  • HPE expands HPC with SGI and Cray deals, deploying Cray EX4000 and GX5000 exascale systems at Oak Ridge and Livermore labs.
  • Frontier supercomputer achieves 1.2 exaflops; Fugaku reaches 442 petaflops, marking continued leadership in exascale performance worldwide.
  • NVIDIA H100, AMD RDNA4, and Qualcomm AI accelerators power cloud‑based GPU workloads, enabling AI training and inference at unprecedented scale.
  • Space‑based AI data centers, such as Starcloud satellite nodes, promise zero‑water cooling and 95% solar power, lowering costs and boosting resilience.
  • Distributed storage clusters pairing DAOS with liquid‑cooled rack designs offer energy‑efficient HPC, reducing water usage while maintaining high I/O throughput.
  • Hybrid cloud HPC deploys modular data centers with renewable integration, enabling edge processing and scalable performance across edge, user, and core environments.

HPE’s Exascale Gambit: How SGI and Cray Deal Shape America’s HPC Future

From Acquisitions to Architecture

By buying SGI for $275 million and Cray for $1.3 billion, HPE turned itself from a component vendor into the sole provider of complete exascale stacks for the U.S. Department of Energy. The result is a unified platform that blends AMD Instinct MI355X GPUs (with an MI430 upgrade on the horizon), Intel‑derived DAOS‑K3000 object storage, and the Slingshot 400 interconnect delivering 400 Gbps per port.

What the Numbers Mean

The Cray EX4000 rack occupies 25 % less floor space than its predecessor while supplying up to 75 million IOPS per fully populated rack. GX5000, the next‑gen Shasta successor, pairs AMD EPYC CPUs with the same GPU family, enabling AI‑first workloads without sacrificing traditional MPI performance. Liquid‑cooled pump‑rack designs cut rack‑level power draw, allowing higher compute density without exceeding facility limits.

Deployment at America’s Premier Labs

  • ORNL – GX5000 (Discovery): Early‑2026 delivery, full operation by 2029, targeting AI‑driven materials, fusion, and drug discovery.
  • LLNL – EX4000: Early‑2027 delivery, operational 2027‑2028, focused on nuclear security, climate modeling, and high‑energy physics.
  • Argonne – Support Nodes: 2027 rollout to augment multi‑physics simulations.

Why Consolidation Matters

With a single vendor delivering compute, storage, networking, and cooling, DOE labs avoid the procurement friction of stitching together disparate subsystems. The integrated DAOS solution is already outpacing Lustre in early user tests, promising >10 TB/s bandwidth and simpler management—factors that could push DAOS adoption above 70 % of new installations within five years.
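
As a rough sense of scale for the >10 TB/s figure, the sketch below estimates how many Slingshot 400 ports would be needed to sustain that aggregate bandwidth; it assumes ideal line rate with no protocol or encoding overhead, which is an idealization for illustration only.

```python
# How many 400 Gbps links does 10 TB/s of storage bandwidth imply?
# Assumption: ideal line rate, no protocol or encoding overhead.

TARGET_TB_PER_S = 10            # DAOS aggregate bandwidth cited above
LINK_GBPS = 400                 # Slingshot 400 port speed

target_gbps = TARGET_TB_PER_S * 8 * 1000   # TB/s -> Gb/s
ports_needed = target_gbps / LINK_GBPS

print(f"{ports_needed:.0f} ports at {LINK_GBPS} Gbps to sustain {TARGET_TB_PER_S} TB/s")
# -> about 200 ports under these idealized assumptions
```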

Looking Ahead

Projected cumulative AI‑optimized throughput exceeds 200 EFLOPS by 2029 as MI430 GPUs replace the MI355X fleet and storage bandwidth scales with DAOS upgrades. HPE’s market share is forecast to climb past 60 % of the next round of DOE exascale contracts, thanks to the end‑to‑end solution that aligns with the nation’s sovereign AI and scientific computing agenda.

Why Memory‑First Accelerators Are Redefining the AI Data‑Center Landscape

Training Still Belongs to NVIDIA

Even as the AI‑centric capex market races toward a $7 trillion horizon, NVIDIA’s H100 platform retains its grip on large‑scale training. With 80 GB of HBM3 and more than 3 TB/s of memory bandwidth, the H100 family fuels exaflop‑class training clusters worldwide. Forecasts show it will command more than 70 % of training workloads through 2028, underscoring the persistent need for raw FLOP density.
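
To make the FLOP‑density argument concrete, the sketch below estimates how many H100‑class accelerators a 1‑exaflop training cluster needs. Both the roughly 2 PFLOPS of peak AI throughput per GPU and the 40 % sustained utilization are illustrative assumptions, not vendor specifications.

```python
# Back-of-envelope sizing for an exaflop-class training cluster.
# Assumptions (illustrative, not vendor specs): ~2 PFLOPS peak AI
# throughput per H100-class GPU and ~40% sustained utilization.

PEAK_PFLOPS_PER_GPU = 2.0      # assumed peak AI throughput per GPU
SUSTAINED_FRACTION = 0.40      # assumed fraction of peak achieved in training
TARGET_EXAFLOPS = 1.0          # cluster-level sustained target

sustained_pflops_per_gpu = PEAK_PFLOPS_PER_GPU * SUSTAINED_FRACTION
gpus_needed = (TARGET_EXAFLOPS * 1000) / sustained_pflops_per_gpu

print(f"GPUs for ~{TARGET_EXAFLOPS:.0f} EFLOPS sustained: {gpus_needed:,.0f}")
# -> roughly 1,250 GPUs under these assumptions
```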

Inference Is Pivoting to Memory Capacity

Inference now outweighs training in data‑center energy consumption, accounting for roughly three‑quarters of AI power draw. Qualcomm’s AI200 (2026) and AI250 (2027) answer that challenge with 768 GB of LPDDR per card, delivering an effective bandwidth ten times that of AMD’s RDNA4‑based Radeon AI PRO R9700. The result is a 30‑40 % total‑cost‑of‑ownership advantage for rack‑scale inference clusters that operate within a 160 kW envelope—directly comparable to NVIDIA’s B200 and AMD’s Instinct‑MI350X power budgets.
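
A minimal sketch of why per‑card memory capacity drives inference TCO, using the 768 GB figure cited above. The 72‑card rack count and the 2‑byte weight format are hypothetical assumptions for illustration, not Qualcomm specifications.

```python
# Illustrative rack-level capacity math for memory-first inference.
# Assumptions: 72 cards per rack (hypothetical), 768 GB LPDDR per card
# (figure cited in the text), and a 2-byte-per-parameter weight format.

CARDS_PER_RACK = 72            # assumed card count for a 160 kW rack
GB_PER_CARD = 768              # LPDDR capacity per card (from the article)
BYTES_PER_PARAM = 2            # e.g. FP16/BF16 weights

rack_memory_tb = CARDS_PER_RACK * GB_PER_CARD / 1024
params_resident = CARDS_PER_RACK * GB_PER_CARD * 1e9 / BYTES_PER_PARAM

print(f"Rack memory: {rack_memory_tb:.0f} TB")
print(f"Max resident parameters: {params_resident / 1e12:.1f} trillion")
# More weights resident per rack means fewer racks (and less power)
# per served model, which is where the claimed TCO advantage comes from.
```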

Price‑Performance Disruption from AMD

AMD’s RDNA4 cards, priced between $1.3 k and $4 k, double the FP16 throughput of the preceding Radeon PRO W7800 while staying under 300 W per board. This cost structure undercuts NVIDIA’s RTX PRO pricing (≈ $8.5 k) and opens a credible path for edge and hybrid‑cloud deployments that rely on existing PCIe 5.0 infrastructure.

Convergent Power Budgets Shape Architecture

All three vendors now cluster around 150‑350 kW per rack, a range dictated more by data‑center cooling and power distribution than by pure compute ambition. The emerging “memory‑first” design philosophy—high‑capacity LPDDR or GDDR6 paired with modest power draw—reflects an industry‑wide acknowledgment that latency reduction at the token level trumps sheer FLOP counts for most production workloads.

Software Ecosystem as the New Battleground

NVIDIA’s CUDA lock‑in remains formidable, but AMD’s ROCm expansion and Qualcomm’s one‑click Hugging Face deployment kits signal a shift toward multi‑vendor flexibility. Hyperscalers increasingly evaluate accelerator stacks on interoperability as much as on raw performance, a trend that could erode NVIDIA’s dominance if rivals sustain their price‑performance momentum.

Strategic Outlook

By 2030, Qualcomm’s AI200/250 line is projected to handle about 15 % of global inference traffic, while AMD’s RDNA4 chips could capture near 8 %. Together they promise a diversified accelerator ecosystem where training and inference are decoupled, power budgets are standardized, and memory capacity becomes the primary performance lever. Investors and data‑center architects would be wise to hedge against NVIDIA’s training monopoly by allocating capital to these emerging memory‑centric inference platforms, ensuring both cost efficiency and technological resilience in the next decade of AI growth.

Orbital AI Data Centers: A Viable Path to Faster, Greener Compute

Technical Edge

Recent public disclosures (Oct 2025) show that low‑Earth‑orbit platforms such as Starcloud, Starlink, and the neocloud Crusoe can achieve a solar capacity factor of up to 95 %. The near‑continuous irradiance eliminates the night‑time and weather‑related power dips that constrain terrestrial sites. Radiative cooling against the 2.7 K background of space, demonstrated in a Nature Electronics study, removes the need for water‑intensive heat exchange, avoiding the 1.7 Mt of water consumption projected for a 40 MW ground plant over ten years.

  • Power‑to‑mass: 60 kg per NVIDIA H100 (≈2 PFLOPS). Implication: 300 GPUs per 60 kg payload; 40 000 GPUs and ~80 MW per Starship launch by 2030.
  • Deployment time: 2–3 months per node. Implication: launch‑as‑a‑service compresses build‑out from years to months.
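
A back‑of‑envelope check of the power‑to‑mass figures above, using only the numbers already cited (per‑GPU compute, projected per‑launch GPU count, and per‑launch power); no additional data is assumed.

```python
# Per-launch arithmetic from the figures cited above.

GPUS_PER_LAUNCH = 40_000        # projected per Starship flight by 2030
MW_PER_LAUNCH = 80              # projected electrical capacity per flight
PFLOPS_PER_GPU = 2              # ~2 PFLOPS per H100-class GPU (from the table)

watts_per_gpu = MW_PER_LAUNCH * 1e6 / GPUS_PER_LAUNCH
exaflops_per_launch = GPUS_PER_LAUNCH * PFLOPS_PER_GPU / 1000

print(f"Delivered power budget: {watts_per_gpu:,.0f} W per GPU (incl. solar and cooling overhead)")
print(f"Peak compute per launch: ~{exaflops_per_launch:.0f} EFLOPS")
# -> 2,000 W per GPU and ~80 EFLOPS of peak compute per flight
```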

Economic Impact

Terrestrial AI facilities in Singapore cost about US $13.80 per watt of IT load. Jeff Bezos’ projections for orbital nodes—scaled through roughly 50 Starship flights per gigawatt—suggest a cost advantage of 20 × within the next two decades. Water‑related O&M savings from zero‑water cooling amount to roughly US $120 M per 40 MW plant over ten years (based on $0.07 /gal industrial water cost). With global data‑center electricity demand projected to rise from 415 TWh (2024) to 945 TWh (2030), orbital solar supply could offset at least 5 % of this increase if multi‑gigawatt capacity is realized.
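
The 5 % offset claim can be checked with simple arithmetic against the demand figures quoted above; the sketch assumes the 95 % capacity factor applies year‑round, which is an idealization.

```python
# How much orbital capacity would offset 5% of projected demand growth?
# Uses the 415 TWh (2024) and 945 TWh (2030) figures and a 95% capacity
# factor; year-round availability is an idealizing assumption.

DEMAND_2024_TWH = 415
DEMAND_2030_TWH = 945
CAPACITY_FACTOR = 0.95
HOURS_PER_YEAR = 8_760

growth_twh = DEMAND_2030_TWH - DEMAND_2024_TWH
target_twh = 0.05 * growth_twh
twh_per_gw = CAPACITY_FACTOR * HOURS_PER_YEAR / 1000   # TWh per GW-year

print(f"5% of growth: {target_twh:.1f} TWh/yr")
print(f"Orbital capacity needed: {target_twh / twh_per_gw:.1f} GW")
# -> roughly 3 GW, i.e. "multi-gigawatt" as stated above
```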

Deployment Roadmap

  • Oct 27 2025 – Starcloud‑1 launch (60 kg, H100 GPU) – first end‑to‑end demo of zero‑water, solar‑powered AI compute.
  • Late 2026 – Neocloud Crusoe constellation rollout (US $1.4 B funding, US $10 B valuation) – moves from prototype to commercial service.
  • 2027‑2028 – Starlink 5‑GW orbital data‑center design freeze; Starship payload integration begins.
  • 2030 – 40 000 GPUs + 80 MW per Starship flight; 1 GW orbital capacity achievable with ~50 launches.
  • 10‑20 yr horizon – Gigawatt‑scale orbital facilities forecasted with 20 × cost advantage.

Remaining Hurdles

Launch cost front‑loading demands capital amortization through high launch cadence and reuse. Radiation hardening adds mass and design complexity; mitigation relies on shielding and error‑correcting architectures. Efficient radiative cooling still requires precision‑engineered emissive panels to prevent hotspot degradation under high‑density GPU loads.

Outlook to 2035

By 2030 orbital AI compute is expected to surpass 2 GW, supplying roughly 5 % of projected global AI compute power. Cost per compute‑watt could drop below US $8 /W, outpacing terrestrial reductions driven solely by semiconductor scaling. The geographic diversification of space‑based nodes reduces outage risk from terrestrial grid failures to under 1 % for critical AI workloads.

The convergence of high‑efficiency solar power, water‑free radiative cooling and mass‑optimized GPU payloads makes orbital AI data centers a technically sound and economically compelling alternative. While launch economics and radiation protection remain primary risk factors, the documented roadmap from Starcloud, Starlink and Neocloud demonstrates a credible path to multi‑gigawatt, cost‑effective, resilient AI compute infrastructure within the next decade.

DAOS‑Powered Storage Meets Liquid Cooling: A Pragmatic Path to Greener Exascale

The convergence that matters

Recent announcements from HPE, Cray, and Intel reveal a coordinated rollout of DAOS‑based K3000 storage alongside closed‑loop liquid‑cooling pump racks in the GX5000/EX4000 exascale platforms slated for ORNL’s Discovery and Lux clusters. This co‑design replaces traditional Lustre‑E2000 stacks, simplifies the I/O software layer, and integrates thermal management directly into the storage chassis.

Performance versus power density

Benchmark data show each fully populated rack achieving up to 75 million IOPS while supporting AI GPUs that draw 100 – 600 kW per rack. The Slingshot 400 interconnect delivers 400 Gbps, maintaining bandwidth parity as compute density rises. Despite a 10‑15 % increase in GPU power consumption, I/O throughput remains above 70 M IOPS per rack, confirming the thermal efficiency of the liquid‑cooling solution.
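
To make the thermal‑efficiency claim concrete, the sketch below converts the cited rack figures into IOPS per watt across the stated power range; it uses only the numbers quoted above.

```python
# IOPS-per-watt across the cited rack power range (100-600 kW).

RACK_IOPS = 75e6                 # up to 75 million IOPS per rack
RACK_KW_RANGE = (100, 600)       # AI GPU rack power draw cited above

for kw in RACK_KW_RANGE:
    print(f"At {kw} kW: {RACK_IOPS / (kw * 1000):.0f} IOPS per watt")
# -> 750 IOPS/W at 100 kW, 125 IOPS/W at 600 kW; throughput stays
#    above 70 M IOPS even as rack power climbs, per the text.
```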

Space and water efficiency quantified

The EX4000 enclosure reduces rack footprint by 25 % relative to prior generations. Closed‑loop pump racks recirculate chilled water, cutting total water throughput per rack by roughly 35 % compared with conventional CRAC‑based cooling. This reduction derives from flow‑rate control measurements showing that limited water exchange still keeps component temperatures within their optimal operating ranges.

Projected deployment and impact

By the close of 2027, three U.S. DOE exascale installations—Discovery, Lux, and a forthcoming facility—are expected to operate DAOS‑K3000 clusters with liquid‑cooling pump racks. Market analysis forecasts that vendors delivering this integrated solution will secure approximately 45 % of new exascale contracts issued between 2026 and 2028. Water‑usage monitoring on the Lux AI cluster already indicates a 30‑35 % reduction versus earlier Lustre‑based systems, corroborating the projected savings.

Long‑term trajectory

The historical continuity of liquid cooling, originating with IBM mainframes in the mid‑1980s, demonstrates a mature technology now essential for > 100 kW rack loads. As AI workloads expand, the DAOS‑liquid‑cooling paradigm offers a scalable, energy‑efficient architecture that preserves high‑throughput I/O without escalating cooling infrastructure. Continued adoption will likely standardize this integrated approach for future exascale and post‑exascale systems.

Hybrid‑Cloud HPC: Why Modularity and Renewables Define the Next Exascale Era

Modular Scaling Becomes the Norm

HPE, Cisco and emerging edge‑AI vendors are replacing monolithic supercomputers with plug‑in rack units that can be deployed in core data centers, satellite sites and on‑premises edge locations. Liquid‑cooled racks shrink physical footprints by up to 25 % while delivering 75 M IOPS per fully populated rack (EX4000). The same modular chassis supports direct DC‑solar feeds, enabling rapid installation without extensive site retrofits.

Renewable Power Integration Gains Traction

Liquid cooling lowers power‑usage effectiveness (PUE), compounding the benefit of on‑site solar and cutting electricity per FLOP by roughly 20 % versus traditional air‑cooled systems. HPE’s Lux AI clusters and the upcoming Discovery exascale system are architected for mixed‑source power, targeting at least 40 % renewable intake by 2028. Cisco’s modular AI platform explicitly accommodates solar‑direct input, further lowering idle power by about 30 %.
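
One way the roughly 20 % figure can arise is through the PUE difference between air and liquid cooling; the PUE values below are illustrative assumptions, not measurements from these systems.

```python
# Illustrative PUE math behind a ~20% cut in electricity per FLOP.
# Assumed PUE values (not measured figures for Lux or Discovery):

PUE_AIR = 1.40       # assumed typical air-cooled facility
PUE_LIQUID = 1.12    # assumed liquid-cooled facility

# Facility energy per unit of compute scales with PUE, so:
savings = 1 - PUE_LIQUID / PUE_AIR
print(f"Electricity per FLOP reduced by ~{savings:.0%}")
# -> ~20% under these assumed PUE values
```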

Unified Storage Accelerates Throughput

The adoption of DAOS K3000 across both exascale and edge deployments streamlines the storage stack. Compared with Lustre, DAOS delivers ~30 % lower latency, raising effective rack throughput by a factor of 1.4. This consolidation eliminates parallel Lustre environments and simplifies data movement between core and edge nodes.
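
The 1.4× throughput figure is consistent with the ~30 % latency reduction under a fixed‑concurrency model (Little’s law); the sketch below shows the relationship, with the queue depth being an arbitrary illustrative value.

```python
# Little's law sketch: at fixed outstanding I/O depth, throughput is
# inversely proportional to latency. Queue depth is arbitrary here.

QUEUE_DEPTH = 64                 # illustrative fixed concurrency
LUSTRE_LATENCY_MS = 1.0          # normalized baseline latency
DAOS_LATENCY_MS = 0.7            # ~30% lower, per the text

lustre_iops = QUEUE_DEPTH / (LUSTRE_LATENCY_MS / 1000)
daos_iops = QUEUE_DEPTH / (DAOS_LATENCY_MS / 1000)

print(f"Throughput gain: {daos_iops / lustre_iops:.2f}x")
# -> ~1.43x, in line with the 1.4x factor cited above
```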

AI‑Centric Compute Density

AMD’s MI355X and upcoming MI430 GPUs, paired with EPYC CPUs and Pensando networking, boost AI performance per rack by over threefold while maintaining manageable thermal envelopes through liquid cooling. The higher density translates into a projected 30 % increase in compute per rack by 2032.

Edge‑Centric Energy Efficiency

ARM‑SCSP “Smarter at the Edge” studies show that relocating inference workloads to sub‑kilowatt edge nodes powered by solar or battery can slash AI inference energy consumption by up to 60 %. Early pilot deployments already feature solar‑only micro‑data centers delivering sub‑5 kW compute per node.

Projected Hybrid‑Cloud Fabric

  • ≥150 modular edge nodes (≤5 kW each) powered >50 % by renewables.
  • 400 Gbps Slingshot/SONiC fabric linking edge to core.
  • PUE < 1.05 for edge‑tier clusters.
  • AI inference traffic ≥60 % of total by 2032.
  • Operating costs ~15 % lower than traditional hyperscale facilities.
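
Taken together, the targets above bound the edge tier’s footprint; the sketch below simply aggregates the listed figures (node count, per‑node power, renewable share, and PUE) without adding new data.

```python
# Aggregate edge-tier envelope implied by the listed targets.

NODES = 150                # >=150 modular edge nodes
KW_PER_NODE = 5            # <=5 kW each
RENEWABLE_SHARE = 0.50     # >50% renewable-powered
PUE = 1.05                 # <1.05 for edge-tier clusters

it_load_kw = NODES * KW_PER_NODE
facility_kw = it_load_kw * PUE
renewable_kw = facility_kw * RENEWABLE_SHARE

print(f"Edge IT load: <= {it_load_kw} kW")
print(f"Facility draw at PUE {PUE}: <= {facility_kw:.0f} kW")
print(f"Renewable-supplied portion: > {renewable_kw:.0f} kW")
```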

Strategic Takeaways

Vendors must expose DC‑level renewable interfaces on all modular chassis. Enterprises should tier workloads—placing latency‑sensitive inference at renewable‑powered edge nodes, while reserving large‑scale training for core modular racks with DAOS‑optimized storage. Policymakers can accelerate adoption by offering grid‑integration incentives that align with DOE energy‑efficiency targets.