NVIDIA HGX B300 Hits 144 GPUs/Rack with Liquid Cooling | Location Tracking for AI Chips, AMD ROCm in Ubuntu 26.04, TSMC 80% CoWoS to Blackwell Ultra, Risen-WEG 3GWh BESS Deal
NVIDIA ships liquid-cooled HGX B300 for 144 GPUs/rack hyperscale AI. New location verification combats chip smuggling to China; AMD integrates ROCm 5.7 in Ubuntu 26.04 LTS; TSMC allocates 80% CoWoS to NVIDIA; Risen Energy-WEG 3GWh UL9540A BESS MOU for HPC
TL;DR
- NVIDIA unveils liquid-cooled HGX B300 systems with 144 GPUs per rack for hyperscale AI data centers
- NVIDIA develops location verification software for Blackwell AI chips to prevent smuggling into China, enhancing data center security and compliance with U.S. export controls
- AMD and Canonical partner to natively integrate ROCm 5.7 into Ubuntu 26.04 LTS, enabling open-source GPU acceleration for HPC and AI workloads on Linux-based data centers
- AMD expands EPYC Embedded 9005 series with Zen 5 cores for low-power, high-density edge data centers
- TSMC allocates 80% of its CoWoS packaging capacity to NVIDIA for Blackwell Ultra GPUs, securing supply chain dominance for exascale AI and HPC systems through 2028
- Empyrion Digital secures approval for 200MW hyperscale data center in Malaysia with PUE <1.3
- Intel launches Panther Lake Core Ultra 300 series with integrated Xe3 GPU, targeting 2026 HPC and edge AI deployments with 45W TDP and 12 Xe3 cores for scalable compute
- Risen Energy and WEG sign 3GWh MOU for global energy storage systems, deploying UL9540A-certified BESS solutions to support renewable integration and sustainable HPC data center power
- U.S. EPA accelerates review of immersion cooling fluids to support AI-driven data center energy efficiency
NVIDIA's HGX B300 Systems Enable 144 GPUs Per Rack With Liquid Cooling for Hyperscale AI
What is the significance of NVIDIA’s HGX B300 systems?
NVIDIA has begun shipping liquid-cooled HGX B300 systems in 4U and 2U OCP form factors, supporting up to 144 GPUs per rack. These systems use Blackwell architecture GPUs and integrate with Supermicro’s Data Center Building Block (DCBB) platform for pre-validated power and cooling.
How does liquid cooling improve density?
Liquid cooling eliminates the thermal constraints of air-cooled racks, enabling higher compute density. The HGX B300 achieves 144 GPUs per rack, surpassing previous air-cooled limits and reducing the footprint required for large-scale AI training.
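To put the density figure in context, a rough power budget shows why liquid cooling is a prerequisite at this scale. Every number below (GPUs per system, per-GPU power, per-system overhead) is an illustrative assumption, not a published HGX B300 specification:

```python
# Back-of-the-envelope power estimate for a 144-GPU rack.
# All figures are assumed placeholders, not NVIDIA specs.

GPUS_PER_SYSTEM = 8        # assumed GPUs per HGX baseboard
SYSTEMS_PER_RACK = 18      # 18 x 8 = 144 GPUs, matching the stated density
GPU_POWER_KW = 1.2         # hypothetical per-GPU board power
SYSTEM_OVERHEAD_KW = 2.0   # hypothetical CPUs, NICs, switches per system

gpus_per_rack = GPUS_PER_SYSTEM * SYSTEMS_PER_RACK
rack_power_kw = SYSTEMS_PER_RACK * (GPUS_PER_SYSTEM * GPU_POWER_KW + SYSTEM_OVERHEAD_KW)

print(f"{gpus_per_rack} GPUs/rack, ~{rack_power_kw:.0f} kW/rack")
# -> 144 GPUs/rack, ~209 kW/rack, far beyond the few tens of kilowatts
#    an air-cooled rack can typically reject; hence direct liquid cooling.
```

Under these assumptions the rack draws roughly 200 kW, while air-cooled racks are typically limited to a few tens of kilowatts; that gap is the constraint liquid cooling removes.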
What technical advantages does the design offer?
The system prioritizes performance-per-watt and accelerated time-to-market. Integration with Supermicro’s DCBB reduces deployment complexity, allowing hyperscalers to deploy high-density AI infrastructure with validated components.
How does this align with industry trends?
- Liquid cooling is becoming standard for hyperscale AI racks.
- GPU density continues to rise, driven by Blackwell architecture scalability.
- NVIDIA’s approach mirrors broader market shifts toward energy-efficient compute.
What are the implications for stakeholders?
- Data-center operators: Gain higher AI capacity without proportional space expansion, while reducing cooling infrastructure costs.
- GPU OEMs: Face increased pressure to match NVIDIA’s integrated liquid-cooled solutions.
- Supply-chain managers: Must scale procurement of liquid-cooling components, including pumps and coolant loops, to meet demand.
What policy and energy factors support adoption?
The U.S. approval of H200 GPU exports to China (December 9, 2025) expands the addressable market for Blackwell-based systems. Concurrent growth in renewable-powered data centers, such as 300 MW hydroelectric facilities in Paraguay, aligns with the B300's efficiency focus, reinforcing its suitability for sustainable AI infrastructure.
What is the outlook?
The HGX B300 is positioned as a reference design for hyperscale AI deployments through 2026–2027. Continued policy support for AI chip exports and expansion of renewable energy infrastructure are expected to drive adoption.
NVIDIA's Location Verification Software Enhances AI Chip Compliance with U.S. Export Controls
How does NVIDIA's new software prevent AI chip smuggling into China?
NVIDIA has developed an optional location-verification software stack for its Blackwell-generation AI GPUs. The system generates cryptographic proofs of a chip’s physical location by measuring GPU-to-GPU latency and server-chip handshake data. This telemetry-based attestation does not degrade inference performance and can be audited by U.S. licensing authorities.
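NVIDIA has not published the protocol, but the mechanism described (signed latency telemetry that an auditor can verify after the fact) can be sketched in a few lines. Everything below, from field names to the landmark-latency heuristic, is a hypothetical illustration, not NVIDIA's implementation:

```python
# Minimal sketch of telemetry-based location attestation, assuming a
# per-device secret provisioned at manufacture. Field names and the
# latency heuristic are hypothetical, not NVIDIA's protocol.
import hmac, hashlib, json, time

DEVICE_SECRET = b"provisioned-at-manufacture"  # stand-in for a fused device key

def build_attestation(gpu_id: str, peer_latencies_us: dict[str, float]) -> dict:
    """Sign a snapshot of round-trip latencies so an auditor can verify
    the measurements came from this specific device."""
    record = {
        "gpu_id": gpu_id,
        "timestamp": int(time.time()),
        "peer_latencies_us": peer_latencies_us,  # RTTs to known landmark servers
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["mac"] = hmac.new(DEVICE_SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify_attestation(record: dict) -> bool:
    """Auditor side: recompute the MAC over the signed fields."""
    mac = record.pop("mac")
    payload = json.dumps(record, sort_keys=True).encode()
    record["mac"] = mac
    expected = hmac.new(DEVICE_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)

att = build_attestation("gpu-0", {"landmark-sjc": 180.0, "landmark-iad": 62000.0})
assert verify_attestation(att)
```

The physics does the rest: round-trip latency lower-bounds physical distance, so a chip attesting plausible latencies to known landmark servers cannot simultaneously be operating on another continent.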
What regulatory pressures prompted this development?
The U.S. government imposed a 25% revenue-share requirement on H200 exports to China, effective December 9, 2025. This policy followed federal investigations uncovering over $160 million in smuggled NVIDIA GPUs. The combination of financial incentives and national security concerns drove NVIDIA’s technical response.
Is the software mandatory for customers?
No. The location-verification module is offered as an optional add-on, but customers that decline to enable it may face restrictions on future export licenses or reduced access to revenue-sharing benefits.
How does China respond to this technology?
China’s cybersecurity authority has summoned NVIDIA to clarify concerns over potential backdoors and questioned the system’s reliability. This skepticism may lead to the development of a domestic verification framework, creating potential incompatibility with U.S.-aligned standards.
What is the rollout plan for other GPU generations?
NVIDIA is evaluating the extension of this attestation capability to Hopper and Ampere architectures. A phased approach ensures supply-chain visibility across legacy and current hardware without requiring immediate hardware redesigns.
What are the long-term implications?
- Compliance: Data centers can generate auditable logs to avoid penalties under the Export Administration Regulations.
- Security: Real-time location verification reduces the risk of high-performance chips being covertly redeployed for adversarial AI.
- Commercial: Uneven adoption may create tiered access to U.S. export benefits.
- Geopolitical: Divergent verification standards could emerge, fragmenting global AI hardware ecosystems.
What developments are expected in 2026?
- Q1 2026: NVIDIA may pilot mandatory verification for select Hopper GPUs under new export licenses.
- Q2 2026: The U.S. Commerce Department is expected to require attestation logs for all AI chips above the H200 tier.
- H2 2026: Chinese firms may announce a sovereign verification system incompatible with NVIDIA’s protocol.
- 2027: Industry-wide standardization of an attestation API by NVIDIA, AMD, and Intel is projected.
AMD and Canonical Integrate ROCm 5.7 into Ubuntu 26.04 LTS for Enterprise GPU Acceleration
How does native ROCm 5.7 integration impact Linux-based data centers?
AMD and Canonical have natively packaged ROCm 5.7 for Ubuntu 26.04 LTS, making GPU-accelerated computing available via `apt install rocm`. This eliminates manual driver installation and aligns with enterprise deployment practices.
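A quick post-install sanity check might look like the following, assuming a ROCm-enabled PyTorch build is also installed; the check itself is generic practice, not a procedure documented by Canonical or AMD:

```python
# Sanity check after `sudo apt install rocm`, assuming the ROCm build
# of PyTorch is present. ROCm builds reuse the torch.cuda namespace;
# torch.version.hip is set instead of torch.version.cuda.
import torch

print("HIP runtime:", torch.version.hip)
print("GPUs visible:", torch.cuda.device_count())

if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")
    y = x @ x  # executes on the AMD GPU via rocBLAS
    print("matmul OK on", torch.cuda.get_device_name(0))
else:
    print("No ROCm device detected; check the amdgpu driver and user groups.")
```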
What technical changes enable this integration?
- ROCm 5.7 components, including `hip`, `rocblas`, and `rocsolver`, are built from AMD’s upstream sources and signed by Canonical.
- Installation is standardized across Ubuntu Server, WSL2, and container images such as `ollama-amd`.
- Updates are delivered through Ubuntu’s regular `apt upgrade` cycle, with security patches and performance improvements included.
- Ubuntu Pro extends support for ROCm packages to 15 years, matching hardware lifecycle expectations in HPC and AI environments.
What is the timeline for deployment and expansion?
- December 10, 2025: Partnership announced; ROCm 5.7 available in Ubuntu 26.04 LTS beta.
- April 2026: Ubuntu 26.04 LTS general availability with ROCm 5.7 as default.
- April 2028: Planned upstream submission of ROCm packages to Debian.
- April 2031: Full 15-year support window activated for ROCm-enabled LTS releases.
How does this affect the broader ecosystem?
- AMD gains direct access to Ubuntu’s enterprise user base, increasing ROCm adoption relative to NVIDIA’s CUDA ecosystem.
- Canonical differentiates Ubuntu as the only LTS distribution with native, long-term AMD GPU support.
- Third-party tools like `ollama-amd` and `llama-dodel` now list `rocm` as a dependency, indicating early ecosystem adoption.
- Debian integration would extend ROCm availability to Pop!_OS, Raspberry Pi OS, and other downstream distributions.
What are the strategic implications for data centers?
- Reduced total cost of ownership through standardized, auditable package management.
- Simplified compliance and patch management for regulated environments.
- Predictable upgrade cycles aligned with Ubuntu’s two-year LTS cadence.
- Potential influence on procurement standards for hyperscale and sovereign cloud operators due to 15-year support.
What future developments are anticipated?
- Expansion of ROCm version updates to match Ubuntu’s LTS schedule.
- Integration with Kubernetes device plugins for orchestration support (a minimal sketch follows this list).
- Broader adoption in edge and hybrid cloud deployments via WSL2 and containerized workloads.
- Increased competition in open-source GPU acceleration, alongside Intel’s oneAPI and Apple’s Metal-based frameworks.
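On the Kubernetes point above, here is a minimal sketch of what device-plugin orchestration looks like in practice, assuming AMD's device plugin is deployed on the cluster (it advertises GPUs as the `amd.com/gpu` resource). The namespace, image, and pod name are placeholders:

```python
# Request one AMD GPU from Kubernetes, assuming the AMD device plugin
# is installed and exposing the amd.com/gpu extended resource.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="rocm-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="rocm",
                image="ubuntu:26.04",      # placeholder image with ROCm packages
                command=["rocm-smi"],      # prints GPU inventory, then exits
                resources=client.V1ResourceRequirements(
                    # Scheduler places this pod only on nodes advertising AMD GPUs.
                    limits={"amd.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```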
TSMC Allocates 80% of CoWoS Capacity to NVIDIA for Blackwell Ultra GPUs Through 2028
What does TSMC’s 80% CoWoS allocation to NVIDIA mean for AI and HPC supply chains?
TSMC has committed approximately 80% of its Chip-on-Wafer-on-Substrate (CoWoS) packaging capacity to NVIDIA for the Blackwell Ultra GPU family, securing a pipeline of 800,000–850,000 wafers for 2026. This allocation ensures dominant supply chain control for exascale AI and high-performance computing systems through 2028.
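The headroom left for other customers can be derived from the stated figures, assuming the 800,000–850,000 wafer range corresponds exactly to NVIDIA's 80% share (the totals below are inferred, not reported):

```python
# Implied 2026 CoWoS capacity split, derived from the figures above.
nvidia_share = 0.80
nvidia_wafers = (800_000, 850_000)

total = tuple(round(w / nvidia_share) for w in nvidia_wafers)
others = tuple(t - w for t, w in zip(total, nvidia_wafers))

print(f"Implied total CoWoS wafers (2026): {total[0]:,}-{total[1]:,}")
print(f"Remaining for AMD/Broadcom/others: {others[0]:,}-{others[1]:,}")
# -> total ~1,000,000-1,062,500 wafers; only ~200,000-212,500 for everyone
#    else, which is what pushes rivals toward InFO/EMIB or OSAT overflow.
```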
How is capacity distributed geographically?
- Taiwan: Eight CoWoS lines at the Chiayi AP7 facility handle the majority of current production.
- United States: Two dedicated CoWoS plants under construction in Arizona are scheduled to begin mass production in 2028, supporting U.S. semiconductor resilience goals.
What capacity remains for competitors?
Approximately 20% of CoWoS capacity is allocated to AMD, Broadcom, and other fabless firms. This forces rivals to pursue alternative packaging technologies such as InFO and EMIB, or to secure long-term volume agreements with TSMC.
How are capacity constraints being mitigated?
TSMC is outsourcing overflow CoWoS work to ASE Technology and SPIL to manage immediate bottlenecks. This reduces TSMC’s direct control over advanced packaging steps but maintains production continuity.
What is the production timeline?
| Year | Milestone |
|---|---|
| 2025 Q4 | TSMC announces 80% CoWoS allocation to NVIDIA |
| 2026 | Full wafer bookings; mass-production planning begins |
| 2027 | Arizona plants begin tooling; overflow outsourcing operational |
| 2028 | Blackwell Ultra mass production in Arizona; exascale systems deployed |
What are the strategic implications?
- NVIDIA gains a critical path advantage for exascale AI, with Blackwell Ultra delivering over 2× memory bandwidth versus H200.
- TSMC reduces geopolitical risk through dual-site production (Taiwan and U.S.).
- AI data center power demand is projected to reach 250 TWh annually by 2030, increasing reliance on high-bandwidth, energy-efficient packaging.
- Advanced packaging revenue is expected to grow at over 40% CAGR through 2030, reinforcing CoWoS as a strategic bottleneck.
How will competitors respond?
AMD, Broadcom, and Intel are accelerating adoption of alternative packaging technologies and may pursue U.S.-based fabrication partnerships to mitigate supply constraints. Long-term volume agreements with TSMC remain a viable, though limited, option.
Risen Energy and WEG Sign 3GWh MOU for UL9540A-Certified BESS to Support Renewable-Powered HPC Data Centers
What is the significance of the Risen Energy and WEG 3GWh MOU?
Risen Energy and WEG have signed a memorandum of understanding to deploy 3GWh of UL9540A-certified battery energy storage systems (BESS). These systems are designed to provide grid-stabilizing power for high-performance computing (HPC) data centers, enabling consistent operation under variable renewable energy inputs.
How does UL9540A certification impact deployment?
UL9540A certification validates thermal runaway containment and fire safety performance under extreme conditions. This certification is mandatory for BESS installations in U.S. commercial and critical infrastructure markets, reducing regulatory delays and insurance costs.
What role do HPC data centers play in this deployment?
HPC data centers require uninterrupted, high-density power. Renewable sources alone cannot guarantee this without storage. The BESS will buffer solar and wind fluctuations, allowing data centers to reduce grid dependence and meet sustainability targets without compromising uptime.
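For scale, a rough ride-through calculation shows what 3GWh means against an HPC campus load. The load and depth-of-discharge figures below are hypothetical examples, not terms of the MOU:

```python
# Rough ride-through estimate: hours a 3 GWh BESS could carry an HPC
# campus at full load. Load and usable fraction are assumed examples.
BESS_CAPACITY_GWH = 3.0
CAMPUS_LOAD_MW = 200.0     # hypothetical hyperscale campus load
USABLE_FRACTION = 0.9      # assumed depth-of-discharge limit

usable_mwh = BESS_CAPACITY_GWH * 1000 * USABLE_FRACTION
hours = usable_mwh / CAMPUS_LOAD_MW
print(f"~{hours:.1f} h of full-load ride-through")  # -> ~13.5 h
```

In practice the capacity would be spread across multiple sites and cycled in shorter buffering intervals rather than a single full-load discharge; the figure is only scale intuition.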
How does this align with broader automation trends in energy infrastructure?
The deployment mirrors advancements in industrial automation, such as Guozi Robotics’ 10GWh/year battery-pack plant in Texas, which achieved a 30% faster implementation cycle through integrated robotics and plug-and-play logistics. Similar modular, pre-certified systems are being adopted in energy storage to accelerate scalability.
What are the market implications?
- Supply chain resilience: Domestic BESS production reduces reliance on imported components.
- Regulatory alignment: UL9540A compliance ensures nationwide deployability without state-by-state re-certification.
- Energy transition: Enables HPC facilities to meet corporate ESG mandates without sacrificing computational capacity.
What challenges remain?
- Grid interconnection queues remain backlogged in key regions.
- Long-term maintenance protocols for large-scale BESS are still evolving.
- Capital costs for 3GWh-scale systems remain high, though declining at 5–7% annually.
What is the projected timeline?
- Q1 2026: Final engineering and site assessments
- Q3 2026: First 1GWh unit operational
- Q1 2027: Full 3GWh capacity online
The agreement represents a concrete step toward decarbonizing critical digital infrastructure through certified, scalable storage technology.