NVIDIA Licenses Groq’s LPU Tech, SoftBank Buys DigitalBridge for AI Compute, Microsoft Deploys AI Agents for ARM Migration, AMD Teams with OpenAI on MI300X


TL;DR

  • NVIDIA and Groq sign $20B licensing deal to accelerate AI data center infrastructure with Groq’s LPU architecture and NVIDIA’s GPU ecosystem
  • SoftBank acquires DigitalBridge for $4.04B to expand global AI data center footprint with 1GW+ liquid-cooled GPU capacity in Texas and Wisconsin
  • Intel and NVIDIA collaborate on NVLink interconnect integration and 18A process optimization to enhance CPU-GPU coordination for exascale AI workloads
  • Microsoft’s Project StrongARMed deploys AI agents to automate x64-to-ARM64 code migration across Azure and Windows E+D divisions, targeting Cobalt 100/200 adoption
  • AMD partners with OpenAI to deploy MI300X accelerators in next-gen AI data centers, enabling scalable HPC and LLM training with ROCm 6.0 software stack
  • Vantage Data Centers constructs 10 new liquid-cooled AI facilities with $25B investment, supporting Frontier-class supercomputers and H100 GPU clusters

NVIDIA-Groq $20B Licensing Deal: Reshaping AI Infrastructure Through Hybrid Innovation?

How Does the Licensing Model Balance Innovation and Independence?

The $20B deal (up-front fees plus royalties) is a non-exclusive licensing agreement: NVIDIA gains rights to embed Groq’s LPU (Language Processing Unit) IP into its GPU stack, while Groq retains its independence, monetizes its silicon design, and keeps branded services such as GroqCloud. A key detail: roughly 90% of Groq’s 200 engineers joined NVIDIA’s AI hardware teams, compressing LPU R&D timelines by an estimated 12–18 months. The FTC’s clearance, premised on non-exclusivity and Groq’s continued independence, removes regulatory hurdles and sets a precedent for large-scale IP deals.

What Technical Edge Does the Hybrid GPU-LPU Architecture Bring?

The core innovation lies in reducing inference bottlenecks: NVLink connects LPU tiles to Hopper-class GPUs, cutting data movement by 45% and enabling sub-microsecond token latency for transformer models. The roadmap is clear: a Q1 2026 reference design demo at GPU Tech Summit, H2 2026 Canadian government pilot deployments (testing sovereign cloud compliance), and an early 2027 CUDA-LPU SDK GA—integrated with NVIDIA’s Triton Inference Server—to expand ecosystem adoption.
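A rough illustration of why data movement dominates inference: at small batch sizes, each generated token requires streaming the full model weights from memory, so memory bandwidth sets a hard floor on per-token latency. A minimal back-of-envelope sketch (all numbers are illustrative assumptions, not figures from the deal):

```python
# Illustrative lower bound on per-token decode latency when inference is
# memory-bandwidth-bound: the weights must stream from memory once per token.
# All inputs below are assumptions for illustration, not published specs.

def decode_latency_ms(params_billion: float, bytes_per_param: float,
                      bandwidth_tb_s: float) -> float:
    """Lower-bound milliseconds per token = model bytes / memory bandwidth."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return model_bytes / (bandwidth_tb_s * 1e12) * 1e3

# A 70B-parameter model in FP16 (2 bytes/param) on a 3.35 TB/s accelerator:
latency = decode_latency_ms(70, 2, 3.35)
print(f"{latency:.1f} ms/token")  # ≈ 41.8 ms/token
```

Cutting data movement (as the NVLink-attached LPU tiles claim to do) attacks this bound directly, which is why the headline metric is token latency rather than raw FLOPs.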

Why Are Geographic Pilots Critical for Sovereign Cloud Adoption?

Pilots in Canada (Toronto, Kitchener-Waterloo) and Saudi Arabia (via a $1.5B data center expansion) align with global sovereign cloud trends. These regions prioritize data residency, making the hybrid architecture—with shared HBM2e pools and unified memory—attractive for government-backed AI infrastructure. By 2028, NVIDIA projects 10% of AI infrastructure spend could shift to such nodes, driven by policy incentives and latency demands.

How Does This Deal Redefine Competitive and M&A Landscapes?

The deal marks a shift from full acquisitions to “licensing-first” M&A: the $20B valuation dwarfs past IP deals (e.g., Mellanox’s $6.9B acquisition), setting a new ceiling for pure-IP transactions. Competitors like AMD (CDNA-LPU roadmap) and Intel (Xe-HPC stack) face pressure, but NVIDIA’s edge depends on locking in ecosystem partners (cloud providers, OEMs) in the next 12–18 months. The model could inspire others—think Intel or AMD—seeking rapid capability gains without full integration costs.

What Risks and Outlook Should Stakeholders Watch?

The baseline scenario: hybrid nodes capture ~10% of AI infrastructure spend by 2028, with NVIDIA-Groq ARR hitting $3.5B. Accelerated adoption could lift market share to 15% if cloud giants (AWS, Azure) adopt the CUDA-LPU API. Risks include supply chain shocks (e.g., TSMC fab delays) or export controls on NVLink, which could push integration timelines to 2029. Over-reliance on Groq’s talent influx also demands phased onboarding to avoid bottlenecks.

Overall, the deal positions NVIDIA as a leader in low-latency inference while letting Groq thrive—proof that strategic licensing, not just acquisition, can drive AI infrastructure evolution.


SoftBank’s $4.04B DigitalBridge Buy: Expanding AI Data Center Footprint with 1GW+ GPUs in Texas/Wisconsin

What Does the $4.04B Acquisition Entail?

SoftBank’s $4.04B cash deal for DigitalBridge (a digital-infrastructure manager with $108B AUM) adds both operating and planned AI compute capacity: 1GW+ of liquid-cooled GPU capacity across 10 Texas sites (via Vantage Data Centers) and a joint Wisconsin "Stargate"-style campus with OpenAI. A $25B+ construction budget targets the 10 Texas data centers, with the total U.S. compute footprint projected to reach 7GW by 2028 (including New Mexico and Ohio sites). The deal carries a 16% premium ($16/share) and builds on SoftBank’s $22.5B 2025 OpenAI investment, enabling shared GPU farm usage.

Why the U.S. Focus for AI Compute?

The U.S. dominates AI-infrastructure spend (>80% of 2025 global AI-training expenditure), attracting SoftBank with abundant low-cost renewable power, mature fiber networks, and favorable regulation. This shifts SoftBank from equity-centric tech bets to a "capacity-owner" model, capturing recurring, contract-backed revenues from hyperscalers (e.g., Microsoft) and AI model developers. The acquisition also adds inflation-linked, low-volatility cash streams aligned with institutional investor demand for yield.

How Will Finances Be Impacted?

The $4.04B cash payment reduces SoftBank’s liquid reserves by ~5% (from ~$80B). Expected EBITDA uplift: ~$1.2B/yr (12% margin on compute assets), raising net profit margin by ~0.6 percentage points. Liquid-cooled design cuts energy costs by 20% (PUE=1.2 vs. industry average 1.5 for air-cooled), boosting operating margins. Post-deal debt-to-EBITDA ratio (2.8x) stays within SoftBank’s 3.5x target, preserving rating flexibility.
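The 20% energy-cost figure follows directly from the PUE definition (total facility power divided by IT power). A quick worked check, using an arbitrary 100 MW IT load:

```python
# Worked check of the energy-cost claim: moving from air-cooled PUE 1.5 to
# liquid-cooled PUE 1.2 cuts total facility energy 20% for the same IT load.

def facility_power_mw(it_load_mw: float, pue: float) -> float:
    """Total facility draw = IT load * PUE (PUE = total power / IT power)."""
    return it_load_mw * pue

it_load = 100.0  # MW of IT load (illustrative)
air = facility_power_mw(it_load, 1.5)      # 150 MW total
liquid = facility_power_mw(it_load, 1.2)   # 120 MW total
savings = (air - liquid) / air
print(f"{savings:.0%}")  # → 20%
```

The saving scales linearly with IT load, so the percentage holds at any campus size.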

What’s the Expansion Timeline?

  • Deal signed: Dec 29, 2025; closing: Q1 2026 (pending U.S. antitrust and Japanese FSA approval).
  • Texas sites: 70% complete (Dec 2025), operational Q3 2026.
  • Wisconsin campus: groundbreaking post-closing (Q1 2026).
  • New Mexico/Ohio: groundbreaking mid-2026; full 7GW capacity: late 2028.

What’s Next for SoftBank’s AI Strategy?

By 2027, SoftBank aims for 5GW of U.S. liquid-cooled GPU capacity (serving ~15% of projected U.S. AI-training demand). Compute-service revenue could grow from <5% (2025) to ~12% of total revenue by 2029. Planned moves include: two Midwest edge-site acquisitions (0.5GW each), redeploying cash into green-energy PPAs (wind/solar) for ESG goals, and a joint venture with a Gulf sovereign fund to supply offshore GPU capacity for Middle East AI workloads.


Microsoft’s StrongARMed AI Agents: Automating ARM64 Migration for Cobalt Chips

Microsoft’s Project StrongARMed marks a strategic pivot: deploying AI agents to automate x64-to-ARM64 code migration across Azure and Windows E+D divisions, with eyes on scaling Cobalt 100/200 adoption. Launched in a coordinated December 2025 push—including $45M funding, a 25-engineer team, and public architecture disclosure—the project leverages industry trends (95% of developers use AI weekly) to turn AI from demo to production tool.

Why Is Microsoft Accelerating ARM64 Migration With AI Agents?

The move ties directly to Cobalt silicon’s potential: Cobalt 100 already powers 12% of Azure VMs, while Cobalt 200 (slated for Q2 2026) promises 1.6x compute per-watt and 1.4x price-performance over x86. Microsoft aims to double ARM-only workload share by Q2 2026 and cut migration costs by ≥30%—critical for delivering TCO advantages to Azure customers.

How Do StrongARMed’s Tools Deliver Measurable Engineering Savings?

At its core, StrongARMed combines speed with rigor. A program-analysis engine scans 10M+ lines of code daily, flagging non-portable files and slashing manual inspection effort by 85%. AI agents Chronicle (refactoring) and Bandish (code synthesis), fine-tuned on 200B tokens of Microsoft code, generate PRs in ~4 seconds per function, cutting migration engineering time by 30%. A human-in-the-loop verification stage drops false positives from 12% to <2%, keeping post-migration production incidents below 0.5% (within Azure’s SLA benchmarks). Even with an 18% CI runtime increase, manual review hours fall by 30%, ensuring net savings.
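To make the scanning step concrete, here is a hypothetical sketch of the kind of pattern-based triage such a program-analysis engine might start with. The patterns and function names are illustrative assumptions, not Microsoft’s actual StrongARMed rules:

```python
import re

# Hypothetical triage pass: flag source text containing x64-specific constructs
# (SSE/AVX intrinsics, inline assembly, x64-only preprocessor guards) that
# would block a clean ARM64 rebuild. Pattern list is illustrative only.
NON_PORTABLE = [
    re.compile(r"\b_mm\d*_\w+\b"),              # SSE/AVX intrinsics (_mm_add_ps, ...)
    re.compile(r"\b__asm\b|\basm\s*\("),        # inline assembly
    re.compile(r"\b_M_X64\b|\b__x86_64__\b"),   # x64-only preprocessor guards
]

def flag_file(text: str) -> list[str]:
    """Return the non-portable constructs found in one source file's text."""
    hits = []
    for pat in NON_PORTABLE:
        hits.extend(m.group(0) for m in pat.finditer(text))
    return hits

sample = "#ifdef _M_X64\n  __m128 v = _mm_add_ps(a, b);\n#endif\n"
print(flag_file(sample))  # → ['_mm_add_ps', '_M_X64']
```

In practice a real engine would parse ASTs rather than grep text, but even a regex pass like this narrows human review to the files that actually need it.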

What Risks Does Microsoft Need to Mitigate?

Three key challenges loom. LLM hallucination in patches is addressed via dual-stage equivalence testing, fuzzing, and mandatory human sign-off for performance-critical modules. ABI drift between Cobalt 100/200 is managed with silicon-specific agent profiles and continuous benchmark feedback. Supply-chain vulnerabilities—targeting 49% of AI tool users—are mitigated through signed binaries, reproducible builds, and Azure Key Vault-protected model weights.

What’s Next for StrongARMed and Enterprise AI?

The roadmap is aggressive: Q1 2026 will migrate 40% of remaining Azure services (AKS, Cosmos DB) to Cobalt 200, aiming for 55% ARM share and 12% more TCO reduction. Q2 2026 will open-source the StrongARMed SDK, inviting external ISVs to adopt ARM. By Q4 2026, a unified agent platform across Azure, Windows 12, and Copilot Studio could deliver 2.5x ROI—cementing Microsoft’s lead in enterprise agentic AI. For now, the project is a concrete proof point: AI isn’t just transforming code—it’s transforming how cloud giants scale efficient, secure infrastructure.


AMD-OpenAI MI300X Partnership: Can It Challenge Nvidia’s AI Data Center Lead?

AMD and OpenAI’s collaboration hinges on a rare "single-vendor" solution: MI300X accelerators paired with ROCm 6.0 software. At CES 2026, AMD’s Lisa Su framed this as a direct response to the industry’s need for seamless scalability in next-gen AI data centers. The stack, combining an integrated GPU-CPU architecture, 192GB of HBM3 (5.3TB/s bandwidth), and ROCm 6.0’s unified memory, targets the critical compute-bandwidth bottleneck for training 200B+ parameter LLMs. For hyperscalers, this eliminates the fragmentation of mixing hardware from multiple vendors, a longstanding barrier to efficient scaling.

How Does MI300X Measure Up to Nvidia in LLM Training?

Technical specs support AMD’s competitiveness. The MI300X delivers 512 TFLOPs of FP16 performance—matching Nvidia’s H100—and promises a 30-40% reduction in wall-time for training 175B-parameter models (per AMD’s Q2 2026 benchmarks). Power efficiency is another edge: 350W peak power (vs. ~400W for H100) aligns with ESG goals and data center PUE targets, which are increasingly mandatory for large-scale deployments. However, Nvidia’s HBM4-based H200/H300 (slated for 2027) could narrow this gap if AMD fails to secure HBM4 supply chains promptly.
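Taking the quoted figures at face value (they are vendor-reported, not independent benchmarks), the perf-per-watt gap reduces to simple arithmetic:

```python
# Perf-per-watt check using the figures quoted above: equal FP16 throughput
# (512 TFLOPs) at 350 W vs. 400 W peak power. Treat as illustrative only.

def tflops_per_watt(tflops: float, watts: float) -> float:
    """Throughput efficiency: TFLOPs delivered per watt of peak power."""
    return tflops / watts

mi300x = tflops_per_watt(512, 350)  # ≈ 1.46 TFLOPs/W
h100 = tflops_per_watt(512, 400)    # ≈ 1.28 TFLOPs/W (matching throughput)
advantage = mi300x / h100 - 1
print(f"{advantage:.0%}")  # ≈ 14% better perf-per-watt at equal throughput
```

At equal throughput the efficiency ratio collapses to the inverse power ratio (400/350), which is why the power figure, not the TFLOP figure, carries the ESG argument here.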

What Risks Threaten AMD’s AI Growth?

Three hurdles stand out. First, MI300X’s HBM3 is a mid-term bridge on JEDEC’s memory roadmap, requiring AMD to adopt HBM4 by 2027 to avoid bandwidth limitations. Second, competition is heating up: Nvidia’s $20B Groq licensing deal boosts its inference performance, while Intel’s Xeon 18A push targets the same data center contracts. Third, while AMD’s 10% equity offer to OpenAI (tied to ≥6GW deployment by 2030) aligns financial incentives, success depends on OpenAI meeting its 2030 target, especially as Nvidia retains its ecosystem advantage in tools and partner loyalty.

Can Equity Incentives Secure OpenAI’s Long-Term Commitment?

The equity clause is a unique differentiator in a capital-heavy AI hardware market. For OpenAI, scaling AMD GPUs reduces compute costs (a top expense for training 500B+ parameter models), while AMD gains a guaranteed revenue stream. The first production clusters (Q2 2026, ≥4GW capacity for OpenAI’s "Stargate" sites) are a make-or-break test. If they deliver on the 30-40% wall-time reduction, the equity conversion (triggered by 6GW deployment) could make AMD OpenAI’s primary silicon supplier—undermining Nvidia’s current dominance.
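One way to model a deployment-milestone equity clause is as tranches that vest when cumulative deployed capacity crosses capacity thresholds. The tranche schedule below is invented for illustration; the article only states a 10% offer tied to ≥6GW deployed by 2030:

```python
# Hedged sketch of milestone-based equity vesting: each tranche vests once
# cumulative deployed capacity reaches its GW threshold. The schedule is a
# hypothetical example, not the actual AMD-OpenAI terms.
TRANCHES = [(2.0, 0.02), (4.0, 0.03), (6.0, 0.05)]  # (GW threshold, equity share)

def vested_equity(deployed_gw: float) -> float:
    """Total equity share vested at a given cumulative deployment level."""
    return sum(share for threshold, share in TRANCHES if deployed_gw >= threshold)

print(f"{vested_equity(4.0):.2f}")  # → 0.05 (2 GW and 4 GW tranches vested)
print(f"{vested_equity(6.0):.2f}")  # → 0.10 (full 10% at the 6 GW milestone)
```

Structuring the clause this way would tie AMD’s dilution directly to realized revenue rather than to a promise, which is the incentive alignment the article describes.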


Vantage’s $25B Liquid-Cooled AI Campus: 10 Facilities for Frontier Supercomputers & H100 Clusters

Vantage Data Centers is investing $25 billion to build 10 liquid-cooled data centers in Texas—engineered for NVIDIA H100 GPU clusters and Frontier-class exascale supercomputers—targeting 1 GW of compute power to become one of the world’s largest AI-ready sites. The project leverages strategic financing and partnerships to address key industry challenges, from power efficiency to capital risk.

What Is Vantage Building, and Where?

The Texas campus focuses on 10 liquid-cooled facilities near existing Frontier-class supercomputer zones, with:

  • CAPEX: Over $25 billion (land, power-grid upgrades, cooling plants).
  • Compute: 1 GW total, with racks ≥200 kW—optimized for NVIDIA H100 (and future Blackwell) GPUs.
  • Cooling: Direct-to-die liquid-cooling (PUE ≤1.15, >90% GPU utilization).
  • Timeline: Groundbreaking Q2 2026, first rack Q3 2026, full campus Q4 2027.
  • Tenants: OpenAI, Microsoft, Oracle, AtlasEdge (edge provider), and sovereign-cloud partners (e.g., Kuwait Azure).
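The bullet figures above imply a rough rack count, assuming the 1 GW is total facility power and the PUE target holds campus-wide (illustrative arithmetic only):

```python
# Rough campus sizing from the stated figures: total power, PUE target, and
# per-rack draw. Assumes 1 GW is total facility power, not IT load.

def rack_count(total_mw: float, pue: float, rack_kw: float) -> int:
    """IT power = total power / PUE; rack count = IT power / per-rack draw."""
    it_power_kw = total_mw * 1000 / pue
    return int(it_power_kw // rack_kw)

racks = rack_count(1000, 1.15, 200)  # 1 GW campus, PUE 1.15, 200 kW racks
print(racks)  # → 4347 liquid-cooled racks across the 10 facilities
```

Roughly 4,300 racks at 200 kW each, or about 430 per facility, which gives a sense of why direct-to-die cooling (rather than air) is the gating technology.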

How Is Vantage Financing This $25B Project?

Strategic deals reduce capital risk:

  • SoftBank/DigitalBridge: SoftBank’s $4.04B acquisition of DigitalBridge (a $108B AUM manager) creates a financing conduit, with DigitalBridge allocating multi-billion-dollar tranches to Vantage.
  • Impact: Buffers against equity volatility and enables favorable long-term debt, avoiding heavy public debt reliance.

Why Is Liquid Cooling Critical for AI Compute?

Liquid cooling solves GPU-dense workload challenges:

  • Efficiency: Supports >200 kW racks with PUE <1.2, cutting cost-per-TFLOP vs. air-cooled rivals.
  • Scalability: Aligns with exascale benchmarks, ranking the campus in the global top 10 for AI compute.
  • Future-Proofing: Explicit H100 compatibility covers LLM training, diffusion models, and future NVIDIA GPUs.

What’s Vantage’s Strategic Edge?

The project differentiates in a crowded market:

  • Premium Rates: Liquid-cooled racks command $12–$15k/month—higher than air-cooled competitors.
  • Ecosystem Stack: OpenAI (Stargate campus) and AtlasEdge partnerships create edge-core AI integration, shortening enterprise time-to-value.
  • Regional Stimulus: 1 GW demand drives Texas utility upgrades and green-energy PPAs, aligning with ESG goals.

Can the Texas Grid Handle 1 GW of AI Power?

Energy planning addresses the 1 GW load (≈770,000 average U.S. homes):

  • Grid Impact: ERCOT forecasts 3–5% peak demand growth by 2026–2027; utilities plan 500 MW solar + storage.
  • Alternatives: DOE studies on repurposed naval reactors (≈450 MW each) could provide low-carbon baseload.
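As a sanity check on the homes-equivalent figure, using an assumed 1.3 kW average draw per U.S. home (a common planning average, not an ERCOT number):

```python
# Sanity check: how many average homes does a continuous 1 GW load equal?
# Assumes ~1.3 kW average (not peak) draw per U.S. home; illustrative only.

def homes_equivalent(load_gw: float, avg_home_kw: float = 1.3) -> int:
    """Number of average homes matched by a continuous load of load_gw GW."""
    return int(load_gw * 1e6 / avg_home_kw)

print(homes_equivalent(1.0))  # → 769230, i.e. roughly 770,000 homes per GW
```

Note this compares continuous averages; against evening peak demand the homes-equivalent figure would be smaller still.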

What Risks Could Delay the Project?

Key challenges and mitigations:

  • Grid Constraints (Medium): Utility PPAs for renewables/storage + micro-reactor exploration.
  • Regulatory Pushback (Low-Medium): Community outreach, environmental assessments, noise abatement.
  • GPU Shortages (Medium): Long-term NVIDIA agreements + AMD MI250X as a backup.
  • Cooling Failures (Low): Redundant loops and real-time thermal analytics.

What’s Next for the AI Campus?

The roadmap targets steady growth:

  • 2026–2027: ≥70% occupancy of first four facilities (hyperscalers/sovereign clouds).
  • 2028: 1 GW capacity supporting ≥150 PFLOP training runs across three concurrent large-scale projects.
  • 2029–2030: Edge-core ecosystem (AtlasEdge + OpenAI) could launch “AI-as-low-latency-service” (ALaaS), generating $2–3B in annual revenue.