128 GT/s Copper Hits 3.4" Wall: Rogers Laminate Tax Looms Over AI Boards
TL;DR
- PrismML's Bonsai 8B model achieves 6× throughput on an RTX 4090 with just a 1.15 GB memory footprint
- PCIe 7.0 introduces PAM6 encoding and a 128b/130b FLIT, enabling 128 GT/s data rates while capping copper traces at 3.4 inches
- Applied Digital reports 139% YoY revenue growth to $2.1B, driven by HPC hosting expansion and two new 150MW data center facilities under development
⚡️ 1-Bit 8B-Parameter LLM Fits iPhone, Runs 6× Faster Than FP16 Models
8.2-billion brain in your pocket: 1.15 GB model runs 6× faster than today’s giants ⚡️ Same IQ/GB as 10 normal LLMs. Your next iPhone just got a 65k-token memory. Ready to ditch the cloud?
PrismML’s Bonsai 8B, released last week, crams an 8-billion-parameter brain into a 1.15 GB file—smaller than a Netflix episode—and still scores 65.7 on the graduate-level MMLU-Pro benchmark. On an off-the-shelf RTX 4090 it spits out 368 tokens every second, six times faster than the same model in 16-bit form while drawing only one-fifth the energy. The entire model plus its working memory fits inside 6 GB of VRAM, leaving room for a browser, Slack, and a game of Rocket League.
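The 1.15 GB figure checks out on the back of an envelope, assuming 1-bit weights plus fp16 scale factors stored per group (the group size of 128 is our assumption, not PrismML's published layout):

```python
# Back-of-envelope check of the 1.15 GB figure (assumed layout, not PrismML's spec):
# 8.2B parameters at 1 bit each, plus one fp16 scale factor per group of 128 weights.
params = 8.2e9
weight_bytes = params / 8                  # 1 bit per weight
group_size = 128                           # assumed quantization group size
scale_bytes = (params / group_size) * 2    # one fp16 (2-byte) scale per group
total_gb = (weight_bytes + scale_bytes) / 1e9
print(f"~{total_gb:.2f} GB")               # lands right at the reported 1.15 GB
```

The same arithmetic explains why fp16 needs roughly sixteen times the space: two bytes per weight instead of one bit.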
How does this work?
Custom CUDA and Metal kernels decompress each 1-bit weight on the fly inside the GPU core, eliminating the usual “unpack-then-compute” memory shuffle. A companion Turbo1Bit compressor shrinks the key-value cache by 2.4×, so a MacBook Air with 8 GB of RAM can juggle 65,000-token documents—roughly this newspaper’s Sunday edition—without breaking a sweat.
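The unpack-in-kernel idea can be sketched in NumPy as a CPU stand-in (the per-row scale scheme below is an illustrative assumption; the real fused CUDA/Metal kernels do this inside each GPU core without ever materializing the full matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Full-precision weights -> 1-bit signs plus one fp32 scale per row (assumed scheme).
W = rng.standard_normal((256, 64)).astype(np.float32)
scales = np.abs(W).mean(axis=1)                          # per-row magnitude scale
packed = np.packbits((W > 0).astype(np.uint8), axis=1)   # 8 weights per byte

def matvec_1bit(packed, scales, x):
    """Decompress 1-bit weights on the fly, then multiply — avoiding a separate
    "unpack-then-compute" pass through memory."""
    bits = np.unpackbits(packed, axis=1, count=x.shape[0])
    signs = bits.astype(np.float32) * 2.0 - 1.0          # {0,1} -> {-1,+1}
    return (signs @ x) * scales

x = rng.standard_normal(64).astype(np.float32)
approx = matvec_1bit(packed, scales, x)
exact = W @ x
print(np.corrcoef(approx, exact)[0, 1])   # typically ~0.8: signs alone track the matmul
```

The packed matrix here is 32× smaller than the fp32 original; the GPU kernels trade a few cheap bit operations for that entire memory-bandwidth saving.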
Impacts
Speed: 368 tokens/s → real-time code completion and live voice-to-text that keeps pace with human speech.
Energy: 4–5× lower wattage per token → a laptop battery lasts an extra two hours of continuous AI use.
Privacy: 1.15 GB fits on a phone → no cloud required, so medical notes and legal docs stay on device.
Cost: inference on a $1,599 gaming GPU rivals what $8/hour cloud instances delivered last year.
Response & gaps
Meta’s Llama 3.1 8B still edges Bonsai on raw accuracy (71.0 vs 70.3 average), and Qwen 3 tops both at 79.3. Yet Bonsai’s 10× higher “IQ per gigabyte” has already sparked ports to llama.cpp, MLX, and vLLM. The main hitch: only Nvidia and Apple chips have tuned kernels today; AMD and Intel users wait.
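The "IQ per gigabyte" claim can be sanity-checked from the article's own scores, assuming the fp16 baseline occupies 2 bytes per parameter (our assumption for the comparison):

```python
# "IQ per gigabyte" = benchmark score / on-disk size, using the article's numbers.
# Baseline size assumes fp16 storage (2 bytes/param) for an 8B model — an assumption.
bonsai = 70.3 / 1.15                 # score per GB at 1.15 GB
llama = 71.0 / (8.0e9 * 2 / 1e9)     # Llama 3.1 8B in fp16 ≈ 16 GB
print(f"Bonsai: {bonsai:.1f}/GB, Llama: {llama:.1f}/GB, ratio {bonsai/llama:.0f}x")
```

The ratio comes out around 14×, the same order of magnitude as the article's "10×" figure.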
Outlook
- Summer 2026: fine-tuned Bonsai variants for medicine and law ship at ≤ 6 GB, cutting specialty-software licensing costs 60 %.
- Q1 2027: 16 B and 32 B “Bonsai-X” models appear, staying under 12 GB with hybrid 1–2-bit layers.
- 2028: first Android phones ship with 1-bit AI co-processors, enabling offline multilingual assistants on $400 handsets.
Bottom line: Bonsai 8B proves the fastest way to make AI bigger is to make it smaller. If the trend holds, tomorrow’s flagship models will arrive not on warehouse-scale GPUs but in your pocket, sipping watts while thinking in ones and zeros.
⚡ PCIe 7.0 Demands 3.4-Inch Copper Limit or Signal Collapse at 128 GT/s
128 GT/s on copper? Only if your traces are shorter than a 🥤 straw—3.4"—or the signal dies at 32 dB. AI rigs love the 12.8 TB/s rush, but retimers burn 230 mW and add up to 70 ns. Will your next mobo pay for Rogers-grade laminate?
PCIe 7.0, now entering its final draft, trades the old two-level bit for a six-level pulse. The payoff is a x16 slot that can move 12.8 TB every second—enough to fill two 4K Blu-rays in the blink of an eye. The catch: the copper linking chip to slot must stay under 3.4 inches and lose no more than 32 dB, a margin tighter than the tolerance on a Swiss watch spring.
How the pulse packs more bits
PAM6 squeezes 2.58 bits into each voltage step, up from PAM4’s 2 bits. A 128b/130b “FLIT” wrapper adds only 1.5 % overhead, a fraction of the 20 % tax levied by the 8b/10b code of PCIe 1 and 2. Together they push the line rate to 128 GT/s while cutting swing voltage 23 %, but every millimeter of trace now behaves like a tiny antenna bleeding signal.
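Both headline numbers fall straight out of the encoding math — bits per symbol is just the base-2 log of the level count, and the wrapper tax is the ratio of padding bits to payload:

```python
import math

# Bits per symbol for n-level pulse-amplitude modulation: log2(levels).
def bits_per_symbol(levels):
    return math.log2(levels)

pam4 = bits_per_symbol(4)        # 2.0 bits per symbol
pam6 = bits_per_symbol(6)        # ≈ 2.585 bits per symbol

# Encoding overhead: 128b/130b FLIT vs the 8b/10b code of early PCIe generations.
flit_overhead = 1 - 128 / 130    # 2 padding bits per 130 ≈ 1.5 %
legacy_overhead = 1 - 8 / 10     # 2 padding bits per 10 = 20 %
print(f"PAM6: {pam6:.2f} bits/symbol; FLIT tax {flit_overhead:.1%} vs {legacy_overhead:.0%}")
```

Note that six levels buy only 2.58 bits rather than 3, because a symbol's information content grows with the logarithm of the level count, not linearly.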
Impacts: what shrinks, what swells
- Materials: standard FR-4 blows the 32 dB budget at 2 inches → boards must shift to low-loss laminates (tan δ ≤ 0.0015), raising raw PCB cost ~30 %.
- Retimers: next-gen chips add 40–70 ns and 230 mW per lane → latency-sensitive NICs and GPUs may prefer shorter, retimer-free topologies.
- Connectors: each 0.1 mm longer stub adds 0.4 dB loss at 30 GHz → high-density pins need gold-plated, shortened contacts to stay below 0.5 dB per pair.
- Design reach: 3.4-inch ceiling forces CPU-to-slot placement within a credit-card span → motherboard layouts will shrink or move to mid-board risers.
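The reach numbers above can be back-solved from a simple channel-loss budget. The per-inch loss figures and the fixed package/connector loss below are illustrative assumptions chosen to reproduce the article's 2-inch and 3.4-inch limits, not vendor data:

```python
# Rough trace-reach calculator under a 32 dB end-to-end channel budget.
# Loss-per-inch values are assumptions for ~30 GHz Nyquist, not measured data.
BUDGET_DB = 32.0
FIXED_DB = 8.0    # assumed package + connector + via loss

def max_trace_inches(db_per_inch):
    """Longest trace that fits the remaining budget at a given loss density."""
    return (BUDGET_DB - FIXED_DB) / db_per_inch

fr4 = max_trace_inches(12.0)       # standard FR-4, assumed ~12 dB/in
low_loss = max_trace_inches(7.0)   # low-loss laminate (tan δ ≤ 0.0015), assumed ~7 dB/in
print(f"FR-4 reach ≈ {fr4:.1f} in, low-loss ≈ {low_loss:.1f} in")
```

Under these assumed loss densities, FR-4 runs out of budget right at 2 inches while the low-loss laminate stretches to about 3.4 — which is exactly why the spec's ceiling forces the material change.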
Industry response: laminate rush, connector race
Major fabs (TTM, Ibiden) are sampling substrates at tan δ 0.0012; Molex and Amphenol have PCIe 6.0/7.0 connector families claiming ≤0.3 pF parasitics but no field data beyond 50,000 cycles. Meanwhile, Synopsys’ early 256 GT/s demo signals that PAM6 silicon is viable, yet no retimer supplier has taped out a production-grade 128 GT/s repeater.
Timelines: when the pulse reaches the rack
- Q4 2026: spec frozen; first server backplanes show 90 % link-up on 3.5-inch traces.
- 2027–2028: enterprise SSDs and AI accelerators adopt on-package PHY, skipping retimers; desktop boards remain PCIe 6.0.
- 2029: retimer power dips below 200 mW and per-square-inch laminate prices fall 40 % → mainstream motherboards extend traces to 5 inches.
- 2030: CPUs integrate PAM6 equalization; total slot-to-GPU latency drops 15 ns versus PCIe 6, setting the table for PCIe 8 PAM8.
Bottom line
PCIe 7.0’s 128 GT/s leap will not ride today’s cheap fiberglass. The 3.4-inch copper rule rewrites board geography, favors laminate suppliers, and rewards vendors who can place silicon closer to the slot. If the laminate supply scales and retimer wattage keeps falling, the consumer PC of 2029 will treat terabytes like today’s USB treats megabytes—fast, invisible, and taken for granted.
🤯 139 % Revenue Shock: North Dakota Data-Factory Race
$2.1 B revenue, 139 % jump—Applied Digital just 10×’d a North Dakota wheat field into a 600 MW AI factory 🤯. The first 100 MW of the Delta Forge build-out is already churning cash while 1.5 GW more waits in the wings. Who’s next to plug into the prairie?
Applied Digital’s Q3 2026 ledger shows $2.1 billion revenue, up 139 % from a year ago, after flipping its crypto rigs for 600 MW of AI-grade compute at Polaris Forge 1. Two more 150 MW halls—Delta Forge 1 and a sister site—are already steel-up in North Dakota and billing their first 100 MW as lease revenue. Roughly 1.5 GW of hyperscaler contracts (≈ $16 B over 15 yr) sit in the pipeline, enough to power 450,000 homes if switched to the grid.
Where the growth lands
- Cash: $2 B war-chest → funds staged construction without fresh equity dilution.
- Debt: ~$5 B senior notes → annual service ≈ $275 M, pressing coverage to 1.1× today but projected 1.3× by Q2 2027.
- Margin: gross profit up 126 %, yet EPS still –$0.11 as depreciation outruns rent; forward P/E >500×.
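The debt-service figures hang together if we back-solve the implied blended coupon — the ~5.5 % rate below is our assumption, inferred from the article's $5 B principal and ≈$275 M annual service:

```python
# Back-solving the coverage math (coupon rate is an assumption, not disclosed).
debt = 5.0e9
coupon = 0.055                 # ≈ 5.5 % blended rate on the senior notes (assumed)
service = debt * coupon        # ≈ $275 M/yr, matching the article's figure
ebitda_now = service * 1.1     # 1.1× coverage today
ebitda_2027 = service * 1.3    # projected 1.3× by Q2 2027
print(f"service ${service/1e6:.1f}M, EBITDA ${ebitda_now/1e6:.1f}M -> ${ebitda_2027/1e6:.1f}M")
```

In other words, the projected move from 1.1× to 1.3× coverage requires roughly $55 M of additional annual EBITDA, which the Delta Forge lease ramp would need to deliver.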
What could still trip the build-out
- Construction: mid-2027 commissioning leaves only 18 months to lock contractors and suppliers already booked by AWS, Google.
- Finance: 2031 note maturity coincides with expected rate trough—refinance early or pay ≈ 150 bps premium later.
- Competition: hyperscalers eyeing the same $0.05 kWh North Dakota wind; land values near Williston up 22 % since 2024.
Outlook: revenue vs. risk
- Q4 2026–Q2 2027: Delta Forge 1 50 % online → lifts annual run-rate past $2.4 B, cuts grid-import volatility for regional utilities by 8 %.
- 2028: 1.5 GW contracted → $1.5–$2 B steady revenue, debt-to-EBITDA <2.5× if notes refinanced.
- 2030: full 900 MW greenfield → potential $3 B top line, market cap >$12 B, barring construction slip or power-price shock.
Applied Digital proves a pivot can outrun a cycle: in 18 months it morphed from bitcoin exposure to an AI landlord whose quarterly intake rivals the entire 2020 state budget of North Dakota. For investors, the question is no longer “can demand fill the halls?” but “can concrete, copper and capital markets keep pace?”
In Other News
- Taiwan hosts the 4th International Geothermal Conference, drawing 700+ experts and showcasing EGS and SGS technologies that could unlock $10B+ in potential energy capacity
- AWS launches S3 Files, turning S3 buckets into mountable NFS filesystems with 1ms latency and 25,000 concurrent connections
- New quantum networking breakthrough uses van der Waals crystals and graphene resonators to resolve frequency mismatch between quantum computers