AI Infra Dao

AI Infra Brief | Big Partnerships & Faster Inference (Mar. 6, 2026)

March 6, 2026 — AI infrastructure sees multiple blockbuster partnerships, breakthroughs in inference performance and cost optimization, and continued progress in sovereign AI and open source ecosystems.

🧭 Key Highlights

🤝 AMD and Meta sign $100B compute partnership

🚀 CoreWeave deploys GB200 clusters for Perplexity

💰 Akamai claims 86% lower inference costs

🔧 Together AI releases FlashAttention-4 and ThunderAgent

🌐 Red Hat and Telenor build sovereign AI factory in Norway

⚡ Elasticsearch search speed up 8x

Compute & Cloud Infrastructure

🤝 AMD and Meta sign $100B multi-year pact targeting 6 GW AI capacity

According to Techspective, AMD and Meta sign a multi-year, $100B partnership targeting up to 6 GW of AI capacity, co-engineering MI450 GPUs and 6th Gen EPYC CPUs for Meta’s Helios racks; the first 1 GW shipment is expected in H2 2026.

This marks a milestone partnership for AMD in the AI compute market, signaling a reshaping of the data center GPU landscape.

🚀 CoreWeave signs multi-year deal to power Perplexity on GB200 NVL72

According to MLQ, CoreWeave signs a multi-year deal to power Perplexity inference on NVIDIA GB200 NVL72 clusters via CoreWeave Kubernetes Service and W&B Models — among the first at-scale GB200 deployments.

The GB200 NVL72 is NVIDIA’s rack-scale inference flagship, linking 72 Blackwell GPUs into a single NVLink domain via fifth-generation NVLink switches (with Grace CPUs attached over NVLink-C2C).

💰 Akamai deploys thousands of Blackwell GPUs, 86% lower inference costs

According to Data Center Knowledge, Akamai begins deploying thousands of NVIDIA Blackwell GPUs, DPUs, and servers across 4,000+ locations, claiming up to 2.5x lower latency and 86% lower inference costs vs. hyperscalers.

An edge CDN giant entering AI inference lends real-world weight to the economics of distributed inference.

⚡ NVIDIA Blackwell Ultra: 50x inference performance

According to X, NVIDIA highlights Blackwell Ultra with up to 50x inference performance and 35x lower cost — framed as enabling real-time agentic experiences.

Model Inference & Serving

🔧 Together AI: FlashAttention-4, ThunderAgent, and ATLAS-2

According to Together AI blog, FlashAttention-4 delivers 2.7x vs. Triton and 1.3x vs. cuDNN 9.13; open-sourced ThunderAgent achieves 1.5–3.6x throughput with 4.2x lower disk use; ATLAS-2 delivers 40% higher sustainable throughput.

💡 Distil Labs: Small distilled models enable 10x cost reduction

According to Distil Labs, small distilled specialists can achieve up to 10x cost reductions; their Text2SQL example drops from $24 to $3.00 per million requests.
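The quoted per-request figures can be sanity-checked with simple arithmetic. A back-of-the-envelope sketch: the cited $24 vs. $3.00 prices work out to an 8x factor, and the 50M-requests/month workload below is a hypothetical for illustration.

```python
# Back-of-the-envelope cost comparison using the Text2SQL figures cited above.
# Prices are per million requests, as quoted in the brief.
baseline_cost = 24.00    # large general model, $ per million requests
distilled_cost = 3.00    # small distilled specialist, $ per million requests

savings_factor = baseline_cost / distilled_cost

# Hypothetical workload (assumed, not from the brief): 50M requests/month.
monthly_requests_m = 50
monthly_savings = (baseline_cost - distilled_cost) * monthly_requests_m

print(f"{savings_factor:.0f}x cheaper, ${monthly_savings:,.2f}/month saved")
```

At this volume the distilled model saves about $1,050 per month; the factor scales linearly with request count, so the absolute savings grow with traffic.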

🏭 Huawei unveils AI Data Platform integrating knowledge base, KV cache, and memory bank

According to DIGITIMES, Huawei unveils an AI Data Platform integrating a knowledge base, KV cache, and memory bank; it reports 95%+ retrieval accuracy and intelligent KV-cache tiering for multi-agent inference.

📊 Vast Data introduces CUDA-accelerated AI data stack

According to StorageNewsletter, Vast Data introduces a CUDA-accelerated AI data stack with cuVS vector search, NVIDIA CMX, and BlueField-4 DPUs to speed shared KV-cache access and time to first byte (TTFB) for long-context, multi-agent serving.

Data Path & Edge Computing

🔍 Elasticsearch: 8x faster search, bfloat16 vectors

According to Elastic, Elasticsearch details up to 8x faster search vs. OpenSearch, bfloat16 vectors in 9.3, adaptive early termination, and up to 12x faster vector indexing via NVIDIA cuVS.
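For context, the baseline these optimizations compete against is exact brute-force top-k similarity search. A minimal sketch of that baseline follows (not Elasticsearch's implementation; ANN indexes like HNSW and reduced-precision vector encodings such as bfloat16 exist precisely to beat this O(n) scan):

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list[float], vectors: dict[str, list[float]], k: int):
    """Exact brute-force top-k by cosine similarity -- the O(n) baseline
    that ANN indexes and quantized vectors are optimizing away."""
    return heapq.nlargest(k, vectors.items(), key=lambda kv: cosine(query, kv[1]))

# Toy corpus (made-up 2-D vectors for illustration).
docs = {
    "doc1": [1.0, 0.0],
    "doc2": [0.9, 0.1],
    "doc3": [0.0, 1.0],
}
results = top_k([1.0, 0.0], docs, k=2)
print([name for name, _ in results])  # doc1 and doc2 rank highest
```

Early-termination and approximate-index techniques trade a small amount of recall against scanning far fewer candidates than this exhaustive loop.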

🗄️ Oracle AI Database 26ai adds in-database vector search

According to DBTA, Oracle AI Database 26ai adds in-database AI Vector Search for secure, lower-ops semantic retrieval.

🚀 KX launches KDB.AI Server Edition

According to HPCwire, KX launches KDB.AI Server Edition for high-performance RAG with multimodal support and LangChain compatibility.

National & Industrial AI

🇳🇴 Red Hat and Telenor debut sovereign AI Factory in Norway

According to Data Center Knowledge, Red Hat and Telenor debut a sovereign AI Factory in Norway on OpenShift AI with NVIDIA DGX H100 systems, supporting RAG and agentic workflows with Llama Stack.

🌍 VEON and MeetKai explore sovereign, locally deployed AI

According to Taiwan News, VEON and MeetKai sign an MoU to explore sovereign, locally deployed AI across VEON markets.

📱 Samsung and Vodafone validate Europe’s first AI-native vRAN call

According to Samsung News, Samsung and Vodafone validate Europe’s first AI-native vRAN call on Intel Xeon 6, using CognitiV NOS for AI-driven automation.

📡 ZTE’s AIR MAX highlights AI-native mobile networks

According to Light Reading, ZTE’s AIR MAX highlights AI-native mobile networks with energy and accuracy gains and multi-agent tooling (Co-Sight 2.0, Co-Claw).

Open Source Ecosystem

🍎 Qwen3.5-122B-A10B MoE runs locally on Apple M3 Ultra

According to X, the Qwen3.5-122B-A10B MoE runs locally on Apple M3 Ultra via MLX, with hybrid DeltaNet/linear attention enabling 1M-token context and a smaller KV cache; community commentary highlights the cost savings of local inference.
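The "122B-A10B" naming indicates that only about 10B of the 122B total parameters are active per token, which is what makes local inference feasible on a single machine. A minimal top-k gating sketch shows how an MoE router selects that active subset (illustrative only; not Qwen's actual router, and the gate logits below are made up):

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(gate_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights --
    the mechanism that lets a large MoE activate only a small fraction of
    its parameters per token (sketch of the general top-k gating idea)."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

# Hypothetical gate logits for a 4-expert layer; top-2 routing.
weights = route_top_k([2.0, 0.1, 1.5, -0.3], k=2)
print(weights)  # experts 0 and 2 carry all the routing weight
```

Only the chosen experts' weights are loaded into the compute path for that token, so memory bandwidth and FLOPs scale with the active parameters, not the total.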

🔧 MistralAI SDK adoption grows

According to X, MistralAI SDK adoption grows on the back of KV-cache affinity improvements, GPT-5.4 support, and bug fixes; community engagement centers on latency and stability.

🛠️ Multiple open source projects released

According to community reports, multiple open source projects shipped: Paperclip (agent orchestration), AI-DevOps-Orchestrator (MLOps), NebulaGraph (knowledge graphs), Jido 2.0 (Elixir agents), Multicorn Shield (agent governance), GLiNER2 (CPU-first information extraction), OpenTitan now shipping in production Chromebooks, and Tech 42’s Agent Starter Pack on AWS Marketplace.

🔄 vLLM and vLLM-Omni receive routine maintenance

According to GitHub activity, vLLM and vLLM-Omni receive routine maintenance; vLLM-Omni targets omni-modality model support and optimizations.

🔍 Infra Insights

Today’s core trends: billion-dollar partnerships validate long-term demand; inference optimization moves from algorithms to hardware-software co-design; sovereign AI accelerates across Europe.

AMD and Meta’s $100B pact is the largest AI compute partnership outside NVIDIA’s ecosystem, and its 6 GW target (equivalent to millions of GPUs) validates hyperscalers’ long-term commitment to AI capacity. Akamai’s 86% cost reduction and Together AI’s stack of optimizations point in the same direction: inference competition is shifting from model accuracy to performance per dollar.
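The "millions of GPUs" equivalence is easy to sanity-check with rough power arithmetic. A hedged sketch: the ~1.4 kW all-in draw per accelerator is an assumed figure (real numbers vary widely by rack design, cooling, and networking overhead):

```python
# Rough sanity check on "6 GW is equivalent to millions of GPUs".
total_power_w = 6e9   # 6 GW capacity target from the AMD-Meta pact
per_gpu_w = 1_400     # assumed all-in watts per accelerator (hypothetical)

gpus = total_power_w / per_gpu_w
print(f"~{gpus / 1e6:.1f} million GPUs")
```

Under this assumption the target lands in the low single-digit millions of accelerators, consistent with the order-of-magnitude claim.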

Huawei’s and Vast Data’s AI data platforms, along with Elasticsearch’s 8x search acceleration, show hardware-software co-design permeating the data layer. The Red Hat-Telenor and Samsung-Vodafone projects indicate Europe is building an independent AI infrastructure ecosystem through local deployment and industry standards.