AI Infra Dao

AI Infra Brief | Throughput Gains and Mega-Rounds Reshape AI Infrastructure (2026.02.13)

February 13, 2026 marks a dual wave of throughput breakthroughs and mega-rounds in AI infrastructure. From an 8x reduction in reasoning cost to a $30B funding round, from specialized inference architectures to fully autonomous operations, the industry is scaling AI capacity and performance through both technological innovation and capital injection.

🧭 Key Highlights

⚡ Nvidia introduces dynamic memory sparsification, 8x reasoning cost reduction, 5x throughput lift

🔄 Together AI unveils CPD architecture, 35-40% throughput gain for long-context apps

🚀 OpenAI releases GPT-5.3-Codex-Spark, real-time coding over 1000 tok/s

💰 Anthropic raises $30B Series G at $380B valuation (largest ever)

💰 Nscale secures $1.4B debt facility, deploying ~200K NVIDIA GB300 GPUs

🌐 Cisco unveils Silicon One G300 (102.4 Tbps) AI-native networking

🛒 AuraSell launches AI-native GTM OS, unifying marketing and sales workflows

🤖 Monaco emerges with $35M Series A for AI-native sales platform

Infrastructure Breakthroughs

⚡ Nvidia Introduces Dynamic Memory Sparsification, 8x Reasoning Cost Reduction, 5x Throughput Lift

According to VentureBeat, Nvidia introduced Dynamic Memory Sparsification, compressing the KV cache to cut reasoning memory costs by 8x and lift single-server throughput by 5x on Qwen3-8B while matching vanilla accuracy. It is released via the KVPress library with Hugging Face and FlashAttention compatibility.

Inference cost optimization is critical for AI deployment. Nvidia’s breakthrough provides a new path for long-context reasoning.
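The core idea behind KV-cache compression can be sketched in a few lines. This is a conceptual illustration only, not Nvidia's actual DMS algorithm or the KVPress API: rank cached entries by an importance score and keep only a fraction, trading cache size against a small accuracy risk.

```python
# Conceptual sketch of KV-cache sparsification (NOT Nvidia's DMS algorithm
# or the KVPress API): keep only the highest-scoring fraction of entries.

def sparsify_kv_cache(cache, scores, keep_ratio=0.125):
    """cache: list of (key, value) pairs; scores: one importance value per
    entry. keep_ratio=0.125 mirrors an 8x memory reduction."""
    k = max(1, int(len(cache) * keep_ratio))
    # Indices of the k highest-scoring entries, restored to original order.
    keep = sorted(sorted(range(len(cache)), key=lambda i: -scores[i])[:k])
    return [cache[i] for i in keep]
```

Real systems derive the importance scores from attention statistics and must preserve accuracy while pruning; the hard part is choosing scores that keep quality at vanilla levels.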

🔄 Together AI Unveils CPD Architecture, 35-40% Throughput Gain for Long-Context Apps

According to MEXC, Together AI unveiled Cache-Aware Prefill-Decode Disaggregation (CPD), splitting inference into specialized node types with a three-level KV-cache hierarchy, delivering 35-40% higher throughput for long-context applications on NVIDIA B200 GPUs.

Inference architecture specialization is key for performance gains. CPD achieves breakthroughs through hardware-software co-optimization.
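The disaggregation pattern can be sketched as follows. This is a hypothetical toy, not Together AI's CPD implementation: requests are routed to separate prefill and decode node pools after probing a three-level KV-cache hierarchy.

```python
# Hypothetical sketch of cache-aware prefill/decode disaggregation (not
# Together AI's actual CPD code). Tier names are assumptions.

CACHE_TIERS = {"hbm": set(), "dram": set(), "storage": set()}  # fast -> slow

def lookup_kv(prefix_hash):
    # Probe tiers in order of speed; return the first that holds the prefix.
    for tier in ("hbm", "dram", "storage"):
        if prefix_hash in CACHE_TIERS[tier]:
            return tier
    return None

def route(request):
    # Prompt processing (prefill) and token generation (decode) run on
    # specialized node types in a disaggregated deployment.
    pool = "prefill-pool" if request["is_prompt"] else "decode-pool"
    return pool, lookup_kv(request["prefix_hash"])
```

The design point is that prefill is compute-bound and decode is memory-bandwidth-bound, so separating them lets each pool be provisioned for its own bottleneck.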

🚀 OpenAI Releases GPT-5.3-Codex-Spark, Real-Time Coding Over 1000 tok/s

According to OpenAI official blog, OpenAI released GPT-5.3-Codex-Spark, a real-time coding model exceeding 1,000 tokens/second on Cerebras WSE-3, with an 80% roundtrip overhead reduction via persistent WebSockets.

Real-time coding is crucial for AI-assisted programming. Co-designing specialized hardware and models delivers these throughput breakthroughs.
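The mechanism behind the round-trip savings is simple to model. The numbers below are assumptions for illustration, not OpenAI's measurements: per-request HTTP pays connection setup on every call, while a persistent WebSocket pays it once per session.

```python
# Back-of-envelope model (all numbers assumed) of persistent-connection
# overhead savings versus per-request connection setup.

def session_overhead_ms(n_requests, setup_ms=50, frame_ms=1, persistent=True):
    if persistent:
        # One handshake, then a small framing cost per message.
        return setup_ms + n_requests * frame_ms
    # Fresh connection setup on every request.
    return n_requests * (setup_ms + frame_ms)

http_cost = session_overhead_ms(100, persistent=False)  # 5100 ms
ws_cost = session_overhead_ms(100, persistent=True)     # 150 ms
```

The actual reduction depends on session length and handshake cost; the point is that setup cost is amortized across the whole session instead of paid per request.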

Funding and Partnerships

💰 Anthropic Raises $30B Series G at $380B Valuation (Largest Ever)

According to Anthropic official announcement, Anthropic raised a $30B Series G at a $380B post-money valuation to expand infrastructure and frontier research, diversifying compute across AWS Trainium, Google TPUs, and NVIDIA GPUs.

This is one of the largest AI infrastructure funding rounds to date, reflecting capital market confidence in AI’s long-term growth.

💰 Nscale Secures $1.4B Debt Facility, Deploying ~200K NVIDIA GB300 GPUs

According to Nscale press release, Nscale signed a $1.4B delayed draw term loan (backed by GPUs) to deploy approximately 200,000 NVIDIA GB300 GPUs across hubs in Norway, Portugal, Iceland, and the UK.

GPU-backed debt financing is an emerging model in AI infrastructure. Nscale is building large-scale AI compute clusters in Europe.

🌐 Cisco Highlights AI-Native Infrastructure: Silicon One G300 (102.4 Tbps)

According to Markets Chronicle, Cisco highlighted AI-native infrastructure moves: Silicon One G300 (102.4 Tbps), 800G/1.6T liquid-cooled switching, and AgenticOps leveraging Splunk.

Networking is a critical bottleneck for AI data centers. Cisco’s AI-native networking lineup targets AI cluster connectivity needs.

📧 Sinch Partners with Lovable for Scalable Communications in AI-Native Apps

According to The Fast Mode, Sinch partnered with Lovable to embed Mailgun-scale communications into Lovable Cloud for AI-native applications.

Communication infrastructure is crucial for AI application deployment. The Sinch-Lovable partnership lowers communication integration barriers for AI apps.

AI-Native Platforms

🛒 AuraSell Launches AI-Native GTM OS Atop Salesforce/HubSpot

According to SiliconAngle, AuraSell launched an AI-native GTM (Go-To-Market) operating system built on Salesforce/HubSpot to unify sales, marketing, and success workflows.

AI-native platforms are evolving from tools into operating systems. AuraSell’s GTM OS targets AI-native business processes.

🤖 Monaco Emerges with $35M Series A for AI-Native Sales Platform

According to The AI Insider, Monaco emerged from stealth with a $35M Series A to build an end-to-end AI-native sales platform for startups.

AI-native sales platforms are an important B2B AI application scenario. Monaco focuses on sales process automation for startups.

📊 Matia Raises $21M Series A for Unified Data Infrastructure and “AI Data Engineer”

According to The AI Insider, Matia closed a $21M Series A for a unified data infrastructure platform and an “AI data engineer” product.

Data infrastructure is a critical bottleneck for AI deployment. Matia lowers data engineering barriers through AI-native automation.

🔒 SEALSQ Outlines Quantum-Resilient Physical AI Infrastructure

According to QuiverQuant, SEALSQ outlined quantum-resilient physical AI infrastructure spanning PQC MCUs, HSMs, and SEALCOIN.AI.

Quantum security is a frontier issue for AI infrastructure. SEALSQ targets AI infrastructure security in the post-quantum cryptography era.

Open Source

🔀 ClawRoute — Local Proxy Routing Simple vs Complex LLM Tasks; Claims 60-90% Cost Reduction. MIT

According to Reddit discussion, ClawRoute is a local proxy that routes requests to different models based on task complexity, claiming 60-90% cost reduction. MIT licensed.

Model routing is key for reducing inference costs. ClawRoute achieves intelligent routing via a local proxy.
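A complexity-based router can be sketched with a crude heuristic. This is illustrative only; the heuristic, threshold, and model names are assumptions, not ClawRoute's code.

```python
# Toy complexity-based router in the spirit of ClawRoute (illustrative only).

CHEAP_MODEL, STRONG_MODEL = "small-local-model", "frontier-model"

def estimate_complexity(prompt):
    # Crude heuristic: long prompts or reasoning keywords imply complexity.
    keywords = ("prove", "debug", "architecture", "step by step")
    return len(prompt) / 500 + sum(kw in prompt.lower() for kw in keywords)

def route_model(prompt, threshold=1.0):
    # Send hard tasks to the expensive model, everything else to the cheap one.
    return STRONG_MODEL if estimate_complexity(prompt) >= threshold else CHEAP_MODEL
```

Cost reduction comes from the fact that most everyday requests fall below the threshold and never reach the expensive model.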

🥭 Mango Lollipop — CLI Lifecycle Messaging Generator Using Claude Code; AARRR Matrix. MIT

According to GitHub repository, Mango Lollipop is a CLI lifecycle messaging generator built using Claude Code, based on the AARRR (Acquisition, Activation, Retention, Referral, Revenue) matrix. MIT licensed.

AI-native development tools are emerging. Mango Lollipop demonstrates Claude Code’s application in CLI tool generation.

🧠 ISSA-Repository — Framework for Persistent AI Identity with Episodic Memory and Self-Correction Loop. MIT

According to GitHub repository, ISSA-Repository is a framework for persistent AI identity with episodic memory and a self-correction loop. MIT licensed.

Persistent identity is key for AI agents. ISSA-Repository achieves AI identity continuity through episodic memory and self-correction.
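The episodic-memory-plus-self-correction pattern can be sketched minimally. This is an assumed design for illustration, not ISSA-Repository's actual API: episodes are stored with outcomes, and past failures on a task are surfaced before acting again.

```python
# Minimal episodic-memory sketch (assumed design, not ISSA-Repository's API).

class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def record(self, task, action, success):
        # Each episode stores what was tried and whether it worked.
        self.episodes.append({"task": task, "action": action, "success": success})

    def lessons(self, task):
        # Self-correction: recall actions that previously failed on this task,
        # so the agent can avoid repeating them.
        return [e["action"] for e in self.episodes
                if e["task"] == task and not e["success"]]
```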

🗳️ Polis — Civic Deliberation with AI-Assisted Dialogue to Surface Consensus and Minority Views. Open

According to Polis website, Polis is a civic deliberation platform with AI-assisted dialogue to surface consensus and minority views. Open licensed.

The use of AI in democratic deliberation is an emerging area. Polis uses AI-assisted dialogue to facilitate public discussion.

🔍 Alibaba Zvec — Embedded Vector DB for On-Device RAG; SQLite-Like Simplicity. Open

According to GitHub repository, Alibaba Zvec is an embedded vector database for on-device RAG, offering SQLite-like simplicity. Open licensed.

On-device AI is key for reducing deployment costs. Zvec supports on-device RAG through embedded vector databases.
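The "SQLite-like" embedded vector store idea can be sketched in pure Python. This is the concept only, not Zvec's API: the index lives in-process with no server, and search is a similarity ranking over stored vectors.

```python
# Dependency-free embedded vector store sketch (concept only, NOT Zvec's API).
import math

class TinyVectorDB:
    def __init__(self):
        self.rows = []  # (id, vector, text) tuples kept in-process

    def insert(self, doc_id, vector, text):
        self.rows.append((doc_id, vector, text))

    def search(self, query, top_k=3):
        # Rank stored rows by cosine similarity to the query vector.
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        return sorted(self.rows, key=lambda r: -cos(query, r[1]))[:top_k]
```

A real embedded engine adds persistence and approximate-nearest-neighbor indexing, but the in-process, zero-server model is what makes on-device RAG cheap to ship.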

📈 Linear RNNs Library — PyTorch Linear RNNs with Accelerated Kernels; Accepted to EACL SRW 2026. Open

According to Reddit discussion, Linear RNNs Library is a PyTorch linear RNN library with accelerated kernels; paper accepted to EACL SRW 2026. Open licensed.

Linear RNNs are frontier in sequence modeling. This library improves practicality through accelerated kernels.
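The defining recurrence is easy to state. The sketch below is conceptual, in plain Python rather than the library's accelerated kernels: with no nonlinearity between steps, the recurrence h_t = a·h_{t-1} + b·x_t becomes an associative operation, which is what makes parallel-scan acceleration possible.

```python
# Scalar linear RNN in plain Python (conceptual, not the library's kernels).

def linear_rnn(xs, a=0.9, b=1.0, h0=0.0):
    # h_t = a * h_{t-1} + b * x_t, applied sequentially here; the same
    # computation can be parallelized with a scan because it is linear.
    h, out = h0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out
```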

🖼️ Z-Image-ncnn-vulkan — Z-Image Inference via ncnn + Vulkan on Consumer Hardware. Apache-2.0

According to GitHub repository, Z-Image-ncnn-vulkan achieves Z-Image inference on consumer hardware via ncnn + Vulkan. Apache-2.0 licensed.

AI inference on consumer hardware is important for the open-source community. This project achieves cross-platform acceleration through Vulkan.

📊 Langfuse — Open-Source LLM Observability with Tracing, Cost Monitoring, OpenTelemetry-Native. Open

According to Confident AI report, Langfuse is an open-source LLM observability platform with tracing, cost monitoring, and OpenTelemetry-native integration. Open licensed.

LLM observability is crucial for production deployment. Langfuse achieves standardized observability through OpenTelemetry integration.
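The basic concepts (latency and token-cost capture per call) can be shown with a minimal tracing decorator. This is a stand-alone sketch, not Langfuse's SDK or its OpenTelemetry integration.

```python
# Minimal LLM-call tracing sketch (NOT Langfuse's SDK; illustrative only).
import functools
import time

TRACES = []  # a real system would export these spans, not keep them in memory

def traced(price_per_1k_tokens=0.002):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(prompt):
            start = time.perf_counter()
            reply = fn(prompt)
            tokens = len(prompt.split()) + len(reply.split())  # crude tokenizer
            TRACES.append({
                "name": fn.__name__,
                "latency_s": time.perf_counter() - start,
                "tokens": tokens,
                "cost_usd": tokens / 1000 * price_per_1k_tokens,
            })
            return reply
        return inner
    return wrap

@traced()
def fake_llm(prompt):
    # Stand-in for a real model call.
    return "echo: " + prompt
```

Production tracing adds nested spans, prompt/response capture, and standardized export, which is where OpenTelemetry-native integration pays off.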

Hot Threads

🔓 OpenClaw Security Scan: 15% of Community Skills with Malicious Instructions; “Delegated Compromise” Risk

According to Reddit discussion, an OpenClaw security scan of 18,000 instances found that 15% of community skills contained malicious instructions, and identified a “Delegated Compromise” risk.

AI agent security is a frontier issue. The OpenClaw scan reveals security risks in the community ecosystem.

📊 Measuring AI Agent ROI: Six Maturity Signals; Proposal for an “Agent Bus” Coordination Layer

According to HPCwire, an article discusses measuring AI agent ROI, proposing six maturity signals and an “Agent Bus” coordination layer.

AI agent ROI is a core concern for enterprises. Maturity frameworks and coordination-layer architectures are key to agent engineering.

🛡️ SEO/GEO/AEO Poisoning Defenses for RAG and Agents; Provenance Emphasized

According to Penligent report, the article discusses defenses against SEO/GEO/AEO poisoning attacks on RAG and agent systems, emphasizing the importance of data provenance.

Search poisoning is an emerging threat for AI systems. Defenses need to cover the full pipeline from data sources through reasoning.
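One basic provenance defense can be sketched as a gate in the retrieval path. This is an assumed design for illustration, not any specific product's API: retrieved documents reach the model only if their source is on an allowlist.

```python
# Illustrative provenance gate for a RAG pipeline (assumed design). The
# source names are hypothetical.

TRUSTED_SOURCES = {"docs.internal", "vendor.official"}

def filter_by_provenance(docs):
    # Every retrieved chunk carries its origin; drop untrusted sources
    # before they can reach the prompt.
    return [d for d in docs if d["source"] in TRUSTED_SOURCES]
```

Allowlisting is only one layer; provenance-aware defenses also include signing, content scanning, and ranking sources by trust rather than by retrieval score alone.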

💢 Claude Code v2.1.20 UI Change Backlash Over File Path Transparency

According to Dev.to report, Claude Code v2.1.20 UI change sparked community backlash; controversy centers on file path transparency.

Developer tool UI design impacts user experience. The Claude Code controversy reflects trade-offs between transparency and simplicity.

📊 Matplotlib Closed AI-Generated PR (#31132), Renewing Governance Debate

According to Merchmindai report, Matplotlib closed AI-generated PR (#31132), sparking broad debate on open-source project governance.

AI-generated code review is an emerging issue for open-source governance. Matplotlib’s decision represents a conservative stance.

Production Milestone

🚗 Waymo Began Fully Autonomous Operations with Its 6th-Gen Driver; Metro Phoenix Factory Shifting Toward Tens of Thousands per Year

According to Waymo official blog, Waymo began fully autonomous operations with its 6th-gen Driver, with its Metro Phoenix factory shifting toward tens of thousands of units per year via OEM partnerships.

Fully autonomous production deployment is an AI application milestone. Waymo’s scaled operations mark L4 autonomous driving’s entry into the commercialization phase.

🔍 Infra Insights

Today’s news points to core trends in AI infrastructure: throughput breakthroughs and mega-rounds.

Regarding throughput breakthroughs, the industry shows full-stack optimization from hardware to software: Nvidia’s dynamic memory sparsification (8x reasoning cost reduction), Together AI’s CPD architecture (35-40% long-context throughput gain), and OpenAI’s GPT-5.3-Codex-Spark (real-time coding over 1,000 tok/s). This indicates AI inference performance gains rely not just on hardware scaling, but also on architectural and algorithmic co-optimization.

In terms of mega-rounds, Anthropic’s $30B Series G ($380B valuation) and Nscale’s $1.4B GPU-backed debt facility indicate that capital markets view AI as a long-term structural opportunity, not a cyclical bubble. Financing for AI-native application platforms such as Monaco ($35M) and Matia ($21M) shows investment extending from the infrastructure layer to the application layer.

AI-native platforms are forming a new competitive landscape: AuraSell’s GTM OS, Monaco’s sales platform, and Matia’s data infrastructure each rebuild traditional software workflows as AI-native. This is not a matter of bolting AI features onto existing tools, but of redefining what applications look like at the operating-system level.

At the open-source layer, ClawRoute (60-90% cost reduction), Mango Lollipop (CLI generation), and ISSA-Repository (persistent identity) demonstrate the open-source community’s vitality in lowering AI deployment barriers. The emergence of security issues (the OpenClaw scan, SEO-poisoning defenses) and governance issues (the Matplotlib PR review) marks AI technology’s entry into a phase of social negotiation.

Throughput breakthroughs lower the unit cost of intelligence; mega-rounds fund infrastructure build-out; AI-native platforms reshape application forms. AI infrastructure is moving from “experimental validation” to “scaled deployment.”