February 25, 2026 — Agent infrastructure enters the “stateful connection” era as OpenAI launches WebSocket mode, marking a paradigm shift from stateless LLM calls to persistent agent sessions. Simultaneously, the rise of reasoning diffusion models and sovereign LLMs signals diversification and regionalization of AI infrastructure.
🧭 Key Highlights
🔌 OpenAI launches WebSocket mode for long-chain agent optimization
⚡ Inception Labs releases Mercury 2 reasoning diffusion model
🇮🇳 India launches Sarvam-30B/105B sovereign LLMs
🛡️ Anthropic accuses Chinese models of “capability extraction”
🗄️ Oracle AI Database 26ai goes GA
🌐 Cloudflare releases vinext Next.js replacement
🔧 Multi-agent concurrent workflow solutions emerge
Agent Infrastructure and Runtime
🔌 OpenAI: WebSocket Mode for Long-Chain Agents
According to OpenAI's documentation, the company has launched WebSocket mode for the Responses API, designed for long workflows and multi-tool-call scenarios. Official data shows a 40% reduction in execution time for workflows with 20+ tool calls.
The core shift: from "request-response" to "persistent connections," as agent infrastructure begins optimizing control-plane latency.
This represents an architectural signal: from stateless LLM calls → stateful agent sessions. For developers building agent runtimes, workflow engines, or custom orchestration layers, this is a significant infrastructure shift to monitor.
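OpenAI has not published the protocol internals, so the numbers below are illustrative assumptions, not measurements. This sketch only shows the arithmetic behind the shift: in request-response mode every tool call re-pays connection setup (TLS handshake, auth, routing), while a persistent session pays it once.

```python
SETUP_MS = 120  # assumed per-connection overhead (TLS handshake, auth, routing)
CALL_MS = 30    # assumed per-tool-call processing time

def stateless_total(n_calls: int) -> int:
    # Request-response: every call re-pays connection setup.
    return n_calls * (SETUP_MS + CALL_MS)

def stateful_total(n_calls: int) -> int:
    # Persistent session: setup is paid once, then calls stream over it.
    return SETUP_MS + n_calls * CALL_MS

n = 20
print(stateless_total(n), stateful_total(n))  # 3000 720
```

With these assumed costs, a 20-call chain spends 76% less wall-clock time on the wire; the longer the chain, the closer the savings get to the setup/total ratio.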
🔧 Multi-Agent Concurrent Conflict Resolution
According to MyClaw Newsletter analysis, concurrent Claude Code sessions working in the same checkout can overwrite each other's changes. The solution is the --worktree flag, which creates an independent git worktree for each agent.
This is an agent-level code isolation model worth applying to multi-agent orchestration scenarios.
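The `--worktree` flag is Claude Code's wrapper; the underlying isolation mechanism is plain `git worktree`. A minimal sketch (repo contents, branch, and agent names are all hypothetical) of giving each agent its own working directory on its own branch:

```python
import os
import subprocess
import tempfile

def git(args, cwd):
    # Run a git command in the given directory, failing loudly on error.
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

parent = tempfile.mkdtemp()
repo = os.path.join(parent, "repo")
os.mkdir(repo)
git(["init", "-b", "main"], repo)
git(["config", "user.email", "agent@example.com"], repo)  # hypothetical identity
git(["config", "user.name", "agent"], repo)
with open(os.path.join(repo, "app.py"), "w") as f:
    f.write("print('hello')\n")
git(["add", "."], repo)
git(["commit", "-m", "init"], repo)

# One worktree + branch per agent: concurrent edits land in separate
# directories and branches, so agents cannot clobber each other's files.
for agent in ("agent-a", "agent-b"):
    path = os.path.join(parent, agent)
    git(["worktree", "add", "-b", agent, path], repo)
    assert os.path.isfile(os.path.join(path, "app.py"))
```

Merging each agent's branch back is then an ordinary git merge or PR, which makes conflicts explicit instead of silent.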
📊 Agent Evaluation Framework Evolution
According to Daily Dose of Data Science's LLMOps Part 9, traditional MLOps assumptions break down in the LLM era: teams no longer control the model (it sits behind an API), inputs are natural language, and outputs are non-deterministic, which demands new evaluation frameworks.
Implicit focus: Evaluation becomes the core bottleneck of LLM productization, aligning with the philosophy that “Infrastructure is not compute, but control capability.”
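Non-determinism is why single-call exact-match tests stop working: you have to score a distribution of outputs against a semantic checker. A minimal sketch of that pattern, where `fake_model` and `contains_answer` are hypothetical stand-ins for a real API call and a real grader:

```python
import random
from typing import Callable

def pass_rate(model: Callable[[str], str],
              checker: Callable[[str, str], bool],
              prompt: str, k: int = 8) -> float:
    """Sample the model k times; report the fraction of passing outputs.

    With non-deterministic outputs, one call is not a meaningful test,
    so we evaluate a sample of outputs against a checker function.
    """
    passes = sum(checker(prompt, model(prompt)) for _ in range(k))
    return passes / k

def fake_model(prompt: str) -> str:
    # Hypothetical flaky model: answers correctly ~70% of the time.
    return "42" if random.random() < 0.7 else "i do not know"

def contains_answer(prompt: str, output: str) -> bool:
    # Hypothetical checker: graders in practice are rubric- or LLM-based.
    return "42" in output

random.seed(0)
print(pass_rate(fake_model, contains_answer, "what is 6*7?"))
```

Real frameworks layer on regression tracking and judge calibration, but the core unit of measurement is this sampled pass rate rather than a boolean test.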
🔬 Instruction Files Actually Hinder Agents
Research shows that instruction files like AGENTS.md and CLAUDE.md can reduce success rates while increasing costs by 20%. The likely reason: models are now capable enough to navigate codebases autonomously.
Implicit trend: From “explicit rule-driven” → “context retrieval-driven,” context engineering > static instruction.
Models and Inference
⚡ Inception Labs: Mercury 2 Reasoning Diffusion Model
According to Business Wire, Inception Labs launched Mercury 2, a large language model that generates via parallel denoising rather than autoregression. Reported throughput is approximately 1,000 tokens/sec on NVIDIA Blackwell—5x faster than Claude 4.5 Haiku and GPT-5 Mini—with 128K context, real-time agent loops, and voice capabilities.
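Mercury 2's actual architecture is not public in this level of detail, so the following is only a toy contrast of the two generation loops: autoregression commits one token per step, while diffusion-style generation starts fully masked and fills many positions per denoising step, which is where the throughput advantage comes from.

```python
# Toy contrast of generation loops (not Mercury's real algorithm):
# autoregression fills one position per step; parallel denoising
# fills a block of masked positions per step.
TARGET = list("hello world")
MASK = "_"

def autoregressive_steps(target):
    out, steps = [], 0
    while len(out) < len(target):
        out.append(target[len(out)])  # one token per step
        steps += 1
    return "".join(out), steps

def diffusion_steps(target, per_step=4):
    out, steps = [MASK] * len(target), 0
    while MASK in out:
        masked = [i for i, c in enumerate(out) if c == MASK]
        for i in masked[:per_step]:  # unmask several positions at once
            out[i] = target[i]
        steps += 1
    return "".join(out), steps

print(autoregressive_steps(TARGET))  # ('hello world', 11)
print(diffusion_steps(TARGET))       # ('hello world', 3)
```

In a real diffusion LM the "unmasked" tokens come from a learned denoiser rather than a known target, but the step-count arithmetic is the same: steps scale with sequence length divided by the parallel fill width, not with sequence length alone.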
🇮🇳 India: Sarvam Sovereign LLMs
According to Drishtiias, at the India AI Impact Summit 2026, Sarvam AI unveiled Sarvam-30B and Sarvam-105B (MoE with ~9B active parameters, 128K context), trained across 22 Indic languages and English math/code using NVIDIA NeMo, open-sourced for sovereign adoption.
This is part of India’s $1.2B IndiaAI Mission, which launched five sovereign LLMs in February 2026, including GPU subsidies, startup funding, large-scale compute, and the MANAV Vision ethical governance framework.
🛡️ Anthropic Accuses Chinese Models of “Capability Extraction”
According to multiple reports, Anthropic has accused Chinese labs (including DeepSeek, Moonshot AI, and MiniMax) of extracting capabilities through massive account interactions with Claude (16M+). Anthropic announced strengthened API verification and security mechanisms, while critics such as Elon Musk pointed to the irony of Anthropic's own copyright stance.
Strategic implications: capability "distillation extraction" has become a gray-zone competitive norm. API access control, authentication, and usage-pattern analysis will become key capabilities of the next phase of LLM infrastructure. "Reasoning capability" is becoming an asset-class resource.
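Anthropic has not disclosed its detection methods; the sketch below is a hypothetical illustration of the simplest form of usage-pattern analysis, flagging accounts whose request volume is a statistical outlier. Production systems would combine far richer signals (prompt similarity, timing patterns, client fingerprints).

```python
import statistics

def flag_outliers(requests_per_account: dict, z_cut: float = 3.0) -> list:
    """Flag accounts whose request count is a large z-score outlier.

    Hypothetical illustration only: real abuse detection layers many
    signals on top of raw volume.
    """
    counts = list(requests_per_account.values())
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts) or 1.0  # avoid divide-by-zero
    return [acct for acct, n in requests_per_account.items()
            if (n - mean) / stdev > z_cut]

usage = {f"acct-{i}": 200 for i in range(50)}  # synthetic baseline traffic
usage["acct-x"] = 50_000                       # synthetic extraction-style burst
print(flag_outliers(usage))  # ['acct-x']
```

The point is architectural: once model capability is treated as an asset, this kind of telemetry moves from billing infrastructure into the security perimeter.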
Databases and Data Engineering
🗄️ Oracle: AI Database 26ai GA
According to DBTA, Oracle AI Database 26ai Enterprise Edition for Linux x86-64 is GA for on-prem deployment, adding unified hybrid vector search, first-class AI agents, and Model Context Protocol support; it succeeds Oracle Database 23ai with no upgrade required.
❄️ Snowflake: Cortex Code Expansion
According to DBTA, Snowflake expanded Cortex Code to span external sources like dbt and Apache Airflow for natural-language code generation and optimization across heterogeneous pipelines.
🏔️ Cloudera: AI Inference Platform On-Prem
According to DBTA, Cloudera extended its AI Inference platform and Cloudera Data Warehouse with Trino to on-prem deployments.
Infrastructure and Deployment Tools
🌐 Cloudflare: vinext Next.js Replacement
According to Cloudflare blog, Cloudflare’s vinext (a Next.js replacement built with Claude) reports 4.4x faster builds, 57% smaller bundles, native Workers deploys, and traffic-aware pre-rendering.
🔍 Qdrant 1.17: Vector-Native Relevance Feedback
According to DBTA, Qdrant 1.17 adds vector-native relevance feedback queries for iterative RAG refinement.
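Qdrant's exact 1.17 query API is not reproduced here; the classic Rocchio update conveys the idea behind relevance feedback: shift the query vector toward results the user (or agent) marked relevant and away from those marked irrelevant, then re-search. A minimal sketch with hypothetical weights and toy 2-D vectors:

```python
def rocchio(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Classic Rocchio relevance-feedback update (not Qdrant's API):
    move the query toward relevant vectors, away from irrelevant ones."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(dim) / len(vectors) for dim in zip(*vectors)]

    pos, neg = centroid(positives), centroid(negatives)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]

q = [1.0, 0.0]             # toy query embedding
liked = [[0.0, 1.0], [0.2, 0.8]]
disliked = [[1.0, -1.0]]
print(rocchio(q, liked, disliked))
```

In an iterative RAG loop, this update runs between retrieval rounds: retrieve, let the agent judge the hits, refine the query vector, and retrieve again.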
💾 IBM: Agentic-AI-Driven Storage
According to DBTA, IBM introduced agentic-AI-driven FlashSystem for autonomous storage operations.
Enterprise Networking and Agent Platforms
🔌 Cisco: Agentic AI Adoption Framework
According to DBTA, Cisco outlined an agentic AI adoption framework across protection, interaction governance, and resilient connectivity, and highlighted Nexus One with Isovalent Cilium for low-latency AI networking.
⚡ Vercel: Chat SDK Cross-Platform Deployment
According to X, Vercel’s Chat SDK ships a single codebase to deploy agents across Slack, Discord, Teams, GitHub, and Linear with streaming.
🌊 Liquid AI & Together AI: LFM2-24B-A2B Serverless Deployment
According to X, Liquid AI and Together AI offer serverless deployment for LFM2-24B-A2B with a 99.9% reliability SLA.
☁️ Daytona: Agent-Native Cloud Infra
According to X, Daytona announced agent-native cloud infra for secure, stateful agent runtimes.
Decentralized AI Infrastructure
💵 Circle: Joins Agentic AI Foundation
According to X, Circle joined the Agentic AI Foundation to position USDC for agent payments.
🔗 RelAI & OnFinality: Cross-Chain Coordination
According to X, RelAI and OnFinality integrated to coordinate agent actions across chains with native settlement.
📱 Acurast & Base: Mobile Node Verifiable Execution
According to X, Acurast’s integration with Base brings 225,000+ mobile nodes for verifiable AI execution.
⛓️ Ritual: Protocol-Layer Model Compute
According to X, Ritual advanced “enshrined compute” to run models verifiably at the protocol layer.
🚀 OpenServAI: Solana AI-Native Layer
According to X, OpenServAI promotes an AI-native layer on Solana powered by the $SERV token.
🔓 Bless: Permissionless Edge-Native Compute
According to X, Bless targets permissionless, edge-native compute from idle devices.
🔍 Infra Insights
Today’s news points to three core shifts in AI infrastructure: agent infrastructure moving from stateless calls to stateful sessions, diversified exploration of inference model architectures, and simultaneous rise of sovereign AI and decentralized infrastructure.
OpenAI’s WebSocket mode represents a paradigm shift in agent runtime, transitioning from traditional request-response models to persistent connections, critical for latency optimization in long-chain agent workflows. Inception Labs’ Mercury 2, through diffusion rather than autoregressive generation, represents another exploration of model architecture paradigms, achieving significant improvements in inference speed.
India’s Sarvam series of sovereign LLMs and progress in multiple decentralized AI projects (Circle, RelAI, Ritual, OpenServAI) demonstrate that AI infrastructure is undergoing a dual movement toward regionalization and decentralization. The GA releases of AI-native databases from Oracle, Snowflake, and Cloudera indicate traditional data infrastructure is fully embracing vector retrieval and agent capabilities.
Anthropic’s accusations against Chinese models for “capability extraction” reveal that API access control and authentication are becoming key capability boundaries for LLM infrastructure. Research showing instruction files hinder agent performance reinforces the trend that “context retrieval is superior to static instruction.”
Cloudflare’s vinext and Daytona’s agent-native infrastructure showcase new deployment patterns and edge computing potential. Overall, AI infrastructure is evolving from general compute to specialized capabilities, from centralized services to multi-center ecosystems, and from stateless calls to stateful agent sessions.