AI Infra Dao

AI Infra Brief | Physical AI Capital Surge and Inference Speed Records (2026.02.27)

February 27, 2026 marks significant progress in “agent autonomy” and “sandbox isolation” for AI infrastructure. Perplexity Computer and Cursor Agents provide each agent with independent compute environments, with 30% of Cursor’s internal PRs now created by autonomous agents. Meanwhile, Qwen 3.5 Medium open-weight models were released, with the 35B model activating only 3B parameters per token. Union.ai and Encord collectively raised nearly $100 million, focusing on physical AI data infrastructure.

🧭 Key Highlights

💻 Perplexity Computer: 19-model orchestration system

🤖 Cursor Agents: 30% of internal PRs created by autonomous agents

🧠 Qwen 3.5 Medium open-sourced: 35B activating 3B parameters

🎮 Claude Code gets remote control feature

💰 Union.ai closes $38.1M Series A

🤖 Encord secures $60M Series C

⚡ Mercury 2 achieves 1000 tokens/second inference

📡 Qualcomm demonstrates AI-native 6G technology

Computing and Cloud Infrastructure

💰 Union.ai: $38.1M Series A to Commercialize AI Development Platform

According to Union’s official blog, Union.ai closed a $38.1M Series A to commercialize Union 2.0, an end-to-end AI development platform built on Flyte. The platform offers pure Python authoring, dynamic workflows, and crash-resilient pipelines for training, inference, and observability.

Union 2.0’s core value lies in integrating the entire AI development workflow, enhancing infrastructure reliability through dynamic workflows and crash recovery capabilities.

🤖 Encord: $60M Series C to Scale Physical AI Data Infrastructure

According to PR Newswire, Encord closed a $60M Series C to scale data infrastructure for physical AI. The platform manages multimodal sensor data across the full lifecycle of robotics and autonomy systems.

The rise of physical AI has created demand for specialized data infrastructure. Encord’s financing signals investors are shifting focus from general AI to embodied intelligence and robotics.

🔧 ElastixAI: FPGA Inference Platform Challenges GPUs

According to Engineering, ElastixAI introduced an FPGA-based inference platform as a drop-in alternative to GPUs for data center generative AI workloads, emphasizing efficiency through FPGA parallelism.

FPGA’s programmable nature may offer better energy efficiency than GPUs for specific workloads, providing diversified options for inference infrastructure.

🌐 VAST Data: AI Operating System Unifies Storage and Scheduling

According to SiliconAngle, VAST Data detailed its AI Operating System that unifies storage, a global file system and index, and a GPU-aware scheduler — positioning it as a potential foundation from cloud to edge.

VAST Data’s AI OS attempts to establish tighter coordination between storage and computing, providing end-to-end optimized infrastructure for AI workloads.

💻 Dell: AI Server Backlog Reaches $22B

According to Markets, Dell’s AI server backlog stands at approximately $22B, alongside a 32% year-over-year increase in Infrastructure Solutions Group (ISG) revenue. This reflects continued strong enterprise demand for AI hardware.

Model Inference and Optimization

🚀 NVIDIA: TensorRT-LLM AutoDeploy Compresses Optimization Cycles

According to NVIDIA’s official X account, TensorRT-LLM AutoDeploy compresses inference-optimization cycles from weeks to days.

AutoDeploy’s core value lies in lowering the engineering barrier for inference optimization, enabling more teams to quickly deploy high-performance inference services.

⚡ InceptionLabsAI: Mercury 2 Hits 1000 Tokens/Second

According to X platform discussion, InceptionLabsAI’s Mercury 2 — a diffusion-based, non-autoregressive LLM — claims approximately 1000 tokens/second on Blackwell, suggesting a different acceleration stack versus conventional autoregressive decoding.

The application of diffusion models in text generation remains exploratory. Mercury 2’s performance indicates non-autoregressive methods may have advantages in inference speed.
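The speed argument can be made concrete with step-count arithmetic. The sketch below is illustrative only, not Mercury 2's actual algorithm: autoregressive decoding needs one forward pass per emitted token, while a diffusion-style decoder refines all positions in parallel over a fixed number of denoising steps (16 is an assumed value), so pass count stays flat as output length grows.

```python
# Illustrative step-count contrast; NOT Mercury 2's actual algorithm.

def autoregressive_passes(num_tokens: int) -> int:
    # One model forward pass per emitted token: cost grows with length.
    return num_tokens

def diffusion_passes(num_tokens: int, denoise_steps: int = 16) -> int:
    # All positions are refined in parallel over a fixed number of
    # denoising steps, so pass count is flat regardless of length.
    return denoise_steps

for n in (128, 1024, 8192):
    print(f"{n} tokens: AR={autoregressive_passes(n)} passes, "
          f"diffusion={diffusion_passes(n)} passes")
```

Real diffusion LLMs pay more per pass and may run multiple rounds, but the flat-versus-linear scaling is the structural reason non-autoregressive decoding can hit throughput numbers like 1000 tokens/second.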

Agentic Infrastructure and Development Tools

💻 Perplexity Computer: 19-Model Orchestration Agentic System

According to official Perplexity announcements, Perplexity Computer is a multi-model agentic system featuring 19 AI models, multiple parallel sub-agents, and one orchestrator, achieving zero tab-switching autonomous workflows. Claude Opus 4.6 sits at the center as the core reasoning engine, routing subtasks to different models and orchestrating multiple agents.

Key features of Perplexity Computer include: multi-model dynamic routing (each subtask automatically assigned to the most suitable model), sandboxed execution (every task runs in an isolated virtual environment), persistent memory & connectors (remembers past work across sessions and connects to hundreds of external services), and usage-based pricing (Max subscribers get 10,000 monthly credits).
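The dynamic-routing feature above is essentially a dispatch table from subtask category to model. A minimal sketch of that pattern follows; the task categories and all model names except Claude Opus 4.6 are invented for illustration, not Perplexity's actual routing table.

```python
# Hypothetical sketch of multi-model routing. Task categories and the
# non-Claude model names are invented, not Perplexity's real table.

ROUTING_TABLE = {
    "deep-reasoning": "claude-opus-4.6",  # orchestrator / core reasoner
    "code":           "code-specialist",  # hypothetical coding model
    "web-search":     "fast-retriever",   # hypothetical retrieval model
}

def route(subtask_kind: str) -> str:
    # Unknown subtask kinds fall back to the core reasoning model.
    return ROUTING_TABLE.get(subtask_kind, ROUTING_TABLE["deep-reasoning"])

print(route("code"))        # dispatched to the coding specialist
print(route("summarize"))   # unlisted kind falls back to the orchestrator
```

In the announced system each routed call would additionally run inside its own sandboxed virtual environment; the table only captures the model-selection half of the design.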

🤖 Cursor Agents: Each Agent Gets Its Own Virtual Machine

According to official Cursor announcements, over 30% of PRs merged internally at Cursor are now created by agents running autonomously inside cloud sandboxes, and that capability is now available to everyone. Each cloud agent gets its own isolated virtual machine with a full development environment, the ability to interact with the software it is building, and tools to produce artifacts like videos, screenshots, and logs to prove its work.

Cursor Agents’ key highlights include: parallel execution in isolated VMs (each agent runs independently, eliminating resource conflicts), self-validating output (agents don’t just write code — they build, run, and interact with the software inside their sandbox, iterating until the output is verified), multi-platform access (accessible from desktop app, web, mobile, Slack, and GitHub), and remote desktop control (directly control the agent’s VM desktop).
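The self-validating behavior described above reduces to an edit/check loop with a retry budget. The sketch below is a hedged stand-in for that control flow; `edit` and `check` substitute for an agent's real build/run/interact tooling and are not Cursor's implementation.

```python
# Hedged sketch of a self-validating agent loop: edit, run checks,
# iterate until they pass or the retry budget runs out.

def run_until_verified(edit, check, max_iters=5):
    artifact, feedback = None, None
    for _ in range(max_iters):
        artifact = edit(feedback)      # produce or revise the artifact
        ok, feedback = check(artifact) # agent's own verification step
        if ok:
            return artifact
    raise RuntimeError("retry budget exhausted without a verified artifact")

# Toy usage: the check fails until the artifact reaches version 3.
def toy_edit(feedback):
    return (feedback or 0) + 1

def toy_check(artifact):
    return artifact >= 3, artifact

print(run_until_verified(toy_edit, toy_check))  # -> 3
```

The interesting design choice is that verification happens inside the same sandbox where the agent can build and interact with the software, so "done" means "observed working," not just "code written."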

🧠 Qwen 3.5 Medium: 35B Model Activating Only 3B Parameters

According to the Qwen team’s official announcement, Alibaba’s Qwen team released the Qwen 3.5 Medium series, four models (35B-A3B, 122B-A10B, 27B, and Flash). The headline model is a 35B MoE that activates just 3B parameters per token yet outperforms the previous generation’s 235B flagship.

Key features include: hybrid Gated DeltaNet + MoE architecture, 1M token context, natively multimodal. The 27B dense model ties GPT-5 mini on SWE-bench; 122B-A10B dominates tool use benchmarks. Flash API at $0.10/M input tokens with built-in tool calling. All open-weight, Apache 2.0 licensed.
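The "35B total, 3B active" headline is standard MoE arithmetic: with top-k routing, per-token compute tracks the activated parameters, not the total. The shapes below are invented to land near that ratio and are not Qwen 3.5 Medium's actual configuration.

```python
# Back-of-envelope MoE arithmetic. Expert shapes are invented to land
# near the headline 35B-total / 3B-active ratio; NOT Qwen's real config.

def moe_params(shared, n_experts, per_expert, top_k):
    total = shared + n_experts * per_expert
    active = shared + top_k * per_expert  # only top_k experts fire per token
    return total, active

total, active = moe_params(shared=1.0e9, n_experts=128,
                           per_expert=0.266e9, top_k=8)
print(f"total={total/1e9:.1f}B active={active/1e9:.1f}B "
      f"({active/total:.1%} of weights touched per token)")
```

Since decode-time FLOPs scale with active rather than total parameters, a 35B-A3B model generates tokens at roughly the cost of a 3B dense model while retaining a much larger pool of specialized weights.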

🎮 Claude Code Gets Remote Control

According to Anthropic’s official announcement, Claude Code now has a Remote Control feature that lets you continue a local coding session from your phone, tablet, or any browser. Just run /remote-control and scan a QR code. Everything still runs on your machine (nothing moves to the cloud), and conversations stay synced across all connected devices. Currently available as a research preview on Pro and Max plans.

📚 Simon Willison: Agentic Engineering Patterns Guide

According to Simon Willison’s personal blog, he has started publishing Agentic Engineering Patterns, a growing guide for developers to get the best results out of coding agents like Claude Code and OpenAI Codex. The first two chapters cover why writing code is essentially cheap now and how red/green TDD helps agents produce more reliable output. He plans to add 1-2 chapters a week. Notably, every word is written by him, not an LLM.

🔒 Google Cracks Down on OpenClaw Usage

According to community reports, Google is cracking down on AI Pro/Ultra subscribers who used third-party tools like OpenClaw to pipe their Antigravity tokens into external apps, effectively turning a $249/mo subscription into unlimited API access. Google says it’s protecting service quality from “malicious usage.”

Data Path and Enterprise Deployment

📋 Capxel: LLM-LD Open Standard Makes Websites AI-Readable

According to MarTech Series, Capxel proposed LLM-LD, an open standard to make websites AI-readable via a .well-known index and structured entities, with three conformance levels and 100+ sites reportedly live.

LLM-LD attempts to solve the structured content problem for AI Agent access to the web, similar to how sitemaps help crawlers, but optimized for LLMs.
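To make the sitemap analogy concrete, here is a hypothetical illustration of the ".well-known index" idea. The exact path, field names, and entity schema below are assumptions for illustration; the published LLM-LD spec is not reproduced here.

```python
# Hypothetical LLM-LD-style index. Path and field names are assumed,
# not the published spec.

import json

# What a site might serve at something like /.well-known/llm-ld.json:
index = {
    "llm_ld_version": "0.1",   # assumed field
    "conformance_level": 1,    # the spec reportedly defines three levels
    "entities": [
        {"type": "Product",
         "id": "/products/widget",
         "summary": "Flagship widget, shipping since 2024"},
    ],
}

payload = json.dumps(index, indent=2)
print(payload)
```

The design value is the fixed, machine-readable entry point: just as /sitemap.xml lets crawlers skip link discovery, a well-known index lets an agent enumerate a site's structured entities without scraping HTML.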

🛣️ Path: AI-Native Software Platform for Unified Application Development

According to Yahoo Finance, Path launched an AI-native software platform for building and evolving applications in a unified environment.

🛡️ NeuralTrust: Recognized in Gartner Guardian Agents Market Guide

According to Morningstar, NeuralTrust was recognized in Gartner’s Market Guide for Guardian Agents, covering runtime protection, automated red teaming, evaluation, and observability for LLM apps and agents.

Guardian Agents represent a new paradigm for LLM security, monitoring and protecting AI system runtime behavior through dedicated security agents.

☁️ Solo.io: Kagent Framework Treats Agents as K8s First-Class Citizens

According to Virtualization Review, Solo.io outlined treating agents and skills as first-class Kubernetes resources using the kagent framework and agentgateway for governed connectivity.

Mapping agents to K8s resources means enterprises can manage AI agents like containers, providing governance capabilities for large-scale agent deployments.
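What "agents as first-class Kubernetes resources" can look like is sketched below as a custom-resource manifest expressed as a Python dict. The API group, version, and spec fields are assumptions for illustration, not kagent's verified schema.

```python
# Hedged sketch of an agent as a Kubernetes custom resource.
# Group/version and spec fields are assumed, not kagent's real schema.

agent_manifest = {
    "apiVersion": "kagent.dev/v1alpha1",  # assumed group/version
    "kind": "Agent",
    "metadata": {"name": "support-triage", "namespace": "agents"},
    "spec": {
        "model": "some-model",        # placeholder model reference
        "tools": ["ticket-search"],   # hypothetical tool binding
    },
}

print(agent_manifest["apiVersion"], agent_manifest["kind"])
```

Once an agent is an ordinary resource, it can be applied like any Deployment and inherits the platform's existing controls: RBAC, namespaces, quotas, and audit logging, which is exactly the governance story the kagent framework is pitching.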

🗄️ MongoDB: Positioned as Core AI Data Layer

According to Seeking Alpha, MongoDB is profiled as a core AI data layer amid Atlas growth.

Telecom and Edge Computing

📡 Qualcomm: Showcases AI-Native 6G Vision

According to RCR Wireless, Qualcomm showcased 6G demonstrations at MWC Barcelona — including Giga-MIMO, sub-band full duplex, context-aware comms, and distributed AI services — framing 6G as AI-native.

Qualcomm’s 6G vision shows next-generation mobile networks will evolve from “connecting devices” to “connecting intelligence,” with AI becoming an intrinsic part of the network protocol stack.

🏢 HCLTech: Demonstrates AI-Native Telecom Future at MWC

According to Newswire, HCLTech presented agentic fraud management, AI-powered OSS, and AIOps at MWC.

📶 Capgemini: AI-RAN Transforms 5G into Real-Time Edge Growth Platform

According to Capgemini’s official website, Capgemini described AI-RAN to turn 5G into a real-time edge growth platform on NVIDIA-accelerated hardware.

AI-RAN (AI Radio Access Network) represents telecom operators’ AI transformation path, optimizing wireless access networks through AI to enhance network efficiency and edge computing capabilities.

Market and Industry Dynamics

💰 The Tomorrow Company: Building AI-Native Financial Infrastructure Layer

According to Morningstar, The Tomorrow Company announced plans to build an AI-native financial infrastructure layer combining tokenized carbon utilities on Ethereum with an intelligence engine.

📊 Prompt Generation Tools Market to Reach $1.018B by 2031

According to PR Newswire, a prompt-generation tools market forecast projects growth from $456M (2024) to $1.018B by 2031 (12% CAGR).

⚡ WFR: Issues AI Infrastructure Power Rankings

According to Markets, WFR issued AI infrastructure power rankings in a new report.

Community Signals

👥 Block Layoffs Debate Links Workforce Reductions to AI Tools

According to Hacker News discussion, Block layoffs sparked debate linking workforce reductions to “intelligence tools” and AI-augmented productivity.

The impact of AI tools on enterprise organizational structure is emerging — while enhancing individual productivity, they may also reduce dependency on human workforce.

🏆 r/LocalLLM: Self-Hosted Leaderboard Consolidates Consumer Hardware Benchmarks

According to Reddit, the r/LocalLLM self-hosted leaderboard consolidates consumer-hardware benchmarks, including high-TPS Apple Silicon runs.

💚 Open Source Endowment: $693K Fund Launches for Critical OSS

According to Reddit, the Open Source Endowment launched with a $693K fund to support critical open-source software, including AI tooling.

🔍 Infra Insights

Today’s core trends: agents get isolated sandboxes, model architectures pursue efficiency, capital shifts to physical AI.

Perplexity Computer and Cursor Agents provide each agent with an independent virtual machine, with 30% of Cursor’s internal PRs now created by autonomous agents — sandbox isolation is solving the core challenge of agent safety and trustworthiness. Qwen 3.5 Medium’s “35B activating 3B” design and Mercury 2’s 1000 tokens/second point in the same direction: from blindly scaling parameter sizes to refined activation and non-autoregressive decoding.

Union.ai and Encord’s nearly $100M in combined financing shows capital flowing from general LLMs to vertical domains like physical AI and robotics. Google’s crackdown on OpenClaw usage suggests emerging AI infrastructure stacks are generating friction with incumbents’ platform controls.