[IND] 6 min readOraCore Editors

NVIDIA and Microsoft unify agentic AI from PC to cloud

5 NVIDIA-Microsoft moves show how agentic AI now spans Windows devices, Azure, local deployment, and secure enterprise runtimes.

Share LinkedIn
NVIDIA and Microsoft unify agentic AI from PC to cloud

NVIDIA and Microsoft are linking PCs, cloud, local systems, and secure runtimes for agentic AI.

At Microsoft Build, NVIDIA said the stack now reaches from Windows devices to Azure and local deployment, with one benchmark showing Microsoft Fabric SQL running up to 6x faster than a CPU baseline.

ItemWhere it runsKey spec
RTX SparkWindows devices1 petaflop AI performance, up to 128GB unified memory
DGX Station for WindowsWindows desktopsUp to 20 petaflops FP4, up to 748GB coherent memory
Microsoft Fabric Data WarehouseCloud data layerUp to 6x faster SQL execution vs CPU baseline
Azure Local with RTX PRO 6000 Blackwell Server EditionLocal and sovereign deploymentsMultinode support, vLLM runtime
Vera Rubin on AzureAI factoriesUp to 10x inference throughput per megawatt

1. RTX Spark for Windows agents

Get the latest AI news in your inbox

Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.

No spam. Unsubscribe at any time.

NVIDIA and Microsoft are positioning RTX Spark as a personal AI machine for developers who want agents to run natively on Windows. The pitch is simple: build, tune, and run local agents on a laptop or small desktop without depending on the cloud for every step.

NVIDIA and Microsoft unify agentic AI from PC to cloud

RTX Spark targets a practical middle ground between consumer PCs and full datacenter gear. NVIDIA says it delivers 1 petaflop of AI performance, up to 128GB of unified memory, and all-day battery life, while keeping full AI and graphics performance unplugged.

  • Purpose-built Windows PCs for personal agents
  • Ships from Microsoft Surface, ASUS, Dell, HP, Lenovo, and MSI
  • Includes CUDA, RTX, DLSS, and TensorRT support

2. DGX Station for Windows for enterprise workflows

DGX Station for Windows is the bigger sibling for teams that need a deskside AI system for enterprise apps and long-running workflows. NVIDIA says it is built for always-on agents, with enough memory and compute to handle frontier-scale models locally.

According to NVIDIA, the system uses the GB300 Grace Blackwell Ultra Desktop Superchip, offers up to 748GB of coherent memory, and reaches 20 petaflops of FP4 performance. That makes it a fit for model development, local inference, and heavier agent pipelines that cannot wait on round trips to the cloud.

  • Expected from ASUS, Dell, GIGABYTE, HP, MSI, and Supermicro in Q4
  • Runs NVIDIA OpenShell secure runtime
  • Supports models up to 1 trillion parameters

3. NVIDIA models in Microsoft Foundry

Microsoft Foundry is becoming the place where enterprises compose agent systems from multiple model families instead of betting on one model for every job. NVIDIA says its open models now sit alongside Anthropic and OpenAI models in Foundry Agent Service, with built-in identity and governance.

NVIDIA and Microsoft unify agentic AI from PC to cloud

The most notable addition is Nemotron 3 Ultra, an open frontier reasoning model aimed at coding, research, and enterprise workflows. NVIDIA also points to Nemotron 3.5 ASR for speech recognition and Nemotron 3.5 Content Safety, plus Cosmos 3 for physical AI and Earth-2 weather models for forecasting and risk analysis.

  • Nemotron models available on Foundry managed compute
  • Anthropic Claude runs natively on NVIDIA GB300 Blackwell Ultra systems on Azure
  • Agent Toolkit and NemoClaw blueprints are available for production agents

4. Fabric and Azure Local for faster data and local control

Agentic systems need a data layer that can keep up with repeated queries, reasoning loops, and retrieval calls. NVIDIA says its accelerated computing is now built into Microsoft Fabric Data Warehouse, where Microsoft’s internal tests showed SQL execution up to 6x faster than a CPU baseline and up to 7x faster than three other cloud data warehouse providers in high-concurrency workloads.

For teams that need to keep data on site or close to the edge, Microsoft is also bringing Foundry Local on Azure Local to the RTX PRO 6000 Blackwell Server Edition platform. Paired with Nemotron models, it supports multinode deployments and the vLLM runtime for manufacturing, energy, sovereign data centers, and other latency-sensitive uses.

Use case fit: - Fabric Data Warehouse: cloud analytics and agent queries - Azure Local: on-prem, hybrid, and sovereign deployments - vLLM: scaled inference where latency matters

5. OpenShell and Vera Rubin for secure agents and AI factories

As agents move from suggestions to actions, NVIDIA is pushing a security model where each agent runs in its own sandbox and every outbound call is checked against policy before it reaches files, networks, or credentials. That is the job of OpenShell, now integrated into GitHub Copilot and released as open source under Apache 2.0.

The other half of the story is the datacenter. Microsoft says Fairwater Wisconsin is live and validated for NVIDIA Vera Rubin, which can slot into Azure without retrofits. NVIDIA says the platform can deliver up to 10x inference throughput per megawatt and cut cost per agentic token by an order of magnitude, while Confidential Computing protects models and data at scale.

  • OpenShell is model-agnostic
  • Policies are written as code and versioned in the repo
  • Vera Rubin works alongside Blackwell in Azure data centers

How to decide

If you want local development and fast iteration, start with RTX Spark. If your team needs heavier Windows-based model work, DGX Station for Windows is the stronger fit. If your priority is enterprise orchestration across data, governance, and model choice, Microsoft Foundry and Fabric are the center of gravity.

For regulated or latency-sensitive deployments, Azure Local with RTX PRO 6000 Blackwell Server Edition is the better path. If your concern is agent safety, OpenShell matters most. If you are planning large-scale inference infrastructure, Vera Rubin and the AI factory stack are the pieces to watch.