NVIDIA has used its latest GTC keynote to lay out a vision for the future of AI infrastructure, unveiling the new Vera Rubin platform – a fully integrated system designed to power the next wave of agentic AI at scale.
At the heart of the announcement is a shift away from isolated chips and servers toward tightly coupled, system-level design. The Vera Rubin platform brings together seven new components – the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU – into what NVIDIA describes as a single, giant AI supercomputer. This unified architecture is engineered to support the entire AI lifecycle, from pretraining and post-training to real-time inference for agentic systems.
According to NVIDIA CEO Jensen Huang, this marks a pivotal moment for the industry, signalling the arrival of “agentic AI” and the beginning of what he describes as the largest infrastructure buildout in history. Industry leaders appear to agree, with companies like Anthropic and OpenAI highlighting the need for increasingly powerful and efficient infrastructure to support complex reasoning models and mission-critical AI applications.
From chips to AI factories
A key theme underpinning the keynote was the evolution of AI infrastructure into POD-scale systems and AI factories. Rather than deploying standalone hardware, organisations are increasingly building large-scale, rack-based systems that operate as unified computing environments. These AI factories are designed to maximise performance, efficiency, and scalability while lowering costs and energy consumption.
The Vera Rubin platform is central to this transition. Built through deep co-design across compute, networking, and storage – and supported by an ecosystem of more than 80 partners – it enables multiple racks to function as a single, coherent system.
One example is the NVL72 rack, which integrates 72 Rubin GPUs and 36 Vera CPUs connected via NVLink 6. This configuration delivers significant efficiency gains: according to NVIDIA, it can train large models with a fraction of the GPUs previous architectures required, achieve up to 10x higher inference throughput per watt, and dramatically reduce cost per token.
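Those per-watt and per-token metrics are straightforward to reason about. As a rough illustration – using made-up figures, not NVIDIA's – the relationship works out like this:

```python
# Back-of-the-envelope model of rack-scale inference economics.
# All figures below are illustrative assumptions, NOT published NVIDIA numbers.

RACK_POWER_KW = 120.0          # assumed total rack power draw
TOKENS_PER_SEC = 2_000_000     # assumed aggregate inference throughput
ENERGY_COST_PER_KWH = 0.08     # assumed electricity price in USD

# Tokens per joule: the "inference throughput per watt" metric.
tokens_per_joule = TOKENS_PER_SEC / (RACK_POWER_KW * 1_000)

# Energy cost per million tokens served.
kwh_per_million_tokens = RACK_POWER_KW / (TOKENS_PER_SEC * 3_600 / 1_000_000)
cost_per_million_tokens = kwh_per_million_tokens * ENERGY_COST_PER_KWH

print(f"{tokens_per_joule:.1f} tokens/joule")
print(f"${cost_per_million_tokens:.4f} energy cost per million tokens")

# A 10x gain in tokens/joule cuts the energy component of cost per token
# by the same 10x at an unchanged power budget.
```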
Complementing this is the Vera CPU rack, designed specifically for reinforcement learning and agentic workloads. With 256 CPUs in a dense, liquid-cooled configuration, it provides the scalable, high-performance environment needed to validate and refine AI models, delivering faster and more efficient performance than traditional CPU systems.
Accelerating inference and memory
NVIDIA also introduced the Groq 3 LPX rack, targeting low-latency inference for large-scale agentic systems. By combining LPUs with GPUs, the system enables joint computation across AI model layers, delivering up to 35x higher inference throughput per megawatt. This architecture is optimised for trillion-parameter models and long-context applications, opening new opportunities for high-value AI services.
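NVIDIA has not detailed how work is divided between the two processor types, but "joint computation across AI model layers" can be pictured as a placement problem: each layer runs on whichever accelerator handles it best, while avoiding excessive cross-device transfers. The sketch below is hypothetical – the latency figures and the 10% transfer threshold are assumptions, not NVIDIA's scheduler.

```python
# Hypothetical sketch: assigning a model's layers across two accelerator types.
from dataclasses import dataclass

@dataclass
class Layer:
    index: int
    gpu_ms: float   # assumed per-token latency on a GPU
    lpu_ms: float   # assumed per-token latency on an LPU

def partition(layers: list[Layer]) -> dict[int, str]:
    """Place each layer on whichever device runs it faster, keeping
    contiguous runs together to limit cross-device transfers."""
    placement: dict[int, str] = {}
    for layer in layers:
        best = "lpu" if layer.lpu_ms < layer.gpu_ms else "gpu"
        # Avoid a device switch for a marginal win: stay on the previous
        # device unless the faster one saves more than 10%.
        prev = placement.get(layer.index - 1)
        if prev and prev != best:
            gain = abs(layer.gpu_ms - layer.lpu_ms) / max(layer.gpu_ms, layer.lpu_ms)
            if gain < 0.10:
                best = prev
        placement[layer.index] = best
    return placement

layers = [Layer(i, gpu_ms=0.8, lpu_ms=0.5 if i % 4 else 1.2) for i in range(8)]
print(partition(layers))  # contiguous gpu/lpu runs across the 8 layers
```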
To support the growing memory demands of these systems, NVIDIA unveiled the BlueField-4 STX storage rack, an AI-native storage solution designed to extend GPU memory across the entire POD. Leveraging the new DOCA Memos framework, it accelerates key-value cache processing – critical for large language models – boosting inference throughput while improving energy efficiency.
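The mechanics are not spelled out in the announcement, but the key-value (KV) cache in question is the attention state an LLM accumulates for every generated token; when it outgrows GPU memory, older entries can spill to a larger tier and return on reuse. Below is a minimal, hypothetical sketch of that two-tier pattern – an illustration of the idea, not the DOCA API.

```python
# Hypothetical two-tier key-value cache for LLM inference: hot blocks stay
# in GPU memory, least-recently-used blocks spill to a larger external tier.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity   # blocks that fit in GPU memory
        self.gpu_tier = OrderedDict()      # hot KV blocks, LRU ordered
        self.storage_tier = {}             # spilled blocks (larger, slower tier)

    def put(self, block_id: str, kv_block: bytes) -> None:
        self.gpu_tier[block_id] = kv_block
        self.gpu_tier.move_to_end(block_id)
        # Spill least-recently-used blocks once GPU memory is full.
        while len(self.gpu_tier) > self.gpu_capacity:
            old_id, old_block = self.gpu_tier.popitem(last=False)
            self.storage_tier[old_id] = old_block

    def get(self, block_id: str):
        if block_id in self.gpu_tier:
            self.gpu_tier.move_to_end(block_id)   # refresh LRU position
            return self.gpu_tier[block_id]
        if block_id in self.storage_tier:
            # Promote back to the GPU tier on reuse (may trigger a spill).
            self.put(block_id, self.storage_tier.pop(block_id))
            return self.gpu_tier[block_id]
        return None

cache = TieredKVCache(gpu_capacity=2)
for i in range(4):
    cache.put(f"seq0/block{i}", b"...")
print(cache.get("seq0/block0") is not None)   # True: served from the spill tier
```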
Meanwhile, the Spectrum-6 SPX Ethernet rack addresses the challenge of high-speed connectivity within AI factories, delivering low-latency, high-throughput communication between racks. Its use of co-packaged optics improves both power efficiency and system resilience compared to traditional networking approaches.
Efficiency, resilience, and power optimisation
Beyond raw performance, NVIDIA is placing strong emphasis on energy efficiency and operational resilience. The newly announced DSX platform introduces capabilities such as dynamic power provisioning, enabling up to 30% more AI infrastructure to be deployed within existing power constraints.
Additional software tools, including DSX Flex, allow AI factories to interact dynamically with power grids – for instance by shifting or shedding load on request – potentially unlocking capacity that would otherwise be held in reserve against worst-case demand. Combined with tightly integrated cooling, networking, and compute systems, this approach aims to ensure reliable, high-performance operation under continuous workloads.
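The arithmetic behind a claim like "30% more infrastructure in the same power envelope" comes down to oversubscription: racks rarely hit their nameplate peak simultaneously, so provisioning to a measured high percentile – with power capping to handle rare coincident peaks – frees headroom. A toy model with assumed numbers:

```python
# Toy model of dynamic power provisioning. All numbers are assumptions.
import random

random.seed(0)
NAMEPLATE_KW = 120.0       # worst-case rack draw
SITE_BUDGET_KW = 12_000.0  # fixed facility power envelope

# Static provisioning reserves nameplate power for every rack.
static_racks = int(SITE_BUDGET_KW / NAMEPLATE_KW)

# Dynamic provisioning measures actual draw and provisions to a high
# percentile, relying on capping for the rare coincident peak.
samples = [NAMEPLATE_KW * min(1.0, random.gauss(0.65, 0.05)) for _ in range(10_000)]
p99 = sorted(samples)[int(0.99 * len(samples))]
dynamic_racks = int(SITE_BUDGET_KW / p99)

print(f"static: {static_racks} racks, dynamic: {dynamic_racks} racks "
      f"({100 * (dynamic_racks / static_racks - 1):.0f}% more)")
```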
Dynamo: the operating system of AI factories
On the software side, NVIDIA introduced Dynamo 1.0, an open source platform designed to orchestrate inference at scale. Acting as a distributed “operating system” for AI factories, Dynamo manages GPU and memory resources across clusters, optimising performance for complex, bursty workloads.
The platform improves efficiency by intelligently routing tasks, reducing memory bottlenecks, and minimising wasted computation. In benchmark tests, it has demonstrated up to a sevenfold increase in inference performance on NVIDIA Blackwell GPUs, while lowering token costs and increasing overall system productivity.
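Dynamo's real scheduler lives in the open source project; purely as an illustration, the kind of decision it automates can be sketched as a router that weighs KV-cache reuse against queue depth when assigning requests to workers. The worker names and scoring rule below are hypothetical, not Dynamo's API.

```python
# Hypothetical cache- and load-aware request router for inference workers.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    queue_depth: int = 0
    cached_prefixes: set = field(default_factory=set)

def route(workers: list, prompt_prefix: str) -> Worker:
    """Prefer a worker that already holds the prompt's KV-cache prefix
    (skipping recomputation), falling back to the shortest queue."""
    def score(w: Worker):
        cache_hit = 1 if prompt_prefix in w.cached_prefixes else 0
        return (-cache_hit, w.queue_depth)   # hits first, then least loaded
    best = min(workers, key=score)
    best.queue_depth += 1
    best.cached_prefixes.add(prompt_prefix)
    return best

workers = [Worker("gpu-0"), Worker("gpu-1", cached_prefixes={"system-prompt-v1"})]
print(route(workers, "system-prompt-v1").name)  # gpu-1: reuses its KV cache
print(route(workers, "fresh-prompt").name)      # gpu-0: shortest queue
```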
Dynamo is also being integrated into a wide range of open source frameworks, including LangChain and vLLM, further accelerating adoption across the AI ecosystem.
Designing the AI factories of the future
To support the deployment of these large-scale systems, NVIDIA introduced the Vera Rubin DSX AI Factory reference design alongside the Omniverse DSX Blueprint. Together, these tools provide a comprehensive framework for designing, building, and operating AI factories.
Using digital twin technology powered by NVIDIA Omniverse, organisations can simulate entire AI factory environments – modelling everything from power consumption and thermal behaviour to network performance – before physical deployment. This enables faster time to market, improved efficiency, and reduced risk in large-scale infrastructure projects.
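Omniverse operates at far higher fidelity, but the underlying idea – checking a design against its physical limits before anything is built – can be caricatured in a few lines. Every figure below is an assumption:

```python
# Toy pre-deployment check of a rack layout against power and cooling limits.
# All limits and per-rack figures are illustrative assumptions.

RACK_POWER_KW = 120.0
COOLING_CAPACITY_KW = 3_500.0   # heat the cooling loop can remove
POWER_FEED_KW = 4_000.0         # electrical capacity of the hall
COOLING_OVERHEAD = 0.08         # pumps/fans as a fraction of IT load

def max_racks() -> int:
    by_power = POWER_FEED_KW / (RACK_POWER_KW * (1 + COOLING_OVERHEAD))
    by_cooling = COOLING_CAPACITY_KW / RACK_POWER_KW
    return int(min(by_power, by_cooling))   # the tighter constraint wins

print(f"layout supports {max_racks()} racks before a limit is hit")
```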
Industry leaders including Cadence, Siemens, Schneider Electric, and Vertiv are already contributing to these frameworks, underscoring the scale and complexity of the emerging AI infrastructure ecosystem.