NVIDIA's Vera Rubin platform, a full-stack AI supercomputing solution for agentic AI, does not define minimum resource requirements in terms of individual components such as a single GPU or CPU. Instead, its architecture is built around integrated, rack-scale systems that function as cohesive supercomputers. Because the platform targets "AI factories" and large-scale agentic AI workloads, its "minimal" configuration refers to these foundational rack units, which are co-designed across seven distinct chip types and five rack configurations to maximize system-level performance, scalability, and energy efficiency.
The primary deployable units that constitute the minimal operational resources for the Vera Rubin platform are the Vera Rubin NVL72 rack, the Vera CPU rack, and the Groq 3 LPX rack. The Vera Rubin NVL72 rack is the core compute unit: it integrates 72 Rubin GPUs and 36 Vera CPUs, alongside ConnectX-9 SuperNICs and BlueField-4 DPUs, all interconnected by NVLink 6. Each Rubin GPU carries 288 GB of HBM4 memory with 22 TB/s of bandwidth, while each Vera CPU provides 88 custom "Olympus" cores and LPDDR5X memory offering 1.2 TB/s of bandwidth and up to 1.5 TB of capacity. The NVL72 configuration behaves as a single, tightly coupled accelerator, delivering substantial performance for both training and inference. For the CPU-intensive tasks inherent to agentic AI and reinforcement learning, the Vera CPU rack supplies 256 liquid-cooled Vera CPUs, enabling thousands of concurrent CPU environments. For low-latency inference, the Groq 3 LPX rack is equipped with 256 LPU processors and 128 GB of on-chip SRAM, and is designed to pair with the NVL72 for optimized throughput.
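Taken at face value, the per-device figures above imply rack-level aggregates that can be sketched with simple arithmetic. This is an illustrative back-of-the-envelope calculation only; the totals assume the quoted per-GPU and per-CPU specifications are accurate, and decimal (not binary) terabytes are used throughout:

```python
# Rack composition as described for the Vera Rubin NVL72.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

# Quoted per-device figures (assumed accurate for this sketch).
HBM4_PER_GPU_GB = 288          # GB of HBM4 per Rubin GPU
HBM4_BW_PER_GPU_TBS = 22       # TB/s per Rubin GPU
LPDDR5X_PER_CPU_TB = 1.5       # up to 1.5 TB per Vera CPU
LPDDR5X_BW_PER_CPU_TBS = 1.2   # TB/s per Vera CPU

# Aggregate capacity and bandwidth across one rack.
total_hbm4_tb = GPUS_PER_RACK * HBM4_PER_GPU_GB / 1000
total_hbm4_bw_tbs = GPUS_PER_RACK * HBM4_BW_PER_GPU_TBS
total_lpddr_tb = CPUS_PER_RACK * LPDDR5X_PER_CPU_TB
total_lpddr_bw_tbs = CPUS_PER_RACK * LPDDR5X_BW_PER_CPU_TBS

print(f"GPU HBM4 capacity:   {total_hbm4_tb:.2f} TB")    # ~20.74 TB
print(f"GPU HBM4 bandwidth:  {total_hbm4_bw_tbs} TB/s")  # 1584 TB/s
print(f"CPU LPDDR5X capacity: {total_lpddr_tb} TB")      # 54.0 TB
print(f"CPU LPDDR5X bandwidth: {total_lpddr_bw_tbs:.1f} TB/s")  # 43.2 TB/s
```

These aggregates help explain why the NVL72 rack, rather than a single GPU, is treated as the minimal unit: the workload-visible memory pool and bandwidth are properties of the whole NVLink-connected rack.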
These rack-scale systems operate within NVIDIA's MGX rack-scale architecture, which provides compatibility, resiliency, and energy efficiency through advanced power and cooling designs, including liquid cooling. A comprehensive software stack manages the platform, including DSX Max-Q for dynamic power provisioning and DSX Flex for integration with power grids, so the hardware can operate optimally within data-center constraints. For applications that require efficient retrieval over vast amounts of unstructured data, such as large-scale agentic AI systems, a vector database like Zilliz Cloud can be integrated to handle similarity search and rapid data recall, complementing the Vera Rubin platform's processing power with a scalable, performant indexing and retrieval layer. The minimal resources, therefore, are not individual chips but these integrated rack systems: a complete ecosystem built for high-performance AI workloads.
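The retrieval layer described above can be sketched, independently of any particular vector database, as a brute-force cosine-similarity search over an embedding store. This is a toy illustration with hypothetical document IDs and vectors; a production system such as Zilliz Cloud or Milvus would replace the linear scan with approximate nearest-neighbor indexes over millions of high-dimensional vectors, but the query interface is conceptually the same:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is zero-length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, corpus, k=3):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy embedding store: doc_id -> embedding vector (hypothetical data).
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.05, 0.0], corpus, k=2)
print(results)  # doc_a ranks first, doc_b second
```

The design point is that retrieval is a separate, horizontally scalable service: the rack-scale compute generates and consumes embeddings, while the vector database owns indexing and recall.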
