
New NVIDIA switches enable trillion-parameter GPU computing and AI infrastructure

19th March 2024
Harry Fowle

NVIDIA have announced a new wave of networking switches, the X800 series, designed for trillion-parameter-scale GPU computing and AI infrastructure.

The world’s first networking platforms capable of end-to-end 800Gb/s throughput, NVIDIA Quantum-X800 InfiniBand and NVIDIA Spectrum-X800 Ethernet push the boundaries of networking performance for computing and AI workloads. They feature software that further accelerates AI, Cloud, data processing and HPC applications in every type of data centre, including those that incorporate the newly released NVIDIA Blackwell architecture-based product lineup.

“NVIDIA Networking is central to the scalability of our AI supercomputing infrastructure,” said Gilad Shainer, senior vice president of Networking at NVIDIA. “NVIDIA X800 switches are end-to-end networking platforms that enable us to achieve trillion-parameter-scale generative AI essential for new AI infrastructures.”

Initial adopters of Quantum-X800 InfiniBand and Spectrum-X800 Ethernet include Microsoft Azure and Oracle Cloud Infrastructure.

“AI is a powerful tool to turn data into knowledge. Behind this transformation is the evolution of data centres into high-performance AI engines with increased demands for networking infrastructure,” said Nidhi Chappell, Vice President of AI Infrastructure at Microsoft Azure. “With new integrations of NVIDIA networking solutions, Microsoft Azure will continue to build the infrastructure that pushes the boundaries of Cloud AI.”

CoreWeave is also among the early adopters.

Next standard for extreme performance

The Quantum-X800 platform sets a new standard in delivering the highest performance for AI-dedicated infrastructure. It includes the NVIDIA Quantum Q3400 switch and the NVIDIA ConnectX-8 SuperNIC, which together achieve an industry-leading end-to-end throughput of 800Gb/s. Compared with the previous generation, this delivers 5x higher bandwidth capacity and a 9x increase in In-Network Computing, to 14.4 Tflops, with NVIDIA’s Scalable Hierarchical Aggregation and Reduction Protocol (SHARPv4).
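As a rough sanity check on those multipliers, the quoted figures imply the previous generation offered on the order of 1.6 Tflops of In-Network Computing. The short sketch below is illustrative only and uses just the numbers stated above; the derived previous-generation value is not an official specification.

```python
# Back-of-the-envelope check using only the figures quoted above.
# The 14.4 Tflops and 9x values come from NVIDIA's announcement; the
# derived previous-generation figure is illustrative, not an official spec.
new_in_network_tflops = 14.4   # Quantum-X800 In-Network Computing (SHARPv4)
in_network_speedup = 9         # stated generational increase

prev_in_network_tflops = new_in_network_tflops / in_network_speedup
print(f"Implied previous-generation figure: {prev_in_network_tflops:.1f} Tflops")
```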

The Spectrum-X800 platform delivers optimized networking performance for AI Cloud and enterprise infrastructure. Utilizing the Spectrum SN5600 800Gb/s switch and the NVIDIA BlueField-3 SuperNIC, the Spectrum-X800 platform provides advanced feature sets crucial for multi-tenant generative AI Clouds and large enterprises.

Spectrum-X800 optimizes network performance, enabling faster processing, analysis, and execution of AI workloads, thereby accelerating the development and deployment of AI solutions and shortening time to market. Designed specifically for multi-tenant environments, Spectrum-X800 ensures performance isolation for each tenant's AI workloads, maintaining consistent performance levels and enhancing customer satisfaction and service quality.

NVIDIA software support

NVIDIA provides a comprehensive suite of network acceleration libraries, software development kits and management software to optimize performance for trillion-parameter AI models.

This includes the NVIDIA Collective Communications Library (NCCL), which extends GPU parallel computing tasks to the Quantum-X800 network fabric, taking advantage of its In-Network Computing capabilities with SHARPv4 and FP8 support to accelerate large-model training and generative AI.
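To illustrate the kind of collective operation NCCL accelerates across the fabric, here is a minimal sketch of a gradient all-reduce using PyTorch's NCCL backend. It is a generic example, not NVIDIA's SHARPv4-specific configuration (which the fabric and NCCL handle below the application level), and it assumes a CUDA-capable multi-GPU host with PyTorch installed.

```python
# Minimal sketch of a multi-GPU gradient all-reduce with the NCCL backend.
# Illustrative only: it shows the collective pattern NCCL accelerates, not an
# NVIDIA-specific SHARPv4 setup. Launch with torchrun, which sets RANK,
# WORLD_SIZE and LOCAL_RANK for each process.
import os

import torch
import torch.distributed as dist


def main() -> None:
    dist.init_process_group(backend="nccl")   # NCCL handles the GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds its own gradient shard; all_reduce sums them in place
    # across every GPU participating in the job.
    grads = torch.full((1024,), float(dist.get_rank()), device="cuda")
    dist.all_reduce(grads, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"Sum across {dist.get_world_size()} ranks: {grads[0].item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run with, for example, `torchrun --nproc_per_node=8 allreduce_demo.py` (the script name is hypothetical). The application-level call stays the same whether the reduction runs on the GPUs or is offloaded to the network fabric.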

NVIDIA’s full-stack software approach provides advanced programmability, making data centre networks more flexible, reliable and responsive, ultimately increasing overall operational efficiency and supporting the needs of modern applications and services.

Ecosystem momentum

Next year, Quantum-X800 and Spectrum-X800 will be available from a wide range of leading infrastructure and system vendors around the world, including Aivres, DDN, Dell Technologies, Eviden, Hitachi Vantara, Hewlett Packard Enterprise, Lenovo, Supermicro and VAST Data.
