NVIDIA and Microsoft have unveiled blueprints for a new hyperscale GPU accelerator to drive AI cloud computing. The new HGX-1 hyperscale GPU accelerator is an open-source design released in conjunction with Microsoft’s Project Olympus.
HGX-1 does for cloud-based AI workloads what ATX – Advanced Technology eXtended – did for PC motherboards when it was introduced more than two decades ago. It establishes an industry standard that can be rapidly and efficiently embraced to help meet surging market demand.
The new architecture is designed to meet the exploding demand for AI computing in the cloud – in fields such as autonomous driving, personalised healthcare, superhuman voice recognition, data and video analytics, and molecular simulations.
“AI is a new computing model that requires a new architecture,” said Jen-Hsun Huang, founder and chief executive officer of NVIDIA. “The HGX-1 hyperscale GPU accelerator will do for AI cloud computing what the ATX standard did to make PCs pervasive today. It will enable cloud service providers to easily adopt NVIDIA GPUs to meet surging demand for AI computing.”
“The HGX-1 AI accelerator provides extreme performance scalability to meet the demanding requirements of fast-growing machine learning workloads, and its unique design allows it to be easily adopted into existing data centres around the world,” wrote Kushagra Vaid, general manager and distinguished engineer, Azure Hardware Infrastructure, Microsoft, in a blog post.
For the thousands of enterprises and startups worldwide that are investing in AI and adopting AI-based approaches, the HGX-1 architecture provides unprecedented configurability and performance in the cloud.
Powered by eight NVIDIA Tesla P100 GPUs in each chassis, it features an innovative switching design – based on NVIDIA NVLink™ interconnect technology and the PCIe standard – enabling a CPU to dynamically connect to any number of GPUs. This allows cloud service providers that standardise on the HGX-1 infrastructure to offer customers a range of CPU and GPU machine instance configurations.
Cloud workloads are more diverse and complex than ever. AI training, inferencing and HPC workloads run optimally on different system configurations, with a CPU attached to a varying number of GPUs.
The highly modular design of the HGX-1 allows for optimal performance no matter the workload. It provides up to 100x faster deep learning performance than legacy CPU-based servers, at an estimated one-fifth the cost for AI training and one-tenth the cost for AI inferencing.
With its flexibility to work with data centres across the globe, HGX-1 offers existing hyperscale data centres a quick, simple path to be ready for AI.
Collaboration to Bring Industry Standard to Hyperscale
Microsoft, NVIDIA and Ingrasys (a Foxconn subsidiary) collaborated to architect and design the HGX-1 platform. The companies are sharing it widely as part of Microsoft’s Project Olympus contribution to the Open Compute Project, a consortium whose mission is to apply the benefits of open source to hardware and rapidly increase the pace of innovation in, near and around the data centre and beyond.
Sharing the reference design with the broader Open Compute Project community means that enterprises can easily purchase and deploy the same design in their own data centres.
NVIDIA is joining the Open Compute Project to help drive AI and innovation in the data centre. The company plans to continue its work with Microsoft, Ingrasys and other members to advance AI-ready computing platforms for cloud service providers and other data centre customers.