- cross-posted to:
- [email protected]
- nvidia
Here are some initial benchmarks of Grace CPU performance; the Hopper GPU benchmarks will come in a follow-up article.
NVIDIA’s GH200 combines the 72-core Grace CPU with the H100 Tensor Core GPU and supports up to 480GB of LPDDR5 memory and 96GB of HBM3 or 144GB of HBM3e memory. The Grace CPU employs Arm Neoverse-V2 cores with 1MB of L2 cache per core and 117MB of L3 cache.
On a geometric mean basis across all the benchmarks conducted, the GH200 Grace CPU nearly matched the Intel Xeon Platinum 8592+ Emerald Rapids processor. The Arm Neoverse-V2 based Grace CPU tended to be much faster than the 128-core Ampere Altra Max AArch64 server.
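For readers unfamiliar with the term, a geometric mean is the usual way to combine results from benchmarks with very different scales into one figure. The following is a minimal sketch, not the article's actual tooling; the function name and the sample scores are hypothetical and only illustrate the arithmetic.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # exp(mean(log(x))) equals the n-th root of the product of the values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-benchmark scores relative to a baseline system
# (1.0 = parity); these are NOT numbers from the article.
relative_scores = [0.5, 2.0, 1.0]

print("geometric mean: ", geometric_mean(relative_scores))
print("arithmetic mean:", sum(relative_scores) / len(relative_scores))
```

With these made-up scores, one result at half speed and one at double speed cancel out in the geometric mean, while the arithmetic mean would be pulled upward by the single large win. That is why the geometric mean is preferred for summarizing ratios.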
Overall, the NVIDIA GH200 CPU benchmarking was fascinating for the early potential it showed. Some workloads are still not well optimized for AArch64, and in some cases the higher core counts and dual-socket configurations available with Intel Xeon Emerald Rapids and AMD EPYC Genoa(X) / Bergamo could drive the results much higher.
This is the best summary I could come up with:
GPTshop.ai is building what they aim to be “the ultimate high-end desktop”: a supercomputer built around the GH200 and focused on AI and HPC workloads.
Their system uses the GH200 Grace Hopper Superchip, dual 2000+ Watt power supplies, and a QCT motherboard, and can be configured with multiple SSDs as well as various NVIDIA Bluefield/Connect-X adapters and more.
The GH200 does not come cheap: the currently available GPTshop.ai GH200 576GB model starts at 47,500 € (~$41k USD, since no taxes apply when shipped outside the EU).
For this testing, Ubuntu 23.10 with Linux 6.5 was used to get an up-to-date kernel as well as the stock GCC 13 compiler.
The toolchain versions are close to what will be found in Ubuntu 24.04 LTS in April, so Ubuntu 23.10 gives a leading-edge look at NVIDIA GH200 Linux performance against the Intel Xeon Scalable, AMD EPYC, and Ampere Altra Max processors in this comparison.
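When comparing your own numbers against a setup like this, it helps to record the architecture, kernel, and compiler version alongside the results. Here is a rough sketch using only the Python standard library; the helper names `parse_gcc_major` and `describe_environment` are made up for illustration and are not from the article.

```python
import platform
import re
import shutil
import subprocess

def parse_gcc_major(version_text):
    """Return the major version from `gcc --version` output, or None."""
    first_line = version_text.splitlines()[0] if version_text else ""
    # Take the first dotted version number on the first line.
    match = re.search(r"(\d+)\.\d+(?:\.\d+)?", first_line)
    return int(match.group(1)) if match else None

def describe_environment():
    """Collect CPU architecture, kernel release, and GCC major version."""
    info = {
        "arch": platform.machine(),    # e.g. "aarch64" or "x86_64"
        "kernel": platform.release(),  # kernel release string
        "gcc_major": None,             # stays None if gcc is not installed
    }
    gcc = shutil.which("gcc")
    if gcc:
        result = subprocess.run(
            [gcc, "--version"], capture_output=True, text=True, check=False
        )
        info["gcc_major"] = parse_gcc_major(result.stdout)
    return info

if __name__ == "__main__":
    print(describe_environment())
```

Storing this dictionary next to each benchmark run makes it easier to tell later whether a difference came from the hardware or from a different kernel or compiler.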
So, unfortunately, this article has just the initial raw CPU performance benchmark numbers, while a way to cleanly read any available power metrics is still being worked out.
The original article contains 653 words, the summary contains 196 words. Saved 70%. I’m a bot and I’m open source!