The latest MLPerf results for the Intel Gaudi 2 accelerator and 5th Gen Intel Xeon showcase how Intel is pushing the boundaries of generative AI performance within its product lineup and alongside its ecosystem partners.
Today, MLCommons released results from the industry-standard MLPerf v4.0 benchmark for inference. Intel’s performance with Intel® Gaudi® 2 accelerators and 5th Gen Intel® Xeon® Scalable processors featuring Intel® Advanced Matrix Extensions (Intel® AMX) underscores the company’s dedication to enabling “AI Everywhere” through a diverse range of competitive solutions. The Intel Gaudi 2 AI accelerator stands out as a strong generative AI (GenAI) performer alongside Nvidia H100, offering impressive performance-per-dollar metrics. Additionally, Intel is the sole server CPU vendor to submit MLPerf results, with the 5th Gen Xeon delivering a 1.42x geomean improvement over the 4th Gen Intel® Xeon® processor results submitted in MLPerf Inference v3.1.
“We are continuously enhancing AI performance across our range of accelerators and CPUs on industry-standard benchmarks. These results highlight our commitment to providing AI solutions that meet the diverse and evolving needs of our customers. Both Intel Gaudi and Xeon products offer customers deployable options with significant price-to-performance advantages.”
–Zane Ball, Intel corporate vice president and general manager, DCAI Product Management
Why It Matters: Intel’s MLPerf results build on previous rounds, offering customers a standardized benchmark to evaluate AI performance.
About the Intel Gaudi 2 Results: The Intel® Gaudi® software suite continues to expand model coverage for popular large language models (LLMs) and multimodal models. In MLPerf Inference v4.0, Intel submitted Gaudi 2 accelerator results for cutting-edge models like Stable Diffusion XL and Llama v2-70B.
Driven by high demand for Hugging Face Text Generation Inference (TGI), Gaudi’s Llama results utilized the TGI toolkit, which supports continuous batching and tensor parallelism to enhance real-world LLM scaling efficiency. Notably, Gaudi 2 achieved 8,035.0 tokens-per-second in the offline scenario and 6,287.5 tokens-per-second in the server scenario on Llama v2-70B. For Stable Diffusion XL, Gaudi 2 achieved 6.26 samples-per-second offline and 6.25 queries-per-second in the server scenario. These results reinforce Intel Gaudi 2’s competitive price/performance ratio, a key factor in total cost of ownership (TCO) considerations.
About the Intel 5th Gen Xeon Results: Following hardware and software enhancements, Intel’s 5th Gen Xeon results showed a geomean improvement of 1.42x compared to 4th Gen Intel Xeon processors in MLPerf Inference v3.1. For instance, the GPT-J submission on 5th Gen Xeon, with software optimizations like continuous batching, demonstrated a 1.8x performance increase compared to v3.1. Similarly, DLRMv2 exhibited about 1.8x performance gains while meeting the 99.9% accuracy target, thanks to optimizations like MergedEmbeddingBag utilizing Intel AMX.
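The 1.42x figure is a geometric mean across benchmarks, which is how MLPerf-style results are typically summarized so that no single benchmark dominates the average. A minimal sketch, using made-up per-benchmark speedups (not actual MLPerf data):

```python
import math


def geomean(ratios):
    """Geometric mean: the n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical per-benchmark gen-over-gen speedups, for illustration only:
speedups = [1.8, 1.8, 1.1, 1.2]
print(round(geomean(speedups), 2))
```

Note that the geomean of ratios is insensitive to which generation is the baseline: inverting every ratio simply inverts the result, which is why it is preferred over the arithmetic mean for normalized benchmark scores.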
Intel is proud of its collaboration with OEM partners – Cisco, Dell, Quanta, Supermicro, and Wiwynn – who have submitted their own MLPerf results. Intel has also submitted MLPerf results for four generations of Xeon products since 2020, with Xeon serving as the host CPU for various accelerator submissions.
How to Try AI Solutions on Intel Developer Cloud: 5th Gen Xeon processors and Intel Gaudi 2 accelerators are available for evaluation on the Intel® Developer Cloud. This platform enables users to run both small- and large-scale training (LLM or GenAI) and inference production workloads, manage AI compute resources, and more.