
Mark Zuckerberg flexes that Meta’s cluster of Nvidia H100 chips is bigger than the competition’s

Zuckerberg said Meta’s Llama 4 AI models are training on a GPU cluster “bigger than anything that I’ve seen reported for what others are doing.”

  • Mark Zuckerberg says Meta’s Llama 4 AI models are training on the biggest GPU cluster in the industry.
  • During Meta’s earnings call, he said the cluster is “bigger than 100,000 H100s.”
  • Elon Musk has said xAI is using 100,000 of Nvidia’s H100 GPUs to train chatbot Grok.

Elon Musk has talked up his AI startup’s huge inventory of in-demand Nvidia chips. Now, it’s Mark Zuckerberg’s turn to flex.

A lot of computing power is going into training Meta’s forthcoming Llama 4 AI models, more than anything its rivals have publicly reported, according to Zuckerberg.

During Meta’s third-quarter earnings call on Wednesday, Zuckerberg said Llama 4 is “well into its development” and is being trained on a GPU cluster bigger than that of any of its rivals.

“We’re training the Llama 4 models on a cluster that is bigger than 100,000 H100s or bigger than anything that I’ve seen reported for what others are doing.”

That 100,000 figure may be a reference to Musk’s AI startup, xAI, which launched its “Colossus” supercomputer this summer. The Tesla CEO has called it the biggest supercomputer in the world and said xAI is using 100,000 of Nvidia’s H100 graphics processing units, or GPUs, to train its chatbot Grok.

Nvidia’s H100, a chip built on the company’s Hopper architecture, is highly sought after by tech giants and AI startups for training large language models. It costs an estimated $30,000 to $40,000 per chip.
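Those prices hint at the scale of spending involved: as a rough back-of-envelope estimate, 100,000 H100s at $30,000 to $40,000 apiece works out to roughly $3 billion to $4 billion on chips alone, before networking, power, and data-center costs.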

The number of H100s a company has amassed has even become a factor in recruiting top AI talent. Perplexity CEO Aravind Srinivas said in a podcast interview that the topic came up when he tried to poach a researcher from Meta.

“I tried to hire a very senior researcher from Meta, and you know what they said? ‘Come back to me when you have 10,000 H100 GPUs,’” Srinivas said in March.

Meta released its Llama 3 models in April and an updated Llama 3.1 in July. Zuckerberg added on Wednesday’s call that the Llama 4 models will have “new modalities, capabilities, stronger reasoning” and be “much faster.” The smaller Llama 4 models will likely be ready to launch first, in early 2025, he said.

Asked about Meta’s big spending on AI, Zuckerberg said the company was building out its AI infrastructure faster than expected and that he’s “happy the team is executing well on that,” even if it means higher costs, which is “maybe not what investors want to hear.”

Meta expects its capital expenditures to continue to grow into next year as it scales up its AI infrastructure.

The Meta CEO didn’t say exactly how big the company’s H100 cluster is. Meanwhile, Musk posted on X earlier this week that xAI plans to double its cluster to 200,000 H100 and H200 chips in the coming months.


https://www.businessinsider.com/mark-zuckerberg-meta-nvidia-h100-chip-cluster-llama-4-2024-10