Nvidia CEO Jensen Huang speaks during a press conference at The MGM during CES 2018 in Las Vegas on January 7, 2018.
Mandel Ngan | AFP | Getty Images
Software that can write snippets of text or draw pictures that look like they were created by a human has sparked a gold rush in the tech industry.
Companies like Microsoft and Google are scrambling to integrate the latest artificial intelligence into their search engines, as billion-dollar rivals such as OpenAI and Stability AI race ahead and release their software to the public.
Many of these applications are powered by a roughly $10,000 chip that has become one of the most important tools in the AI industry: the Nvidia A100.
For now, the A100 has become the “workhorse” for AI professionals, said Nathan Benaich, an investor who publishes a newsletter and reports covering the AI industry, including a partial list of supercomputers using the A100. According to New Street Research, Nvidia has 95% of the market for GPUs that can be used for machine learning.
The A100 is ideal for the kind of machine learning models that power tools like ChatGPT, Bing AI or Stable Diffusion. It's able to perform many simple calculations simultaneously, which is important for training and using neural network models.
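To make that concrete, a neural network layer is essentially one large matrix multiplication, which breaks down into a huge number of independent multiply-add operations. Here is a rough sketch in Python using NumPy (an illustration of the math involved, not Nvidia's software):

```python
import numpy as np

# A toy neural-network layer: 1,000 outputs computed from 1,000 inputs.
# That is roughly a million independent multiply-add operations --
# exactly the kind of simple, parallel work a GPU spreads across
# thousands of cores at once.
inputs = np.random.rand(1000)
weights = np.random.rand(1000, 1000)

# One matrix-vector product = many simple calculations done together.
outputs = weights @ inputs
print(outputs.shape)  # (1000,)
```

A CPU works through those multiply-adds with a handful of cores; a GPU like the A100 runs thousands of them in parallel, which is why the same math finishes much faster on a GPU.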
The technology underlying the A100 was initially used to render sophisticated 3D graphics in games. That kind of chip is called a graphics processing unit, or GPU, but these days Nvidia's A100 is configured and targeted at machine learning tasks and runs in data centers, not inside blazing-fast gaming PCs.
Large companies or startups working on software such as chatbots and image generators need hundreds or thousands of Nvidia's chips, and either purchase them outright or secure access to the computers from a cloud provider.
Hundreds of GPUs are necessary for training artificial intelligence models such as large language models. Chips must be powerful enough to quickly process terabytes of data and recognize patterns. After that, GPUs like the A100 are also needed for “inference,” or using the model to generate text, make predictions, or identify objects in photos.
That means AI companies need access to a lot of A100s. Some entrepreneurs in the space even view the number of A100s they have access to as a sign of progress.
“A year ago we had 32 A100s,” Stability AI CEO Emad Mostaque wrote on Twitter in January. “Dream big and pack the GPUs, kids. Brrr.” Stability AI is the company that helped develop Stable Diffusion, an image generator that gained attention last fall and is reportedly valued at more than $1 billion.
Stability AI now has access to more than 5,400 A100 GPUs, according to one estimate from the State of AI report, which charts and tracks which companies and universities have the largest collections of A100 GPUs, although the count doesn't include cloud providers, which don't publish their numbers publicly.
Nvidia is riding the AI train
Nvidia stands to benefit from the AI hype cycle. In its fiscal fourth-quarter earnings report on Wednesday, overall sales fell 21%, yet investors pushed the stock up about 14% on Thursday, mainly because the company's artificial intelligence chip business, reported as data center, rose 11% to more than $3.6 billion in sales for the quarter, showing continued growth.
Nvidia shares are up 65% in 2023, outperforming the S&P 500 and other semiconductor stocks.
Nvidia CEO Jensen Huang couldn't stop talking about AI on a call with analysts on Wednesday, suggesting that the recent boom in artificial intelligence is at the center of the company's strategy.
“The activity around the AI infrastructure that we've built and the activity around inferencing using Hopper and Ampere to inference large language models has just gone through the roof in the last 60 days,” Huang said. “There's no question that whatever our views are of this year as we enter the year has been fairly dramatically changed as a result of the last 60, 90 days.”
Ampere is Nvidia’s codename for the A100 generation of chips. Hopper is the code name for the new generation, including the H100, which has recently started shipping.
We need more computers
Nvidia A100 processor
Compared with other kinds of software, such as serving a web page, which uses processing power occasionally in bursts of microseconds, machine learning tasks can consume all of a computer's processing power, sometimes for hours or days at a time.
This means that companies that find themselves with a popular AI product often need to purchase more GPUs to handle peak periods or improve their models.
These GPUs don’t come cheap. In addition to a single A100 on a card that can be inserted into an existing server, many data centers use a system that includes eight A100 GPUs working together.
That system, Nvidia's DGX A100, has a suggested price of nearly $200,000, although it comes with the chips needed. On Wednesday, Nvidia said it would sell cloud access to DGX systems directly, which is likely to lower the cost of entry for tinkerers and researchers.
It's easy to see how the cost of A100s can add up.
For example, a New Street Research estimate found that the OpenAI-based ChatGPT model inside Bing's search could require eight GPUs to deliver an answer to a question in less than one second.
At that rate, Microsoft would need more than 20,000 8-GPU servers just to deploy the model in Bing to everyone, suggesting the feature could cost Microsoft $4 billion in infrastructure spending.
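The $4 billion figure follows directly from the numbers in the article. A back-of-the-envelope check, using the DGX A100's roughly $200,000 list price as the per-server cost (an assumption; New Street Research's exact inputs aren't public):

```python
# Rough sanity check of the New Street Research estimate.
# Inputs are taken from the article, not from the report itself.
servers_needed = 20_000        # 8-GPU servers to serve Bing at full scale
cost_per_server = 200_000      # approximate DGX A100 list price, in dollars

total_cost = servers_needed * cost_per_server
print(f"${total_cost:,}")  # $4,000,000,000
```

The Google figure quoted below scales the same way: roughly 20 times Bing's query volume implies roughly 20 times the hardware bill, or about $80 billion.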
“If you're at Microsoft and you want to scale that at the scale of Bing, that's maybe $4 billion. If you want to scale at the scale of Google, which serves 8 or 9 billion queries every day, you actually need to spend $80 billion on DGXs,” said Antoine Chkaiban, a technology analyst at New Street Research. “The numbers we came up with are huge. But they're simply a reflection of the fact that every single user moving to such a large language model requires a massive supercomputer while they're using it.”
The latest version of Stable Diffusion, an image generator, was trained on 256 A100 GPUs, or 32 machines with 8 A100s each, for a total of 200,000 computing hours, according to information published by Stability AI.
At market prices, it cost $600,000 to train the model alone, Stability AI CEO Mostaque said on Twitter, suggesting in a tweet exchange that the price was unusually inexpensive compared with rivals. That doesn't count the cost of “inference,” or deploying the model.
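Those two figures, 200,000 GPU-hours and $600,000, imply a market rate of about $3 per A100-hour, and the 256-GPU cluster size implies roughly a month of wall-clock training time. A quick derivation from the article's numbers (these rates are inferred, not published by Stability AI):

```python
# Figures as reported: total compute and total training cost.
gpu_hours = 200_000
training_cost_usd = 600_000

# Implied market rate per A100-hour (derived, not an official price).
rate_per_gpu_hour = training_cost_usd / gpu_hours
print(rate_per_gpu_hour)  # 3.0

# With 256 GPUs running in parallel, the wall-clock training time:
gpus = 256
wall_clock_days = gpu_hours / gpus / 24
print(round(wall_clock_days, 1))  # 32.6
```

In other words, the training run amounts to renting 256 A100s around the clock for about a month at roughly $3 per GPU per hour.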
Huang, Nvidia's CEO, told CNBC's Katie Tarasov that the company's products are actually inexpensive for the amount of computation these models need.
“We took what would otherwise be a $1 billion data center with processors and scaled it down to a $100 million data center,” Huang said. “Now $100 million, if you put it in the cloud and share it with 100 companies, it’s next to nothing.”
Huang said Nvidia’s GPUs allow startups to train models at a much lower cost than if they used traditional computer processors.
“Now you could build something like a large language model like GPT for about $10 million to $20 million,” Huang said. “It’s really, really affordable.”
Nvidia isn't the only company making GPUs for artificial intelligence. AMD and Intel have competing graphics processors, and big cloud companies such as Google and Amazon are developing and deploying their own chips specially designed for AI workloads.
Still, “AI hardware remains strongly consolidated to NVIDIA,” according to the State of AI compute report. As of December, more than 21,000 open-source AI papers said they used Nvidia chips.
Most researchers counted in the State of AI Compute Index used the V100, an Nvidia chip that came out in 2017, but the A100 grew fast in 2022 to become the third-most-used Nvidia chip, just behind consumer graphics chips that cost $1,500 or less and were originally intended for gaming.
The A100 also has the distinction of being one of the few chips to have export controls due to national defense considerations. Last fall, Nvidia said in an SEC filing that the U.S. government imposed licensing requirements that barred exports of the A100 and H100 to China, Hong Kong and Russia.
“The US government noted that the new license requirement will address the risk that covered products may be used or diverted for a ‘military end use’ or ‘military end user’ in China and Russia,” Nvidia said in a statement. Nvidia previously said it had adapted some of its chips for the Chinese market to comply with US export restrictions.
The biggest competition for the A100 may be its successor. The A100 was first introduced in 2020, an eternity ago in chip cycles. The H100, introduced in 2022, is starting to be produced in volume. In fact, Nvidia said Wednesday that it recorded more revenue from its H100 chips in the quarter ending in January than from the A100, even though the H100 is more expensive per unit.
Nvidia says the H100 is the first of its data center GPUs to be optimized for transformers, an increasingly important technique that many of the latest and top AI applications use. Nvidia said Wednesday that it wants to make AI training over 1 million percent faster. That could mean that, eventually, AI companies wouldn't need as many Nvidia chips.