Cutting Through The Hype On AI Servers
AI has been studied for decades, and generative AI was used in chatbots as early as the 1960s. However, the November 30, 2022, release of the ChatGPT chatbot and virtual assistant took the IT world by storm, making GenAI a household term and setting off a stampede to develop AI-related hardware and software.
One area where the broader AI and GenAI push is gathering strength is AI servers. Analyst firm IDC defines AI servers as servers that run software platforms dedicated to AI application development, applications aimed primarily at executing AI models, and/or traditional applications that include some AI functionality.
IDC in May estimated that AI servers accounted for about 23 percent of the total server market in 2023, a share it expects to keep growing. The firm also forecasts that AI server revenue will reach $49.1 billion by 2027, on the assumption that revenue from GPU-accelerated servers will grow faster than revenue from servers using other accelerators.
The difference between AI servers and general-purpose servers is not always so clear, according to vendors and sellers.
What many people mean by AI servers, especially amid the GenAI boom, are GPU-rich systems, particularly those designed for training and fine-tuning models, said Robert Daigle, director of Lenovo’s global AI business.
“[But] there’s also a lot of general-purpose servers that are used for AI workloads,” Daigle told CRN. “And as you get out of generative AI, and even out of deep learning and into traditional machine learning, a lot of the machine learning workloads still run on the CPU.”
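Daigle’s distinction is easy to picture in code. The sketch below is purely illustrative; the article names no specific frameworks, so scikit-learn and PyTorch are assumed here as common stand-ins. A classical machine-learning model trains entirely on the CPU, while a deep-learning model is moved to a GPU only if the server actually has one.

```python
# Illustrative sketch (assumed frameworks: scikit-learn and PyTorch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import torch

# Traditional machine learning: scikit-learn runs entirely on the CPU.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
cpu_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("Random forest trained on CPU, training accuracy:", cpu_model.score(X, y))

# Deep learning: PyTorch places the model on a GPU when one is available
# and falls back to the same CPU when it is not.
device = "cuda" if torch.cuda.is_available() else "cpu"
net = torch.nn.Sequential(
    torch.nn.Linear(20, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
).to(device)
print("Neural network placed on:", device)
```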
Dominic Daninger, vice president of engineering at Nor-Tech, a Burnsville, Minn.-based custom system builder and premier-level Nvidia channel partner that both builds AI servers and sells other manufacturers’ models, told CRN that there are basically two types of AI servers: those aimed at training and, once training is done, those aimed at inferencing.
AI servers don’t necessarily require GPUs to run, but GPUs deliver much better performance on AI workloads than CPUs do, Daninger said.
At the same time, he said, it is important to note that not every server with GPUs is AI-focused. Workloads such as simulation models or fluid dynamics run on GPUs without any AI involved.
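Daninger’s training-versus-inference split maps to two very different jobs run against the same model. The following is a minimal, hypothetical PyTorch sketch (the framework is an assumption, not something Nor-Tech specified): the training loop does repeated forward and backward passes, which is what drives demand for GPU-dense servers, while inference runs the frozen model with gradients disabled and can be served from far lighter, even CPU-only, hardware.

```python
# Hypothetical sketch of the two AI server roles: training vs. inference.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

# Training: repeated forward/backward passes, the expensive phase that
# benefits most from GPU-dense servers.
for _ in range(100):
    x = torch.randn(64, 10, device=device)
    y = torch.randn(64, 1, device=device)
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

# Inference: once training is done, the frozen model is served with
# gradients disabled, often on much lighter hardware.
model.eval().to("cpu")
with torch.no_grad():
    prediction = model(torch.randn(1, 10))
print(prediction)
```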
AI Servers Or Not?
The line between AI servers and non-AI servers can be tricky and depends on the workload, said Michael McNerney, senior vice president at San Jose, Calif.-based Supermicro.
“I think we have eight different major segments, everywhere from LLM large-scale training all the way down to edge inference servers, which are going to be pole-mounted or wall-mounted boxes on a factory floor,” McNerney told CRN. “We really see AI almost become sort of a feature of the systems, especially as you get down to the edge, where those boxes get used for different things based on their configurations. Every server can become an AI server at some point depending on the kind of workload it’s running.”
AI is the dominant workload on GPU-based servers, particularly those with the highest-end configurations, which are typically used for LLMs or large-scale inference, while midrange rackmount configurations handle the majority of inference workloads, McNerney said.
Lenovo has about 80 server platforms certified as AI-ready for both GenAI and the broad spectrum of AI, Daigle said.
“We’ve done things like increase our GPU and accelerator support across those product lines and run benchmarks on them such as MLPerf so customers can see the performance of those systems and how we’ve improved performance to empower AI workloads,” he said. “And then there’s the software stack that we enable to run on those. We have over 60 AI companies in our independent software vendor ecosystem. That allows us to enable over 165 enterprise-grade AI solutions.”
Going forward, there will continue to be a delineation between AI servers and general-purpose servers, Daigle said.
“There’s still a lot of traditional workloads that customers have to support in their IT environment, in addition to adding AI-enabled infrastructure,” he said. “So I think we’ll continue to see systems designed for those traditional IT workloads in addition to the expansion into AI.”
Looking ahead, Daninger said he expects Intel and AMD will invest in AI-focused technology, but will find it hard to catch up with Nvidia.
“One of the things we’ve learned is that Nvidia has put so much work into CUDA and the various libraries needed to really implement AI,” he said. “Plus, Nvidia has made huge gains in the hardware end of things. Companies like Intel or AMD will have to move fast to beat Nvidia on the hardware end, but another holdback is that it will take many years to develop all the code to utilize these things. Nvidia has a long lead on that.”
McNerney said that with large AI workloads, clusters of AI servers are important, which will lead to increased use of liquid cooling.
“We think we’re going to go from less than 1 percent of deployments using liquid cooling to up to 30 percent in that large scale cluster space just because of the efficiency, the performance, and the cost savings,” he said.