Nvidia makes the case for leaner, smarter language models in AI

August 11, 2025

Researchers at Nvidia are challenging the norm of relying on huge language models. They suggest that smaller language models (SLMs), those with fewer than 10 billion parameters, deliver impressive performance while saving both money and energy.

The numbers speak for themselves: while the market for large language model APIs reached $5.6 billion in 2024, spending on the cloud infrastructure that serves those models soared to $57 billion. In contrast, serving a 7-billion-parameter model can be 10 to 30 times cheaper than serving its far larger counterparts.

Nvidia highlights examples such as Microsoft's Phi-2 model, which matches the reasoning and coding abilities of 30-billion-parameter models while running 15 times faster. Nvidia's own Nemotron-H family, with models of up to 9 billion parameters, delivers accuracy comparable to far larger models at a fraction of the computational cost.

The case for SLMs becomes even stronger when you consider that many AI tasks are repetitive and narrowly defined. Nvidia recommends a step-by-step migration: collect usage data from the existing agent, cluster it into recurring tasks, select a suitable SLM for each task, fine-tune it, and keep refining the loop over time rather than defaulting to overly large models. A rough sketch of the early steps follows.
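To make that pipeline concrete, here is a minimal Python sketch of the data-collection and task-clustering steps. The example queries, the cluster count, and the "SLM candidate" rule are illustrative assumptions for this sketch, not details taken from Nvidia's paper.

```python
# Minimal sketch of the clustering step in an LLM-to-SLM migration pipeline.
# The query log, cluster count, and selection rule below are illustrative
# assumptions, not values from Nvidia's research.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# 1. Data collection: queries logged from an existing LLM-backed agent (toy examples).
logged_queries = [
    "Summarize this support ticket",
    "Summarize the attached bug report",
    "Extract the invoice total from this email",
    "Extract the due date from this invoice",
    "Write a SQL query counting daily signups",
    "Write a SQL query for weekly revenue",
]

# 2. Task clustering: group queries so recurring, narrow tasks become visible.
vectors = TfidfVectorizer().fit_transform(logged_queries)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

clusters = {}
for query, label in zip(logged_queries, labels):
    clusters.setdefault(int(label), []).append(query)

# 3. Candidate selection: large, homogeneous clusters are the first candidates
#    to hand off to a fine-tuned SLM; everything else stays on the larger model.
for label, queries in clusters.items():
    target = "SLM candidate" if len(queries) >= 2 else "keep on LLM"
    print(f"cluster {label} ({target}):")
    for q in queries:
        print(f"  - {q}")
```

In practice the log would come from real agent traces, and each candidate cluster would be paired with a fine-tuned SLM, with the larger model kept as a fallback for the remaining queries.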

If you’ve ever wrestled with the steep costs and energy use of advanced AI infrastructure, this shift might feel like a breath of fresh air. Nvidia’s research suggests that between 40% and 70% of queries in open-source agents could be managed by SLMs, offering a more sustainable and cost-effective solution.

Even as Nvidia remains a key player in the large language model arena, its push for SLMs signals a commitment to long-term efficiency and sustainability. The researchers invite feedback from the community and plan to publish selected responses online.
