Bigger not always better in AI, boutique models are coming

Interview Companies don’t need to splash out millions of dollars to train AI, as software improvements and open source models drive down costs.

That’s the pitch from a venture selling its services to businesses that want to develop new AI products but don’t have the resources to create their own proprietary models from scratch. At the moment, they can access models through a wide range of APIs offered by machine learning startups, or choose off-the-shelf systems from cloud providers. Now there are alternatives too, like partnering with a vendor that can help them customize private or open source models.

The latter option is becoming more favorable over time as training and inference costs decrease, and companies want to keep their data private and secure, Naveen Rao, CEO and co-founder of MosaicML, explained to The Register. Rao was formerly VP and general manager of Intel’s AI Products Group, where he led efforts to develop an AI chip that was later dropped from the company’s product line.

Rao has taken his knowledge of machine learning and hardware to a new startup focused on helping enterprises train and run their own generative AI systems at low cost.

MosaicML recently released a series of open source large language models (LLMs) based on its MPT-7B architecture, which has seven billion parameters. It has a context window stretching to 64,000 tokens, meaning it can process text from hundreds of pages of documents in one go. Unlike many LLMs, such as Meta’s LLaMA, which can only be used for research purposes, MPT-7B supports commercial applications.

“There’s definitely a lot of pull for this kind of thing, and we did it for several reasons,” Rao told The Register. “One was we wanted to have a model out there that has permission for commercial use. We don’t want to stifle that kind of innovation.

“We also showed it as a demonstration of how much this costs. If a customer came to us and said ‘train this model,’ we can do it for $200,000 and we still make money on that. So I think what’s important here is that this is a real business number – it’s not on the order of tens of millions of dollars.”

MosaicML claims it has more powerful models than MPT-7B in-house, and can help businesses develop their own private models, hosted on various cloud platforms, or fine-tune open source ones. Customers’ data is not shared with the startup, and they own the model’s weights and its IP, Rao said.

“Commercial APIs are a great prototyping tool. I think with ChatGPT-type services people will use them for entertainment, and maybe some personal stuff, but not for companies. Data is a very important moat for companies. Companies want to protect their data, right? If anything, they want to do it more now that you can actually activate that data with large models. That wasn’t necessarily true five years ago, but now it really is. The value of that data actually went up,” Rao said.

Hardware failures and the GPU crunch

MosaicML has built software tools to train and run AI models more efficiently to keep costs low. Rao said low-level software improvements that optimize communication between GPUs allow the company to squeeze as much computing power as possible from the chips and make the training process run more smoothly.
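
As a rough illustration of the kind of knob involved – this is a generic PyTorch sketch, not MosaicML’s proprietary stack – data-parallel training frameworks overlap gradient communication with computation, and parameters such as the all-reduce bucket size control how that communication is batched:

```python
# Generic PyTorch data-parallel sketch showing communication/compute overlap.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")          # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(4096, 4096).cuda(rank)
    # bucket_cap_mb groups gradients into larger all-reduces, trading
    # per-message overhead against overlap with the backward pass.
    model = DDP(model, device_ids=[rank], bucket_cap_mb=100)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 4096, device=rank)
    for _ in range(10):
        opt.zero_grad()
        loss = model(x).sum()
        loss.backward()                      # gradient all-reduce overlaps with compute here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```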

“GPUs actually fail quite often,” he said. “If you’re training on, let’s say, 1,000 GPUs, and let’s say you’re paying $2 per hour per GPU, you’re burning $2,000 per hour. If a node goes down, and there’s a manual intervention required, it took you five hours [to fix], you’ve just burned $10,000 with no work, right? These are the magnitude of things. Automating that whole process from five hours of manual intervention to 15 minutes of automatic resumption saves you a ton of money.”
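
The arithmetic behind that example is simple; a quick sketch using the figures Rao quotes:

```python
# Back-of-envelope cost of idle GPUs during a failure, using Rao's figures.
GPUS = 1_000
PRICE_PER_GPU_HOUR = 2.00               # USD per GPU per hour, as quoted above

def wasted_dollars(downtime_hours: float) -> float:
    """Money burned while the whole cluster sits idle."""
    return GPUS * PRICE_PER_GPU_HOUR * downtime_hours

print(wasted_dollars(5.0))              # manual intervention: 10000.0
print(wasted_dollars(0.25))             # automatic resumption in 15 min: 500.0
```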

MosaicML, for example, trained MPT-7B in 9.5 days and suffered four hardware failures during the process. Training large language models is difficult and requires careful orchestration. The data has to be processed by a cluster of chips in sync, and the model’s weights are updated until its performance plateaus. Training runs crash unexpectedly, and developers often have to restart the process.

“Sometimes it just blows up. It almost looks like a node failure. You have to back everything up and sort of restart it. I think the packaging of memory along with the chip [is complex]. When there’s a lot of heat in the systems, different kinds of failures start to manifest. You get these slowdowns of throughput and then sometimes they just die,” Rao told us.
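
The standard defense is frequent checkpointing, so a crashed run resumes from its last saved state instead of starting over. A minimal, generic sketch of that pattern – the file name and training loop are illustrative, not MosaicML’s implementation:

```python
# Minimal checkpoint-and-resume pattern: a crash costs at most the work done
# since the last saved state, not the whole run.
import os
import pickle

CHECKPOINT = "run_state.pkl"    # illustrative path

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)           # resume where the last run died
    return {"step": 0, "weights": [0.0] * 8}

def save_state(state):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

state = load_state()
for step in range(state["step"], 1_000):
    # ... one synchronized training step across the cluster would go here ...
    state["weights"] = [w + 0.001 for w in state["weights"]]
    state["step"] = step + 1
    if state["step"] % 100 == 0:            # periodic checkpoint
        save_state(state)
```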

By avoiding these problems, companies can afford to build their own AI models. Rao said MosaicML allows them to get a taste of machine learning by working with a smaller model. OctoML, a startup focused on deploying models into production, has built its own custom multi-modal system based on MosaicML’s MPT-7B-Instruct language model.

The system, dubbed InkyMM, is also open source, and allows developers using OctoML’s platform to experiment and build applications quickly. Companies can use these tools to find a market fit for their product without forking out a massive upfront investment that might not be worth it in the end, OctoML co-founder and CEO Luis Ceze told The Register.

There are also costs to keeping AI up and running that businesses must consider. “The economics have to be favorable. It really comes down to how optimized you can make it. Every time you type into ChatGPT, it’s doing an inference call and it’s spitting out words. Each one of those is basically running on a web server of eight GPUs. That server costs $150,000, approximately, to build. So there’s a real hard cost to this,” Rao said.

“Really optimizing that stack, hacking multiple requests together and utilizing the hardware effectively is the name of the game. It’s much more about making it super efficient so that you’re not unnecessarily wasting GPU time on a smaller scale per request.”
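
A toy sketch of the request-batching idea: hold incoming prompts for a short window, run them through the model as a single batch, and amortize the fixed per-call GPU cost. The model here is a stand-in, not a real LLM server:

```python
# Toy dynamic-batching server: collect requests briefly, answer them in one
# batched "forward pass" (stubbed out here) to amortize per-call GPU cost.
import queue
import threading
import time

request_queue = queue.Queue()   # items are (prompt, reply_queue) pairs
MAX_BATCH = 8
MAX_WAIT_S = 0.05

def fake_model(prompts):
    # Stand-in for one batched forward pass on the eight-GPU server.
    return [p.upper() for p in prompts]

def batcher():
    while True:
        batch = [request_queue.get()]               # block for the first request
        deadline = time.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and time.time() < deadline:
            try:
                batch.append(request_queue.get(timeout=max(0.0, deadline - time.time())))
            except queue.Empty:
                break
        prompts, reply_queues = zip(*batch)
        for out, rq in zip(fake_model(list(prompts)), reply_queues):
            rq.put(out)

threading.Thread(target=batcher, daemon=True).start()

def submit(prompt: str) -> str:
    rq = queue.Queue()
    request_queue.put((prompt, rq))
    return rq.get()

print(submit("hello world"))
```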

There’s another benefit for companies training or fine-tuning custom models on their own private data: they can control what these systems ingest and shape their behavior. Fine-tuning a pre-trained system on smaller, specialized datasets, combined with careful prompting, can boost its accuracy and make it less prone to generating false facts.
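
As a rough sketch of what that customization can look like – using the Hugging Face Transformers library rather than MosaicML’s own tooling – here is a minimal fine-tuning loop over a small in-house text corpus. The data file path is a hypothetical placeholder, and a real run at 7B scale would also need multiple GPUs or parameter-efficient methods:

```python
# Minimal sketch: fine-tune an open source causal LM on a company's own text.
# "company_docs.txt" is a hypothetical placeholder for private domain data.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token   # some causal-LM tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

dataset = load_dataset("text", data_files={"train": "company_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mpt-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           learning_rate=2e-5),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```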

“This idea of ultimate customization is something that we can now enable because the costs have come down to a point where you can actually make this true,” Rao said.

MosaicML has since launched its larger MPT-30B open source model with 30 billion parameters. The company says it’s the first publicly known large language model trained using Nvidia’s H100 GPUs. It reportedly cost $570,000 to train MPT-30B using 512 of Nvidia’s latest chips over 11.6 days. Training MPT-30B to the same precision with Nvidia’s previous-generation A100 chips costs $700,000 and takes 28.3 days.
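
A quick back-of-envelope check on those numbers – assuming the A100 run also used 512 GPUs – gives the implied price per GPU-hour for each chip generation:

```python
# Implied GPU-hour rates from MosaicML's published MPT-30B training figures.
def gpu_hour_rate(total_cost_usd: float, gpus: int, days: float):
    gpu_hours = gpus * days * 24
    return gpu_hours, total_cost_usd / gpu_hours

print(gpu_hour_rate(570_000, 512, 11.6))   # H100: ~142,541 GPU-hours, ~$4.00/hour
print(gpu_hour_rate(700_000, 512, 28.3))   # A100 (512 GPUs assumed): ~347,750 GPU-hours, ~$2.01/hour
```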

Although startups like MosaicML are helping to drive down training and inference costs, there is another problem they can’t directly solve: chip shortages. Right now, it’s difficult for developers to secure the compute needed to build or run their models. They have to request time upfront to rent resources, and can end up waiting months. “We’re going to live in this world of GPU shortages for at least two years, maybe five. That’s kind of my estimate,” Rao said. ®
