As companies dive headlong into AI model building, they’re spending massive amounts of money on a very new, very complex technology that—at least so far—presents as many questions as it answers. Trying to find the right model among the many currently available can be challenging.
“What we’re seeing is that no one model is sufficient to solve or address all use cases,” said Ankur Mehrotra, GM of SageMaker at AWS. “Customers are finding that one model may be better at building a particular kind of user experience—let’s say a chat-based application—while another generative AI model may be better at assisting with coding or software development.”
As the executive in charge of a platform that many companies use to build their generative AI models, Mehrotra understands the AI sector as well as anyone. Watch my extended interview with him to hear his thoughts about how companies are strategizing to build better AI models, along with a range of other AI-related topics.
Watch the full interview or jump to select interview highlights below.
AI Models Require Major Compute and Major Support
The long list of companies that have used AWS SageMaker to train and deploy their generative AI models includes AI pioneers like Perplexity AI, Hugging Face, and AI21 Labs. These companies come to AWS (and other top cloud providers) because they need massive compute power to train their AI models. On the AWS platform, an AI model training task gets distributed across a large number of compute instances, which are powered by Nvidia GPUs or AWS’s own silicon, Trainium.
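To make the idea of distributing a training task concrete, here is a toy sketch of data-parallel training: each "worker" computes a gradient on its own shard of the data, and the gradients are averaged before the model update. This is illustrative only, with made-up data and a one-parameter model, and is not SageMaker's actual implementation.

```python
# Toy data-parallel training sketch: the batch is split across workers
# (stand-ins for GPU-backed compute instances), each computes a gradient
# on its shard, and the gradients are averaged for a single update.

def gradient(weight, shard):
    """Gradient of mean squared error for a 1-D model y = w * x."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def distributed_step(weight, data, num_workers, lr=0.01):
    """Split the data across workers, average their gradients, update."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    grads = [gradient(weight, s) for s in shards if s]
    avg_grad = sum(grads) / len(grads)
    return weight - lr * avg_grad

# Fit y = 3x; every step averages the gradients from four workers.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w = distributed_step(w, data, num_workers=4)
print(round(w, 2))  # converges toward 3.0
```

Real distributed training adds the hard parts this sketch omits, such as inter-node communication and fault tolerance, which is one reason teams lean on managed platforms rather than building the plumbing themselves.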
As the model building process has evolved, more professionals are now required to create advanced models. “A few years ago, AI was mostly a data scientist activity, but over the years the number of personas involved in building AI-based solutions has really increased,” Mehrotra said. “We now have machine learning engineers getting involved; they become responsible for taking these models and deploying them into production. Then other business stakeholders get involved to help convert a business problem into an ML problem, and data engineers get involved to help prepare the data.”
This evolution has prompted SageMaker to continually evolve its toolset. “We’re really focused on ‘working backwards’ from our customers – understanding the need and building the right tool for the right job and the right persona.”
Shifting From Models to Model Systems
There’s a major trend developing in the world of artificial intelligence model building: even as many generative AI models are getting larger and more powerful, there are also plenty of smaller, highly focused models being created. Companies are thinking less about a one-size-fits-all model and more about niche business scenarios.
“When I talk to customers, what I hear is that they now foresee having to use multiple models,” Mehrotra said. “Some may be task-specific and others are more generalized, working together to achieve their goals. And the ability to do that quickly and safely and securely is very important to them.”
In essence, the development of AI models is turning into the process of building AI model systems.
“For example, one of our customers is deploying a set of different models where one model is responsible for redacting PII from text, then another model is taking that text and summarizing it,” he said. “So we are going to see that customers will think of these systems as model systems and use a combination of different models that are deployed together. We are also seeing trends where customers want the data to be co-located with these models [to help] create these model systems they’re using in production.”
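The pipeline Mehrotra describes—one model redacting PII, another summarizing the result—can be sketched as a simple composition of stages. Both stages below are deliberately crude stand-ins (a regex scrubber and a first-sentence summarizer), not real generative models; the point is the model-system pattern of chaining specialized components.

```python
import re

def redact_pii(text):
    """Stage 1: mask email addresses and phone numbers (toy rules)."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b", "[PHONE]", text)
    return text

def summarize(text):
    """Stage 2: keep only the first sentence as a crude summary."""
    return text.split(". ")[0].rstrip(".") + "."

def model_system(text, stages):
    """Chain models so each stage consumes the previous stage's output."""
    for stage in stages:
        text = stage(text)
    return text

note = ("Contact Jane at jane@example.com or 555-123-4567. "
        "She approved the Q3 budget yesterday.")
print(model_system(note, [redact_pii, summarize]))
# -> Contact Jane at [EMAIL] or [PHONE].
```

In production, each stage would be a separately deployed model behind an endpoint, but the ordering matters in the same way: redaction runs first so that downstream models never see the raw PII.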
(These comments have been edited for length and clarity.)
For more information about generative AI providers, read our in-depth guide: Generative AI Companies: Top 20 Leaders