This post is part III of a series. See part I and part II.
Have you ever embarked on an AI project, confident in its potential, only to be blindsided by unexpected costs or performance issues? Picture this: you've painstakingly crafted an AI proof-of-concept, only to discover that running it would cost more than the revenue it could ever generate. Or perhaps you've built a meticulously designed, complex system, only to find that it responds so slowly that users give up on it. These scenarios are all too familiar to those developing generative AI systems.

Let's start by examining a crucial piece of advice you'll likely hear when discussing GenAI difficulties: "Just pick a different model." On the face of it, this sounds like a quick fix, and it's tempting to try - there are a lot of models available now, from APIs hosting large proprietary models to ultra-lightweight open-source LLMs you could run on a laptop. In fact, at the time of publication, the number of realistically deployable LLMs runs at least into the high hundreds. But choosing a model well requires a fairly nuanced understanding of your project's specific needs and of the landscape of available models.
Before discussing “alternate” models, let's address the elephant in the room: the major players in the AI field. These behemoths form the foundation of model selection, even as the list of candidates keeps growing. The key is to be able to compare each model you are considering against a well-understood benchmark model in the context of your project, and the large, broadly used contenders (the OpenAI GPT series, Claude, Gemini, etc.) can provide that benchmark for you.
As discussed in the earlier articles in this series, the reason for using an actual model as a practical benchmark, rather than published performance stats, is that the success of your LLM work isn't just a matter of answering a general class of questions. It hinges on factors like cost, latency, and reliability, as well as the variety and type of inputs. Your needs go beyond the theoretical capabilities of a model and demand practical alignment with the requirements of your project. That's a lot of variables for a set of published scores to encapsulate.
For example, if your goal is to sift through a JSON list, you don't really need a model fine-tuned for code generation. It may well perform worse for that use. Similarly, if you're crafting a “friendly” chatbot to enhance user interaction with your brand, extensive access to a broad knowledge base might not be your primary concern.
So, how do you go about selecting the right model? Let's break it down into three broad steps:
Start by familiarizing yourself with a "big player" model. Consider this your baseline, your go-to for everyday tasks and technical questions. Chances are you already use one of these models regularly, and that familiarity lays the groundwork for everything that follows.
Next, explore with your team the open-source LLM runner packages (Llama.cpp, Oobabooga’s text-gen WebUI, and Ollama are some current favorites). Setting one of these up on a local machine is straightforward nowadays, and it lets you experiment with a wide range of models effortlessly. This hands-on approach turns a spare desktop into a versatile testing ground and simplifies the model selection process.
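If you go the Ollama route, for example, poking at a locally pulled model from Python can look roughly like the minimal sketch below. The model name and prompt are placeholders, and it assumes Ollama is serving on its default local port:

```python
# Minimal sketch: send one prompt to a locally running Ollama instance.
# Assumes Ollama is serving on its default port and the model has already been
# pulled (e.g. `ollama pull llama3`). Model name and prompt are placeholders.
import requests

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask_local_model("Summarize why local LLM testing is useful, in one sentence."))
```

Swapping in a different model is just a matter of pulling it and changing the `model` string, which is exactly what makes this setup such a convenient testing ground.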
Armed with a robust baseline model you use for diverse applications and a local tool for trying out most new models as they come out, you are ready for the fun part: empirical experimentation. Continuously explore different models alongside your baseline, comparing their performance head-to-head. Don’t spend a lot of time on this - as part of your normal prompt engineering, send a prompt you’re already working on to a few of the newer small models. The results are often surprising - a particular small model may be unexpectedly great at a particular subtask you have, while another may be aggravatingly tough to work with. But with a testing setup in place, trying out a new model is often a matter of changing a single name in code, and you’ll already have a fairly good grasp of what your well-understood benchmark model can do for your task.
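As a concrete illustration, a throwaway harness for that head-to-head comparison can be as small as the sketch below. It assumes the OpenAI Python client with an API key in the environment for the baseline, the local Ollama endpoint from the earlier sketch for the challenger, and placeholder model names and prompt:

```python
# Minimal sketch: run the same prompt against a baseline API model and a local
# model, comparing output and wall-clock latency. Model names, the prompt, and
# the local endpoint are placeholders for illustration.
import time
import requests
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "Extract the product names from this JSON list: [...]"

def run_baseline(prompt: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()
    chat = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return chat.choices[0].message.content

def run_local(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

for name, fn in [("baseline", run_baseline), ("local", run_local)]:
    start = time.perf_counter()
    output = fn(PROMPT)
    elapsed = time.perf_counter() - start
    print(f"--- {name} ({elapsed:.1f}s) ---\n{output[:300]}\n")
```

The point isn't the harness itself - it's that once something like this exists, comparing a new model against your benchmark costs you a model name and a minute of waiting.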
This approach makes it easy to decide when to deploy smaller or local models, or even when to use the inexpensive “end” of the larger APIs. Once you find that “ideal” small model that handles part of your pipeline, there are a few more things to consider. As mentioned in the first article, it’s important to compare costs and latency early in your development process. There are a lot of hosting endpoints available for open-source models, and trying to compare them all would be a formidable task. Even so, calculating total costs across a few candidate services may reveal surprising insights, nudging you toward the solution that offers your pipeline the best scalability and efficiency.
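That cost calculation doesn't need to be elaborate. A back-of-the-envelope sketch like the one below is usually enough to surface the surprises; every price and traffic figure here is a made-up placeholder, so substitute your own providers' current pricing and your real volumes:

```python
# Back-of-the-envelope cost comparison across candidate endpoints.
# All prices and traffic figures below are hypothetical placeholders,
# not real quotes from any provider.
MONTHLY_REQUESTS = 500_000
AVG_INPUT_TOKENS = 1_200
AVG_OUTPUT_TOKENS = 300

# price per 1M tokens (input, output) - placeholder values
candidates = {
    "big-api-flagship": (5.00, 15.00),
    "big-api-budget-tier": (0.15, 0.60),
    "hosted-open-source-7b": (0.10, 0.20),
    "self-hosted-gpu-node": (0.00, 0.00),  # near-zero token price, fixed cost below
}
FIXED_COSTS = {"self-hosted-gpu-node": 1_500.00}  # e.g. monthly GPU rental, placeholder

for name, (in_price, out_price) in candidates.items():
    token_cost = MONTHLY_REQUESTS * (
        AVG_INPUT_TOKENS * in_price + AVG_OUTPUT_TOKENS * out_price
    ) / 1_000_000
    total = token_cost + FIXED_COSTS.get(name, 0.0)
    print(f"{name:>25}: ${total:,.2f} / month")
```

Run with your own numbers, this is often where the "obvious" choice stops looking obvious - a budget API tier or a self-hosted node can change the picture entirely at scale.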
The careful reader will notice that this article doesn’t offer “the best models to use in 2024” or “10 hot GenAI trends you ought to be following.” There are a hundred articles with titles like that, but I hope I’ve given you something more useful - a way to arrive at the best solution without devoting much of your professional time to the process or getting lost in a sea of infinite variables. As a side benefit, you may earn a reputation as the “crazy AI expert” who can discuss the strengths and weaknesses of new models, with not much more time spent than you were already putting into testing prompts.
Cost and effectiveness aren’t the only reasons to use an open-source or local LLM, of course. There are other considerations - for example, leveraging generative models in an environment with sensitive data, where an API you don’t directly control may be inappropriate. Or perhaps you need to build a knowledge retrieval system that runs locally on handheld hardware, so you need the smallest and fastest possible embedder. Whatever your reasons, this basic “recipe” is a useful and important competency for your generative AI product development team. If your organization doesn’t have someone filling that role, my suggestion is to become that person. Go forth and be the AI champion your organization needs. Your team will thank you.