This past January 24, Madrona Venture Lab’s Launchable: Foundation Models hosted a speaker panel titled “What’s on the Horizon”, presented by Perkins Coie. Moderated by Jon Turow from Madrona, panel speakers Irina Rish, Ludwig Schmidt, and Saurabh Baji spoke about the future of foundation models, open source AI, and tips for builders at the Launchable event.
Rish is a professor in the Computer Science and Operations Research department at the University of Montreal, a core member of the Quebec AI Institute, and holds a Canada CIFAR AI Chair and the Canada Excellence Research Chair in Autonomous AI. Schmidt is an assistant professor in Computer Science at the University of Washington, as well as a leader of the LAION project, and Baji is the Senior Vice President of Engineering at Cohere.
Turow opened the group questions by asking how to turn ideas into reality: how can a founder evaluate whether an idea is feasible to build on foundation models now, in the future, or at all?
The panelists agreed that if an idea hasn’t yet appeared in research papers, it is more likely to be a science project than a near-term reality. Schmidt added that some of the likelihood depends on the technological expertise of the team, while Rish countered that it could depend heavily on the existing code base and whether it could be scaled to serve the idea. Baji recommended simply trying ideas out: access to foundation models has never been easier, and exploring what the models are capable of is the best way to find out.
The panelists were excited by how quickly foundation model technologies are developing and how they seem to change every day. Rish pointed out that these changes are not only happening in proprietary models and APIs but also in open source AI, which now has more compute and more training resources available. Rish said that “the dynamics [between the open source community and for-profit companies] have started in an interesting direction because open source is applying some peer pressure to companies to release models and interfaces faster.” Agreeing, Turow called out the HELM (“Holistic Evaluation of Language Models”) paper from Stanford, a model comparison exercise showing that proprietary models currently win on accuracy but highlighting a closing gap. While the panelists agreed that open source is catching up to proprietary models, they suggested that, without a clear winner, people should focus on what they can build and how to make products easier to use. “There are possibilities for other interesting paradigms to emerge as well, in terms of partnership with open source,” Baji said. “The core point I see is while you do have proprietary models having advantage today, you can't just count on that to stay and definitely not for long.”
In response to whether models will keep getting better with more parameters, the panelists first discussed the large-model boom, when everyone focused on building ever-larger models. Scaling up only helps if the builder also increases the training data, improving not just its quality but its diversity. Some smaller models trained on more data, such as Chinchilla, are beating much larger ones. The goal, they noted, is not simply to make models larger but to make them better for a fixed amount of compute.
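The Chinchilla result the panel alludes to can be sketched with the rule of thumb from the Hoffmann et al. scaling-law paper: for a fixed compute budget, training is roughly optimal when the number of training tokens is about 20× the number of parameters, with training compute approximately 6 × N × D FLOPs. The sketch below is an illustration of that rule of thumb, not something presented at the panel:

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count: ~20 tokens per parameter."""
    return 20 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Example: a 70B-parameter model (roughly Chinchilla's size)
n = 70e9
d = chinchilla_optimal_tokens(n)   # ~1.4 trillion tokens
c = training_flops(n, d)           # ~5.9e23 FLOPs
print(f"optimal tokens: {d:.2e}, training FLOPs: {c:.2e}")
```

By this heuristic, a model much larger than 70B parameters trained on the same 1.4T tokens would be under-trained for its compute budget, which is why a well-fed smaller model can outperform it.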
When asked about testing an idea and ensuring its quality and reliability, Baji said that builders should work on customer-specific evaluations, use benchmarks for testing, and gather enough data to serve as a proxy for building a prototype to get into users’ hands. He also stressed the importance of context-specific data, saying that data from users is the best thing to focus on.
Rish raised the problem of testing models and deciding what to build on top of them. Testing robustness and applicability can be hard, she said, but interactions between the model and testers can reveal where models can be improved and help collect new data to target potential weaknesses. Rish advised builders to start with simple test cases, then stress-test with more diverse models. Builders should also look at downstream tasks, growing their complexity or number and observing how performance metrics scale.
Turow closed with a twofold question, asking the panel what builders should know about the science of foundation models that the community has not adopted yet and also what service provider teams would like to know from the builders.
For builders, Rish thought that the most important things to know are the strengths and limitations of the models that they’ll be using. On the research side, Rish and Baji said that builders should ask where the models failed or were less effective so that they can find places for improvement.
For Turow, clever scientists in the AI space have imagination and are able to discover how AI can work and under what conditions. He mentioned Jasper AI, which generates marketing copy, as an example: extended context is not strictly necessary there, because a human marketer will review the content and discard absurdities. The process, he said, involves understanding what AI can do and imagining, dreaming, and daring to push its limits.
“That’s the art versus the science. I think even the scientists sometimes are surprised by things that suddenly start to work at the moment,” said Schmidt.
We are with our founders from day one, for the long run.