Foundation models are an impressively flexible class of machine learning models. During training, they consume a target medium (e.g. text or images) along with surrounding text. After training, they are often used for “generative” tasks, where a user supplies an input prompt (e.g. “A photo of an astronaut riding a horse”) and the model outputs a matching result.
Fig 1. Image produced by Stable Diffusion given the prompt “a photo of an astronaut riding a horse.”
These models have already been trained on many media types (see Fig 2), and advancements are being announced every week.
Fig 2. A sample of AI papers by media type
The creative production process
The experience of working with one of these models is exciting, especially if you’ve ever struggled against the time and cost constraints associated with making something both creative and complex (like a movie or video game). Let’s examine what this process usually looks like using an example: a video game creative director designing a boss interaction.
Our creative director can visualize something that doesn’t yet exist: in this case, a video game boss battle. At first, the details are fuzzy, but the broad strokes are taking shape.
They’ll write some notes and sketch some pictures. They may describe the origin and motivation of the boss character and they may sketch out the rough layout of the battle stage.
They’ll pull together a team of experts. Character artists, composers, and level designers will all need to absorb the vision and generate drafts of actual content like character portraits and unskinned levels. It will take many long iterations over weeks or months to pull the concept together. Working through all the details can be exhausting, but this is what it takes to build something both creative and complex.
Intention is all you need
In a generative task, a foundation model takes the role of the technical domain experts in our example. You, the user, are the creative director. It is your imagination and creative intention that drive the process.
The exciting part is that this interaction can take place in an hour or two instead of over weeks, and the model can produce hundreds of options instead of a handful. You can explore the creative space in real time, by tweaking your prompts or supplying additional training context to the models. You can iteratively seek a concrete representation of your imagination.
Furthermore, once you’ve hit on the right combination (for example, the specific prompt needed to produce an enemy species of centipede in your video game), you can automatically generate large numbers of high-quality variations to make use of down the road. You can assemble your centipede army in an afternoon.
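As a rough illustration of that batch step, here is a minimal Python sketch that expands one base prompt into many generation jobs. The function name, the modifier categories, and the idea of pairing each prompt with a random seed are assumptions made for illustration; the sketch does not call any particular model or API, and each job would still need to be sent to whichever image model you use.

```python
from itertools import product


def build_prompt_variations(base_prompt, modifiers, seeds):
    """Return one {"prompt", "seed"} job per combination of modifier values and seeds.

    `modifiers` maps a hypothetical category name (e.g. "color") to its options.
    """
    keys = list(modifiers)
    jobs = []
    # product() walks every combination of one option per category.
    for values in product(*(modifiers[k] for k in keys)):
        details = ", ".join(values)
        prompt = f"{base_prompt}, {details}" if details else base_prompt
        for seed in seeds:
            jobs.append({"prompt": prompt, "seed": seed})
    return jobs


# Hypothetical prompt and modifiers for the centipede example.
jobs = build_prompt_variations(
    "giant armored centipede enemy, game concept art",
    {"color": ["crimson", "jade"], "pose": ["coiled", "rearing"]},
    seeds=range(3),
)
print(len(jobs))  # 2 colors x 2 poses x 3 seeds = 12 jobs
```

Adding one more category with a handful of options multiplies the number of jobs, which is how a single working prompt can turn into hundreds of candidates.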
This combination of generative AI and automation promises to drastically cut the time and money needed to run creative production processes. We expect a future where small indie teams produce movies and video games faster and at higher quality than today’s massive teams. This is a radical democratization of the creative process, one that will allow us to feed our insatiable hunger for interesting, engaging, and entertaining digital content. It should also allow for a broader range of voices to be heard.
Opportunities and obstacles
The field is moving rapidly, and there is a growing number of opportunities for entrepreneurs in this space.
Applications must be built for every creative domain. General solutions will appear first, but they won’t address the many details, complexities, and stakeholders present in each industry. Finding great product-market fit requires a deep understanding of the day-to-day needs and constraints of users.
A movie or a video game is much more than just a collection of images and sounds. Not only is there nuance in exactly how the pieces are arranged, there are also technical realities imposed by the industry’s existing tools and distribution channels. A character model in a game must be rigged so it can be run by the target game engine, for example, and the final executable must meet the requirements of its app store.
Additionally, there is missing infrastructure for the long-term use and maintenance of foundation models. Prompt engineering is a nascent domain, characterized by experimentally navigating the nuances of how prompts affect the output of a model. How will applications manage the changes to engineered prompts as new versions of models are released? QA is another broad area that needs attention, because generative models can sometimes respond with inappropriate, inaccurate, biased, or disturbing content. How can applications embrace the positives of these methodologies while protecting their users from the downsides? These are questions whose answers will launch brand new businesses.
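One possible shape for the prompt-management piece is a registry that stores the tuned prompt for each task and model version side by side. The sketch below is a toy version of that idea under stated assumptions: the class name, the task name, and the "model-v1" / "model-v2" labels are all hypothetical, and a real product would likely need persistence, review, and testing on top of it.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Maps (task, model_version) to the prompt template tuned for that version."""

    templates: dict = field(default_factory=dict)

    def register(self, task, model_version, template):
        self.templates[(task, model_version)] = template

    def render(self, task, model_version, **values):
        key = (task, model_version)
        # Fail loudly rather than silently reusing a prompt tuned for another version.
        if key not in self.templates:
            raise KeyError(f"No prompt tuned for task {task!r} on model {model_version!r}")
        return self.templates[key].format(**values)


# Hypothetical task and model version names.
registry = PromptRegistry()
registry.register("enemy_portrait", "model-v1", "concept art of a {creature}, dark fantasy")
registry.register("enemy_portrait", "model-v2", "{creature}, dark fantasy concept art, highly detailed")

rendered = registry.render("enemy_portrait", "model-v2", creature="centipede")
print(rendered)
```

Raising an error for an unregistered version is a deliberate choice in this sketch: it forces someone to re-tune and re-check a prompt whenever a new model version is adopted.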
Launchable
At Madrona Venture Labs, we help entrepreneurs build their businesses from day one. We execute alongside founders, and we are currently busy experimenting with foundation models with the constant goal of finding unique, venture-scale businesses.
We’re hosting Launchable: Foundation Models January 23-29 to bring together entrepreneurs, technologists, and leaders in the field. Please apply today and get started building your own AI-assisted creativity company.
We are with our founders from day one, for the long run.