This blog is part IV of a series. See parts I, II and III.
In our previous articles, we discussed ways to improve the performance and predictability of GenAI systems. Among these was prompt chaining, a popular technique discussed at length in many places online. You may be familiar with the idea that model performance suffers when prompts become too complex or multifaceted, leading to undesirable output.
Prompt chaining, as a remedy to this often frustrating problem, is an essential technique for getting the best performance and output from your models. It’s an important tool in the GenAI/Data Science toolbox, leveraging GenAI's strengths while mitigating many of its limitations. But as we’ll see, this technique is more than just breaking up prompts in a way that maximizes human readability. Let’s dive into how it works and why it’s so important.
Prompt chaining involves breaking down a large, complex task that would otherwise be handled by a single prompt into smaller, more manageable sub-tasks, each handled by its own prompt. These smaller prompts are then “chained” together, usually by feeding the output of one prompt into the input of the next.
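To make the mechanics concrete, here is a minimal Python sketch. The call_llm helper is a hypothetical stand-in for whichever model API you use (it is not a real library function), and the prompt wording mirrors the extraction-then-structuring example we walk through below.

```python
# A minimal sketch of prompt chaining. `call_llm` is a hypothetical
# placeholder, not a real library function.
def call_llm(prompt: str) -> str:
    """Send a single prompt to your model and return its text response."""
    raise NotImplementedError  # wire this up to your provider's API


def extract_then_structure(document: str) -> str:
    # Step 1: a focused prompt that only identifies relevant sentences.
    findings = call_llm(
        "Read the following document and determine where the topics of "
        "investment, running, or pets are mentioned or alluded to:\n\n"
        + document
    )
    # Step 2: the output of step 1 becomes the input of step 2, which
    # only structures the findings as JSON.
    structured = call_llm(
        "Organize the sentences below into a JSON object with keys like "
        "investment_1, running_1, pets_1. Omit topics that were not "
        "mentioned:\n\n" + findings
    )
    return structured
```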
Let’s illustrate the concept of prompt chaining with a concrete example. Here’s a scenario you may have encountered: Say you prompt a model to perform text extraction and structuring, but the model has difficulty with the complexity of the prompt, as in this actual case using ChatGPT (shortened just a bit for readability):
User Input: Read the following document and do these steps:
The document follows:
Mount Rainier is perhaps the single most impressive mountain in the 48 contiguous United States. It ranks fifth in height, a [………] in splendid isolation, with only 40 miles separating sea level at Puget Sound from its glacier-clad summit. No other peak nearby even remotely challenges its supremacy.
AI Result: {"investment_1": "No other peak nearby even remotely challenges its supremacy."}.
This seems like a clear “miss” from the perspective of prompt engineering. The sentence that was returned doesn’t have anything to do with investments! Since this was done on the Web/Chatbot-style interface of ChatGPT, it’s tempting to go ahead and ask the model what went wrong using a second prompt:
User Response: How is the statement you output related to investment?
AI Result: It appears there was a misunderstanding. The statement provided does not relate to investment. Let's re-evaluate the document and look for mentions of investment, running, and pets more accurately. Here is the revised output: {}
The model seems to immediately understand and correct the issue, leaving you to wonder, “Why couldn’t it do this task in the first place?” The common answer is that when a desired operation takes several logical steps or involves a lot of work, it often leads to lower accuracy and results that are harder to interpret. Our example is fairly simple but still combines a few distinct pieces: information extraction and structuring (into a JSON object). Let’s see what happens if we separate those two stages so that only one type of action is done at a time:
Prompt 1: Read the following document and determine where the following topics are either explicitly mentioned or alluded to: investment, running, pets.
The document follows: Mount Rainier is perhaps the single most …
AI Response 1: In the provided document, the topics of "investment," "running," and "pets" are not explicitly mentioned or alluded to. The document focuses solely on describing Mount Rainier and its geographical features, such as its height, volume, glaciation, and its isolation from other peaks. Therefore, these topics are not addressed in any form within the text.
Prompt 2: Read the following document summary and do these steps: - If the document had sentences related to investment, running, or pets, then organize those into a json object, with the keys investment_1, ...etc, and the values corresponding to each sentence that referenced those topics. If a topic was not mentioned, leave those elements out. Document Summary: In the provided summary the topics of ……
AI Response 2: {}
A short prompt chain, with a total token count very similar to the single-prompt attempt, gave us what we wanted! But it is important to note that this isn’t just a matter of asking for less work from each prompt - it’s possible (for instance) to ask for a high number of extracted concepts and still get a good result. Our fairly simple example needed a chain of prompts because different “types” of work can present specific challenges for models to accomplish from a single prompt [1]. By dissecting the task, we can group related functions together effectively. A good rule of thumb is to keep each prompt to one kind of operation: identify or extract in one step, structure or transform in the next, and generate new text in another.
This sort of separation can focus the model’s attention on one specific task at a time, ensuring more accurate and coherent outputs.
Let’s walk through a process framework step-by-step:
1. Identify the distinct operations your task requires (e.g., extraction, structuring, summarization, generation).
2. Write a focused prompt for each operation, keeping the instructions specific to that single step.
3. Chain the prompts together by feeding the output of each step into the input of the next.
4. Inspect the intermediate outputs, so you can see exactly where a chain goes wrong and adjust that step in isolation.
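Here is a rough sketch of that loop, again assuming the hypothetical call_llm helper from the earlier sketch. Each stage is a prompt template whose {input} slot receives the previous stage's output, and the stage wording is illustrative only.

```python
# Each stage handles one kind of operation; the output of one stage is
# fed into the {input} slot of the next. `call_llm` is the hypothetical
# placeholder defined in the earlier sketch.
STAGES = [
    # Stage 1: extraction only.
    "Read the following document and list any sentences related to "
    "investment, running, or pets:\n\n{input}",
    # Stage 2: structuring only.
    "Organize the sentences below into a JSON object keyed by topic "
    "(investment_1, running_1, ...). Return an empty object if there "
    "are none:\n\n{input}",
]


def run_chain(document: str, stages: list[str] = STAGES) -> str:
    text = document
    for i, template in enumerate(stages, start=1):
        text = call_llm(template.format(input=text))
        # Inspect intermediate outputs so a failing step is easy to spot.
        print(f"--- output of stage {i} ---\n{text}\n")
    return text
```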
Implementing prompt chaining offers several advantages: each prompt stays focused on a single task, which tends to produce more accurate and coherent outputs; intermediate results are visible, which makes the system easier to debug and evaluate; and individual steps can be revised, reordered, or reused without rewriting the whole prompt.
As AI evolves, techniques like prompt chaining will become increasingly standardized. For now, this is just another (albeit important) tool, in the same way that loops, objects, and other code structures form the building blocks of effective software engineering.
There are other applications of and approaches to prompt chaining. Another major use of the technique is breaking a long-workflow task (e.g., “respond to a customer service ticket”) into many smaller, well-defined tasks (“further expand the idea, research the issue, resolve each piece, etc.”). While this sort of semi-open-ended chaining is also useful, the approach of separating work by functional “stages” is often overlooked and can be a powerful way to enhance overall system performance.
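As a hedged illustration of that long-workflow style, the run_chain sketch from above could simply be pointed at a different list of stages; the stage names and wording below are hypothetical and only meant to show the shape of such a chain.

```python
# Hypothetical stages for the customer-service-ticket example above,
# reusing the run_chain / call_llm sketches from earlier.
TICKET_STAGES = [
    "Summarize the customer's issue in the ticket below and break it "
    "into distinct sub-problems:\n\n{input}",
    "For each sub-problem below, research likely causes and propose a "
    "resolution:\n\n{input}",
    "Draft a clear, customer-facing reply based on the resolutions "
    "below:\n\n{input}",
]

ticket_text = "..."  # the raw ticket contents
reply = run_chain(ticket_text, stages=TICKET_STAGES)
```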
You may be asking, how do I get started? There is no better way than simply starting to apply prompt chaining to your projects, experimenting with different prompt segmentations, and observing the improvements as they apply to your real-world problems. Share your findings with the community where you can, and learn from others.
In the next article, we’ll discuss AI Multi-Agent systems, a related technology that could be thought of as a way to let the LLM define its own prompt chains using even more flexible logic.
[1]: This may be due to the stress these prompts place on the attention mechanisms. Attention is an important but fairly complex and multi-faceted concept that is applied in several ways in modern large language models. Earlier encoder/decoder models had difficulty taking in a long input sequence and using that entire sequence to shape a similarly long output. The attention mechanism enhances models by allowing the decoder to flexibly determine how relevant each part of the input sequence is to each part of the output through a weighted combination of all encoded input vectors, giving the highest weights to the most pertinent ones. This has made tasks like language translation much more effective compared to earlier approaches. However, it also means that conceptually different tasks will lend themselves to different “uses” of the attention mechanisms present in a modern LLM. For example, it’s likely that text summarization is similar to translation, where one area of the body-text input (plus the summarization request) is closely related to a specific corresponding area of the output. In fact, modifications to the attention mechanism have been proposed to enhance this particular task. Tasks like creative writing, however, will probably require each part of the output to be broadly based on most of the input, which is “at odds” with a more reductive task like summarization or extraction. This may be why seemingly straightforward prompts can sometimes “stump” even fairly complex models. While this explanation matches up with our findings in practice, there is still much room for study here, and the authors welcome robust debate around this point.
To learn more about MVL, read our manifesto and let’s build a company together.
We are with our founders from day one, for the long run.