CHEAT CODES

By Sean Robinson, with Jay Bartot and Keith Rosema • June 24, 2024

Gen AI - The journey from proof of concept to proof of value: part IV

This blog is part IV of a series. See parts I, II and III.

Optimizing Prompt Systems - “Getting it to do what you mean”

In our previous articles, we discussed ways to improve the performance and predictability of GenAI systems. Among these was prompt chaining, a popular technique discussed at length in many places online. You may be familiar with the idea that model performance suffers when prompts become too complex or multifaceted, leading to undesirable output.

Prompt chaining, as a remedy to this often frustrating problem, is an essential technique for getting the best performance and output from your models. It’s an important tool in the GenAI/Data Science toolbox, leveraging GenAI's strengths while mitigating many of its limitations. But as we’ll see, this technique is more than just breaking up prompts in a way that maximizes human readability. Let’s dive into how it works and why it’s so important.

Prompt chaining involves breaking down larger complex tasks handled by a single prompt into smaller, more manageable sub-tasks, each handled by a separate prompt. The smaller prompts are then “chained” together, usually by feeding the output of one prompt into the input of the next prompt.
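To make that mechanic concrete, here is a minimal sketch of a two-prompt chain in Python, assuming the OpenAI Python SDK's chat completions interface (any chat-style client works the same way); the prompts and the model name are illustrative placeholders, not a prescription:

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        """Send a single prompt and return the model's text reply."""
        response = client.chat.completions.create(
            model="gpt-4o",  # example model name; substitute your own
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Sub-task 1: one focused prompt handles extraction...
    notes = ask("List the main topics discussed in this document: ...")

    # Sub-task 2: ...and its output is fed into the next prompt in the chain.
    structured = ask(f"Organize these topics into a JSON object, one key per topic:\n{notes}")

Each call does one kind of work, and plain code carries the intermediate result from one prompt to the next.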

Understanding The Mechanics of Prompt Chaining


Let’s illustrate the concept of prompt chaining with a concrete example. Here’s a scenario you may have encountered: Say you prompt a model to perform text extraction and structuring, but the model has difficulty with the complexity of the prompt, as in this actual case using ChatGPT (shortened just a bit for readability):

User Input: Read the following document and do these steps: 

  • Determine where the following topics are either explicitly mentioned or alluded to: investment, running, pets. 
  • Find the sentence from the document most typical of each mention. 
  • Structure the output as a json object, with the keys investment_1, ...etc 
  • The corresponding values should be the sentences most associated with a mention of the topic. 

The document follows: 
Mount Rainier is perhaps the single most impressive mountain in the 48 contiguous United States. It ranks fifth in height, a [………] in splendid isolation, with only 40 miles separating sea level at Puget Sound from its glacier-clad summit. No other peak nearby even remotely challenges its supremacy.

AI Result: {"investment_1": "No other peak nearby even remotely challenges its supremacy."}.

This seems like a clear “miss” from the perspective of prompt engineering. The sentence that was returned doesn’t have anything to do with investments!  Since this was done on the Web/Chatbot-style interface of ChatGPT, it’s tempting to go ahead and ask the model what went wrong using a second prompt:

User Response: How is the statement you output related to investment?

AI Result: It appears there was a misunderstanding. The statement provided does not relate to investment. Let's re-evaluate the document and look for mentions of investment, running, and pets more accurately. Here is the revised output: {}

The model seems to immediately understand and correct the issue, leaving you to wonder, “Why couldn’t it do this task in the first place?” The common answer is that when a desired operation takes several logical steps or involves a lot of work, accuracy often drops and the results become harder to interpret. Our example is fairly simple but still combines a few distinct pieces: information extraction and structuring (into a JSON object). Let’s see what happens if we separate those two stages so that only one type of action is done at a time:

Prompt 1: Read the following document and determine where the following topics are either explicitly mentioned or alluded to: investment, running, pets. 

The document follows: Mount Rainier is perhaps the single most …

AI Response 1: In the provided document, the topics of "investment," "running," and "pets" are not explicitly mentioned or alluded to. The document focuses solely on describing Mount Rainier and its geographical features, such as its height, volume, glaciation, and its isolation from other peaks. Therefore, these topics are not addressed in any form within the text.

Prompt 2: Read the following document summary and do these steps: - If the document had sentences related to investment, running, or pets, then organize those into a json object, with the keys investment_1, ...etc, and the values corresponding to each sentence that referenced those topics. If a topic was not mentioned, leave those elements out. Document Summary: In the provided summary the topics of ……

AI Response 2: {}

A short prompt chain, with a very similar total token count to the single-prompt attempt, gave us what we wanted! But it is important to note that this isn’t just a matter of asking for less work from each prompt - it’s possible (for instance) to ask for a high number of extracted concepts and get a good result. Our fairly simple example needed a chain of prompts because different “types” of work present distinct challenges when combined in a single prompt [1]. By dissecting the task, we can pair related functions effectively. The following breakdown offers a good rule of thumb on how to think about this:

  • Extrapolation, selection, extraction, or elaboration in one prompt.
  • Summarization or structuring in another.
  • Result-checking or selection among candidate results in a third prompt.


This sort of separation can focus the model’s attention on one specific task at a time, ensuring more accurate and coherent outputs.
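For reference, here is a minimal sketch of how the two-stage chain above might look in code, again assuming the OpenAI Python SDK; the prompts are condensed from the example and the model name is a placeholder:

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # example model name
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    document = "Mount Rainier is perhaps the single most impressive mountain ..."

    # Stage 1: extraction and selection only - no structuring.
    findings = ask(
        "Read the following document and determine where the topics investment, "
        "running, or pets are explicitly mentioned or alluded to.\n\n"
        f"Document:\n{document}"
    )

    # Stage 2: structuring only, fed the output of stage 1.
    result = ask(
        "If the summary below reports sentences related to investment, running, or pets, "
        "organize them into a JSON object with keys investment_1, etc. "
        "Leave out any topic that was not mentioned.\n\n"
        f"Summary:\n{findings}"
    )
    print(result)  # for this document, an empty object {} is the expected output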

Practical Steps and Results

Let’s walk through a process framework step by step:

  • Write Out the Entire Task: Begin by fully articulating the entire task. For instance, if the task is to analyze a document for sports-related content, you might specify: "Read through this document, find all paragraphs related to sports, return a JSON list of the sports mentioned, and count their occurrences." This is probably similar to how you’ve already tried to handle the task with a single prompt.
  • Segment the Task: Divide the task into three distinct phases based on the sort of work you need to do: expansive phases, which require creativity and critical thinking; reductive phases, which focus on structuring and organizing the information; and review phases, which validate that an answer is correct. Here, the expansive phase might identify relevant sports content, the reductive phase might structure this information into a JSON format, and a review phase would ensure the outputs match the initial text.
  • Create Distinct Prompts for Each Phase: Develop specific prompts tailored to each phase. Each prompt should be designed to guide the process in its phase effectively, maintaining focus and direction.
  • Test Individual Prompts: Before integrating the phases, test each prompt independently to ensure it functions correctly. This step is crucial to identify and resolve any issues within each phase, ensuring they perform as expected when combined.
  • Use Procedural Logic where possible: For repetitive or procedural elements of the task, apply simple code or logic. For example, if part of your workflow involves counting occurrences or arranging text in alphabetical order, those steps can be handled with straightforward procedural code between LLM calls (see the sketch after this list). Compartmentalizing the workflow like this can enhance accuracy in handling routine tasks while avoiding areas where LLMs are known to have weaknesses.
  • Integrate and Evaluate: Once the individual prompts are refined and tested, integrate them into a cohesive workflow. Evaluate the combined process to ensure it works seamlessly. If any issues arise, “follow” the prompts through the system and evaluate which piece needs improvement.
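As a rough illustration of this framework applied to the sports example, here is a sketch in Python, assuming the OpenAI Python SDK; the prompts, the model name, and the parsing details are illustrative choices rather than a fixed recipe:

    import json
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # example model name
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    document = "..."  # the text to analyze

    # Expansive phase: identify relevant content, nothing else.
    analysis = ask(f"Describe which sports this document discusses and where:\n\n{document}")

    # Reductive phase: structure the findings into JSON.
    sports_json = ask(
        "From the notes below, return only a JSON list of the distinct sports mentioned, "
        f"e.g. [\"tennis\", \"soccer\"]:\n\n{analysis}"
    )
    sports = json.loads(sports_json)  # assumes the model returns bare JSON

    # Procedural step: counting is exact and cheap in plain code, so don't ask the model.
    counts = {sport: document.lower().count(sport.lower()) for sport in sports}

    # Review phase: a separate prompt validates the structured result against the source.
    review = ask(
        "Check this JSON against the document. Does every listed sport actually appear? "
        f"Answer YES or list the mismatches.\n\nJSON: {json.dumps(counts)}\n\nDocument:\n{document}"
    )

Keeping the counting step in plain Python is exactly the kind of procedural shortcut described above: the model identifies and structures, while deterministic code handles the arithmetic.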

The Benefits of Prompt Chaining

Implementing prompt chaining offers several advantages:

  • Improved Accuracy: By aligning each prompt with a specific task type, we reduce the cognitive load on the AI, resulting in more precise outputs.
  • Cost Efficiency: More efficient prompts mean reduced processing time and lower costs, especially when using high-end models (e.g., the GPT series). While it’s tempting to think only in terms of a single prompt’s input and output token cost, it is routinely the case that a cheaper model can handle at least some of the prompts in the chain (see the sketch after this list). Often (as with the GPT series), this can lead to significant cost and time savings.
  • Scalability: This method is scalable across various applications, from data analysis to creative writing.
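For instance, here is a sketch of per-stage model selection, again assuming the OpenAI Python SDK; the model names are examples of a stronger and a cheaper tier, not a recommendation:

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str, model: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Spend on the expansive step, which benefits most from a stronger model...
    analysis = ask("Identify the key topics in this document: ...", model="gpt-4o")

    # ...and route the mechanical structuring step to a cheaper model.
    structured = ask(
        f"Return these findings as a JSON object keyed by topic:\n{analysis}",
        model="gpt-4o-mini",
    )

Because each prompt in the chain is narrow and testable on its own, swapping the model behind any one of them is a low-risk change.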

Prompt Chaining as a Valuable Tool for Prompt Engineering

As AI evolves, techniques like prompt chaining will become increasingly standardized. For now, this is just another (albeit important) tool, in the same way that loops, objects, and other code structures form the building blocks of effective software engineering.  

There are other applications of and approaches to prompt chaining. Another major use of the technique is breaking up long-workflow tasks (e.g., “respond to a customer service ticket”) into many smaller, well-defined tasks (“expand on the request, research the issue, resolve each piece,” and so on). While this sort of semi-open-ended chaining is also useful, the idea of separating work by functional “stages” is often overlooked and can be a powerful way to enhance overall system performance.

You may be asking, how do I get started? There is no better way than simply starting to apply prompt chaining to your projects, experimenting with different prompt segmentations, and observing the improvements as they apply to your real-world problems. Share your findings with the community where you can, and learn from others.

In the next article, we’ll discuss AI Multi-Agent systems, a related technology that could be thought of as a way to let the LLM define its own prompt chains using even more flexible logic.

[1]: This may be due to the stress these prompts place on the attention mechanisms.  Attention is an important but fairly complex and multi-faceted concept that is applied in several ways in modern large language models. Earlier encoder/decoder models had difficulty taking in a long input set and utilizing that entire set to affect a similarly long output window.  The attention mechanism enhances models by allowing the decoder to flexibly determine how relevant each part of the input sequence is to each part of the output through a weighted combination of all encoded input vectors, giving the highest weights to the most pertinent ones. This has allowed things like language translation to be much more effective compared to earlier approaches.  However, this also means that conceptually different tasks will lend themselves to different “uses” of the attention mechanisms present in the modern LLM.  For example, it’s likely that text summarization is similar to translation, where one area of the body-text input (plus the summarization request) is closely related to a specific corresponding area of the output.  In fact, modifications to the attention mechanism have been proposed to enhance this particular task.  However, tasks like creative writing will probably require each part of the output to be broadly based on most of the input, which is “at odds” with a more reductive task like summarization or extraction. This may be why seemingly straightforward prompts can sometimes “stump” even fairly complex models. However, while this explanation matches up with our findings in the application, there is still much room for study here, and the authors welcome robust debate around this point.

To learn more about MVL, read our manifesto and let’s build a company together.
