Book Review: “Prompt Engineering for Generative AI” by James Phoenix and Mike Taylor

Ever since ChatGPT was released to the public in November 2022, there has been quite a lot of chatter about its implications for society, business and specialised professions. Naturally, GAI tech has been evolving very rapidly, and 2 years is a lifetime! From a practitioner’s perspective, prompt engineering seems to be the right starting point for understanding both the building blocks and the capabilities of this fundamental technology. In this context, the book “Prompt Engineering for Generative AI”, written by James Phoenix and Mike Taylor and published by O’Reilly, is worth looking at.

I picked up the book last year, but could only finish it last week. The book has the right amount of detail for prompt engineers – it goes into dev environments, libraries, dependency resolution, GitHub repos and the like. Nevertheless, from a product manager’s lens, you can skim through a lot of these details and still form a basic view of GAI and its building blocks.

Following are my key takeaways. YMMV

  1. In AI, where responses are non-deterministic, cost and latency are real factors again. After decades of Moore’s law making us complacent in expecting real-time computation at negligible cost, we are forced to consider cost and latency as limitations!
  2. Prompt engineering is the process of discovering prompts that reliably yield useful or desired results. A prompt is the input you provide, typically text, when interfacing with an AI model like ChatGPT or Midjourney
  3. LLMs are trained on essentially the entire text of the internet, and are then further fine-tuned to give helpful responses
  4. There are 5 basic principles of prompting (a combined sketch follows this list):
  • Give Direction
    • Describe the desired style in detail, or reference a relevant persona
  • Specify Format
    • Define what rules to follow, and the required structure of the response. Specificity helps here because these models are capable of returning a response in almost any format
  • Provide Examples
    • Insert a diverse set of test cases where the task was done correctly. If you’re providing zero examples, you’re asking for a lot without giving much in return. It also helps in domains wherein you are not a subject matter expert!
  • Evaluate Quality
    • Identify errors and rate responses, testing what drives performance. Evals (evaluations) are a standardized set of questions with predefined answers or grading criteria that are used to test performance across models
  • Divide Labor
    • Split tasks into multiple steps, chained together for complex goals
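
To make the principles concrete, here is a minimal sketch that folds the first three into one prompt, with the last two as process notes. The product, task and examples are my own illustration, not from the book:

```python
# A minimal sketch of the five prompting principles in one place
# (an illustrative example, not taken from the book).
direction = "You are a senior copywriter."                 # 1. Give Direction
task = "Write a product description for a reusable water bottle."
fmt = 'Respond as JSON with keys "headline" and "body".'   # 2. Specify Format
examples = (                                               # 3. Provide Examples
    'Example: {"headline": "Coffee that respects your mornings", '
    '"body": "Small-batch beans, roasted the day we ship them."}'
)
prompt = "\n".join([direction, task, fmt, examples])
print(prompt)
# 4. Evaluate Quality: run the prompt repeatedly and grade the outputs
#    against predefined criteria (an eval set) before relying on it.
# 5. Divide Labor: for complex goals, chain prompts together, e.g. ideate
#    headlines first, then expand the winner into body copy.
```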
  5. If you are struggling with what prompts to start with, a technique called pre-warming may help. This is akin to starting the conversation by asking ChatGPT for best-practice advice, then asking it to follow its own advice!
    • Using an AI model to generate a prompt for an AI model is meta prompting, and it works because LLMs are human-level prompt engineers
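
Here is a minimal pre-warming / meta-prompting sketch, assuming the `openai` Python package and an OPENAI_API_KEY in the environment; the model name and task are illustrative:

```python
# Pre-warming / meta prompting sketch: ask the model for best-practice
# advice, then ask it to follow its own advice.
from openai import OpenAI

client = OpenAI()
question = "What are best practices for writing a cold sales email?"

advice = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

email = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": advice},
        {"role": "user", "content": "Now write one that follows your own advice."},
    ],
).choices[0].message.content

print(email)
```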
  6. There’s a trade-off between reliability and creativity. The more examples you provide, and the less diverse they are, the more constrained the response will be to match your examples
  7. Iterating on and testing prompts can lead to a radical decrease in prompt length, and therefore in the cost and latency of your system
  8. One can measure hallucinations by the invention of new terms not included in the prompt’s context!
  9. The real unlock in learning to work professionally with AI versus just playing around with prompting is realizing that every part of the system can be broken down into a series of iterative steps
  10. In natural language processing (NLP) and LLMs, the fundamental linguistic unit is a token. Tokens can represent sentences, words, or even subwords such as a set of characters
  11. Tokenization, the process of breaking down text into tokens, is a crucial step in preparing data for NLP tasks. Several methods can be used for tokenization, including Byte-Pair Encoding (BPE), WordPiece, and SentencePiece
    • BPE is commonly used due to its efficiency in handling a wide range of vocabulary while keeping the number of tokens manageable.
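
You can inspect BPE tokenization directly with OpenAI’s tiktoken library; a small sketch (cl100k_base is the encoding used by several recent OpenAI models):

```python
# Inspecting BPE tokenization with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization breaks text into subword units.")
print(ids)                              # a list of integer token IDs
print([enc.decode([i]) for i in ids])   # the subword string behind each ID
print(f"{len(ids)} tokens")             # the unit you are billed in
```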
  12. The real breakthrough of transformer architectures was the concept of attention, which allows models to directly relate distant words to one another irrespective of their positions in the text
  13. The transformer architecture captures both structure (i.e. context) and meaning (i.e. semantics) in word vectors. The ability to comprehend the nuanced, contextual meaning of words is the fundamental feature of transformers
  14. Standard practices for text generation with ChatGPT (a sketch follows this list):
    • Generating lists (e.g. generate a list of cryptography innovations in the last 10 years which have helped in securing financial transactions over the Internet!)
    • Use of words like “hierarchical” and “incredibly detailed” depending on the desired output
    • Diverse output format generation like XML, JSON and Mermaid
    • Use phrases like “explain it like I am five!” Cannot go wrong with this one. Also a very cliched product interview question!
    • In agent-based systems like GPT-4, the ability to ask for more context and provide a finalized answer is crucial for making well-informed decisions
    • Role prompting is a technique in which the AI is given a specific role or character to assume while generating a response. However, be careful with the quality of response!
      • Role prompting is used when one wants to elicit specific expertise, tailor the response style or encourage creative responses
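
A role-prompting sketch that also pins down the output format, assuming the `openai` package; the persona, model name and task are my own illustration:

```python
# Role prompting sketch: assign a persona via the system message and pin
# down the output format.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "You are a veteran cryptographer writing for a general audience."},
        {"role": "user",
         "content": "List 3 cryptography innovations from the last 10 years "
                    "that help secure online payments. Respond as a JSON array "
                    'of {"name": ..., "why_it_matters": ...} objects.'},
    ],
)
print(resp.choices[0].message.content)
```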
  15. Techniques to avoid hallucinations (i.e. where the AI model makes something up); a RAG sketch follows this list:
    • Instruct the model to answer using only the reference text OR to incorporate references from a given text in its response 
    • Ask the LLM to generate an inner monologue, i.e. to put its intermediate reasoning into a structured part of the output
    • Ask the LLM to critique the generated response. Self-evaluation is useful in contexts other than your annual performance cycle!
    • Provide your LLM with a small number of examples. This strategy can significantly influence the structure of your output format and enhance the overall classification accuracy
    • Use a vector database for storing text data in a way that enables querying based on similarity or semantic meaning
    • Look up only the most relevant records to pass into the prompt at runtime, in order to provide the most relevant context for forming a response. This practice is typically referred to as RAG (retrieval-augmented generation). It’s commonly used in open-ended scenarios, e.g. a user talking to a chatbot without providing enough context, or asking it something which is NOT in its training data!
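
A minimal RAG sketch, assuming the `openai` and `numpy` packages. It uses brute-force cosine similarity in place of a real vector database, and the documents, question and model names are illustrative:

```python
# Minimal RAG sketch: embed documents, retrieve the most similar one, and
# ground the answer in it.
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available via chat from 9am to 5pm on weekdays.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
question = "Can I return an item after three weeks?"
q_vec = embed([question])[0]

# Cosine similarity between the question and every document.
sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(sims))]    # the most relevant record

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Answer using ONLY this context:\n{context}\n\nQ: {question}"}],
).choices[0].message.content
print(answer)
```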
  16. LangChain is a versatile framework that enables the creation of applications utilizing LLMs and is available as both a Python and a TypeScript package
    • Agent toolkits are a LangChain integration that provides multiple tools and chains together, allowing you to quickly automate tasks
    • For developers and prompt engineers, understanding and harnessing short-term memory can significantly elevate the user experience, fostering engagements that are meaningful, efficient, and humanlike. There are dedicated storage databases for retrieving long- and short-term memory for agents
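
A minimal LangChain sketch in Python using the pipe (LCEL) syntax, assuming the `langchain-openai` and `langchain-core` packages; the model name and review text are illustrative:

```python
# Minimal LangChain sketch: prompt template -> model -> output parser.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "Summarize the following review in one sentence:\n{review}"
)
llm = ChatOpenAI(model="gpt-4o-mini")
chain = prompt | llm | StrOutputParser()   # prompt -> model -> plain string

print(chain.invoke({"review": "Great book, though dense in places."}))
```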
  17. Even as the token context window limit (i.e. the maximum number of tokens that an LLM can process within a single request) continues to increase, providing a specific number of k-shot examples helps you minimize API costs
    • If the tasks are complex and the performance of the model with few-shot learning is not satisfactory, you might need to consider fine-tuning your model
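
A k-shot (few-shot) sketch: a couple of worked examples placed before the real input. The sentiment task and examples are my own illustration:

```python
# k-shot (few-shot) sketch: two worked examples precede the real input.
# Fewer, well-chosen examples keep the token count, and therefore the cost, down.
few_shot_messages = [
    {"role": "system", "content": "Classify the sentiment as positive or negative."},
    {"role": "user", "content": "The battery died after a week."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Setup took thirty seconds. Flawless."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "The screen scratches far too easily."},
]
# Pass `few_shot_messages` as the messages list of a chat-completion call;
# the model should continue the pattern with a one-word label.
```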
  18. Task decomposition is a crucial strategy for you to tap into the full potential of LLMs. By dissecting complex problems into simpler, manageable tasks, you can leverage the problem-solving abilities of these models more effectively and efficiently
  19. Using chains gives you the ability to use different models. For example, using a smart model for the ideation and a cheap model for the generation usually gives optimal results. This also means you can have fine-tuned models on each step
  20. Chain-of-thought reasoning (CoT) is a method of guiding LLMs through a series of steps or logical connections to reach a conclusion or solve a problem. This approach is particularly useful for tasks that require a deeper understanding of context or multiple factors to consider. Use the phrase “step-by-step”!
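
A zero-shot chain-of-thought sketch; the question is my own illustration:

```python
# Zero-shot chain-of-thought sketch: the "step-by-step" nudge elicits
# intermediate reasoning before the final answer.
cot_prompt = (
    "A train leaves at 14:10 and arrives at 16:45. How long is the journey?\n"
    "Let's think step-by-step, and put the final answer on its own last line."
)
# Send `cot_prompt` to a chat model, then parse the last line, e.g.:
# answer = response_text.strip().splitlines()[-1]
```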
  21. If your task entails a definitive action such as a simple search or data extraction, OpenAI functions are an ideal choice
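
A function-calling sketch using the `tools` parameter of the OpenAI chat API; the `search_products` function is hypothetical and the model name is illustrative:

```python
# Function-calling sketch: describe a function to the model and let it
# return structured arguments instead of free text.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "search_products",          # hypothetical function
        "description": "Search the product catalog by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Find me a waterproof hiking jacket."}],
    tools=tools,
)
# If the model decided to call the function, the structured arguments are here:
print(resp.choices[0].message.tool_calls)
```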
  22. If you require executions involving multiple sequential tool usage and deeper introspection of previous actions, ReAct (Reason and Act) comes into play
  23. The central premise of Tree of Thought is to enable exploration across coherent text chunks, termed thoughts. These thoughts represent stages in problem-solving, facilitating the language model to undertake a more deliberate decision-making process
    • By providing models the capacity to think, backtrack, and strategize, ToT is redefining the boundaries of AI problem-solving
  24. Introduced in 2015, diffusion models (e.g. DALL-E) are a class of generative models that have shown spectacular results for generating images from text
    • Diffusion models are trained by many steps of adding random noise to an image and then predicting how to reverse the diffusion process by denoising (removing noise)
    • The prompt input by the user is first encoded into vectors; the diffusion model then generates an image matching these vectors, before the resulting image is decoded back into pixels for the user.
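
The encode-denoise-decode loop is wrapped up by libraries such as Hugging Face `diffusers`; a minimal text-to-image sketch (the model ID is illustrative, and a CUDA GPU is assumed since generation is very slow on CPU):

```python
# Text-to-image sketch with Hugging Face `diffusers`
# (pip install diffusers torch transformers).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=30,   # number of denoising steps
).images[0]
image.save("lighthouse.png")
```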
  25. For image generation, each model and method has its own quirks and behaviors depending on its architecture, training method, and the data on which it was trained
  26. Another image generation tool is Midjourney, which has a distinctive, Discord-based community focus
  27. DALL-E 3 is great at composition, and the integration with ChatGPT is convenient. Midjourney still has the best aesthetics, both for fantasy and photorealism. Stable Diffusion being open source makes it the most flexible and extendable model, and is what most AI businesses build their use cases on
  28. Best practices for quick and desirable image generation (a Stable Diffusion sketch follows this list):
    • Evoke an artist’s name, or use qualifiers like “very beautiful”, “4K”, or “trending on ArtStation”
    • Use negative prompts, e.g. --no cartoon
    • Use weighted terms
    • If you are struggling with something appearing in the image that you don’t want, try stronger negative weights instead of negative prompts
    • The quickest and easiest way to get the image you desire is to upload an image that you want to emulate. 
    • Use techniques like inpainting or outpainting (fancy words for regenerating a selected region inside an image, or extending the image beyond its original borders)
    • Cut out redundant words in the prompt. You may not know which word is impacting the output negatively!
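
Negative prompts map directly onto the `negative_prompt` parameter in `diffusers`; a sketch, with illustrative prompts and model ID:

```python
# Negative-prompt sketch: terms in negative_prompt steer the model away
# from unwanted traits in the generated image.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "portrait photo of an astronaut, 4K, highly detailed",
    negative_prompt="cartoon, blurry, low quality",
    num_inference_steps=30,
).images[0]
image.save("astronaut.png")
```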
  29. Stable Diffusion is an open source image generation model, so you can run it locally on your computer for free, if you have an NVIDIA or AMD GPU, or Apple Silicon, which powers the M1, M2, or M3 Macs
    • Heavy users of Stable Diffusion typically recommend the AUTOMATIC1111 (pronounced “automatic eleven eleven”) web user interface, because it is feature-rich and comes with multiple extensions built by Stable Diffusion power users
    • ControlNet is an advanced way of conditioning input images for image generation models like Stable Diffusion
  30. Prompt editing is an advanced technique that gets deep into the actual workings of the diffusion model. Interfering with which layers respond to which concepts can lead to very creative results if you know what you’re doing and are willing to undergo enough trial and error

All in all, a great book. Since the topic is complex and, unironically, contextual, this summary may be confusing in parts. I’d suggest you buy and read the book instead! It can be ordered from Amazon!
