In the ever-evolving landscape of artificial intelligence (AI) and natural language processing, the significance of prompt engineering cannot be overstated. The art and science of crafting effective prompts hold the key to harnessing the true potential of AI language models, enabling them to generate contextually rich and accurate responses.
Whether you're using AI language models for content generation, chatbots, virtual assistants, or any other application, mastering prompt engineering techniques is essential for achieving your desired outcomes. In this comprehensive guide, we'll explore prompt engineering strategies, best practices, and expert insights that will empower you to create prompts that resonate with AI models and yield exceptional results.
Join us on this exploration of prompt engineering techniques, where you'll discover the tools and strategies needed to unlock the capabilities of AI language models and craft interactions that leave a lasting impact.
What is Prompt Engineering?
Prompt engineering is the process of designing and testing prompts to optimize the performance of a large language model (LLM) in generating accurate and relevant responses. It involves understanding the capabilities and limitations of the LLM, as well as the specific task that the LLM is being asked to perform. Prompt engineering can be used to improve the performance of LLMs on a wide range of tasks, including summarization, translation, and code generation.
Here are some of the key benefits of prompt engineering:
It can help to improve the accuracy and relevance of LLM outputs.
It can help to reduce the amount of training data required to train an LLM.
It can help to make LLMs more robust to different types of input.
It can help to expand the range of tasks that LLMs can be used for.
Prompt Engineering Techniques
Adopting the right techniques can be the key to success. Here are some of the top prompt engineering techniques, along with instructions on how to implement them effectively.
1. Start with Clear Instruction
This prompt engineering technique is used to improve the performance of large language models (LLMs) on a variety of tasks. The technique involves telling the LLM what task you want it to do at the beginning of the prompt, before sharing additional contextual information or examples.
This works because GPT-style LLMs process the prompt sequentially, token by token. When the LLM sees the task first, it can interpret the rest of the prompt in light of that task and generate a more relevant and informative output.
For example, the following prompt uses the "Start with clear instructions" technique:
System message: Your task is to translate the following sentence into Spanish
User: "I love to eat pizza."
This prompt clearly tells the LLM that its task is to translate the sentence into Spanish. The LLM can then use this information to focus its attention on the relevant information in the sentence and generate a more accurate translation.
Here is the LLM's response:
Assistant:
Me encanta comer pizza.
This is a correct translation of the sentence into Spanish.
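As a rough sketch of how this example could be sent to a model through the Chat Completions API (using the openai Python SDK; the model name below is just a placeholder), the task-first instruction lives in the system message:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # The task is stated first, before any additional context.
        {"role": "system", "content": "Your task is to translate the following sentence into Spanish."},
        {"role": "user", "content": '"I love to eat pizza."'},
    ],
)
print(response.choices[0].message.content)  # e.g. "Me encanta comer pizza."
```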
Here is another example of how to use the "Start with clear instructions" technique for a more complex task:
System Message: Your task is to write a poem about a cat.
User: The poem should be at least 10 lines long and should include the following words:
Fluffy
Whiskers
Purr
Play
Cuddle
This prompt clearly tells the LLM that its task is to write a poem about a cat, and that the poem must include the specified words. The LLM can then use this information to generate a poem that meets all of the requirements.
Here is the LLM's response:
Assistant:
Oh, fluffy little kitty, with your whiskers so fine,
Your purr is like music, your eyes shine so bright.
You love to play and cuddle, and bring us so much joy,
Our furry little friend, our sweet little boy.
We love to watch you chase after a string,
Or curl up in a ball and take a nap.
You're the best companion anyone could ask for,
And we love you more and more.
2. Add Clear Syntax
Using clear syntax in your prompt, including punctuation, headings, and section markers, helps to communicate your intent to the language model and makes the prompt easier to parse. This can lead to more accurate and informative outputs.
In the example below, the prompt uses the following syntax:
Separators (---) have been added between different sources of information or steps. This allows the use of --- as a stopping condition for generation.
Section headings or special variables are presented in uppercase to differentiate them.
This syntax helps the language model to understand the different parts of the prompt and to generate a response that is tailored to your specific request.
Here is a breakdown of the syntax used in the example:
System message: You are an AI assistant that helps people find information.
User: You will read a paragraph, and then issue queries to a search engine in order to fact-check it. Also, explain the queries.
---
PARAGRAPH
John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft. What search queries should I do to fact-check this?
---
QUERIES
Assistant:
"John Smith Microsoft" - To check if John Smith is indeed employed at Microsoft.
"John Smith Lucy Smith" - To check if John Smith is married to Lucy Smith.
"John Smith children" - To check if John Smith has five children.
If you're not sure what syntax to use, consider using Markdown or XML. The models have been trained on a lot of web content in XML and Markdown, which may provide better results.
By using clear syntax in your prompts, you can help the language model to understand your intent and to generate more accurate and informative outputs.
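As an illustrative sketch (openai Python SDK, placeholder model name), the --- separator can double as a stop sequence so that generation halts at the next section boundary:

```python
from openai import OpenAI

client = OpenAI()

# Separators (---) and uppercase headings make the structure explicit;
# the same separator is reused as a stop sequence.
user_prompt = (
    "You will read a paragraph, and then issue queries to a search engine "
    "in order to fact-check it. Also, explain the queries.\n"
    "---\n"
    "PARAGRAPH\n"
    "John Smith is married to Lucy Smith. They have five kids, and he works "
    "as a software engineer at Microsoft. What search queries should I do to "
    "fact-check this?\n"
    "---\n"
    "QUERIES\n"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are an AI assistant that helps people find information."},
        {"role": "user", "content": user_prompt},
    ],
    stop=["---"],  # generation halts at the next separator
)
print(response.choices[0].message.content)
```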
3. Repeat the Instructions
Recency bias is a cognitive bias that causes people to give more weight to recent information than to older information. This can be seen in the way that people make decisions, solve problems, and remember information.
Recency bias can also affect large language models (LLMs). When an LLM is given a prompt, it is more likely to focus on the information at the end of the prompt than on the information at the beginning of the prompt. This is because LLMs are trained on large amounts of text data, and they learn to predict the next word in a sequence based on the words that have come before it.
To mitigate the effects of recency bias, it is helpful to use the "repeat the instructions" prompt engineering technique at the end of the prompt. This way, the LLM is more likely to focus on the task that you want it to perform and generate a more accurate response.
Here is an example of how to repeat the instructions at the end of the prompt:
System message: You are an AI assistant that helps people find information. Your task is to summarize the following text:
User: "The quick brown fox jumps over the lazy dog."
Summarize the text above.
In this example, the instructions are repeated at the end of the prompt. This helps to ensure that the LLM focuses on the task of summarizing the text and that it does not simply copy the text that was provided in the prompt.
This prompt engineering technique is especially helpful for complex tasks. For example, if you are asking the LLM to write a poem or translate a text, repeating the instructions at the end of the prompt can help to ensure that the LLM generates the desired output.
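One lightweight way to apply this technique is to assemble prompts programmatically so the instruction always appears both before and after the content. A minimal sketch; the helper below is ours, not part of any library:

```python
def build_prompt(instruction: str, content: str) -> str:
    """Place the instruction both before and after the content so that
    recency bias works in our favour."""
    return f"{instruction}\n\n{content}\n\n{instruction}"

prompt = build_prompt(
    "Summarize the following text in one sentence.",
    '"The quick brown fox jumps over the lazy dog."',
)
print(prompt)
```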
4. Specify the Output Structure
Using your prompt to specify the structure of the output can have a significant impact on the nature and quality of the results.
For example, if you ask a large language model (LLM) to write a poem, you might want to include the following instructions in your prompt:
Write a poem in the style of Shakespeare.
The poem should be 10 lines long.
The poem should include the following words: love, loss, and redemption.
By specifying the structure of the output, you are helping the LLM to generate a poem that meets your specific requirements.
In the context of factual claims, asking for the model's response to also include citations can help reduce the prevalence of incorrect responses, because it makes it more difficult for the LLM to fabricate information. If the LLM knows that it will be asked to cite the source of its claims, it is more likely to generate only claims that are supported by evidence.
Note that the closer the citation is to the text it supports, the shorter the distance ahead the model needs to anticipate the citation, which suggests that inline citations are better mitigations for false content generation than citations at the end of the content. This is because inline citations make it easier for the reader to verify the accuracy of the information.
Similarly, if you ask the model to extract factual statements from a paragraph, it may extract compound statements such as ‘X is doing Y AND Z’ (which may be harder to verify). This can be avoided by specifying an output structure such as (entity1, relationship, entity2). This is because the specified output structure forces the LLM to break down the compound statement into its component parts, which makes it easier to verify the accuracy of each part.
Here is an example of a prompt that uses the prompt engineering techniques described above:
System message: You are a large language model that helps people find information.
Your task is to extract factual statements from the following paragraph and present them in the following structure:
(entity1, relationship, entity2)
PARAGRAPH: John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft.
QUERIES: Extract factual statements from the paragraph and present them in the following structure:
(entity1, relationship, entity2)
Output:
(John Smith, married, Lucy Smith) (John Smith, has, 5 kids) (John Smith, works as, software engineer at Microsoft)
This output is easier to verify than the following output:
John Smith is married to Lucy Smith, they have five kids, and he works as a software engineer at Microsoft.
The specified output structure forces the LLM to break down the compound statement into its component parts, which makes it easier to verify the accuracy of each part.
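As a small illustration of why the structure helps downstream, output in the (entity1, relationship, entity2) form can be parsed mechanically and each triple verified on its own. A hypothetical sketch:

```python
import re

# Example model output in the requested (entity1, relationship, entity2) structure.
output = (
    "(John Smith, married, Lucy Smith) "
    "(John Smith, has, 5 kids) "
    "(John Smith, works as, software engineer at Microsoft)"
)

# Because the structure was specified up front, the response can be parsed
# into triples that are easy to fact-check one at a time.
triples = re.findall(r"\(([^,]+),\s*([^,]+),\s*([^)]+)\)", output)
for entity1, relationship, entity2 in triples:
    print(entity1.strip(), "|", relationship.strip(), "|", entity2.strip())
```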
5. System Message
A system message is a way to provide instructions and context to a large language model (LLM) at the beginning of a prompt. This helps the LLM generate more accurate and relevant responses.
For example, you could use a system message to:
Tell the LLM what kind of personality you want it to have.
Define what the LLM should and shouldn't answer.
Define the format of the LLM's responses.
Here are some examples of system messages:
"Assistant is an AI assistant that helps people find information and responds in rhyme."
"Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say 'I don't know'."
"Assistant is an intelligent chatbot designed to help users answer their tax related questions."
"You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a JSON object. Here's an example of your output format:
{
"name": "",
"company": "",
"phone_number": ""
}"
System messages can also be used to prime the LLM with few-shot learning examples. This is a way to teach the LLM how to perform a task by providing it with a few examples of the desired output.
For example, you could prime the LLM with a few examples of tax-related questions and answers. This would help the LLM learn how to answer tax-related questions accurately.
Here are some tips for writing effective system messages:
Be clear and concise.
Use simple language that the LLM can understand.
Be specific about what you want the LLM to do.
Provide the LLM with as much context as possible.
Use few-shot learning examples to prime the LLM.
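Putting these tips together, here is a hedged sketch of how the entity-extraction system message above might be wired into a Chat Completions call (openai Python SDK; the model name and sample user text are placeholders):

```python
from openai import OpenAI
import json

client = OpenAI()

system_message = (
    "You are an assistant designed to extract entities from text. "
    "Users will paste in a string of text and you will respond with entities "
    "you've extracted from the text as a JSON object. Here's an example of "
    'your output format: {"name": "", "company": "", "phone_number": ""}'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Hi, this is Jane Doe from Contoso. You can reach me on 555-0100."},
    ],
)

# Because the system message pins down the output format, the reply can
# usually be parsed directly (add error handling for production use).
entities = json.loads(response.choices[0].message.content)
print(entities)
```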
6. Few-Shot Learning
Few-shot learning is a type of machine learning where a model is trained to perform a task with only a few examples. This is in contrast to traditional machine learning, where models typically require large amounts of data to train.
Few-shot learning is particularly useful for tasks where it is difficult or expensive to obtain large amounts of labeled data. For example, it can be used to train models to perform tasks such as classifying images, translating languages, and answering questions.
In the context of language models, few-shot learning can be used to adapt a model to a new task by providing it with a few examples of the desired output. For example, you could use a few-shot learning prompt engineering technique to train a model to answer questions about a specific topic or to generate text in a specific style.
One way to use few-shot learning with the Chat Completions API is to provide the model with a series of messages between the User and Assistant. These messages can serve as examples for the model to learn from.
For example, the following prompt uses few-shot learning prompt engineering technique to train an assistant to answer tax-related questions:
System Message
Assistant is an intelligent chatbot designed to help users answer their tax related questions.
Instructions
Only answer questions related to taxes.
If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information.
Few-shot examples
User - "When do I need to file my taxes by?"
Assistant - "In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file"
User - "How can I check the status of my tax refund?"
Assistant - "You can check the status of your tax refund by visiting https://www.irs.gov/refunds "
Once the model is trained on these examples, it should be able to answer tax-related questions accurately, even if it has never seen those questions before.
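Here is a sketch of how those few-shot examples could be supplied as prior User/Assistant turns in a Chat Completions call (openai Python SDK; the model name and the final question are placeholders):

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": (
        "Assistant is an intelligent chatbot designed to help users answer "
        "their tax related questions. Only answer questions related to taxes. "
        "If you're unsure of an answer, say \"I don't know\" or \"I'm not sure\" "
        "and recommend users go to the IRS website for more information."
    )},
    # Few-shot examples: prior User/Assistant turns the model can imitate.
    {"role": "user", "content": "When do I need to file my taxes by?"},
    {"role": "assistant", "content": (
        "In 2023, you will need to file your taxes by April 18th. For more "
        "details, see https://www.irs.gov/filing/individuals/when-to-file"
    )},
    {"role": "user", "content": "How can I check the status of my tax refund?"},
    {"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"},
    # The new question, answered in the same style as the examples above.
    {"role": "user", "content": "When is the deadline to file an extension?"},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)  # placeholder model name
print(response.choices[0].message.content)
```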
7. Add Temperature and Top_p Parameters
The temperature parameter plays a crucial role in determining the output generated by the model. It can be adjusted within the range of 0 to 2. A higher value, such as 0.7, introduces more randomness into the output, leading to divergent responses. Conversely, a lower value, like 0.2, produces more focused and concrete responses.
Let's consider a scenario where we want to generate responses from a chatbot for a creative writing exercise and a legal document.
1. Creative Writing Exercise: Higher Temperature (e.g., 0.7):
If we set the temperature to 0.7, the chatbot's responses will be more imaginative and varied. For example, if asked to describe a magical forest, it might generate responses with colorful descriptions, mythical creatures, and imaginative landscapes. This higher temperature encourages creative and divergent thinking, which is suitable for creative writing.
2. Generating a Legal Document: Lower Temperature (e.g., 0.2):
When generating a legal document, precision and reliability are essential. Setting the temperature to a lower value, like 0.2, ensures that the chatbot produces responses that are focused, concrete, and strictly adhere to legal language and terminology. It minimizes the introduction of unnecessary randomness and ensures that the output is legally accurate.
The Top_p (top probability) parameter is another factor that influences the randomness of the model's responses, but it works differently from temperature: instead of rescaling token probabilities, it limits sampling to the smallest set of tokens whose cumulative probability reaches the chosen threshold. It is generally recommended to modify only one of these two parameters at a time, rather than adjusting both simultaneously, to achieve the desired output reliably.
Now, let's illustrate the use of the Top_p parameter in scenarios involving news reporting and storytelling.
1. News Reporting: Lower Top_p (e.g., 0.3):
In a news reporting context, where accuracy is paramount, setting a lower Top_p value, such as 0.3, restricts sampling to the most probable tokens, so the chatbot generates responses that are more deterministic and fact-based. This reduces the chances of producing misleading or speculative wording in a news article.
2. Storytelling: Higher Top_p (e.g., 0.8):
For storytelling or creative writing, a higher Top_p value, like 0.8, widens the pool of candidate tokens, allowing for unpredictability and creativity while still maintaining coherence. This lets the model introduce twists and turns in the plot, creating an engaging and dynamic narrative that surprises the reader without straying too far from the core storyline.
In both cases, whether it's adjusting the Temperature or the Top_p parameter, the choice depends on the specific requirements of the task. A higher temperature or Top_p value encourages creativity and divergence, while lower values promote precision and reliability.
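As a rough sketch (openai Python SDK, placeholder model name and prompts), the parameters are simply passed alongside the messages; note that only one of them is varied per request:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, **sampling) -> str:
    """Send a single-message prompt with the given sampling parameters."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        **sampling,
    )
    return response.choices[0].message.content

# Creative writing: higher temperature for more varied, imaginative output.
forest = ask("Describe a magical forest in three sentences.", temperature=0.7)

# Legal drafting: lower temperature for focused, consistent wording.
clause = ask("Draft a one-sentence confidentiality clause.", temperature=0.2)

# Or vary Top_p instead (not both at once): a lower value keeps only the most
# probable tokens for fact-based reporting, a higher value widens the pool.
brief = ask("Write a factual one-sentence news brief about the 2024 solar eclipse.", top_p=0.3)
tale = ask("Begin a fairy tale that opens with an unexpected twist.", top_p=0.8)
```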
8. Break down the task
Large language models (LLMs) often perform better when a task is divided into smaller steps. For instance, consider the following paragraph: "John Smith is married to Lucy Smith. They have five kids, and he works as a software engineer at Microsoft." To fact-check this information, the task can be broken down into the following factual claims:
FACTUAL CLAIMS
John Smith is married to Lucy Smith.
They have five kids.
He works as a software engineer at Microsoft.
Now, to fact-check these claims, we can issue specific search queries for each claim using the SEARCH function:
QUERIES
SEARCH("John Smith married Lucy Smith")
SEARCH("John Smith family size")
SEARCH("John Smith Microsoft software engineer")
This approach of breaking down the task into smaller steps allows for more targeted fact-checking and can be especially useful when dealing with larger pieces of text containing numerous factual claims, as it enhances the efficiency and accuracy of the fact-checking process.
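One way to implement this decomposition is to chain two smaller requests: first extract the claims, then generate one query per claim. A hypothetical sketch using the openai Python SDK (placeholder model name):

```python
from openai import OpenAI

client = OpenAI()

paragraph = (
    "John Smith is married to Lucy Smith. They have five kids, and he works "
    "as a software engineer at Microsoft."
)

# Step 1: ask only for the factual claims, one per line.
claims = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "List the factual claims in the user's paragraph, one per line."},
        {"role": "user", "content": paragraph},
    ],
).choices[0].message.content.splitlines()

# Step 2: ask for one search query per claim.
for claim in (c.strip() for c in claims if c.strip()):
    query = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Write a single web search query that would verify the user's claim."},
            {"role": "user", "content": claim},
        ],
    ).choices[0].message.content
    print(f'SEARCH("{query}")')
```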
9. Utilization of Affordances
At times, we can encourage the model to utilize an affordance rather than solely relying on its own parameters to provide information and answers. For instance, search functionality can serve as an affordance to help safeguard against fabricated responses and to obtain the most current information.
A straightforward way to employ an affordance is to halt the generation process once the affordance calls are generated by the model. Subsequently, we can insert the outcomes back into the prompt. Below is an illustration of a follow-up step that can be taken after performing the SEARCH calls as mentioned previously. Observe how the search results are integrated into the prompt, replacing the previous SEARCH calls.
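A minimal sketch of such a loop, assuming the openai Python SDK (run_search below is a hypothetical stand-in for a real search backend, and the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

def run_search(query: str) -> str:
    """Hypothetical stand-in for a real search backend; returns a snippet
    for the given query. Replace with an actual search API call."""
    return f"[snippet for: {query}]"

prompt = (
    "You will read a paragraph, extract factual claims, and then issue "
    "search queries to fact-check them.\n"
    "PARAGRAPH\n"
    "Mary Johnson is an acclaimed author who has published 10 novels, and "
    "she won the Nobel Prize for Literature in 2019.\n"
    "SEARCH QUERIES\n"
)

# First pass: let the model write its SEARCH queries, then stop before it
# starts inventing snippets on its own.
first = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    stop=["SNIPPETS"],
)
queries = [q.strip() for q in first.choices[0].message.content.splitlines() if q.strip()]

# Execute the affordance and splice the real results back into the prompt.
snippets = "\n".join(run_search(q) for q in queries)
second = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": (
        prompt + first.choices[0].message.content
        + "\nSNIPPETS\n" + snippets + "\nFACT-CHECKING RESULTS\n"
    )}],
)
print(second.choices[0].message.content)
```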
Here's another simple example illustrating the use of affordances for fact-checking:
System Message: You are an AI assistant that helps people find information. You will read a paragraph, extract factual claims, and then use search engine results to fact-check them.
PARAGRAPH
Mary Johnson is an acclaimed author who has published 10 novels, and she won the Nobel Prize for Literature in 2019. What search queries should I do to fact-check this?
FACTUAL CLAIMS
Mary Johnson is an acclaimed author.
Mary Johnson has published 10 novels.
Mary Johnson won the Nobel Prize for Literature in 2019.
SEARCH QUERIES
Mary Johnson acclaimed author
Mary Johnson novels published count
Mary Johnson Nobel Prize for Literature 2019
SNIPPETS
Here are snippets of the search results:
[1] Mary Johnson's latest novel received critical acclaim.
[2] Mary Johnson has authored a total of 10 novels in her career.
[3] Nobel Prize in Literature 2019 was awarded to Mary Johnson.
Fact-checking results:
Mary Johnson is an acclaimed author: Confirmed. Snippet [1] mentions Mary Johnson's latest novel received critical acclaim.
Mary Johnson has published 10 novels: Confirmed. Snippet [2] states that Mary Johnson has authored a total of 10 novels.
Mary Johnson won the Nobel Prize for Literature in 2019: Confirmed. Snippet [3] verifies that the Nobel Prize in Literature in 2019 was awarded to Mary Johnson.
In this example, we used search queries to fact-check the claims about Mary Johnson's accomplishments, and the search results provided confirmation for all the factual claims.
10. Provide Grounding Context
To obtain accurate and reliable answers from the model, it is highly effective to provide grounding data, which serves as a source of information for the model's responses. This is particularly crucial when the use case demands up-to-date and trustworthy information, as opposed to purely creative scenarios. The closer the source material aligns with the desired answer's final form, the less the model has to interpret or transform, reducing the chances of errors.
In the following example, the system is given a recent blog post about the launch of GPT-4 in Azure OpenAI Service and is tasked with naming some of its early customers:
Question: Who are some of the early customers of the Azure OpenAI Service GPT-4 models?
Answer: Here are some of the early customers of the Azure OpenAI Service GPT-4 models:
Epic Healthcare: They intend to utilize GPT-4 to streamline data entry and investigation processes for physicians and nurses.
Coursera: Coursera is harnessing Azure OpenAI Service with GPT-4 to create an innovative AI-powered learning experience on their platform.
Coca-Cola: Coca-Cola is leveraging Azure OpenAI to establish a knowledge hub and plans to harness GPT-4's multimodal capabilities for marketing, advertising, public relations, and customer relations.
Numerous other companies, spanning various sizes, are employing Azure AI to enhance customer experiences, summarize lengthy content, assist in software development, and mitigate risks by predicting tax data accurately.
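A sketch of how grounding data like that blog post might be supplied, assuming the openai Python SDK (the file name and model name below are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Grounding data: the source text the answer must come from
# (here read from a placeholder file containing the blog post).
with open("azure_openai_gpt4_announcement.txt") as f:
    article = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": (
            "Answer only using the article provided below. If the article "
            "does not contain the answer, say you don't know.\n\nARTICLE:\n" + article
        )},
        {"role": "user", "content": "Who are some of the early customers of the Azure OpenAI Service GPT-4 models?"},
    ],
)
print(response.choices[0].message.content)
```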
Best Practices
Below are the best practices in prompt engineering, providing insights and strategies to help you generate more accurate and context-aware AI-driven content.
Best Practice 1: Being specific and leaving as little to interpretation as possible
The more specific and descriptive you are in your prompt, the better the LLM will be able to understand what you want it to do.
For example,
Instead of: Write a poem.
Try: Write a poem about a cat who loves to chase yarn.
Best Practice 2: Restricting the operational space
This means limiting the number of possible outputs that the LLM can generate.
For example,
Specify the length of the output: Write a poem that is no more than 10 lines long.
Specify the style of the output: Write a poem in the style of a haiku.
Specify the tone of the output: Write a poem in a humorous tone.
Best Practice 3: Being descriptive
Provide the LLM with as much information as possible about the desired output. This will help the LLM to generate the output that you are looking for.
For example: Write a poem about a cat who loves to chase yarn. The poem should be written in a humorous tone and should be targeted at a children's audience.
Best Practice 4: Using analogies
Analogies can help the LLM to understand the desired output by comparing it to something else.
For example, Write a poem like the one written by Edgar Allan Poe.
Best Practice 5: Doubling down
If you are not satisfied with the output that the LLM generates, you can try to improve it by providing more feedback.
For example, Rewrite the poem to make it more descriptive.
Best Practice 6: Giving the model an "out."
Sometimes, it is helpful to give the LLM an "out" in case it is unable to generate the desired output.
For example, If you are unable to write a poem, please write a short story instead.
Best Practice 7: Being space efficient
Try to keep your prompts as concise as possible. This will help the LLM to process the prompt more efficiently.
For example, Write a poem about a cat who loves to chase yarn. (This prompt is more concise than the previous example, but it still provides the LLM with all of the necessary information.)
Conclusion
Prompt engineering is an evolving skill, adapting to AI advancements. Your ability to craft effective prompts empowers AI in various domains. By applying these techniques, you unlock the full potential of AI language models in content generation, chatbots, and beyond. Embrace these practices, iterate, and push the boundaries of what AI can achieve in your applications.