Prompting Change: Exploring Prompt Engineering in Large Language Model AI and Its Potential to Transform Education

  • Original Paper
  • Published: 18 October 2023
  • Volume 68, pages 47–57 (2024)

  • William Cain, ORCID: orcid.org/0000-0002-9814-228X

This paper explores the transformative potential of Large Language Model Artificial Intelligence (LLM AI) in educational contexts, focusing in particular on the emerging practice of prompt engineering. Prompt engineering, characterized by three essential components — content knowledge, critical thinking, and iterative design — emerges as a key mechanism for accessing the transformative capabilities of LLM AI in the learning process. This paper charts the evolving trajectory of LLM AI as a tool poised to reshape educational practices and assumptions. In particular, it breaks down the potential of prompt engineering practices to enhance learning by fostering personalized, engaging, and equitable educational experiences. The paper underscores how the natural language capabilities of LLM AI tools can help students and educators transition from passive recipients to active co-creators of their learning experiences. Critical thinking skills, particularly information literacy, media literacy, and digital citizenship, are identified as crucial for using LLM AI tools effectively and responsibly. Looking forward, the paper advocates for continued research to validate the benefits of prompt engineering practices across diverse learning contexts while simultaneously addressing potential defects, biases, and ethical concerns related to LLM AI use in education. It calls upon practitioners to explore and train educational stakeholders in best practices around prompt engineering for LLM AI, fostering progress towards a more engaging and equitable educational future.




Author information

Authors and Affiliations

Learning, Design, and Technology, College of Education, University of Wyoming, 1000 E. University Ave., Education 337, Laramie, WY, 82071, USA

William Cain


Corresponding author

Correspondence to William Cain .

Ethics declarations

Research Involving Human Participants and/or Animals

This research did not involve human participants or animals.

Informed Consent

This research did not include any studies involving human or animal participants; therefore, no consent to publish was required.

Conflict of Interest

The author declares no potential conflicts of interest in relation to the publication of this manuscript.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cain, W. Prompting Change: Exploring Prompt Engineering in Large Language Model AI and Its Potential to Transform Education. TechTrends 68 , 47–57 (2024). https://doi.org/10.1007/s11528-023-00896-0


Accepted: 04 October 2023

Published: 18 October 2023

Issue Date: January 2024

DOI: https://doi.org/10.1007/s11528-023-00896-0


Keywords

  • Large language models (LLM AI)
  • Prompt engineering
  • Educational transformation


🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

dair-ai/Prompt-Engineering-Guide

Prompt Engineering Guide

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, learning guides, lectures, references, and tools related to prompt engineering for LLMs.

🌐 Prompt Engineering Guide (Web Version)

We've partnered with Maven to deliver the following live cohort-based courses on prompt engineering:

LLMs for Everyone (Beginner) - learn about the latest prompt engineering techniques and how to effectively apply them to real-world use cases.

Prompt Engineering for LLMs (Advanced) - learn advanced prompt engineering techniques to build complex use cases and applications with LLMs.

Happy Prompting!

Announcements / Updates

  • 🎓 New course on Prompt Engineering for LLMs announced! Enroll here !
  • 💼 We now offer several services like corporate training, consulting, and talks.
  • 🌐 We now support 13 languages! Welcoming more translations.
  • 👩‍🎓 We crossed 3 million learners in January 2024!
  • 🎉 We have launched a new web version of the guide here
  • 🔥 We reached #1 on Hacker News on 21 Feb 2023
  • 🎉 The Prompt Engineering Lecture went live here

Join our Discord

Follow us on Twitter

Subscribe to our Newsletter

You can also find the most up-to-date guides on our new website https://www.promptingguide.ai/ .

  • Prompt Engineering - LLM Settings
  • Prompt Engineering - Basics of Prompting
  • Prompt Engineering - Prompt Elements
  • Prompt Engineering - General Tips for Designing Prompts
  • Prompt Engineering - Examples of Prompts
  • Prompt Engineering - Zero-Shot Prompting
  • Prompt Engineering - Few-Shot Prompting
  • Prompt Engineering - Chain-of-Thought Prompting
  • Prompt Engineering - Self-Consistency
  • Prompt Engineering - Generate Knowledge Prompting
  • Prompt Engineering - Prompt Chaining
  • Prompt Engineering - Tree of Thoughts (ToT)
  • Prompt Engineering - Retrieval Augmented Generation
  • Prompt Engineering - Automatic Reasoning and Tool-use (ART)
  • Prompt Engineering - Automatic Prompt Engineer
  • Prompt Engineering - Active-Prompt
  • Prompt Engineering - Directional Stimulus Prompting
  • Prompt Engineering - Program-Aided Language Models
  • Prompt Engineering - ReAct Prompting
  • Prompt Engineering - Multimodal CoT Prompting
  • Prompt Engineering - Graph Prompting
  • Prompt Engineering - Function Calling
  • Prompt Engineering - Generating Data
  • Prompt Engineering - Generating Synthetic Dataset for RAG
  • Prompt Engineering - Tackling Generated Datasets Diversity
  • Prompt Engineering - Generating Code
  • Prompt Engineering - Graduate Job Classification Case Study
  • Prompt Engineering - Classification
  • Prompt Engineering - Coding
  • Prompt Engineering - Creativity
  • Prompt Engineering - Evaluation
  • Prompt Engineering - Information Extraction
  • Prompt Engineering - Image Generation
  • Prompt Engineering - Mathematics
  • Prompt Engineering - Question Answering
  • Prompt Engineering - Reasoning
  • Prompt Engineering - Text Summarization
  • Prompt Engineering - Truthfulness
  • Prompt Engineering - Adversarial Prompting
  • Prompt Engineering - ChatGPT
  • Prompt Engineering - Code Llama
  • Prompt Engineering - Flan
  • Prompt Engineering - Gemini
  • Prompt Engineering - GPT-4
  • Prompt Engineering - LLaMA
  • Prompt Engineering - Mistral 7B
  • Prompt Engineering - Mixtral
  • Prompt Engineering - OLMo
  • Prompt Engineering - Phi-2
  • Prompt Engineering - Model Collection
  • Prompt Engineering - Factuality
  • Prompt Engineering - Biases
  • Prompt Engineering - Overviews
  • Prompt Engineering - Approaches
  • Prompt Engineering - Applications
  • Prompt Engineering - Collections
  • Prompt Engineering - Tools
  • Prompt Engineering - Notebooks
  • Prompt Engineering - Datasets
  • Prompt Engineering - Additional Readings

We have published a 1-hour lecture that provides a comprehensive overview of prompting techniques, applications, and tools.

  • Video Lecture
  • Notebook with code

Running the guide locally

To run the guide locally, for example to check the correct implementation of a new translation, you will need to:

  • Install Node >=18.0.0
  • Install pnpm if not present in your system. Check here for detailed instructions.
  • Install the dependencies: pnpm i next react react-dom nextra nextra-theme-docs
  • Boot the guide with pnpm dev
  • Browse the guide at http://localhost:3000/

Appearances

Some places where we have been featured:

  • Wall Street Journal - ChatGPT Can Give Great Answers. But Only If You Know How to Ask the Right Question
  • Forbes - Mom, Dad, I Want To Be A Prompt Engineer
  • MarkTechPost - Best Free Prompt Engineering Resources (2023)

If you are using the guide for your work or research, please cite us as follows:

MIT License

Feel free to open a PR if you think something is missing here. Always welcome feedback and suggestions. Just open an issue!


Nicolas Langrené¹, Shengxin Zhu²,¹

¹ Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai 519087, China
² Research Center for Mathematics, Beijing Normal University, No. 18 Jingfeng Road, Zhuhai 519087, Guangdong, China

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.

1 Introduction

In recent years, a significant milestone in artificial intelligence research has been the progression of natural language processing capabilities, primarily attributed to large language models (LLMs). Many popular models, rooted in the transformer architecture [ 1 ], undergo training on extensive datasets derived from web-based text. Central to their design is a self-supervised learning objective, which focuses on predicting subsequent words in incomplete sentences. Tools built on these models are often referred to as Artificial Intelligence-Generated Content (AIGC) tools, and their ability to generate coherent and contextually relevant responses is a result of this training process, in which they learn to associate words and phrases with their typical contexts.

LLMs operate by encoding the input text into a high-dimensional vector space, where semantic relationships between words and phrases are preserved. The model then decodes this representation to generate a response, guided by the learned statistical patterns [ 2 ] . The quality of the response can be influenced by various factors, including the prompt provided to the model, the model’s hyperparameters, and the diversity of the training data.

These models, such as GPT-3 [ 3 ] , GPT-4 [ 4 ] , along with many others (e.g., Google’s BARD [ 5 ] , Anthropic’s Claude2 [ 6 ] and Meta’s LLaMA-2 [ 7 ] ), have been utilized to revolutionize tasks ranging from information extraction to the creation of engaging content [ 8 ] . Related to AI systems, the application of LLMs in the workplace has the potential to automate routine tasks, such as data analysis [ 9 ] and text generation [ 10 ] , thereby freeing up time for employees to focus on more complex and rewarding tasks [ 11 ] . Furthermore, LLMs have the potential to revolutionize the healthcare sector by assisting in the diagnosis and treatment of diseases. Indeed, by analyzing vast amounts of medical literature, these models can provide doctors with insights into rare conditions, suggest potential treatment pathways, and even predict patient outcomes [ 12 ] . In the realm of education, LLMs can serve as advanced tutoring systems, and promote the quality of teaching and learning [ 13 ] . Those AI tools can analyze a student’s response, identify areas of improvement, and provide constructive feedback in a coherent and formal manner.

In real applications, the prompt is the input of the model, and its engineering can result in significant output difference [ 14 ] . Modifying the structure (e.g., altering length, arrangement of instances) and the content (e.g., phrasing, choice of illustrations, directives) can exert a notable influence on the output generated by the model [ 15 ] . Studies show that both the phrasing and the sequence of examples incorporated within a prompt have been observed to exert a substantial influence on the model’s behavior [ 15 , 16 ] .

The discipline of prompt engineering has advanced alongside LLMs. What originated as a fundamental practice of shaping prompts to direct model outputs has matured into a structured research area, replete with its distinct methodologies and established best practices. Prompt engineering refers to the systematic design and optimization of input prompts to guide the responses of LLMs, ensuring accuracy, relevance, and coherence in the generated output. This process is crucial in harnessing the full potential of these models, making them more accessible and applicable across diverse domains. Contemporary prompt engineering encompasses a spectrum of techniques, ranging from foundational approaches such as role-prompting [ 17 ] to more sophisticated methods such as “chain of thought” prompting [ 18 ] . The domain remains dynamic, with emergent research continually unveiling novel techniques and applications in prompt engineering. The importance of prompt engineering is accentuated by its ability to guide model responses, thereby amplifying the versatility and relevance of LLMs in various sectors. Importantly, a well-constructed prompt can counteract challenges such as machine hallucinations, as highlighted in studies by [ 19 ] and [ 20 ] . The influence of prompt engineering extends to numerous disciplines. For instance, it has facilitated the creation of robust feature extractors using LLMs, thereby improving their efficacy in tasks such as defect detection and classification [ 21 ] .

In this paper, we present a comprehensive survey on the prompt engineering of LLMs. The structure of the paper is organized as follows: Section 2 presents the foundational methods of prompt engineering, showcasing various results; it encompasses both basic and advanced techniques. Section 3 further explores advanced methodologies, including the use of external assistance. All examples are generated with a non-multimodal generative language model, the default GPT-4 developed by OpenAI. Section 4 discusses potential future directions in prompt engineering. Section 5 provides insights into prompt evaluation techniques, drawing comparisons between subjective and objective assessment methods. Finally, Section 6 focuses on the broader applications of prompt engineering across various domains.

2 Basics of prompt engineering

By incorporating just a few key elements, one can craft a basic prompt that enables LLMs to produce high-quality answers. In this section, we discuss some essential components of a well-made prompt.

2.1 Model introduction: GPT-4

All of the output in the following sections is generated by GPT-4, developed by OpenAI [ 4 ]. Vast amounts of text data have been used to train GPT-4, whose number of parameters is estimated to be substantially larger than the 175 billion parameters used for the earlier GPT-3 [ 3 ]. The architectural foundation of the model rests on transformers [ 1 ], which essentially are attention mechanisms that assign varying weights to input data based on the context. Similar to GPT-3, GPT-4 was also fine-tuned to follow a broad class of written instructions by reinforcement learning from human feedback (RLHF) [ 22 , 23 ], a technique that uses human preferences as a reward signal to fine-tune models.

When GPT-4 receives an input prompt, the input text will be firstly converted into tokens that the model can interpret and process. These tokens are then managed by transformer layers, which capture their relationships and context. Within these layers, attention mechanisms distribute different weights to tokens based on their relevance and context. After attention processing, the model forms its internal renditions of the input data, known as intermediate representations. These representations are then decoded back into human-readable text [ 24 ] .

A significant aspect of this process is the randomness function [ 25 ]. It is influenced by two primary parameters: temperature and top-k sampling. The first, temperature [ 26 ], balances randomness and determinism in the output: a higher temperature value results in more random outputs, while a lower value makes the output more deterministic. The second, top-k sampling [ 27 ], limits the model’s choices to the k most probable tokens at each step of output generation. The final stage of this process is output generation, where the model produces the final text.

2.2 Giving instructions

The method of giving instructions, also referred to as re-reading [ 28 ], draws on heuristics from human reading strategies. It has been observed that the output generated by GPT-4, introduced in Section 2.1, tends to be excessively general when the model is given basic instructions without any supplementary description [ 29 , 30 ]. An example prompt is shown in Figure 1. When the model is prompted with a basic instruction, it faces a plethora of options, which makes the result quite broad. Hence, a comprehensive description is imperative to elicit more precise and relevant outputs [ 31 ].

I want to understand some cutting-edge aspects of technology.

2.3 Be clear and precise

The second basic prompt method is “to be clear and precise”. This involves formulating prompts that are unambiguous and specific, which can guide the model toward generating the desired output.

Most LLM architectures are derived from an extensive array of textual data. It can be conceptualized as a combination of insights from a myriad of authors. When presented with a broad or undetailed prompt, its output predominantly exhibits a generic nature, which, while being applicable across a range of contexts, may not be optimal for any specific application. In contrast, a detailed and precise prompt enables the model to generate content that is more aligned with the unique requirements of the given scenario, as it reduces the model’s uncertainty and guides it toward the correct response.

For instance, as shown in Figure 2 , instead of asking a vague requirement such as “I want to understand the cutting edge of technology.”, a more precise prompt would be “I want to understand the cutting edge of technology, specifically related to artificial intelligence and machine learning…”.

I want to understand the cutting edge of technology, specifically related to artificial intelligence and machine learning. Recently, artificial intelligence has become extremely popular, especially the large language models which have amazed me. Please analyze the possible reasons behind the rapid advancement of this technology from at least three perspectives.

2.4 Role-prompting

Role-prompting is another fundamental method in prompt engineering. It involves giving the model a specific role to play, such as a helpful assistant or a knowledgeable expert [ 32 ] . This method can be particularly effective in guiding the model’s responses and ensuring that they align with the desired output. For instance, if the model is prompted to act as a historian, it is more likely to provide a detailed and contextually accurate response when asked about a historical event [ 33 ] . Another brief example is shown in Figure 3 .

You are an expert in artificial intelligence specializing in large language models…
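As a minimal illustration, role-prompting is commonly implemented by placing the role description in a system message. The sketch below assumes the openai Python package (v1-style client); the model name and prompt wording are illustrative, not the exact setup used in the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Role-prompting: the system message assigns the model a persona that
# conditions its answers for the rest of the conversation.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are an expert in artificial intelligence "
                    "specializing in large language models."},
        {"role": "user",
         "content": "Explain why attention mechanisms made large "
                    "language models practical."},
    ],
)
print(response.choices[0].message.content)
```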

2.5 Use of triple quotes to separate

In prompt engineering, the use of triple quotes is a technique used to separate different parts of a prompt or to encapsulate multi-line strings. This technique is particularly useful when dealing with complex prompts that include multiple components or when the prompt itself contains quotes, which makes the model understand one’s instructions better [ 34 ] .
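A minimal sketch of this delimiting technique follows; `generate` is a hypothetical helper standing in for whatever LLM call is available, and the instruction text is illustrative.

```python
def generate(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to any available LLM and return its reply."""
    ...

article = """Multi-line source text to be summarized.
It may itself contain "quotes" without breaking the prompt."""

# Triple quotes separate the material from the instruction, so the model can
# tell the text to be processed apart from the task description.
prompt = f'Summarize the text delimited by triple quotes in two sentences.\n\n"""{article}"""'

summary = generate(prompt)
```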

2.6 Try several times

Due to the non-deterministic nature of LLMs, it is often beneficial to try several times when generating responses. This technique, often referred to as “resampling”, involves running the model multiple times with the same prompt and selecting the best output. This approach can help overcome the inherent variability in the model’s responses and increase the chances of obtaining a high-quality output [ 26 ] .
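A sketch of resampling, again using the hypothetical `generate` helper; the selection rule (here, simply the longest answer) is only a placeholder for whatever quality criterion suits the task.

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping a sampling-based (non-deterministic) LLM call."""
    ...

def resample(prompt: str, n: int = 5) -> str:
    """Run the same prompt n times and keep one candidate.

    Because sampling is stochastic, repeated calls yield different outputs;
    choosing among them often beats a single draw.
    """
    candidates = [generate(prompt) for _ in range(n)]
    # Placeholder criterion: prefer the most detailed (longest) answer.
    return max(candidates, key=len)
```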

2.7 One-shot or few-shot prompting

One-shot and few-shot prompting are two important techniques in prompt engineering. One-shot prompting refers to the method where the model is given a single example to learn from, while few-shot [ 35 ] prompting provides the model with multiple examples [ 36 ] . The choice between one-shot and few-shot prompting often depends on the complexity of the task and the capability of the model. For instance, for simple tasks or highly capable models, one-shot prompting might be sufficient. An example is shown in Figure 4 . However, for more complex tasks or less capable models, few-shot prompting can provide additional context and guidance, thereby improving the model’s performance.

Q: Yes or no: would a pear sink in water?

A: Yes, a pear would typically float in water, not sink. The reason for this is that pears, like many fruits, are composed primarily of water and have a density that is less than that of water. Objects that are less dense than water will float when placed in water. ✗ (incorrect)

A: The answer is No. ✓ (correct)

However, as stated in [ 37 ] , “examples don’t always help”. [ 37 ] investigated the intricacies of how large generative language models, such as GPT-3, respond to prompts. One of the significant findings from this paper is that zero-shot prompts can, in certain scenarios, outperform few-shot prompts. This suggests that the role of few-shot examples might not be as much about teaching the model a new task (meta-learning) but rather guiding it to recall a task it has already learned. This insight is crucial as it challenges the conventional wisdom that more examples always lead to better performance [ 3 ] . In the context of one-shot or few-shot prompting, it is essential to understand that while examples can guide the model, they do not always enhance its performance. Sometimes, a well-crafted zero-shot prompt can be more effective than providing multiple examples [ 38 ] .
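In practice, a few-shot prompt is simply the worked examples concatenated ahead of the new query. A minimal sketch, with illustrative example content and the hypothetical `generate` helper:

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping any LLM call."""
    ...

# Worked examples establish the expected answer format (content is illustrative).
examples = [
    ("Yes or no: would a basketball float in water?", "Yes"),
    ("Yes or no: would a brick float in water?", "No"),
]
query = "Yes or no: would a pear sink in water?"

prompt = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
prompt += f"\n\nQ: {query}\nA:"

answer = generate(prompt)
```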

2.8 LLM settings: temperature and top-p

The settings of LLMs, such as the temperature and top-p, play a crucial role in the generation of responses. The temperature parameter controls the randomness of the generated output: a lower temperature leads to more deterministic outputs [ 39 , 40 ]. The top-p parameter, on the other hand, controls the nucleus sampling [ 26 ], which is a method to add randomness to the model’s output [ 41 ]. Adjusting these parameters can significantly affect the quality and diversity of the model’s responses, making them essential tools in prompt engineering. However, it has been noted that certain models, exemplified by ChatGPT, do not permit the configuration of these hyperparameters, barring instances where the Application Programming Interface (API) is employed.
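When the API is used, these hyperparameters are passed directly with the request. A sketch assuming the openai Python package; the parameter values, model name, and prompt are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# temperature: lower values make the output more deterministic.
# top_p: nucleus sampling; only tokens within the top 90% of cumulative
# probability mass are considered at each generation step.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Name three factors behind recent progress in LLMs."}],
    temperature=0.2,
    top_p=0.9,
)
print(response.choices[0].message.content)
```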

3 Advanced methodologies

The foundational methods from the previous section can help us produce satisfactory outputs. However, experiments indicate that when using LLMs for complex tasks such as analysis or reasoning, the accuracy of the model’s outputs still has room for improvement. In this section, we will further introduce advanced techniques in prompt engineering to guide the model in generating more specific, accurate, and high-quality content.

3.1 Chain of thought

The concept of “Chain of Thought” (CoT) prompting [ 18 ] in LLMs is a relatively new development in the field of AI, and it has been shown to significantly improve the accuracy of LLMs on various logical reasoning tasks [ 42 , 43 , 44 ] . CoT prompting involves providing intermediate reasoning steps to guide the model’s responses, which can be facilitated through simple prompts such as “Let’s think step by step” or through a series of manual demonstrations, each composed of a question and a reasoning chain that leads to an answer [ 45 , 46 ] . It also provides a clear structure for the model’s reasoning process, making it easier for users to understand how the model arrived at its conclusions.

[ 47 ] illustrates the application of CoT prompting to medical reasoning, showing that it can effectively elicit valid intermediate reasoning steps from LLMs. [ 48 ] introduces the concept of Self-Education via Chain-of-Thought Reasoning (SECToR), and argues that, in the spirit of reinforcement learning, LLMs can successfully teach themselves new skills by chain-of-thought reasoning. In another study, [ 49 ] used CoT prompting to train verifiers to solve math word problems, demonstrating the technique’s potential in educational applications. [ 50 ] proposed a multimodal version of CoT, called Multimodal-CoT, to handle more complex, multimodal tasks beyond simple text-based tasks, such as visual tasks, further expanding the potential applications of CoT.

3.1.1 Zero-shot chain of thought

The concept of “Zero-Shot Chain of Thought” (Zero-shot-CoT) prompting is an advanced iteration of the CoT prompting mechanism, where the “zero-shot” aspect implies that the model is capable of performing some reasoning without having seen any examples of the task during training.

In their research, [ 51 ] discovered that the augmentation of queries with the phrase “Let’s think step by step” facilitated the generation of a sequential reasoning chain by LLMs. This reasoning chain subsequently proved instrumental in deriving more precise answers. This technique is based on the idea that the model, much like a human, can benefit from having more detailed and logical steps to process the prompt and generate a response.

For instance, the standard prompt is illustrated in Figure 5 , while the appended phrase, “Let’s think step by step”, is depicted in Figure 6 . Observations indicate that the incorporation of “Let’s think step by step” enhances the logical coherence and comprehensiveness of the model’s response.

Imagine an infinitely wide entrance, which is more likely to pass through it, a military tank or a car?

Imagine an infinitely wide entrance, which is more likely to pass through it, a military tank or a car? Let’s think step by step.
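Programmatically, zero-shot CoT amounts to appending the trigger phrase to the query. A minimal sketch, using the hypothetical `generate` helper:

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping any LLM call."""
    ...

COT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the trigger phrase so the model writes out
    intermediate reasoning before committing to an answer."""
    return generate(f"{question}\n{COT_TRIGGER}")

answer = zero_shot_cot(
    "Imagine an infinitely wide entrance; which is more likely to pass "
    "through it, a military tank or a car?"
)
```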

3.1.2 Golden chain of thought

[ 52 ] introduced the “golden chain of thought”, providing an innovative approach to generating responses to instruction-based queries. This methodology leverages a set of “ground-truth chain-of-thought” solutions incorporated within the prompt, considerably simplifying the task for the model as it circumvents the necessity for independent CoT generation. Concurrently, a novel benchmark comprising detective puzzles has been designed, to assess the abductive reasoning capacities of LLMs, which is also considered an evaluation of the golden CoT. Finally, according to [ 52 ] ’s experiment, in the context of the golden CoT, GPT-4 exhibits commendable performance, boasting an 83% solve rate in contrast to the 38% solve rate of the standard CoT.

However, the fact that the golden CoT requires these “ground-truth chain-of-thought solutions” as an integral part of the prompt also signifies that its contribution to solving such problems is limited, despite its high solve rate of 83%.

3.2 Self-consistency

In the assessment of InstructGPT [ 53 ] and GPT-3 [ 3 ] on a new synthetic QA dataset called PrOntoQA (Proof and Ontology-Generated Question-Answering) [ 54 , 55 ], it was observed that although the most extensive model exhibited capability in reasoning tasks, it encountered challenges in proof planning and in selecting the appropriate proof step amidst multiple options, which led to uncertainty in accuracy [ 54 ]. Self-consistency in LLMs is an advanced prompting technique that aims to ensure the model’s responses are consistent with each other [ 56 , 18 ], which greatly increases the odds of obtaining highly accurate results. The principle behind it is that if a model is asked to answer a series of related questions, the answers should not contradict each other.

The self-consistency method contains three steps. Firstly, prompt a language model using CoT prompting, then replace the “greedy decode” (1-Best) [ 25 , 57 ] in CoT prompting by sampling from the language model’s decoder to generate a diverse set of reasoning paths, and finally, marginalize out the reasoning paths and aggregate by choosing the most consistent answer in the final answer set.

It is noteworthy that self-consistency can be harmoniously integrated with most sampling algorithms, including but not limited to temperature sampling [ 39 , 40 ], top-k sampling [ 58 , 59 , 25 ], and nucleus sampling [ 26 ]. Nevertheless, such an operation may necessitate invoking the model’s Application Programming Interface (API) to fine-tune these hyperparameters. In light of this, an alternative approach is to ask the model itself to generate a diverse set of candidate reasoning paths. The response demonstrating the highest degree of consistency across the various reasoning trajectories is then more likely to represent the accurate solution [ 60 ].
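A minimal sketch of the three steps — sample several CoT completions, extract an answer from each, and keep the most frequent one. The `generate` helper is hypothetical and the answer-extraction rule is a stand-in for whatever parsing the task requires.

```python
from collections import Counter

def generate(prompt: str) -> str:
    """Hypothetical helper wrapping a sampling LLM call (temperature > 0),
    so that repeated calls yield diverse reasoning paths."""
    ...

def extract_answer(reasoning: str) -> str:
    """Illustrative parser: assume the last line of the chain holds the answer."""
    return reasoning.strip().splitlines()[-1]

def self_consistency(question: str, n: int = 10) -> str:
    """Sample n CoT reasoning paths and return the majority-vote answer."""
    prompt = f"{question}\nLet's think step by step."
    answers = [extract_answer(generate(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```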

Studies have shown that self-consistency enhances outcomes in arithmetic, commonsense, and symbolic reasoning tasks [ 61 , 2 ] . Furthermore, in practice, self-consistency can be combined with other techniques to further enhance the model’s performance. For example, a study found that combining self-consistency with a discriminator-guided multi-step reasoning approach significantly improved the model’s reasoning capabilities [ 62 ] .

3.3 Generated knowledge

The “generated knowledge” [ 63 ] approach in prompt engineering is a technique that leverages the ability of LLMs to generate potentially useful information about a given question or prompt before generating a final response. This method is particularly effective in tasks that require commonsense reasoning, as it allows the model to generate and utilize additional context that may not be explicitly present in the initial prompt.

As exemplified in Figure 5 , when posing the query to the model, “Imagine an infinitely wide entrance, which is more likely to pass through it, a military tank or a car?”, standard prompts predominantly yield responses that neglect to factor in the “entrance height”. Conversely, as delineated in Figure 7 and Figure 8 , prompting the model to first generate pertinent information and subsequently utilizing generated information in the query leads to outputs with augmented logical coherence and comprehensiveness. Notably, this approach stimulates the model to account for salient factors such as “entrance height”.

Generate two key analyses related to detailed size data on military tanks and cars, and then generate three key influencing factors regarding whether an object can pass through an infinitely wide entrance.
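Generated knowledge is typically a two-stage pipeline: first ask the model for background facts, then feed those facts back together with the original question. A sketch, with the hypothetical `generate` helper and illustrative prompt wording:

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping any LLM call."""
    ...

def generated_knowledge(question: str) -> str:
    """Two-stage 'generated knowledge' prompting (sketch)."""
    # Stage 1: have the model produce background facts relevant to the question.
    knowledge = generate(
        "List the key facts and influencing factors relevant to the following "
        f"question, without answering it yet:\n{question}"
    )
    # Stage 2: answer the question with the generated knowledge as context.
    return generate(f"Knowledge:\n{knowledge}\n\nQuestion: {question}\nAnswer:")
```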

3.4 Least-to-most prompting

The concept of “least to most prompting” [ 64 ] is an advanced method that involves starting with a minimal prompt and gradually increasing its complexity to elicit more sophisticated responses from the language model. The foundational premise of this approach is the decomposition of intricate problems into a succession of more rudimentary subproblems, which are then sequentially addressed. The resolution of each subproblem is expedited by leveraging solutions derived from antecedent subproblems.

Upon rigorous experimentation in domains including symbolic manipulation, compositional generalization, and mathematical reasoning, findings from [ 64 ] substantiate that the least-to-most prompting paradigm exhibits the capacity to generalize across challenges of greater complexity than those initially presented in the prompts. They found that LLMs seem to respond effectively to this method, demonstrating its potential for enhancing the reasoning capabilities of these models.
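A sketch of the decompose-then-solve loop behind least-to-most prompting; `generate` is the hypothetical helper used above, and the prompt wording is illustrative.

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping any LLM call."""
    ...

def least_to_most(problem: str) -> str:
    """Least-to-most prompting (sketch): decompose, then solve in order."""
    # Step 1: decompose the problem into easier subproblems.
    plan = generate(
        "Break the following problem into a numbered list of simpler "
        f"subproblems, ordered from easiest to hardest:\n{problem}"
    )
    # Step 2: solve each subproblem, reusing earlier solutions as context.
    solved = ""
    for subproblem in filter(str.strip, plan.splitlines()):
        solved += generate(
            f"Problem: {problem}\nSolutions so far:\n{solved}\n"
            f"Now solve this subproblem: {subproblem}"
        ) + "\n"
    return solved
```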

3.5 Tree of thoughts

The “tree of thoughts” (ToT) prompting technique is an advanced method that employs a structured approach to guide LLMs in their reasoning and response generation processes. Unlike traditional prompting methods that rely on a linear sequence of instructions, the ToT method organizes prompts in a hierarchical manner, akin to a tree structure, allowing for deliberate problem-solving [ 65 ] . For instance, when tasked with solving a complex mathematical problem, a traditional prompt might directly ask LLMs for the solution. In contrast, using the ToT method, the initial prompt might first ask the model to outline the steps required to solve the problem. Subsequent prompts would then delve deeper into each step, guiding the model through a systematic problem-solving process.

[ 65 ] demonstrates that this formulation is more versatile and can handle challenging tasks where standard prompts might fall short. Another research by [ 66 ] further emphasizes the potential of this technique in enhancing the performance of LLMs by structuring their thought processes.

[ 5 ] introduces the “tree-of-thought prompting”, an approach that assimilates the foundational principles of the ToT frameworks and transforms them into a streamlined prompting methodology. This technique enables LLMs to assess intermediate cognitive constructs within a singular prompt. An exemplar ToT prompt is delineated in Figure  9 [ 5 ] .
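A hedged sketch of a single-prompt, ToT-style instruction in the spirit of [ 5 ]; the wording below is illustrative and not the exact prompt of the cited figure.

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping any LLM call."""
    ...

# Illustrative single-prompt ToT-style instruction: several "experts" reason
# step by step in parallel, discarding branches that turn out to be wrong.
TOT_STYLE_PROMPT = (
    "Imagine three different experts are answering this question. "
    "Each expert writes down one step of their thinking and shares it with "
    "the group, then the process repeats. If any expert realizes they are "
    "wrong at any point, they drop out. The question is: {question}"
)

answer = generate(TOT_STYLE_PROMPT.format(question="..."))
```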

3.6 Graph of thoughts

Unlike the “chain of thoughts” or “tree of thoughts” paradigms, the “graph of thoughts” (GoT) framework [ 67 ] offers a more intricate method of representing the information generated by LLMs. The core concept behind GoT is to model this information as an arbitrary graph. In this graph, individual units of information, termed “LLM thoughts”, are represented as vertices. The edges of the graph, on the other hand, depict the dependencies between these vertices. This unique representation allows for the combination of arbitrary LLM thoughts, thereby creating a synergistic effect in the model’s outputs.

In the context of addressing intricate challenges, LLMs utilizing the GoT framework might initially produce several autonomous thoughts or solutions. These individual insights can subsequently be interlinked based on their pertinence and interdependencies, culminating in a detailed graph. This constructed graph permits diverse traversal methods, ensuring the final solution is both precise and comprehensive, encompassing various dimensions of the challenge.

The efficacy of the GoT framework is anchored in its adaptability and the profound insights it can yield, particularly for intricate issues necessitating multifaceted resolutions. Nonetheless, it is imperative to recognize that while GoT facilitates a systematic approach to problem-solving, it also necessitates a profound comprehension of the subject matter and meticulous prompt design to realize optimal outcomes [ 68 ] .

3.7 Retrieval augmentation

Another direction of prompt engineering aims to reduce hallucinations. When using AIGC tools such as GPT-4, it is common to face a problem called “hallucinations”, which refers to the presence of unreal or inaccurate information in the model’s generated output [ 69 , 19 ]. While these outputs may be grammatically correct, they can be inconsistent with facts or lack real-world data support. Hallucinations arise because the model may not have found sufficient evidence in its training data to support its responses, or it may overly generalize certain patterns when attempting to generate fluent and coherent output [ 70 ].

An approach to reduce hallucinations and enhance the effectiveness of prompts is the so-called retrieval augmentation technique, which aims at incorporating up-to-date external knowledge into the model’s input [ 71 , 72 ]. It is emerging as an AI framework for retrieving facts from external sources. [ 73 ] examines the augmentation of context retrieval through the incorporation of external information. It proposes a sophisticated operation: the direct concatenation of pertinent information obtained from an external source to the prompt, which is subsequently treated as foundational knowledge for input into the expansive language model. Additionally, the paper introduces auto-regressive techniques for both retrieval and decoding, facilitating a more nuanced approach to information retrieval and fusion. This research demonstrates that in-context retrieval-augmented language models [ 73 ], when constructed upon readily available general-purpose retrievers, yield significant LLM enhancements across a variety of model dimensions and diverse corpora. In another study, [ 74 ] showed that GPT-3 can reduce hallucinations by studying various components of architectures such as Retrieval Augmented Generation (RAG) [ 75 ], Fusion-in-Decoder (FiD) [ 76 ], Seq2seq [ 77 , 78 , 79 ] and others. [ 80 ] developed Chain-of-Verification (CoVe) to reduce hallucinations, and suggest that equipping the verification execution step with tool use, such as retrieval augmentation, would likely bring further gains.
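In its simplest form, retrieval augmentation retrieves the passages most relevant to the query and concatenates them to the prompt. A sketch with a toy keyword retriever (a real system would use a search index or a dense embedding retriever) and the hypothetical `generate` helper:

```python
def generate(prompt: str) -> str:
    """Hypothetical helper wrapping any LLM call."""
    ...

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_answer(query: str, corpus: list[str]) -> str:
    """Retrieval-augmented prompting: ground the answer in retrieved passages."""
    context = "\n".join(retrieve(query, corpus))
    return generate(
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```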

3.8 Use plugins to polish the prompts

After introducing the detailed techniques and methods of prompt engineering, we now explore the use of some external prompt engineering assistants that have been developed recently and exhibit promising potential. Unlike the methods introduced previously, these instruments can help us polish the prompt directly. They are adept at analyzing user inputs and subsequently producing pertinent outputs within a context that they define themselves, thereby amplifying the efficacy of prompts. Some of the plugins provided by OpenAI are good examples of such tools [ 81 ].

In certain implementations, the definition of a plugin is incorporated into the prompt, potentially altering the output [ 82 ]. Such integration may impact the manner in which LLMs interpret and react to the prompts, illustrating a connection between prompt engineering and plugins. Furthermore, the laborious nature of intricate prompt engineering may be mitigated by plugins, which enable the model to more proficiently comprehend or address user inquiries without necessitating excessively detailed prompts. Consequently, plugins bolster the efficacy of prompt engineering while promoting enhanced user-centric efficiency. These tools, akin to packages, can be seamlessly integrated into Python and invoked directly [ 83 , 84 ]. Such plugins augment the efficacy of prompts by furnishing responses that are both coherent and contextually pertinent. For instance, the “Prompt Enhancer” plugin [ 85 ], developed by AISEO [ 86 ], can be invoked by starting the prompt with the word “AISEO” to let the AISEO prompt generator automatically enhance the LLM prompt provided. Similarly, another plugin, called “Prompt Perfect”, can be used by starting the prompt with “perfect” to automatically enhance the prompt, aiming for the “perfect” prompt for the task at hand [ 87 , 88 ].

4 Prospective methodologies

Several key developments on the horizon promise to substantially advance prompt engineering capabilities. In the following section, we analyze some of the most significant trajectories that are likely to shape the future of prompt engineering. By anticipating where prompt engineering is headed, developments in this field can be proactively steered toward broadly beneficial outcomes.

4.1 Better understanding of structures

One significant trajectory about the future of prompt engineering that emerges is the importance of better understanding the underlying structures of AI models. This understanding is crucial to effectively guide these models through prompts and to generate outputs that are more closely aligned with user intent.

At the heart of most AI models, including GPT-4, are complex mechanisms designed to understand and generate human language. The interplay of these mechanisms forms the “structure” of these models. Understanding this structure involves unraveling the many layers of neural networks, the various attention mechanisms at work, and the role of individual nodes and weights in the decision-making process of these models [ 89 ]. Deepening our understanding of these structures could lead to substantial improvements in prompt engineering. Misunderstanding the model may also lead to a lack of reproducibility [ recht2011hogwild ]. By understanding how specific components of the model’s structure influence its outputs, we could design prompts that more effectively exploit these components.

Furthermore, a comprehensive grasp of these structures could shed light on the shortcomings of certain prompts and guide their enhancement. Frequently, the underlying causes for a prompt’s inability to yield the anticipated output are intricately linked to the model’s architecture. For example, [ 16 ] found evidence of limitations in previous prompt models and questioned how much these methods truly understood the model.

Exploration of AI model architectures remains a vibrant research domain, with numerous endeavors aimed at comprehending these sophisticated frameworks. A notable instance is DeepMind’s “Causal Transformer” model [ melnychuk2022causal ] , designed to explicitly delineate causal relationships within data. This represents a stride towards a more profound understanding of AI model architectures, with the potential to help us design more efficient prompts.

Furthermore, a more comprehensive grasp of AI model architectures would also yield advancements in explainable AI. Beyond better prompt engineering, this would also foster greater trust in AI systems and promote their integration across diverse industries [ novakovsky2023obtaining ] . For example, while AI is transforming the financial sector, encompassing areas such as customer service, fraud detection, risk management, credit assessments, and high-frequency trading, several challenges, particularly those related to transparency, are emerging alongside these advancements [ bertucci2022deep , maple2023ai ] . Another example is medicine, where AI’s transformative potential faces similar challenges [ amann2020explainability , rajpurkar2022ai ] .

In conclusion, the trajectory toward a better understanding of AI model structures promises to bring significant advancements in prompt engineering. As research into these intricate systems deepens, we should be able to craft more effective prompts, understand the reasons behind prompt failures, and enhance the explainability of AI systems. This path holds the potential to transform how we interact with and utilize AI systems, underscoring its importance in the future of prompt engineering.

4.2 Agent for AIGC tools

The concept of AI agents has emerged as a potential trajectory in AI research [ ozturk2021does ] . In this brief section, we explore the relationship between agents and prompt engineering and project how agents might influence the future trajectory of AI-generated content (AIGC) tools. By definition, an AI agent comprises large models, memory, active planning, and tool use. AI agents are capable of remembering and understanding a vast array of information, actively planning and strategizing, and effectively using various tools to generate optimal solutions within complex problem spaces [ Seeamber2023IfOA ] .

The evolution of AI agents can be delineated into five distinct phases: models, prompt templates, chains, agents, and multi-agents. Each phase carries its specific implications for prompt engineering. Foundational models, exemplified by architectures such as GPT-4, underpin the realm of prompt engineering.

In particular, prompt templates offer an effective way of applying prompt engineering in practice [ 18 ] . By using these templates, we can create standardized prompts to guide large models, making the generated output more aligned with the desired outcome. The usage of prompt templates is a crucial step towards enabling AI agents to better understand and execute user instructions.
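To make the idea concrete, the sketch below shows a minimal prompt template built with nothing beyond the Python standard library; the template wording and field names are illustrative assumptions rather than a template drawn from the cited work.

```python
from string import Template

SUMMARY_TEMPLATE = Template(
    "You are an assistant that writes study notes.\n"
    "Summarize the following passage in exactly $n_bullets bullet points, "
    "using language suitable for $audience.\n\n"
    "Passage:\n$passage\n"
)

def build_prompt(passage: str, audience: str = "undergraduate students", n_bullets: int = 3) -> str:
    # Fixing the scaffold and varying only the fields keeps outputs consistent
    # across many inputs, which is the main practical benefit of prompt templates.
    return SUMMARY_TEMPLATE.substitute(passage=passage, audience=audience, n_bullets=n_bullets)

print(build_prompt("Prompt engineering guides LLM behaviour through carefully designed inputs."))
```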

AI agents amalgamate these methodologies and tools into an adaptive framework. Possessing the capability to autonomously modulate their behaviors and strategies, they strive to optimize both efficiency and precision in task execution. A salient challenge for prompt engineering emerges: devising and instituting prompts that adeptly steer AI agents toward self-regulation [ 16 ] .

In conclusion, the introduction of agent-based paradigms heralds a novel trajectory for the evolution of AIGC tools. This shift necessitates a reevaluation of established practices in prompt engineering and ushers in fresh challenges associated with the design, implementation, and refinement of prompts.

5 Assessing the efficacy of prompt methods

There are many different ways to evaluate the quality of the output. To assess the efficacy of current prompt methods in AIGC tools, evaluation methods can generally be divided into subjective and objective categories.

5.1 Subjective and objective evaluations

Subjective evaluations primarily rely on human evaluators to assess the quality of the generated content. Human evaluators can read the text generated by LLMs and score it for quality. Subjective evaluations typically include aspects such as fluency, accuracy, novelty, and relevance [ 26 ] . However, these evaluation methods are, by definition, subjective and can be prone to inconsistencies.

Objective evaluations, also known as automatic evaluation methods, use machine learning algorithms to score the quality of text generated by LLMs. Objective evaluations employ automated metrics, such as BiLingual Evaluation Understudy (BLEU) [ papineni2002 ] , which assigns a score to system-generated outputs, offering a convenient and rapid way to compare various systems and monitor their advancements. Other evaluations such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) [ lin2004rouge ] , and Metric for Evaluation of Translation with Explicit ORdering (METEOR) [ banerjee-lavie-2005-meteor ] , assess the similarity between the generated text and reference text. More recent evaluation methods, such as BERTScore [ zhang2020bertscore ] , aim to assess at a higher semantic level. However, these automated metrics often fail to fully capture the assessment results of human evaluators and therefore must be used with caution [ sai2022survey ] .
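To make these automatic metrics concrete, the short sketch below scores a generated sentence against a reference with BLEU and ROUGE. It assumes the third-party nltk and rouge-score packages are installed, and the reference and candidate strings are invented purely for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the patient should rest for two weeks after surgery"
candidate = "the patient needs two weeks of rest after the operation"

# BLEU: n-gram precision of the candidate against the reference.
bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE: recall-oriented overlap (here unigrams and longest common subsequence).
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}, ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```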

Subjective and objective evaluation methods each have their own advantages and disadvantages. Subjective evaluation tends to be more reliable than objective evaluation, but it is also more expensive and time-consuming. Objective evaluation is less expensive and quicker than subjective evaluation. For instance, despite numerous studies highlighting BLEU’s limited correlation with human assessments, its popularity has remained largely unchanged [ ananthakrishnan2007some , callison2006re ]. Ultimately, the best way to evaluate the quality of LLM output depends on the specific application [ stent2005evaluating ]. If quality is the most important factor, then using human evaluators is the better choice. If cost and time are the most important factors, then automatic evaluation methods are better.

5.2 Comparing different prompt methods

In the field of prompt engineering, previous work has mostly focused on designing and optimizing specific prompting methods, but systematic evaluation and comparison of different prompting approaches remains limited. Some models are increasingly used to grade, or “check”, the output of other models [ jain2023bring , wang2023pandalm ]. For instance, LLM-Eval [ lin2023llm ] was developed to evaluate open-domain conversations with LLMs. Such methods assess the performance of LLMs on various benchmark datasets [ kiela2021dynabench , dehghani2021benchmark ] and demonstrate their efficiency. Other studies experiment mainly on particular models or tasks and employ disparate evaluation metrics, restricting comparability across methods [ deng-etal-2022-rlprompt , zhou2022large ]. Nevertheless, recent research proposed a general evaluation framework called InstructEval [ Ajith2023InstructEvalSE ] that enables a comprehensive assessment of prompting techniques across multiple models and tasks. The InstructEval study reached the following conclusions: in few-shot settings, omitting prompts or using generic task-agnostic prompts tends to outperform other methods, with prompts having little impact on performance; in zero-shot settings, expert-written task-specific prompts can significantly boost performance, and automated prompts do not outperform simple baselines; and the performance of automated prompt generation methods is inconsistent, varying across models and task types and displaying a lack of generalization. InstructEval provides important reference points for prompt engineering and demonstrates the need for more universal and reliable evaluation paradigms to design optimal prompts.
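In practice, a systematic comparison reduces to running each prompting strategy over the same evaluation set and aggregating a shared metric. The rough harness below illustrates that loop; complete() is a hypothetical stand-in for an LLM call, the prompts and the single-item dataset are invented, and this is a sketch of the idea rather than a reimplementation of InstructEval.

```python
def complete(prompt: str) -> str:
    return "[model answer]"  # hypothetical LLM call

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip().lower() == gold.strip().lower())

PROMPT_METHODS = {
    "zero_shot": lambda q: f"Question: {q}\nAnswer:",
    "expert_task_specific": lambda q: (
        "You are an arithmetic tutor. Solve the problem step by step, "
        f"then give the final number.\nQuestion: {q}\nAnswer:"
    ),
}

dataset = [("What is 12 * 7?", "84")]  # illustrative evaluation set

for name, build_prompt in PROMPT_METHODS.items():
    scores = [exact_match(complete(build_prompt(q)), gold) for q, gold in dataset]
    print(name, sum(scores) / len(scores))
```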

6 Applications improved by prompt engineering

The output enhancements provided by prompt engineering techniques make LLMs more readily applicable to real-world problems. This section briefly discusses applications of prompt engineering in fields such as teaching, programming, and others.

[Figure 10: a course rubric generated by GPT-4 in response to a suitably designed prompt.]

6.1 Assessment in teaching and learning

The study [ tang2023ml4stem ] investigates the application of machine learning methods in young students’ education. In such a context, prompt engineering can facilitate the creation of personalized learning environments. By offering tailored prompts, LLMs can adapt to an individual’s learning pace and style. Such an approach allows for personalized assessments and educational content, paving the way for a more individual-centric teaching model. Recent advancements in prompt engineering suggest that AI tools can also cater to students with specific learning needs, thus fostering inclusivity in education [ 10.5555/3495724.3496249 ]. As a simple example, professors can draft rubrics or guidelines for a future course with the assistance of AI. As Figure 10 shows, when GPT-4 was asked to provide a rubric for a course, a suitable prompt enabled it to respond with a specific result that satisfies the requirement.
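A rubric of this kind can be requested with a prompt along the following lines. The wording is an invented illustration rather than the exact prompt behind Figure 10, and complete() is a hypothetical stand-in for a GPT-4 style API call.

```python
def complete(prompt: str) -> str:
    return "[generated rubric]"  # hypothetical GPT-4 style call

rubric_prompt = (
    "Act as a university instructor. Create a grading rubric for the final project "
    "of an introductory data science course. Use four criteria (problem framing, "
    "analysis, visualization, communication), each scored on a 1-5 scale, and "
    "describe what a 1, a 3, and a 5 look like for every criterion. Format the rubric as a table."
)

print(complete(rubric_prompt))
```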

Advancements in prompt engineering also open up the potential for automated grading in education. With the help of sophisticated prompts, LLMs can provide preliminary assessments, reducing the workload for educators while providing instant feedback to students [ ariely2023machine ]. Similarly, these models, when coupled with well-designed prompts, can analyze large amounts of assessment data, providing valuable insights into learning patterns and informing educators about areas that require attention or improvement [ nilsson2023gpt , schneider2023towards ].
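For preliminary automated grading, a prompt can bundle the rubric, the student answer, and an explicit output format, as in the sketch below. The prompt text and the complete() helper are illustrative assumptions, and any score produced this way should be treated as a draft for the educator to review.

```python
def complete(prompt: str) -> str:
    return "[score and feedback]"  # hypothetical LLM call

GRADING_PROMPT = """You are a teaching assistant producing a preliminary assessment.
Rubric:
{rubric}

Student answer:
{answer}

Return: (1) a score out of 10, (2) two sentences of feedback, (3) one concrete suggestion.
"""

def preliminary_grade(rubric: str, answer: str) -> str:
    # The educator reviews and, if needed, overrides the model's draft grade.
    return complete(GRADING_PROMPT.format(rubric=rubric, answer=answer))

print(preliminary_grade("Clarity (0-5), correctness (0-5)", "Photosynthesis converts light into chemical energy."))
```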

6.2 Content creation and editing

With controllable, improved input, LLMs have been used primarily for creative work such as content creation. The Pathways Language Model (PaLM) [ 57 ] and a prompting approach have been used to facilitate cross-lingual short story generation [ 10 ]. The Recursive Reprompting and Revision framework (Re³) [ yang2022re3 ] employs zero-shot prompting [ 51 ] with GPT-3 to craft a foundational plan including elements such as settings, characters, and outlines. Subsequently, it adopts a recursive technique, dynamically prompting GPT-3 to produce extended story continuations. As another example, Detailed Outline Control (DOC) [ yang2022doc ] aims at preserving plot coherence across extensive texts generated with the assistance of GPT-3. Unlike Re³, DOC employs a detailed outliner and a detailed controller. The detailed outliner first dissects the overarching outline into subsections through a breadth-first method, in which candidate generations for these subsections are produced, filtered, and subsequently ranked. This process is similar to the chain-of-thought method (see Section 3.1 ). Throughout this generation process, an OPT-based Future Discriminators for Generation (FUDGE) [ yang2021fudge ] detailed controller plays a crucial role in maintaining relevance.
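The recursive reprompting pattern can be sketched in a few lines: prompt once for a plan, then repeatedly prompt for the next passage conditioned on the plan and the story so far. This is a loose illustration of the general idea, not the Re³ or DOC implementation, and complete() is again a hypothetical LLM call.

```python
def complete(prompt: str) -> str:
    return "[generated text] "  # hypothetical LLM call

def generate_story(premise: str, n_passages: int = 3) -> str:
    plan = complete(f"Write a brief plan (setting, characters, outline) for a story about: {premise}")
    story = ""
    for _ in range(n_passages):
        # Each continuation sees the plan plus everything written so far, which is
        # what keeps long generations anchored to the original outline.
        story += complete(
            f"Plan:\n{plan}\n\nStory so far:\n{story}\n\n"
            "Write the next passage, staying consistent with the plan."
        )
    return story

print(generate_story("a lighthouse keeper who discovers a message in a bottle"))
```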

6.3 Computer programming

Prompt engineering can help LLMs produce better programming code. By using a self-debugging prompting approach [ 46 ], which combines simple feedback, unit-test, and code-explanation prompt modules, a text-to-SQL model [ Elgohary2020SpeakTY ] can keep refining a solution until it can state that the solution is correct or the maximum number of attempts has been reached. As another example, the Multi-Turn Programming Benchmark (MTPB) [ nijkamp2022codegen ] was constructed to implement a program by breaking it into multi-step natural language prompts.
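The self-debugging loop amounts to: generate code, run the tests, and feed any failure back into the next prompt until the tests pass or an attempt budget is exhausted. The sketch below shows that control flow with a hypothetical complete() call and a toy user-supplied test; it illustrates the pattern rather than the exact prompts of the cited work.

```python
def complete(prompt: str) -> str:
    return "def solution(x):\n    return x * 2"  # hypothetical code-generating LLM call

def run_tests(code: str):
    # Toy stand-in for real unit tests; returns (passed, error_message).
    return ("x * 2" in code, "expected solution(x) to double its input")

def self_debug(task: str, max_attempts: int = 3) -> str:
    code = complete(f"Write a Python function named solution for the task:\n{task}")
    for _ in range(max_attempts):
        passed, error = run_tests(code)
        if passed:
            return code
        # Feed the failure back so the model can explain and repair its own code.
        code = complete(
            f"Task:\n{task}\n\nYour previous code:\n{code}\n\nIt failed with:\n{error}\n\n"
            "Explain the bug briefly, then output the corrected code only."
        )
    return code

print(self_debug("double the input number"))
```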

Another approach is provided in [ shrivastava2023repository ], which introduced the Repo-Level Prompt Generator (RLPG) to dynamically retrieve relevant repository context and construct a prompt for a given task, especially code auto-completion. The most suitable prompt is selected by a prompt proposal classifier and combined with the default context to generate the final output.

6.4 Reasoning tasks

AIGC tools have shown promising performance on reasoning tasks. Several previous studies found that few-shot prompting can enhance performance in generating accurate reasoning steps for word-based math problems in the GSM8K dataset [ 49 , 57 , 44 , 56 ]. Strategies that include reasoning traces in the prompt, such as few-shot prompting [ 35 ], self-talk [ shwartz2020unsupervised ], and chain-of-thought [ 18 ], were shown to encourage the model to generate verbalized reasoning steps. [ uesato2022solving ] conducted experiments involving prompting strategies, various fine-tuning techniques, and re-ranking methods to assess their impact on enhancing the performance of a base LLM. They found that a customized prompt, combined with fine-tuning, significantly improved the model’s ability and produced substantially fewer reasoning errors. In other research, [ 51 ] observed that zero-shot CoT prompting alone leads to a significant enhancement in the performance of GPT-3 and PaLM compared with conventional zero-shot and few-shot prompting. This improvement is particularly noticeable when evaluating these models on the MultiArith [ roy2015solving ] and GSM8K [ 49 ] datasets. [ li-etal-2023-making ] also introduced a novel prompting approach called Diverse Verifier on Reasoning Step (DIVERSE). This approach uses a diverse set of prompts for each question and incorporates a trained verifier that is aware of the reasoning steps, with the primary aim of enhancing the performance of GPT-3 on various reasoning benchmarks, including GSM8K. All these works show that, for reasoning tasks, properly customized prompts can obtain better results from the model.
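At its simplest, the gap between conventional zero-shot prompting and zero-shot chain-of-thought prompting is a single appended instruction, as the sketch below shows. The math problem is an invented example and complete() is a hypothetical LLM call.

```python
def complete(prompt: str) -> str:
    return "[model reasoning and answer]"  # hypothetical LLM call

question = "A classroom has 4 rows of 7 desks and 5 desks are empty. How many desks are occupied?"

plain_zero_shot = f"Q: {question}\nA:"
zero_shot_cot = f"Q: {question}\nA: Let's think step by step."

# The chain-of-thought variant nudges the model to verbalize intermediate steps
# before committing to a final answer, which tends to help on GSM8K-style problems.
print(complete(plain_zero_shot))
print(complete(zero_shot_cot))
```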

6.5 Dataset generation

LLMs possess the capability of in-context learning, which enables them to be prompted effectively to generate synthetic datasets for training smaller, domain-specific models. [ ding2022gpt ] put forth three distinct prompting approaches for training data generation with GPT-3: unlabeled data annotation, training data generation, and assisted training data generation. In addition, the approach in [ yooetal2021gpt3mix ] is designed to generate supplementary synthetic data for classification tasks: GPT-3 is prompted with real examples from an existing dataset along with a task specification, with the goal of jointly creating synthetic examples and pseudo-labels.
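A minimal sketch of this kind of synthetic data generation is shown below, assuming a hypothetical complete() call; the label set, seed examples, and JSON-based output format are illustrative choices rather than the procedure of the cited works.

```python
import json

def complete(prompt: str) -> str:
    return '[{"text": "Reception was quick and helpful.", "label": "positive"}]'  # hypothetical LLM call

SEED_EXAMPLES = [
    {"text": "The staff were friendly and the wait was short.", "label": "positive"},
    {"text": "My appointment was cancelled twice without notice.", "label": "negative"},
]

def generate_synthetic(n: int = 20) -> list:
    # Real examples plus a task specification are placed in the prompt so the model
    # can produce new labeled examples in the same format (in-context learning).
    prompt = (
        "You create training data for a sentiment classifier of customer feedback.\n"
        f"Here are labeled examples:\n{json.dumps(SEED_EXAMPLES, indent=2)}\n\n"
        f"Generate {n} new, diverse examples as a JSON list of objects with "
        '"text" and "label" fields. Output JSON only.'
    )
    return json.loads(complete(prompt))

print(generate_synthetic(5))
```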

7 Conclusion

In this paper, we present a comprehensive overview of prompt engineering techniques and their instrumental role in refining Large Language Models (LLMs). We detail both foundational and advanced methodologies in prompt engineering, illustrating their efficacy in directing LLMs toward targeted outputs. We also analyze retrieval augmentation and plugins, which can further augment prompt engineering. We discuss broader applications of prompt engineering, highlighting its potential in sectors such as education and programming. We finally cast a forward-looking gaze on the future avenues of prompt engineering, underscoring the need for a deeper understanding of LLM architectures and the significance of agent-based paradigms. In summary, prompt engineering has emerged as a critical technique for guiding and optimizing LLMs. As prompt engineering becomes increasingly ubiquitous, we hope that this paper can lay the groundwork for further research.

8 Acknowledgement

This work was funded by the Natural Science Foundation of China (12271047); Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (2022B1212010006); UIC research grant (R0400001-22; UICR0400008-21; UICR0700041-22; R72021114); Guangdong College Enhancement and Innovation Program (2021ZDZX1046).



Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial

Bertalan Meskó

1 The Medical Futurist Institute, Budapest, Hungary

Associated Data

Higher resolution version of Figure 1.

Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help in various tasks. With the emergence of LLMs, the most popular one being ChatGPT that has attracted the attention of over a 100 million users in only 2 months, artificial intelligence (AI), especially generative AI, has become accessible for the masses. This is an unprecedented paradigm shift not only because of the use of AI becoming more widespread but also due to the possible implications of LLMs in health care. As more patients and medical professionals use AI-based tools, LLMs being the most popular representatives of that group, it seems inevitable to address the challenge to improve this skill. This paper summarizes the current state of research about prompt engineering and, at the same time, aims at providing practical recommendations for the wide range of health care professionals to improve their interactions with LLMs.

The Emergence of Large Language Models and Prompt Engineering

With the emergence of large language models (LLMs), with the most popular one being ChatGPT that has attracted the attention of over a 100 million users in only 2 months, artificial intelligence (AI), especially generative AI has become accessible for the masses [ 1 ]. This is an unprecedented paradigm shift not only because of the use of AI becoming more widespread but also due to the possible implications of LLMs in health care [ 2 ].

Numerous studies have shown what medical tasks and health care processes LLMs can contribute to in order to ease the burden on medical professionals, increase efficiency, and decrease costs [ 3 ].

Health care institutions have started investing in generative AI, medical companies have started integrating LLMs into their businesses, medical associations have released guidelines about the use of these models, and medical curricula have also started covering this novel technology [ 4 - 6 ]. Thus, a new, essential skill has emerged: prompt engineering.

Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of LLMs to help in various tasks. It is essentially the practice of effectively interacting with AI systems to optimize their benefits.

In the context of medical professionals and health care in general, this could encompass the following:

  • Decision support: medical professionals can use prompt engineering to optimize AI systems to aid in decision-making processes, such as diagnosis, treatment selection, or risk assessment.
  • Administrative assistance: prompts can be engineered to facilitate administrative tasks, such as patient scheduling, record keeping, or billing, thereby increasing efficiency.
  • Patient engagement: prompt engineering can be used to improve communication between health care providers and patients. For example, AI systems can be designed to send prompts for medication reminders, appointment scheduling, or lifestyle advice.
  • Research and development: in research scenarios, prompts can be crafted to assist in tasks such as literature reviews, data analysis, and generating hypotheses.
  • Training and education: prompts can be engineered to facilitate the education of medical professionals, including ongoing training in the latest treatments and procedures.
  • Public health: on a larger scale, prompt engineering can assist in public health initiatives by helping analyze population health data, predict disease trends, or educate the public.

Prompt engineering, therefore, has the potential to improve the efficiency, accuracy, and effectiveness of health care delivery, making it an increasingly important skill for medical professionals.

This paper summarizes the current state of research on prompt engineering and, at the same time, aims at providing practical recommendations for the wide range of health care professionals to improve their interactions with LLMs.

The State of Prompt Engineering

The use of LLMs, especially ChatGPT, comes with major limitations and risks. First, since ChatGPT is not updated in real time and its training data only include information up to November 2021, it may lack crucial, up-to-date medical research or changes in clinical guidelines, potentially impacting the quality and relevance of its responses. Furthermore, ChatGPT cannot access or process individual user data or context, which limits its ability to provide personalized medical advice and increases the risk of data misinterpretation.

There is also a crucial need for users to verify every single response from ChatGPT with a qualified health care professional, as the model's answers are generated on the basis of patterns in the data it was trained on and may not be accurate or safe.

The model's inability to empathize or deliver sensitive information may also result in a subpar patient experience. Importantly, potential breaches of patient confidentiality could violate privacy laws such as the Health Insurance Portability and Accountability Act of 1996 in the United States. Despite its potential as an assistive tool, these limitations necessitate careful consideration of its application in health care [ 7 ].

While these risks are significant, the potential outcomes can outweigh them; therefore, the need for improving at designing better prompts has grown extensively since the launch of ChatGPT.

There have been attempts at addressing this issue. One study aimed at designing a catalogue of prompt engineering techniques, presented in pattern form, which have been applied to solve common problems when conversing with LLMs [ 8 ]. Another study provided a summary of the latest advances in prompt engineering for a very specific audience, researchers working in natural language processing for the medical domain, or academic writers [ 9 , 10 ]. One study introduced the potential of an AI system to generate health awareness messages through prompt engineering [ 11 ].

While there is research in the field, it is clear that there have been no comprehensive, yet practical guides for medical professionals. This is the gap that this paper aims to fill.

How to Improve at Prompt Engineering

As in the case of any essential skill, becoming better at prompt engineering would involve an improved understanding of the fundamental principles of the technology, gaining practical exposure to systems using the technology, and continually refining and iterating the skill based on feedback.

The following are some concrete steps that a health care professional can take to improve their skills in prompt engineering:

  • Understanding the underlying principles of how AI and machine learning models work can provide a foundation on which to build prompt engineering skills. As shown, it is possible to gain that understanding without any prior technical or coding knowledge [ 12 ].
  • Familiarizing themselves with the LLMs they are working with as each system has its own set of capabilities and limitations. Understanding both can help craft more effective prompts.
  • Practice makes perfect; therefore, interacting with LLMs regularly and making a note of the prompts that yield the most helpful and accurate results can have benefits.

It is also important to constantly test prompts in real-world scenarios as their effectiveness is best evaluated in practical application.

Specific Recommendations for Better LLM Prompts

Besides these general approaches, here is a summary of specific recommendations with practical examples that a health care professional might want to consider to improve their skills in prompt engineering. Figure 1 summarizes these recommendations, their examples with ChatGPT’s key terms, limitations, and the most popular plugins.

Figure 1. A cheat sheet of prompt engineering recommendations for health care professionals with examples for each: ChatGPT’s key terms and their explanations, its limitations, and its most popular plugins. A high resolution version is attached as Multimedia Appendix 1.

Be as Specific as Possible

The more specific the prompt, the more accurate and focused the response is likely to be. The following is an example prompt:

  • Less specific: “Tell me about heart disease.”
  • More specific: “What are the most common risk factors for coronary artery disease?”

Describe the Setting and Provide the Context Around the Question

One must consider the discussion one is having with ChatGPT as a discussion one would have with a person they just met, who might still be able to answer their questions and address one’s challenges.

The following is an example prompt: “I'm writing an article about tips and tricks for ChatGPT prompt engineering for people working in healthcare. Can you please list a few of those tips and tricks with some specific prompt examples?”

Experiment With Different Prompt Styles

The style of one’s prompt can significantly impact the answer. One can try different formats such as asking ChatGPT to generate a list about their brief or to provide a summary of the topic. The following is an example:

  • Direct question: “What are the symptoms of COVID-19?”
  • Request for a list: “List all the potential symptoms of COVID-19.”
  • Request for a summary: “Summarize the key symptoms and progression of COVID-19.”
  • Process: “Provide a step-by-step process of diagnosing COVID-19.”

Identify the Overall Goal of the Prompt First

Describe exactly what kind of output is being sought. Whether it would be getting creative ideas for an article, asking for a specific description of an advanced scientific topic, or providing a list of examples around questions, defining it helps ChatGPT come up with more relevant answers. The following is an example: “I'd like to get a list of 5 ideas for a presentation at a scientific event to make my research findings more easily understandable.”

Ask it to Play Roles

This can help streamline the desired process of obtaining the information or input one was looking for in a specific setting. For new topics where one has no prior knowledge, it is prudent to first obtain only a basic description; in addition, one can also ask ChatGPT to act as a tutor and help dive into a detailed topic step-by-step. The following are a couple of examples:

  • “Act as a Data Scientist and explain Prompt Engineering to a physician.”
  • “Act as my nutritionist and give me tips about a balanced Mediterranean diet.”
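For readers who reach an LLM through an API rather than the chat interface, the role-playing recommendation can be applied programmatically as well. The sketch below is a minimal illustration: the complete() helper is a hypothetical stand-in for a ChatGPT-style API call, and the role and task wording are invented examples rather than prompts from this tutorial.

```python
def complete(prompt: str) -> str:
    return "[model response]"  # hypothetical stand-in for a ChatGPT-style API call

role = "an experienced clinical pharmacist"
task = "explain which drug interactions to watch for when a patient on warfarin is prescribed a new antibiotic"

# Combining a role, a specific task, and an explicit audience in a single prompt
# applies several of the recommendations above at once.
prompt = f"Act as {role}. In plain language suitable for a newly qualified nurse, {task}."
print(complete(prompt))
```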

Iterate and Refine

Even if one’s skills in prompt engineering are advanced, LLMs change so dynamically that one rarely gets the best response one is looking for after the first prompt attempt. Constantly iterating on prompts is something to which we should become accustomed. Users of LLMs are also encouraged to ask the LLM to modify its output based on feedback on its previous response.

Use the Threads

One can navigate back to a specific discussion by clicking on the specific thread in the left column on ChatGPT’s dashboard. This way, one can build upon the details and responses one has already received in a previous thread. This can save a lot of time as there is no need to describe the same situation and all the feedback ChatGPT has received on its responses.

Ask Open-Ended Questions

Open-ended questions can provide a broader, more comprehensive understanding of the user's situation. For instance, asking “How do you feel?” rather than “Do you feel pain?” allows for a wider array of responses that can potentially provide more insight into the patient's mental, emotional, or physical state. Open-ended questions can also help to generate a larger data set for training AI models, making them more effective. Lastly, asking open-ended questions allows ChatGPT to display its potential better by leveraging its training on a diverse range of topics. This can lead to more unexpected and creative solutions or ideas that a health care professional might not have thought of. The following is an example:

  • Closed question: “Is exercise important for patients with osteoporosis?”
  • Open question: “How does regular physical activity benefit patients with osteoporosis?”

Request Examples

Asking for specific examples can help to clarify the meaning of a concept or idea, making it easier to understand. Especially with complex medical terminology or procedures, examples can provide a practical context that aids comprehension. Also, examples often help in visualizing abstract or complicated ideas. When ChatGPT provides examples, it can showcase how a certain concept or rule is applied in different scenarios. This can be beneficial in health care, where theoretical knowledge needs to be connected to real-world applications.

Temporal Awareness

This refers to the model's understanding of time-related concepts and its ability to generate contextually relevant responses based on time. Therefore, describing what time line the prompt and the desired output would be related to helps LLMs provide a more useful answer. The following is an example:

  • Without a time reference: “Describe the healing process after knee surgery.”
  • With a time reference: “What can a patient typically expect during the first six weeks of healing after knee surgery?”

Set Realistic Expectations

Knowing the limitations of AI tools such as ChatGPT is crucial, as it helps set realistic expectations about the output. For instance, ChatGPT cannot access any data or information after November 2021; it cannot provide personalized medical advice or replace a professional’s judgement. The following is an example:

  • Unrealistic prompt: “What's the latest research published this month about Alzheimer's?”
  • Realistic prompt: “What were some of the major research breakthroughs in Alzheimer's treatment up until 2021?”

Use the One-Shot/Few-Shot Prompting Method

The one-shot prompting method is one in which ChatGPT can generate an answer based on a single example or piece of context provided by the user. The following is an example:

  • Generate 10 possible names for a new digital stethoscope device.
  • A name that I like is DigSteth.

With the few-shot strategy, ChatGPT can generate an answer based on a few examples or pieces of context provided by the user. The following is an example:

  • Stethoscope

Prompting for Prompts

One of the easiest ways of improving at prompt engineering is asking ChatGPT to get involved in the process and design prompts for the user. The following is an example: “What prompt could I use right now to get a better output from you in this thread/task?”

Conclusions

As the skill of prompt engineering has gained significant interest worldwide, especially in the health care setting, it would be important to include teaching the practical methods this paper described in the medical curriculum and postgraduate education. While the technical details and background of generative AI will probably be included in future curricula, it would be useful for medical students to learn the most practical tips of using LLMs even before that happens.

The general message for every LLM user should be that they could use such AI tools to expand their knowledge, capabilities, and ideas rather than to have the tools solve things on their behalf. Ideally, this approach and mindset would stem from trained medical professionals who could share it with their patients.

In summary, as more patients and medical professionals use AI-based tools—LLMs being the most popular representatives of that group—it seems inevitable to address the challenge to improve at this skill. Furthermore, as doing so does not require any technical knowledge or prior programming expertise, prompt engineering alone can be considered an essential emerging skill that helps leverage the full potential of AI in medicine and health care.

Acknowledgments

I used the generative AI tool GPT-4 (OpenAI) [ 1 ] during the ideation process to make sure the paper covers every possible prompt engineering suggestion of value. During that process, I tested the prompt engineering recommendations I made in the paper through imaginary scenarios.


Conflicts of Interest: None declared.

Prompts Matter: Insights and Strategies for Prompt Engineering in Automated Software Traceability



Exploring Prompt Engineering Practices in the Enterprise

13 Mar 2024 · Michael Desmond, Michelle Brachman

Interaction with Large Language Models (LLMs) is primarily carried out via prompting. A prompt is a natural language instruction designed to elicit certain behaviour or output from a model. In theory, natural language prompts enable non-experts to interact with and leverage LLMs. However, for complex tasks and tasks with specific requirements, prompt design is not trivial. Creating effective prompts requires skill and knowledge, as well as significant iteration in order to determine model behavior, and guide the model to accomplish a particular goal. We hypothesize that the way in which users iterate on their prompts can provide insight into how they think prompting and models work, as well as the kinds of support needed for more efficient prompt engineering. To better understand prompt engineering practices, we analyzed sessions of prompt editing behavior, categorizing the parts of prompts users iterated on and the types of changes they made. We discuss design implications and future directions based on these prompt engineering practices.




References

  • Introducing ChatGPT. OpenAI. URL: https://openai.com/blog/chatgpt [accessed 2023-09-25]
  • Mesko B. The ChatGPT (generative artificial intelligence) revolution has made artificial intelligence approachable for medical professionals. J Med Internet Res. Jun 22, 2023;25:e48392. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). Mar 19, 2023;11(6) [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Adams K. Google Cloud, Mayo Clinic Strike Generative AI Partnership. MedCity News. 2023. URL: https://medcitynews.com/2023/06/google-cloud-mayo-clinic-generative-ai-llm-healthcare/ [accessed 2023-09-25]
  • Lunden I. Nabla, a digital health startup, launches Copilot, using GPT-3 to turn patient conversations into action. TechCrunch. 2023. URL: https:/​/techcrunch.​com/​2023/​03/​14/​nabla-a-french-digital-health-startup-launches-copilot-using-gpt-3-to-turn-patient-conversations-into-actionable-items/​ [accessed 2023-09-25]
  • Taulli T. Generative AI: How ChatGPT and Other AI Tools Will Revolutionize Business. Berkeley, CA. Apress; 2023.
  • Meskó B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med. Jul 06, 2023;6(1):120. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Schmidt DC, Spencer-Smith J, Fu Q, White J. Cataloging Prompt Patterns to Enhance the Discipline of Prompt Engineering. URL: https://www.dre.vanderbilt.edu/~schmidt/PDF/ADA_Europe_Position_Paper.pdf [accessed 2023-09-25]
  • Wang J, Shi E, Yu S, Wu Z, Ma C, Dai H, et al. Prompt engineering for healthcare: Methodologies and applications. arXiv. Preprint posted online April 28, 2023.
  • Giray L. Prompt engineering with ChatGPT: a guide for academic writers. Ann Biomed Eng. Jun 07, 2023 [ CrossRef ] [ Medline ]
  • Lim S, Schmälzle R. Artificial intelligence for health message generation: an empirical study using a large language model (LLM) and prompt engineering. Front Commun. May 26, 2023;8 [ CrossRef ]
  • Meskó B, Görög M. A short guide for medical professionals in the era of artificial intelligence. NPJ Digit Med. Sep 24, 2020;3(1):126. [ FREE Full text ] [ CrossRef ] [ Medline ]


Prompt Engineering Guide

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).

Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

Prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs. It's an important skill for interfacing with, building with, and understanding the capabilities of LLMs. You can use prompt engineering to improve the safety of LLMs and build new capabilities like augmenting LLMs with domain knowledge and external tools.

Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, advanced prompting techniques, learning guides, model-specific prompting guides, lectures, references, new LLM capabilities, and tools related to prompt engineering.
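For instance, a few-shot prompt for arithmetic reasoning can be assembled by placing a couple of worked examples before a new question. The short Python sketch below is only illustrative; the examples and formatting are hypothetical, and the resulting string would be sent to whichever LLM you use.

# A minimal sketch of few-shot prompting for arithmetic reasoning.
# The demonstrations and the final question are illustrative placeholders.

demonstrations = [
    ("Q: A basket has 3 apples. I add 4 more. How many apples are there?",
     "A: 3 + 4 = 7. The answer is 7."),
    ("Q: A shelf holds 12 books. 5 are removed. How many remain?",
     "A: 12 - 5 = 7. The answer is 7."),
]

new_question = "Q: A jar has 9 marbles. I take out 2 and add 6. How many marbles are in the jar?"

def build_few_shot_prompt(demos, question):
    # Worked examples first, so the model imitates the step-by-step answer format.
    blocks = [f"{q}\n{a}" for q, a in demos]
    blocks.append(f"{question}\nA:")
    return "\n\n".join(blocks)

print(build_few_shot_prompt(demonstrations, new_question))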



How to Prompt LLMs for Text-to-SQL

  • Generative AI
  • Large Language Models
  • Paper Readings

Sarah Welsh

Contributor

Introduction

For this paper read, we’re joined by Shuaichen Chang, now an Applied Scientist at AWS AI Lab and author of this week’s paper to discuss his findings. Shuaichen’s research (conducted at the Ohio State University) investigates the impact of prompt constructions on the performance of large language models (LLMs) in the text-to-SQL task, particularly focusing on zero-shot, single-domain, and cross-domain settings. Shuaichen and his co-author explore various strategies for prompt construction, evaluating the influence of database schema, content representation, and prompt length on LLMs’ effectiveness. The findings emphasize the importance of careful consideration in constructing prompts, highlighting the crucial role of table relationships and content, the effectiveness of in-domain demonstration examples, and the significance of prompt length in cross-domain scenarios.

  • Read the original paper
  • See more paper readings

Overview of the Research

Amber Roberts: All right, looks like folks are starting to join. I'll start out with introductions. My name is Amber. I'm an ML Growth Lead. I work a lot with the marketing and engineering teams to create content and material around what we're doing here at Arize and with our open source tool, Phoenix.

I’m very happy to have Shuaichen on this community paper reading. Would you mind giving an introduction?

Shuaichen Chang: Yeah, sure. Hi, everyone. I’m Shuaichen Chang. I’m a research scientist at the AWS AI lab. And before this, I was doing my PhD at the Ohio State University. So my research is about natural language questions around structured data, including databases and tabular data.

So I’m very happy to be here today, and I’m excited for our discussion.

Amber Roberts: Excellent! Alright. Well, Shuaichen, you mentioned you had some background information that you're going to start with, and then we can get into the paper, and then the follow-up work and its applications, because I know you mentioned that, working in industry, you're interested in the applications of your research.

Shuaichen Chang: Yeah, sounds good. So let me give a brief introduction. Let me share my screen. 

Amber Roberts: And for anyone that has questions while you get that up: you can ask your questions in the chat or in the Q&A, or even in the Arize community Slack, especially if you have questions that we don't have time for during this live session. Shuaichen's in the Arize community Slack and can answer those afterwards.

Shuaichen Chang: I’ll  give about, let’s say a 5 or 10 minute introduction. But you can interrupt me if you have any questions. 

Okay, cool. So the paper is called How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-Domain, and Cross-Domain Settings. We focus on the zero-shot, single-domain, and cross-domain settings. The text-to-SQL model is a component in a larger natural language interface to a structured data system. That's pretty much what my work is about.

So basically, for such a system, we have a human user. The human user can interact with data by asking natural language questions. The data here can be a table, a relational database, or even some charts or images. And the text-to-SQL model is the model behind the AI agent, which is able to understand the user's question as well as the data and generate the appropriate response.

To look at the detail of what a text-to-SQL model does: it takes the natural language question, which is our "NLQ" here, as well as a relational database as input, and it outputs a SQL query.

The SQL query will be executed against the database to obtain the final answer. In this case the database is about players and games, so a question might ask for the last name and rank of the youngest winners across all matches. By translating this question into a SQL query, which is then executed against the database, we get the last names of the youngest winners across matches.

So in the past year or two, we have seen how fast large language models have developed in almost every application. 

LLMs for Text-to-SQL with In-context Learning

So on the left [in this chart above] is the performance on a text-to-SQL dataset we used called Spider. We have seen some fine-tuning-based models, which are trained on 7,000 examples. And on the right we can see the performance of some large language models with in-context learning, which means the model is not trained on the dataset's training set but just prompted with zero or a few shots–usually with fewer than 32 demonstration examples–and those methods can be comparable to or even better than the fine-tuning-based methods.

Those studies basically try to enhance large language models' performance with different approaches. Some of them do retrieval; some do intermediate reasoning. They also use different strategies for constructing the prompt text of databases and demonstrations, and I will show how they do that later.

This leaves us with two drawbacks. One, it's hard to compare two works on their main contributions when their prompt constructions differ. And two, future work still has to explore effective prompt constructions for the text-to-SQL task, which is essentially prompt engineering.

Amber Roberts: So, for the models that you show here, do those provide their prompt templates so that you can at least compare them, like, by eye?

Shuaichen Chang: Yeah, yeah. So the problem is that the prompts are different in every work, if they do any kind of in-context learning. Part of the prompt reflects part of the contribution in their work–for example, what they propose about how to retrieve demonstrations that really improve the model performance. But besides that, there's something fundamental for every text-to-SQL paper: how you convert a structured database into unstructured text for the language model, because we know these models are trained to predict the next word, so they only observe a sequential input instead of a structured input. So even with a lot of work on the first part, the second part is still understudied–how to convert the structured database into unstructured text.

Amber Roberts: Okay, got it. Thanks. 

Shuaichen Chang: So this is our focus in this paper. Our goal is to study how to represent the structured database in the unstructured prompt. We study three common scenarios–you may have questions about the difference between them, but we can go into that a little bit later. We also want to study how to construct the demonstrations for the cross-domain setting.

The zero-shot setting is the most straightforward. Imagine the model is a SQL expert: you can just give the model a database and a question, and simply expect the model to return the answer to your question given the database. In this setting, the prompt text contains the database information, including the database schema and database content, as well as an instruction. The instruction is usually pretty simple and says what we expect the model to do given the input, and then there's the actual input question, which here is: how many high schoolers are there? It's a database about high school students, I believe. And we expect the model to generate a SQL query that would be able to get the answer from the database.

So this is the most straightforward setting. 

In this case we have different ways to represent the database information, because the question is how to represent structured information as unstructured text. The most important thing about a database is what tables are there and what columns are in each table.

This information is necessary in almost every work, but different works have found different ways to represent additional information. Besides the basic table information, a relational database also links information across multiple tables through foreign keys: for example, in the Friend table, student_id is a key referring to the ID in the Highschooler table. Such information can also be included in the database prompt. Moreover, some papers proposed representing all this information in the CREATE TABLE format, which is basically the SQL statements used to create the database; this becomes the input to the language model to represent the information in the database.

Besides the database schema, we have the database content. For content, we can include a few rows for each table–say, three rows per table using INSERT statements. Inserting more rows that way simply gives us more rows per table; alternatively, we propose to do it column-wise, listing example values for each column, to increase the model's understanding of what the possible values in each column are.

So, to sum up what we found in the zero-shot setting: the table schema is always necessary, but when we add the table relationships–that is, represent the foreign keys–we consistently see that the model outperforms the prompt that does not use the relationships. So relationships are very helpful for the model to understand the database. Moreover, we found that adding the database content is even more beneficial, but the model doesn't favor all ways of providing that content equally.
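To make this construction concrete, here is a minimal Python sketch of how such a zero-shot text-to-SQL prompt might be assembled: an instruction, a CREATE TABLE schema with foreign keys, a little database content, and the question. The table names and sample rows are illustrative placeholders, not the exact prompt used in the paper.

# A minimal sketch of a zero-shot text-to-SQL prompt: instruction + schema
# (with foreign keys) + a little database content + the question.
# Table names and rows are illustrative placeholders.

schema = """CREATE TABLE Highschooler (
    ID int primary key,
    name text,
    grade int
);
CREATE TABLE Friend (
    student_id int,
    friend_id int,
    foreign key (student_id) references Highschooler(ID),
    foreign key (friend_id) references Highschooler(ID)
);"""

# Content shown row-wise here; the paper also considers a column-wise format.
content = """/* Example rows:
Highschooler: (1510, 'Jordan', 9), (1689, 'Gabriel', 9), (1381, 'Tiffany', 9)
Friend: (1510, 1689), (1510, 1381) */"""

question = "How many high schoolers are there?"

prompt = (
    "-- Translate the question into a SQL query for the database below.\n"
    f"{schema}\n{content}\n"
    f"-- Question: {question}\n"
    "SELECT"  # ending with SELECT nudges a completion-style model to write the query
)
print(prompt)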

Now let’s move to the single domain setting.

So people say: well, the zero-shot setting already works so well, why do we need a single-domain setting? With single-domain, basically, you can imagine you have an application where you know what queries have been asked in the past. Knowing the questions that have been asked in the past definitely helps the model understand the database and what questions are more likely to be asked in the future, and therefore it improves its handling of future questions.

So, compared to the zero-shot setting, we just have one additional thing, which is the demonstrations. The demonstrations are pairs of questions and SQL queries placed before the test question. We find that, with in-domain demonstrations, model performance keeps improving as the number of in-domain demonstrations increases. This definitely shows how powerful the demonstrations are. Moreover, we find the model relies less on the table relationships when we have more examples. Remember in zero-shot there were these two parallel lines [referring to chart below] with a gap between them; once we have a few examples, they basically merge. So the model doesn't need the table relationships from the database prompt; it can learn them from the in-domain examples.

Single-domain Text-to-SQL Results

However, we find such in-domain examples don't provide enough information about the database content, so the language model still prefers a prompt that contains the database content directly. But with some in-domain examples, the language model is no longer sensitive to the choice of prompt representation for the database content.
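As a rough illustration of the single-domain setting described here, the sketch below prepends in-domain question/SQL pairs to the prompt before the test question; the schema string and the demonstration pairs are hypothetical placeholders rather than the paper's exact format.

# A minimal sketch of a single-domain text-to-SQL prompt: in-domain
# demonstrations (question/SQL pairs from the same database) appear
# before the test question. All names and pairs below are illustrative.

schema = "CREATE TABLE Highschooler (ID int primary key, name text, grade int);"

in_domain_demos = [
    ("What are the names of all high schoolers?",
     "SELECT name FROM Highschooler;"),
    ("How many high schoolers are in grade 9?",
     "SELECT count(*) FROM Highschooler WHERE grade = 9;"),
]

test_question = "How many high schoolers are there?"

def build_single_domain_prompt(schema, demos, question):
    # Schema first, then demonstration pairs, then the test question.
    demo_text = "\n".join(f"-- Question: {q}\n{sql}" for q, sql in demos)
    return f"{schema}\n{demo_text}\n-- Question: {question}\nSELECT"

print(build_single_domain_prompt(schema, in_domain_demos, test_question))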

Amber Roberts: And just to clarify on the table here: that's the Codex language model, and ChatGPT. This is probably a future question, but have you compared this to GPT-4 or Llama 2, some of the larger models that tend to do really well with context and structure like this?

Shuaichen Chang: Yeah, that’s a good question. We actually determined that with some open source models which actually, now, don’t perform as well as ChatGPT. But I personally have never tried GPT-4 in this experiment–it could be pretty expensive to run this with GPT-4.

But in general, my personal takeaway is that this kind of information about the database is not something the model can naturally acquire as it becomes more powerful. It has to be exposed to the information of this specific database to perform better on those tasks.

So I believe in-domain demonstrations will always be necessary, even if models become more powerful and much better developed in the future.

Amber Roberts: Okay, yeah. That makes sense. And then, in terms of the accuracy here, for how correct a query is: are you comparing a SQL query that you or someone on your team has written–one that is correct and runs and executes–against the query that has been generated by the LLM? And then how do you get that 82% accuracy, for example?

Shuaichen Chang: Yes. So the accuracy is based on execution results. We have a database containing a lot of rows, so a SQL query executed against it yields a set of answers to the question. To decide whether a predicted SQL query is correct, we execute it against the database to get a set of answers and compare that with the answers from the original gold query, because the same intent can be written as a SQL query in many different ways.
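A minimal sketch of this execution-match style of evaluation, assuming a small in-memory SQLite database; the table, rows, and queries are illustrative placeholders rather than the paper's actual evaluation harness.

import sqlite3

# Execution-match accuracy: a predicted query counts as correct if it returns
# the same result set as the gold query when run against the database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Highschooler (ID int, name text, grade int);
    INSERT INTO Highschooler VALUES (1, 'Jordan', 9), (2, 'Tiffany', 10);
""")

def execution_match(predicted_sql: str, gold_sql: str) -> bool:
    # Compare result sets (order-insensitive) of predicted vs. gold SQL.
    try:
        pred = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # queries that fail to execute are counted as wrong
    gold = conn.execute(gold_sql).fetchall()
    return sorted(map(repr, pred)) == sorted(map(repr, gold))

print(execution_match(
    "SELECT count(ID) FROM Highschooler;",   # model's prediction
    "SELECT count(*) FROM Highschooler;",    # gold query
))  # True: different SQL text, same execution result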

Amber Roberts: Yeah. As someone who has used SQL and almost gotten it correct–almost gotten all the data I need from a database–it's interesting, because I sometimes say: well, it returned data, I must have gotten it right. But then there's one area, or one line, that didn't work, and you just didn't get that portion of the data.

Shuaichen Chang: Yeah. There's actually a pretty interesting research track of using the execution result to have the model correct itself in a future prediction. This is not included in this work, but it's definitely a very good direction to explore.

So then we move to the last part of this work, which is the cross-domain setting. You may want to know what the cross-domain setting is for. Basically, you can assume that the database doesn't come with any annotated examples. You can imagine a user uploading a table or relational database and another user asking questions about it, but nobody has annotated any questions or SQL queries for that database before. This is very similar to the zero-shot setting, but the difference is that people found that using demonstrations from another domain or another database can actually enhance the model's performance. It's not as good as in-domain, but it's definitely better than zero-shot. That's why a lot of people are exploring this direction. In this case, we construct the prompt using those demonstrations.

The demonstrations correspond to a different database than the test database. So we ultimately want to answer questions about the high-school database, but we can use another database–here, the track database–and questions about that database to improve the performance of the language model on the test database.

So first of all, we want to know whether this provides any benefit; if not, why not just switch back to the zero-shot setting?

Compared to not using any examples, which gives about 73% accuracy, we can see that having some out-of-domain demonstrations can improve the model's performance. In the out-of-domain demonstrations we consider different numbers of databases, and each database can come with a different number of examples.

So the question is: for a given number of databases, we see that having more databases first improves the performance, but there is a sort of threshold for the Codex model, and after we cross this threshold the model's performance starts to drop. This was a little bit surprising, because adding more databases to the demonstration only adds databases ahead of the existing ones. If the model doesn't like a specific demonstration, it can simply ignore it, so at the very least the model should be able to maintain its performance when given a longer context or more demonstration examples. But we find the model's performance starts to drop significantly when we have more examples.

So we analyzed this phenomenon and found it's actually related to the context length–how many tokens we have in the context. Codex has a maximum context length, as provided by OpenAI, of 8,000 tokens, and we found that once the demonstrations reached about 5,500 tokens the performance of the model started to drop. We also experimented with ChatGPT with a 16K context.

That model supports 16K tokens of context, and, interestingly, we find a similar phenomenon: performance drops after the context reaches about 11K tokens. So if you think about it, for both models only about 70% of the context is actually useful for the SQL demonstrations.

It could be true for other models, but I haven't experimented with that myself. At least for these models from OpenAI, we see a similar trend.
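One practical reading of this finding is to cap the demonstration portion of the prompt well below the context window. The sketch below is a rough illustration of that idea, assuming a crude whitespace-based token estimate in place of a real tokenizer; the 70% budget fraction echoes the observation above but is otherwise an arbitrary choice.

# A minimal sketch of packing demonstrations under a token budget, motivated
# by the observation that performance degraded once demonstrations exceeded
# roughly 70% of the context window. Token counting here is a crude proxy.

def rough_token_count(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def pack_demonstrations(demos, context_limit=8000, budget_fraction=0.7):
    # Greedily add demonstrations until ~70% of the context is used.
    budget = int(context_limit * budget_fraction)
    packed, used = [], 0
    for demo in demos:
        cost = rough_token_count(demo)
        if used + cost > budget:
            break
        packed.append(demo)
        used += cost
    return packed

# Usage: demos could be schema+question+SQL blocks from other databases.
example_demos = ["CREATE TABLE t (a int); -- Question: ... SELECT ..."] * 2000
print(len(pack_demonstrations(example_demos)))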

Amber Roberts: So with just the one example per database you essentially get that improvement, but then it kind of stops adding to the improvement as you add more and more examples?

Shuaichen Chang: Yeah, I believe if the model had an unlimited context, we would see this line keep going up and eventually become similar to the lines above this one. But since the database schema is very long and has a lot of tokens, it's very easy–say we have a threshold line here–to cross it and make the performance drop before it actually improves further.

Amber Roberts: I still find that pretty impressive–still getting most queries correct even with just one example per database. That was better than my accuracy when I was first querying databases with SQL.

Shuaichen Chang: Yeah. But those models used a lot of training data.

Amber Roberts: Yeah, true.

Shuaichen Chang: Yeah. In general, after we found this in the text-to-SQL task, I also saw other people describing similar phenomena in different tasks and settings: the model may not be able to leverage the full context. So I think it could be general, but at least in this task we see this phenomenon with multiple existing models.

 So now let’s back to the question we care about from the beginning, like how to construct the database prompt. So we find that the table’s relationship to the content is still important. Even with out of domain demonstration. So we can see by comparing this  different part of the table, we can say it’s still better with the relationship accounts to provide the best performance even with a lot of auto domain examples. 

It makes some sense to me, because the table relationships and the database content are very database-specific. So even when providing some examples from other databases, the model can learn what a relationship means in general, but it still has no idea about this particular database's relationships and content. Overall, I really hope this work can serve as a kind of handbook for future studies: when people want to work on text-to-SQL with in-context learning, they don't have to explore all the different combinations of how to construct the database prompt, the schema, and the content. We can spend less time on those things and focus on how to improve the general text-to-SQL capability of these models.

Amber Roberts: Yes, I agree with that. And the other thing I was going to ask, because someone in the community mentioned that they tackled a similar problem–I think this was a while ago, right when transformers and NLP were really taking off, probably around 2018, when these tasks were first being framed as: oh, you don't just have to write a rule-based system for text-to-SQL, you can actually use NLP for these problems. And a lot of people were thinking: oh, we could get to 100% accuracy, you know, it's pretty structured. And we're still not there. I'm curious: what do you think is the biggest thing blocking that, like going from 90 to 100% performance?

Shuaichen Chang: So, in my opinion, if you look at what the in-domain examples provide: the model–right now we're still not at 90, but let's say in a few years we have GPT-10 or whatever–could possibly do pretty well and understand the question perfectly, but something would still be missing about what the database means. Some databases you can assume are well designed and well formatted, so the model can understand what's going on in the database without any annotated questions. But in industry, a lot of databases have, let's say, arbitrary tokens in a column name; people use internal knowledge to name some columns or tables. In that case, since the data is private and hopefully not in any model's training data, why would we expect the model to understand that specific database? It basically cannot. So say that instead of a column named name or grade or ID, we have a column named A or B–a random token. We don't really know what's in that column. So we definitely need, for example, in-domain examples, or some explanation of that column, for the model to understand it.

So I feel like getting from 90 to 100 is not about how we prompt the model; it's that no human could do it without further information, right? In that case, I feel like the next step is maybe how we can easily add what we call domain knowledge into the model–for example, by providing the domain knowledge along with the database, or finding some other way to, say, generate some examples using that domain knowledge–so the model understands the database better first and then answers questions about the database.

Amber Roberts: Okay, yeah.  That makes sense. I buy it. [Laughs] 

Shuaichen Chang: Well, this is actually pretty close to what we’ll be trying to do in the next one. If we have time, we can briefly talk about that?

Amber Roberts: Yeah, I think we have 15 min left. And yeah, pretty much all the images that you used on here were the ones I was gonna ask you about in the paper. So yeah, if you wanna go to your main paper takeaways we could talk about what you’re working on now in applications.

Shuaichen Chang: Yeah. But before that, do we have more questions?

Amber Roberts: Let’s see I think I think we got them all. Yeah, I think we got them all. I will have Sarah check in the community Slack. But from what I see from the chat, they’ve been answered in the questions that I’ve asked.

Shuaichen Chang: Okay, cool. So basically, the next one is inspired by what makes the in-domain demonstrations so useful. This is another work from this year, called "Selective Demonstrations for Cross-domain Text-to-SQL." I personally really like the cross-domain setting because you assume that you don't have any examples–it doesn't rely on any actual annotated in-domain examples. In this case the database provider can be a non-SQL expert and the database user can be a non-SQL expert. So basically, anyone can use the system if we create a good cross-domain text-to-SQL model.

But before we get to the cross-domain setting, we want to understand what the magic ingredient in the in-domain demonstrations is. This is what we found on the Spider dataset, which is the same as what we talked about before, and on KaggleDBQA, another dataset that also has cross-domain text-to-SQL data but is a bit more challenging than Spider. In the zero-shot setting the Codex model performs pretty well on Spider and not as well on KaggleDBQA, but having in-domain examples significantly boosts the performance on both: Spider from 36 to 84, and KaggleDBQA from 30 to just below 70.

So we want to know what the in-domain demonstrations actually did here.

The two questions we care a lot about here are: first, what are the key factors within the in-domain annotated examples that contribute to the performance improvement? And second, can we harness the benefit of these in-domain examples without actually using the annotations?

There are three different aspects within the in-domain demonstration examples. The first is the text-to-SQL task format and knowledge, which is the general knowledge about this task. The second is the in-domain natural language question distribution, and the last is the in-domain SQL query distribution. We have seen that providing the task knowledge is pretty easy: we can simply use out-of-domain examples to do that. By having out-of-domain examples, we do see the performance increase on both datasets, but it's definitely not comparable to using the actual in-domain examples. So we know the task knowledge is beneficial but definitely not sufficient.

However, we did a simple manipulation of the data: we created mismatched natural language questions and SQL queries in the demonstrations. We find the model basically could not leverage these demonstrations, even in-domain. So that shows us that the task knowledge, even though it's not sufficient, is still necessary for this task.

Then we study the in-domain NLQ and SQL distributions. For NLQ, we use the actual natural language questions paired with SQL queries generated by the Codex model itself in the zero-shot setting.

In this case, the model is exposed to the actual natural language questions, but the SQL queries are predicted by itself, so they don't bring in any new knowledge. Similarly, we can give the model the actual SQL queries but with natural language questions that are predicted by itself. So in each setting the model is exposed to one kind of knowledge but not aware of the other. We find that with the SQL distribution–using the actual SQL but predicted natural questions–the model can learn almost as much as from the actual in-domain examples. It's very close on Spider, with the gap slightly larger on KaggleDBQA, but much better than all the other settings.

So from this analysis we learn that it's mainly the in-domain SQL queries–the SQL distribution for the database–that help the model understand something about the database, so that it can answer future questions about the same database better.

Questions about the Research

Amber Roberts: I have a question from our community member Greg, asking: there's been a lot of push toward vector databases. Do vector databases fit into the work being done here with SQL? Because we're using a lot of relational databases, compared to the new vector databases now being used with these large language models.

That sounds like a complicated question. I don't know if you have thoughts on where vector databases might fit in.

I think he meant using vector databases as opposed to relational databases. I'm not sure SQL would be used for vector databases, or whether they would be used together. I don't know if you've come across this?

Shuaichen Chang: So I didn’t think about this scenario exactly, because SQL query was designed for a relational database. But for vector DB I feel like, maybe a model can do that pretty well but in a different way. The easiest way that we use the larger model is just simply give a test input right? We can definitely fine-tune that with embedding. But the easiest way for people who don’t understand AI or machine learning the way they use it is just simply just give a a natural language input in that case, the database may just easier to embed it into the larger model. Maybe we will have some larger model that are designed for like the stuff. You know, we could have the training data, the transcription models.

Amber Roberts: Interesting. Yeah, that's an interesting question from the community. So, getting back to these main questions here and how they're demonstrated–are these the main questions that you're trying to solve?

Shuaichen Chang: Yes. So we've sort of seen the answer to the first question. I'll quickly go to the second one: how we can harness the benefit of the in-domain examples.

Amber Roberts: Okay, perfect.

Shuaichen Chang: Yeah. So even though we found that the in-domain SQL distribution is important, it's not always available, right? You have to annotate. First of all, you have to know what questions the user may actually ask, then annotate those questions with SQL queries, and then you can use those to improve the model performance. But let's say in other cases we don't have the resources for such annotation. How can we still get the benefit? How can we leverage this finding to create a better model?

My solution is pretty simple. We try to synthesize some data about the specific database without seeing any actual examples. Basically, we generate SQL queries, and then, since those examples may not always be correct, we also generate natural language questions for the queries–but the questions may not align with the SQL queries perfectly.

So in this case the synthetic data may not provide correct text-to-SQL knowledge, and we have seen that correct text-to-SQL pairs are necessary for the model. That's why we propose to use a hybrid source of demonstrations containing both out-of-domain demonstrations and in-domain synthetic demonstrations.

For out-of-domain, we basically try to retrieve the out-of-domain examples that are similar to the test example, so that the language model can learn something from the out-of-domain examples and transfer that knowledge to the test example. We first make an initial SQL prediction, and we use that initial prediction to do the retrieval.

For in-domain, we first need to sample the in-domain data. We sample SQL queries by using SQL templates from out-of-domain examples (out-of-domain SQL), and then we translate each SQL query into an NLQ. Then we have a verification stage, which is the tricky part: we first translate the sampled SQL into a natural language question, and then we translate that NLQ back into a SQL query. If the two SQL queries have the same execution results, we can assume the translation was correct in both directions, and we keep the example.
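A rough sketch of that verification loop, with sql_to_question, question_to_sql, and execute as hypothetical stand-ins for the LLM translation calls and the database execution step (none of these are the paper's actual function names):

# A minimal sketch of round-trip verification for synthetic in-domain examples.
# sql_to_question / question_to_sql are hypothetical LLM-backed callables;
# execute would run a query against the target database.

def verify_synthetic_example(sampled_sql, sql_to_question, question_to_sql, execute):
    question = sql_to_question(sampled_sql)       # SQL -> NLQ (LLM call)
    round_trip_sql = question_to_sql(question)    # NLQ -> SQL (LLM call)
    try:
        same = execute(sampled_sql) == execute(round_trip_sql)
    except Exception:
        return None                               # discard if either query fails
    return (question, sampled_sql) if same else None  # keep only consistent pairs

def build_synthetic_demos(sampled_sqls, sql_to_question, question_to_sql, execute):
    demos = []
    for sql in sampled_sqls:
        kept = verify_synthetic_example(sql, sql_to_question, question_to_sql, execute)
        if kept is not None:
            demos.append(kept)
    return demos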

After we have the in-domain examples, we do retrieval. Similarly, we want to retrieve the examples that are most similar to the test question, so we use maximum-coverage retrieval here. We found that with this hybrid source of demonstrations, our proposed framework outperformed all the baseline models, as well as using only out-of-domain or only in-domain examples.

In general, I found the proposed methods pretty useful across multiple text-to-SQL datasets with multiple models.

Amber Roberts: We did get some discussion around that. A lot of people are having to leave, because I think there are only 60 seconds left, but a lot of people are thanking you, Shuaichen, for the information and the talk. I think a lot of people have been trying to do something similar with the GPT-4 models. One member of our community said that just adding a few examples in the prompt really helped for a particular database they were using to create SQL queries, so it's great seeing more work being done in this area. And someone from the community mentioned that he basically agrees with you that SQL would not be as appropriate for a vector database, given the relations that a relational database is built on and how SQL inherently operates. So that's very interesting to see.

We are out of time, but thank you so much for going through your paper and your latest results. Those are great performance scores, and I can't wait to see when it's hitting 100 every time for these examples. I'm also interested to see how this will apply to more industry topics. Thanks everyone for joining, and I hope you all have a great rest of your week.

Shuaichen Chang: Yeah, thank you for having me.




COMMENTS

  1. [2302.11382] A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    Learn how to use prompts, instructions given to large language models (LLMs) to customize their output and interactions, from a catalog of patterns for common problems. This paper documents and applies prompt engineering techniques to solve common problems when conversing with ChatGPT, such as software development tasks.

  2. Papers

    Papers. The following are the latest papers (sorted by release date) on prompt engineering for large language models (LLMs). We update the list of papers on a daily/weekly basis. Overviews. Prompt Design and Engineering: Introduction and Advanced Methods (January 2024)

  3. Prompt Engineering

    Prompt engineering is the process of designing and refining the prompts used to generate text from language models, such as GPT-3 or similar models. The goal of prompt engineering is to improve the quality and relevance of the generated text by carefully crafting the prompts to elicit the desired responses from the model.

  4. [2311.05661] Prompt Engineering a Prompt Engineer

    Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic ...

  5. Prompting Change: Exploring Prompt Engineering in Large ...

    This paper explores the transformative potential of Large Language Models Artificial Intelligence (LLM AI) in educational contexts, particularly focusing on the innovative practice of prompt engineering. Prompt engineering, characterized by three essential components of content knowledge, critical thinking, and iterative design, emerges as a key mechanism to access the transformative ...

  6. GitHub

    Must-read papers on prompt-based tuning for pre-trained language models. - thunlp/PromptPapers ... Machine Intelligence Research. Tianxiang Sun, Xiangyang Liu, Xipeng Qiu, Xuanjing Huang , 2021.9. Pilot Work ... A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, ...

  7. (PDF) A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    This paper provides the following contributions to research on prompt engineering that apply LLMs to automate software development tasks. First, it provides a framework for documenting patterns ...

  8. GitHub

    Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).

  9. (PDF) Generative AI and Prompt Engineering: The Art of ...

    Findings This paper delves into the world of prompt engineering, its alignment with the existing roles and expertise of librarians, and the potential emergence of a new role known as the "prompt ...

  10. PDF A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    This paper provides the following contributions to research on prompt engineering that apply LLMs to automate software development tasks. First, it provides a framework for documenting patterns for structuring prompts to solve a range of problems so that they can be adapted to different domains. Second, it ...

  11. [2310.14735] Unleashing the potential of prompt engineering in Large

    This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. ... We subsequently delineate prospective directions in prompt engineering research ...

  12. Prompt Engineering as an Important Emerging Skill for Medical

    Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help in various tasks. With the emergence of LLMs, the most popular one being ChatGPT, which has attracted the attention of over 100 ...

  13. (PDF) Prompt Engineering For ChatGPT: A Quick Guide To ...

    This article provides a comprehensive guide to mastering prompt engineering techniques, tips, and best practices to achieve optimal outcomes with ChatGPT. The discussion begins with an ...

  14. Prompts Matter: Insights and Strategies for Prompt Engineering in

    Large Language Models (LLMs) have the potential to revolutionize automated traceability by overcoming the challenges faced by previous methods and introducing new possibilities. However, the optimal utilization of LLMs for automated traceability remains unclear. This paper explores the process of prompt engineering to extract link predictions from an LLM. We provide detailed insights into our ...

  15. Exploring Prompt Engineering Practices in the Enterprise

    Stay informed on the latest trending ML papers with code, research developments, libraries, methods, and datasets. ... as well as the kinds of support needed for more efficient prompt engineering. To better understand prompt engineering practices, we analyzed sessions of prompt editing behavior, categorizing the parts of prompts users iterated ...

  16. PDF Prompting AI Art: An Investigation into the Creative Skill of Prompt

    This paper delves into prompt engineering as a novel creative skill for creating AI art with text-to-image generation. In a ... As generative models become more widespread, prompt engineering has become an important research area on how humans interact with AI [3, 11-13, 23, 27, 38, 56].

  17. Prompt Engineering for Large Language Models by Andrew Gao

    Users interact with large language models through "prompts'', or natural language instructions. Carefully designed prompts can lead to significantly better outputs. In this review, common strategies for LLM prompt engineering will be explained. Additionally, considerations, recommended resources, and current directions of research on LLM ...

  18. Journal of Medical Internet Research

    Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help in various tasks. With the emergence of LLMs, the most popular one being ChatGPT that has attracted the attention of over a 100 million users in only 2 months, artificial intelligence ...

  19. Prompt Engineering as an Important Emerging Skill for Medical ...

    Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help in various tasks. ... This paper summarizes the current state of research about prompt engineering and, at the same time, aims at ...

  20. Prompt engineering

    Use the following step-by-step instructions to respond to user inputs. Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ". Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says "Translation: ".

  21. (PDF) prompt engineering

    Abstract. Prompt engineering is the process of crafting prompts that are effective at eliciting desired responses from large language models (LLMs). In the context of biomedical engineering ...

  22. PDF Prompt Engineering for Healthcare: Methodologies and Applications

    Figure 2: A graphical representation is utilized to depict the number of research papers on prompt engineering for NLP in the medical domain, published from 2019 to April 6, 2023, revealing the trend and growth of this field over time. 1.2 Literature Search Process The importance of NLP tasks in the medical field cannot be overstated. With ...

  23. Prompt Engineering

    Find articles, research reports, white papers, videos, and more related to prompt engineering. "What is more likely is that gen AI will augment rather than forge new roles.

  24. Prompt Engineering Guide

    Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). Researchers use prompt engineering to ...

  25. How to Prompt LLMs for Text-to-SQL

    We discuss the paper "How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings" with researcher Shuaichen Chang. ... Questions about the Research. ... So one member of our community said that just by adding in a few examples in the prompt engineering really helped for a particular database that they ...