
Are We There Yet? - A Systematic Literature Review on Chatbots in Education


  • 1 Information Center for Education, DIPF | Leibniz Institute for Research and Information in Education, Frankfurt am Main, Germany
  • 2 Educational Science Faculty, Open University of the Netherlands, Heerlen, Netherlands
  • 3 Computer Science Faculty, Goethe University, Frankfurt am Main, Germany

Chatbots are a promising technology with the potential to enhance workplaces and everyday life. In terms of scalability and accessibility, they also offer unique possibilities as communication and information tools for digital learning. In this paper, we present a systematic literature review investigating the areas of education where chatbots have already been applied, explore the pedagogical roles of chatbots, the use of chatbots for mentoring purposes, and their potential to personalize education. We conducted a preliminary analysis of 2,678 publications to perform this literature review, which allowed us to identify 74 relevant publications for chatbots’ application in education. Through this, we address five research questions that, together, allow us to explore the current state-of-the-art of this educational technology. We conclude our systematic review by pointing to three main research challenges: 1) Aligning chatbot evaluations with implementation objectives, 2) Exploring the potential of chatbots for mentoring students, and 3) Exploring and leveraging adaptation capabilities of chatbots. For all three challenges, we discuss opportunities for future research.

Introduction

Educational Technologies enable distance learning models and provide students with the opportunity to learn at their own pace. They have found their way into schools and higher education institutions through Learning Management Systems and Massive Open Online Courses, enabling teachers to scale up good teaching practices ( Ferguson and Sharples, 2014 ) and allowing students to access learning material ubiquitously ( Virtanen et al., 2018 ).

Despite the innovative power of educational technologies, most commonly used technologies do not substantially change teachers’ role. Typical teaching activities like providing students with feedback, motivating them, or adapting course content to specific student groups are still entrusted exclusively to teachers, even in digital learning environments. This can lead to the teacher-bandwidth problem ( Wiley and Edwards, 2002 ), the result of a shortage of teaching staff to provide highly informative and competence-oriented feedback at large scale. Nowadays, however, computers and other digital devices open up far-reaching possibilities that have not yet been fully exploited. For example, incorporating process data can provide students with insights into their learning progress and bring new possibilities for formative feedback, self-reflection, and competence development ( Quincey et al., 2019 ). According to ( Hattie, 2009 ), feedback in terms of learning success has a mean effect size of d = 0.75, while ( Wisniewski et al., 2019 ) even report a mean effect of d = 0.99 for highly informative feedback. Such feedback provides suitable conditions for self-directed learning ( Winne and Hadwin, 2008 ) and effective metacognitive control of the learning process ( Nelson and Narens, 1994 ).

One of the educational technologies designed to provide actionable feedback in this regard is Learning Analytics. Learning Analytics is defined as the research area that focuses on collecting traces that learners leave behind and using those traces to improve learning ( Duval and Verbert, 2012 ; Greller and Drachsler, 2012 ). Learning Analytics can be used both by students to reflect on their own learning progress and by teachers to continuously assess the students’ efforts and provide actionable feedback. Another relevant educational technology is Intelligent Tutoring Systems. Intelligent Tutoring Systems are defined as computerized learning environments that incorporate computational models ( Graesser et al., 2001 ) and provide feedback based on learning progress. Educational technologies specifically focused on feedback for help-seekers, comparable to raising hands in the classroom, are Dialogue Systems and Pedagogical Conversational Agents ( Lester et al., 1997 ). These technologies can simulate conversational partners and provide feedback through natural language ( McLoughlin and Oliver, 1998 ).

Research in this area has recently focused on chatbot technology, a subtype of dialogue systems, as several technological platforms have matured and led to applications in various domains. Chatbots incorporate generic language models extracted from large parts of the Internet and provide feedback through restricted text or voice interfaces. For this reason, they have also been proposed and researched for a variety of applications in education ( Winkler and Soellner, 2018 ). Recent literature reviews on chatbots in education ( Winkler and Soellner, 2018 ; Hobert, 2019a ; Hobert and Meyer von Wolff, 2019 ; Jung et al., 2020 ; Pérez et al., 2020 ; Smutny and Schreiberova, 2020 ; Pérez-Marín, 2021 ) have reported on such applications as well as design guidelines, evaluation possibilities, and effects of chatbots in education.

In this paper, we contribute to the state-of-the-art of chatbots in education by presenting a systematic literature review, where we examine so-far unexplored areas such as implementation objectives, pedagogical roles, mentoring scenarios, the adaptations of chatbots to learners, and application domains. This paper is structured as follows: First, we review related work (section 2) and derive research questions from it, then explain the applied method for searching related studies (section 3), followed by the results (section 4); finally, we discuss the findings and point to future research directions in the field (section 5).

Related Work

In order to accurately cover the field of research and deal with the plethora of terms for chatbots in the literature (e.g., chatbot, dialogue system, or pedagogical conversational agent), we propose the following definition:

Chatbots are digital systems that can be interacted with entirely through natural language via text or voice interfaces. They are intended to automate conversations by simulating a human conversation partner and can be integrated into software, such as online platforms, digital assistants, or be interfaced through messaging services.

Outside of education, typical applications of chatbots are in customer service ( Xu et al., 2017 ), counseling of hospital patients ( Vaidyam et al., 2019 ), or information services in smart speakers ( Ram et al., 2018 ). One central element of chatbots is the intent classification, also named the Natural Language Understanding (NLU) component, which is responsible for the sense-making of human input data. Looking at the current advances in chatbot software development, it seems that this technology’s goal is to pass the Turing Test ( Saygin et al., 2000 ) one day, which could make chatbots effective educational tools. Therefore, we ask ourselves “ Are we there yet? - Will we soon have an autonomous chatbot for every learner?”
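To make the NLU step more concrete, the snippet below sketches a minimal intent classifier. It is an illustrative baseline only: the intents and training utterances are invented for this example, and production chatbots typically rely on more elaborate NLU components or dedicated services.

# Minimal sketch of an intent classifier (the NLU component of a chatbot).
# Intents and example utterances are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_utterances = [
    ("When is the exam?", "course_info"),
    ("What are the office hours?", "course_info"),
    ("Can you explain this grammar rule?", "learning_help"),
    ("How do I say 'library' in Spanish?", "learning_help"),
    ("I feel stuck with my study plan.", "mentoring"),
    ("Help me reflect on my progress this week.", "mentoring"),
]
texts, intents = zip(*training_utterances)

# TF-IDF features plus a linear classifier: a common, simple NLU baseline.
nlu = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
nlu.fit(texts, intents)

print(nlu.predict(["When do we write the final exam?"]))  # likely ['course_info']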

To understand and underline the current need for research in the use of chatbots in education, we first examined the existing literature, focusing on comprehensive literature reviews. By looking at research questions in these literature reviews, we identified 21 different research topics and extracted findings accordingly. To structure research topics and findings in a comprehensible way, a three-stage clustering process was applied. While the first stage consisted of coding research topics by keywords, the second stage was applied to form overarching research categories ( Table 1 ). In the final stage, the findings within each research category were clustered to identify and structure commonalities within the literature reviews. The result is a concept map, which consists of four major categories: CAT1. Applications of Chatbots, CAT2. Chatbot Designs, CAT3. Evaluation of Chatbots, and CAT4. Educational Effects of Chatbots. To standardize the terminology and concepts applied, we present the findings of each category in a separate sub-section ( see Figure 1 , Figure 2 , Figure 3 , and Figure 4 ) and extend them with the outcomes of our own literature study, which is reported in the remaining parts of this article. Due to the size of the concept map, a full version can be found in Appendix A .


TABLE 1 . Assignment of coded research topics identified in related literature reviews to research categories.


FIGURE 1 . Applications of chatbots in related literature reviews (CAT1).


FIGURE 2 . Chatbot designs in related literature reviews (CAT2).


FIGURE 3 . Evaluation of chatbots in related literature reviews (CAT3).


FIGURE 4 . Educational Effects of chatbots in related literature reviews (CAT4).

Regarding the applications of chatbots (CAT1), application clusters (AC) and application statistics (AS) have been described in the literature, which we visualized in Figure 1 . The study of ( Pérez et al., 2020 ) identifies two application clusters, defined through chatbot activities: “service-oriented chatbots” and “teaching-oriented chatbots.” ( Winkler and Soellner, 2018 ) identify application clusters by naming the domains “health and well-being interventions,” “language learning,” “feedback and metacognitive thinking,” as well as “motivation and self-efficacy.” Concerning application statistics (AS), ( Smutny and Schreiberova, 2020 ), who analyzed chatbots integrated into the social media platform Facebook, found that nearly 47% of the analyzed chatbots incorporate informing actions and 18% support language learning. In addition, the chatbots studied had a strong tendency to use English, at 89%. This high number aligns with results from ( Pérez-Marín, 2021 ), where 75% of observed agents, as a related technology, were designed to interact in the English language. ( Pérez-Marín, 2021 ) also shows that 42% of the analyzed chatbots had mixed interaction modalities. Finally, ( Hobert and Meyer von Wolff, 2019 ) observed that only 25% of the examined chatbots were incorporated in formal learning settings, that the majority of published material focuses on student-chatbot interaction only and does not enable student-student communication, and that nearly two-thirds of the analyzed chatbots center on a single domain. Overall, we can summarize that so far there are six application clusters for chatbots in education, categorized by chatbot activities or domains. The provided statistics allow for a clearer understanding of the prevalence of chatbot applications in education ( see Figure 1 ).

Regarding chatbot designs (CAT2), most of the research questions concerned with chatbots in education can be assigned to this category. We found three aspects in this category, visualized in Figure 2 : Personality (PS), Process Pipeline (PP), and Design Classifications (DC). Within these, most research questions can be assigned to Design Classifications (DC), which are separated into Classification Aspects (DC2) and Classification Frameworks (DC1). One classification framework is defined through “flow chatbots,” “artificially intelligent chatbots,” “chatbots with integrated speech recognition,” as well as “chatbots with integrated context-data” by ( Winkler and Soellner, 2018 ). A second classification framework by ( Pérez-Marín, 2021 ) covers pedagogy, social, and HCI features of chatbots and agents, which themselves can be further subdivided into more detailed aspects. Other Classification Aspects (DC2), derived from several publications, provide another classification schema, which distinguishes between “retrieval vs. generative” based technology, the “ability to incorporate context data,” and “speech or text interface” ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ). Text interfaces can be further subdivided into “Button-Based” and “Keyword Recognition-Based” interfaces ( Smutny and Schreiberova, 2020 ). Furthermore, a comparison of speech and text interfaces ( Jung et al., 2020 ) shows that text interfaces have advantages for conveying information, while speech interfaces have advantages for affective support. The second aspect of CAT2 concerns the chatbot processing pipeline (PP), highlighting the importance of the user interface and back-end ( Pérez et al., 2020 ). Finally, ( Jung et al., 2020 ) focus on the third aspect, the personality of chatbots (PS). Here, the study derives four guidelines helpful in education: positive or neutral emotional expressions, a limited amount of animated or visual graphics, a well-considered gender of the chatbot, and human-like interactions. In summary, we found three main design aspects for the development of chatbots in CAT2. CAT2 is much more diverse than CAT1, with various sub-categories for the design of chatbots. This indicates the considerable flexibility to design chatbots in various ways to support education.

Regarding the evaluation of chatbots (CAT3), we found three aspects assigned to this category, visualized in Figure 3 : Evaluation Criteria (EC), Evaluation Methods (EM), and Evaluation Instruments (EI). Concerning Evaluation Criteria, seven criteria can be identified in the literature. The first and most important in the educational field, according to ( Smutny and Schreiberova, 2020 ), is the evaluation of learning success ( Hobert, 2019a ), which can have subcategories such as how chatbots are embedded in learning scenarios ( Winkler and Soellner, 2018 ; Smutny and Schreiberova, 2020 ) and teaching efficiency ( Pérez et al., 2020 ). The second is acceptance, which ( Hobert, 2019a ) names “acceptance and adoption” and ( Pérez et al., 2020 ) “students’ perception.” Further evaluation criteria are motivation, usability, technical correctness, psychological, and further beneficial factors ( Hobert, 2019a ). These Evaluation Criteria show broad possibilities for the evaluation of chatbots in education. However, ( Hobert, 2019a ) found that most evaluations are limited to single evaluation criteria or narrower aspects of them. Moreover, ( Hobert, 2019a ) introduces a classification matrix for chatbot evaluations, which consists of the following Evaluation Methods (EM): Wizard-of-Oz approach, laboratory studies, field studies, and technical validations. In addition, ( Winkler and Soellner, 2018 ) recommend evaluating chatbots by their embeddedness in a learning scenario, a comparison of human-human and human-chatbot interactions, and a comparison of spoken and written communication. Instruments to measure these evaluation criteria were identified by ( Hobert, 2019a ): quantitative surveys, qualitative interviews, transcripts of dialogues, and technical log files. Regarding CAT3, we found three main aspects for the evaluation of chatbots. We can conclude that this is a more balanced and structured distribution in comparison to CAT2, providing researchers with guidance for evaluating chatbots in education.

Regarding the educational effects of chatbots (CAT4), we found two aspects visualized in Figure 4 : Effect Size (ES) and Beneficial Chatbot Features for Learning Success (BF). Concerning the effect size, ( Pérez et al., 2020 ) identified a strong dependency between learning and the related curriculum, while ( Winkler and Soellner, 2018 ) elaborate on general student characteristics that influence how students interact with chatbots. They state that students’ attitudes towards technology, learning characteristics, educational background, self-efficacy, and self-regulation skills affect these interactions. Moreover, their study emphasizes chatbot features that can be regarded as beneficial in terms of learning outcomes (BF): “Context-Awareness,” “Proactive guidance by students,” “Integration in existing learning and instant messaging tools,” “Accessibility,” and “Response Time.” Overall, for CAT4, we found two main distinguishing aspects for chatbots; however, the reported studies vary widely in their research design, making high-level results hardly comparable.

Looking at the related work, many research questions for the application of chatbots in education remain open. Therefore, we selected five goals to be further investigated in our literature review. Firstly, we were interested in the objectives for implementing chatbots in education (Goal 1), as the relevance of chatbots for applications within education seems to be not clearly delineated. Secondly, we aim to explore the pedagogical roles of chatbots in the existing literature (Goal 2) to understand how chatbots can take over tasks from teachers. ( Winkler and Soellner, 2018 ) and ( Pérez-Marín, 2021 ) identified research gaps for supporting meta-cognitive skills, such as self-regulation, with chatbots. This requires a chatbot application that takes a mentoring role, as the development of these meta-cognitive skills cannot be achieved solely by information delivery. Within our review, we incorporate this by reviewing the mentoring role of chatbots (Goal 3). Another key element for a mentoring chatbot is adaptation to the learner’s needs. Therefore, Goal 4 of our review is to investigate the adaptation approaches used by chatbots in education. For Goal 5, we want to extend the work of ( Winkler and Soellner, 2018 ) and ( Pérez et al., 2020 ) regarding Application Clusters (AC) and map applications by further investigating specific learning domains in which chatbots have been studied.

To delineate and map the field of chatbots in education, initial findings were collected through a preliminary literature search. One of the takeaways is that the emerging field around educational chatbots has seen much activity in the last two years. Based on the experience of this preliminary search, search terms, queries, and filters were constructed for the actual structured literature review. This structured literature review follows the PRISMA framework ( Liberati et al., 2009 ), a guideline for reporting systematic reviews and meta-analyses. The framework consists of an elaborated structure for systematic literature reviews and sets requirements for reporting information about the review process ( see sections 3.2 to 3.4).

Research Questions

Contributing to the state-of-the-art, we investigate five aspects of chatbot applications published in the literature. We therefore guided our research with the following research questions:

RQ1: Which objectives for implementing chatbots in education can be identified in the existing literature?

RQ2: Which pedagogical roles of chatbots can be identified in the existing literature?

RQ3: Which application scenarios have been used to mentor students?

RQ4: To what extent are chatbots adaptable to students’ personal needs?

RQ5: What are the domains in which chatbots have been applied so far?

Sources of Information

As data sources, Scopus, Web of Science, Google Scholar, Microsoft Academic, and the educational research database “Fachportal Pädagogik” (including ERIC) were selected, all of which incorporate all major publishers and journals. ( Martín-Martín et al., 2018 ) showed that only 29.8% of relevant literature in the social sciences, and 46.8% in engineering and computer science, is included in all of the first three databases. For the topic of chatbots in education, a value between these two numbers can be assumed, which is why we integrated several publisher-independent databases here.

Search Criteria

Based on the findings from the initial related work search, we derived the following search query:

( Education OR Educational OR Learning OR Learner OR Student OR Teaching OR School OR University OR Pedagogical ) AND Chatbot.

It combines education-related keywords with the “chatbot” keyword. Since chatbots are related to other technologies, the initial literature search also considered keywords such as “pedagogical agents,” “dialogue systems,” or “bots” when composing the search query. However, these increased the number of irrelevant results significantly and were therefore excluded from the query in later searches.
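As an illustration only, the query above can also be assembled programmatically before being entered into each database's search form. The small helper below is a hypothetical sketch and was not part of the original review workflow.

# Sketch: assembling the boolean search query from its keyword lists.
education_terms = [
    "Education", "Educational", "Learning", "Learner", "Student",
    "Teaching", "School", "University", "Pedagogical",
]

def build_query(domain_terms, technology_term="Chatbot"):
    """Join the domain keywords with OR and AND them with the technology keyword."""
    return f"( {' OR '.join(domain_terms)} ) AND {technology_term}"

print(build_query(education_terms))
# -> ( Education OR Educational OR ... OR Pedagogical ) AND Chatbot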

Inclusion and Exclusion Criteria

The queries were executed on December 23, 2020, and applied twice to each database, first as a title search and second as a keyword-based search. This resulted in a total of 3,619 hits, which were checked for duplicates, resulting in 2,678 candidate publications. The overall search and filtering process is shown in Figure 5 .


FIGURE 5 . PRISMA flow chart.

In the case of Google Scholar, the number of results per query, sorted by relevance, was limited to 300, as this database also delivers many less relevant works. The value was determined by inspecting the search results of several queries in detail so as to exclude as few relevant works as possible. This approach showed promising results and, at the same time, did not burden the literature list with irrelevant items.

The further screening consisted of a four-stage filtering process: first, eliminating duplicates in the results of the title and keyword queries of each database independently, and second, excluding publications based on the title and abstract that:

• were not available in English

• did not describe a chatbot application

  • were not mainly focused on learner-centered chatbot applications in schools or higher education institutions, which, according to the preliminary literature search, is the main application area within education.

Third, we applied another duplicate filter, this time to the merged set of publications. Finally, we applied a filter based on the full text, excluding publications that were:

  • limited to improving chatbots technically (e.g., publications that compare or develop new algorithms), as the research questions presented in these publications did not seek additional insights on applications in education

• exclusively theoretical in nature (e.g., publications that discuss new research projects, implementation concepts, or potential use cases of chatbots in education), as they either do not contain research questions or hypotheses or do not provide conclusions from studies with learners.

After the first, second, and third filters, we identified 505 candidate publications. We continued our filtering process by reading the candidate publications’ full texts, resulting in 74 publications that were used for our review. Compared to the 3,619 initial database results, the proportion of relevant publications is therefore about 2.0%.
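Read as a pipeline, the screening steps above can be sketched as follows. The record fields and predicate functions are hypothetical stand-ins for the manual title/abstract and full-text decisions described above, not an actual tool used in the review.

# Sketch of the four-stage screening pipeline (illustrative stand-ins only).

def deduplicate(records):
    """Drop duplicate records, here simply by normalized title."""
    seen, unique = set(), []
    for record in records:
        key = record["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def passes_title_abstract_screen(record):
    # Review criteria: available in English, describes a chatbot application,
    # and focuses on learner-centered use in schools or higher education.
    return (record["in_english"]
            and record["describes_chatbot"]
            and record["learner_centered"])

def passes_full_text_screen(record):
    # Review criteria: not purely technical and not exclusively theoretical.
    return not record["purely_technical"] and not record["purely_theoretical"]

def screen(per_database_results):
    # Stages 1-2: deduplicate each database's results, then screen title/abstract.
    merged = []
    for results in per_database_results:
        merged += [r for r in deduplicate(results) if passes_title_abstract_screen(r)]
    # Stages 3-4: deduplicate the merged set, then screen full texts.
    return [r for r in deduplicate(merged) if passes_full_text_screen(r)]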

The final publication list can be accessed under https://bit.ly/2RRArFT .

To analyze the identified publications and derive results according to the research questions, full texts were coded, considering for each publication the objectives for implementing chatbots (RQ1), pedagogical roles of chatbots (RQ2), their mentoring roles (RQ3), adaptation of chatbots (RQ4), as well as their implementation domains in education (RQ5) as separate sets of codes. To this end, initial codes were identified by open coding and iteratively improved through comparison, group discussion among the authors, and subsequent code expansion. Further, codes were supplemented with detailed descriptions until a saturation point was reached, where all included studies could be successfully mapped to codes, suggesting no need for further refinement. As an example, codes for RQ2 (Pedagogical Roles) were adapted and refined in terms of their level of abstraction from an initial set of only two codes: 1 ) a code for chatbots in the learning role and 2 ) a code for chatbots in a service-oriented role. After coding a larger set of publications, it became clear that the code for service-oriented chatbots needed to be further distinguished, because it lumped together, e.g., automation activities with activities related to self-regulated learning and thus could not be separated sharply enough from the learning role. After refining the code set in the next iteration into a learning role, an assistance role, and a mentoring role, it was possible to ensure the separation of the individual codes. To avoid defining new codes for a very small number of publications, studies were coded as “other” (RQ1) or “not defined” (RQ2) if a code would have applied to fewer than eight publications, representing less than 10% of the publications in the final paper list.

By grouping the resulting relevant publications according to their date of publication, it becomes apparent that chatbots in education are currently in a phase of increased attention. The release distribution shows slightly lower publication numbers in the current year than in the previous one ( Figure 6 ), which could be attributed to a time lag between the actual publication of manuscripts and their dissemination in databases.


FIGURE 6 . Identified chatbot publications in education per year.

Applying the curve presented in Figure 6 to Gartner’s Hype Cycle ( Linden and Fenn, 2003 ) suggests that technology around chatbots in education may currently be in the “Innovation Trigger” phase, in which many expectations are placed on the technology while practical in-depth experience is still largely lacking.
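For readers who want to reproduce a chart like Figure 6 from the final publication list, a simple tally per publication year is sufficient; the years in this sketch are placeholders, not the actual data of this review.

# Sketch: tallying publications per year for a chart like Figure 6.
from collections import Counter

publication_years = [2016, 2018, 2018, 2019, 2019, 2019, 2020, 2020]  # placeholders
per_year = Counter(publication_years)
for year in sorted(per_year):
    print(year, "#" * per_year[year])  # crude text histogram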

Objectives for Implementing Chatbots in Education

Regarding RQ1, we extracted implementation objectives for chatbots in education. By analyzing the selected publications, we identified that most of the objectives for chatbots in education can be described by one of four categories: Skill Improvement , Efficiency of Education , Students’ Motivation , and Availability of Education ( see Figure 7 ). The first objective is the improvement of a student’s skill ( Skill Improvement ) that the chatbot is supposed to support or help achieve. Here, chatbots are mostly seen as a learning aid that supports students; it is the most commonly cited objective for chatbots. The second objective is to increase the Efficiency of Education in general. This can be achieved, for example, through the automation of recurring tasks or time-saving services for students, and it is the second most cited objective for chatbots. The third objective is to increase Students’ Motivation . Finally, the last objective is to increase the Availability of Education , that is, to provide learning or counseling with temporal flexibility or without the limitation of physical presence. In addition, there are other, more diverse objectives for chatbots in education that are less easy to categorize. In cases where a publication indicated more than one objective, the publication was distributed evenly across the respective categories.
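The fractional counting rule mentioned above can be illustrated with a short sketch; the coded objective labels per publication are invented for this example and do not reproduce the actual coding data.

# Sketch of fractional counting: a publication naming several objectives
# contributes an equal fraction to each category (labels are illustrative).
from collections import defaultdict

coded_publications = [
    ["Skill Improvement"],
    ["Efficiency of Education", "Availability of Education"],  # 0.5 each
    ["Skill Improvement", "Students' Motivation"],              # 0.5 each
]

weights = defaultdict(float)
for objectives in coded_publications:
    for objective in objectives:
        weights[objective] += 1 / len(objectives)

total = len(coded_publications)
for objective, weight in sorted(weights.items()):
    print(f"{objective}: {weight / total:.0%} of publications")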


FIGURE 7 . Objectives for implementing chatbots identified in chatbot publications.

Given these results, we can summarize four major implementation objectives for chatbots. Of these, Skill Improvement is the most popular objective, constituting around one-third of publications (32%). Making up a quarter of all publications, Efficiency of Education is the second most popular objective (25%), while Students’ Motivation and Availability of Education rank third (13%) and fourth (11%), respectively. Other objectives also make up a substantial share of these publications (19%), although they were too diverse to categorize in a uniform way. Examples of these are inclusivity ( Heo and Lee, 2019 ) or the promotion of student-teacher interactions ( Mendoza et al., 2020 ).

Pedagogical Roles

Regarding RQ2, it is crucial to consider the use of chatbots in terms of their intended pedagogical role. After analyzing the selected articles, we were able to identify three different pedagogical roles: a supporting learning role, an assisting role, and a mentoring role.

In the supporting learning role ( Learning ), chatbots are used as an educational tool to teach content or skills. This can be achieved through a fixed integration into the curriculum, such as conversation tasks (L. K. Fryer et al., 2020 ). Alternatively, learning can be supported through additional offerings alongside classroom teaching, for example, voice assistants for leisure activities at home ( Bao, 2019 ). Examples of these are chatbots simulating a virtual pen pal abroad ( Na-Young, 2019 ). Conversations with this kind of chatbot aim to motivate the students to look up vocabulary, check their grammar, and gain confidence in the foreign language.

In the assisting role ( Assisting ), chatbot actions can be summarized as simplifying the student's everyday life, i.e., taking tasks off the student’s hands in whole or in part. This can be achieved by making information more easily available ( Sugondo and Bahana, 2019 ) or by simplifying processes through the chatbot’s automation ( Suwannatee and Suwanyangyuen, 2019 ). An example of this is the chatbot in ( Sandoval, 2018 ) that answers general questions about a course, such as an exam date or office hours.

In the mentoring role ( Mentoring ), chatbot actions deal with the student’s personal development. In this type of support, the students themselves are the focus of the conversation and should be encouraged to plan, reflect on, or assess their progress on a meta-cognitive level. One example is the chatbot in ( Cabales, 2019 ), which helps students develop lifelong learning skills by prompting in-action reflections.

The distribution of each pedagogical role is shown in Figure 8 . From this, it can be seen that Learning is the most frequently used role of the examined publications (49%), followed by Assisting (20%) and Mentoring (15%). It should be noted that pedagogical roles were not identified for all the publications examined. The absence of a clearly defined pedagogical role (16%) can be attributed to the more general nature of these publications, e.g. focused on students’ small talk behaviors ( Hobert, 2019b ) or teachers’ attitudes towards chatbot applications in classroom teaching (P. K. Bii et al., 2018 ).


FIGURE 8 . Pedagogical roles identified in chatbot publications.

Looking at pedagogical roles in the context of objectives for implementing chatbots, relations among publications can be inspected in a relations graph ( Figure 9 ). According to our results, the strongest relation in the examined publications is between the Skill Improvement objective and the Learning role. This strong relation is partly because both the Skill Improvement objective and the Learning role are the largest in their respective categories. In addition, two other strong relations can be observed: between the Students’ Motivation objective and the Learning role, as well as between the Efficiency of Education objective and the Assisting role.


FIGURE 9 . Relations graph of pedagogical roles and objectives for implementing chatbots.

Looking at other relations in more detail, there is surprisingly no relation between Skill Improvement , the most common implementation objective, and Assisting , the second most common pedagogical role. Furthermore, it can be observed that the Mentoring role has nearly equal relations to all of the objectives for implementing chatbots.

The relations graph ( Figure 9 ) can be explored interactively at bit.ly/32FSKQM.
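The data behind such a relations graph is essentially a weighted bipartite graph between objectives and roles. The sketch below shows one way to assemble it with networkx; the (objective, role) pairs are invented examples rather than the coded publications.

# Sketch: building a weighted relations graph between implementation objectives
# and pedagogical roles. The (objective, role) pairs are invented examples.
import networkx as nx

objective_role_pairs = [
    ("Skill Improvement", "Learning"),
    ("Skill Improvement", "Learning"),
    ("Students' Motivation", "Learning"),
    ("Efficiency of Education", "Assisting"),
    ("Availability of Education", "Mentoring"),
]

relations = nx.Graph()
for objective, role in objective_role_pairs:
    if relations.has_edge(objective, role):
        relations[objective][role]["weight"] += 1   # one more publication
    else:
        relations.add_edge(objective, role, weight=1)

for objective, role, data in relations.edges(data=True):
    print(f"{objective} -- {role}: {data['weight']} publication(s)")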

Mentoring Role

Regarding RQ3, we identified eleven publications that deal with chatbots in a mentoring role. The Mentoring role in these publications can be categorized along two dimensions. Starting with the first dimension, the mentoring method, three methods can be observed:

• Scaffolding ( n = 7)

• Recommending ( n = 3)

• Informing ( n = 1)

An example of Scaffolding can be seen in ( Gabrielli et al., 2020 ), where the chatbot coaches students in life skills, while an example of Recommending can be seen in ( Xiao et al., 2019 ), where the chatbot recommends new teammates. Finally, Informing can be seen in ( Kerly et al., 2008 ), where the chatbot informs students about their personal Open Learner Model.

The second dimension is the addressed mentoring topic, where the following topics can be observed:

• Self-Regulated Learning ( n = 5)

• Life Skills ( n = 4)

• Learning Skills ( n = 2)

While Mentoring chatbots supporting Self-Regulated Learning are intended to encourage students to reflect on and plan their learning progress, Mentoring chatbots supporting Life Skills address general student abilities such as self-confidence or managing emotions. Finally, Mentoring chatbots supporting Learning Skills , in contrast to Self-Regulated Learning , address only particular aspects of the learning process, such as new learning strategies or helpful learning partners. An example of a Mentoring chatbot supporting Life Skills is the Logo counseling chatbot, which promotes healthy self-esteem ( Engel et al., 2020 ). CALMsystem is an example of a Self-Regulated Learning chatbot, which informs students about their data in an open learner model ( Kerly et al., 2008 ). Finally, for the Learning Skills topic, the MCQ Bot is an example that is designed to introduce students to transformative learning (W. Huang et al., 2019 ).

Adaptation of Chatbots

Regarding RQ4, we identified six publications in the final publication list that address the topic of adaptation. Within these publications, five adaptation approaches are described:

The first approach (A1) is proposed by ( Kerly and Bull, 2006 ) and ( Kerly et al., 2008 ), dealing with student discussions based on success and confidence during a quiz. The improvement of self-assessment is the primary focus of this approach. The second approach (A2) is presented in ( Jia, 2008 ), where the personality of the chatbot is adapted to motivate students to talk to the chatbot and, in this case, learn a foreign language. The third approach (A3), as shown in the work of ( Vijayakumar et al., 2019 ), is characterized by a chatbot that provides personalized formative feedback to learners based on their self-assessment, again in a quiz situation. Here, the focus is on Hattie and Timperley’s three guiding questions: “Where am I going?,” “How am I going?” and “Where to next?” ( Hattie and Timperley, 2007 ). In the fourth approach (A4), exemplified in ( Ruan et al., 2019 ), the chatbot selects questions within a quiz. Here, the chatbot estimates the student’s ability and knowledge level based on the quiz progress and sets the next question accordingly. Finally, a similar approach (A5) is shown in ( Davies et al., 2020 ). In contrast to ( Ruan et al., 2019 ), this chatbot adapts the amount of question variation and takes into account psychological features that were measured beforehand by psychological tests.

We examined these five approaches by organizing them according to their information sources and extracted learner information. The results can be seen in Table 2 .


TABLE 2 . Adaptation approaches of chatbots in education.

Four out of five adaptation approaches (A1, A3, A4, and A5) are observed in the context of quizzes. These adaptations within quizzes can be divided into two main streams: one is concerned with feedback to students (A1 and A3), while the other is concerned with learning material selection (A4 and A5). The only different adaptation approach is A2, which focuses on the adaptation of the chatbot personality within a language learning application.
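To illustrate the flavour of adaptation used in A4, the sketch below shows a quiz loop that tracks a running ability estimate and picks the next question with the closest difficulty. The update rule, scale, and question pool are invented for illustration and are not taken from the cited studies.

# Minimal sketch of ability-based question selection in a quiz chatbot,
# loosely in the spirit of approach A4 (invented rule and question pool).

QUESTIONS = [  # (question id, difficulty on a 0..1 scale)
    ("q_easy", 0.2), ("q_medium", 0.5), ("q_hard", 0.8),
]

def next_question(ability, asked):
    """Pick the unasked question whose difficulty is closest to the ability estimate."""
    candidates = [q for q in QUESTIONS if q[0] not in asked]
    return min(candidates, key=lambda q: abs(q[1] - ability)) if candidates else None

def update_ability(ability, correct, step=0.15):
    """Nudge the estimate up after a correct answer, down after an incorrect one."""
    return min(1.0, ability + step) if correct else max(0.0, ability - step)

ability, asked = 0.5, set()
for was_correct in [True, True, False]:          # simulated learner answers
    question = next_question(ability, asked)
    asked.add(question[0])
    ability = update_ability(ability, was_correct)
    print(question[0], "-> ability estimate:", round(ability, 2))

A production system would replace this heuristic with a calibrated learner model, but the selection loop conveys the basic idea of adapting the next question to the estimated knowledge level.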

Domains for Chatbots in Education

Regarding RQ5, we identified 20 domains of chatbots in education. These can broadly be divided by their pedagogical role into three domain categories (DC): Learning Chatbots , Assisting Chatbots , and Mentoring Chatbots . The remaining publications are grouped in the Other Research domain category. The complete list of identified domains can be seen in Table 3 .


TABLE 3 . Domains of chatbots in education.

The domain category Learning Chatbots , which deals with chatbots incorporating the pedagogical role Learning , can be subdivided into seven domains: 1 ) Language Learning , 2 ) Learn to Program , 3 ) Learn Communication Skills , 4 ) Learn about Educational Technologies , 5 ) Learn about Cultural Heritage , 6 ) Learn about Laws , and 7 ) Mathematics Learning . With more than half of publications (53%), chatbots for Language Learning play a prominent role in this domain category. They are often used as chat partners to train conversations or to test vocabulary. An example of this can be seen in the work of ( Bao, 2019 ), which tries to mitigate foreign language anxiety by chatbot interactions in foreign languages.

The domain category Assisting Chatbots , which deals with chatbots incorporating the pedagogical role Assisting , can be subdivided into four domains: 1 ) Administrative Assistance , 2 ) Campus Assistance , 3 ) Course Assistance , and 4 ) Library Assistance . With one-third of publications (33%), chatbots in the Administrative Assistance domain that help to overcome bureaucratic hurdles at the institution, while providing round-the-clock services, are the largest group in this domain category. An example of this can be seen in ( Galko et al., 2018 ), where the student enrollment process is completely shifted to a conversation with a chatbot.

The domain category Mentoring Chatbots , which deals with chatbots incorporating the pedagogical role Mentoring , can be subdivided into three domains: 1 ) Scaffolding Chatbots , 2 ) Recommending Chatbots , and 3 ) Informing Chatbots . An example of a Scaffolding Chatbot is the CRI(S) chatbot ( Gabrielli et al., 2020 ), which supports life skills such as self-awareness or conflict resolution in discussion with the student by promoting helpful ideas and tricks.

The domain category Other Research , which deals with chatbots not incorporating any of these pedagogical roles, can be subdivided into three domains: 1 ) General Chatbot Research in Education , 2 ) Indian Educational System , and 3 ) Chatbot Interfaces . The most prominent domain, General Chatbot Research , cannot be classified into one of the other categories but aims to explore cross-cutting issues. An example of this can be seen in the publication of ( Hobert, 2020 ), which researches the importance of small talk abilities of chatbots in educational settings.

Discussion

In this paper, we investigated the state-of-the-art of chatbots in education according to five research questions. By combining our results with previously identified findings from related literature reviews, we proposed a concept map of chatbots in education. The map, reported in Appendix A , displays the current state of research regarding chatbots in education with the aim of supporting future research in the field.

Answer to Research Questions

Concerning RQ1 (implementation objectives), we identified four major objectives: 1 ) Skill Improvement , 2 ) Efficiency of Education , 3 ) Students’ Motivation , and 4 ) Availability of Education . These four objectives cover over 80% of the analyzed publications ( see Figure 7 ). Based on the findings on CAT3 in section 2, we see a mismatch between the objectives for implementing chatbots and their evaluation. Most researchers focus only on narrow aspects in the evaluation of their chatbots, such as learning success, usability, and technology acceptance. This mismatch between implementation objectives and suitable evaluation approaches is also well known from other educational technologies, such as Learning Analytics dashboards ( Jivet et al., 2017 ). A more structured approach to aligning implementation objectives and evaluation procedures is crucial to be able to properly assess the effectiveness of chatbots. ( Hobert, 2019a ) suggested a structured four-stage evaluation procedure beginning with a Wizard-of-Oz experiment, followed by technical validation, a laboratory study, and a field study. This evaluation procedure systematically links hypotheses with outcomes, helping to assess chatbots against their implementation objectives. “Aligning chatbot evaluations with implementation objectives” is, therefore, an important challenge to be addressed in the future research agenda.

Concerning RQ2 (pedagogical roles), our results show that chatbots’ pedagogical roles can be summarized as Learning , Assisting , and Mentoring . The Learning role is the support of learning or teaching activities such as gaining knowledge. The Assisting role is the support in terms of simplifying learners’ everyday life, e.g. by providing the opening times of the library. The Mentoring role is the support of students’ personal development, e.g. by supporting Self-Regulated Learning. From a pedagogical standpoint, all three roles are essential for learners and should therefore be incorporated in chatbots. These pedagogical roles are well aligned with the four implementation objectives reported in RQ1. While Skill Improvement and Students’ Motivation are strongly related to Learning , Efficiency of Education is strongly related to Assisting . The Mentoring role, in contrast, is evenly related to all of the identified objectives for implementing chatbots. In the reviewed publications, chatbots are therefore primarily intended to 1 ) improve skills and motivate students by supporting learning and teaching activities, 2 ) make education more efficient by providing relevant administrative and logistical information to learners, and 3 ) support multiple effects by mentoring students.

Concerning RQ3 (mentoring role), we identified three main mentoring method categories for chatbots: 1 ) Scaffolding , 2 ) Recommending , and 3 ) Informing . However, comparing the current mentoring by chatbots reported in the literature with the daily mentoring role of teachers, we can summarize that chatbots are not yet at the same level. In order to take over the mentoring roles of teachers ( Wildman et al., 1992 ), a chatbot would need to fulfill some of the following activities in its mentoring role. With respect to 1 ) Scaffolding , chatbots should provide direct assistance while learners acquire new skills and especially direct beginners in their activities. Regarding 2 ) Recommending , chatbots should provide supportive information, tools, or other materials for specific learning tasks as well as life situations. With respect to 3 ) Informing , chatbots should encourage students according to their goals and achievements, and support them in developing meta-cognitive skills like self-regulation. Due to this mismatch between teacher and chatbot mentoring, we see here another research challenge, which we call “Exploring the potential of chatbots for mentoring students.”

Regarding RQ4 (adaptation), only six publications were identified that discuss an adaptation of chatbots, and four of the five adaptation approaches (A1, A3, A4, and A5) show similarities in being applied within quizzes. In the context of educational technologies, providing reasonable adaptations for learners requires a high level of experience. Based on our results, the research on chatbots does not seem to be at this point yet. Looking at adaptation literature like ( Brusilovsky, 2001 ) or ( Benyon and Murray, 1993 ), it becomes clear that a chatbot needs to consider the learner’s personal information to fulfill the requirements of the adaptation definition. Personal information must be retrieved and stored, at least temporarily, in some sort of learner model. For learner information like knowledge and interest, adaptations seem to be barely explored in the reviewed publications, while the model of ( Brusilovsky and Millán, 2007 ) points out further learner information that can be used to make chatbots more adaptive: personal goals, personal tasks, personal background, individual traits, and the learner’s context. We identify research in this area as a third future challenge and call it the “Exploring and leveraging adaptation capabilities of chatbots” challenge.

In terms of RQ5 (domains), we identified a detailed map of domains applying chatbots in education and their distribution ( see Table 3 ). By systematically analyzing 74 publications, we identified 20 domains and structured them according to the identified pedagogical role into four domain categories: Learning Chatbots , Assisting Chatbots , Mentoring Chatbots , and Other Research . These results extend the taxonomy of Application Clusters (AC) for chatbots in education, which previously comprised the work of ( Pérez et al., 2020 ), who took the chatbot activity as characteristic, and ( Winkler and Soellner, 2018 ), who characterized chatbots by domains. It draws relationships between these two types of Application Clusters (AC) and structures them accordingly. Our structure incorporates Mentoring Chatbots and Other Research in addition to the “service-oriented chatbots” (cf. Assisting Chatbots ) and “teaching-oriented chatbots” (cf. Learning Chatbots ) identified by ( Pérez et al., 2020 ). Furthermore, the strong tendency towards informing students already mentioned by ( Smutny and Schreiberova, 2020 ) can also be recognized in our results, especially in Assisting Chatbots . Compared to ( Winkler and Soellner, 2018 ), we can confirm the prominent domains of “language learning” within Learning Chatbots and “metacognitive thinking” within Mentoring Chatbots . Moreover, Table 3 reflects a more detailed picture of chatbot applications in education, which could help researchers to find similar works or unexplored application areas.

Limitations

One important limitation to be mentioned here is the exclusion of alternative keywords from our search queries, as we exclusively used chatbot as a keyword in order to avoid search results that do not fit our research questions. Though we acknowledge that chatbots share properties with pedagogical agents, dialogue systems, and bots, we carefully considered this trade-off between missing potentially relevant work and inflating our search procedure by including related but not necessarily pertinent work. A second limitation may lie in the formation of categories and the coding processes applied, which, due to the novelty of the findings, could not be built on theoretical frameworks or already existing code books. Although we have focused on ensuring that the codes used contribute to a strong understanding, the chosen level of abstraction might have affected the level of detail of the resulting data representation.

Conclusion

In this systematic literature review, we explored the current landscape of chatbots in education. We analyzed 74 publications, identified 20 domains of chatbots, and grouped them based on their pedagogical roles into four domain categories. These pedagogical roles are the supporting learning role ( Learning ), the assisting role ( Assisting ), and the mentoring role ( Mentoring ). By focusing on objectives for implementing chatbots, we identified four main objectives: 1 ) Skill Improvement , 2 ) Efficiency of Education , 3 ) Students’ Motivation , and 4 ) Availability of Education . As discussed in section 5, these objectives do not fully align with the chosen evaluation procedures. We focused on the relations between pedagogical roles and objectives for implementing chatbots and identified three main relations: 1 ) chatbots to improve skills and motivate students by supporting learning and teaching activities, 2 ) chatbots to make education more efficient by providing relevant administrative and logistical information to learners, and 3 ) chatbots to support multiple effects by mentoring students. We focused on chatbots incorporating the Mentoring role and found that these chatbots are mostly concerned with three mentoring topics, 1 ) Self-Regulated Learning , 2 ) Life Skills , and 3 ) Learning Skills , and three mentoring methods, 1 ) Scaffolding , 2 ) Recommending , and 3 ) Informing . Regarding chatbot adaptations, only six publications with adaptations were identified. Furthermore, the adaptation approaches found were mostly limited to applications within quizzes and thus represent a research gap.

Based on these outcomes we consider three challenges for chatbots in education that offer future research opportunities:

Challenge 1: Aligning chatbot evaluations with implementation objectives . Most chatbot evaluations focus on narrow aspects, measuring the tool’s usability, acceptance, or technical correctness. If chatbots are to be considered as learning aids, student mentors, or facilitators, the effects on the cognitive and emotional levels should also be taken into account in the evaluation of chatbots. This finding strengthens our conclusion that chatbot development in education is still driven by technology, rather than having a clear pedagogical focus on improving and supporting learning.

Challenge 2: Exploring the potential of chatbots for mentoring students . In order to better understand the potential of chatbots to mentor students, more empirical studies on the information needs of learners are required. It is obvious that these needs differ between schools and higher education. However, so far there are hardly any studies investigating learners’ information needs with respect to chatbots, nor whether chatbots address these needs sufficiently.

Challenge 3: Exploring and leveraging adaptation capabilities of chatbots . There is a large body of literature on the adaptation capabilities of educational technologies. However, we have seen very few studies on the effect of adaptation of chatbots for educational purposes. As chatbots are envisioned as systems that should personally support learners, the area of adaptable interactions of chatbots is an important research aspect that should receive more attention in the near future.

By addressing these challenges, we believe that chatbots can become effective educational tools capable of supporting learners with informative feedback. Therefore, looking at our results and the challenges presented, we conclude, “No, we are not there yet!” - There is still much to be done in terms of research on chatbots in education. Still, development in this area seems to have just begun to gain momentum and we expect to see new insights in the coming years.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author Contributions

SW, JS†, DM†, JW†, MR, and HD.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Abbasi, S., Kazi, H., and Hussaini, N. N. (2019). Effect of Chatbot Systems on Student’s Learning Outcomes. Sylwan 163 (10).


Abbasi, S., and Kazi, H. (2014). Measuring Effectiveness of Learning Chatbot Systems on Student's Learning Outcome and Memory Retention. Asian J. Appl. Sci. Eng. 3, 57. doi:10.15590/AJASE/2014/V3I7/53576


Almahri, F. A. J., Bell, D., and Merhi, M. (2020). “Understanding Student Acceptance and Use of Chatbots in the United Kingdom Universities: A Structural Equation Modelling Approach,” in 2020 6th IEEE International Conference on Information Management, ICIM 2020 , London, United Kingdom , March 27–29, 2020 , (IEEE), 284–288. doi:10.1109/ICIM49319.2020.244712

Bao, M. (2019). Can Home Use of Speech-Enabled Artificial Intelligence Mitigate Foreign Language Anxiety - Investigation of a Concept. Awej 5, 28–40. doi:10.24093/awej/call5.3

Benyon, D., and Murray, D. (1993). Applying User Modeling to Human-Computer Interaction Design. Artif. Intell. Rev. 7 (3-4), 199–225. doi:10.1007/BF00849555

Bii, P. K., Too, J. K., and Mukwa, C. W. (2018). Teacher Attitude towards Use of Chatbots in Routine Teaching. Univers. J. Educ. Res. 6 (7), 1586–1597. doi:10.13189/ujer.2018.060719

Bii, P., Too, J., and Langat, R. (2013). An Investigation of Student’s Attitude Towards the Use of Chatbot Technology in Instruction: The Case of Knowie in a Selected High School. Education Research 4, 710–716. doi:10.14303/er.2013.231


Bos, A. S., Pizzato, M. C., Vettori, M., Donato, L. G., Soares, P. P., Fagundes, J. G., et al. (2020). Empirical Evidence During the Implementation of an Educational Chatbot with the Electroencephalogram Metric. Creative Education 11, 2337–2345. doi:10.4236/CE.2020.1111171

Brusilovsky, P. (2001). Adaptive Hypermedia. User Model. User-Adapted Interaction 11 (1), 87–110. doi:10.1023/a:1011143116306

Brusilovsky, P., and Millán, E. (2007). “User Models for Adaptive Hypermedia and Adaptive Educational Systems,” in The Adaptive Web: Methods and Strategies of Web Personalization . Editors P. Brusilovsky, A. Kobsa, and W. Nejdl. Berlin: Springer , 3–53. doi:10.1007/978-3-540-72079-9_1

Cabales, V. (2019). “Muse: Scaffolding metacognitive reflection in design-based research,” in CHI EA’19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems , Glasgow, Scotland, United Kingdom , May 4–9, 2019 , (ACM), 1–6. doi:10.1145/3290607.3308450

Carayannopoulos, S. (2018). Using Chatbots to Aid Transition. Int. J. Info. Learn. Tech. 35, 118–129. doi:10.1108/IJILT-10-2017-0097

Chan, C. H., Lee, H. L., Lo, W. K., and Lui, A. K.-F. (2018). Developing a Chatbot for College Student Programme Advisement. in 2018 International Symposium on Educational Technology, ISET 2018 , Osaka, Japan , July 31–August 2, 2018 . Editors F. L. Wang, C. Iwasaki, T. Konno, O. Au, and C. Li, (IEEE), 52–56. doi:10.1109/ISET.2018.00021

Chang, M.-Y., and Hwang, J.-P. (2019). “Developing Chatbot with Deep Learning Techniques for Negotiation Course,” in 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019 , Toyama, Japan , July 7–11, 2019 , (IEEE), 1047–1048. doi:10.1109/IIAI-AAI.2019.00220

Chen, C.-A., Yang, Y.-T., Wu, S.-M., Chen, H.-C., Chiu, K.-C., Wu, J.-W., et al. (2018). “A Study of Implementing AI Chatbot in Campus Consulting Service”, in TANET 2018-Taiwan Internet Seminar , 1714–1719. doi:10.6861/TANET.201810.0317

Chen, H.-L., Widarso, G. V., and Sutrisno, H. (2020). A ChatBot for Learning Chinese: Learning Achievement and Technology Acceptance. J. Educ. Comput. Res. 58 (6), 1161–1189. doi:10.1177/0735633120929622

Daud, S. H. M., Teo, N. H. I., and Zain, N. H. M. (2020). E-java Chatbot for Learning Programming Language: A Post-pandemic Alternative Virtual Tutor. Int. J. Emerging Trends Eng. Res. 8 (7), 3290–3298. doi:10.30534/ijeter/2020/67872020

Davies, J. N., Verovko, M., Verovko, O., and Solomakha, I. (2020). “Personalization of E-Learning Process Using Ai-Powered Chatbot Integration,” in Selected Papers of 15th International Scientific-practical Conference, MODS, 2020: Advances in Intelligent Systems and Computing , Chernihiv, Ukraine , June 29–July 01, 2020 . Editors S. Shkarlet, A. Morozov, and A. Palagin, ( Springer ) Vol. 1265, 209–216. doi:10.1007/978-3-030-58124-4_20

Diachenko, A. V., Morgunov, B. P., Melnyk, T. P., Kravchenko, O. I., and Zubchenko, L. V. (2019). The Use of Innovative Pedagogical Technologies for Automation of the Specialists' Professional Training. Int. J. High. Educ. 8, 288–295. doi:10.5430/ijhe.v8n6p288

Dibitonto, M., Leszczynska, K., Tazzi, F., and Medaglia, C. M. (2018). “Chatbot in a Campus Environment: Design of Lisa, a Virtual Assistant to Help Students in Their university Life,” in 20th International Conference, HCI International 2018 , Las Vegas, NV, USA , July 15–20, 2018 , Lecture Notes in Computer Science. Editors M. Kurosu, (Springer), 103–116. doi:10.1007/978-3-319-91250-9

Durall, E., and Kapros, E. (2020). “Co-design for a Competency Self-Assessment Chatbot and Survey in Science Education,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors P. Zaphiris, and A. Ioannou, Berlin: Springer Vol. 12206, 13–23. doi:10.1007/978-3-030-50506-6_2

Duval, E., and Verbert, K. (2012). Learning Analytics. Eleed 8 (1).

Engel, J. D., Engel, V. J. L., and Mailoa, E. (2020). Interaction Monitoring Model of Logo Counseling Website for College Students' Healthy Self-Esteem, I. J. Eval. Res. Educ. 9, 607–613. doi:10.11591/ijere.v9i3.20525

Febriani, G. A., and Agustia, R. D. (2019). Development of Line Chatbot as a Learning Media for Mathematics National Exam Preparation. Elibrary.Unikom.Ac.Id . https://elibrary.unikom.ac.id/1130/14/UNIKOM_GISTY%20AMELIA%20FEBRIANI_JURNAL%20DALAM%20BAHASA%20INGGRIS.pdf .

Ferguson, R., and Sharples, M. (2014). “Innovative Pedagogy at Massive Scale: Teaching and Learning in MOOCs,” in 9th European Conference on Technology Enhanced Learning, EC-TEL 2014 , Graz, Austria , September 16–19, 2014 , Lecture Notes in Computer Science. Editors C. Rensing, S. de Freitas, T. Ley, and P. J. Muñoz-Merino, ( Berlin : Springer) Vol. 8719, 98–111. doi:10.1007/978-3-319-11200-8_8

Fryer, L. K., Ainley, M., Thompson, A., Gibson, A., and Sherlock, Z. (2017). Stimulating and Sustaining Interest in a Language Course: An Experimental Comparison of Chatbot and Human Task Partners. Comput. Hum. Behav. 75, 461–468. doi:10.1016/j.chb.2017.05.045

Fryer, L. K., Nakao, K., and Thompson, A. (2019). Chatbot Learning Partners: Connecting Learning Experiences, Interest and Competence. Comput. Hum. Behav. 93, 279–289. doi:10.1016/j.chb.2018.12.023

Fryer, L. K., Thompson, A., Nakao, K., Howarth, M., and Gallacher, A. (2020). Supporting Self-Efficacy Beliefs and Interest as Educational Inputs and Outcomes: Framing AI and Human Partnered Task Experiences. Learn. Individual Differences , 80. doi:10.1016/j.lindif.2020.101850

Gabrielli, S., Rizzi, S., Carbone, S., and Donisi, V. (2020). A Chatbot-Based Coaching Intervention for Adolescents to Promote Life Skills: Pilot Study. JMIR Hum. Factors 7 (1). doi:10.2196/16762


Galko, L., Porubän, J., and Senko, J. (2018). “Improving the User Experience of Electronic University Enrollment,” in 16th IEEE International Conference on Emerging eLearning Technologies and Applications, ICETA 2018 , Stary Smokovec, Slovakia , Nov 15–16, 2018 . Editors F. Jakab, (Piscataway, NJ: IEEE ), 179–184. doi:10.1109/ICETA.2018.8572054

Goda, Y., Yamada, M., Matsukawa, H., Hata, K., and Yasunami, S. (2014). Conversation with a Chatbot before an Online EFL Group Discussion and the Effects on Critical Thinking. J. Inf. Syst. Edu. 13, 1–7. doi:10.12937/EJSISE.13.1

Graesser, A. C., VanLehn, K., Rose, C. P., Jordan, P. W., and Harter, D. (2001). Intelligent Tutoring Systems with Conversational Dialogue. AI Mag. 22 (4), 39–51. doi:10.1609/aimag.v22i4.1591

Greller, W., and Drachsler, H. (2012). Translating Learning into Numbers: A Generic Framework for Learning Analytics. J. Educ. Tech. Soc. 15 (3), 42–57. doi:10.2307/jeductechsoci.15.3.42

Haristiani, N., and Rifa’i, M. M. (2020). Combining Chatbot and Social Media: Enhancing Personal Learning Environment (PLE) in Language Learning. Indonesian J Sci Tech. 5 (3), 487–506. doi:10.17509/ijost.v5i3.28687

Hattie, J., and Timperley, H. (2007). The Power of Feedback. Rev. Educ. Res. 77 (1), 81–112. doi:10.3102/003465430298487

Hattie, J. (2009). Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement . Abingdon, UK: Routledge .

Heller, B., Proctor, M., Mah, D., Jewell, L., and Cheung, B. (2005). “Freudbot: An Investigation of Chatbot Technology in Distance Education,” in Proceedings of ED-MEDIA 2005–World Conference on Educational Multimedia, Hypermedia and Telecommunications , Montréal, Canada , June 27–July 2, 2005 . Editors P. Kommers, and G. Richards, ( AACE ), 3913–3918.

Heo, J., and Lee, J. (2019). “CiSA: An Inclusive Chatbot Service for International Students and Academics,” in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ) 11786, 153–167. doi:10.1007/978-3-030-30033-3

Hobert, S. (2019a). “How Are You, Chatbot? Evaluating Chatbots in Educational Settings - Results of a Literature Review,” in 17. Fachtagung Bildungstechnologien, DELFI 2019 - 17th Conference on Education Technologies, DELFI 2019 , Berlin, Germany , Sept 16–19, 2019 . Editors N. Pinkwart, and J. Konert, 259–270. doi:10.18420/delfi2019_289

Hobert, S., and Meyer von Wolff, R. (2019). “Say Hello to Your New Automated Tutor - A Structured Literature Review on Pedagogical Conversational Agents,” in 14th International Conference on Wirtschaftsinformatik , Siegen, Germany , Feb 23–27, 2019 . Editors V. Pipek, and T. Ludwig, ( AIS ).

Hobert, S. (2019b). Say Hello to ‘Coding Tutor’! Design and Evaluation of a Chatbot-Based Learning System Supporting Students to Learn to Program in International Conference on Information Systems (ICIS) 2019 Conference , Munich, Germany , Dec 15–18, 2019 , AIS 2661, 1–17.

Hobert, S. (2020). Small Talk Conversations and the Long-Term Use of Chatbots in Educational Settings ‐ Experiences from a Field Study in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019 , Amsterdam, Netherlands , November 19–20 : Lecture Notes in Computer Science. Editors A. Folstad, T. Araujo, S. Papadopoulos, E. Law, O. Granmo, E. Luger, and P. Brandtzaeg, ( Springer ) 11970, 260–272. doi:10.1007/978-3-030-39540-7_18

Hsieh, S.-W. (2011). Effects of Cognitive Styles on an MSN Virtual Learning Companion System as an Adjunct to Classroom Instructions. Edu. Tech. Society 2, 161–174.

Huang, J.-X., Kwon, O.-W., Lee, K.-S., and Kim, Y.-K. (2018). Improve the Chatbot Performance for the DB-CALL System Using a Hybrid Method and a Domain Corpus in Future-proof CALL: language learning as exploration and encounters–short papers from EUROCALL 2018 , Jyväskylä, Finland , Aug 22–25, 2018 . Editors P. Taalas, J. Jalkanen, L. Bradley, and S. Thouësny, ( Research-publishing.net ). doi:10.14705/rpnet.2018.26.820

Huang, W., Hew, K. F., and Gonda, D. E. (2019). Designing and Evaluating Three Chatbot-Enhanced Activities for a Flipped Graduate Course. Int. J. Mech. Engineer. Robotics. Research. 813–818. doi:10.18178/ijmerr.8.5.813-818

Ismail, M., and Ade-Ibijola, A. (2019). “Lecturer's Apprentice: A Chatbot for Assisting Novice Programmers,”in Proceedings - 2019 International Multidisciplinary Information Technology and Engineering Conference (IMITEC) , Vanderbijlpark, South Africa , (IEEE), 1–8. doi:10.1109/IMITEC45504.2019.9015857

Jia, J. (2008). “Motivate the Learners to Practice English through Playing with Chatbot CSIEC,” in 3rd International Conference on Technologies for E-Learning and Digital Entertainment, Edutainment 2008 , Nanjing, China , June 25–27, 2008 , Lecture Notes in Computer Science, (Springer) 5093, 180–191. doi:10.1007/978-3-540-69736-7_20

Jia, J. (2004). “The Study of the Application of a Keywords-Based Chatbot System on the Teaching of Foreign Languages,” in Proceedings of SITE 2004--Society for Information Technology and Teacher Education International Conference , Atlanta, Georgia, USA . Editors R. Ferdig, C. Crawford, R. Carlsen, N. Davis, J. Price, R. Weber, and D. Willis, (AACE), 1201–1207.

Jivet, I., Scheffel, M., Drachsler, H., and Specht, M. (2017). “Awareness is not enough: Pitfalls of learning analytics dashboards in the educational practice,” in 12th European Conference on Technology Enhanced Learning, EC-TEL 2017 , Tallinn, Estonia , September 12–15, 2017 , Lecture Notes in ComputerScience. Editors E. Lavoué, H. Drachsler, K. Verbert, J. Broisin, and M. Pérez-Sanagustín, (Springer), 82–96. doi:10.1007/978-3-319-66610-5_7

Jung, H., Lee, J., and Park, C. (2020). Deriving Design Principles for Educational Chatbots from Empirical Studies on Human-Chatbot Interaction. J. Digit. Contents Society , 21, 487–493. doi:10.9728/dcs.2020.21.3.487

Kerly, A., and Bull, S. (2006). “The Potential for Chatbots in Negotiated Learner Modelling: A Wizard-Of-Oz Study,” in 8th International Conference on Intelligent Tutoring Systems, ITS 2006 , Jhongli, Taiwan , June 26–30, 2006 , Lecture Notes in Computer Science. Editors M. Ikeda, K. D. Ashley, and T. W. Chan, ( Springer ) 4053, 443–452. doi:10.1007/11774303

Kerly, A., Ellis, R., and Bull, S. (2008). CALMsystem: A Conversational Agent for Learner Modelling. Knowledge-Based Syst. 21, 238–246. doi:10.1016/j.knosys.2007.11.015

Kerly, A., Hall, P., and Bull, S. (2007). Bringing Chatbots into Education: Towards Natural Language Negotiation of Open Learner Models. Knowledge-Based Syst. , 20, 177–185. doi:10.1016/j.knosys.2006.11.014

Kumar, M. N., Chandar, P. C. L., Prasad, A. V., and Sumangali, K. (2016). “Android Based Educational Chatbot for Visually Impaired People,” in 2016 IEEE International Conference on Computational Intelligence and Computing Research , Chennai, India , December 15–17, 2016 , 1–4. doi:10.1109/ICCIC.2016.7919664

Lee, K., Jo, J., Kim, J., and Kang, Y. (2019). Can Chatbots Help Reduce the Workload of Administrative Officers? - Implementing and Deploying FAQ Chatbot Service in a University in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ) 1032, 348–354. doi:10.1007/978-3-030-23522-2

Lester, J. C., Converse, S. A., Kahler, S. E., Barlow, S. T., Stone, B. A., and Bhogal, R. S. (1997). “The Persona Effect: Affective Impact of Animated Pedagogical Agents,” in Proceedings of the ACM SIGCHI Conference on Human factors in computing systems , Atlanta, Georgia, USA , March 22–27, 1997 , (ACM), 359–366.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., et al. (2009). The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies that Evaluate Health Care Interventions: Explanation and Elaboration. J. Clin. Epidemiol. 62 (10), e1–e34. doi:10.1016/j.jclinepi.2009.06.006

Lin, M. P.-C., and Chang, D. (2020). Enhancing Post-secondary Writers’ Writing Skills with a Chatbot. J. Educ. Tech. Soc. 23, 78–92. doi:10.2307/26915408

Lin, Y.-H., and Tsai, T. (2019). “A Conversational Assistant on Mobile Devices for Primitive Learners of Computer Programming,” in TALE 2019 - 2019 IEEE International Conference on Engineering, Technology and Education , Yogyakarta, Indonesia , December 10–13, 2019 , (IEEE), 1–4. doi:10.1109/TALE48000.2019.9226015

Linden, A., and Fenn, J. (2003). Understanding Gartner’s Hype Cycles. Strategic Analysis Report No. R-20-1971 8. Stamford, CT: Gartner, Inc .

Liu, Q., Huang, J., Wu, L., Zhu, K., and Ba, S. (2020). CBET: Design and Evaluation of a Domain-specific Chatbot for mobile Learning. Univ. Access Inf. Soc. , 19, 655–673. doi:10.1007/s10209-019-00666-x

Mamani, J. R. C., Álamo, Y. J. R., Aguirre, J. A. A., and Toledo, E. E. G. (2019). “Cognitive Services to Improve User Experience in Searching for Academic Information Based on Chatbot,” in Proceedings of the 2019 IEEE 26th International Conference on Electronics, Electrical Engineering and Computing (INTERCON) , Lima, Peru , August 12–14, 2019 , (IEEE), 1–4. doi:10.1109/INTERCON.2019.8853572

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., and Delgado López-Cózar, E. (2018). Google Scholar, Web of Science, and Scopus: A Systematic Comparison of Citations in 252 Subject Categories. J. Informetrics 12 (4), 1160–1177. doi:10.1016/j.joi.2018.09.002

Matsuura, S., and Ishimura, R. (2017). Chatbot and Dialogue Demonstration with a Humanoid Robot in the Lecture Class, in 11th International Conference on Universal Access in Human-Computer Interaction, UAHCI 2017, held as part of the 19th International Conference on Human-Computer Interaction, HCI 2017 , Vancouver, Canada , July 9–14, 2017 , Lecture Notes in Computer Science. Editors M. Antona, and C. Stephanidis, (Springer) Vol. 10279, 233–246. doi:10.1007/978-3-319-58700-4

Matsuura, S., and Omokawa, R. (2020). Being Aware of One’s Self in the Auto-Generated Chat with a Communication Robot in UAHCI 2020 , 477–488. doi:10.1007/978-3-030-49282-3

McLoughlin, C., and Oliver, R. (1998). Maximising the Language and Learning Link in Computer Learning Environments. Br. J. Educ. Tech. 29 (2), 125–136. doi:10.1111/1467-8535.00054

Mendoza, S., Hernández-León, M., Sánchez-Adame, L. M., Rodríguez, J., Decouchant, D., and Meneses-Viveros, A. (2020). “Supporting Student-Teacher Interaction through a Chatbot,” in 7th International Conference, LCT 2020, Held as Part of the 22nd HCI International Conference, HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors P. Zaphiris, and A. Ioannou, ( Springer ) 12206, 93–107. doi:10.1007/978-3-030-50506-6

Meyer, V., Wolff, R., Nörtemann, J., Hobert, S., and Schumann, M. (2020). “Chatbots for the Information Acquisition at Universities ‐ A Student’s View on the Application Area,“in 3rd International Workshop on Chatbot Research and Design, CONVERSATIONS 2019 , Amsterdam, Netherlands , November 19–20 , Lecture Notes in Computer Science. Editors A. Folstad, T. Araujo, S. Papadopoulos, E. Law, O. Granmo, E. Luger, and P. Brandtzaeg, (Springer) 11970, 231–244. doi:10.1007/978-3-030-39540-7

Na-Young, K. (2018c). A Study on Chatbots for Developing Korean College Students’ English Listening and Reading Skills. J. Digital Convergence 16. 19–26. doi:10.14400/JDC.2018.16.8.019

Na-Young, K. (2019). A Study on the Use of Artificial Intelligence Chatbots for Improving English Grammar Skills. J. Digital Convergence 17, 37–46. doi:10.14400/JDC.2019.17.8.037

Na-Young, K. (2018a). Chatbots and Korean EFL Students’ English Vocabulary Learning. J. Digital Convergence 16. 1–7. doi:10.14400/JDC.2018.16.2.001

Na-Young, K. (2018b). Different Chat Modes of a Chatbot and EFL Students’ Writing Skills Development. doi:10.16933/sfle.2017.32.1.263

Na-Young, K. (2017). Effects of Different Types of Chatbots on EFL Learners’ Speaking Competence and Learner Perception. Cross-Cultural Studies 48, 223–252. doi:10.21049/ccs.2017.48.223

Nagata, R., Hashiguchi, T., and Sadoun, D. (2020). Is the Simplest Chatbot Effective in English Writing Learning Assistance?, in 16th International Conference of the Pacific Association for Computational Linguistics , PACLING, Hanoi, Vietnam , October 11–13, 2019 , Communications in Computer and Information Science. Editors L.-M. Nguyen, S. Tojo, X.-H. Phan, and K. Hasida, ( Springer ) Vol. 1215, 245–246. doi:10.1007/978-981-15-6168-9

Nelson, T. O., and Narens, L. (1994). “Why Investigate Metacognition,” in Metacognition: Knowing About Knowing . Editors J. Metcalfe, and A. P. Shimamura, (MIT Press) 13, 1–25.

Nghi, T. T., Phuc, T. H., and Thang, N. T. (2019). Applying Ai Chatbot for Teaching a Foreign Language: An Empirical Research. Int. J. Sci. Res. 8.

Ondas, S., Pleva, M., and Hládek, D. (2019). How Chatbots Can Be Involved in the Education Process. in ICETA 2019 - 17th IEEE International Conference on Emerging eLearning Technologies and Applications, Proceedings, Stary Smokovec , Slovakia , November 21–22, 2019 . Editors F. Jakab, (IEEE), 575–580. doi:10.1109/ICETA48886.2019.9040095

Pereira, J., Fernández-Raga, M., Osuna-Acedo, S., Roura-Redondo, M., Almazán-López, O., and Buldón-Olalla, A. (2019). Promoting Learners' Voice Productions Using Chatbots as a Tool for Improving the Learning Process in a MOOC. Tech. Know Learn. 24, 545–565. doi:10.1007/s10758-019-09414-9

Pérez, J. Q., Daradoumis, T., and Puig, J. M. M. (2020). Rediscovering the Use of Chatbots in Education: A Systematic Literature Review. Comput. Appl. Eng. Educ. 28, 1549–1565. doi:10.1002/cae.22326

Pérez-Marín, D. (2021). A Review of the Practical Applications of Pedagogic Conversational Agents to Be Used in School and University Classrooms. Digital 1 (1), 18–33. doi:10.3390/digital1010002

Pham, X. L., Pham, T., Nguyen, Q. M., Nguyen, T. H., and Cao, T. T. H. (2018). “Chatbot as an Intelligent Personal Assistant for mobile Language Learning,” in ACM International Conference Proceeding Series doi:10.1145/3291078.3291115

Quincey, E. de., Briggs, C., Kyriacou, T., and Waller, R. (2019). “Student Centred Design of a Learning Analytics System,” in Proceedings of the 9th International Conference on Learning Analytics & Knowledge , Tempe Arizona, USA , March 4–8, 2019 , (ACM), 353–362. doi:10.1145/3303772.3303793

Ram, A., Prasad, R., Khatri, C., Venkatesh, A., Gabriel, R., Liu, Q, et al. (2018). Conversational Ai: The Science behind the Alexa Prize, in 1st Proceedings of Alexa Prize (Alexa Prize 2017) . ArXiv [Preprint]. Available at: https://arxiv.org/abs/1801.03604 .

Rebaque-Rivas, P., and Gil-Rodríguez, E. (2019). Adopting an Omnichannel Approach to Improve User Experience in Online Enrolment at an E-Learning University, in 21st International Conference on Human-Computer Interaction, HCII 2019: Communications in Computer and Information Science , Orlando, FL, USA , July 26–31, 2019 . Editors C. Stephanidis, ( Springer ), 115–122. doi:10.1007/978-3-030-23525-3

Robinson, C. (2019). Impressions of Viability: How Current Enrollment Management Personnel And Former Students Perceive The Implementation of A Chatbot Focused On Student Financial Communication. Higher Education Doctoral Projects.2 . https://aquila.usm.edu/highereddoctoralprojects/2 .

Ruan, S., Jiang, L., Xu, J., Tham, B. J.-K., Qiu, Z., Zhu, Y., Murnane, E. L., Brunskill, E., and Landay, J. A. (2019). “QuizBot: A Dialogue-based Adaptive Learning System for Factual Knowledge,” in 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019 , Glasgow, Scotland, United Kingdom , May 4–9, 2019 , (ACM), 1–13. doi:10.1145/3290605.3300587

Sandoval, Z. V. (2018). Design and Implementation of a Chatbot in Online Higher Education Settings. Issues Inf. Syst. 19, 44–52. doi:10.48009/4.iis.2018.44-52

Sandu, N., and Gide, E. (2019). “Adoption of AI-Chatbots to Enhance Student Learning Experience in Higher Education in india,” in 18th International Conference on Information Technology Based Higher Education and Training , Magdeburg, Germany , September 26–27, 2019 , (IEEE), 1–5. doi:10.1109/ITHET46829.2019.8937382

Saygin, A. P., Cicekli, I., and Akman, V. (2000). Turing Test: 50 Years Later. Minds and Machines 10 (4), 463–518. doi:10.1023/A:1011288000451

Sinclair, A., McCurdy, K., Lucas, C. G., Lopez, A., and Gaševic, D. (2019). “Tutorbot Corpus: Evidence of Human-Agent Verbal Alignment in Second Language Learner Dialogues,” in EDM 2019 - Proceedings of the 12th International Conference on Educational Data Mining .

Smutny, P., and Schreiberova, P. (2020). Chatbots for Learning: A Review of Educational Chatbots for the Facebook Messenger. Comput. Edu. 151, 103862. doi:10.1016/j.compedu.2020.103862

Song, D., Rice, M., and Oh, E. Y. (2019). Participation in Online Courses and Interaction with a Virtual Agent. Int. Rev. Res. Open. Dis. 20, 44–62. doi:10.19173/irrodl.v20i1.3998

Stapić, Z., Horvat, A., and Vukovac, D. P. (2020). Designing a Faculty Chatbot through User-Centered Design Approach, in 22nd International Conference on Human-Computer Interaction,HCII 2020 , Copenhagen, Denmark , July 19–24, 2020 , Lecture Notes in Computer Science. Editors C. Stephanidis, D. Harris, W. C. Li, D. D. Schmorrow, C. M. Fidopiastis, and P. Zaphiris, ( Springer ), 472–484. doi:10.1007/978-3-030-60128-7

Subramaniam, N. K. (2019). Teaching and Learning via Chatbots with Immersive and Machine Learning Capabilities. In International Conference on Education (ICE 2019) Proceedings , Kuala Lumpur, Malaysia , April 10–11, 2019 . Editors S. A. H. Ali, T. T. Subramaniam, and S. M. Yusof, 145–156.

Sugondo, A. F., and Bahana, R. (2019). “Chatbot as an Alternative Means to Access Online Information Systems,” in 3rd International Conference on Eco Engineering Development, ICEED 2019 , Surakarta, Indonesia , November 13–14, 2019 , IOP Conference Series: Earth and Environmental Science, (IOP Publishing) 426. doi:10.1088/1755-1315/426/1/012168

Suwannatee, S., and Suwanyangyuen, A. (2019). “Reading Chatbot” Mahidol University Library and Knowledge Center Smart Assistant,” in Proceedings for the 2019 International Conference on Library and Information Science (ICLIS) , Taipei, Taiwan , July 11–13, 2019 .

Vaidyam, A. N., Wisniewski, H., Halamka, J. D., Kashavan, M. S., and Torous, J. B. (2019). Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Can. J. Psychiatry 64 (7), 456–464. doi:10.1177/0706743719828977

Vijayakumar, B., Höhn, S., and Schommer, C. (2019). “Quizbot: Exploring Formative Feedback with Conversational Interfaces,” in 21st International Conference on Technology Enhanced Assessment, TEA 2018 , Amsterdam, Netherlands , Dec 10-11, 2018 . Editors S. Draaijer, B. D. Joosten-ten, and E. Ras, ( Springer ), 102–120. doi:10.1007/978-3-030-25264-9

Virtanen, M. A., Haavisto, E., Liikanen, E., and Kääriäinen, M. (2018). Ubiquitous Learning Environments in Higher Education: A Scoping Literature Review. Educ. Inf. Technol. 23 (2), 985–998. doi:10.1007/s10639-017-9646-6

Wildman, T. M., Magliaro, S. G., Niles, R. A., and Niles, J. A. (1992). Teacher Mentoring: An Analysis of Roles, Activities, and Conditions. J. Teach. Edu. 43 (3), 205–213. doi:10.1177/0022487192043003007

Wiley, D., and Edwards, E. K. (2002). Online Self-Organizing Social Systems: The Decentralized Future of Online Learning. Q. Rev. Distance Edu. 3 (1), 33–46.

Winkler, R., and Soellner, M. (2018). Unleashing the Potential of Chatbots in Education: A State-Of-The-Art Analysis. in Academy of Management Annual Meeting Proceedings 2018 (1), 15903. doi:10.5465/AMBPP.2018.15903abstract

Winne, P. H., and Hadwin, A. F. (2008). “The Weave of Motivation and Self-Regulated Learning,” in Motivation and Self-Regulated Learning: Theory, Research, and Applications . Editors D. H. Schunk, and B. J. Zimmerman, (Mahwah, NJ: Lawrence Erlbaum Associates Publishers ), 297–314.

Wisniewski, B., Zierer, K., and Hattie, J. (2019). The Power of Feedback Revisited: A Meta-Analysis of Educational Feedback Research. Front. Psychol. 10, 3087. doi:10.3389/fpsyg.2019.03087

Wolfbauer, I., Pammer-Schindler, V., and Rose, C. P. (2020). “Rebo Junior: Analysis of Dialogue Structure Quality for a Reflection Guidance Chatbot,” in Proceedings of the Impact Papers at EC-TEL 2020, co-located with the 15th European Conference on Technology-Enhanced Learning “Addressing global challenges and quality education” (EC-TEL 2020) , Virtual , Sept 14–18, 2020 . Editors T. Broos, and T. Farrell, 1–14.

Xiao, Z., Zhou, M. X., and Fu, W.-T. (2019). “Who should be my teammates: Using a conversational agent to understand individuals and help teaming,” in IUI’19: Proceedings of the 24th International Conference on Intelligent User Interfaces, Marina del Ray , California, USA , March 17–20, 2019 , (ACM), 437–447. doi:10.1145/3301275.3302264

Xu, A., Liu, Z., Guo, Y., Sinha, V., and Akkiraju, R. (2017). “A New Chatbot for Customer Service on Social media,” in Proceedings of the 2017 CHI conference on human factors in computing systems , Denver, Colorado, USA , May 6–11, 2017 , ACM, 3506–3510. doi:10.1145/3025453.3025496

Yin, J., Goh, T.-T., Yang, B., and Xiaobin, Y. (2020). Conversation Technology with Micro-learning: The Impact of Chatbot-Based Learning on Students' Learning Motivation and Performance. J. Educ. Comput. Res. 59, 154–177. doi:10.1177/0735633120952067

Appendix A: A Concept Map of Chatbots in Education

Keywords: chatbots, education, literature review, pedagogical roles, domains

Citation: Wollny S, Schneider J, Di Mitri D, Weidlich J, Rittberger M and Drachsler H (2021) Are We There Yet? - A Systematic Literature Review on Chatbots in Education. Front. Artif. Intell. 4:654924. doi: 10.3389/frai.2021.654924

Received: 17 January 2021; Accepted: 10 June 2021; Published: 15 July 2021.


Copyright © 2021 Wollny, Schneider, Di Mitri, Weidlich, Rittberger and Drachsler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Sebastian Wollny, [email protected] ; Jan Schneider, [email protected]



JMIR Cancer, 7(4), Oct-Dec 2021

Chatbot for Health Care and Oncology Applications Using Artificial Intelligence and Machine Learning: Systematic Review

1 Institute of Biomedical Engineering, University of Toronto, Toronto, ON, Canada

2 Department of Medical Biophysics, Western University, London, ON, Canada

Leslie Sanders

3 Department of Humanities, York University, Toronto, ON, Canada

4 Department of English, York University, Toronto, ON, Canada

James C L Chow

5 Department of Medical Physics, Radiation Medicine Program, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada

6 Department of Radiation Oncology, University of Toronto, Toronto, ON, Canada

Chatbot is a timely topic applied in various fields, including medicine and health care, for human-like knowledge transfer and communication. Machine learning, a subset of artificial intelligence, has been proven particularly applicable in health care, with the ability for complex dialog management and conversational flexibility.

This review article aims to report on the recent advances and current trends in chatbot technology in medicine. A brief historical overview, along with the developmental progress and design characteristics, is first introduced. The focus will be on cancer therapy, with in-depth discussions and examples of diagnosis, treatment, monitoring, patient support, workflow efficiency, and health promotion. In addition, this paper will explore the limitations and areas of concern, highlighting ethical, moral, security, technical, and regulatory standards and evaluation issues to explain the hesitancy in implementation.

A search of the literature published in the past 20 years was conducted using the IEEE Xplore, PubMed, Web of Science, Scopus, and OVID databases. The screening of chatbots was guided by the open-access Botlist directory for health care components and further divided according to the following criteria: diagnosis, treatment, monitoring, support, workflow, and health promotion.

Even after addressing these issues and establishing the safety or efficacy of chatbots, human elements in health care will not be replaceable. Therefore, chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes. Other applications in pandemic support, global health, and education are yet to be fully explored.

Conclusions

Further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine.

Introduction

Artificial intelligence (AI) is at the forefront of transforming numerous aspects of our lives by modifying the way we analyze information and improving decision-making through problem solving, reasoning, and learning. Machine learning (ML) is a subset of AI that improves its performance based on the data provided to a generic algorithm from experience rather than defining rules in traditional approaches [ 1 ]. Advancements in ML have provided benefits in terms of accuracy, decision-making, quick processing, cost-effectiveness, and handling of complex data [ 2 ]. Chatbots, also known as chatter robots, smart bots, conversational agents, digital assistants, or intellectual agents, are prime examples of AI systems that have evolved from ML. The Oxford Dictionary defines a chatbot as “a computer program that can hold a conversation with a person, usually over the internet.” They can also be physical entities designed to socially interact with humans or other robots. Predetermined responses are then generated by analyzing user input, whether text or speech, and accessing relevant knowledge [ 3 ]. Problems arise when dealing with more complex situations in dynamic environments and managing social conversational practices according to specific contexts and unique communication strategies [ 4 ].

Given these effectual benefits, it is not surprising that chatbots have rapidly evolved over the past 2 decades and integrated themselves into numerous fields, such as entertainment, travel, gaming, robotics, and security. Chatbots have been proven to be particularly applicable in various health care components that usually involve face-to-face interactions. With their ability for complex dialog management and conversational flexibility, integration of chatbot technology into clinical practice may reduce costs, refine workflow efficiencies, and improve patient outcomes [ 5 ]. A web-based, self-report survey examining physicians’ perspectives found positive benefits of health care chatbots in managing one’s own health; for improved physical, psychological, and behavioral outcomes; and most notably, for administrative purposes [ 6 ]. In light of the opportunities provided by this relatively new technology, potential limitations and areas of concern may arise that could potentially harm users. Concerns regarding accuracy, cybersecurity, lack of empathy, and technological maturity are reported as potential factors associated with the delay in chatbot acceptability or integration into health care [ 7 ].

This narrative review paper reports on health care components for chatbots, with a focus on cancer therapy. The rest of this paper is organized as follows: first, we introduce the developmental progress with a general overview of the architecture, design concepts, and types of chatbots; the main Results section focuses on the role that chatbots play in areas related to oncology, such as diagnosis, treatment, monitoring, support, workflow efficiency, and health promotion; and the Discussion section analyzes potential limitations and concerns for successful implementation while addressing future applications and research topics.

This review focuses on articles from peer-reviewed journals and conference proceedings. The following databases were searched from October to December 2020 for relevant and current studies from 2000 to 2020: IEEE Xplore, PubMed, Web of Science, Scopus, and OVID. The literature search used the following key terms: chatbot , chatter robot , conversational agent , artificial intelligence , and machine learning . For further refinement, these key terms were combined with more specific terms aligned with the focus of the paper. This included healthcare , cancer therapy , oncology , diagnosis , treatment , radiation therapy , and radiotherapy . The searches were not limited by language or study design. Letters and technical reports were excluded from the search. The full list of sources and search strategies is available from the authors.
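To illustrate how such key terms could be combined, the short sketch below assembles a generic boolean query string from the terms listed above. It is only an illustration: the actual searches would use the field syntax of each database (IEEE Xplore, PubMed, Web of Science, Scopus, and OVID), which differs from this generic form.

```python
# Illustrative only: builds a generic boolean query from the key terms listed above.
# Each database (IEEE Xplore, PubMed, Scopus, ...) has its own field syntax, so the
# assembled string would need to be adapted per platform.

core_terms = ["chatbot", "chatter robot", "conversational agent",
              "artificial intelligence", "machine learning"]
focus_terms = ["healthcare", "cancer therapy", "oncology", "diagnosis",
               "treatment", "radiation therapy", "radiotherapy"]

def build_query(core, focus):
    """Combine the terms into (core1 OR core2 OR ...) AND (focus1 OR focus2 OR ...)."""
    core_clause = " OR ".join(f'"{t}"' for t in core)
    focus_clause = " OR ".join(f'"{t}"' for t in focus)
    return f"({core_clause}) AND ({focus_clause})"

print(build_query(core_terms, focus_terms))
```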

The screening of chatbots was guided by a systematic review process from the Botlist directory during the period of January 2021. This directory was chosen as it was open-access and categorized the chatbots under many different categories (ie, health care, communication, and entertainment) and contained many commonly used messaging services (ie, Facebook Messenger, Discord, Slack, Kik, and Skype). A total of 78 chatbots were identified for health care components and further divided according to the following criteria: diagnosis, treatment, monitoring, support, workflow, and health promotion. It should be noted that using the health filters from a web directory limits the results to the search strategy and marketing label. Thus, the results from equivalent studies may differ when repeated.
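A minimal sketch of the screening tally described here is shown below; the chatbot names and category assignments are placeholders and do not correspond to the actual 78 chatbots identified from the Botlist directory.

```python
from collections import Counter

# Hypothetical screening records: (chatbot name, assigned criterion).
# The names and assignments are placeholders, not the actual screened chatbots.
screened = [
    ("BotA", "diagnosis"),
    ("BotB", "health promotion"),
    ("BotC", "monitoring"),
    ("BotD", "support"),
    ("BotE", "health promotion"),
    ("BotF", "workflow"),
    ("BotG", "treatment"),
]

criteria = ["diagnosis", "treatment", "monitoring", "support", "workflow", "health promotion"]
counts = Counter(category for _, category in screened)

for criterion in criteria:
    print(f"{criterion:>16}: {counts.get(criterion, 0)}")
```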

Chatbot History and Evolution

The idea of a chatbot was first introduced in 1950 when Alan Turing proposed the question, “Can machines think?” [ 8 ]. The earliest forms were designed to pass the Turing test and mimic human conversations as much as possible. In 1966, ELIZA (MIT Artificial Intelligence Laboratory) was the first known chatbot developed to act as a psychotherapist, using pattern matching and template-based responses to converse in a question-based format [ 9 ]. Improvements were made to build a more human-like and personalized entity by incorporating a personality in PARRY (developed by Kenneth Colby) that simulated a paranoid patient [ 10 ]. One of the most well-known chatbots is ALICE, developed in 1995 by Richard Wallace, which uses a pattern-matching technique to retrieve example sentences from output templates and avoid inappropriate responses [ 11 ]. A renewed interest in AI and advances in ML have led to the growing use and availability of chatbots in various fields [ 12 ]. SmarterChild (ActiveBuddy, Inc) [ 13 ] became widely accessible through messenger apps, followed by more familiar web-based assistants using voice-activated systems, such as Apple Siri, Amazon Alexa, Google Assistant, and Microsoft Cortana. On the basis of our analysis ( Figure 1 ), the most popular developments of chatbots for health care purposes are diagnostics, patient support (ie, mental health counseling), and health promotion. Some of these applications will be further explored in the following section for cancer applications.

Figure 1. Search and screening for health care chatbots. Chatbots using more than one platform are included.

Chatbot General Architecture

Although there are a variety of techniques for the development of chatbots, the general layout is relatively straightforward. As a computer application that uses ML to mimic human conversation, the underlying concept is similar for all types with 4 essential stages (input processing, input understanding, response generation, and response selection) [ 14 ]. A simplified general chatbot architecture is illustrated in Figure 2 . First, the user makes a request, in text or speech format, which is received and interpreted by the chatbot. From there, the processed information could be remembered, or more details could be requested for clarification. After the request is understood, the requested actions are performed, and the data of interest are retrieved from the database or external sources [ 15 ].

Figure 2. Schematic representation of general chatbot architecture.
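To make the four stages more concrete, the following minimal sketch wires them into a single reply function. The intents, regular expressions, and knowledge base entries are invented placeholders rather than part of any system cited here.

```python
import re
from typing import List, Optional

# Placeholder knowledge base standing in for "the database or external sources";
# the entries are invented for illustration.
KNOWLEDGE_BASE = {
    "appointment": "Appointments can be booked or changed through the clinic portal.",
    "side_effects": "Your care plan lists common side effects and when to call the care team.",
}

def process_input(raw: str) -> str:
    """Stage 1: input processing - normalize raw text (speech would be transcribed first)."""
    return raw.lower().strip()

def understand_input(text: str) -> Optional[str]:
    """Stage 2: input understanding - map the utterance to an intent with simple patterns."""
    if re.search(r"\b(appointment|booking|schedule)\b", text):
        return "appointment"
    if re.search(r"\b(side effects?|nausea|fatigue)\b", text):
        return "side_effects"
    return None

def generate_responses(intent: Optional[str]) -> List[str]:
    """Stage 3: response generation - retrieve candidate answers for the intent."""
    if intent is None:
        return ["I'm not sure I understood. Could you rephrase that?"]
    return [KNOWLEDGE_BASE[intent]]

def select_response(candidates: List[str]) -> str:
    """Stage 4: response selection - pick one candidate (trivially the first here)."""
    return candidates[0]

def reply(user_message: str) -> str:
    """Chain the four stages into a single request-response cycle."""
    return select_response(generate_responses(understand_input(process_input(user_message))))

print(reply("How do I schedule an appointment?"))
```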

Chatbot Types

With the vast number of algorithms, tools, and platforms available, understanding the different types and end purposes of these chatbots will assist developers in choosing the optimal tools when designing them to fit the specific needs of users. These categories are not exclusive, as chatbots may possess multiple characteristics, making the process more variable. The 5 main types are described below [ 15 ]. Textbox 1 gives some examples of the recommended applications for each type of chatbot, although the recommendations are not limited to the ones specified.

Textbox 1. Recommended health care components for the different types of chatbots.

Knowledge domain

  • Open domain: responding to more general and broader topics that can be easily searched within databases; may be the preferred chatbot type for routine symptom screening, connecting to providers or services, or health promotion apps
  • Closed domain: responding to complex or specific questions requiring more in-depth research; may be the preferred chatbot type for treatment planning or recommendation

Service provided

  • Interpersonal: used mainly to transmit information without much intimate connection with users; may be the preferred chatbot type for imaging diagnostics or hereditary assessment where the main duty is to relay factual information to users
  • Intrapersonal: tailored for companionship or support; may be the preferred chatbot type for counseling, emotional support, or health promotion that requires a sense of human touch
  • Interagent: used for communicating with other chatbots or computer systems; may be the preferred chatbot type for administration purposes when transferring patient information between locations

Goals

  • Informative: designed to provide information from warehouse database or inventory entry; may be the preferred chatbot type for connecting patients with resources or remote patient monitoring
  • Conversational: built with the purpose of conversing with users as naturally as possible; may be the preferred chatbot type for counseling, emotional support, or health promotion
  • Task based: only performs 1 specific task where actions are predetermined; may be the preferred chatbot type for screening and diagnostics

Response generation

  • Uses pattern matching when the domain is narrow and sufficient data are available to train the system; may be the preferred chatbot type for screening and diagnostics

Human aided

  • Incorporates human computation that increases flexibility and robustness but decreases speed; may be the preferred chatbot type for most apps except for support or workflow efficiency, where speed is an essential factor in the delivery of care

Knowledge domain classification is based on accessible knowledge or the data used to train the chatbot. Under this category are the open domain for general topics and the closed domain focusing on more specific information. Service-provided classification is dependent on sentimental proximity to the user and the amount of intimate interaction dependent on the task performed. This can be further divided into interpersonal for providing services to transmit information, intrapersonal for companionship or personal support to humans, and interagent to communicate with other chatbots [ 14 ]. The next classification is based on the goal the chatbot aims to achieve, subdivided into informative, conversational, and task based. Response-generation classification, further divided into rule based, retrieval based, and generative, accounts for the process of analyzing inputs and generating responses [ 16 ]. Finally, human-aided classification incorporates human computation, which provides more flexibility and robustness but lacks the speed to accommodate more requests [ 17 ].
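The five classification dimensions can be captured in a small data structure, as in the sketch below; the profiled example at the end (a hypothetical symptom-screening bot) simply follows the recommendations from Textbox 1 and is not a real product.

```python
from dataclasses import dataclass
from enum import Enum

class KnowledgeDomain(Enum):
    OPEN = "open domain"
    CLOSED = "closed domain"

class ServiceProvided(Enum):
    INTERPERSONAL = "interpersonal"
    INTRAPERSONAL = "intrapersonal"
    INTERAGENT = "interagent"

class Goal(Enum):
    INFORMATIVE = "informative"
    CONVERSATIONAL = "conversational"
    TASK_BASED = "task based"

class ResponseGeneration(Enum):
    RULE_BASED = "rule based"
    RETRIEVAL_BASED = "retrieval based"
    GENERATIVE = "generative"

@dataclass
class ChatbotProfile:
    """One chatbot characterized along the five classification dimensions."""
    knowledge_domain: KnowledgeDomain
    service: ServiceProvided
    goal: Goal
    response_generation: ResponseGeneration
    human_aided: bool

# Hypothetical symptom-screening bot, profiled following the recommendations in Textbox 1.
symptom_screener = ChatbotProfile(
    knowledge_domain=KnowledgeDomain.OPEN,
    service=ServiceProvided.INTERPERSONAL,
    goal=Goal.TASK_BASED,
    response_generation=ResponseGeneration.RULE_BASED,
    human_aided=False,
)
print(symptom_screener)
```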

Chatbots in Cancer Therapy

Cancer has become a major health crisis and is the second leading cause of death in the United States [ 18 ]. The exponentially increasing number of patients with cancer each year may be because of a combination of carcinogens in the environment and improved quality of care. The latter aspect could explain why cancer is slowly becoming a chronic disease that is manageable over time [ 19 ]. Added life expectancy poses new challenges for both patients and the health care team. For example, many patients now require extended at-home support and monitoring, whereas health care workers deal with an increased workload. Although clinicians’ knowledge base in the use of scientific evidence to guide decision-making has expanded, there are still many other facets to the quality of care that have yet to catch up. Key areas of focus are safety, effectiveness, timeliness, efficiency, equitability, and patient-centered care [ 20 ].

Chatbots have the potential to address many of the current concerns regarding cancer care mentioned above. This includes the triple aim of health care that encompasses improving the experience of care, improving the health of populations, and reducing per capita costs [ 21 ]. Chatbots can improve the quality or experience of care by providing efficient, equitable, and personalized medical services. We can think of them as intermediaries between physicians and patients, facilitating the history taking of sensitive and intimate information before consultations. They could also be thought of as decision aids that deliver regular feedback on disease progression and treatment reactions to help clinicians better understand individual conditions. Preventative measures of cancer have become a priority worldwide, as early detection and treatment alone have not been effective in eliminating this disease [ 22 ]. Physical, psychological, and behavioral improvements of underserved or vulnerable populations may even be possible through chatbots, as they are so readily accessible through common messaging platforms. Health promotion uses, such as lifestyle coaching, healthy eating, and smoking cessation, are among the most common chatbot applications according to our search. In addition, chatbots could help save a significant amount of health care costs and resources. Newer therapeutic innovations have come with a heavy price tag, and out-of-pocket expenses have placed a significant strain on patients’ financial well-being [ 23 ]. With chatbots implemented in cancer care, consultations for minor health concerns may be avoided, which allows clinicians to spend more time with patients who need their attention the most. Costs may also be reduced by delivering medical services more efficiently. For example, the workflow can be streamlined by assisting physicians in administrative tasks, such as scheduling appointments, providing medical information, or locating clinics.

With the rapidly increasing applications of chatbots in health care, this section will explore several areas of development and innovation in cancer care. Various examples of current chatbots provided below will illustrate their ability to tackle the triple aim of health care. The specific use case of chatbots in oncology with examples of actual products and proposed designs are outlined in Table 1 .

Table 1. Use case for chatbots in oncology, with examples of current specific applications or proposed designs.

Diagnostics and Screening

An accurate diagnosis is critical for appropriate care to be administered. In terms of cancer diagnostics, AI-based computer vision is a function often used in chatbots that can recognize subtle patterns from images. This would increase physicians’ confidence when identifying cancer types, as even highly trained individuals may not always agree on the diagnosis [ 52 ]. Studies have shown that the interpretation of medical images for the diagnosis of tumors performs equally well or better with AI compared with experts [ 53 - 56 ]. In addition, automated diagnosis may be useful when there are not enough specialists to review the images. This was made possible through deep learning algorithms in combination with the increasing availability of databases for the tasks of detection, segmentation, and classification [ 57 ]. For example, Medical Sieve (IBM Corp) is a chatbot that examines radiological images to aid and communicate with cardiologists and radiologists to identify issues quickly and reliably [ 24 ]. Similarly, InnerEye (Microsoft Corp) is a computer-assisted image diagnostic tool that recognizes cancers and diseases within the eye, although it does not directly interact with the user like a chatbot [ 42 ]. Even with the rapid advancements of AI in cancer imaging, a major issue is the lack of a gold standard [ 58 ].

From the patient’s perspective, various chatbots have been designed for symptom screening and self-diagnosis. The ability of patients to be directed to urgent referral pathways through early warning signs has been a promising market. Decreased wait times in accessing health care services have been found to correlate with improved patient outcomes and satisfaction [ 59 - 61 ]. The automated chatbot, Quro (Quro Medical, Inc), provides presynopsis based on symptoms and history to predict user conditions (average precision approximately 0.82) without a form-based data entry system [ 25 ]. In addition to diagnosis, Buoy Health (Buoy Health, Inc) assists users in identifying the cause of their illness and provides medical advice [ 26 ]. Another chatbot designed by Harshitha et al [ 27 ] uses dialog flow to provide an initial analysis of breast cancer symptoms. It has been proven to be 95% accurate in differentiating between normal and cancerous images. Even with promising results, there are still potential areas for improvement. A study of 3 mobile app–based chatbot symptom checkers, Babylon (Babylon Health, Inc), Your.md (Healthily, Inc), and Ada (Ada, Inc), indicated that sensitivity remained low at 33% for the detection of head and neck cancer [ 28 ]. The number of studies assessing the development, implementation, and effectiveness is still relatively limited compared with the diversity of chatbots currently available. Further studies are required to establish the efficacy across various conditions and populations. Nonetheless, chatbots for self-diagnosis are an effective way of advising patients as the first point of contact if accuracy and sensitivity requirements can be satisfied.
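To clarify the metrics quoted in this paragraph, the short sketch below shows how sensitivity and precision are computed from confusion-matrix counts; the counts used are invented for illustration and are not taken from the cited studies.

```python
# Illustrative confusion-matrix counts (not taken from any of the cited studies).
true_positives = 33
false_negatives = 67   # missed cases
false_positives = 20

sensitivity = true_positives / (true_positives + false_negatives)   # also called recall
precision = true_positives / (true_positives + false_positives)

print(f"Sensitivity (recall): {sensitivity:.2f}")  # 0.33, comparable to the 33% reported above
print(f"Precision:            {precision:.2f}")
```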

Early cancer detection can lead to higher survival rates and improved quality of life. Inherited factors are present in 5% to 10% of cancers, including breast, colorectal, prostate, and rare tumor syndromes [ 62 ]. Family history collection is a proven way of easily accessing the genetic disposition of developing cancer to inform risk-stratified decision-making, clinical decisions, and cancer prevention [ 63 ]. The web-based chatbot ItRuns (ItRunsInMyFamily) gathers family history information at the population level to determine the risk of hereditary cancer [ 29 ]. We have yet to find a chatbot that incorporates deep learning to process large and complex data sets at a cellular level. Although not able to directly converse with users, DeepTarget [ 64 ] and deepMirGene [ 65 ] are capable of performing miRNA and target predictions using expression data with higher accuracy compared with non–deep learning models. With the advent of phenotype–genotype predictions, chatbots for genetic screening would greatly benefit from image recognition. New screening biomarkers are also being discovered at a rapid speed, so continual integration and algorithm training are required. These findings align with studies that demonstrate that chatbots have the potential to improve user experience and accessibility and provide accurate data collection [ 66 ].

Chatbots are now able to provide patients with treatment and medication information after diagnosis without having to directly contact a physician. Such a system was proposed by Mathew et al [ 30 ] that identifies the symptoms, predicts the disease using a symptom–disease data set, and recommends a suitable treatment. Although this may seem an attractive option for patients looking for a fast solution, computers are still prone to errors, and bypassing professional inspection may be an area of concern. Chatbots may also be an effective resource for patients who want to learn why a certain treatment is necessary. Madhu et al [ 31 ] proposed an interactive chatbot app that provides a list of available treatments for various diseases, including cancer. This system also informs the user of the composition and prescribed use of medications to help select the best course of action. The diagnosis and course of treatment for cancer are complex, so a more realistic system would be a chatbot used to connect users with appropriate specialists or resources. A text-to-text chatbot by Divya et al [ 32 ] engages patients regarding their medical symptoms to provide a personalized diagnosis and connects the user with the appropriate physician if major diseases are detected. Rarhi et al [ 33 ] proposed a similar design that provides a diagnosis based on symptoms, measures the seriousness, and connects users with a physician if needed. In general, these systems may greatly help individuals in conducting daily check-ups, increase awareness of their health status, and encourage users to seek medical assistance for early intervention.
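The symptom-to-disease-to-treatment flow described here can be pictured as a simple lookup. The sketch below is an illustrative approximation with placeholder mappings; it does not reproduce the data sets or models of the systems cited above.

```python
from typing import Set

# Placeholder symptom-condition and condition-guidance mappings; purely illustrative
# and not the data set or model of the cited systems.
SYMPTOM_CONDITION = {
    frozenset({"lump", "skin dimpling"}): "possible breast abnormality",
    frozenset({"persistent cough", "hoarseness"}): "possible head and neck condition",
}
CONDITION_GUIDANCE = {
    "possible breast abnormality": "Recommend imaging referral and specialist consultation.",
    "possible head and neck condition": "Recommend ENT referral for further examination.",
}

def triage(symptoms: Set[str]) -> str:
    """Match reported symptoms to a condition and return next-step guidance."""
    for pattern, condition in SYMPTOM_CONDITION.items():
        if pattern <= symptoms:  # every symptom in the pattern was reported
            return f"{condition}: {CONDITION_GUIDANCE[condition]}"
    return "No match found; please consult a physician for an assessment."

print(triage({"lump", "skin dimpling", "fatigue"}))
```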

Chatbots have also been used by physicians during treatment planning. For example, IBM’s Watson for Oncology examines data from records and medical notes to generate an evidence-based treatment plan for oncologists [ 34 ]. Studies have shown that Watson for Oncology still cannot replace experts at this moment, as quite a few cases are not consistent with experts (approximately 73% concordant) [ 67 , 68 ]. Nonetheless, this could be an effective decision-making tool for cancer therapy to standardize treatments. Although not specifically an oncology app, another chatbot example for clinicians’ use is the chatbot Safedrugbot (Safe In Breastfeeding) [ 69 ]. This is a chat messaging service for health professionals offering assistance with appropriate drug use information during breastfeeding. Promising progress has also been made in using AI for radiotherapy to reduce the workload of radiation staff or identify at-risk patients by collecting outcomes before and after treatment [ 70 ]. An ideal chatbot for health care professionals’ use would be able to accurately detect diseases and provide the proper course of recommendations, which are functions currently limited by time and budgetary constraints. Continual algorithm training and updates would be necessary because of the constant improvements in current standards of care. Further refinements and testing for the accuracy of algorithms are required before clinical implementation [ 71 ]. This area holds tremendous potential, as an estimated ≥50% of all patients with cancer have used radiotherapy during the course of their treatment.

Patient Monitoring

Chatbots have been implemented in remote patient monitoring for postoperative care and follow-ups. The health care sector is among the most overwhelmed by those needing continued support outside hospital settings, as most patients newly diagnosed with cancer are aged ≥65 years [ 72 ]. The integration of this application would improve patients’ quality of life and relieve the burden on health care providers through better disease management, reducing the cost of visits and allowing timely follow-ups. In terms of cancer therapy, remote monitoring can support patients by enabling higher dose chemotherapy drug delivery, reducing secondary hospitalizations, and providing health benefits after surgery [ 73 - 75 ].

StreamMD (StreamMD, Inc), Conversa (Conversa Health, Inc), and Memora Health (Memora Health, Inc) are chatbots that function on existing messaging platforms that provide patients with immediate access to care instructions and educational information [ 35 ]. To ensure that patients adhere to instructions, AiCure (AiCure, Inc) uses a smartphone webcam to coach them in managing their condition. Recently, a chatbot architecture was proposed for patient support based on microservices to provide personalized eHealth functionalities and data storage [ 36 ]. Several studies have supported the application of chatbots for patient monitoring [ 76 ]. The semiautomated messaging chatbot Infinity (Facebook, Inc) was used to assess the health outcomes and health care impacts of phone-based monitoring for patients with cancer aged ≥65 years. After 2 years of implementation, there was a 97% satisfaction rate, and 87% considered monitoring useful, with the most reported benefit being treatment management and moral support [ 37 ]. Similar results were discovered in 2 studies using Vik (WeFight, Inc), a text-based chatbot that responds to the daily needs and concerns of patients and their relatives with personal insights. A 1-year prospective study of 4737 patients with breast cancer reported a 94% overall satisfaction rate [ 38 ]. A more in-depth analysis of the 132,970 messages showed that users were more likely to answer multiple-choice questions compared with open-ended ones, chatbots improved treatment compliance rate by >20% ( P =.04), and intimate or sensitive topics were openly discussed. An area of concern is that retention rates drastically decreased to 31% by the end of this study. The other study was a phase 3, blind, noninferiority randomized controlled trial (n=132) to assess the level of patient satisfaction with the answers provided by chatbots versus those by physicians [ 39 ]. Using 12 frequently asked questions on breast cancer, participants were split into 2 groups to rate the quality of answers from chatbots or physicians. Among patients with breast cancer in treatment or remission, chatbot answers were shown to be noninferior ( P <.001), with a success rate of 69% compared with 64% in the physician groups. Concerns regarding the chatbot’s ability to successfully answer more complex questions or detect differences between major and minor symptoms still remain to be addressed.

Further refinements and large-scale implementations are still required to determine the benefits across different populations and sectors in health care [ 26 ]. Although overall satisfaction is found to be relatively high, there is still room for improvement by taking into account user feedback tailored to the patient’s changing needs during recovery. In combination with wearable technology and affordable software, chatbots have great potential to affect patient monitoring solutions.

Patient Support

The prevalence of cancer is increasing along with the number of survivors of cancer, partly because of improved treatment techniques and early detection [ 77 ]. These individuals experience added health problems, such as infections, chronic diseases, psychological issues, and sleep disturbances, which often require specific needs that are not met by many practitioners (ie, medical, psychosocial, informational, and proactive contact) [ 78 ]. A number of these individuals require support after hospitalization or treatment periods. Maintaining autonomy and living in a self-sustaining way within their home environment is especially important for older populations [ 79 ]. Implementation of chatbots may address some of these concerns, such as reducing the burden on the health care system and supporting independent living.

With psychiatric disorders affecting at least 35% of patients with cancer, comprehensive cancer care now includes psychosocial support to reduce distress and foster a better quality of life [ 80 ]. The first chatbot was designed for individuals with psychological issues [ 9 ]; however, they continue to be used for emotional support and psychiatric counseling with their ability to express sympathy and empathy [ 81 ]. Health-based chatbots delivered through mobile apps, such as Woebot (Woebot Health, Inc), Youper (Youper, Inc), Wysa (Wysa, Ltd), Replika (Luka, Inc), Unmind (Unmind, Inc), and Shim (Shim, Inc), offer daily emotional support and mental health tracking [ 26 ]. A study performed on Woebot, developed based on cognitive behavioral therapy, showed that depressive symptoms were significantly reduced, and participants were more receptive than in traditional therapies [ 41 ]. This agreed with the Shim results, also using the same type of therapy, which showed that the intervention was highly engaging, improved well-being, and reduced stress [ 82 ]. When another chatbot was developed based on the structured association technique counseling method, the user’s motivation was enhanced, and stress was reduced [ 83 ]. Similarly, a graph-based chatbot has been proposed to identify the mood of users through sentimental analysis and provide human-like responses to comfort patients [ 84 ]. Vivobot (HopeLab, Inc) provides cognitive and behavioral interventions to deliver positive psychology skills and promote well-being. This psychiatric counseling chatbot was effective in engaging users and reducing anxiety in young adults after cancer treatment [ 40 ]. The limitation to the abovementioned studies was that most participants were young adults, most likely because of the platform on which the chatbots were available. In addition, longer follow-up periods with larger and more diverse sample sizes are needed for future studies. Chatbots used for psychological support hold great potential, as individuals are more comfortable disclosing personal information when no judgments are formed, even if users could still discriminate their responses from that of humans [ 82 , 85 ].

Workflow Efficiency

Electronic health records have improved data availability but also increased the complexity of the clinical workflow, contributing to ineffective treatment plans and uninformed management [ 86 ]. A streamlined process using ML techniques would allow clinicians to spend more time with patients by decreasing the time spent on data entry through the ease of documentation, exposing relevant patient information from the chart, automatically authorizing payment, or reducing medical errors [ 58 ]. For example, Mandy is a chatbot that assists health care staff by automating the patient intake process [ 43 ]. Using a combination of data-driven natural language processing with knowledge-driven diagnostics, this chatbot interviews the patient, understands their chief complaints, and submits reports to physicians for further analysis [ 43 ]. Similarly, Sense.ly (Sense.ly, Inc) acts as a web-based nurse to assist in monitoring appointments, managing patients’ conditions, and suggesting therapies. Another chatbot that reduces the burden on clinicians and decreases wait time is Careskore (CareShore, Inc), which tracks vitals and anticipates the need for hospital admissions [ 42 ]. Chatbots have also been proposed to autonomize patient encounters through several advanced eHealth services. In addition to collecting data and providing bookings, Health OnLine Medical Suggestions or HOLMES (Wipro, Inc) interacts with patients to support diagnosis, choose the proper treatment pathway, and provide prevention check-ups [ 44 ]. Although the use of chatbots in health care and cancer therapy has the potential to enhance clinician efficiency, reimbursement codes for practitioners are still lacking before universal implementation. In addition, studies will need to be conducted to validate the effectiveness of chatbots in streamlining workflow for different health care settings. Nonetheless, chatbots hold great potential to complement telemedicine by streamlining medical administration and autonomizing patient encounters.
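
To make the intake pattern described above more concrete, the following is a minimal, hypothetical Python sketch of a rule-based intake dialogue. It is not the implementation of Mandy or any of the systems cited here; the questions, the keyword-to-specialty map, and the report fields are illustrative assumptions only.

```python
# Hypothetical sketch of a rule-based patient-intake dialogue. The questions,
# keywords, and report format are illustrative assumptions, not a published system.

INTAKE_QUESTIONS = [
    ("chief_complaint", "What brings you in today?"),
    ("duration", "How long have you had these symptoms?"),
    ("severity", "On a scale of 1 to 10, how severe is it?"),
]

# Very small keyword-to-specialty map standing in for the knowledge-driven part.
SPECIALTY_KEYWORDS = {
    "chest pain": "cardiology",
    "rash": "dermatology",
    "headache": "neurology",
}


def run_intake() -> dict:
    """Ask scripted intake questions and assemble a structured report for a physician."""
    report = {}
    for field, question in INTAKE_QUESTIONS:
        report[field] = input(question + " ").strip().lower()

    # Knowledge-driven step (here just a keyword lookup): route the chief
    # complaint to a likely specialty for the physician-facing report.
    report["suggested_specialty"] = next(
        (spec for keyword, spec in SPECIALTY_KEYWORDS.items()
         if keyword in report["chief_complaint"]),
        "general practice",
    )
    return report


if __name__ == "__main__":
    print(run_intake())
```

In practice, intake systems such as those cited above replace the keyword lookup with natural language processing and a clinical knowledge base, but the overall flow of questioning, structured capture, and physician-facing reporting follows the same idea.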

Health Promotion

Survivors of cancer, particularly those who underwent treatment during childhood, are more susceptible to adverse health risks and medical complications. Consequently, promoting a healthy lifestyle early on is imperative to maintain quality of life, reduce mortality, and decrease the risk of secondary cancers [ 87 ]. According to the analysis from the web directory, health promotion chatbots are the most commonly available; however, most of them are only available on a single platform. Thus, interoperability on multiple common platforms is essential for adoption by various types of users across different age groups. In addition, voice and image recognition should also be considered, as most chatbots are still text based.

Healthy diets and weight control are key to successful disease management, as obesity is a significant risk factor for chronic conditions. Chatbots have been incorporated into health coaching systems to address health behavior modifications. For example, CoachAI and the Smart Wireless Interactive Health System used chatbot technology to track patients’ progress, provide insight to physicians, and suggest suitable activities [ 45 , 46 ]. Another app is Weight Mentor, which provides self-help motivation for weight loss maintenance and allows for open conversation without being affected by emotions [ 47 ]. Health Hero (Health Hero, Inc), Tasteful Bot (Facebook, Inc), Forksy (Facebook, Inc), and SLOWbot (iaso health, Inc) guide users to make informed decisions on food choices to change unhealthy eating habits [ 48 , 49 ]. The effectiveness of these apps cannot yet be established, as more rigorous analysis of their development, evaluation, and implementation is required. Nevertheless, chatbots are emerging as a solution for healthy lifestyle promotion through accessibility and human-like communication while maintaining anonymity.

Most would assume that survivors of cancer would be more inclined to practice health protection behaviors with extra guidance from health professionals; however, the results have been surprising. Smoking accounts for at least 30% of all cancer deaths; however, up to 50% of survivors continue to smoke [ 88 ]. The benefit of using chatbots for smoking cessation across various age groups has been highlighted in numerous studies showing improved motivation, accessibility, and adherence to treatment, which have led to increased smoking abstinence [ 89 - 91 ]. The cognitive behavioral therapy–based chatbot SMAG, supporting users over the Facebook social network, resulted in a 10% higher cessation rate compared with control groups [ 50 ]. Motivational interview–based chatbots have been proposed with promising results, where a significant number of patients showed an increase in their confidence and readiness to quit smoking after 1 week [ 92 ]. No studies have been found to assess the effectiveness of chatbots for smoking cessation in terms of ethnic, racial, geographic, or socioeconomic status differences. Creating chatbots with prespecified answers is simple; however, the problem becomes more complex when answers are open. Bella, one of the most advanced text-based chatbots on the market, advertised as a coach for adults, gets stuck when user responses fall outside its prompted options [ 51 ]. Handling unexpected responses therefore remains a work in progress. Given all the uncertainties, chatbots hold potential for those looking to quit smoking, as they prove to be more acceptable for users when dealing with stigmatized health issues compared with general practitioners [ 7 ].

Challenges and Limitations

AI and ML have advanced at an impressive rate and have revealed the potential of chatbots in health care and clinical settings. AI technology outperforms humans in terms of image recognition, risk stratification, improved processing, and 24/7 assistance with data and analysis. However, there is no machine substitute for higher-level interactions, critical thinking, and handling ambiguity [ 93 ]. Chatbots create added complexity that must be identified, addressed, and mitigated before their universal adoption in health care.

Hesitancy from physicians and poor adoption by patients are major barriers to overcome, which could be explained by many of the factors discussed in this section. A cross-sectional web-based survey of 100 practicing physicians gathered their perceptions of chatbots in health care [ 6 ]. Although a wide variety of beneficial aspects were reported (ie, management of health and administration), an equal number of concerns were present. Over 70% of physicians believe that chatbots cannot effectively care for all of patients’ needs, cannot display human emotion, cannot provide detailed treatment plans, and pose a risk if patients self-diagnose or do not fully comprehend their diagnosis. If the limitations of chatbots are better understood and mitigated, the fears of adopting this technology in health care may slowly subside. The remainder of this Discussion explores these challenges and the questions they raise for health care professionals, patients, and policy makers.

Moral and Ethical Constraints

The use of chatbots in health care presents a novel set of moral and ethical challenges that must be addressed for the public to fully embrace this technology. Issues to consider are privacy or confidentiality, informed consent, and fairness. Each of these concerns is addressed below. Although efforts have been made to address these concerns, current guidelines and policies are still far behind the rapid technological advances [ 94 ].

Health care data are highly sensitive because of the risk of stigmatization and discrimination if the information is wrongfully disclosed. The ability of chatbots to ensure privacy is especially important, as vast amounts of personal and medical information are often collected without users being aware, including voice recognition and geographical tracking. The public’s lack of confidence is not surprising, given the increased frequency and magnitude of high-profile security breaches and inappropriate use of data [ 95 ]. Unlike financial data that becomes obsolete after being stolen, medical data are particularly valuable, as they are not perishable. Privacy threats may break the trust that is essential to the therapeutic physician–patient relationship and inhibit open communication of relevant clinical information for proper diagnosis and treatment [ 96 ].

Chatbots suffer from the Black Box problem common to many ML-based computing systems that are trained on massive data sets to produce multiple layers of connections. Although they are capable of solving complex problems that are unimaginable by humans, these systems remain highly opaque, and the resulting solutions may be unintuitive. This means that the systems’ behavior is hard to explain by merely looking inside, and understanding exactly how they are programmed is nearly impossible. For both users and developers, transparency becomes an issue, as they are not able to fully understand the solution or intervene to predictably change the chatbot’s behavior [ 97 ]. With the novelty and complexity of chatbots, obtaining valid informed consent, where patients can make their own health-related risk and benefit assessments, becomes problematic [ 98 ]. Without sufficient transparency, it is difficult to determine how certain decisions are made or how errors occur, which reduces the reliability of the diagnostic process. The Black Box problem also poses a concern to patient autonomy by potentially undermining shared decision-making between physicians and patients [ 99 ]. The chatbot’s personalized suggestions are based on algorithms and are refined based on the user’s past responses. The removal of options may slowly reduce the patient’s awareness of alternatives and interfere with free choice [ 100 ].

Finally, the issue of fairness arises with algorithm bias when data used to train and test chatbots do not accurately reflect the people they represent [ 101 ]. As the AI field lacks diversity, bias at the level of the algorithm and modeling choices may be overlooked by developers [ 102 ]. In a study using 2 cases, differences in prediction accuracy were shown concerning gender and insurance type for intensive care unit mortality and psychiatric readmissions [ 103 ]. On a larger scale, this may exacerbate barriers to health care for minorities or underprivileged individuals, leading to worse health outcomes. Identifying the source of algorithm bias is crucial for addressing health care disparities between various demographic groups and improving data collection.

Chances for Errors

Although studies have shown that AI technologies make fewer mistakes than humans in terms of diagnosis and decision-making, they still bear inherent risks for medical errors [ 104 ]. The interpretation of speech remains prone to errors because of the complexity of background information, accuracy of linguistic unit segmentation, variability in acoustic channels, and linguistic ambiguity with homophones or semantic expressions. Chatbots are unable to efficiently cope with these errors because of the lack of common sense and the inability to properly model real-world knowledge [ 105 ]. Another factor that contributes to errors and inaccurate predictions is the large, noisy data sets used to train modern models because large quantities of high-quality, representative data are often unavailable [ 58 ]. In addition to the concern of accuracy and validity, addressing clinical utility and effectiveness of improving patients’ quality of life is just as important. With the increased use of diagnostic chatbots, the risk of overconfidence and overtreatment may cause more harm than benefit [ 99 ]. There is still clear potential for improved decision-making, as diagnostic deep learning algorithms were found to be equivalent to health care professionals in classifying diseases in terms of accuracy [ 106 ]. These issues presented above all raise the question of who is legally liable for medical errors. Avoiding responsibility becomes easier when numerous individuals are involved at multiple stages, from development to clinical applications [ 107 ]. Although the law has been lagging and litigation is still a gray area, determining legal liability becomes increasingly pressing as chatbots become more accessible in health care.

Regulatory Considerations

Regulatory standards have been developed to accommodate rapid modifications and ensure the safety and effectiveness of AI technology, including chatbots. The US Food and Drug Administration has recognized the distinctiveness of chatbots compared with traditional medical devices by defining such software within the medical device category and has outlined its approach through the Digital Health Innovation Action Plan [ 108 ]. With the growing number of AI algorithms it has approved, the Food and Drug Administration has opened public consultations on setting performance targets, monitoring performance, and reviewing when performance strays from preset parameters [ 102 ]. The American Medical Association has also adopted the Augmented Intelligence in Health Care policy for the appropriate integration of AI into health care by emphasizing the design approach and enhancement of human intelligence [ 109 ]. An area of concern is that chatbots are not covered under the Health Insurance Portability and Accountability Act; therefore, users’ data may be unknowingly sold, traded, and marketed by companies [ 110 ]. On the other hand, overregulation may diminish the value of chatbots and decrease the freedom for innovators. Consequently, balancing these opposing aspects is essential to promote benefits and reduce harm to the health care system and society.

Future Directions

Chatbots’ capacity to integrate and learn from large clinical data sets, along with their ability to communicate seamlessly with users, contributes to their widespread integration in various health care components. Given the current status and challenges of cancer care, chatbots will likely be a key player in this field’s continual improvement. More specifically, they hold promise in addressing the triple aim of health care by improving the quality of care, bettering the health of populations, and reducing the burden or cost of our health care system. Beyond cancer care, there is an increasing number of creative ways in which chatbots could be applicable to health care. During the COVID-19 pandemic, chatbots were already deployed to share information, suggest behavior, and offer emotional support. They have the potential to prevent misinformation, detect symptoms, and lessen the mental health burden during global pandemics [ 111 ]. At the global health level, chatbots have emerged as a socially responsible technology to provide equal access to quality health care and break down the barriers between the rich and poor [ 112 ]. To further advance medicine and knowledge, the use of chatbots in education for learning and assessments is crucial for providing objective feedback, personalized content, and cost-effective evaluations [ 113 ]. For example, the Einstein app, a web-based physics teacher, enables interactive learning and evaluations but is still far from perfect [ 114 ]. Given chatbots’ diverse applications in numerous aspects of health care, further research and interdisciplinary collaboration to advance this technology could revolutionize the practice of medicine.

On the basis of the discussion above, the following are general directions for future improvements to chatbots in cancer care, in no particular order of importance:

  • Patients with cancer may feel vulnerable or fear discrimination from employers or society [ 115 ]. Security of sensitive information must be held to the highest standards, especially when personal health information is shared between providers and hospital systems.
  • An increasing number of patients are bringing internet-based information to consultations that has not been critically assessed for trustworthiness or credibility. If used correctly, this additional health information could enhance understanding, improve patients’ ability to manage their conditions, and increase confidence during interactions with physicians [ 116 ]. Unfortunately, this is often not the case, and most patients are not adequately informed regarding the proper screening of information. Ways to address this challenge include promoting awareness and developing patient management guidelines. Chatbots also have the potential to become key players through their ability to screen for credible information. They could help vulnerable individuals critically navigate web-based cancer information, especially older populations or those with chronic conditions, who tend to be less technologically adept.
  • Current applications of chatbots as computerized decision support systems for diagnosis and treatment are relatively limited. Most have targeted patients, and few are designed to aid physicians at the point of care. Medical Sieve and Watson for Oncology are the only chatbots found in our search that are designed specifically for clinicians. There are far more AI tools on the market that help with clinical decision-making without the ability to interact with users [ 117 ]. With the rapid data collection from electronic health records, real-time predictions, and links to clinical recommendations, adding chatbot functionalities to current decision aids will only improve patient-centered care and streamline the workflow for clinicians.
  • More concrete evidence of high quality and accuracy across a broad range of conditions and populations requires more representative training data that addresses racial bias, as well as peer-reviewed algorithms that reduce the Black Box problem.
  • Integration into the health care system, particularly with telemedicine, should aim for seamless delivery from beginning to end; this does not mean replacing in-person care but rather complementing the health care workflow to ensure that patients receive continuity and coordination of care.
  • Reimbursement of chatbot services to physicians who decide to implement this technology into their practice will likely increase adoption rates. Organizations and health providers will also likely benefit, as chatbots allow for more efficient, lower-cost delivery of care.
  • Continual training of chatbots as new knowledge is uncovered, such as symptom patterns or standard of care, is needed.
  • As the Vik study found that users were more likely to respond to multiple-choice questions than open-ended ones [ 38 ], chatbot developers should favor question formats with higher response rates. Studies, surveys, and focus groups should continue to be conducted to determine the best ways to converse with users.
  • Universal adoption of various technical features, such as training with additional languages, image recognition, voice recognition, user feedback to improve services according to needs, access on multiple common platforms, and reacting to unexpected responses, needs to be considered.

The ability to accurately measure performance is critical for continuous feedback and improvement of chatbots, especially given the high standards of health care and the vulnerable individuals it serves. Given that the introduction of chatbots to cancer care is relatively recent, rigorous evidence-based research is lacking. Standardized indicators of success between users and chatbots need to be implemented by regulatory agencies before adoption. Once the primary purpose is defined, common quality indicators to consider are the success rate of a given action, nonresponse rate, comprehension quality, response accuracy, retention or adoption rates, engagement, and satisfaction level. The ultimate goal is to assess whether chatbots positively affect and address the 3 aims of health care. Regular quality checks are especially critical for chatbots acting as decision aids because they can have a major impact on patients’ health outcomes.
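
As a purely illustrative sketch of how the quality indicators listed above might be computed from logged chatbot sessions, consider the following Python example; the session schema and the 30-day retention window are assumptions for illustration, not a standardized format.

```python
# Illustrative only: computing the quality indicators named above (success rate,
# nonresponse rate, retention, satisfaction) from hypothetical session logs.

from statistics import mean

sessions = [
    {"goal_completed": True,  "bot_fallbacks": 1, "turns": 12, "returned_within_30d": True,  "satisfaction": 4},
    {"goal_completed": False, "bot_fallbacks": 4, "turns": 6,  "returned_within_30d": False, "satisfaction": 2},
    {"goal_completed": True,  "bot_fallbacks": 0, "turns": 9,  "returned_within_30d": True,  "satisfaction": 5},
]


def quality_indicators(logs):
    """Aggregate simple success, nonresponse, retention, and satisfaction measures."""
    n = len(logs)
    return {
        "success_rate": sum(s["goal_completed"] for s in logs) / n,
        # Nonresponse rate: share of turns where the bot fell back to a default reply.
        "nonresponse_rate": sum(s["bot_fallbacks"] for s in logs) / sum(s["turns"] for s in logs),
        "retention_rate": sum(s["returned_within_30d"] for s in logs) / n,
        "mean_satisfaction": mean(s["satisfaction"] for s in logs),
    }


print(quality_indicators(sessions))
```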

Review Limitations

The systematic literature review and chatbot database search have a few limitations. The literature review and chatbot search were conducted by a single reviewer, which could have introduced bias and limited the findings. In addition, our review explored a broad range of health care topics, and some areas could have been elaborated upon and explored more deeply. Furthermore, only a limited number of studies were included for each subtopic of chatbots for oncology apps because of the scarcity of studies addressing this topic. Future studies should consider refining the search strategy to identify other potentially relevant sources that may have been overlooked and assigning multiple reviewers to limit individual bias.

As illustrated in this review, these chatbots’ potential in cancer diagnostics and treatment, patient monitoring and support, clinical workflow efficiency, and health promotion have yet to be fully explored. Numerous risks and challenges will continue to arise that require careful navigation with the rapid advancements in chatbots. Consequently, weighing the gains versus threats with a critical eye is imperative. Even after laying down the proper foundations for using chatbots safely and effectively, the human element in the practice of medicine is irreplaceable and will always be present. Health care professionals have the responsibility of understanding both the benefits and risks associated with chatbots and, in turn, educating their patients.

Acknowledgments

This work was supported by a Canadian Institutes of Health Research Planning and Dissemination Grant—Institute Community Support under grant number CIHR PCS-168296.

Conflicts of Interest: None declared.

18 January 2023

ChatGPT listed as author on research papers: many scientists disapprove

Chris Stokel-Walker

Chris Stokel-Walker is a freelance journalist in Newcastle, UK.

The artificial-intelligence (AI) chatbot ChatGPT that has taken the world by storm has made its formal debut in the scientific literature — racking up at least four authorship credits on published papers and preprints.

Nature 613, 620–621 (2023)

doi: https://doi.org/10.1038/d41586-023-00107-z

Kung, T. H. et al. Preprint at medRxiv https://doi.org/10.1101/2022.12.19.22283643 (2022).

O’Connor, S. & ChatGPT. Nurse Educ. Pract. 66, 103537 (2023).

ChatGPT & Zhavoronkov, A. Oncoscience 9, 82–84 (2022).

GPT, Osmanovic Thunström, A. & Steingrimsson, S. Preprint at HAL https://hal.science/hal-03701250 (2022).

Interacting with educational chatbots: A systematic review

  • Open access
  • Published: 09 July 2022
  • Volume 28, pages 973–1018 (2023)

  • Mohammad Amin Kuhail 1 ,
  • Nazik Alturki 2 ,
  • Salwa Alramlawi 3 &
  • Kholood Alhejori 4  

Chatbots hold the promise of revolutionizing education by engaging learners, personalizing learning activities, supporting educators, and developing deep insight into learners’ behavior. However, there is a lack of studies that analyze the recent evidence-based chatbot-learner interaction design techniques applied in education. This study presents a systematic review of 36 papers to understand, compare, and reflect on recent attempts to utilize chatbots in education using seven dimensions: educational field, platform, design principles, the role of chatbots, interaction styles, evidence, and limitations. The results show that the chatbots were mainly designed on a web platform to teach computer science, language, general education, and a few other fields such as engineering and mathematics. Further, more than half of the chatbots were used as teaching agents, while more than a third were peer agents. Most of the chatbots used a predetermined conversational path, and more than a quarter utilized a personalized learning approach that catered to students’ learning needs, while other chatbots used experiential and collaborative learning besides other design principles. Moreover, more than a third of the chatbots were evaluated with experiments, and the results primarily point to improved learning and subjective satisfaction. Challenges and limitations include inadequate or insufficient dataset training and a lack of reliance on usability heuristics. Future studies should explore the effect of chatbot personality and localization on subjective satisfaction and learning effectiveness.

1 Introduction

Chatbots, also known as conversational agents, enable the interaction of humans with computers through natural language, by applying the technology of natural language processing (NLP) (Bradeško & Mladenić, 2012 ). Due to their ability to emulate human conversations and thus automate services and reduce effort, chatbots are increasingly becoming popular in several domains, including healthcare (Oh et al., 2017 ), consumer services (Xu et al., 2017 ), education (Anghelescu & Nicolaescu, 2018 ), and academic advising (Alkhoori et al., 2020 ). In fact, the size of the chatbot market worldwide is expected to be 1.23 billion dollars in 2025 (Kaczorowska-Spychalska, 2019 ). In the US alone, the chatbot industry was valued at 113 million US dollars and is expected to reach 994.5 million US dollars in 2024 Footnote 1 .

The adoption of educational chatbots is on the rise due to their ability to provide a cost-effective method to engage students and provide a personalized learning experience (Benotti et al., 2018 ). Chatbot adoption is especially crucial in online classes that include many students where individual support from educators to students is challenging (Winkler & Söllner, 2018 ). Chatbots can facilitate learning within the educational context, for instance by instantaneously providing students with course content (Cunningham-Nelson et al., 2019 ), assignments (Ismail & Ade-Ibijola, 2019 ), rehearsal questions (Sinha et al., 2020 ), and study resources (Mabunda, 2020 ). Moreover, chatbots may interact with students individually (Hobert & Meyer von Wolff, 2019 ) or support collaborative learning activities (Chaudhuri et al., 2009 ; Tegos et al., 2014 ; Kumar & Rose, 2010 ; Stahl, 2006 ; Walker et al., 2011 ). Chatbot interaction is achieved by applying text, speech, graphics, haptics, gestures, and other modes of communication to assist learners in performing educational tasks.

Existing literature review studies attempted to summarize current efforts to apply chatbot technology in education. For example, Winkler and Söllner ( 2018 ) focused on chatbots used for improving learning outcomes. On the other hand, Cunningham-Nelson et al. ( 2019 ) discussed how chatbots could be applied to enhance the student’s learning experience. The study by Pérez et al. ( 2020 ) reviewed the existing types of educational chatbots and the learning results expected from them. Smutny and Schreiberova ( 2020 ) examined chatbots as a learning aid for Facebook Messenger. Thomas ( 2020 ) discussed the benefits of educational chatbots for learners and educators, showing that the chatbots are successful educational tools, and their benefits outweigh the shortcomings and offer a more effective educational experience. Okonkwo and Ade-Ibijola ( 2021 ) analyzed the main benefits and challenges of implementing chatbots in an educational setting.

The existing review studies contributed to the literature, albeit their main emphasis was using chatbots for improving the learning experience and outcomes (Winkler & Söllner, 2018 ; Cunningham-Nelson et al., 2019 ; Smutny & Schreiberova, 2020 ; Thomas, 2020 ), identifying the types of educational chatbots (Pérez et al., 2020 ), and determining the benefits, and challenges of implementing educational chatbots (Okonkwo & Ade-Ibijola, 2021 ). Nonetheless, the existing review studies have not concentrated on the chatbot interaction type and style, the principles used to design the chatbots, and the evidence for using chatbots in an educational setting.

Given the magnitude of research on educational chatbots, there is a need for a systematic literature review that sheds light on several vital dimensions: field of application, platform, role in education, interaction style, design principles, empirical evidence, and limitations.

By systematically analyzing 36 articles presenting educational chatbots representing various interaction styles and design approaches, this study contributes: (1) an in-depth analysis of the learner-chatbot interaction approaches and styles currently used to improve the learning process, (2) a characterization of the design principles used for the development of educational chatbots, (3) an in-depth explanation of the empirical evidence used to back up the validity of the chatbots, and (4) the discussion of current challenges and future research directions specific to educational chatbots. This study will help the education and human-computer interaction community aiming at designing and evaluating educational chatbots. Potential future chatbots might adopt some ideas from the chatbots surveyed in this study while addressing the discussed challenges and considering the suggested future research directions. This study is structured as follows: In Section  2 , we present background information about chatbots, while Section  3 discusses the related work. Section  4 explains the applied methodology, while Section  5 presents the study’s findings. Section  6 presents the discussion and future research directions. Finally, we present the conclusion and the study’s limitations in Section  7 .

2 Background

Chatbots have existed for more than half a century. Prominent examples include ELIZA, ALICE, and SmarterChild. ELIZA, the first chatbot, was developed by Weizenbaum ( 1966 ). The chatbot used pattern matching to emulate a psychotherapist conversing with a human patient. ALICE was a chatbot developed in the mid-1990s. It used Artificial Intelligence Markup Language (AIML) to identify an accurate response to user input using knowledge records (AbuShawar and Atwell, 2015 ). Another example is SmarterChild (Chukhno et al., 2019 ), which preceded today’s modern virtual assistants such as Alexa Footnote 2 and Siri Footnote 3 ; these assistants are available on messaging applications and can emulate conversations while providing quick access to data and services.
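
To illustrate the pattern-matching idea behind ELIZA-style chatbots, the following is a minimal Python sketch. The rules are illustrative assumptions; they are not Weizenbaum’s original script and do not use AIML.

```python
# Minimal sketch of ELIZA-style pattern matching: a regular-expression rule fires
# on the user's input and its capture group is reflected back in a canned template.

import re

RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"because (.*)", re.I), "Is that the real reason?"),
]


def reply(user_input: str) -> str:
    """Return the first matching rule's templated response, or a generic prompt."""
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return "Please tell me more."


print(reply("I feel anxious about my exam."))
# -> "Why do you feel anxious about my exam?"
```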

Chatbots have been utilized in education as conversational pedagogical agents since the early 1970s (Laurillard, 2013 ). Pedagogical agents, also known as intelligent tutoring systems, are virtual characters that guide users in learning environments (Seel, 2011 ). Conversational Pedagogical Agents (CPA) are a subgroup of pedagogical agents. They are characterized by engaging learners in a dialog-based conversation using AI (Gulz et al., 2011 ). The design of CPAs must consider social, emotional, cognitive, and pedagogical aspects (Gulz et al., 2011 ; King, 2002 ).

A conversational agent can hold a discussion with students in a variety of ways, ranging from spoken (Wik & Hjalmarsson, 2009 ) to text-based (Chaudhuri et al., 2009 ) to nonverbal (Wik & Hjalmarsson, 2009 ; Ruttkay & Pelachaud, 2006 ). Similarly, the agent’s visual appearance can be human-like or cartoonish, static or animated, two-dimensional or three-dimensional (Dehn & Van Mulken, 2000 ). Conversational agents have been developed over the last decade to serve a variety of pedagogical roles, such as tutors, coaches, and learning companions (Haake & Gulz, 2009 ). Furthermore, conversational agents have been used to meet a variety of educational needs such as question-answering (Feng et al., 2006 ), tutoring (Heffernan & Croteau, 2004 ; VanLehn et al., 2007 ), and language learning (Heffernan & Croteau, 2004 ; VanLehn et al., 2007 ).

When interacting with students, chatbots have taken various roles such as teaching agents, peer agents, teachable agents, and motivational agents (Chhibber & Law, 2019 ; Baylor, 2011 ; Kerry et al., 2008 ). Teaching agents play the role of human teachers and can present instructions, illustrate examples, ask questions (Wambsganss et al., 2020 ), and provide immediate feedback (Kulik & Fletcher, 2016 ). On the other hand, peer agents serve as learning mates for students to encourage peer-to-peer interactions. The agent of this approach is less knowledgeable than the teaching agent. Nevertheless, peer agents can still guide the students along a learning path. Students typically initiate the conversation with peer agents to look up certain definitions or ask for an explanation of a specific topic. Peer agents can also scaffold an educational conversation with other human peers.

Students can teach teachable agents to facilitate gradual learning. In this approach, the agent acts as a novice and asks students to guide them along a learning route. Rather than directly contributing to the learning process, motivational agents serve as companions to students and encourage positive behavior and learning (Baylor, 2011 ). An agent could serve as a teaching or peer agent and a motivational one.

Concerning their interaction style, the conversation with chatbots can be chatbot or user-driven (Følstad et al., 2018 ). Chatbot-driven conversations are scripted and best represented as linear flows with a limited number of branches that rely upon acceptable user answers (Budiu, 2018 ). Such chatbots are typically programmed with if-else rules. When the user provides answers compatible with the flow, the interaction feels smooth. However, problems occur when users deviate from the scripted flow.
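
A chatbot-driven, flow-based conversation of this kind can be sketched as a small graph of scripted nodes with if-else branching. The following minimal Python example uses hypothetical node names and texts; note that input outside the predefined branches simply repeats the current node, which is exactly where such chatbots feel brittle.

```python
# Minimal sketch of a chatbot-driven (flow-based) dialogue: a dictionary of scripted
# nodes, each with a prompt and a fixed set of acceptable branches.

FLOW = {
    "start": {"text": "Do you want to (a) review loops or (b) take a quiz?",
              "branches": {"a": "loops", "b": "quiz"}},
    "loops": {"text": "A for-loop repeats a block a fixed number of times. Type 'b' for the quiz.",
              "branches": {"b": "quiz"}},
    "quiz":  {"text": "Which keyword starts a counting loop in Python? (for/while)",
              "branches": {"for": "end", "while": "end"}},
    "end":   {"text": "Well done, that is the end of this flow.", "branches": {}},
}


def run_flow(node: str = "start") -> None:
    """Walk the scripted dialogue graph until a node with no branches is reached."""
    while True:
        step = FLOW[node]
        print("BOT:", step["text"])
        if not step["branches"]:
            break
        answer = input("YOU: ").strip().lower()
        # Answers outside the scripted options keep the user on the same node,
        # illustrating how flow-based bots struggle when users deviate.
        node = step["branches"].get(answer, node)


if __name__ == "__main__":
    run_flow()
```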

User-driven conversations are powered by AI and thus allow for a flexible dialogue as the user chooses the types of questions they ask and thus can deviate from the chatbot’s script. There are one-way and two-way user-driven chatbots. One-way user-driven chatbots use machine learning to understand what the user is saying (Dutta, 2017 ), and the responses are selected from a set of premade answers. In contrast, two-way user-driven chatbots build accurate answers word by word to users (Winkler & Söllner, 2018 ). Such chatbots can learn from previous user input in similar contexts (De Angeli & Brahnam, 2008 ).
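
The one-way user-driven style can be sketched as an intent classifier that maps free-text input to one of a set of premade answers. The following minimal example assumes scikit-learn is available; the training utterances, intents, and canned answers are toy assumptions rather than a reconstruction of any system described in the reviewed studies.

```python
# Sketch of a one-way user-driven chatbot: a lightweight classifier maps the
# user's free-text question to an intent, and the reply is a premade answer.
# Requires scikit-learn; the data below is purely illustrative.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utterances = [
    "when is the assignment due", "what is the deadline for homework 2",
    "how do i submit my project", "where do i upload the assignment",
    "what topics are on the exam", "what should i study for the final",
]
train_intents = ["deadline", "deadline", "submission", "submission", "exam", "exam"]

PREMADE_ANSWERS = {
    "deadline": "Assignments are due Friday at 23:59.",
    "submission": "Upload your work as a PDF in the course portal.",
    "exam": "The exam covers everything up to and including week 10.",
}

# TF-IDF features plus logistic regression as a simple intent classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_utterances, train_intents)


def answer(question: str) -> str:
    """Predict the intent of the question and return the matching premade answer."""
    intent = model.predict([question])[0]
    return PREMADE_ANSWERS[intent]


print(answer("when do I need to hand in homework 2?"))
```

A two-way user-driven chatbot would instead generate its reply word by word, which is considerably more complex than selecting from premade answers as done here.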

In terms of the medium of interaction, chatbots can be text-based, voice-based, and embodied. Text-based agents allow users to interact by simply typing via a keyboard, whereas voice-based agents allow talking via a mic. Voice-based chatbots are more accessible to older adults and some special-need people (Brewer et al., 2018 ). An embodied chatbot has a physical body, usually in the form of a human, or a cartoon animal (Serenko et al., 2007 ), allowing them to exhibit facial expressions and emotions.

Concerning the platform, chatbots can be deployed via messaging apps such as Telegram, Facebook Messenger, and Slack (Car et al., 2020 ), standalone web or phone applications, or integrated into smart devices such as television sets.

3 Related work

Recently several studies reviewed chatbots in education. The studies examined various areas of interest concerning educational chatbots, such as the field of application (Smutny & Schreiberova, 2020 ; Wollny et al., 2021 ; Hwang & Chang, 2021 ), objectives and learning experience (Winkler & Söllner, 2018 ; Cunningham-Nelson et al., 2019 ; Pérez et al., 2020 ; Wollny et al., 2021 ; Hwang & Chang, 2021 ), how chatbots are applied (Winkler & Söllner, 2018 ; Cunningham-Nelson et al., 2019 ; Wollny et al., 2021 ), design approaches (Winkler & Söllner, 2018 ; Martha & Santoso, 2019 ; Hwang & Chang, 2021 ), the technology used (Pérez et al., 2020 ), evaluation methods used (Pérez et al., 2020 ; Hwang & Chang, 2021 ; Hobert & Meyer von Wolff, 2019 ), and challenges in using educational chatbots (Okonkwo & Ade-Ibijola, 2021 ). Table  1 summarizes the areas that the studies explored.

Winkler and Söllner ( 2018 ) reviewed 80 articles to analyze recent trends in educational chatbots. The authors found that chatbots are used for health and well-being advocacy, language learning, and self-advocacy. Concerning design approaches, chatbots are either flow-based or powered by AI.

Several studies have found that educational chatbots improve students’ learning experience. For instance, Okonkwo and Ade-Ibijola ( 2021 ) found out that chatbots motivate students, keep them engaged, and grant them immediate assistance, particularly online. Additionally, Wollny et al. ( 2021 ) argued that educational chatbots make education more available and easily accessible.

Concerning how they are applied, Cunningham-Nelson et al. ( 2019 ) identified two main applications: answering frequently-asked questions (FAQ) and performing short quizzes, while Wollny et al. ( 2021 ) listed three other applications, including scaffolding, activity recommendations, and informing them about activities.

In terms of the design of educational chatbots, Martha and Santoso ( 2019 ) found out that the role and appearance of the chatbot are crucial elements in designing the educational chatbots, while Winkler and Söllner ( 2018 ) identified various types of approaches to designing educational chatbots such as flow and AI-based, in addition to chatbots with speech recognition capabilities.

Pérez et al. ( 2020 ) identified various technologies used to implement chatbots such as Dialogflow Footnote 4 , FreeLing (Padró and Stanilovsky, 2012 ), and ChatFuel Footnote 5 . The study investigated the effect of the technologies used on performance and quality of chatbots.

Hobert and Meyer von Wolff ( 2019 ), Pérez et al. ( 2020 ), and Hwang and Chang ( 2021 ) examined the evaluation methods used to assess the effectiveness of educational chatbots. The authors identified that several evaluation methods such as surveys, experiments, and evaluation studies measure acceptance, motivation, and usability.

Okonkwo and Ade-Ibijola ( 2021 ) discussed challenges and limitations of chatbots including ethical, programming, and maintenance issues.

Although these review studies have contributed to the literature, they primarily focused on chatbots as a learning aid and thus how they can be used to improve educational objectives. Table  2 compares this study and the related studies in terms of the seven dimensions that this study focuses on: field of application, platform, educational role, interaction style, design principles, evaluation, and limitations.

Only four studies (Hwang & Chang, 2021 ; Wollny et al., 2021 ; Smutny & Schreiberova, 2020 ; Winkler & Söllner, 2018 ) examined the field of application. None of the studies discussed the platforms on which the chatbots run, while only one study (Wollny et al., 2021 ) analyzed the educational roles the chatbots are playing. The study used “teaching,” “assisting,” and “mentoring” as categories for educational roles. This study, however, uses different classifications (e.g., “teaching agent”, “peer agent”, “motivational agent”) supported by the literature in Chhibber and Law ( 2019 ), Baylor ( 2011 ), and Kerlyl et al. ( 2006 ). Other studies such as (Okonkwo and Ade-Ibijola, 2021 ; Pérez et al., 2020 ) partially covered this dimension by mentioning that chatbots can be teaching or service-oriented.

Only two articles partially addressed the interaction styles of chatbots. For instance, Winkler and Söllner ( 2018 ) classified the chatbots as flow or AI-based, while Cunningham-Nelson et al. ( 2019 ) categorized the chatbots as machine-learning-based or dataset-based. In this study, we carefully look at the interaction style in terms of who is in control of the conversation, i.e., the chatbot or the user. As such, we classify the interactions as either chatbot or user-driven.

Only a few studies partially tackled the principles guiding the design of the chatbots. For instance, Martha and Santoso ( 2019 ) discussed one aspect of the design (the chatbot’s visual appearance). This study focuses on the conceptual principles that led to the chatbot’s design.

In terms of the evaluation methods used to establish the validity of the articles, two related studies (Pérez et al., 2020 ; Smutny & Schreiberova, 2020 ) discussed the evaluation methods in some detail. However, this study contributes more comprehensive evaluation details such as the number of participants, statistical values, findings, etc.

Regarding limitations, Pérez et al. ( 2020 ) examined the technological limitations that have an effect on the quality of the educational chatbots, while Okonkwo and Ade-Ibijola ( 2021 ) presented some challenges and limitations facing educational chatbots such as ethical, technical, and maintenance matters. While the identified limitations are relevant, this study identifies limitations from other perspectives such as the design of the chatbots and the student experience with the educational chatbots. To sum up, Table  2 shows some gaps that this study aims at bridging to reflect on educational chatbots in the literature.

4 Methodology

The literature related to chatbots in education was analyzed, providing a background for new approaches and methods, and identifying directions for further research. This study follows the guidelines described by Keele et al. ( 2007 ). The process includes these main steps: (1) defining the review protocol, including the research questions, how to answer them, search strategy, and inclusion and exclusion criteria. (2) running the study by selecting the articles, assessing their quality, and synthesizing the results. (3) reporting the findings.

4.1 Research questions

Based on the shortcomings of the existing related literature review studies, we formulated seven main research questions:

- In what fields are the educational chatbots used?

- What platforms do the educational chatbots operate on?

- What role do the educational chatbots play when interacting with students?

- What are the interaction styles supported by the educational chatbots?

- What are the principles used to guide the design of the educational chatbots?

- What empirical evidence exists to support the validity of the educational chatbots?

- What are the challenges of applying and using the chatbots in the classroom?

The first question identifies the fields of the proposed educational chatbots, while the second question presents the platforms the chatbots operate on, such as web or phone-based platforms. The third question discusses the roles chatbots play when interacting with students. For instance, chatbots could be used as teaching or peer agents. The fourth question sheds light on the interaction styles used in the chatbots, such as flow-based or AI-powered. The fifth question addresses the principles used to design the proposed chatbots. Examples of such principles could be collaborative and personalized learning. The sixth question focuses on the evaluation methods used to prove the effectiveness of the proposed chatbots. Finally, the seventh question discusses the challenges and limitations of the works behind the proposed chatbots and potential solutions to such challenges.

4.2 Search process

The search process was conducted during the period (2011 - 2021) in the following databases: ACM Digital Library, Scopus, IEEE Xplore, and SpringerLink. We analyzed our research questions, objectives, and related existing literature review studies to identify keywords for the search string of this study. Subsequently, we executed and refined the keywords and the search string iteratively until we arrived at promising results. We used these search keywords: “Chatbot” and “Education.” Correlated keywords for “Chatbot” are “Conversational Agent” and “Pedagogical Agent.” Further, correlated keywords for “Education” are ”Learning,” “Learner,” “Teaching,” “Teacher,” and “Student.”

The search string was defined using the Boolean operators as follows:

(‘Chatbot’ OR ‘Conversational Agent’ OR ‘Pedagogical Agent’) AND (‘Education’ OR ‘Learning’ OR ‘Learner’ OR ‘Teaching’ OR ‘Teacher’ OR ‘Student’)

According to their relevance to our research questions, we evaluated the retrieved articles using the inclusion and exclusion criteria provided in Table 3. The inclusion and exclusion criteria allowed us to reduce the number of articles unrelated to our research questions. Further, we excluded tutorials, technical reports, posters, and Ph.D. theses since they are not peer-reviewed.

After defining the criteria, our search query was performed in the selected databases to begin the inclusion and exclusion process. Initially, the search across the databases yielded a total of 1208 studies. The metadata of the studies, comprising the title, abstract, type of article (conference, journal, short paper), language, and keywords, was extracted in a file format (e.g., a bib file). Subsequently, it was imported into the Rayyan tool Footnote 6 , which allowed the authors to collaboratively review, include, exclude, and filter the articles.

The four authors were involved in the process of selecting the articles. To maintain consistency amongst our decisions and inter-rater reliability, the authors worked in two pairs allowing each author to cross-check the selection and elimination of the author they were paired with. The process of selecting the articles was carried out in these stages:

Reading the articles’ metadata and applying the inclusion criteria of IC-1 and the exclusion criteria of EC-1. As a result, the number of studies was reduced to 1101.

As a first-round, we applied the inclusion criterion IC-2 by reading the studies’ title, abstract, and keywords. Additionally, the EC-2 exclusion criterion was applied in the same stage. As a result, only 197 studies remained.

In this stage, we eliminated the articles that were not relevant to any of our research questions and applied the EC-3 criteria. As a result, the articles were reduced to 71 papers.

Finally, we carefully read the entire content of the articles having in mind IC-3. Additionally, we excluded studies that had no or little empirical evidence for their effectiveness of the educational chatbot (EC-4 criterion). As a result, the articles were reduced to 36 papers.

Figure 1 shows the flowchart of the selection process, in which the final stage resulted in 36 papers.

Figure 1. Flowchart of the process of the selection of the studies

Figure 2 shows the number and types of articles plotted against time. 63.88% (23) of the selected articles are conference papers, while 36.11% (13) were published in journals. Most conference papers were published after 2017. Interestingly, 38.46% (5) of the journal articles were published recently in 2020. Concerning the publication venues, two journal articles were published in IEEE Transactions on Learning Technologies (TLT), which covers various topics such as innovative online learning systems, intelligent tutors, educational software applications and games, and simulation systems for education. Intriguingly, one article was published in the Computers in Human Behavior journal. The remaining journal articles were published in several venues such as IEEE Transactions on Affective Computing, the Journal of Educational Psychology, the International Journal of Human-Computer Studies, and ACM Transactions on Interactive Intelligent Systems. Most of these journals are ranked Q1 or Q2 according to Scimago Journal and Country Rank Footnote 7 .

Figure 2. A timeline of the selected studies

Figure 3 shows the geographical mapping of the selected articles. The total sum of the articles per country in Fig. 3 is more than 36 (the number of selected articles), as the authors of a single article could work in institutions located in different countries. A large share of the selected articles were written or co-written by researchers from American universities; however, the research that emerged from all European universities combined was the highest in the number of articles (19 articles). Asian universities contributed 10 articles, while American universities contributed 9 articles. Further, South American universities published 5 articles. Finally, universities from Africa and Australia contributed 4 articles (2 articles each).

Figure 3. A geographical mapping of the selected articles

5 Findings

5.1 RQ1: What fields are the educational chatbots used in?

Recently, chatbots have been utilized in various fields (Ramesh et al., 2017 ). Most importantly, chatbots played a critical role in the education field, in which most researchers (12 articles; 33.33%) developed chatbots used to teach computer science topics (Fig.  4 ). For example, some chatbots were used as tutors for teaching programming languages such as Java (Coronado et al., 2018 ; Daud et al., 2020 ) and Python (Winkler et al., 2020 ), while other researchers proposed educational chatbots for computer networks (Clarizia et al., 2018 ; Lee et al., 2020 ), databases (Latham et al., 2011 ; Ondáš et al., 2019 ), and compilers (Griol et al., 2011 ).

Figure 4. The fields of the chatbots in the selected articles

Table 4 shows that ten (27.77%) articles presented general-purpose educational chatbots that were used in various educational contexts such as online courses (Song et al., 2017 ; Benedetto & Cremonesi, 2019 ; Tegos et al., 2020 ). The approach authors use often relies on a general knowledge base not tied to a specific field.

In comparison, chatbots used to teach languages received less attention from the community (6 articles; 16.66%). Interestingly, researchers used a variety of interactive media such as voice (Ayedoun et al., 2017 ; Ruan et al., 2021 ), video (Griol et al., 2014 ), and speech recognition (Ayedoun et al., 2017 ; Ruan et al., 2019 ).

A few other subjects were targeted by the educational chatbots, such as engineering (Mendez et al., 2020 ), religious education (Alobaidi et al., 2013 ), psychology (Hayashi, 2013 ), and mathematics (Rodrigo et al., 2012 ).

5.2 RQ2: What platforms do the proposed chatbots operate on?

Table 5 shows an overview of the platforms the educational chatbots operate on. Most researchers (25 articles; 69.44%) developed chatbots that operate on the web (Fig.  5 ). The web-based chatbots were created for various educational purposes. For example, KEMTbot (Ondáš et al., 2019 ) is a chatbot system that provides information about the department, its staff, and their offices. Other chatbots acted as intelligent tutoring systems, such as Oscar (Latham et al., 2011 ), used for teaching computer science topics. Moreover, other web-based chatbots such as EnglishBot (Ruan et al., 2021 ) help students learn a foreign language.

Figure 5. The platforms of the chatbots in the selected articles

Six (16.66%) articles presented educational chatbots that exclusively operate on a mobile platform (e.g., phone, tablet). The articles were published recently in 2019 and 2020. The mobile-based chatbots were used for various purposes. Examples include Rexy (Benedetto & Cremonesi, 2019 ), which helps students enroll in courses, shows exam results, and gives feedback. Another example is the E-Java Chatbot (Daud et al., 2020 ), a virtual tutor that teaches the Java programming language.

Five articles (13.88%) presented desktop-based chatbots, which were utilized for various purposes. For example, one chatbot focused on the students’ learning styles and personality features (Redondo-Hernández & Pérez-Marín, 2011 ). As another example, the SimStudent chatbot is a teachable agent that students can teach (Matsuda et al., 2013 ).

In general, most desktop-based chatbots were built in or before 2013, probably because desktop-based systems are cumbersome to modern users as they must be downloaded and installed, need frequent updates, and are dependent on operating systems. Unsurprisingly, most chatbots were web-based, probably because web-based applications are operating system independent and do not require downloading, installing, or updating. Mobile-based chatbots are on the rise, which can be explained by users increasingly favoring mobile applications. According to an App Annie report, users spent 120 billion dollars on application stores Footnote 8 .

5.3 RQ3: What role do the educational chatbots play when interacting with students?

Chatbots have been found to play various roles in educational contexts, which can be divided into four categories (teaching agents, peer agents, teachable agents, and motivational agents), with varying degrees of success (Table  6 , Fig.  6 ). Exceptionally, a chatbot found in (D’mello & Graesser, 2013 ) is both a teaching and motivational agent.

By far, the majority (20; 55.55%) of the presented chatbots play the role of a teaching agent, while 13 studies (36.11%) discussed chatbots that are peer agents. Only two studies used chatbots as teachable agents, and two studies used them as motivational agents.

Teaching agents

The teaching agents presented in the different studies used various approaches. For instance, some teaching agents recommended tutorials to students based upon learning styles (Redondo-Hernández & Pérez-Marín, 2011 ), students’ historical learning (Coronado et al., 2018 ), and pattern matching (Ondáš et al., 2019 ). In some cases, the teaching agent started the conversation by asking the students to watch educational videos (Qin et al., 2020 ) followed by a discussion about the videos. In other cases, the teaching agent started the conversation by asking students to reflect on past learning (Song et al., 2017 ). Other studies discussed a scenario-based approach to teaching with teaching agents (Latham et al., 2011 ; D’mello & Graesser, 2013 ). The teaching agent simply mimics a tutor by presenting scenarios to be discussed with students. In other studies, the teaching agent emulates a teacher conducting a formative assessment by evaluating students’ knowledge with multiple-choice questions (Rodrigo et al., 2012 ; Griol et al., 2014 ; Mellado-Silva et al., 2020 ; Wambsganss et al., 2020 ).

Moreover, it has been found that teaching agents use various techniques to engage students. For instance, some teaching agents engage students with a discussion in a storytelling style (Alobaidi et al., 2013; Ruan et al., 2019), whereas other chatbots engage students with affective backchanneling, using empathetic phrases such as “uha” to show interest (Ayedoun et al., 2017). Other teaching agents provide adaptive feedback (Wambsganss et al., 2021).

Peer agents

Most peer agent chatbots allowed students to ask for specific help on demand. For instance, the chatbots discussed in (Clarizia et al., 2018; Lee et al., 2020) allowed students to look up specific terms or concepts, while the peer agents in (Verleger & Pembridge, 2018; da Silva Oliveira et al., 2019; Mendez et al., 2020) were based on a Question and Answer (Q&A) database and, as such, answered specific questions. Other peer agents provided more advanced assistance. For example, students may ask the peer agent in (Janati et al., 2020) how to use a particular technology (e.g., using maps in Oracle Analytics), while the peer agents described in (Tegos et al., 2015; Tegos et al., 2020; Hayashi, 2013) scaffolded group discussions. Interestingly, the only peer agent that allowed for a free-style conversation was the one described in (Fryer et al., 2017), which could be helpful in the context of learning a language.

Teachable agents

Only two articles discussed teachable agent chatbots. In general, these chatbots ask the students questions, and the students teach the chatbot by answering them. For example, the chatbot discussed in (Matsuda et al., 2013) presents a mathematical equation and then asks the student for each step required to gradually solve it, while in the work presented in (Law et al., 2020), students individually or in groups teach the chatbot a classification task on several topics.

Motivational agents

Two studies presented chatbots acting as motivational agents. The chatbot presented in (D’mello & Graesser, 2013) asks the student a question and waits for a written answer. The motivational agent then reacts to the answer with varying emotions, including empathy and approval, to motivate students. Similarly, the chatbot in (Schouten et al., 2017) shows various reactionary emotions and motivates students with encouraging phrases such as “you have already achieved a lot today”.

Figure 6. The roles of the chatbots in the selected articles

5.4 RQ4 – What are the interaction styles supported by the educational chatbots?

As shown in Table 7 and Fig. 7, most of the articles (88.88%) used a chatbot-driven interaction style in which the chatbot controls the conversation. 52.77% of the articles used flow-based chatbots, where the user has to follow a specific learning path predetermined by the chatbot. Notable examples are explained in (Rodrigo et al., 2012; Griol et al., 2014), where the authors presented chatbots that ask students questions and provide them with options to choose from. Other authors, such as (Daud et al., 2020), used a slightly different approach in which the chatbot guides the learners to select the topic they would like to learn. Subsequently, an assessment of the selected topic is presented, in which the user fills in values and the chatbot responds with feedback. The assessment becomes more challenging as the student makes progress. A slightly different interaction is explained in (Winkler et al., 2020), where the chatbot challenges the students with a question; if they answer incorrectly, the chatbot explains why the answer is incorrect and then asks a scaffolding question, as sketched below.
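
To make the flow-based style concrete, the following is a minimal sketch of a predetermined learning path in which an incorrect answer triggers an explanation and a scaffolding question before the learner moves on. The questions, answers, and function names are invented for illustration and are not taken from any of the surveyed systems.

```python
# Minimal sketch of a flow-based, chatbot-driven interaction: the learner
# follows a predetermined path of steps, and an incorrect answer triggers
# an explanation plus a scaffolding question. All content is illustrative.

LEARNING_PATH = [
    {
        "question": "Which keyword defines a function in Python?",
        "answer": "def",
        "explanation": "Functions are introduced with the 'def' keyword.",
        "scaffold": "Look at 'def greet():' -- which word starts the definition?",
    },
    {
        "question": "What does len('abc') return?",
        "answer": "3",
        "explanation": "len() counts the characters in the string.",
        "scaffold": "How many characters are in 'abc'?",
    },
]

def run_flow(ask):
    """Walk the learner through the path; 'ask' prompts and returns a reply."""
    for step in LEARNING_PATH:
        reply = ask(step["question"])
        if reply.strip().lower() != step["answer"]:
            # Chatbot-driven remediation: explain, then ask a scaffolding question.
            print(step["explanation"])
            ask(step["scaffold"])
    print("You have completed this learning path.")

if __name__ == "__main__":
    run_flow(lambda prompt: input(prompt + " "))
```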

The remaining chatbot-driven articles (13; 36.11%) used an intent-based approach, in which the chatbot matches what the user says to a predefined intent and replies with a premade response. The matching can be done using pattern matching, as discussed in (Benotti et al., 2017; Clarizia et al., 2018), or by relying on a conversational tool such as Dialogflow Footnote 9, as in (Mendez et al., 2020; Lee et al., 2020; Ondáš et al., 2019).
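
The intent-based approach can be illustrated with a minimal sketch that relies on simple pattern matching (regular expressions) rather than a commercial tool such as Dialogflow. The intents, patterns, and responses below are invented for illustration and do not reproduce any surveyed chatbot.

```python
import re

# Minimal sketch of intent-based matching: each intent pairs a set of
# patterns with a premade response. Intents and wording are illustrative.
INTENTS = [
    {
        "name": "define_term",
        "patterns": [r"\bwhat is\b", r"\bdefine\b"],
        "response": "A variable is a named location that stores a value.",
    },
    {
        "name": "exam_info",
        "patterns": [r"\bexam\b", r"\bwhen .* test\b"],
        "response": "The next exam takes place in week 12.",
    },
]

FALLBACK = "Sorry, I did not understand. Could you rephrase your question?"

def reply(utterance: str) -> str:
    """Return the premade response of the first intent whose pattern matches."""
    text = utterance.lower()
    for intent in INTENTS:
        if any(re.search(pattern, text) for pattern in intent["patterns"]):
            return intent["response"]
    return FALLBACK

print(reply("What is a variable?"))    # matches the 'define_term' intent
print(reply("When is the test due?"))  # matches the 'exam_info' intent
```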

Only four (11.11%) articles used chatbots that engage in user-driven conversations where the user controls the conversation and the chatbot does not have a premade response. For example, the authors in (Fryer et al., 2017 ) used Cleverbot, a chatbot designed to learn from its past conversations with humans. The authors used Cleverbot for foreign language education. User-driven chatbots fit language learning as students may benefit from an unguided conversation. The authors in (Ruan et al., 2021 ) used a similar approach where students freely speak a foreign language. The chatbot assesses the quality of the transcribed text and provides constructive feedback. In comparison, the authors in (Tegos et al., 2020 ) rely on a slightly different approach where the students chat together about a specific programming concept. The chatbot intervenes to evoke curiosity or draw students’ attention to an interesting, related idea.

Figure 7. The interaction styles of the chatbots in the selected articles

5.5 RQ5 – What are the principles used to guide the design of the educational chatbots?

Various design principles, including pedagogical ones, have been used in the selected studies (Table  8 , Fig.  8 ). We discuss examples of how each of the principles was applied.

Figure 8. The principles used to design the chatbots

Personalized Learning The ability to tailor chatbots to the individual user may help meet students’ needs (Clarizia et al., 2018). Many studies claim that students learn better when instruction and learning material are personalized rather than generic (Kester et al., 2005). Ten (27.77%) of the selected studies applied personalized learning principles. For instance, the study in (Coronado et al., 2018) designed a chatbot to teach Java. The students’ learning process is monitored by collecting information on all interactions between the students and the chatbot, so that direct and customized instruction and feedback can be provided. Another notable example can be found in (Latham et al., 2011), where students were given a learning path adapted to their learning styles. With this approach, students received 12% more accurate answers than those using chatbots without personalized learning materials. Moreover, other articles, such as (Villegas-Ch et al., 2020), used AI for activity recommendation depending on each student’s needs and learning path. The chatbot evaluates and identifies students’ weaknesses, allowing the AI model to personalize learning.
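
The following is a minimal sketch of the kind of performance-based personalization described above: the chatbot logs each answer per topic and recommends practicing the topic with the lowest success rate. The topics, data structure, and scoring rule are illustrative assumptions rather than a reconstruction of any surveyed system.

```python
from collections import defaultdict

# Minimal sketch of performance-based personalization: the chatbot records
# each answer per topic and recommends the topic with the lowest success
# rate. Topics and the scoring rule are illustrative assumptions.

class LearnerModel:
    def __init__(self):
        self.attempts = defaultdict(lambda: [0, 0])  # topic -> [correct, total]

    def record(self, topic: str, correct: bool):
        stats = self.attempts[topic]
        stats[0] += int(correct)
        stats[1] += 1

    def weakest_topic(self):
        """Return the topic with the lowest success rate, if any."""
        scored = {t: c / n for t, (c, n) in self.attempts.items() if n > 0}
        return min(scored, key=scored.get) if scored else None

model = LearnerModel()
model.record("loops", True)
model.record("loops", False)
model.record("recursion", False)
model.record("recursion", False)

topic = model.weakest_topic()
print(f"Let's practice {topic} next.")  # recommends 'recursion'
```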

Experiential Learning Experiential learning encourages individuals to gain and construct knowledge by reflecting on a set of perceived experiences gathered while interacting with their environment (Felicia, 2011). Reflection on experience is a central activity for developing comprehension skills and constructing knowledge. Song et al. (2017) built on this idea with a chatbot that supports weekly course reflection for online learners: the chatbot asks questions to help students reflect on and construct their knowledge. D’mello and Graesser (2013) presented a constructivist view of experiential learning; their embodied chatbot mimics the conversational moves of human tutors, guiding students to gradually develop explanations to problems.

Social Dialog Social dialog, also called small talk, is chit-chat that manages social situations rather than exchanging content (Klüwer, 2011). The advantage of incorporating social dialog into conversational agents is to establish a relationship with users, engage them, and gain their trust. For example, the chatbot presented in (Wambsganss et al., 2021) uses a casual chat mode that allows students to ask the chatbot to tell jokes or fun facts, or to talk about unrelated content such as the weather, to take a break from the main learning activity. As another example, Qin et al. (2020) suggested the usage of various social phrases that show interest, agreement, and social presence.

Collaborative learning Collaborative learning is an approach that involves groups of learners working together to complete a task or solve a problem. It has been demonstrated to improve students’ knowledge, critical thinking, and argumentation (Tegos et al., 2015). One technique used to support collaborative learning is the Animated Conversational Agent (ACA) (Zedadra et al., 2014), a cognitive agent that covers the pedagogical activities related to Computer-Supported Collaborative Learning (CSCL), such as learning, collaboration, and tutoring. The collaborative learning approach used by Tegos et al. (2020), in contrast, provides an open-ended discussion that encourages students to work as a pair to answer a question; before beginning the synchronous collaborative activity, the students were advised to study particular unit material containing videos, quizzes, and assignments. Additionally, Tegos et al. (2015) proposed a conversational agent named MentorChat, a cloud-based CSCL system, to help teachers build dialog-based collaborative activities.

Affective learning Affective learning is a form of empathetic feedback given to the learner to maintain interest, attention, or the desire to learn (Ayedoun et al., 2017). Two articles used this form of learning. For instance, Ayedoun et al. (2017) provided various types of affective feedback depending on the situation: congratulatory, encouraging, sympathetic, and reassuring. The idea is to support learners, particularly when a problematic situation arises, to increase their learning motivation. To support the learning of low-literate people, Schouten et al. (2017) built their conversational agent around four basic emotions: anger, fear, sadness, and happiness. Depending on the situation, the chatbot shows students an empathetic reaction. The researchers showed that this helps learners and agents express themselves, especially when difficulties arise.

Learning by teaching Learning by teaching is a well-known pedagogical approach that allows students to learn through generating explanations for others (Chase et al., 2009). Two studies used this pedagogical technique. The first study (Matsuda et al., 2013) described a chatbot that learns from students’ answers and activities; students act as “tutors” and provide the chatbot with examples and feedback. The second study (Law et al., 2020) describes a teachable agent that starts by asking students low- or high-level questions about a specific topic to evoke their curiosity. The student answers the questions, and the chatbot simulates learning. The chatbot generates a variety of questions by filling in a predefined sentence template. To confirm its learning and keep the conversation interesting, the chatbot seeks feedback from students by asking questions such as “Am I smart?”, “Am I learning?”, and “Do you think I know more now than before?”.
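
The template-filling idea can be illustrated with a minimal sketch in which a teachable agent generates questions by inserting topic entities into predefined sentence templates and occasionally asks for reassurance. The templates, entities, and turn-taking rule are invented for illustration.

```python
import random

# Minimal sketch of template-based question generation for a teachable
# agent: predefined sentence templates are filled with entities from the
# topic being taught, and the agent occasionally seeks reassurance.
# Templates, entities, and phrasing are illustrative.

TEMPLATES = [
    "What is {entity}?",
    "Can you give me an example of {entity}?",
    "How is {entity} different from {other}?",
]

REASSURANCE = [
    "Am I smart?",
    "Am I learning?",
    "Do you think I know more now than before?",
]

def generate_question(entities, rng=random):
    entity, other = rng.sample(entities, 2)
    template = rng.choice(TEMPLATES)
    return template.format(entity=entity, other=other)

def teachable_agent_turn(entities, turn: int, rng=random) -> str:
    # Every third turn, seek reassurance instead of asking a content question.
    if turn % 3 == 0:
        return rng.choice(REASSURANCE)
    return generate_question(entities, rng)

entities = ["mammals", "reptiles", "amphibians"]
for turn in range(1, 5):
    print(teachable_agent_turn(entities, turn))
```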

Scaffolding In the educational field, scaffolding describes several teaching approaches used to gradually bring students toward better comprehension and, eventually, more independence in the learning process (West et al., 2017). Teachers provide successive degrees of temporary support that help students reach levels of comprehension and skill they would not be able to attain without assistance (Maybin et al., 1992). In the same way, scaffolding was used as a learning strategy in a chatbot named Sara to improve students’ learning (Winkler et al., 2020). The chatbot provided voice and text-based scaffolds when needed, and the approach led to a significant improvement in learning on programming tasks.

5.6 RQ6 – What empirical evidence is there to substantiate the effectiveness of the proposed chatbots in education?

The surveyed articles used different types of empirical evaluation to assess the effectiveness of chatbots in educational settings. In some instances, researchers combined multiple evaluation methods, possibly to strengthen the findings.

We classified the empirical evaluation methods as follows: experiment, evaluation study, questionnaire, and focus group. An experiment is a scientific test performed under controlled conditions (Cook et al., 2002); one factor is changed at a time, while other factors are kept constant. It is the most familiar type of evaluation and includes a hypothesis, a variable that the researcher can manipulate, and variables that can be measured, calculated, and compared. An evaluation study is a test designed to provide insights into specific parameters (Payne and Payne, 2004); there is typically no hypothesis to prove, and the results are often not tested for statistical significance. A questionnaire is a data collection method focusing on a specific set of questions (Mellenbergh & Adèr, 2008) that aim to extract information from participants’ answers; it can be carried out by mail, telephone, face-to-face interview, or online via the web or email. A focus group allows researchers to evaluate a small group or sample that represents a community (Morgan, 1996); the idea is to examine characteristics or behaviors of a sample when it is difficult to examine the whole population.

Table 9 and Fig. 9 show the various evaluation methods used by the articles. Most articles (13; 36.11%) used an experiment to establish the validity of the used approach, while 10 articles (27.77%) used an evaluation study to validate the usefulness and usability of their approach. The remaining articles used a questionnaire (10; 27.77%) or a focus group (3; 8.33%) as their evaluation method.

Figure 9. Empirical evaluation methods applied in the selected studies

Experiments

Table 10 shows the details of the experiments conducted in the surveyed studies. Eight articles produced statistically significant results pointing to improved learning when using educational chatbots compared to a traditional learning setting, while a few other articles pointed to improved engagement, interest in learning, and subjective satisfaction.

A notable example of a conducted experiment is the one discussed in (Wambsganss et al., 2021). The experiment evaluated whether adaptive tutoring implemented via the chatbot helps students write more convincing texts. The authors assigned participants to two groups: a treatment group and a control group. The results showed that students using the chatbot to conduct a writing exercise (treatment group) wrote more convincing texts with better formal argumentation quality than students following the traditional approach (control group). Another example is the experiment conducted by the authors in (Benotti et al., 2017), where the students worked on programming tasks and their learning was assessed with a post-test. Comparing the treatment group (students who interacted with the chatbot) with the control group (students in a traditional setting), the students in the treatment group improved their learning and gained more interest in learning. Another study (Hayashi, 2013) evaluated the effect of text and audio-based suggestions of a chatbot used for formative assessment; the results show that students receiving text and audio-based suggestions improved their learning.

Despite most studies showing overwhelming evidence for improved learning and engagement, one study (Fryer et al., 2017 ) found that students’ interest in communicating with the chatbot significantly declined in an 8-week longitudinal study where a chatbot was used to teach English.

Evaluation studies

In general, the evaluation studies asked participants to take a test after completing an activity with the chatbot. The results of the evaluation studies (Table 12) point to various findings, such as increased motivation, learning, and task completeness, as well as high subjective satisfaction and engagement.

As an example of an evaluation study, the researchers in (Ruan et al., 2019 ) assessed students’ reactions and behavior while using ‘BookBuddy,’ a chatbot that helps students read books. The participants were five 6-year-old children. The researchers recorded the facial expressions of the participants using webcams. It turned out that the students were engaged more than half of the time while using BookBuddy.

Another interesting study was the one presented in (Law et al., 2020 ), where the authors explored how fourth and fifth-grade students interacted with a chatbot to teach it about several topics such as science and history. The students appreciated that the robot was attentive, curious, and eager to learn.

Questionnaires

Studies that used questionnaires as a form of evaluation assessed subjective satisfaction, perceived usefulness, and perceived usability, apart from one study that assessed perceived learning (Table 11). This is expected, as questionnaires ultimately capture participants’ subjective opinions and thus do not objectively measure metrics such as students’ learning.

While using questionnaires as an evaluation method, the studies identified high subjective satisfaction, usefulness, and perceived usability. The questionnaires used mostly Likert scale closed-ended questions, but a few questionnaires also used open-ended questions.

A notable example of a study using questionnaires is ‘Rexy,’ a configurable educational chatbot discussed in (Benedetto & Cremonesi, 2019). The authors designed a questionnaire to assess Rexy; it elicited feedback from participants and mainly evaluated the effectiveness and usefulness of learning with Rexy. The results largely point to high perceived usefulness. However, a few participants pointed out that learning with a human partner was sufficient for them, and one student indicated a lack of trust in the chatbot.

Another example is the study presented in (Ondáš et al., 2019), where the authors evaluated various aspects of a chatbot used in the education process, including helpfulness, whether users wanted more features, and subjective satisfaction. The students found the tool helpful and efficient, although they wanted more features, such as more information about courses and departments. About 62.5% of the students said they would use the chatbot again. In comparison, 88% of the students in (Daud et al., 2020) found the tool highly useful.

Focus group

Only three articles used the focus group method for evaluation. Only one study pointed to high usefulness and subjective satisfaction (Lee et al., 2020), while the others reported low to moderate subjective satisfaction (Table 13). For instance, the chatbot presented in (Lee et al., 2020) aims to increase learning effectiveness by allowing students to ask questions related to the course materials. The authors invited 10 undergraduate students to evaluate the chatbot. It turned out that most of the participants agreed that the chatbot is a valuable educational tool that facilitates real-time problem solving and provides a quick recap of course material. The study mentioned in (Mendez et al., 2020) conducted two focus groups to evaluate the efficacy of a chatbot used for academic advising. While students were largely satisfied with the answers given by the chatbot, they thought it lacked personalization and the human touch of real academic advisors. Finally, the chatbot discussed by (Verleger & Pembridge, 2018) was built upon a Q&A database related to a programming course. Nevertheless, because the tool did not produce answers to some questions, some students decided to abandon it and instead use standard search engines to find answers.

5.7 RQ7 – What are the challenges and limitations of using the proposed chatbots?

Several challenges and limitations that hinder the use of chatbots were identified in the selected studies. They are summarized in Table 14 and listed as follows:

Insufficient or Inadequate Dataset Training The most recurring limitation in several studies is that the chatbots are either trained with a limited dataset or, even worse, incorrectly trained. Learners using chatbots with a limited dataset experienced difficulties learning as the chatbot could not answer their questions. As a result, they became frustrated (Winkler et al., 2020) and could not fully engage in the learning process (Verleger & Pembridge, 2018; Qin et al., 2020). Another source of learner frustration is reported in (Qin et al., 2020), where the chatbot gave incorrect responses.

To combat the issues arising from inadequate training datasets, authors such as (Ruan et al., 2021) trained their chatbot using standard English language examination materials (e.g., IELTS and TOEFL); the evaluation suggests improved engagement. Further, Song et al. (2017) argue that the use of Natural Language Processing (NLP) supports a more natural conversation than one that relies on a limited dataset and a rule-based mechanism.

User-centered design User-centered design (UCD) refers to the active involvement of users in several stages of the software cycle, including requirements gathering, iterative design, and evaluation (Dwivedi et al., 2012). The ultimate goal of UCD is to ensure software usability. One challenge mentioned in a couple of studies is the lack of student involvement in the design process (Verleger & Pembridge, 2018), which may have resulted in decreased engagement and motivation over time. As another example, Law et al. (2020) noted that personality traits might affect how learning with a chatbot is perceived. Thus, educators wishing to develop an educational chatbot may have to factor students’ personality traits into their design.

Losing Interest Over Time Interestingly, all of the reviewed articles report that the educational chatbots were used for a relatively short time, apart from one study: Fryer et al. (2017) found that students’ interest in communicating with the chatbot significantly dropped in a longitudinal study. The decline happened between the first and the second tasks, suggesting a novelty effect while interacting with the chatbot. Such a decline did not happen when students interacted with a human partner.

Lack of Feedback Feedback is a crucial element that affects learning in various environments (Hattie and Timperley, 2007). It draws learners’ attention to gaps in understanding and helps them gain knowledge and competencies (Narciss et al., 2014). Moreover, feedback helps learners regulate their learning (Chou & Zou, 2020). Villegas-Ch et al. (2020) noted that the lack of assessments and exercises, coupled with the absence of a feedback mechanism, negatively affected the chatbot’s success.

Distractions Usability heuristics call for a user interface that focuses on the essential elements and does not distract users from necessary information (Inostroza et al., 2012 ). In the context of educational chatbots, this would mean that the design must focus on the essential interactions between the chatbot and the student. Qin et al. ( 2020 ) identified that external links and popups suggested by the chatbot could be distracting to students, and thus, must be used judiciously.

6 Discussion and future research directions

The purpose of this work was to conduct a systematic review of educational chatbots to understand their fields of application, platforms, educational roles, interaction styles, design principles, empirical evidence, and limitations.

Seven general research questions were formulated in reference to the objectives.

RQ1 examined the fields the educational chatbots are used in. The results show that the surveyed chatbots were used to teach several fields. More than a third of the chatbots were developed to teach computer science topics, including programming languages and networks. Fewer chatbots targeted foreign language education, while slightly less than a third of the studies used general-purpose educational chatbots. Our findings are somewhat similar to (Wollny et al., 2021) and (Hwang & Chang, 2021), although both of these review studies reported that language learning was the most targeted educational topic, followed by computer programming. Other review studies, such as (Winkler & Söllner, 2018), highlighted that chatbots were used to educate students on health, well-being, and self-advocacy.

RQ2 identified the platforms the educational chatbots operate on. Most surveyed chatbots are executed within web-based platforms, followed by a few chatbots running on mobile and desktop platforms. The web offers a versatile platform as multiple devices can access it, and it does not require installation. Other review studies, such as (Cunningham-Nelson et al., 2019) and (Pérez et al., 2020), did not discuss the platform but mentioned the tools used to develop the chatbots. Popular tools include Dialogflow Footnote 10, QnA Maker Footnote 11, and ChatFuel Footnote 12. Generally, these tools allow for chatbot deployment on web and mobile platforms. Interestingly, Winkler and Söllner (2018) highlighted that mobile platforms are popular for chatbots used for medical education.

RQ3 explored the roles of the chatbots when interacting with students. More than half of the surveyed chatbots were used as teaching agents that recommended educational content to students or engaged students in a discussion on relevant topics. Our results are similar to those reported in (Smutny & Schreiberova, 2020), which classified most chatbots as teaching agents that recommend content, conduct formative assessments, and set learning goals.

Slightly more than a third of the surveyed chatbots acted as peer agents, which students could turn to for help when needed. Such help includes term definitions, FAQ (Frequently Asked Questions), and discussion scaffolding. None of the related review studies reported the use of peer agents. However, a review study (Wollny et al., 2021) reported that some chatbots were used for scaffolding, which correlates with our findings.

Two chatbots were used as motivational agents showing empathetic and encouraging feedback as students learn. A few review studies such as (Okonkwo & Ade-Ibijola, 2021 ) and (Winkler & Söllner, 2018 ) identified that chatbots are used for motivation and engagement, but no details were given.

Finally, only two surveyed chatbots acted as teachable agents where students gradually taught the chatbots.

RQ4 investigated the interaction styles supported by the educational chatbots. Most surveyed chatbots used a chatbot-driven conversation where the chatbot was in control of the conversation. Some of these chatbots used a predetermined path, whereas others used intents that were triggered depending on the conversation. In general, related review studies did not distinguish between intent-based and flow-based chatbots. One review study surveyed chatbot-driven agents used for FAQ (Cunningham-Nelson et al., 2019), and others, such as (Winkler & Söllner, 2018), highlighted that some chatbots are flow-based, but no further details were provided.

Only a few surveyed chatbots allowed for a user-driven conversation where the user can initiate and lead the conversation. Other review studies reported that such chatbots rely on AI algorithms (Winkler & Söllner, 2018 ).

RQ5 examined the principles used to guide the design of the educational chatbots. Personalized learning is a common approach where the learning content is recommended, and instruction and feedback are tailored based on students’ performance and learning styles. Most related review studies did not refer to personalized learning as a design principle, but some review studies such as (Cunningham-Nelson et al., 2019 ) indicated that some educational chatbots provided individualized responses to students.

Scaffolding has also been used in some chatbots where students are provided gradual guidance to help them become independent learners. Scaffolding chatbots can help when needed, for instance, when students are working on a challenging task. Other review studies such as (Wollny et al., 2021 ) also revealed that some chatbots scaffolded students’ discussions to help their learning.

Other surveyed chatbots supported collaborative learning by advising the students to work together on tasks or by engaging a group of students in a conversation. A related review study (Winkler & Söllner, 2018 ) highlighted that chatbots could be used to support collaborative learning.

The remaining surveyed chatbots engaged students in various methods such as social dialog, affective learning, learning by teaching, and experiential learning. However, none of the related review studies indicated such design principles behind educational chatbots.

A few surveyed chatbots have used social dialog to engage students. For instance, some chatbots engaged students with small talk and showed interest and social presence. Other chatbots used affective learning in the form of sympathetic and reassuring feedback to support learners in problematic situations. Additionally, learning by teaching was used by two chatbots, where the chatbot acted as a student and the human students provided it with answers and examples. Further, one surveyed chatbot used experiential learning by asking students to gradually develop explanations to problems.

RQ6 studied the empirical evidence used to back the validity of the chatbots. Most surveyed chatbots were evaluated with experiments that largely showed, with statistical significance, that chatbots could improve learning and student satisfaction. A related review study (Hwang & Chang, 2021) indicated that many studies used experiments to substantiate the validity of chatbots, but no discussion of findings was reported.

Some of the surveyed chatbots used evaluation studies to assess the effect of chatbots on perceived usefulness and subjective satisfaction. The results are in favor of the chatbots. A related review study (Hobert & Meyer von Wolff, 2019 ) mentioned that qualitative studies using pre/post surveys were used. However, no discussion of findings was reported.

Questionnaires were also used by some surveyed chatbots indicating perceived subjective satisfaction, ease of learning, and usefulness. Intriguingly, a review study (Pérez et al., 2020 ) suggested that questionnaires were the most common method of evaluation of chatbots. Such questionnaires pointed to high user satisfaction and no failure on the chatbot’s part.

Finally, this appears to be the only review reporting focus groups as an evaluation method. Only three chatbots were evaluated with this method, with a low number of participants, and the results showed usefulness, reasonable subjective satisfaction, and a lack of chatbot training.

RQ7 examined the challenges and limitations of using educational chatbots. A frequently reported challenge was a lack of dataset training which caused frustration and learning difficulties. A review study (Pérez et al., 2020 ) hinted at a similar issue by shedding light on the complex task of collecting data to train the chatbots.

Two surveyed studies also noticed the novelty effect. Students seem to lose interest in talking to chatbots over time. A similar concern was reported by a related review study (Pérez et al., 2020 ).

Other limitations not highlighted by related review studies include the lack of user-centered design, the lack of feedback, and distractions. In general, the surveyed chatbots were not designed with the involvement of students in the process. Further, one surveyed chatbot did not assess the students’ knowledge, which may have negatively impacted the chatbot’s success. Finally, a surveyed study found that a chatbot’s external links and popup messages distracted the students from the essential tasks.

The main limitation highlighted by related work but not identified in our surveyed studies is chatbot ethics. A review study (Okonkwo & Ade-Ibijola, 2021) discussed that ethical issues such as privacy and trust must be considered when designing educational chatbots.

To set the ground for future research and practical implementation of chatbots, we shed some light on several areas that should be considered when designing and implementing chatbots.

Usability Principles Usability is a quality attribute that evaluates how easy a user interface is to use (Nathoo et al., 2019). Various usability principles can serve as guidance for designing user interfaces. For instance, Nielsen presented ten heuristics considered rules of thumb Footnote 13, and Shneiderman proposed eight golden rules of user interface design (Shneiderman et al., 2016). Further, based on general usability principles and heuristics, some researchers devised usability heuristics for designing and evaluating chatbots (conversational user interfaces), combining traditional usability heuristics with principles specific to conversation and language studies. Regarding the design phase, it is recommended to design user interfaces iteratively, involving users throughout the process (Lauesen, 2005).

The chatbots discussed in the reviewed articles aimed at helping students with the learning process. Since they interact with students, their design must pay attention to usability principles. However, none of the studies explicitly discussed relying on usability principles in the design phase. Nevertheless, based on some design choices, it could be argued that some of the authors designed their chatbots with usability in mind. For instance, Alobaidi et al. (2013) used contrast to capture user attention, while Ayedoun et al. (2017) designed their chatbot with subjective satisfaction in mind. Further, Song et al. (2017) involved the users in their design by employing participatory design, while Clarizia et al. (2018) ensured that the chatbot design is consistent with existing popular chatbots. Similarly, Villegas-Ch et al. (2020) developed the user interface of their chatbot to be similar to that of Facebook Messenger.

Nevertheless, we argue that it is crucial to explicitly design educational chatbots with usability principles in mind. Further, we recommend that future educators not only test the chatbot’s impact on learning and student engagement but also assess the usability of the chatbots.

Chatbot Personality Personality describes consistent and characteristic patterns of behavior, emotions, and cognition (Smestad and Volden, 2018). Research suggests that users treat chatbots as if they were humans (Chaves & Gerosa, 2021), and thus chatbots are increasingly built to have a personality. In fact, researchers have used the Big Five model to explain the personalities a chatbot can have when interacting with users (Völkel & Kaya, 2021; McCrae & Costa, 2008). Existing studies experimented with various chatbot personalities, such as agreeable, neutral, and disagreeable (Völkel & Kaya, 2021). According to the literature, an agreeable chatbot uses family-oriented words such as “family” or “together” (Hirsh et al., 2009), words regarded as emotionally positive such as “like” or “nice” (Hirsh et al., 2009), words indicating assurance such as “sure” (Nass et al., 1994), as well as certain emojis (Völkel et al., 2019). On the other hand, a disagreeable chatbot does not show interest in the user and might be critical and uncooperative (Andrist et al., 2015).

Other personalities have also been attributed to chatbots, such as casual and formal personalities, where a formal chatbot uses a standardized language with proper grammar and punctuation, whereas a casual chatbot includes everyday, informal language (Andrist et al., 2015 ; Cafaro et al., 2016 ).
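
As a minimal sketch of how such personality traits could be operationalized, the snippet below wraps the same core feedback in different wording depending on a configured personality. The mapping and phrasing are illustrative assumptions inspired by the word choices reported in the literature, not an implementation from any surveyed study.

```python
# Minimal sketch of personality-conditioned phrasing: the same feedback is
# wrapped differently depending on a configured personality. The mapping and
# wording are illustrative (e.g., assurance words and an emoji for the
# agreeable/casual style, standardized language for the formal style).

PERSONALITY_STYLES = {
    "agreeable_casual": {
        "prefix": "Sure, nice work! ",
        "suffix": " Let's keep going together 🙂",
    },
    "formal": {
        "prefix": "Thank you for your answer. ",
        "suffix": " Please proceed to the next exercise.",
    },
}

def phrase(feedback: str, personality: str) -> str:
    """Wrap the core feedback in personality-specific wording."""
    style = PERSONALITY_STYLES.get(personality, {"prefix": "", "suffix": ""})
    return f"{style['prefix']}{feedback}{style['suffix']}"

core = "Your answer to question 3 is correct."
print(phrase(core, "agreeable_casual"))
print(phrase(core, "formal"))
```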

Despite the interest in chatbot personality as a topic, most of the reviewed studies shied away from considering it in their design. A few studies, such as (Coronado et al., 2018; Janati et al., 2020; Qin et al., 2020; Wambsganss et al., 2021), integrated social dialog into the design of the chatbot. However, these chatbots primarily focused on the learning process rather than on personality. We argue that future studies should shed light on how chatbot personality could affect learning and subjective satisfaction.

Chatbot Localization and Acceptance Human societies’ social behavior and conventions, as well as individuals’ views, knowledge, laws, rituals, practices, and values, are all influenced by culture. Culture can be described as the underlying values, beliefs, philosophy, and ways of interacting that contribute to a person’s unique psychological and social environment. Shin et al. (2022) define culture as the common models of behavior and interaction, cognitive frameworks, and perceptual awareness gained via socialization in a cross-cultural environment. The acceptance of chatbots thus involves a cultural dimension: the cultural and social circumstances in which a chatbot is used influence how students interpret it and how they consume and engage with it. For example, the study by (Rodrigo et al., 2012) shows evidence that the chatbot ‘Scooter’ was regarded and interacted with differently in the Philippines than in the United States. Students’ gaming behavior in the Philippines suggests that Scooter’s interface design did not adequately account for Philippine society’s emphasis on outwardly smooth interpersonal relationships.

Nevertheless, none of the other studies considered localization as a design element, although it can be crucial to a chatbot’s effectiveness and acceptance. We encourage future researchers and educators to assess how the localization of chatbots affects students’ acceptance and, consequently, the chatbot’s success as a learning mate.

Development Framework As it currently stands, the literature offers little guidance on designing effective, usable chatbots; none of the studies followed a specific framework or set of guiding principles in designing the chatbots. Future works could contribute to the Human-Computer Interaction (HCI) and education communities by formulating guiding principles that assist educators and instructional designers in developing effective, usable chatbots. Such guiding principles should assist educators and researchers across multiple dimensions, including learning outcomes and usability principles. A software engineering approach can be adopted, guiding educators through four phases: requirements, design, deployment, and assessment. A conceptual framework could be devised by analyzing quantitative and qualitative data from empirical evaluations of educational chatbots. The framework could guide the design of a learning activity using chatbots by considering learning outcomes, interaction styles, usability guidelines, and more.

End-user development of chatbots End-User Development (EUD) is a field concerned with tools and activities that enable end-users who are not professional software developers to write software programs (Lieberman et al., 2006). EUD uses various approaches such as visual programming (Kuhail et al., 2021) and declarative formulas (Kuhail and Lauesen, 2012). Since end-users outnumber software engineers by a factor of 30-to-1, EUD empowers a much larger pool of people to participate in software development (Kuhail et al., 2021). Only a few studies (e.g., Ondáš et al., 2019; Benedetto & Cremonesi, 2019) have discussed how the educational chatbots were developed using technologies such as Google Dialogflow and IBM Watson Footnote 14. Nevertheless, such technologies are only accessible to developers. Recently, commercial tools such as Google Dialogflow CX Footnote 15 have emerged that allow non-programmers to develop chatbots with visual programming, creating a program by assembling graphical elements rather than specifying them textually.

Future studies could experiment with existing EUD tools that allow educational chatbots’ development. In particular, researchers could assess the usability and expressiveness of such tools and their suitability in the educational context.

7 Conclusion

This study described how several educational chatbot approaches empower learners across various domains. The study analyzed 36 educational chatbots proposed in the literature. To analyze the tools, the study assessed each chatbot along seven dimensions: educational field, platform, educational role, interaction style, design principles, empirical evidence, and challenges and limitations.

The results show that the chatbots were proposed in various areas, mainly computer science, language, and general education, along with a few other fields such as engineering and mathematics. Most chatbots are accessible via a web platform, and fewer chatbots were available on mobile and desktop platforms. This choice can be explained by the flexibility the web platform offers, as it potentially supports multiple devices, including laptops, mobile phones, etc.

In terms of the educational role, slightly more than half of the studies used teaching agents, while 13 studies (36.11%) used peer agents. Only two studies presented a teachable agent, and another two studies presented a motivational agent. Teaching agents gave students tutorials or asked them to watch videos with follow-up discussions. Peer agents allowed students to ask for help on demand, for instance, by looking terms up, while teachable agents initiated the conversation with a simple topic and then asked the students questions in order to learn from them. Motivational agents reacted to the students’ learning with various emotions, including empathy and approval.

In terms of the interaction style, the vast majority of the chatbots used a chatbot-driven style, with about half of the chatbots using a flow-based approach with a predetermined learning path and 36.11% of the chatbots using an intent-based approach. Only four chatbots (11.11%) used a user-driven style where the user was in control of the conversation. A user-driven interaction was mainly utilized for chatbots teaching a foreign language.

Concerning the design principles behind the chatbots, slightly less than a third of the chatbots used personalized learning, which tailored the educational content based on learning weaknesses, style, and needs. Other chatbots used experiential learning (13.88%), social dialog (11.11%), collaborative learning (11.11%), affective learning (5.55%), learning by teaching (5.55%), and scaffolding (2.77%).

Concerning the evaluation methods used to establish the validity of the approaches, slightly more than a third of the chatbots were evaluated with experiments, mostly with significant results. The remaining chatbots were evaluated with evaluation studies (27.77%), questionnaires (27.77%), and focus groups (8.33%). The findings point to improved learning, high usefulness, and subjective satisfaction.

Some studies mentioned limitations such as inadequate or insufficient dataset training, lack of user-centered design, students losing interest in the chatbot over time, and some distractions.

There are several challenges to be addressed by future research. None of the articles explicitly relied on usability heuristics and guidelines in designing the chatbots, though some authors stressed a few usability principles such as consistency and subjective satisfaction. Further, none of the articles discussed or assessed a distinct personality of the chatbots, although research shows that chatbot personality affects users’ subjective satisfaction.

Future studies should explore chatbot localization, where a chatbot is customized based on the culture and context it is used in. Moreover, researchers should explore devising frameworks for designing and developing educational chatbots to guide educators in building usable and effective chatbots. Finally, researchers should explore EUD tools that allow non-programmer educators to design and develop educational chatbots. Adopting EUD tools to build chatbots would accelerate the adoption of the technology in various fields.

Study Limitations

We identified some limitations that may affect this study. We restricted our research to the period from January 2011 to April 2021. This restriction was necessary to allow us to practically begin the analysis of articles, which took several months. As a result, we may have missed interesting articles published between then and the date of submission that could have been valuable for this study.

We conducted our search using four digital libraries: ACM, Scopus, IEEE Xplore, and SpringerLink. We may have missed other relevant articles found in other libraries such as Web of Science.

Our initial search resulted in a total of 1208 articles. We applied exclusion criteria to find relevant articles that were possible to assess. As such, our decision might have caused a bias: for example, we could have excluded short papers presenting original ideas or papers without sufficient evidence.

Since different researchers with diverse research experience participated in this study, article classification may have been somewhat inaccurate. As such, we mitigated this risk by cross-checking the work done by each reviewer to ensure that no relevant article was erroneously excluded. We also discussed and clarified all doubts and gray areas after analyzing each selected article.

There is also a bias towards empirically evaluated articles as we only selected articles that have an empirical evaluation, such as experiments, evaluation studies, etc. Further, we only analyzed the most recent articles when many articles discussed the same concept by the same researchers.

Finally, we could have missed articles reporting educational chatbots that are not indexed in the selected search databases. To deal with this risk, we searched manually to identify significant work beyond the articles we found in the search databases. Nevertheless, the manual search did not surface any articles that were not already found in the searched databases.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Footnotes

https://www.statista.com/statistics/656596/worldwidechatbot-market/

https://developer.amazon.com/en-US/alexa

https://www.apple.com/sa-ar/siri/

https://dialogflow.cloud.google.com

https://chatfuel.com/

https://www.rayyan.ai/

https://www.scimagojr.com/

https://www.appannie.com/en/insights/market-data/state-of-mobile-2020-infographic/

https://dialogflow.cloud.google.com/

https://www.qnamaker.ai/

https://www.nngroup.com/articles/ten-usability-heuristics/

https://www.ibm.com/sa-ar/watson

https://cloud.google.com/dialogflow/cx/docs/basics

References

AbuShawar, B., & Atwell, E. (2015). Alice chatbot: Trials and outputs. Computación y Sistemas, 19(4), 625–632.

Baylor, A.L (2011). The design of motivational agents and avatars. Educational Technology Research and Development , 59 (2), 291–300.

Benotti, L., Martínez, M.C., & Schapachnik, F. (2017). A tool for introducing computer science with automatic formative assessment. IEEE Transactions on Learning Technologies, 11(2), 179–192. https://doi.org/10.1109/TLT.2017.2682084

Cafaro, A., Vilhjálmsson, H.H., & Bickmore, T. (2016). First impressions in human–agent virtual encounters. ACM Transactions on Computer-Human Interaction (TOCHI) , 23 (4), 1–40.

Car, L.T., Dhinagaran, D.A., Kyaw, B.M., Kowatsch, T., Joty, S., Theng, Y.-L., & Atun, R. (2020). Conversational agents in health care: scoping review and conceptual analysis. Journal of medical Internet research , 22 (8), e17158.

Chase, C.C, Chin, D.B, Oppezzo, M.A, & Schwartz, D.L (2009). Teachable agents and the protégé effect: Increasing the effort towards learning. Journal of Science Education and Technology , 18 (4), 334–352.

Chaves, A.P., & Gerosa, M.A. (2021). How should my chatbot interact? a survey on social characteristics in human–chatbot interaction design. International Journal of Human–Computer Interaction , 37 (8), 729–758.

Chou, C.-Y., & Zou, N.-B. (2020). An analysis of internal and external feedback in self-regulated learning activities mediated by self-regulated learning tools and open learner models. International Journal of Educational Technology in Higher Education , 17 (1), 1–27.

Cook, T.D., Campbell, D.T., & Shadish, W. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Coronado, M., Iglesias, C.A., Carrera, Á., & Mardomingo, A. (2018). A cognitive assistant for learning java featuring social dialogue. International Journal of Human-Computer Studies , 117 , 55–67.

Cunningham-Nelson, S., Baktashmotlagh, M., & Boles, W. (2019). Visualizing student opinion through text analysis. IEEE Transactions on Education, 62(4), 305–311.

Daud, S.H.M., Teo, N.H.I., & Zain, N.H.M. (2020). E-Java chatbot for learning programming language: A post-pandemic alternative virtual tutor. International Journal, 8(7), 3290–3298.

De Angeli, A., & Brahnam, S. (2008). I hate you! Disinhibition with virtual partners. Interacting with Computers, 20(3), 302–310.

Dehn, D.M, & Van Mulken, S. (2000). The impact of animated interface agents: a review of empirical research. International journal of human-computer studies , 52 (1), 1–22.

D’mello, S., & Graesser, A. (2013). Autotutor and affective autotutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems (TiiS) , 2 (4), 1–39.

Dwivedi, M., Upadhyay, M.S., & Tripathi, A. (2012). A working framework for the user-centered design approach and a survey of the available methods. International Journal of Scientific and Research Publications , 2 (4), 12–19.

Fryer, L.K, Ainley, M., Thompson, A., Gibson, A., & Sherlock, Z. (2017). Stimulating and sustaining interest in a language course: An experimental comparison of chatbot and human task partners. Computers in Human Behavior , 75 , 461–468.

Haake, M., & Gulz, A. (2009). A look at the roles of look & roles in embodied pedagogical agents – a user preference perspective. International Journal of Artificial Intelligence in Education, 19(1), 39–71.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.

Hirsh, J.B, DeYoung, C.G, & Peterson, J.B (2009). Metatraits of the big five differentially predict engagement and restraint of behavior. Journal of Personality , 77 (4), 1085–1102.

Kester, L., Kirschner, P.A, & Van Merriënboer, J.J. (2005). The management of cognitive load during complex cognitive skill acquisition by means of computer-simulated problem solving. British Journal of Educational Psychology , 75 (1), 71–85.

King, F.B (2002). A virtual student: Not an ordinary joe. The Internet and Higher Education , 5 (2), 157–166.

Kulik, J.A, & Fletcher, J.D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research , 86 (1), 42–78.

Kumar, R., & Rose, C.P (2010). Architecture for building conversational agents that support collaborative learning. IEEE Transactions on Learning Technologies , 4 (1), 21–34.

Martha, A.S.D., & Santoso, H.B (2019). The design and impact of the pedagogical agent: A systematic literature review. Journal of Educators Online , 16 (1), n1.

Matsuda, N., Yarzebinski, E., Keiser, V., Raizada, R., Cohen, W.W, Stylianides, G.J, & Koedinger, K.R (2013). Cognitive anatomy of tutor learning: Lessons learned with simstudent. Journal of Educational Psychology , 105 (4), 1152.

Morgan, D.L (1996). Focus groups. Annual Review of Sociology , 22 (1), 129–152.

Narciss, S., Sosnovsky, S., Schnaubert, L., Andrès, E., Eichelmann, A., Goguadze, G., & Melis, E. (2014). Exploring feedback and student characteristics relevant for personalizing feedback strategies. Computers & Education , 71 , 56–76.

Okonkwo, C.W., & Ade-Ibijola, A. (2021). Chatbots applications in education: A systematic review. Computers and Education: Artificial Intelligence , 2 , 100033.

Pérez, J.Q., Daradoumis, T., & Puig, J.M.M. (2020). Rediscovering the use of chatbots in education: A systematic literature review. Computer Applications in Engineering Education , 28 (6), 1549–1565.

Rodrigo, M.M.T, Baker, R.S., Agapito, J., Nabos, J., Repalam, M.C., Reyes, S.S, & San Pedro, M.O.C. (2012). The effects of an interactive software agent on student affective dynamics while using; an intelligent tutoring system. IEEE Transactions on Affective Computing , 3 (2), 224–236.

Ruttkay, Z., & Pelachaud, C. (2006). From brows to trust: Evaluating embodied conversational agents (Vol. 7). Berlin: Springer Science & Business Media.

Schouten, D.G., Venneker, F., Bosse, T., Neerincx, M.A, & Cremers, A.H. (2017). A digital coach that provides affective and social learning support to low-literate learners. IEEE Transactions on Learning Technologies , 11 (1), 67–80.

Seel, N.M. (2011). Encyclopedia of the sciences of learning . Berlin: Springer Science & Business Media.

Serenko, A., Bontis, N., & Detlor, B. (2007). End-user adoption of animated interface agents in everyday work applications. Behaviour & Information Technology, 26(2), 119–132.

Shin, D., Chotiyaputta, V., & Zaid, B. (2022). The effects of cultural dimensions on algorithmic news: How do cultural value orientations affect how people perceive algorithms? Computers in Human Behavior , 126 , 107007.

Shneiderman, B., Plaisant, C., Cohen, M.S., Jacobs, S., Elmqvist, N., & Diakopoulos, N. (2016). Designing the user interface: Strategies for effective human-computer interaction. London: Pearson.

Smutny, P., & Schreiberova, P. (2020). Chatbots for learning: A review of educational chatbots for Facebook Messenger. Computers & Education, 151, 103862.

Stahl, G. (2006). Group cognition: Computer support for building collaborative knowledge (Acting with Technology). Cambridge, MA: The MIT Press.

Tegos, S., Demetriadis, S., & Karakostas, A. (2015). Promoting academically productive talk with conversational agent interventions in collaborative learning settings. Computers & Education , 87 , 309–325.

Tegos, S., Demetriadis, S., & Tsiatsos, T. (2014). A configurable conversational agent to trigger students’ productive dialogue: a pilot study in the call domain. International Journal of Artificial Intelligence in Education , 24 (1), 62–91.

VanLehn, K., Graesser, A., Jackson, G.T., Jordan, P., Olney, A., & Rosé, C.P. (2007). Natural language tutoring: A comparison of human tutors, computer tutors, and text. Cognitive Science , 31 (1), 3–52.

Villegas-Ch, W., Arias-Navarrete, A., & Palacios-Pacheco, X. (2020). Proposal of an architecture for the integration of a chatbot with artificial intelligence in a smart campus for the improvement of learning. Sustainability , 12 (4), 1500.

Walker, E., Rummel, N., & Koedinger, K.R (2011). Designing automated adaptive support to improve student helping behaviors in a peer tutoring activity. International Journal of Computer-Supported Collaborative Learning , 6 (2), 279–306.

Weizenbaum, J. (1966). ELIZA – a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.

Wik, P., & Hjalmarsson, A. (2009). Embodied conversational agents in computer assisted language learning. Speech Communication , 51 (10), 1024–1037.

Alkhoori, A., Kuhail, M.A., & Alkhoori, A. (2020). Unibud: A virtual academic adviser. In 2020 12th annual undergraduate research conference on applied computing (URC) (pp. 1–4).

Alobaidi, O.G., Crockett, K.A., O’Shea, J.D., & Jarad, T.M. (2013). Abdullah: An intelligent Arabic conversational tutoring system for modern Islamic education. In Proceedings of the World Congress on Engineering (Vol. 2).

Andrist, S., Mutlu, B., & Tapus, A. (2015). Look like me: Matching robot personality via gaze to increase motivation. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 3603–3612).

Anghelescu, P., & Nicolaescu, S.V. (2018). Chatbot application using search engines and teaching methods. In 2018 10th international conference on electronics, computers and artificial intelligence (ECAI) (pp. 1–6).

Ayedoun, E., Hayashi, Y., & Seta, K. (2017). Communication strategies and affective back channels for conversational agents to enhance learners’ willingnessto communicate in a second language. In International conference on artificial intelligence in education (pp. 459–462).

Benedetto, L., & Cremonesi, P. (2019). Rexy, a configurable application for building virtual teaching assistants. In IFIP conference on human-computer interaction (pp. 233–241).

Bradeško, L., & Mladenić, D. (2012). A survey of chatbot systemsthrough a loebner prize competition. In Proceedings of Slovenian language technologies society eighth conference of language technologies (pp. 34–37).

Brewer, R.N, Findlater, L., Kaye, J., Lasecki, W., Munteanu, C., & Weber, A. (2018). Accessible voice interfaces. In Companion ofthe 2018 ACM conference on computer supported cooperative work and social computing (pp. 441–446).

Budiu, R. (2018). The user experience of chatbots. Nielsen Norman Group 25 .

Chaudhuri, S., Kumar, R., Howley, I.K., & Rosé, C.P. (2009). Engaging collaborative learners with helping agents. In AIED (pp. 365–372).

Chhibber, N., & Law, E. (2019). Using conversational agents to support learning by teaching. arXiv: 1909.13443 .

Chukhno, O., Chukhno, N., Samouylov, K.E, & Shorgin, S. (2019). A chatbot as an environment for carrying out the group decision making process. In ITTMM (Selected Papers) (pp. 15–25).

Clarizia, F., Colace, F., Lombardi, M., Pascale, F., & Santaniello, D. (2018). Chatbot: An education support system for student. In International symposium on cyberspace safety and security (pp. 291–302).

Cunningham-Nelson, S., Boles, W., Trouton, L., & Margerison, E. (2019). A review of chatbots in education:practical steps forward. In 30th annual conference for the australasian association for engineering education (AAEE 2019): educators becoming agents of change: innovate, integrate, motivate (pp. 299–306).

da Silva Oliveira, J., Espíndola, D.B., Barwaldt, R., Ribeiro, L.M., & Pias, M. (2019). Ibm watson application as faq assistant about moodle. In 2019 ieee frontiers in education conference (FIE) (pp. 1–8).

Dutta, D. (2017). Developing an intelligent chat-bottool to assist high school students for learning general knowledge subjects (Tech. rep.) Georgia Institute of Technology.

Felicia, P. (2011). Handbook of research on improving learning and motivation through educational games: Multidisciplinary approaches: Multidisciplinary approaches. iGi Global.

Feng, D., Shaw, E., Kim, J., & Hovy, E. (2006). An intelligent discussion-bot for answering student queries in threaded discussions. In Proceedings ofthe 11th international conference on intelligent user interfaces (pp. 171–177).

Følstad, A., Skjuve, M., & Brandtzaeg, P.B. (2018). Different chatbots for different purposes:towards a typology of chatbots to understand interaction design. In International conference on internet science (pp. 145–156).

Griol, D., Baena, I., Molina, J.M., & de Miguel, A.S. (2014). A multimodal conversational agent for personalized language learning. In Ambient intelligence-software and applications (pp. 13–21). Springer.

Griol, D., García-Herrero, J., & Molina, J. (2011). The educagent platform: Intelligent conversational agents for e-learning applications. In Ambient intelligence-software and applications (pp. 117–124). Springer.

Gulz, A., Haake, M., Silvervarg, A., Sjödén, B., & Veletsianos, G. (2011). Building a social conversational pedagogical agent: Design challenges and methodological approaches. In Conversational agents and natural language interaction: Techniques and effective practices (pp. 128–155). IGI Global.

Hayashi, Y. (2013). Learner-support agents for collaborative interaction: A study on affect and communication channels.

Heffernan, N.T, & Croteau, E.A (2004). Web-based evaluations showing differential learning for tutorial strategies employed by the ms. lindquisttutor. In International conference on intelligent tutoring systems (pp. 491–500).

Hobert, S., & Meyer von Wolff, R. (2019). Say helloto your new automatedtutor–a structured literature review on pedagogical conversational agents.

Hwang, G.-J., & Chang, C.-Y. (2021). A review of opportunities and challenges of chatbots in education. Interactive Learning Environments, 1–14.

Inostroza, R., Rusu, C., Roncagliolo, S., Jimenez, C., & Rusu, V. (2012). Usability heuristics for touchscreen-based mobile devices. In 2012 ninth international conference on information technology-new generations (pp. 662–667).

Ismail, M., & Ade-Ibijola, A. (2019). Lecturer’s apprentice: A chatbot for assisting novice programmers. In 2019 international multidisciplinary information technology and engineering conference (IMITEC) (pp. 1–8).

Janati, S.E., Maach, A., & Ghanami, D.E. (2020). Adaptive e-learning ai-powered chatbot based on multimedia indexing. International Journal of Advanced Computer Science and Applications 11 (12). Retrieved from https://doi.org/10.14569/IJACSA.2020.0111238 .

Kaczorowska-Spychalska, D. (2019). Chatbots in marketing. Management 23 (1).

Keele, S., et al. (2007). Guidelines for performing systematic literature reviews in software engineering (Tech. Rep.) Citeseer.

Kerlyl, A., Hall, P., & Bull, S. (2006). Bringing chatbots into education: Towards natural language negotiation of open learner models. In International conference on innovative techniques and applications of artificial intelligence (pp. 179–192).

Kerry, A., Ellis, R., & Bull, S. (2008). Conversational agents in e-learning. In International conference on innovative techniques and applications of artificial intelligence (pp. 169–182).

Klüwer, T. (2011). “i like your shirt”-dialogue acts for enabling socialtalk in conversational agents. In International workshop on intelligent virtual agents (pp. 14–27).

Kuhail, M.A., & Lauesen, S. (2012). Customizable visualizations with formula-linked building blocks. In GRAPP/IVAPP (pp. 768–771).

Kuhail, M.A., Farooq, S., Hammad, R., & Bahja, M. (2021). Characterizing visual programming approaches for end-user developers: A systematic review. IEEE Access.

Latham, A., Crockett, K., McLean, D., & Edmonds, B. (2011). Oscar: an intelligent adaptive conversational agent tutoring system. In KES international symposium on agent and multi-agent systems: technologies and applications (pp. 563–572).

Lauesen, S. (2005). User interface design: a software engineering perspective. Pearson Education.

Laurillard, D. (2013). Rethinking university teaching: A conversational framework for the effective use of learning technologies. Routledge.

Law, E., Baghaei Ravari, P., Chhibber, N., Kulic, D., Lin, S., Pantasdo, K.D, Ceha, J., Suh, S., & Dillen, N. (2020). Curiosity notebook: A platform for learning by teaching conversational agents. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1–9).

Lee, L.-K., Fung, Y.-C., Pun, Y.-W., Wong, K.-K., Yu, M.T.-Y., & Wu, N.-I. (2020). Using a multiplatform chatbot as an onlinetutor in a university course. In 2020 international symposium on educational technology (ISET) (pp. 53–56). IEEE.

Lieberman, H., Paternò, F., Klann, M., & Wulf, V. (2006). End-user development: An emerging paradigm. In End user development (pp. 1–8). Springer.

Mabunda, K. (2020). An intelligent chatbot for guiding visitors and locating venues.

Maybin, J., Mercer, N., & Stierer, B. (1992). Scaffolding learning in the classroom. Thinking voices: The work ofthe national oracy project 186–195.

McCrae, R.R., & Costa, P.T.Jr. (2008). The five-factortheory of personality.

Mellado-Silva, R., Faúndez-Ugalde, A., & Blanco-Lobos, M. (2020). Effective learning of tax regulations using different chatbot techniques.

Mellenbergh, G.J, & Adèr, H.J. (2008). Tests and questionnaires: Construction and administration. Advising on Research Methods: A Consultant’s Companion 211–236.

Mendez, S., Johanson, K., Martin Conley, V., Gosha, K., A Mack, N., Haynes, C., & A Gerhardt, R. (2020). Chatbots: Atoolto supplementthe future faculty mentoring of doctoral engineering students. International Journal of Doctoral Studies 15 .

Nass, C., Steuer, J., & Tauber, E.R (1994). Computers are social actors. In Proceedings ofthe SIGCHI conference on Human factors in computing systems (pp. 72–78).

Nathoo, A., Gangabissoon, T., & Bekaroo, G. (2019). Exploringthe use of tangible user interfaces for teaching basic java programming concepts: A usability study. In 2019 conference on next generation computing applications (NextComp) (pp. 1–5).

Oh, K.-J., Lee, D., Ko, B., & Choi, H.-J. (2017). A chatbot for psychiatric counseling in mental healthcare service based on emotional dialogue analysis and sentence generation. In 2017 18th IEEE international conference on mobile data management (MDM) (pp. 371–375).

Ondáš, S., Pleva, M., & Hládek, D. (2019). How chatbots can be involved in the education process. In 2019 17th international conference on emerging elearning technologies and applications (ICETA) (pp. 575–580).

Padró, L., & Stanilovsky, E. (2012). Freeling 3.0: Towards wider multilinguality. In LREC2012 .

Payne, G., & Payne, J. (2004). Key concepts in social research. Sage.

Qin, C., Huang, W., & Hew, K.F. (2020). Usingthe community of inquiry framework to develop an educational chatbot: lesson learned from a mobile instant messaging learning environment. In Proceedings ofthe 28th international conference on computers in education .

Ramesh, K., Ravishankaran, S., Joshi, A., & Chandrasekaran, K. (2017). A survey of design techniques for conversational agents. In International conference on information, communication and computing technology (pp. 336–350).

Redondo-Hernández, A., & Pérez-Marín, D. (2011). Aprocedureto automatically adapt questions in student–pedagogic conversational agent dialogues. In International conference on user modeling, adaptation, and personalization (pp. 122–134).

Ruan, S., Jiang, L., Xu, Q., Liu, Z., Davis, G.M, Brunskill, E., & Landay, J.A. (2021). Englishbot: An ai-powered conversational system for second language learning. In 26th international conference on intelligent user interfaces (pp. 434–444).

Ruan, S., Willis, A., Xu, Q., Davis, G.M., Jiang, L., Brunskill, E., & Landay, J.A (2019). Bookbuddy: Turning digital materials into interactive foreign language lessonsthrough a voice chatbot. In Proceedings ofthe Sixth (2019) ACM conference on learning@ scale (pp. 1–4).

Sinha, S., Basak, S., Dey, Y., & Mondal, A. (2020). An educational chatbot for answering queries, Springer.

Smestad, T.L., & Volden, F. (2018). Chatbot personalities matters. In International conference on internet science (pp. 170–181).

Song, D., Oh, E.Y., & Rice, M. (2017). Interacting with a conversational agent system for educational purposes in online courses. In 2017 10th international conference on human system interactions (HSI) (pp. 78–82). IEEE.

Tegos, S., Psathas, G., Tsiatsos, T., Katsanos, C., Karakostas, A., Tsibanis, C., & Demetriadis, S. (2020). Enriching synchronous collaboration in online courses with configurable conversational agents. In International Conference on Intelligent Tutoring Systems (pp. 284–294).

Thomas, H. (2020). Critical literature review on chatbots in education.

Verleger, M., & Pembridge, J. (2018). Apilot study integrating an ai-driven chatbot in an introductory programming course. In 2018 ieee frontiers in education conference (FIE) (pp. 1–4).

Völkel, S.T., Buschek, D., Pranjic, J., & Hussmann, H. (2019). Understanding emoji interpretation through user personality and message context. In Proceedings of the 21st international conference on human-computer interaction with mobile devices and services (pp. 1–12).

Völkel, S.T., & Kaya, L. (2021). Examining user preference for agreeableness in chatbots. In CUI 2021-3rd conference on conversational user interfaces (pp. 1–6).

Wambsganss, T., Kueng, T., Soellner, M., & Leimeister, J.M. (2021). Arguetutor: an adaptive dialog-based learning system for argumentation skills. In Proceedings of the 2021 CHI conference on human factors in computing systems (pp. 1–13).

Wambsganss, T., Winkler, R., Schmid, P., & Söllner, M. (2020). Designing a conversational agent as a formative course evaluation tool.

Wambsganss, T., Winkler, R., Söllner, M., & Leimeister, J. M. (2020). A conversational agent to improve response quality in course evaluations. In Extended Abstracts of the 2020 CHI conference on human factors in computing systems (pp. 1–9).

West, A., Swanson, J., & Lipscomb, L. (2017). Ch. 11 scaffolding. Instructional methods, strategies and technologies to meet the needs of all learners.

Winkler, R., Hobert, S., Salovaara, A., Söllner, M., & Leimeister, Jan Marco (2020). Sara, the lecturer: Improving learning in online education with a scaffolding-based conversational agent. In Proceedings of the 2020 CHI conference on human factors in computing systems (pp. 1–14).

Winkler, R., & Söllner, M. (2018). Unleashing the potential of chatbots in education: A state-of-the-art analysis. In Academy of management annual meeting (AOM) .

Wollny, S., Schneider, J., Di Mitri, D., Weidlich, J., Rittberger, M., & Drachsler, H. (2021). Are we there yet?-a systematic literature review on chatbots in education. Frontiers in artificial intelligence 4 .

Xu, A., Liu, Z., Guo, Y., Sinha, V., & Akkiraju, R. (2017). A new chatbot for customer service on social media. In Proceedings of the 2017 CHI conference on human factors in computing systems (pp. 3506–3510).

Zedadra, A., Lafifi, Y., & Zedadra, O. (2014). Interpreting learners’traces in collaborative learning environments. In 2014 4th international symposium isko-maghreb: concepts and tools for knowledge management (isko-maghreb) (pp. 1–8).

Download references

No funding was received to assist with the preparation of this manuscript.

The authors have no relevant financial or non-financial interests to disclose.

Author information

Authors and affiliations

College of Technological Innovation, Zayed University, Abu Dhabi, 144534, United Arab Emirates

Mohammad Amin Kuhail

Information Systems Department, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia

Nazik Alturki

College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia

Salwa Alramlawi

Independent Scholar, Yanbu, 46423, Saudi Arabia

Kholood Alhejori


Contributions

Mohammad Amin Kuhail: Conceptualization, Methodology, Validation, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Supervision, Project Administration.

Nazik Alturki: Conceptualization, Methodology, Investigation, Data Curation, Writing - Review & Editing, Supervision.

Salwa Alramlawi: Investigation, Data Curation, Formal Analysis, Writing - Review & Editing.

Kholood Alhejori: Investigation, Data Curation, Writing - Review & Editing.

Corresponding author

Correspondence to Nazik Alturki.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Kuhail, M.A., Alturki, N., Alramlawi, S. et al. Interacting with educational chatbots: A systematic review. Educ Inf Technol 28, 973–1018 (2023). https://doi.org/10.1007/s10639-022-11177-3


Received: 28 February 2022

Accepted: 14 June 2022

Published: 09 July 2022

Issue Date: January 2023

DOI: https://doi.org/10.1007/s10639-022-11177-3


Keywords

  • Conversational Agent
  • Educational Bot
  • Literature Review
  • Interaction Styles
  • Human-Computer Interaction


Title: How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO

Abstract: The role-play ability of Large Language Models (LLMs) has emerged as a popular research direction. However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals. Such an oversight limits the potential for advancements in digital human clones and non-player characters in video games. To bridge this gap, we introduce ECHO, an evaluative framework inspired by the Turing test. This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses. Notably, our framework focuses on emulating average individuals rather than historical or fictional figures, presenting a unique advantage to apply the Turing Test. We evaluated three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models, alongside the online application GPTs from OpenAI. Our results demonstrate that GPT-4 more effectively deceives human evaluators, and GPTs achieves a leading success rate of 48.3%. Furthermore, we investigated whether LLMs could discern between human-generated and machine-generated texts. While GPT-4 can identify differences, it could not determine which texts were human-produced. Our code and results of reproducing the role-playing LLMs are made publicly available via this https URL .
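The abstract reports per-model "success rates" without spelling out how they are computed. As a minimal illustrative sketch only, assuming (our assumption, not a detail given in the abstract) that a model's success rate is the fraction of its machine-generated responses that human evaluators judge to be human-written, such a rate could be tallied as follows; all names and the data schema are hypothetical and do not reflect the ECHO implementation.

```python
from dataclasses import dataclass
from collections import defaultdict


@dataclass
class Judgment:
    """One evaluator decision about a single response (hypothetical schema)."""
    model: str              # e.g. "GPT-3.5", "GPT-4", "GPTs"
    judged_as_human: bool   # True if the evaluator believed the response was human-written


def deception_success_rate(judgments: list[Judgment]) -> dict[str, float]:
    """Per model: fraction of machine-generated responses judged to be human.

    This mirrors the hedged assumption stated above; the actual ECHO metric
    may be defined differently in the paper itself.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for j in judgments:
        totals[j.model] += 1
        hits[j.model] += int(j.judged_as_human)
    return {model: hits[model] / totals[model] for model in totals}


if __name__ == "__main__":
    # Toy data: one of three hypothetical GPT-4 responses fooled the evaluator.
    sample = [
        Judgment("GPT-4", True),
        Judgment("GPT-4", False),
        Judgment("GPT-4", False),
    ]
    print(deception_success_rate(sample))  # {'GPT-4': 0.33...}
```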


