
AI in Education: A Systematic Literature Review

Introduction

The use of technology in education dates back to the emergence of first-generation computers and their subsequent updated versions (Schindler et al., 2017). Teachers used computers for teaching, researching, recording students' grades, and other tasks. Similarly, students made use of computers for studying, researching, and solving problems, among other things. Computers have also been used as an educational resource (analogous to a library or laboratory) and as a means of maintaining databases of student information (Jones, 1985). The use of technology in education has advanced considerably with the emergence of artificial intelligence (AI), in which machines are designed to mimic humans. Artificial intelligence is "the science and engineering of making intelligent machines" or "a machine that behaves in a way that could be considered intelligent if it was a human being" (McCarthy, 2007).

The term Artificial Intelligence (AI) was first coined by John McCarthy at the Dartmouth artificial intelligence conference in 1956, where leading researchers from different disciplines converged to discuss topics such as the abstraction of content from sensory inputs and the relationship of randomness to creative thinking, developing the concept of "thinking machines". Most participants envisaged the possibility of computers having the capability to mimic human intelligence, but their biggest question was how and when it would happen. Currently, AI is developing and spreading over every part of the world at an alarming rate (Tegmark, 2015) and plays an increasingly important role in our daily lives. As AI and machine learning catch on with many people, their use in different devices, applications, and services is becoming widespread (Zawacki-Richter et al., 2019). Examples include Google Duplex, a chat agent that can carry out specific verbal tasks, such as making a reservation or appointment, over the phone, and FaceApp, which uses AI to identify persons tagged in other photos on Facebook. Other intelligent appliances, such as autonomous vacuum cleaners, are further examples of AI applications. As indicated earlier, the use of AI in education cannot be overemphasized; Yuki and Sophia, the humanoid robots, are examples of AI applications in education (Retto, 2017).

AI is broadly categorized into two domains: weak or domain-specific AI, which focuses on specific problems, and strong or general AI, with the ability to perform general intelligent actions (Berker, 2018). Stephen Hawking and other researchers have proposed that the use of strong AI may lead to chaos and the destruction of mankind, and other AI researchers have argued that the emergence of AI in education might displace teachers. In the context of this paper, we refer to AI as weak AI, since machines have not yet acquired the capability to perform general intelligent actions.

Studies, mainly in developed countries, have concentrated on the challenges of AI's disruption of education, while the opportunities and benefits of AI in education have received little attention. This study is one of the few that provides an integrated overview of the opportunities, benefits, and challenges that artificial intelligence (AI) adoption presents to the educational discipline, complemented by the Technological-Organizational-Environmental (TOE) theoretical framework as a lens for discussing the challenges of AI adoption in education.

The objective of the study is to analyze the existing state of the art of AI technology in education by investigating the challenges, opportunities, and benefits of adopting AI in education. The study reviews relevant studies to understand the current research focus and to provide an in-depth understanding of AI technology in education that can guide educators and researchers in designing new educational models. The study will also serve as a reference for future research on related topics.

This paper is structured as follows: Section 1 presents the introduction and background to the study; Section 2 reviews the state of the art on the types of AI systems in education, the challenges, opportunities, and benefits of AI in education, and the TOE theoretical framework; Section 3 presents the research methodology for the literature review; Section 4 discusses the opportunities, benefits, and challenges of AI adoption based on the literature review, along with the practical implications of the findings; and Section 5 concludes with future research topics and the limitations of the research.


Systematic Review Literature Searching: Can I Use ChatGPT?

  • Authors: Veronica Parisi, Alia Galadari and Zahra Mohri
  • Publication date: 25th March 2024
  • Categories: 21st-century learners, Asynchronous, Staff, Student, ChatGPT, Systematic Review, teaching dialogue, UCL ChangeMakers


Systematic reviews (a type of literature review that aims to collect, analyse and summarise all primary evidence on a specific topic) are increasingly becoming a core requirement across many academic programmes. This, paired with the advent of powerful and pervasive generative AI such as ChatGPT, has raised new questions about the feasibility of using generative AI to help develop comprehensive database searches for systematic reviews.

The title of this blog post reflects the inquisitive nature of our project, which aims to provide some answers based on our collaborative approach.

We aim to present the initial findings of our ongoing student-staff partnership established as part of UCL ChangeMakers: Teaching Dialogue .

Teaching Dialogue is an exciting new initiative based on Student Reviewers of Teaching Practice, and this year's focus is the "Impact of AI on Education".

Our project aims:

  • to develop a mutual understanding of how generative AI, especially ChatGPT, may be used for the development of systematic review searches;
  • to better understand and build a picture of how students may approach ChatGPT for their systematic review searching needs;
  • to provide an opportunity for educators at UCL to learn from students and support them in their use of ChatGPT in performing database searches.

The video below illustrates our process and initial findings. We hope you find it useful and that it could contribute to further conversations on this very interesting topic.

We are a multi-disciplinary team that consists of the following members: 

  • Ms Veronica Parisi , Training and Clinical Support Librarian, UCL Cruciform Hub 
  • Dr Alia Galadari , UCL Postgraduate Taught Student (MS Aesthetics) 
  • Dr Zahra Mohri , Lecturer and Programme Lead (MS Aesthetics)




Artificial intelligence in online higher education: A systematic review of empirical research from 2011 to 2020

  • Published: 26 February 2022
  • Volume 27, pages 7893–7925 (2022)


  • Fan Ouyang (ORCID: orcid.org/0000-0002-4382-1381), Luyi Zheng & Pengcheng Jiao


Abstract

As online learning has been widely adopted in higher education in recent years, artificial intelligence (AI) has brought new ways of improving instruction and learning in online higher education. However, there is a lack of literature reviews focusing on the functions, effects, and implications of applying AI in the online higher education context. In addition, which AI algorithms are commonly used and how they influence online higher education remain unclear. To fill these gaps, this systematic review provides an overview of empirical research on the applications of AI in online higher education. Specifically, this literature review examines the functions of AI reported in empirical research, the algorithms used, and the effects and implications generated. According to the screening criteria, out of the 434 articles initially identified for the period between 2011 and 2020, 32 articles are included in the final synthesis. The results show that: (1) the functions of AI applications in online higher education include prediction of learning status, performance or satisfaction, resource recommendation, automatic assessment, and improvement of learning experience; (2) traditional AI technologies are commonly adopted, while more advanced techniques (e.g., genetic algorithms, deep learning) are rarely used; and (3) effects generated by AI applications include high-quality AI-enabled prediction with multiple input variables, high-quality AI-enabled recommendations based on student characteristics, an improvement of students' academic performance, and an improvement of online engagement and participation. This systematic review proposes the following theoretical, technological, and practical implications: (1) the integration of educational and learning theories into AI-enabled online learning; (2) the adoption of advanced AI technologies to collect and analyze real-time process data; and (3) the implementation of more empirical research to test the actual effects of AI applications in online higher education.



1 Introduction

Advances in the Internet, wireless communication, and computing technologies have shed light on educational changes in online higher education, particularly the application of Artificial Intelligence in Education (AIEd) in recent years (Chen et al., 2020a, b; Ouyang & Jiao, 2021). Online and distance learning refers to delivering lectures, virtual classroom meetings, and other teaching materials and activities via the Internet (Harasim, 2000; Holmberg, 2005). This educational model has been extensively integrated into higher education to transform instruction and learning modes as well as provide fair educational opportunities to online learners (Hu, 2021; Liu et al., 2020; Mubarak et al., 2020; Yang et al., 2014). In the online education context, AI applications (e.g., intelligent tutoring systems, teaching robots, learning analytics dashboards, adaptive learning systems) have been used to promote online students' learning experience, performance, and quality (Chen et al., 2020a, b; Hinojo-Lucena et al., 2019). Varied AIEd techniques (e.g., natural language processing, artificial neural networks, machine learning, deep learning, genetic algorithms) have been implemented to create intelligent learning environments for behavior detection, prediction model building, learning recommendation, etc. (Chen et al., 2020a, b; Rowe, 2019). Overall, the applications of AI systems and technologies have transformed online higher education and have provided opportunities and challenges for improving higher education quality. Previous reviews have provided substantial insight into the AIEd field. For instance, existing review work has summarized the trends of AIEd (Tang et al., 2021; Xie et al., 2019; Zhai et al., 2021), its applications (Alyahyan & Düştegör, 2020; Hooshyar et al., 2016; Liz-Domínguez et al., 2019; Shahiri et al., 2015), theoretical paradigms (Ouyang & Jiao, 2021), and AI roles in education (Xu & Ouyang, 2021). However, there are few literature reviews that examine the purposes and effects of applying AI techniques in the online higher education context. More importantly, a major challenge is to gain a deep understanding of the empirical effects of AI applications in online higher education. To achieve this purpose, this systematic review collects, reviews, and summarizes the empirical research on AI in online higher education, with particular aims to analyze the application purposes, the AI algorithms used, and the effects of AI techniques in online higher education.

2 Literature review

AIEd refers to the use of AI technologies or applications in educational settings to facilitate the instruction, learning, and decision-making processes of stakeholders, such as students, instructors, and administrators (Hwang et al., 2020). In online higher education, AI can support instructional design and development by providing automatic learning resources or paths (Christudas et al., 2018), offering automatic assessments (Aluthman, 2016), or predicting student performance (Almeda et al., 2018; Moreno-Marcos et al., 2019). From the instructional perspective, AI can play the role of a tutor that observes students' learning processes, analyzes their learning performances, and gives instructors the chance to get rid of repetitive and tedious teaching tasks (Chen et al., 2020a, b; Hwang et al., 2020). Moreover, from the learner perspective, one of the crucial objectives of AIEd is to provide personalized learning guidance or support based on students' learning status, preferences, or personal characteristics (Hwang et al., 2020). For instance, AIEd can provide learning materials or paths based on students' needs (Christudas et al., 2018), diagnose students' strengths, weaknesses, or knowledge gaps (Liu et al., 2017), or provide automated feedback and promote collaboration between students (Aluthman, 2016; Benhamdi et al., 2017; Zawacki-Richter et al., 2019). Furthermore, AIEd can help educational administrators make decisions about course development, pedagogical design, and academic transformation. For example, AI algorithm models can mine and analyze available educational data from higher education system databases to understand course status and student learning performance, which can help administrators or decision-makers make needed changes in a course (George & Lal, 2019). In summary, AI-enhanced technology has played an essential role in education from the instructor, learner, and administrator perspectives, with the potential to open new opportunities and challenges for higher education transformation.

Multiple AI algorithms have been applied in higher education to facilitate automatic recommendation, academic prediction, or assessment. For example, Sequential Pattern Mining (SPM) has been utilized in recommender systems to capture historical sequence patterns in learner interactions with the system and to discover suitable recommendation items for learners' learning sequences (Romero et al., 2013a). Evolutionary algorithms such as Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) have been used for learning content optimization (Christudas et al., 2018). Machine Learning (ML) has been used for academic prediction, such as predicting the academic success of students in online courses, whether students will successfully complete their college degree, or students' selection of courses in higher education (Rico-Juan et al., 2019). Lykourentzou et al. (2009) used three machine learning techniques, namely feed-forward neural networks, support vector machines, and a probabilistic ensemble simplified fuzzy ARTMAP, to predict dropout-prone students in the early stages of an e-learning course. Moseley and Mead (2008) used decision trees, a machine learning technique, to predict student attrition in higher educational nursing institutions. Natural language processing (NLP) has been used for code detection or emotional analysis; for example, Rico-Juan et al. (2019) adopted NLP for the automatic detection of inconsistencies between numerical scores and textual feedback in the peer-assessment process. In summary, different AI algorithms have been used in AIEd to achieve automatic recommendation, academic prediction, or assessment functions in order to improve instruction and learning quality.
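
To ground the ML prediction thread above, here is a minimal sketch (synthetic data and hypothetical feature names of our own, not the pipeline of any cited study) of training a decision tree to flag dropout-prone students from simple activity counts:

```python
# Minimal sketch: decision-tree dropout prediction on synthetic activity data.
# Features and labels are invented for illustration only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical per-student counts: forum posts, login days, quiz attempts.
X = rng.poisson(lam=[12, 30, 5], size=(500, 3)).astype(float)
# Hypothetical rule: low overall activity (plus noise) marks dropout risk.
y = (X.sum(axis=1) + rng.normal(0, 5, 500) < 40).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```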

Although there are existing systematic reviews on AIEd (e.g., AIEd trends, paradigms, tools, or applications) (Ouyang & Jiao, 2021; Tang et al., 2021), there is little review work examining AIEd in the higher education context. Among a collection of 37 AIEd review articles published between 2011 and 2021, only 6 review articles focus on higher education, all published in 2019 and 2020 (see Fig. 1). Among those six review articles, Hinojo-Lucena et al. (2019) used the bibliometric method to review the applications of AI in higher education; this review analyzed the number of authors, main source titles, organizations, and countries of AIEd in higher education. Zawacki-Richter et al. (2019) synthesized 146 articles about the application of AI in higher education and identified four major areas, namely profiling and prediction, intelligent tutoring systems, assessment and evaluation, and adaptive learning systems. However, this review work did not conduct further analysis examining the effects of AI in online higher education. Moreno-Marcos et al. (2018) used a systematic literature review to examine the AI models used for performance prediction in MOOCs; this review found that the AI algorithms used for prediction included regression, support vector machines (SVM), decision trees (DTs), random forest (RF), naive Bayes (NB), gradient boosting machines (GBM), neural networks (NN), etc. We conclude that existing literature review work mainly focuses on the application of AIEd in general, and few works focus specifically on online higher education. The works that do focus on AI in online higher education mainly describe AI applications in a specific educational context (e.g., MOOCs), which has resulted in the lack of a holistic picture of AIEd trends, categorizations, and applications in online higher education.

Fig. 1 The existing literature review of AIEd

As an effort to further understand AI in online higher education, this systematic review examines empirical research on AI applications in online higher education from the instructional and learning perspectives and investigates the functions of AI applications, the algorithms used, and the effects of AI on the instruction and learning process within online higher education. Specifically, this review focuses on the following three research questions:

RQ1: What are the functions of AI applications in online higher education?

RQ2: What AI algorithms are used to achieve those functions in online higher education?

RQ3: What are the effects and implications of AI applications on the instruction and learning processes in online higher education?

3 Methodology

The systematic review methodology used in this study is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) principles, which consist of a 27-item checklist and a four-phase flow diagram (Moher et al., 2009). The following sections introduce the systematic review procedures.

3.1 Database search

In order to locate the relevant articles, the systematic search was conducted on the following electronic databases: Web of Science, Scopus, ACM, IEEE, Taylor & Francis, Wiley, and EBSCO. We selected these databases because they are considered major publisher databases (Guan et al., 2020). Filters limited the time period from January 2011 to December 2020 and restricted results to peer-reviewed empirical research articles written in English in order to ensure the quality of the reviewed articles. After the screening of the full articles, a snowballing approach was performed, following established guidelines, to find articles that were not retrieved by the search strings (Wohlin, 2014). At this stage, Google Scholar was used to search for specific articles.

3.2 Search terms

A structured search strategy was used across the bibliographic databases, with keywords adapted to each database's specific requirements. In both the electronic and manual searches, specific keywords related to AI and commonly used algorithms or techniques (i.e., "intelligence", "AI", and "AIEd"), AI applications (i.e., "intelligent tutoring system", "expert system", and "prediction model"), and algorithms (i.e., "decision tree", "machine learning", "neural network", "deep learning", "k-means", "random forest", "support vector machines", "logistic regression", "fuzzy-logic", "Bayesian network", "latent Dirichlet allocation", "natural language processing", "genetic algorithm", and "genetic programming") were used. In addition, as this review focused on the online higher education context, the following keywords were added: "online education", "online learning", "e-learning", "MOOC", "SPOC", "blended learning", "higher education", "online higher education", "undergraduate education", and "graduate education".
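
As a concrete illustration (our own sketch, not the authors' actual search script), the keyword groups above can be assembled into one boolean query, using OR within each concept group and AND between groups:

```python
# Sketch: build a boolean search string from the review's keyword groups.
# Exact syntax varies by database; this shows the generic pattern only.
ai_terms = ['"intelligence"', '"AI"', '"AIEd"', '"intelligent tutoring system"',
            '"machine learning"', '"neural network"',
            '"natural language processing"']
context_terms = ['"online education"', '"online learning"', '"e-learning"',
                 '"MOOC"', '"higher education"', '"online higher education"']

def or_group(terms):
    # Join alternatives with OR and wrap in parentheses.
    return "(" + " OR ".join(terms) + ")"

query = or_group(ai_terms) + " AND " + or_group(context_terms)
print(query)
```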

3.3 Inclusion and exclusion criteria

The search criteria were designed to locate the articles that focused on the applications of AI in online higher education. In terms of the research questions, a set of inclusion and exclusion criteria was adopted (see Table 1).

3.4 The screening process

The screening process involved the following steps: (1) removing duplicated articles, (2) removing articles that did not meet the inclusion criteria based on their titles and abstracts, (3) reading the full texts and removing articles that did not meet the inclusion criteria, (4) using the snowballing approach to locate further articles in Google Scholar, and (5) extracting data from the final filtered articles (see Fig. 2). All articles were imported into the Mendeley software for screening.

The search produced 434 articles from the search terms described above, including 92 duplicates that were deleted. By reviewing the titles and abstracts, the number of articles was reduced to 91 based on the criteria (see Table 1). The selected articles were examined by the second author to determine whether they were suitable for the purpose of the study. The first author independently reviewed approximately 30% of the articles to confirm reliability. The inter-rater agreement was initially 90% and was brought to 100% after discussion. Then, the full texts of the articles were reviewed by both authors to verify that the articles met all the criteria for inclusion in the review. Eventually, a total of 32 articles that met the criteria were included in the final systematic review.
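
The reported counts can be traced with a little arithmetic (a sketch of the flow; the intermediate post-deduplication figure is implied rather than stated in the text):

```python
# Sketch: the screening flow implied by the counts reported above.
identified = 434
after_dedup = identified - 92        # records left after removing duplicates
full_text_reviewed = 91              # kept after title/abstract screening
included = 32                        # articles in the final synthesis

print(f"screened: {after_dedup}, full-text reviewed: {full_text_reviewed}, "
      f"included: {included} ({included / identified:.1%} of identified)")
```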

Fig. 2 PRISMA flow chart of the study selection process

3.5 Analysis

The articles that met the inclusion criteria were analyzed using the bibliometric analysis approach (Neuendorf & Kumar, 2015). We calculated the frequencies for each category of AIEd in online higher education. The qualitative content analysis method was used to categorize the articles (Zupic & Čater, 2015), and we classified the information from the articles relevant to the research questions. Three strategies were used to establish the credibility of the analysis. First, two researchers held ongoing meetings to verify the categories of the reviewed articles (Graneheim & Lundman, 2004). Second, detailed explanations of the categories that emerged as findings for each research question are provided in the results section (Hsieh & Shannon, 2005). Finally, we provide examples to demonstrate how well the categories represent the data to answer the research questions (Graneheim & Lundman, 2004).

4 Results

Among the 32 empirical articles, 72% were published after 2016. The selected articles were published in 23 different journals. The major countries or areas for the 32 studies were also identified: the most prolific country or area in AIEd research was Spain, with 5 publications (16%), followed by the USA (4 publications, 13%) and Taiwan (4 publications, 13%). Furthermore, three journals were found with more than two relevant articles that met the criteria: Computers in Human Behavior (n = 4, 13%), Computers & Education (n = 3, 9%), and Interactive Learning Environments (n = 3, 9%) (see Appendix Table 2).

4.1 RQ1: What are the functions of AI applications in online higher education?

There are four major functions of AI applications in online higher education: prediction of learning status, performance or satisfaction (n = 21, 66%), resource recommendation (n = 7, 22%), automatic assessment (n = 2, 6%), and improvement of learning experience (n = 2, 6%) (see Fig. 3).

Fig. 3 The pie chart of the functions of AI applications in online higher education

4.1.1 Predictions of learning status, performance or satisfaction

The first function of AI applications is the prediction of student performance, that is, estimating student learning status, performance, or satisfaction in advance. Among the 32 reviewed articles, 21 (66%) focused on prediction in the online higher education context. Further examination identified three categories: prediction of dropout risk (n = 13, 41%), prediction of student academic performance (n = 7, 22%), and prediction of student satisfaction with online courses (n = 1, 3%) (see Appendix Table 2).

In the first category, regarding prediction models for diagnosing the risk of student dropout, Mubarak et al. (2020) constructed a model to predict students at risk of dropout based on interaction logs in the online learning environment; the proposed models achieved an accuracy of 84%, better than the baseline machine learning models. Aguiar et al. (2014) analyzed engineering students' electronic portfolios to predict their persistence in online courses, and the results showed consistently better performance than models based on traditional academic data (SAT scores, GPA, demographics, etc.) alone. The second category is the prediction of student academic performance. For example, Almeda et al. (2018) used classification models to predict whether students would succeed in online courses and further used regression models to predict students' numerical scores; one key finding was that features related to course comments were significant predictors of final grades. Romero et al. (2013b) collected forum messages to predict student performance and found that students who actively participated in the forum and posted messages more frequently and with higher quality were more likely to pass the course. The third category is the prediction of student satisfaction with online courses. Only one study was located: Hew et al. (2020) analyzed the course features of 249 randomly sampled MOOCs and examined 6,393 students' perceptions to understand what factors predicted student satisfaction. They found that the course instructor, content, assessment, and time schedule played significant roles in explaining student satisfaction levels.

4.1.2 Resource recommendation

Among the 32 reviewed articles, 7 (22%) focused on resource recommendation in the online higher education context. For example, Benhamdi et al. (2017) designed a recommendation approach to provide online students with appropriate learning materials based on their preferences, interests, background knowledge, and memory capacity to store information; the results showed that this recommendation system improved students' learning quality. Christudas et al. (2018) used a compatible genetic algorithm (CGA) to provide suitable learning content for individual students based on the learning objects they had previously chosen; the results showed that students' scores and satisfaction levels improved in an e-learning environment. For online programming courses, Cárdenas-Cobo et al. (2020) developed a system called CARAMBA to suggest suitable exercises for students learning Scratch programming; the results confirmed that exercise recommendation in Scratch improved students' programming capabilities. In summary, AI has been used in online higher education to recommend suitable, personalized resources to learners based on their fixed and dynamic characteristics.

4.1.3 Automatic assessment

Among the 32 reviewed articles, 2 (6%) focused on automatic assessment in the online higher education context. Hooshyar et al. (2016) developed an ITS called Tic-tac-toe Quiz for Single-Player (TRIS-Q-SP) to provide students with formative assessment of their computer programming performance and problem-solving capacities; the empirical research demonstrated that the proposed system enhanced students' learning interest, positive attitudes, degree of technology acceptance, and problem-solving activities. Aluthman (2016) developed an automated essay evaluation (AEE) system to provide students with immediate assessment, feedback, and automated scores in an online English learning environment and examined the effects of utilizing AEE on undergraduate students' writing performance; the results indicated that the AEE system had a positive effect on improving students' writing performance. In summary, AI has been used in online higher education to automatically assess students' performances and learning capacities, to provide timely feedback, and to improve students' self-awareness and self-reflection.

4.1.4 Improvement of learning experience

Among the 32 reviewed articles, 2 (6%) focused on the optimization of learning experiences by improving learner interactions with learning environments or resources in the online higher education context. Ijaz et al. (2017) created a virtual reality (VR) tool that applied AI techniques to history learning; the tool allowed students to immerse themselves in virtual cities and learn by browsing and interacting with virtual citizens. The results confirmed that this AI-enabled virtual learning mode was more engaging and motivating for students than simply reading history texts or watching educational videos. Koć-Januchta et al. (2020) compared student interaction and learning quality between a purpose-designed AI-enriched biology book and a traditional e-book, and found that students who used the AI-enriched book asked more questions and showed higher retention than those reading the traditional e-book.

4.2 RQ2: What AI algorithms are used to achieve those functions in online higher education?

Among the 32 reviewed articles, 24 specified the AI algorithms used in the research; the other eight used AI systems or tools (e.g., recommendation systems) but did not specify the underlying algorithms. Among the 24 articles, the most commonly used AI algorithms were DT (n = 14, 44%), NN (n = 8, 25%), NB (n = 7, 22%), and SVM (n = 7, 22%). Some studies used multiple algorithms (see Fig. 4 and Appendix Table 3).

Fig. 4 The distribution of AI algorithms. Note: DT: Decision Tree; NN: Neural Network; SVM: Support Vector Machine; NB: Naive Bayes; RF: Random Forest; LGR: Logistic Regression; LR: Linear Regression; KNN: K-Nearest Neighbours; NLP: Natural Language Processing; BN: Bayes Network; XGBoost: Extreme Gradient Boosting; SVC: Support Vector Classification; Splines: Multivariate Adaptive Regression Splines; SMO: Sequential Minimal Optimizer; RG: Regression; LDA: Linear Discriminant Analysis; IOHMM: Input-Output Hidden Markov Model; GaNB: Gaussian Naive Bayes; GA: Genetic Algorithm; CART: Classification and Regression Tree; MLP: Multi-Layer Perceptron; BART: Bayesian Additive Regressive Trees

The systematic review found that the relatively traditional algorithms included DT, LGR, NB, SVM, and NLP. For example, Almeda et al. (2018) used the J48 DT to predict whether a student would successfully pass the course and further used regression models to predict students' numerical grades. Mubarak et al. (2020) proposed two AI models, namely logistic regression and the input-output hidden Markov model, to predict student dropout risk. Yoo and Kim (2014) used online discussion participation as the predictor of class project performance and used a Support Vector Machine (SVM) for data processing and automatic classification. Helal et al. (2018) created different classification models for predicting student performance, including two black-box methods, namely NB and SMO (an optimization algorithm for training SVMs), and two white-box methods (i.e., J48 and JRip). Natural language processing (NLP) was applied to automatically assess student performance and to detect student satisfaction. For example, Aluthman (2016) adopted NLP techniques to automatically evaluate essays and provide students with both automated scores and immediate feedback. Hew et al. (2020) used NLP techniques to identify what students commented on in order to predict their satisfaction levels with MOOCs.
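
As an illustration of the NLP thread just described (a minimal sketch with made-up comments and labels, not Hew et al.'s actual method), TF-IDF features can feed a logistic-regression classifier that labels course comments as satisfied or dissatisfied:

```python
# Sketch: TF-IDF + logistic regression on invented course comments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["great instructor and clear content",
            "the assessment was confusing and rushed",
            "well-paced schedule with helpful feedback",
            "poor materials, I felt lost the whole time"]
satisfied = [1, 0, 1, 0]  # hypothetical satisfaction labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, satisfied)
print(model.predict(["clear content and a helpful instructor"]))
```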

Advanced machine learning algorithms such as NN and GA were also used in some research. For example, Sukhbaatar et al. (2019) employed the NN method to predict students' failure tendency based on multiple variables extracted from online learning activities in a learning management system; 25% of the failing students were correctly identified after the first quiz submission, and 65% after the mid-term examination. Yang et al. (2017) presented a time-series NN method for predicting the evolution of students' average CFA grades in two MOOCs and found that the NN-based algorithms consistently outperformed a baseline model that simply averaged historical CFA data. Christudas et al. (2018) presented a GA-enabled approach for recommending personalized learning content to individual students in an e-learning system, and the results showed an improvement in students' final course scores.

An important factor in the AI models is the choice of input variables. The primary variables used in the 32 articles included demographics, assignments, previous scores, quiz access, forum access, course material access, login behavior, etc. (see Appendix Table 3). Demographic variables include students' gender, age, race, and ethnicity. The assignment category covers whether an assignment is completed and submitted. Previous scores refer to students' historical grades for different courses or learning activities, standardized high school test scores, or continuous assessment activities. Quiz access includes the number of quiz attempts, whether a quiz is completed, quiz scores, and solving time. The forum is where students exchange ideas on a particular subject with their instructor and peers; this category includes the number of forum views, number of forum posts, number of replies, post content, etc. Course material information covers the total number of times materials were viewed and the time spent on them. Login behaviors mainly include the number of online learning days, the number of accesses, and time consumption per week. Among these variables, forum and course material access were the most commonly used, followed by student scores and login behaviors.
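
Such variables are typically aggregated from raw event logs; the sketch below (hypothetical log schema and column names of our own) derives forum, quiz, and login features per student:

```python
# Sketch: aggregate a hypothetical event log into per-student input variables.
import pandas as pd

logs = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s2"],
    "event":   ["forum_post", "login", "quiz_attempt", "login", "login"],
    "day":     [1, 1, 2, 2, 3],
})

# Event counts per student (forum posts, logins, quiz attempts).
features = pd.crosstab(logs["student"], logs["event"])
# Number of distinct online learning days, counting login events only.
features["login_days"] = (logs[logs["event"] == "login"]
                          .groupby("student")["day"].nunique())
print(features.fillna(0))
```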

When using AI algorithms, researchers tended to compare the efficiency and effectiveness of different algorithms for the same research purpose. For example, Moreno-Marcos et al. (2018) collected data from a Java programming MOOC to determine which factors affected the predictions and in what way it was possible to predict scores; four algorithms, namely RG, SVM, DT, and RF, were used, and the results were compared to identify which provided the best results. In a blended learning context, Sukhbaatar et al. (2019) proposed an early prediction scheme to identify students at risk of failing; the NN, SVM, DT, and NB methods were compared in terms of failure prediction effectiveness. Baneres et al. (2019) presented an adaptive predictive model called the Gradual At-Risk (GAR) model, together with an early warning system and an early feedback prediction system to support intervention with at-risk students; four classification algorithms, NB, CART DT, KNN, and SVM, were tested to determine which best fit the GAR model. In addition, Howard et al. (2018) examined eight prediction methods, including BART, RF, PCR, Splines, KNN, NN, and SVM, to identify students' final grades. Huang et al. (2020) applied eight classifiers to students' online learning logs, namely GaNB, SVC, linear-SVC, LR, DT, RF, NN, and XGBoost, to predict student academic performance; they also employed five evaluators, namely accuracy, precision, recall, the F1-measure, and AUC, to measure the predictive performance of the classification methods.
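
This comparison pattern is easy to reproduce; the sketch below (synthetic data and default scikit-learn models, not any study's exact configurations) cross-validates several of the named classifiers against the five evaluators listed above:

```python
# Sketch: compare classifiers with accuracy, precision, recall, F1, and AUC.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "GaNB": GaussianNB(),
    "SVM": SVC(random_state=0),  # decision_function supports the AUC scorer
    "NN": MLPClassifier(max_iter=1000, random_state=0),
}
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=scoring)
    print(name, {m: round(cv[f"test_{m}"].mean(), 3) for m in scoring})
```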

4.3 RQ3: What are the effects and implications of AI applications on the instruction and learning processes in online higher education?

The following positive effects of AI applications on instruction and learning quality in online higher education were identified: high-quality AI-enabled prediction with multiple input variables (n = 20, 62%), high-quality AI-enabled recommendations based on student characteristics (n = 5, 16%), an improvement of students' academic performance (n = 5, 16%), and an improvement of online engagement and participation (n = 2, 6%) (see Fig. 5 and Appendix Table 4).

Fig. 5 The pie chart of the implications of AI applications in online higher education

4.3.1 High-quality AI-enabled prediction with multiple input variables

Evidence has been reported that students enrolled in online courses have higher dropout rates than those in traditional classroom settings (Breslow et al., 2013; Tyler-Smith, 2006). An increase in dropout rates unavoidably reduces graduation rates, which may have a negative effect on online learning quality (Simpson, 2018). AI applications in online higher education are mainly prediction models for predicting students' risk of dropout and final academic performance. The prediction of student academic performance helps identify students who have difficulty understanding the course materials or who are at risk of failing the exam (Tomasevic et al., 2020). For example, the results obtained by Baneres et al. (2019) proved that their prediction system achieved early detection of potential at-risk students, offered guidance and feedback with visualization dashboards, and enhanced interaction with at-risk students. In this way, a prediction system helps instructors or administrators identify students' learning issues, assists students in regulating and reflecting on their learning processes, and further provides students with instant intervention and guidance at an early stage of the course (Moreno-Marcos et al., 2018).

The articles reviewed in this work indicated high accuracy for prediction models that used multiple input variables and advanced AI algorithms. For example, Aguiar et al. (2014) found that the performance of prediction models with ePortfolio data was consistently better than that of models based on academic performance data alone. Costa et al. (2017) predicted students' academic failure in introductory programming courses based on multiple student data, including age, gender, student registration, semester, campus, year of enrolling in the course, status on discipline, number of correct exercises, and student performance. Moreover, advanced AI algorithms such as genetic algorithms and the input-output hidden Markov model have been applied in prediction systems and were shown to achieve more accurate results than traditional algorithms (Mubarak et al., 2020). Therefore, to achieve accurate prediction, AI-enabled models should first consider using multiple input variables drawn from students' learning processes rather than merely summative performance scores, and second use advanced AI algorithms to precisely model the relations between learning inputs and performance outputs (Chassignol et al., 2018; Godwin & Kirn, 2020; Tomasevic et al., 2020).

4.3.2 High-quality AI-enabled recommendations based on student characteristics

A high-quality recommendation requires that the algorithm model take into consideration students' diverse characteristics, such as knowledge levels, learning styles or preferences, learning profiles, and interests. Our review showed that five recommendation-related studies reported that their methods generated high-quality recommendations for students. For example, Benhamdi et al. (2017) proposed a new recommendation approach based on collaborative and content-based filtering to provide students with the best learning materials according to their preferences, interests, background knowledge, and memory capacity. The experimental results showed a significant difference between pre-test and post-test marks, indicating that students acquired more knowledge when they used the proposed recommender system. Additionally, Bousbahi and Chorfi (2015) used the case-based reasoning (CBR) approach and a special information retrieval technique to recommend the MOOC courses that best suited students' needs based on their learning profiles, needs, and knowledge levels. In addition to personalized learning recommendation, Dwivedi and Bharadwaj (2015) designed an e-learning recommender system for groups of students that considers the learning styles, knowledge levels, and ratings of the learners in a group; the results demonstrated the effectiveness of the proposed group recommendation strategy. Although these studies verified the short-term effects of recommendation systems, there is a lack of investigation into the effects of applying recommendation systems or methods to students' long-term learning. Future work should consider both fixed and dynamic student characteristics and carry out experiments with larger sample sizes in order to confirm the accuracy of recommendation systems.
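
To make the content-based filtering idea concrete, here is a minimal sketch (toy material descriptions and an invented student interest profile, not Benhamdi et al.'s actual system) that ranks learning materials by cosine similarity to a student's interests:

```python
# Sketch: content-based recommendation via TF-IDF and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

materials = {
    "intro_python": "programming basics variables loops python",
    "linear_algebra": "vectors matrices linear transformations proofs",
    "applied_ml": "machine learning models python data projects",
}
student_profile = "python data programming"  # hypothetical interest keywords

vec = TfidfVectorizer()
material_matrix = vec.fit_transform(materials.values())
sims = cosine_similarity(vec.transform([student_profile]), material_matrix)[0]

# Highest-similarity materials first.
for name, score in sorted(zip(materials, sims), key=lambda p: -p[1]):
    print(f"{name}: {score:.2f}")
```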

4.3.3 An improvement of students’ academic performance

The results indicated that AI systems and tools helped improve students' academic performance by optimizing learning environments and experiences, recommending learning resources, or providing automatic feedback and assessment in online learning. For example, Ijaz et al. (2017) found that students in the VR context combined with AI techniques performed better at comprehending the materials than control groups without AI support. Cárdenas-Cobo et al. (2020) presented an easy-to-use web application called CARAMBA, involving Scratch alongside a recommender system for exercises. The results confirmed that, in terms of pass rates, recommending exercises in Scratch had a positive effect on students' programming abilities: the pass rate was over 52%, which was 8% higher than in previous exercises with Scratch (without recommendation) and 21% higher than the historical results of traditional programming teaching (without Scratch). Compared to traditional learning approaches (e.g., reading textbooks), AI can provide students with more intelligent and personalized forms of interaction, improving the interaction between students and learning resources and the degree of participation in learning. More importantly, improper content that does not fit students' learning styles, knowledge, or ability levels may lead to information overload or a lack of learning orientation, which would negatively affect student academic performance (Chen, 2008; Christudas et al., 2018); AI can optimize personalized resource recommendations based on students' characteristics, which has been emphasized as a crucial issue in e-learning and online learning (Chang & Ke, 2013). In addition, providing automatic feedback was also a good way to improve student academic performance because it can give students personalized diagnoses and suggestions (e.g., Aluthman, 2016), which improves students' learning motivation and effectiveness (Gardner et al., 2002; Henly, 2003). In conclusion, the existing research shows that, with the support of AI, student academic performance can be promoted in terms of final grades, course completion rates, and learning satisfaction levels.

4.3.4 An improvement of online engagement and participation

AI systems or techniques can positively influence students' online engagement by providing personalized resources, automatic assessment, and timely feedback. For example, Ijaz et al. (2017) investigated a technological combination of AI-enabled virtual reality with the aim of improving learning experiences and learner engagement; compared to simply reading history texts or watching educational videos, the AI-enabled learning mode was more engaging and motivating for students. Koć-Januchta et al. (2020) explored students' engagement and patterns of activity with an AI-enriched book and a traditional e-book, collecting students' pre- and post-test scores, cognitive load, motivation, usability questionnaires, and interviews; students who used the AI-enriched book asked more questions and showed higher retention than those reading the traditional e-book, which indicated improved engagement. Given that online students often have low levels of participation in online learning, which can lead to problems such as dropout or academic failure, AI support has the potential to improve students' online engagement with learning materials, online courses, and their peers, and thus to reduce academic failure to some extent.

5 Discussions and implications

The application of artificial intelligence (AI) has brought new ways of improving instruction and learning in online higher education. Given the limited literature review examining the actual effects of AI in online higher education, it is necessary to gain a deep understanding of the functions, effects, and implications of AI applications in this context. Furthermore, there has been a critical gap between what AIEd technologies can do, how they are implemented in authentic online higher education settings, and the extent to which the use of AI applications influences actual online instruction and learning (Kabudi et al., 2021; Ouyang & Jiao, 2021). This systematic literature review specifically focuses on AI applications in online higher education, and the results show that performance prediction, resource recommendation, automatic assessment, and improvement of learning experiences are the four main functions of AI applications in online higher education. Regarding AI techniques, algorithms such as DT, LGR, NN, and NB are the most commonly adopted in online educational contexts; advanced algorithms such as GA and deep NN have seldom been found, which is consistent with the findings of Zawacki-Richter et al. (2019) and Chen et al. (2020b). Regarding the actual effects of AI in online higher education, several empirical studies have reported positive effects of AI applications on online instruction and learning quality, including high-quality AI-enabled prediction, high-quality AI-enabled recommendations, an improvement of academic performance, and an improvement of online engagement and participation. Based on the review results, to achieve high-quality prediction, assessment, or recommendation, AI-enabled systems or models should first take into consideration students' diverse characteristics from both learning processes and summative performances, and second use advanced AI algorithms to achieve precise outcomes in order to improve students' learning motivation, engagement, and performance. With the innovation and advancement of AI technologies and techniques, the applications of AI promote the transformation of higher education from traditional, instructor-directed lecturing to AI-enabled, student-centered learning (Chen et al., 2020a; Ouyang & Jiao, 2021).

Based on the results of this literature review, we propose theoretical, technological, and practical implications for the applications of AI in online higher education. First, from the theoretical perspective, educational theories have not been widely adopted to underpin the application of AI in online higher education. Similar to previous work (Chen et al., 2020b; Ouyang & Jiao, 2021; Zawacki-Richter et al., 2019), few studies have focused on building connections between educational and learning theories and AI-supported online higher education. Although advanced AI technology has the potential to improve online higher education quality (Holmes et al., 2019), good educational outcomes do not occur merely by using advanced AI technologies (Castañeda & Selwyn, 2018; Du Boulay, 2000; Selwyn, 2016). More importantly, the use of AI technologies and applications generally implies different pedagogical perspectives, which in turn critically influence the design and implementation of instruction and learning (Hwang et al., 2020; Ouyang & Jiao, 2021). As Chen et al. (2020b) suggested, social constructivism (Vygotsky, 1978), situational theory (Kim & Grunig, 2011), and distributed cognition (Hollan et al., 2000) are worth studying while integrating AI applications in online higher education. Ouyang and Jiao (2021) proposed three paradigms of AIEd from the theoretical perspective (i.e., AI-directed, learner-as-recipient; AI-supported, learner-as-collaborator; and AI-empowered, learner-as-leader), which can serve as a reference framework for exploring varied ways of addressing learning and instructional issues with AI applications. In summary, based on existing educational and learning theories, researchers and practitioners can integrate pedagogy and the learning sciences with AIEd applications to derive multiple perspectives and interpretations of AIEd in online higher education (Hwang et al., 2020; Hwang & Tu, 2021; Ouyang & Jiao, 2021).

Second, from the technological perspective, AI technologies, models, and applications in online higher education are expected to exploit the potential of integrating students' learning process characteristics with AI, to connect and strengthen interactions between AI and educators and students, and to address issues regarding biases in AI algorithms and the non-transparency of why and how AI decisions are made (Hwang et al., 2020; Hwang & Tu, 2021; Ouyang & Jiao, 2021). This review has illustrated the importance of collecting and analyzing learning process data, in addition to summative data, in order to achieve high-quality AI-enabled prediction or recommendation. The advancement of emerging computer technologies, such as quantum computing, wearable devices, robot control and sensing devices, and 5G wireless communication, has provided new affordances and opportunities to integrate AI with the collection and analysis of online learning processes in online higher education (Chen et al., 2020b; Hwang et al., 2020). When integrated into online higher education, AI has the potential to provide students with a practical or experiential learning experience, particularly when combined with other technologies such as VR, 3D, gaming, and simulation, thereby improving the student learning experience and academic performance. To advance the state of the art of AIEd technologies in online higher education, it is necessary to provide a bridge that facilitates interaction and collaboration between educators or students and AI systems or tools, which can help obtain a multifaceted understanding of student status and achieve good learning performance prediction (Giannakos et al., 2019). With the support of real-time AI algorithm models, information can be collected from humans and fed back to AI systems in a timely fashion. In this way, AI applications can collect and make sense of user-generated data to provide a deeper understanding of real-time interaction between humans and technologies in online higher education (Giannakos et al., 2019; Ouyang & Jiao, 2021; Xie et al., 2019). Future research can consider developing prediction models that can be used in heterogeneous contexts across platforms, thematic areas, and course durations; this approach has the potential to enhance the predictive power of current models by improving algorithms or adding novel higher-order features from students or groups (Moreno-Marcos et al., 2019). Overall, since online higher education stresses learner-centered learning, the integration of human intelligence and machine intelligence can help AIEd transform from traditional lecturing to learner-centered learning (Ouyang & Jiao, 2021).

Third, from the practical perspective, AIEd advancement calls for more empirical research examining the different roles of AI in online higher education, how AI connects to existing educational and learning theories, and to what extent the use of AI technologies influences online learning quality (Hwang et al., 2020; Kabudi et al., 2021; Ouyang & Jiao, 2021). As researchers pointed out in a recent literature review, there has been a discrepancy between the potential of AIEd and its actual implementation in online higher education (Kabudi et al., 2021). This discrepancy is caused by a separation of AI technology from the complex educational system (Xu & Ouyang, 2021). The review results also show limited research examining the long-term effects and implications of applying AI to improve online instruction and learning. Therefore, AIEd needs to be designed and applied with the awareness that AI technology is part of a larger educational system consisting of multiple components, e.g., learners, instructors, and information and resources (Riedl, 2019). To better examine learning effects, AIEd empirical research should design more comprehensive assessment methods that incorporate various student features (e.g., motivation, anxiety, higher-order thinking, behavioral patterns) into the AI model, and use multimodal learning analytics to collect and analyze data (e.g., process-oriented discourse data, physiological sensing data, eye-tracking data) (Ouyang & Jiao, 2021). In addition, the review indicates that most empirical research is conducted over short durations; more empirical research with larger samples and longer experiment durations is therefore needed to verify the effects of AI applications in online higher education. Overall, a deeper understanding can be achieved by conducting more empirical research that examines the roles of AI in online higher education, the educational and learning theories underpinning AIEd, and the actual effects of AI on online learning quality (Gartner, 2019; Law, 2019; Tegmark, 2017).
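As a rough sketch of the multimodal direction described above, the following Python example fuses hypothetical discourse, physiological, and eye-tracking features at the feature level before fitting a single classifier. All names, data, and the engagement label are invented assumptions, not drawn from the reviewed studies.

```python
# Minimal sketch, assuming hypothetical multimodal inputs: feature-level
# fusion of discourse, physiological, and eye-tracking features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 200

discourse = rng.poisson(5.0, size=(n, 3))   # e.g., post counts, word counts
physio = rng.normal(70, 10, size=(n, 2))    # e.g., mean heart rate, EDA
gaze = rng.uniform(0, 1, size=(n, 2))       # e.g., fixation ratios

X = np.hstack([discourse, physio, gaze])    # feature-level fusion
y = rng.integers(0, 2, size=n)              # placeholder engagement label

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))
```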

6 Conclusions

This systematic review provides an overview of empirical research on the applications of AI in online higher education. Specifically, it examines the functions addressed by empirical research, the algorithms used, and the effects and implications generated. Although the research and practice of AI applications in online higher education are still at a preliminary, exploratory stage, AI has been shown to enhance online instruction and learning quality by offering accurate prediction and assessment and by engaging students with online materials and environments (Yang et al., 2020; Zawacki-Richter et al., 2019). Innovative applications of AI in online higher education are conducive to reforming instructional design and development methods, as well as to advancing the construction of an intelligent, networked, personalized, and lifelong educational system (Arsovic & Stefanovic, 2020; Ouyang & Jiao, 2021; Yang et al., 2020).

This systematic review has several limitations, which point to future research directions. First, the search query process cannot guarantee full completeness or absence of bias. Although we used the keyword list suggested by previous review studies to search for relevant articles, not all studies were included, as diverse terms have been used to represent AI technologies. Second, the studies reviewed in this article were filtered from seven prominent databases and limited to journal articles. For example, recent conference proceedings were excluded, which may mean the latest technical reports of AIEd in online higher education are absent. Since AIEd is an interdisciplinary field whose scholars come from different areas, particularly computer science and education, relevant studies may have been published as conference papers that were not included. Therefore, future studies can adjust the screening criteria so that more relevant studies are included. Third, the current study only provides a systematic overview of AI in online higher education; a formal meta-analysis reporting the effect sizes of the selected empirical research would be beneficial for gaining a deeper understanding of the field.

Critical questions that need to be carefully considered include: How can AI algorithms and models be improved in online higher education? How should AI systems or tools be implemented to improve instruction and learning practices in online higher education? How can longitudinal empirical research be conducted to reveal authentic, long-term results of applying AI in online instruction and learning? This systematic review has provided initial implications for these questions, such as taking students' diverse characteristics into consideration, using advanced AI algorithms to achieve precision, and conducting longitudinal research to examine the long-term effects of AI applications. Future work should continue this line of research and practice. Overall, consistent with previous work (e.g., Deeva et al., 2021; Holmes et al., 2019; Hwang et al., 2020), AIEd applications in online higher education are expected to enable learners to reflect on learning and inform AI systems to adapt accordingly, to improve the accuracy of prediction, recommendation, and assessment, and to facilitate learner agency, empowerment, and personalization in student-centered learning.

* Reviewed articles (n = 32)

* Aguiar, E., Chawla, N. V., Brockman, J., Ambrose, G. A., & Goodrich, V. (2014). Engagement vs performance: using electronic portfolios to predict first semester engineering student retention. Journal of Learning Analytics, 1 (3), 7–33. https://doi.org/10.18608/jla.2014.13.3

* Almeda, M. V., Zuech, J., Utz, C., Higgins, G., Reynolds, R., & Baker, R. S. (2018). Comparing the factors that predict completion and grades among for-credit and open/MOOC students in online learning. Online Learning, 22 (1), 1–18. https://doi.org/10.24059/olj.v22i1.1060

* Aluthman, E. S. (2016). The effect of using automated essay evaluation on esl undergraduate students’ writing skill. International Journal of English Linguistics, 6 (5), 54. https://doi.org/10.5539/ijel.v6n5p54

Alyahyan, E., & Düştegör, D. (2020). Predicting academic success in higher education: literature review and best practices. International Journal of Educational Technology in Higher Education, 17(1). https://doi.org/10.1186/s41239-020-0177-7

Arsovic, B., & Stefanovic, N. (2020). E-learning based on the adaptive learning model: case study in Serbia. Sadhana-Academy Proceedings in Engineering Sciences, 45(1), 266. https://doi.org/10.1007/s12046-020-01499-8


* Baneres, D., Rodríguez-González, M. E., & Serra, M. (2019). An early feedback prediction system for learners at-risk within a first-year higher education course. IEEE Transactions on Learning Technologies, 12(2), 249–263. https://doi.org/10.1109/TLT.2019.2912167

* Benhamdi, S., Babouri, A., & Chiky, R. (2017). Personalized recommender system for e-Learning environment. Education and Information Technologies, 22 (4), 1455–1477. https://doi.org/10.1007/s10639-016-9504-y

* Bousbahi, F., & Chorfi, H. (2015). MOOC-Rec: A case based recommender system for MOOCs. Procedia - Social and Behavioral Sciences, 195 , 1813–1822. https://doi.org/10.1016/j.sbspro.2015.06.395

Breslow, L., Pritchard, D. E., DeBoer, J., Stump, G. S., Ho, A. D., & Seaton, D. T. (2013). Studying learning in the worldwide classroom: Research into edX’s first MOOC. Research & Practice in Assessment , 8 , 13–25. Retrieved from https://files.eric.ed.gov/fulltext/EJ1062850.pdf

* Burgos, C., Campanario, M. L., de la Peña, D., Lara, J. A., Lizcano, D., & Martínez, M. A. (2018). Data mining for modeling students’ performance: A tutoring action plan to prevent academic dropout. Computers and Electrical Engineering, 66 , 541–556. https://doi.org/10.1016/j.compeleceng.2017.03.005

* Cárdenas-Cobo, J., Puris, A., Novoa-Hernández, P., Galindo, J. A., & Benavides, D. (2020). Recommender systems and scratch: An integrated approach for enhancing computer programming learning. IEEE Transactions on Learning Technologies, 13 (2), 387–403. https://doi.org/10.1109/TLT.2019.2901457

Castañeda, L., & Selwyn, N. (2018). More than tools? Making sense of the ongoing digitizations of higher education. International Journal of Educational Technology in Higher Education, 15 (22), https://doi.org/10.1186/s41239-018-0109-y

Chassignol, M., Khoroshavin, A., Klimova, A., & Bilyatdinova, A. (2018). Artificial intelligence trends in education: A narrative overview. 7th International Young Scientist Conference on Computational Science. Procedia Computer Science, 136, 16-24

Chang, T. Y., & Ke, Y. R. (2013). A personalized e-course composition based on a genetic algorithm with forcing legality in an adaptive learning system. Journal of Network and Computer Applications, 36 (1), 533–542. https://doi.org/10.1016/j.jnca.2012.04.002

Chen, C. M. (2008). Intelligent web-based learning system with personalized learning path guidance. Computers & Education, 51 (2), 787–814. https://doi.org/10.1016/j.compedu.2007.08.004

* Chen, W., Niu, Z., Zhao, X., & Li, Y. (2014). A hybrid recommendation algorithm adapted in e-learning environments. World Wide Web, 17 (2), 271–284. https://doi.org/10.1007/s11280-012-0187-z

Chen, X., Xie, H., & Hwang, G. J. (2020a). A multi-perspective study on artificial intelligence in education: grants, conferences, journals, software tools, institutions, and researchers. Computers and Education: Artificial Intelligence , 1 , 100005. https://doi.org/10.1016/j.caeai.2020.100005

Chen, X., Xie, H., Zou, D., & Hwang, G. J. (2020b). Application and theory gaps during the rise of artificial intelligence in education. Computers and Education: Artificial Intelligence, 1 (July), 100002. https://doi.org/10.1016/j.caeai.2020.100002

* Christudas, B. C. L., Kirubakaran, E., & Thangaiah, P. R. J. (2018). An evolutionary approach for personalization of content delivery in e-learning systems based on learner behavior forcing compatibility of learning materials. Telematics and Informatics, 35 (3), 520–533. https://doi.org/10.1016/j.tele.2017.02.004

* Costa, E. B., Fonseca, B., Santana, M. A., de Araújo, F. F., & Rego, J. (2017). Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failure in introductory programming courses. Computers in Human Behavior, 73 , 247–256. https://doi.org/10.1016/j.chb.2017.01.047

Deeva, G., Bogdanova, D., Serral, E., Snoeck, M., & De Weerdt, J. (2021). A review of automated feedback systems for learners: Classification framework, challenges and opportunities. Computers & Education, 162 , 104094. https://doi.org/10.1016/j.compedu.2020.104094

Du Boulay, B. (2000). Can we learn from ITSs? In International conference on intelligent tutoring systems (pp. 9–17). Springer. https://link.springer.com/chapter/10.1007/3-540-45108-0_3

* Dwivedi, P., & Bharadwaj, K. K. (2015). E-Learning recommender system for a group of learners based on the unified learner profile approach. Expert Systems, 32 (2), 264–276. https://doi.org/10.1111/exsy.12061

Gardner, L., Sheridan, D., & White, D. (2002). A web-based learning and assessment system to support flexible education. Journal of Computer Assisted Learning, 18, 125e136. https://doi.org/10.1046/j.0266-4909.2001.00220.x

Gartner (2019). Hype cycle for emerging technologies , 2019. Gartner. Retrieved on 2021/1/1  https://www.gartner.com/en/documents/3956015/hype-cycle-for-emerging-technologies-2019

George, G., & Lal, A. M. (2019). Review of ontology-based recommender systems in e-learning. Computers & Education, 142 (July), 103642. https://doi.org/10.1016/j.compedu.2019.103642

Giannakos, M. N., Sharma, K., Pappas, I. O., Kostakos, V., & Velloso, E. (2019). Multimodal data as a means to understand the learning experience. International Journal of Information Management, 48, 108–119. https://doi.org/10.1016/j.ijinfomgt.2019.02.003

Godwin, A., & Kirn, A. (2020). Identity‐based motivation: Connections between first‐year students' engineering role identities and future‐time perspectives. Journal of Engineering Education, 109 (3), 362–383. https://doi.org/10.1002/jee.20324

Graneheim, U. H., & Lundman, B. (2004). Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. Nurse Education Today, 24 (2), 105e112. https://doi.org/10.1016/j.nedt.2003.10.001

Guan, C., Mou, J., & Jiang, Z. (2020). Artificial intelligence innovation in education: A twenty-year data-driven historical analysis. International Journal of Innovation Studies, 4 (4), 134–147. https://doi.org/10.1016/j.ijis.2020.09.001

Harasim, L. (2000). Shift happens: Online education as a new paradigm in learning. The Internet and higher education, 3 (1–2), 41–61. https://doi.org/10.1016/S1096-7516(00)00032-4

* Helal, S., Li, J., Liu, L., Ebrahimie, E., Dawson, S., Murray, D. J., & Long, Q. (2018). Predicting academic performance by considering student heterogeneity. Knowledge-Based Systems, 161 (July), 134–146. https://doi.org/10.1016/j.knosys.2018.07.042

Henly, D. C. (2003). Use of web-based formative assessment to support student learning in a metabolism/nutrition unit. European Journal of Dental Education, 7 (3), 116e122. https://doi.org/10.1034/j.1600-0579.2003.00310.x

* Hew, K. F., Hu, X., Qiao, C., & Tang, Y. (2020). What predicts student satisfaction with MOOCs: A gradient boosting trees supervised machine learning and sentiment analysis approach. Computers & Education, 145, 103724. https://doi.org/10.1016/j.compedu.2019.103724

Hinojo-Lucena, F. J., Aznar-Díaz, I., Cáceres-Reche, M. P., & Romero-Rodríguez, J. M. (2019). Artificial intelligence in higher education: A bibliometric study on its impact in the scientific literature. Education Sciences, 9 (1), 51. https://doi.org/10.3390/educsci9010051

Hollan, J., Hutchins, E., & Kirsh, D. (2000). Distributed cognition: Toward a new foundation for human-computer interaction research. ACM Transactions on Computer-Human Interaction (TOCHI), 7 (2).  https://doi.org/10.1145/353485.353487

Holmberg, B. (2005). Theory and practice of distance education . Routledge

Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial intelligence in education: Promises and implications for teaching and learning . Center for Curriculum Redesign


* Hooshyar, D., Ahmad, R. B., Yousefi, M., Fathi, M., Horng, S. J., & Lim, H. (2016). Applying an online game-based formative assessment in a flowchart-based intelligent tutoring system for improving problem-solving skills. Computers & Education, 94, 18–36. https://doi.org/10.1016/j.compedu.2015.10.013

* Howard, E., Meehan, M., & Parnell, A. (2018). Contrasting prediction methods for early warning systems at undergraduate level. The Internet and Higher Education, 37, 66–75. https://doi.org/10.1016/j.iheduc.2018.02.001

Hsieh, H. F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15 (9), 1277–1288. https://doi.org/10.1177/1049732305276687

Hu, Y. H. (2021). Effects and acceptance of precision education in an AI-supported smart learning environment. Education and Information Technologies . https://doi.org/10.1007/s10639-021-10664-3

* Hu, Y., Lo, C., & Shih, S. (2014). Developing early warning systems to predict students’ online learning performance. Computers in Human Behavior, 36, 469–478. https://doi.org/10.1016/j.chb.2014.04.002

* Huang, A. Y. Q., Lu, O. H. T., Huang, J. C. H., Yin, C. J., & Yang, S. J. H. (2020). Predicting students’ academic performance by using educational big data and learning analytics: evaluation of classification methods and learning logs. Interactive Learning Environments, 28 (2), 206–230. https://doi.org/10.1080/10494820.2019.1636086

Hwang, G. J., & Tu, Y. F. (2021). Roles and research trends of artificial intelligence in mathematics education: A bibliometric mapping analysis and systematic review. Mathematics . https://doi.org/10.3390/math9060584

Hwang, G. J., Xie, H., Wah, B. W., & Gašević, D. (2020). Vision, challenges, roles and research issues of artificial intelligence in education. Computers and Education: Artificial Intelligence, 1, 100001. https://doi.org/10.1016/j.caeai.2020.100001

* Ijaz, K., Bogdanovych, A., & Trescak, T. (2017). Virtual worlds vs books and videos in history education. Interactive Learning Environments, 25 (7), 904–929. https://doi.org/10.1080/10494820.2016.1225099

* Jayaprakash, S. M., Moody, E. W., Eitel, J. M., Regan, J. R., & Baron, J. D. (2014). Early alert of academically at-risk students: An open source analytics initiative. Journal of Learning Analytics, 1, 6–47. https://doi.org/10.18608/jla.2014.11.3

Kabudi, T., Pappas, I., & Olsen, D. H. (2021). AI-enabled adaptive learning systems: A systematic mapping of the literature. Computers and Education: Artificial Intelligence, 2, 100017. https://doi.org/10.1016/j.caeai.2021.100017

Kim, J. N., & Grunig, J. E. (2011). Problem solving and communicative action: A situational theory of problem solving. Journal of Communication, 61 (1), 120–149. https://doi.org/10.1111/j.1460-2466.2010.01529.x

* Koć-Januchta, M. M., Schönborn, K. J., Tibell, L. A. E., Chaudhri, V. K., & Heller, H. C. (2020). Engaging with biology by asking questions: Investigating students’ interaction and learning with an artificial intelligence-enriched textbook. Journal of Educational Computing Research, 58 (6), 1190–1224. https://doi.org/10.1177/0735633120921581

Law, N. W. Y. (2019). Human development and augmented intelligence. In The 20th international conference on artificial intelligence in education (AIED 2019) . Springer. Retrieved on 2021/1/1 from https://www.sciencedirect.com/science/refhub/S2666-920X(21)00014-undefined/sref31

* Li, J., Chang, Y., Chu, C., & Tsai, C. (2012). Expert systems with applications a self-adjusting e-course generation process for personalized learning. Expert Systems With Applications, 39 (3), 3223–3232. https://doi.org/10.1016/j.eswa.2011.09.009

Liu, M., Kang, J., Zou, W., Lee, H., Pan, Z., & Corliss, S. (2017). Using data to understand how to better design adaptive learning. Technology Knowledge and Learning, 22 (3), 271–298. https://doi.org/10.1007/s10758-017-9326-z

Liu, S., Guo, D., Sun, J., Yu, J., & Zhou, D. (2020). MapOnLearn: The use of maps in online learning systems for education sustainability. Sustainability, 12 (17), 7018. https://doi.org/10.3390/su12177018

Liz-Domínguez, M., Caeiro-Rodríguez, M., Llamas-Nistal, M., & Mikic-Fonte, F. A. (2019). Systematic literature review of predictive analysis tools in higher education. Applied Sciences, 9(24). https://doi.org/10.3390/app9245569

Lykourentzou, I., Giannoukos, I., Nikolopoulos, V., Mpardis, G., & Loumos, V. (2009). Dropout prediction in e-learning courses through the combination of machine learning techniques. Computers & Education, 53 (3), 950–965. https://doi.org/10.1016/j.compedu.2009.05.010

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., Prisma Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS medicine, 6 (7), e1000097. https://doi.org/10.1371/journal.pmed.1000097.t001

Moreno-Marcos, P. M., Muñoz-Merino, P. J., Alario-Hoyos, C., Estévez-Ayres, I., & Delgado Kloos, C. (2018). Analysing the predictive power for anticipating assignment grades in a massive open online course. Behaviour & Information Technology, 37 (10–11), 1021–1036. https://doi.org/10.1080/0144929X.2018.1458904

Moreno-Marcos, P. M., Alario-Hoyos, C., Munoz-Merino, P. J., & Kloos, C. D. (2019). Prediction in MOOCs: A Review and Future Research Directions. IEEE Transactions on Learning Technologies, 12 (3), 384–401. https://doi.org/10.1109/TLT.2018.2856808

Moseley, L. G., & Mead, D. M. (2008). Predicting who will drop out of nursing courses: a machine learning exercise. Nurse Education Today, 28 (4), 469–475. https://doi.org/10.1016/j.nedt.2007.07.012

* Mubarak, A. A., Cao, H., & Zhang, W. (2020). Prediction of students’ early dropout based on their interaction logs in online learning environment. Interactive Learning Environments . https://doi.org/10.1080/10494820.2020.1727529

Neuendorf, K. A., & Kumar, A. (2015). Content analysis. The international Encyclopedia of Political Communication , 1-10. https://doi.org/10.1002/9781118541555.wbiepc065

Ouyang, F., & Jiao, P. (2021). Artificial intelligence in education: The three paradigms. Computers and Education: Artificial Intelligence, 2, 100020. https://doi.org/10.1016/j.caeai.2021.100020

Rico-Juan, J. R., Gallego, A. J., & Calvo-Zaragoza, J. (2019). Automatic detection of inconsistencies between numerical scores and textual feedback in peer-assessment processes with machine learning. Computers & Education, 140, 103609. https://doi.org/10.1016/j.compedu.2019.103609

Riedl, M. O. (2019). Human-centered artificial intelligence and machine learning. Human Behavior and Emerging Technologies, 1 (1), 33–36. https://doi.org/10.1002/hbe2.117

Rowe, M. (2019). Shaping our algorithms before they shape us. Artificial Intelligence and Inclusive Education (pp. 151–163). Springer. https://doi.org/10.1007/978-981-13-8161-4_9

Romero, C., Espejo, P. G., Zafra, A., Romero, J. R., & Ventura, S. (2013). Web usage mining for predicting final marks of students that use Moodle courses. Computer Applications in Engineering Education, 21 (1), 135–146. https://doi.org/10.1002/cae.20456

* Romero, C., López, M. I., Luna, J. M., & Ventura, S. (2013). Predicting students’ final performance from participation in on-line discussion forums. Computers & Education, 68, 458–472. https://doi.org/10.1016/j.compedu.2013.06.009

Selwyn, N. (2016). Is technology good for education?  Polity Press. Retrieved on 2021/1/10 from http://au.wiley.com/WileyCDA/WileyTitle/productCd-0745696465.html

Shahiri, A. M., Husain, W., & Rashid, N. A. (2015). A review on predicting student’s performance using data mining techniques. Procedia Computer Science, 72, 414–422. https://doi.org/10.1016/j.procs.2015.12.157

Simpson, O. (2018). Supporting students in online, open and distance learning (1st ed.). Routledge


* Sukhbaatar, O., Usagawa, T., & Choimaa, L. (2019). An artificial neural network based early prediction of failure-prone students in blended learning course. International Journal of Emerging Technologies in Learning, 14 (19), 77–92. https://doi.org/10.3991/ijet.v14i19.10366

Tang, K. Y., Chang, C. Y., & Hwang, G. J. (2021). Trends in artificial intelligence supported e-learning: A systematic review and co-citation network analysis (1998-2019). Interactive Learning Environments . https://doi.org/10.1080/10494820.2021.1875001

Tegmark, M. (2017). Life 3.0: Being human in the age of artificial intelligence . Knopf

Tomasevic, N., Gvozdenovic, N., & Vranes, S. (2020). An overview and comparison of supervised data mining techniques for student exam performance prediction. Computers & Education, 143, 103676. https://doi.org/10.1016/j.compedu.2019.103676

Tyler-Smith, K. (2006). Early attrition among first time eLearners: A review of factors that contribute to drop-out, withdrawal and non-completion rates of adult learners undertaking eLearning programmes. Journal of Online Learning and Teaching , 2 (2), 73–85. Retrieved on 2021/1/11 from https://jolt.merlot.org/documents/Vol2_No2_TylerSmith_000.pdf

Vygotsky, L. (1978). Mind in society: The development of higher psychological processes . Harvard University Press

* Wakelam, E., Jefferies, A., Davey, N., & Sun, Y. (2020). The potential for student performance prediction in small cohorts with minimal available attributes. British Journal of Educational Technology, 51 (2), 347–370. https://doi.org/10.1111/bjet.12836

Wohlin, C. (2014). Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th international conference on evaluation and assessment in software engineering (pp. 1-10). https://doi.org/10.1145/2601248.2601268

Xie, H., Chu, H. C., Hwang, G. J., & Wang, C. C. (2019). Trends and development in technology-enhanced adaptive/personalized learning: A systematic review of journal publications from 2007 to 2017. Computers & Education, 140, 103599. https://doi.org/10.1016/j.compedu.2019.103599

* Xing, W., Chen, X., Stein, J., & Marcinkowski, M. (2016). Computers in human behavior temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Computers in Human Behavior, 58, 119–129. https://doi.org/10.1016/j.chb.2015.12.007

Xu, W. & Ouyang, F. (2021). A systematic review of AI role in the educational system based on a proposed conceptual framework. Education and Information Technologies. https://doi.org/10.1007/s10639-021-10774-y

* Xu, X., Wang, J., Peng, H., & Wu, R. (2019). Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Computers in Human Behavior, 98 (April), 166–173. https://doi.org/10.1016/j.chb.2019.04.015

* Yang, T., Brinton, C. G., & Joe-wong, C. (2017). Behavior-based grade prediction for MOOCs via time series neural networks. IEEE Journal of Selected Topics in Signal Processing, 11 (5), 716–728. https://doi.org/10.1109/JSTSP.2017.2700227

Yang, Y. T. C., Gamble, J. H., Hung, Y. W., & Lin, T. Y. (2014). An online adaptive learning environment for critical-thinking-infused English literacy instruction. British Journal of Educational Technology, 45 (4), 723–747. https://doi.org/10.1111/bjet.12080

Yang, C., Huan, S., & Yang, Y. (2020). A practical teaching mode for colleges supported by Artificial Intelligence. International Journal of Emerging Technologies in Learning, 15 (17), 195–206. https://doi.org/10.3991/ijet.v15i17.16737

* Yoo, J., & Kim, J. (2014). Project performance? Investigating the roles of linguistic features and participation patterns. International Journal of Artificial Intelligence in Education, 24, 8–32. https://doi.org/10.1007/s40593-013-0010-8

Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where are the educators? International Journal of Educational Technology in Higher Education, 16 (1), 39. https://doi.org/10.1186/s41239-019-0171-0

Zhai, X., Chu, X., Chai, C. S., Jong, M. S. Y., Istenic, A., … Li, Y. (2021). A review of Artificial Intelligence (AI) in education from 2010 to 2020. Complexity, 2021, 8812542. https://doi.org/10.1155/2021/8812542

* Zohair, L. M. (2019). Prediction of student’s performance by modelling small dataset size. International Journal of Educational Technology in Higher Education , 16 (1). https://doi.org/10.1186/s41239-019-0160-3

Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18 (3), 429–472. https://doi.org/10.1177/1094428114562629


Acknowledgements

This work is financially supported by the National Natural Science Foundation of China, No. 62177041.

Author information

Authors and Affiliations

College of Education, Zhejiang University, Hangzhou, Zhejiang, 310000, China

Fan Ouyang & Luyi Zheng

Ocean College, Zhejiang University, Zhoushan, Zhejiang, 316021, China

Pengcheng Jiao


Corresponding author

Correspondence to Fan Ouyang .

Ethics declarations

Conflict of interest

There is no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article

Ouyang, F., Zheng, L. & Jiao, P. Artificial intelligence in online higher education: A systematic review of empirical research from 2011 to 2020. Educ Inf Technol 27 , 7893–7925 (2022). https://doi.org/10.1007/s10639-022-10925-9


Received : 05 October 2021

Accepted : 27 January 2022

Published : 26 February 2022

Issue Date : July 2022



Keywords: Artificial Intelligence in Education; Systematic review; Online higher education; Online learning; Empirical research
  • Review Article
  • Open access
  • Published: 09 April 2024

Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis

  • Isabelle Krakowski,
  • Jiyeong Kim (ORCID: orcid.org/0000-0002-2869-5751),
  • Zhuo Ran Cai,
  • Roxana Daneshjou (ORCID: orcid.org/0000-0001-7988-9356),
  • Jan Lapins,
  • Hanna Eriksson,
  • Anastasia Lykou &
  • Eleni Linos (ORCID: orcid.org/0000-0002-5856-6301)

npj Digital Medicine, volume 7, Article number: 78 (2024)


Subjects: Skin cancer; Skin manifestations

The development of diagnostic tools for skin cancer based on artificial intelligence (AI) is increasing rapidly and will likely soon be widely implemented in clinical use. Even though the performance of these algorithms is promising in theory, there is limited evidence on the impact of AI assistance on human diagnostic decisions. Therefore, the aim of this systematic review and meta-analysis was to study the effect of AI assistance on the accuracy of skin cancer diagnosis. We searched PubMed, Embase, IEEE Xplore, Scopus and conference proceedings for articles from 1/1/2017 to 11/8/2022. We included studies comparing the performance of clinicians diagnosing at least one skin cancer with and without deep learning-based AI assistance. Summary estimates of sensitivity and specificity of diagnostic accuracy with versus without AI assistance were computed using a bivariate random effects model. We identified 2983 studies, of which ten were eligible for meta-analysis. For clinicians without AI assistance, pooled sensitivity was 74.8% (95% CI 68.6–80.1) and specificity was 81.5% (95% CI 73.9–87.3). For AI-assisted clinicians, the overall sensitivity was 81.1% (95% CI 74.4–86.5) and specificity was 86.1% (95% CI 79.2–90.9). AI benefitted medical professionals of all experience levels in subgroup analyses, with the largest improvement among non-dermatologists. No publication bias was detected, and sensitivity analysis revealed that the findings were robust. AI in the hands of clinicians has the potential to improve diagnostic accuracy in skin cancer diagnosis. Given that most studies were conducted in experimental settings, we encourage future studies to further investigate these potential benefits in real-life settings.


Introduction

As a result of increasing data availability and computational power, artificial intelligence (AI) algorithms have reached a level of sophistication that enables them to take on complex tasks previously conducted only by human beings 1 . Several AI algorithms are now approved by the United States Food and Drug Administration (FDA) for medical use 2 , 3 , 4 . Though there are currently no image-based dermatology AI applications that have FDA approval, several are in development 2 .

Skin cancer diagnosis relies heavily on the interpretation of visual patterns, making it a complex task that requires extensive training in dermatology and dermatoscopy 5 , 6 . However, AI algorithms have been shown to accurately diagnose skin cancers, even outperforming experienced dermatologists in image classification tasks in constrained settings 7 , 8 , 9 . Yet these algorithms can be sensitive to data distribution shifts. AI-human partnerships could therefore provide performance improvements that surmount the limitations of both human clinicians and AI alone. Notably, Tschandl et al. demonstrated in their 2020 paper that the accuracy of clinicians supported by AI algorithms surpassed that of either clinicians or AI algorithms working separately 10 . An AI-clinician partnership is considered the most likely clinical use of AI in dermatology, given the ethical and legal concerns of automated diagnosis alone. There is therefore an urgent need to better understand how clinicians' use of AI affects decision making 11 . The goal of this study was to evaluate the diagnostic accuracy of clinicians with vs. without AI assistance through a systematic review and meta-analysis of the available literature.

Literature search and screening

For this systematic review and meta-analysis, 2983 records were initially retrieved, of which 1972 abstracts were screened after automatic duplicate removal by Covidence (Fig. 1 ). After 1936 articles were deemed irrelevant and excluded, the full text of 36 articles was reviewed. A total of 12 studies were included in the systematic review 10 , 12 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 and ten studies were included in the meta-analysis 10 , 12 , 13 , 14 , 15 , 17 , 19 , 20 , 21 , 22 , as the information needed to create contingency tables for AI-assisted and unassisted medical professionals was unavailable in the other two studies 16 , 18 .

Figure 1. Flow diagram of the study selection process.

Study characteristics

Tables 1 and 2 present the characteristics of the included studies. Half of the studies were conducted in Asia (50%; South Korea = 5, China = 1), and the other half in North/South America (25%; USA = 1, Argentina = 1, Chile = 1) and Europe (25%; Austria = 1, Germany = 1, Switzerland = 1). More studies were performed in experimental (67%, n  = 8) than clinical settings (33%, n  = 4). A quarter of the studies included only dermatologists (25%, n  = 3); more than half (58%, n  = 7) included a combination of dermatology specialists (e.g., dermatologists and dermatology residents) and non-dermatology medical professionals (e.g., primary care physicians, nurse practitioners, medical students), and among these, two studies included lay persons, whose data were not included in the meta-analysis. Two studies (17%) included only non-dermatology medical professionals. The median number of study participants was 18.5, ranging from 7 to 302.

Clinical information was provided to study participants in addition to images or in-person visits in half of the studies (50%, n  = 6). For diagnosis, outpatient clinical images were most frequently provided (42%, n  = 5), followed by dermoscopic images (33%, n  = 4) and in-person visits (25%, n  = 3). The diagnostic task was either choosing the most likely diagnosis (58%, n  = 7) or rating the lesion as malignant vs. benign (42%, n  = 5). Most studies (75%, n  = 9) used a paired design in which the same reader diagnosed the same case first without, then with AI assistance, whereas two studies provided different images between the two tasks. A fully crossed design (i.e., all readers diagnosing all cases in both modalities) was used in four studies. One study reported diagnosis only with AI support and thus did not allow analysis of the effect of AI 16 . The reference standard was either a varying combination of histopathology, a dermatologist panel's diagnosis, or the treating physician's diagnosis from medical records, clinical follow-up, or in vivo confocal microscopy (75%, n  = 9), or histopathologic diagnosis of all images (17%, n  = 2). One study considered either histopathology or the study participant's concordance with the two AI tools under study as the reference standard 17 . Most AI algorithms did not explain their outputs beyond presenting the top-1 or top-3 diagnoses with their respective probabilities or a binary malignancy score. Content-based image retrieval (CBIR) was the only explainability method used, namely in two of the studies (17%), and Tschandl et al. 10 was the only study that examined the effects of different representations of AI output on the diagnostic performance of physicians. The definition of the target condition varied across studies, but all studies included at least one skin cancer among the differential diagnoses. The summary of methodological quality assessments can be found in Supplementary Table 1 . Although κ was low (κ = 0.33), Bowker's test of symmetry 23 was not significant; hence the two raters were considered to have the same propensity to select categories. All three assessors agreed with the final quality assessments.
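To make the two agreement checks concrete, the following Python sketch computes Cohen's κ and Bowker's test of symmetry on made-up quality ratings. The functions cohen_kappa_score (scikit-learn) and SquareTable.symmetry (statsmodels) are real, but the rating data and the 12-study-by-4-domain setup are illustrative assumptions.

```python
# Sketch of the agreement checks on invented ratings; not the study's data.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import SquareTable

# Hypothetical QUADAS-2 ratings (0=low, 1=unclear, 2=high risk of bias)
# by two raters over 12 studies x 4 domains = 48 items.
rng = np.random.default_rng(2)
rater1 = rng.integers(0, 3, size=48)
rater2 = np.where(rng.random(48) < 0.6, rater1, rng.integers(0, 3, size=48))

print("Cohen's kappa:", cohen_kappa_score(rater1, rater2))

# Bowker's test of symmetry on the 3x3 cross-tabulation of ratings.
table = np.zeros((3, 3))
for a, b in zip(rater1, rater2):
    table[a, b] += 1
res = SquareTable(table).symmetry()  # default method is Bowker's test
print("Bowker chi-square:", res.statistic, "p-value:", res.pvalue)
```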

Meta-analyses results

The summary estimate of sensitivity for clinicians overall was 74.8% (95% CI 68.6–80.1), and specificity was 81.5% (73.9–87.3). With AI assistance, overall diagnostic accuracy increased to a pooled sensitivity of 81.1% (74.4–86.5) and specificity of 86.1% (79.2–90.9). The SROC curves and forest plots of the ten studies for clinicians without vs. with AI assistance are shown in Figs. 2 and 3 , respectively; less heterogeneity is observed in the sensitivity of clinicians without AI assistance than in that of clinicians with AI assistance.

Figure 2. Performance of clinicians with no AI assistance ( a ) compared to AI-assisted clinicians ( b ) in the included studies. SE, sensitivity; SP, specificity.

Figure 3. Forest plots: meta-analysis results of the diagnostic performance of clinicians without ( a ) or with ( b ) AI assistance.

To investigate the effect of AI assistance in more detail, we conducted subgroup analyses based on clinical experience level, test task, and image type (Table 3 ). We observed that dermatologists had the highest diagnostic accuracy in terms of sensitivity and specificity. Residents (including dermatology residents and interns) were the second most accurate group, followed by non-dermatologists (including primary care providers, nurse practitioners, and medical students). Notably, AI assistance significantly improved the sensitivity and specificity of all groups of clinicians. The non-dermatologist group appeared to benefit the most from AI assistance in terms of improvement in pooled sensitivity (+13 points) and specificity (+11 points). For the classification task, the sensitivity of both binary classification (malignant vs. benign) and top diagnosis improved with AI assistance. Meanwhile, AI assistance significantly improved pooled specificity only for top classification, reaching a specificity of 88.8% (86.5–90.8). No significant difference was observed by image type.

There was no evidence of a small-study effect in the regression test of asymmetry for humans either without ( p  = 0.33) or with AI assistance ( p  = 0.23); see Supplementary Fig. 1 for funnel plots. The Spearman correlation test indicated that a positive threshold effect was unlikely for both groups. Sensitivity analyses revealed that excluding outliers slightly increased the pooled sensitivity and specificity in both groups, while the pooled sensitivity and specificity remained largely unchanged when the low-quality study was excluded (Supplementary Table 2 ).

This systematic review and meta-analysis included 12 studies and 67,700 diagnostic evaluations of potential skin cancer by clinicians with and without AI assistance. Our findings highlight the potential of AI-assisted decision-making in skin cancer diagnosis. All clinicians, regardless of their training level, showed improved diagnostic performance when assisted by AI algorithms. The degree of improvement, however, varied across specialties, with dermatologists exhibiting the smallest increase in diagnostic accuracy and non-dermatologists, including primary care providers, demonstrating the largest improvement. These results suggest that AI assistance may be especially beneficial for clinicians without extensive training in dermatology. Given that many dermatological AI devices have recently obtained regulatory approval in Europe, including some CE-marked algorithms used in the analyzed studies 24 , 25 , AI assistance may soon be a standard part of the dermatologist's toolbox. It is therefore important to better understand the interaction between humans and AI in clinical decision-making.

While several studies have evaluated the dermatologic use of new AI tools, our review of the published literature found that most have only compared human clinician performance with that of AI tools, without considering how clinicians interact with these tools. Two of the studies in this systematic review and meta-analysis reported that clinicians performed worse when the AI tool provided incorrect recommendations 10 , 19 . This finding underscores the importance of accurate and reliable algorithms in ensuring that AI implementation enhances clinical outcomes, and it highlights the need for further research to validate AI-assisted decision-making in medical practice. Notably, in a recent study by Barata et al. 26 , the authors demonstrated that a reinforcement learning model that incorporated human preferences outperformed a supervised learning model. Furthermore, it improved the performance of participating dermatologists in terms of both diagnostic accuracy and optimal management decisions for potential skin cancer compared with either a supervised learning model or no AI assistance at all. Hence, developing algorithms in collaboration with clinicians appears to be important for optimizing clinical outcomes.

Only two studies explored the impact of one explainability technique (CBIR) on physicians' diagnostic accuracy or perceived usefulness. The real clinical utility of explainability methods needs to be further examined, and current methods should be viewed as tools to interrogate and troubleshoot AI models 27 . Additionally, prior research has shown that human behavioral traits can affect trust in, and reliance on, AI assistance in general 28 , 29 . For example, a clinician's perception of and confidence in the AI's performance on a given task may influence whether they incorporate AI advice into their decision 30 . Moreover, research has shown that the human's confidence in their own decision, the AI's confidence level, and whether the human and AI agree all influence whether the human incorporates the AI's advice 30 . To ensure that AI assistance supports and improves diagnostic accuracy, future research should investigate how factors such as personality traits 29 , cognitive style 28 and cognitive biases 31 affect diagnostic performance in real clinical situations. Such research would help inform the integration of AI into clinical practice.

Our findings suggest that AI assistance may be particularly beneficial for less experienced clinicians, consistent with prior studies of human-AI interaction in radiology 32 . This highlights the potential of AI assistance as an educational tool for non-dermatologists and as a means of improving diagnostic performance in settings such as primary care, or for dermatologists in training. In a subgroup analysis, we observed no significant difference between AI-assisted non-dermatology medical professionals and unassisted dermatologists (data not shown). However, this area warrants further research.

Some limitations need to be considered when interpreting the findings. First, among the ten studies that provided sufficient data for meta-analysis, there were differences in design, number and experience level of participants, target condition definition, classification task, and algorithm output and training. Taken together, this heterogeneity implies that direct comparisons should be interpreted carefully. Furthermore, caution is warranted in interpreting the subgroup analyses owing to the small number of studies in each subgroup (up to seven) and the data structure (i.e., repeated measures), since in most studies the same participants examined the clinical images both without and with AI assistance. Given the low number of studies, we refrained from performing further subgroup analyses, such as comparing specific cancer diagnoses in the subset of articles where these were available. Despite these limitations, the results of this meta-analysis support the notion that AI assistance can have a positive effect on clinicians' diagnostic performance. We were able to adjust for potential sources of heterogeneity, including diagnostic task and clinician experience level, when comparing the diagnostic accuracy of clinicians with vs. without AI assistance. Moreover, no signs of publication bias and a low likelihood of threshold effects were observed. Lastly, the findings were robust: the pooled sensitivity and specificity remained nearly the same after excluding outliers or low-quality studies.

Of note, few studies provided participating clinicians with both clinical data and dermoscopic images, which would be available in a real-life clinical situation. Previous research has shown that the use of dermoscopy improves the relative diagnostic accuracy of melanoma by almost 50% compared with the naked eye 5 . In one such study, participants were explicitly not allowed to use dermoscopy during the patient examination 19 . Overall, only four studies were conducted in a prospective clinical setting, and three of these could be included in the meta-analysis. Thus, most diagnostic ratings in this meta-analysis were made in experimental settings and do not necessarily reflect decisions made in real-world clinical situations.

One of the main concerns regarding the accuracy of AI tools relates to the quality of the data they have been trained on 33 . As only three studies used publicly available datasets, evaluating data quality is difficult. Furthermore, darker skin tones were underrepresented in the datasets of the included studies, a known problem in the field, as most papers do not report skin tone information 34 . However, datasets with diverse skin tones have been developed and made publicly available in an effort to reduce disparities in AI performance in dermatology 35 , 36 . Moreover, few studies provided detailed information about the origin and number of images used for training, validation, and testing of the AI tool, and different definitions of these terms were used across studies. There is a need for better transparency guidelines for AI tool reporting to enable users and readers to understand the limits and capabilities of these diagnostic tools. Efforts are being made to develop guidelines adapted for this purpose, including the STARD-AI 37 , TRIPOD-AI and PROBAST-AI 38 guidelines, as well as the dermatology-specific CLEAR Derm guidelines 39 . In addition, PRISMA-AI 40 guidelines for systematic reviews and meta-analyses are being developed. These are promising initiatives that will hopefully make both the reporting and the evaluation of AI diagnostic tool research more transparent.

The results of this systematic review and meta-analysis indicate that clinicians benefit from AI assistance in skin cancer diagnosis regardless of their experience level. Clinicians with the least experience in dermatology may benefit the most from AI assistance. Our findings are timely as AI is expected to be widely implemented in clinical work globally in the near future. Notably, only four of the identified studies were conducted in clinical settings, three of which could be included in the meta-analysis. Therefore, there is an urgent need for more prospective clinical studies conducted in real-life settings where AI is intended to be used, in order to better understand and anticipate the effect of AI on clinical decision making.

Search strategy and selection criteria

We searched four electronic databases, including PubMed, Embase, Institute of Electrical and Electronics Engineers Xplore (IEEE Xplore), and Scopus, for peer-reviewed articles on AI-assisted skin cancer diagnosis, without language restriction, from January 1, 2017, until November 8, 2022. Search terms were combined for four key concepts: (1) AI, (2) skin cancer, (3) diagnosis, and (4) doctors. The full search strategy is available in the Supplementary material (Supplementary Table 3 ). We chose 2017 as the cutoff for this review since this was the year deep learning was first reported to perform at a level comparable to dermatologists, notably in the seminal study by Esteva et al. 9 , which suggested that AI technology had reached a clinically useful level for assisting skin cancer diagnosis.

We applied Google Translate software for abstract screening of non-English articles. Manual searches were performed for conference proceedings, including NeurIPS, HICSS, ICML, ICLR, AAAI, CVPR, CHIL and ML4Health, and to identify additional relevant articles by reviewing bibliographies and citations of the screened papers and searching Google Scholar.

We included studies comparing the diagnostic accuracy of clinicians detecting skin cancer with and without AI assistance. If studies provided diagnostic data from medical professionals other than physicians, these data were also included in the analysis, as long as the study also included physicians. However, we excluded studies if (1) the diagnosis was not made from either images of skin lesions or in-person visits (e.g., pathology slides), (2) diagnostic accuracy was only compared between clinicians and an AI algorithm, (3) non-deep-learning techniques were used, or (4) the articles were editorials, reviews, or case reports. We did not limit participants' expertise, study design or sample size, reference standard, or skin diagnosis, provided at least one skin malignancy was included in the study. We contacted nine authors to request additional data and clarifications required for the meta-analysis and received data from five of them 10 , 12 , 13 , 14 , 15 and clarifications from two 16 , 17 . In four studies 10 , 14 , 15 , 17 , raw data were not available for all experiments or lesions, and our meta-analysis included the data that were available. Studies with insufficient data to construct contingency tables 16 , 18 were included in the systematic review but not in the meta-analysis.

Three reviewers performed eligibility assessment, data extraction, and study quality evaluation (IK, JK, ZRC). Commonly used standardized programs were employed for duplicate removal, title and abstract screening, and full-text review (Covidence), and for data extraction (Microsoft Excel). Paired reviewers independently screened titles and abstracts using predefined criteria and extracted data. Disagreements were resolved through discussion with the third reviewer. IK imported the extracted data into the summary table for the systematic review, and two reviewers (JK and ZRC) verified it. JK imported and prepared the extracted data for meta-analysis, and two reviewers (ZRC and IK) verified it. The biostatistician (AL) reviewed and confirmed the final data for meta-analysis. All co-authors reviewed the final tables and figures. This systematic review and meta-analysis followed the PRISMA DTA guidelines 41 , and the study protocol was registered with PROSPERO, CRD42023391560.

Data analysis

We extracted key information, including true positive, false positive, false negative, and true negative counts for clinicians with and without AI assistance. Where possible, we generated contingency tables to estimate diagnostic test accuracy in terms of pooled sensitivity and specificity. Additional information was extracted about the AI algorithm (e.g., architecture, image sources, validation, and AI assistance method), participants, patients, target condition, reference standard, study setting and design, and funding.
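As a concrete illustration of this per-study extraction step, the short Python sketch below computes sensitivity and specificity, with Wilson 95% confidence intervals, from one hypothetical 2x2 contingency table; the counts are invented for illustration.

```python
# Minimal sketch: sensitivity and specificity from a 2x2 contingency table.
from statsmodels.stats.proportion import proportion_confint

tp, fp, fn, tn = 80, 15, 20, 85  # hypothetical reader-study counts

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
sens_ci = proportion_confint(tp, tp + fn, method="wilson")
spec_ci = proportion_confint(tn, tn + fp, method="wilson")

print(f"Sensitivity: {sensitivity:.3f} (95% CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f})")
print(f"Specificity: {specificity:.3f} (95% CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f})")
```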

A revised tool for the methodological quality assessment of diagnostic accuracy studies (QUADAS-2) 42 was used to assess risk of bias and applicability concerns for each study in four domains: patient selection, index test, reference standard, and flow and timing (Supplementary Table 1 ). A pair of reviewers independently evaluated the domains, compared their ratings, and, if conflicted, reconciled discrepancies through discussions led by the third reviewer (IK, JK, ZRC).

We used the Metandi package 43 for Stata 17 (College Station, TX) to compute summary estimates of sensitivity and specificity with 95% confidence intervals (95% CI) for clinicians with AI assistance compared with clinicians without AI assistance, using a bivariate model 44 . Summary receiver operating characteristic (SROC) curves were plotted to visually present the summary estimates of sensitivity and specificity with the 95% confidence region and the 95% prediction region, i.e., the region into which the sensitivity and specificity of future studies are likely to fall. The bivariate models were fitted separately for clinicians with vs. without AI assistance because the Metandi package could not handle the paired design of the data. We applied a random-effects model to account for anticipated heterogeneity across studies, potentially due to variance in the data, including the use of different AI algorithms, medical professionals, and study settings. Heterogeneity was assessed by visual inspection of graphics, including the SROC curve and forest plots 45 , 46 . Additionally, we conducted bivariate meta-regression analyses using the Meqrlogit package (Stata 17, College Station, TX), by the presence or absence of AI assistance, separately for each level of experience in dermatology (dermatologists, residents, non-dermatology medical professionals), type of diagnostic task (binary classification or top diagnosis), and type of image (clinical or dermoscopic), to compare diagnostic accuracy by AI assistance and adjust for the potential heterogeneity caused by these factors 47 . To investigate the presence of a positive threshold effect, the Spearman correlation coefficient between sensitivity and specificity was computed 48 . Pre-planned sensitivity analyses were conducted by excluding potential outliers 49 , studies with poor methodology (at least three domains rated as unclear or high risk of bias), and studies with reference standards other than histopathology alone. We examined publication bias using Deeks' funnel plot asymmetry test, which regresses the diagnostic odds ratio against the effective sample size 50 . We calculated κ statistics to evaluate agreement between the QUADAS-2 assessors. All statistical significance was determined at p  < 0.05.
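For intuition about the pooling step, the following Python sketch shows a deliberately simplified, univariate stand-in: it combines logit-transformed study sensitivities with a DerSimonian-Laird random-effects estimate. The study counts are invented, and this is not the bivariate Stata (metandi) analysis the authors performed.

```python
# Simplified univariate random-effects pooling sketch; not the authors'
# bivariate analysis. (tp, fn) counts per study are invented.
import numpy as np

studies = [(40, 12), (55, 20), (30, 8), (70, 25), (45, 15)]

logits, variances = [], []
for tp, fn in studies:
    tp, fn = tp + 0.5, fn + 0.5          # continuity correction
    p = tp / (tp + fn)
    logits.append(np.log(p / (1 - p)))   # logit-transformed sensitivity
    variances.append(1 / tp + 1 / fn)    # approximate variance of the logit

logits, variances = np.array(logits), np.array(variances)

# DerSimonian-Laird between-study variance tau^2
w = 1 / variances
fixed = np.sum(w * logits) / np.sum(w)
Q = np.sum(w * (logits - fixed) ** 2)
tau2 = max(0.0, (Q - (len(studies) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooled logit, back-transformed to a pooled sensitivity
w_re = 1 / (variances + tau2)
pooled_logit = np.sum(w_re * logits) / np.sum(w_re)
print("Pooled sensitivity:", 1 / (1 + np.exp(-pooled_logit)))
```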

Data availability

E.L. has full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All study materials are available from the corresponding author upon reasonable request.

Code availability

The codes used in the analysis of this study will be made available from the corresponding author upon reasonable request.

Brynjolfsson, E. & Mitchell, T. What can machine learning do? Workforce implications. Science 358 , 1530–1534 (2017).


Wu, E. et al. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat. Med. 27 , 582–584 (2021).

Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2 , 719–731 (2018).


Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).

Kittler, H., Pehamberger, H., Wolff, K. & Binder, M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 3 , 159–165 (2002).

Marghoob, A. A. & Scope, A. The complexity of diagnosing melanoma. J. Investig. Dermatol. 129 , 11–13 (2009).

Tschandl, P. et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 20 , 938–947 (2019).

Article   PubMed   PubMed Central   Google Scholar  

Haenssle, H. A. et al. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann. Oncol. 31 , 137–143 (2020).

Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 , 115–118 (2017).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Tschandl, P. et al. Human–computer collaboration for skin cancer recognition. Nat. Med. 26 , 1229–1234 (2020).

Ngiam, K. Y. & Khor, I. W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20 , e262–e273 (2019).

Lee, S. et al. Augmented decision-making for acral lentiginous melanoma detection using deep convolutional neural networks. J. Eur. Acad. Dermatol. Venereol. 34 , 1842–1850 (2020).

Cho, S. I. et al. Dermatologist-level classification of malignant lip diseases using a deep convolutional neural network. Br. J. Dermatol. 182 , 1388–1394 (2020).

Han, S. S. et al. Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J. Investig. Dermatol. 140 , 1753–1761 (2020).

Jain, A. et al. Development and assessment of an artificial intelligence–based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Netw. Open 4 , e217249–e217249 (2021).

Muñoz-López, C. et al. Performance of a deep neural network in teledermatology: a single-centre prospective diagnostic study. J. Eur. Acad. Dermatol. Venereol. 35 , 546–553 (2021).

Jahn, A. S. et al. Over-detection of melanoma-suspect lesions by a CE-certified smartphone app: performance in comparison to dermatologists, 2D and 3D convolutional neural networks in a prospective data set of 1204 pigmented skin lesions involving patients’ perception. Cancers 14 , 3829 (2022).

Lucius, M. et al. Deep neural frameworks improve the accuracy of general practitioners in the classification of pigmented skin lesions. Diagnostics 10 , 969 (2020).

Han, S. S. et al. Evaluation of artificial intelligence-assisted diagnosis of skin neoplasms: a single-center, paralleled, unmasked Randomized Controlled Trial. J. Investig. Dermatol. 142 , 2353–2362.e2352 (2022).

Kim, Y. J. et al. Augmenting the accuracy of trainee doctors in diagnosing skin lesions suspected of skin neoplasms in a real-world setting: a prospective controlled before-and-after study. PLoS One 17 , e0260895 (2022).

Ba, W. et al. Convolutional neural network assistance significantly improves dermatologists’ diagnosis of cutaneous tumours using clinical images. Eur. J. Cancer 169 , 156–165 (2022).

Maron, R. C. et al. Artificial intelligence and its effect on dermatologists’ accuracy in dermoscopic melanoma image classification: web-based survey study. J. Med. Internet Res. 22 , e18091 (2020).

Bowker, A. H. A test for symmetry in contingency tables. J. Am. Stat. Assoc. 43 , 572–574 (1948).

Beltrami, E. J. et al. Artificial intelligence in the detection of skin cancer. J. Am. Acad. Dermatol. 87 , 1336–1342 (2022).

Young, A. T., Xiong, M., Pfau, J., Keiser, M. J. & Wei, M. L. Artificial intelligence in dermatology: a primer. J. Investig. Dermatol. 140 , 1504–1512 (2020).

Barata, C. et al. A reinforcement learning model for AI-based decision support in skin cancer. Nat. Med. 29 , 1941–1946 (2023).

Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digital Health 3 , e745–e750 (2021).

Krakowski, S. M., Haftor, D., Luger, J., Pashkevich, N. & Raisch, S. Humans and algorithms in organizational decision making: evidence from a field experiment. Acad. Manag. Proc. 2019 , 16633 (2019).

Article   Google Scholar  

Park, J. & Woo, S. E. Who likes artificial intelligence? personality predictors of attitudes toward artificial intelligence. J. Psychol. 156 , 68–94 (2022).

Vodrahalli, K., Daneshjou, R., Gerstenberg, T. & Zou, J. Do humans trust advice more if it comes from AI? An analysis of human-ai interactions. In Proc. 2022 AAAI/ACM Conference on AI, Ethics, and Society 763–777 (Association for Computing Machinery, Oxford, United Kingdom, 2022).

Ludolph, R. & Schulz, P. J. Debiasing health-related judgments and decision making: a systematic review. Med. Decis. Mak. 38 , 3–13 (2018).

Gaube, S. et al. Non-task expert physicians benefit from correct explainable AI advice when reviewing X-rays. Sci. Rep. 13 , 1383 (2023).

Breck, E., Polyzotis, N., Roy, S., Whang, S. & Zinkevich, M. Data validation for machine learning. In Proceedings of the Conference on Systems and Machine Learning, (2019)

Daneshjou, R., Smith, M. P., Sun, M. D., Rotemberg, V. & Zou, J. Lack of transparency and potential bias in artificial intelligence data sets and algorithms: a scoping review. JAMA Dermatol. 157 , 1362–1369 (2021).

Daneshjou, R. et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci. Adv. 8 , eabq6147 (2022).

Groh, M. et al. Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology with the Fitzpatrick 17k Dataset. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) , 1820-1828 (2021).

Sounderajah, V. et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open 11 , e047709 (2021).

Collins, G. S. et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open 11 , e048008 (2021).

Daneshjou, R. et al. Checklist for evaluation of image-based artificial intelligence reports in dermatology: CLEAR derm consensus guidelines from the international skin imaging collaboration artificial intelligence working group. JAMA Dermatol. 158 , 90–96 (2022).

Cacciamani, G. E. et al. PRISMA AI reporting guidelines for systematic reviews and meta-analyses on AI in healthcare. Nat. Med. 29 , 14–15 (2023).

McInnes, M. D. F. et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA 319 , 388–396 (2018).

Whiting, P. F. et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann. Intern. Med. 155 , 529–536 (2011).

Harbord, R. M. & Whiting, P. metandi: meta–analysis of diagnostic accuracy using hierarchical logistic regression. Stata J. 9 , 211–229 (2009).

Reitsma, J. B. et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J. Clin. Epidemiol. 58 , 982–990 (2005).

Macaskill P, T. Y., et al. editor(s). In Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy 1–46 (Cochrane, London, 2022).

Kim, K. W., Lee, J., Choi, S. H., Huh, J. & Park, S. H. Systematic review and meta-analysis of studies evaluating diagnostic test accuracy: a practical review for clinical researchers-Part I. General Guidance and Tips. Korean J. Radio. 16 , 1175–1187 (2015).

Takwoingi, Y. et al. Chapter 10: Undertaking meta-analysis. Draft version (4 October 2022) for inclusion in: Deeks, J. J., Bossuyt, P. M., Leeflang, M. M., Takwoingi, Y. In Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy 1–77 (Cochrane, London, 2022).

Zamora, J., Abraira, V., Muriel, A., Khan, K. & Coomarasamy, A. Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med. Res. Methodol. 6 , 31–31 (2006).

Harrer, M., Cuijpers, P., Furukawa, T. A. & Ebert, D. D. Doing Meta-Analysis With R: A Hands-On Guide , (Chapman & Hall/CRC Press, Boca Raton, FL and London, 2021).

Deeks, J. J., Macaskill, P. & Irwig, L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J. Clin. Epidemiol. 58 , 882–893 (2005).

Acknowledgements

This project received no specific funding. E.L. is supported by the National Institutes of Health: Mid-career Investigator Award in Patient-Oriented Research (K24AR075060) and Research Project Grant (R01AR082109). I.K. received research funding from Radiumhemmet Research Funds (009614) and H.E. received funding from Radiumhemmet Research Funds (211063, 181083), Region Stockholm (FoUI-962339, FoUI-972654), the Swedish Cancer Society (2111617Pj, 210406JCIA01) and the Swedish Research Council (202201534). The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.

Author information

These authors contributed equally: Isabelle Krakowski, Jiyeong Kim.

Authors and Affiliations

Center for Digital Health, Stanford University School of Medicine, Stanford, CA, USA

Isabelle Krakowski, Jiyeong Kim, Zhuo Ran Cai & Eleni Linos

Department of Oncology and Pathology, Karolinska Institutet, Stockholm, Sweden

Isabelle Krakowski & Hanna Eriksson

Department of Dermatology, Stanford University, Stanford, CA, USA

Department of Dermatology, Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA

Roxana Daneshjou

Department of Dermatology, Theme Inflammation, Karolinska University Hospital, Stockholm, Sweden

Theme Cancer, Unit of Head-Neck-, Lung- and Skin Cancer, Skin Cancer Center, Karolinska University Hospital, Stockholm, Sweden

Hanna Eriksson

Department of Education, University of Nicosia, Nicosia, Cyprus

Anastasia Lykou

Contributions

IK and JK contributed equally as joint first authors. Concept and design: EL, RD and IK. Literature search, screening process, data extraction and bias assessment: IK, JK and ZRC. Data analysis and interpretation: JK, AL, IK and EL. Drafting of the manuscript: IK and JK. Critical revision for important intellectual content and approval of the manuscript: All authors. Obtained funding: EL, HE and IK. Supervision: EL and AL.

Corresponding author

Correspondence to Eleni Linos.

Ethics declarations

Competing interests

H.E. has served in advisory roles and delivered presentations for Novartis, BMS, GSK and Pierre Fabre and has obtained industry-sponsored research funding from SkylineDx. R.D. is an AAD AI committee member and Associate Editor at the Journal of Investigative Dermatology, has received consulting fees from Pfizer, L'Oreal, Frazier Healthcare Partners, and has stock options in Revea and MDAlgorithms. All other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Krakowski, I., Kim, J., Cai, Z.R. et al. Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis. npj Digit. Med. 7, 78 (2024). https://doi.org/10.1038/s41746-024-01031-w

Received: 22 September 2023

Accepted: 05 February 2024

Published: 09 April 2024

DOI: https://doi.org/10.1038/s41746-024-01031-w
