Purdue University


Artificial Intelligence (AI)

AI for Systematic Review


Various AI tools can be valuable throughout the systematic review or evidence synthesis process. While there is broad consensus that AI tools offer significant utility across different review stages, it is imperative to understand their inherent biases and weaknesses. Moreover, ethical considerations such as copyright and intellectual property must remain at the forefront.

  • Application ChatGPT in conducting systematic reviews and meta-analyses
  • Are ChatGPT and large language models “the answer” to bringing us closer to systematic review automation?
  • Artificial intelligence in systematic reviews: promising when appropriately used
  • Harnessing the power of ChatGPT for automating systematic review process: methodology, case study, limitations, and future directions
  • In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic literature
  • Tools to support the automation of systematic reviews: a scoping review
  • The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study
  • Using artificial intelligence methods for systematic review in health sciences: A systematic review

AI Tools for Systematic Review

  • DistillerSR Securely automate every stage of your literature review to produce evidence-based research faster, more accurately, and more transparently at scale.
  • Rayyan A web-tool designed to help researchers working on systematic reviews, scoping reviews and other knowledge synthesis projects, by dramatically speeding up the process of screening and selecting studies.
  • RobotReviewer A machine learning system which aims to automate evidence synthesis.


BMJ Open, Volume 13, Issue 7

Artificial intelligence in systematic reviews: promising when appropriately used

  • http://orcid.org/0000-0003-1727-0608 Sanne H B van Dijk 1 , 2 ,
  • Marjolein G J Brusse-Keizer 1 , 3 ,
  • Charlotte C Bucsán 2 , 4 ,
  • http://orcid.org/0000-0003-1071-6769 Job van der Palen 3 , 4 ,
  • Carine J M Doggen 1 , 5 ,
  • http://orcid.org/0000-0002-2276-5691 Anke Lenferink 1 , 2 , 5
  • 1 Health Technology & Services Research, Technical Medical Centre , University of Twente , Enschede , The Netherlands
  • 2 Pulmonary Medicine , Medisch Spectrum Twente , Enschede , The Netherlands
  • 3 Medical School Twente , Medisch Spectrum Twente , Enschede , The Netherlands
  • 4 Cognition, Data & Education, Faculty of Behavioural, Management & Social Sciences , University of Twente , Enschede , The Netherlands
  • 5 Clinical Research Centre , Rijnstate Hospital , Arnhem , The Netherlands
  • Correspondence to Dr Anke Lenferink; a.lenferink@utwente.nl

Background Systematic reviews provide a structured overview of the available evidence in medical-scientific research. However, due to the increasing medical-scientific research output, it is a time-consuming task to conduct systematic reviews. To accelerate this process, artificial intelligence (AI) can be used in the review process. In this communication paper, we suggest how to conduct a transparent and reliable systematic review using the AI tool ‘ASReview’ in the title and abstract screening.

Methods Use of the AI tool consisted of several steps. First, the tool required training of its algorithm with several prelabelled articles prior to screening. Next, using a researcher-in-the-loop algorithm, the AI tool proposed the article with the highest probability of being relevant. The reviewer then decided on the relevancy of each proposed article. This process continued until the stopping criterion was reached. All articles labelled relevant by the reviewer were then screened on full text.

Results Considerations to ensure methodological quality when using AI in systematic reviews included: the choice of whether to use AI, the need of both deduplication and checking for inter-reviewer agreement, how to choose a stopping criterion and the quality of reporting. Using the tool in our review resulted in much time saved: only 23% of the articles were assessed by the reviewer.

Conclusion The AI tool is a promising innovation for the current systematic reviewing practice, as long as it is appropriately used and methodological quality can be assured.

PROSPERO registration number CRD42022283952.

  • systematic review
  • statistics & research methods
  • information technology

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:  https://creativecommons.org/licenses/by/4.0/ .

https://doi.org/10.1136/bmjopen-2023-072254


Strengths and limitations of this study

  • Potential pitfalls regarding the use of artificial intelligence in systematic reviewing were identified.
  • Remedies for each pitfall were provided to ensure methodological quality.
  • A time-efficient approach is suggested on how to conduct a transparent and reliable systematic review using an artificial intelligence tool.
  • The artificial intelligence tool described in the paper was not evaluated for its accuracy.

Medical-scientific research output has grown exponentially since the very first medical papers were published. 1–3 The output in the field of clinical medicine has increased and continues to do so. 4 To illustrate, a quick PubMed search for ‘cardiology’ shows a fivefold increase in annual publications, from 10 420 (2007) to 52 537 (2021). Although the growth rate of medical-scientific output is not higher than that of other scientific fields, 1–3 this field creates the largest output. 3 Staying updated by reading all published articles is therefore not feasible. However, systematic reviews facilitate up-to-date and accessible summaries of evidence, as they synthesise previously published results in a transparent and reproducible manner. 5 6 Hence, conclusions can be drawn that provide the highest considered level of evidence in medical research. 5 7 Systematic reviews are therefore not only crucial in science, but also have a large impact on clinical practice and policy-making. 6 They are, however, highly labour-intensive to conduct due to the necessity of screening a large number of articles, which results in a high consumption of research resources. Thus, efficient and innovative reviewing methods are desired. 8
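For illustration, such a quick count can be reproduced programmatically. Below is a minimal sketch using Biopython's Entrez interface (an assumption on our part; the paper only mentions a manual PubMed search). The email address is a placeholder you must set yourself, and the counts drift over time as PubMed is updated.

```python
# Sketch: reproduce the quick 'cardiology' PubMed counts via NCBI E-utilities.
# Requires: pip install biopython. NCBI asks that you set a real email address.
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder

def pubmed_count(term: str, year: int) -> int:
    """Return the number of PubMed records matching `term` published in `year`."""
    handle = Entrez.esearch(db="pubmed", term=term, datetype="pdat",
                            mindate=str(year), maxdate=str(year), retmax=0)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])

for year in (2007, 2021):
    print(year, pubmed_count("cardiology", year))  # roughly 10 420 vs 52 537 in the paper
```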

An open-source artificial intelligence (AI) tool ‘ASReview’ 9 was published in 2021 to facilitate the title and abstract screening process in systematic reviews. The tool enables researchers to conduct more efficient systematic reviews: simulations have already shown its time-saving potential. 9–11 We used the tool in the study selection of our own systematic review and came across scenarios that needed consideration to prevent loss of methodological quality. In this communication paper, we provide a reliable and transparent AI-supported systematic reviewing approach.

We first describe how the AI tool was used in a systematic review conducted by our research group. For more detailed information regarding searches and eligibility criteria of the review, we refer to the protocol (PROSPERO registry: CRD42022283952). Subsequently, when deciding on the AI screening-related methodology, we applied appropriate remedies against foreseen scenarios and their pitfalls to maintain a reliable and transparent approach. These potential scenarios, pitfalls and remedies will be discussed in the Results section.

In our systematic review, the AI tool ‘ASReview’ (V.0.17.1) 9 was used for the screening of titles and abstracts by the first reviewer (SHBvD). The tool uses an active researcher-in-the-loop machine learning algorithm to rank the articles from high to low probability of eligibility for inclusion by text mining. The AI tool offers several classifier models by which the relevancy of the included articles can be determined. 9 In a simulation study using six large systematic review datasets on various topics, a Naïve Bayes (NB) classifier combined with term frequency-inverse document frequency (TF-IDF) feature extraction outperformed other model settings. 10 The NB classifier estimates the probability of an article being relevant, based on TF-IDF measurements. TF-IDF measures how characteristic a certain word is of an article, relative to the total number of articles the word appears in. 12 This combination of NB and TF-IDF was chosen for our systematic review.
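For illustration, the NB-plus-TF-IDF ranking idea can be sketched in a few lines of scikit-learn. This is a simplified stand-in, not ASReview's actual implementation (which adds balancing, feature and query strategies on top), and the records are invented:

```python
# Minimal sketch of NB + TF-IDF relevance ranking (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical title+abstract records; the first three act as 'prior knowledge'.
records = [
    "COPD exacerbations and home telemonitoring outcomes",   # relevant
    "Surgical technique for knee replacement",               # irrelevant
    "Crop yield prediction with satellite imagery",          # irrelevant
    "Telehealth for chronic obstructive pulmonary disease",  # unseen
    "Deep learning for protein folding",                     # unseen
]
labelled = [0, 1, 2]
labels = [1, 0, 0]  # 1 = relevant, 0 = irrelevant

# TF-IDF weighs words that characterise a record relative to the whole corpus.
X = TfidfVectorizer(stop_words="english").fit_transform(records)

clf = MultinomialNB().fit(X[labelled], labels)

unseen = [i for i in range(len(records)) if i not in labelled]
probs = clf.predict_proba(X[unseen])[:, 1]        # P(relevant) per unseen record
ranked = sorted(zip(unseen, probs), key=lambda p: -p[1])
print(ranked)  # the COPD telehealth record should rank first
```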

Before the AI tool can be used for the screening of relevant articles, its algorithm needs training with at least one relevant and one irrelevant article (ie, prior knowledge). It is assumed that the more prior knowledge, the better the algorithm is trained at the start of the screening process, and the faster it will identify relevant articles. 9 In our review, the prior knowledge consisted of three relevant articles 13–15 selected from a systematic review on the topic 16 and three randomly picked irrelevant articles.

After training with the prior knowledge, the AI tool made a first ranking of all unlabelled articles (ie, articles not yet decided on eligibility) from highest to lowest probability of being relevant. The first reviewer read the title and abstract of the top-ranked article and made a decision (‘relevant’ or ‘irrelevant’) following the eligibility criteria. Next, the AI tool took this additional knowledge into account and made a new ranking. Again, the next top-ranked article was proposed to the reviewer, who made a decision regarding eligibility. This process of the AI making rankings and the reviewer making decisions, also called ‘researcher-in-the-loop’, was repeated until the predefined data-driven stopping criterion of (in our case) 100 subsequent irrelevant articles was reached. After the reviewer had rejected what the AI tool put forward as ‘most probably relevant’ a hundred times, it was assumed that there were no relevant articles left in the unseen part of the dataset.
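Continuing the sketch above, the researcher-in-the-loop cycle with this stopping rule might look as follows. `ask_reviewer` is a hypothetical placeholder for the human decision; the real tool handles this interaction in its own interface:

```python
# Sketch of the researcher-in-the-loop cycle with a '100 consecutive
# irrelevant records' stopping criterion (simplified; reuses the
# variables from the previous sketch).
STOPPING_CRITERION = 100

def ask_reviewer(text: str) -> int:
    """Placeholder for the human decision: 1 = relevant, 0 = irrelevant."""
    return int(input(f"Relevant? (1/0) {text[:80]} ") == "1")

consecutive_irrelevant = 0
while unseen and consecutive_irrelevant < STOPPING_CRITERION:
    clf.fit(X[labelled], labels)                 # retrain on all decisions so far
    probs = clf.predict_proba(X[unseen])[:, 1]
    top = unseen[int(probs.argmax())]            # highest-ranked unseen record
    decision = ask_reviewer(records[top])
    labelled.append(top)
    labels.append(decision)
    unseen.remove(top)
    consecutive_irrelevant = 0 if decision else consecutive_irrelevant + 1

relevant_for_full_text = [i for i, d in zip(labelled, labels) if d == 1]
```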

The articles that were labelled relevant during the title and abstract screening were each screened on full text independently by two reviewers (SHBvD and MGJB-K, AL, JvdP, CJMD, CCB) to minimise the influence of subjectivity on inclusion. Disagreements regarding inclusion were resolved by a third independent reviewer.

How to maintain reliability and transparency when using AI in title and abstract screening

A summary of the potential scenarios, and their pitfalls and remedies, when using the AI tool in a systematic review is given in table 1 . These potential scenarios should not be ignored, but acted on to maintain reliability and transparency. Figure 1 shows when and where to act on during the screening process reflected by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart, 17 from literature search results to publishing the review.


Flowchart showing when and where to act when using ASReview in systematic reviewing. Adapted from the PRISMA flowchart by Haddaway et al . 17


Table 1: Per-scenario overview of potential pitfalls and how to prevent them when using ASReview in a systematic review

In our systematic review, a first set of potentially relevant articles was identified by means of broad literature searches in several scientific databases, yielding 8456 articles, enough to expect the AI tool to be efficient in the title and abstract screening (scenario ① was avoided, see table 1 ). Subsequently, this complete set of articles was uploaded into reference manager EndNote X9 18 and review manager Covidence, 19 where 3761 duplicate articles were removed. Given that EndNote has quite low sensitivity in identifying duplicates, additional deduplication in Covidence was considered beneficial. 20 Deduplication is usually applied in systematic reviewing, 20 but is increasingly important prior to the use of AI: since multiple decisions regarding a duplicate article weigh more heavily than a single decision, duplicates can disproportionately influence classification and possibly the results ( table 1 , scenario ② ). In our review, a deduplicated set of articles was uploaded into the AI tool. Prior to the actual AI-supported title and abstract screening, the reviewers (SHBvD and AL, MGJB-K) trained themselves with a small selection of 74 articles. The first reviewer became familiar with the ASReview software, and all three reviewers learnt how to apply the eligibility criteria, to minimise personal influence on the article selection ( table 1 , scenario ③ ).
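To see why duplicates matter for the classifier, a toy deduplication pass might look like the sketch below. EndNote and Covidence match on far richer metadata (authors, year, journal) than this illustration does, and the records here are invented:

```python
# Toy deduplication pass: match on DOI when present, else a normalised title.
import re

def norm_title(title: str) -> str:
    """Normalise a title for comparison: lowercase, strip punctuation/whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

refs = [
    {"title": "AI in systematic reviews", "doi": "10.1234/a1"},
    {"title": "AI in Systematic Reviews.", "doi": ""},   # same study, no DOI
    {"title": "An unrelated study", "doi": "10.1234/b2"},
]

seen, deduped = set(), []
for ref in refs:
    keys = {norm_title(ref["title"])}
    if ref["doi"]:
        keys.add(ref["doi"])
    if keys & seen:        # any key already seen -> treat as duplicate
        continue
    seen |= keys
    deduped.append(ref)

print(len(deduped))  # 2
```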

Defining the stopping criterion used in the screening process is left to the reviewer. 9 An optimal stopping criterion in active learning is considered a perfectly balanced trade-off between a certain cost (in terms of time spent) of screening one more article versus the predictive performance (in terms of identifying a new relevant article) that could be increased by adding one more decision. 21 The optimal stopping criterion in systematic reviewing would be the moment that screening additional articles will not result in more relevant articles being identified. 22 Therefore, in our review, we predetermined a data-driven stopping criterion for the title and abstract screening as ‘100 consecutive irrelevant articles’ in order to prevent the screening from being stopped before or a long time after all relevant articles were identified ( table 1 , scenario ④ ).

Because the stopping criterion was reached after 1063 of the 4695 articles, only a part of the total number of articles was seen. This approach might therefore be sensitive to possible mistakes when articles are screened by only one reviewer, influencing the algorithm and possibly resulting in an incomplete selection of articles ( table 1 , scenario ③ ). 23 As a remedy, second reviewers (AL, MGJB-K) checked 20% of the titles and abstracts seen by the first reviewer. This 20% had a ratio of relevant versus irrelevant articles comparable to that of all articles seen. The percentage agreement and Cohen’s kappa (κ), a measure of inter-reviewer agreement above chance, were calculated to express the reliability of the decisions taken. 24 The decisions agreed in 96% of cases and κ was 0.83. A κ of at least 0.6 is generally considered high, 24 and thus it was assumed that the algorithm was reliably trained by the first reviewer.
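Both agreement measures are straightforward to compute. A sketch with invented decision vectors follows (the paper's actual double-screened sample gave 96% agreement and κ = 0.83):

```python
# Sketch: percentage agreement and Cohen's kappa on a double-screened sample.
# The decision vectors here are made up for illustration.
from sklearn.metrics import cohen_kappa_score

reviewer_1 = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
reviewer_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

agreement = sum(a == b for a, b in zip(reviewer_1, reviewer_2)) / len(reviewer_1)
kappa = cohen_kappa_score(reviewer_1, reviewer_2)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```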

The reporting of the use of the AI tool should be transparent. If the choices made regarding the use of the AI tool are not entirely reported ( table 1 , scenario ⑤ ), the reader will not be able to properly assess the methodology of the review, and review results may even be graded as low-quality due to the lack of transparent reporting. The ASReview tool offers the possibility to extract a data file providing insight into all decisions made during the screening process, in contrast to various other ‘black box’ AI-reviewing tools. 9 This file will be published alongside our systematic review to provide full transparency of our AI-supported screening. This way, the screening with AI is reproducible (remedy to scenario ⑥ , table 1 ).
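ASReview exports such a decision file from its project format. For a custom pipeline, the minimal equivalent would be a log of every decision in screening order; a sketch continuing the loop above (the CSV layout is our own, simpler than ASReview's export):

```python
# Sketch: persist the screening log so the AI-supported selection is reproducible.
import csv

with open("screening_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["order", "record_index", "title", "decision"])
    for order, (idx, decision) in enumerate(zip(labelled, labels), start=1):
        writer.writerow([order, idx, records[idx][:120], decision])
```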

Results of AI-supported study selection in a systematic review

We experienced an efficient process of title and abstract screening in our systematic review. Whereas the screening was performed on a database of 4695 articles, the stopping criterion was reached after 1063 articles, so 23% were seen. Figure 2A shows the proportion of articles identified as relevant at any point during the AI-supported screening process. It can be observed that the articles are indeed prioritised by the active learning algorithm: in the beginning, relatively many relevant articles were found, but this decreased as the stopping criterion (vertical red line) was approached. Figure 2B compares the screening progress when using the AI tool versus manual screening. At the moment the stopping criterion was reached, approximately 32 records would have been found if the titles and abstracts had been screened manually, compared with 142 articles labelled relevant using the AI tool. After the inter-reviewer agreement check, 142 articles proceeded to the full-text reviewing phase, of which 65 were excluded because they were not articles with an original research format, and three because the full text could not be retrieved. After full-text reviewing of the remaining 74 articles, 18 articles from 13 individual studies were included in our review. After snowballing, one additional article from a study already included was added.

Figure 2: Relevant articles identified after a certain number of titles and abstracts were screened using the AI tool compared with manual screening.

In our systematic review, the AI tool considerably reduced the number of articles to be screened. Since the AI tool is offered open source, many researchers may benefit from its time-saving potential in selecting articles. Choices in several scenarios regarding the use of AI, however, are still left open to the researcher, and need consideration to prevent pitfalls. These include the choice of whether or not to use AI by weighing the costs versus the benefits, the importance of deduplication, double screening to check inter-reviewer agreement, a data-driven stopping criterion to optimally use the algorithm’s predictive performance, and the quality of reporting of the AI-related methodology chosen. This communication paper is, to our knowledge, the first to elaborately explain and discuss these choices regarding the application of this AI tool in an example systematic review.

The main advantage of using the AI tool is the amount of time saved. Indeed, in our study, only 23% of the total number of articles were screened before the predefined stopping criterion was met. Assuming that all relevant articles were found, the AI tool saved 77% of the time for title and abstract screening. However, time should be invested to become acquainted with the tool. Whether the expected screening time saved outweighs this time investment is context-dependent (eg, researcher’s digital skills, systematic reviewing skills, topic knowledge). An additional advantage is that research questions previously unanswerable due to the insurmountable number of articles to screen in a ‘classic’ (ie, manual) review can now actually be answered. An example of the latter is a review screening over 60 000 articles, 25 which would probably never have been performed without AI supporting the article selection.

Since the introduction of the ASReview tool in 2021, it has been applied in seven published reviews. 25–31 An important note is that only one 25 clearly reported the AI-related choices in the Methods and a complete and transparent flowchart reflecting the study selection process in the Results section. Two reviews reported a relatively small number (<400) of articles to screen, 26 27 of which more than 75% were screened before the stopping criterion was met, so the amount of time saved was limited. Three reviews reported many initial articles (>6000) 25 28 29 and one reported 892 articles, 31 of which only 5%–10% needed to be screened. So in these reviews, the AI tool saved an impressive amount of screening time. In our systematic review, 3% of the articles were labelled relevant during the title and abstract screening and eventually <1% of all initial articles were included. These percentages are low, and in line with the three above-mentioned reviews (1%–2% and 0%–1%, respectively). 25 28 29 Still, relevancy and inclusion rates are much lower than in ‘classic’ systematic reviews. A study evaluating the screening process in 25 ‘classic’ systematic reviews showed that approximately 18% of articles were labelled relevant and 5% were actually included. 32 This difference is probably due to the more narrow literature searches of ‘classic’ reviews, conducted for feasibility purposes, which result in a higher proportion of included articles compared with AI-supported reviews.

In this paper, we show how we applied the AI tool, but we did not evaluate it in terms of accuracy. This means that we have to deal with a certain degree of uncertainty. Despite the data-driven stopping criterion, there is a chance that relevant articles were missed, as 77% of articles were automatically excluded. If this was the case, it could, first, be due to wrong decisions by the reviewer, which would have undesirably influenced the training of the algorithm by which articles were labelled as (ir)relevant and the order in which they were presented to the reviewer. Relevant articles could thus have remained unseen if the stopping criterion was reached before they were presented. As a remedy, in our own systematic review, 20% of the articles screened by the first reviewer were also assessed for relevancy by another reviewer to assess inter-reviewer reliability, which was high. It should be noted, though, that ‘classic’ title and abstract screening is not necessarily better than using AI, as medical-scientific researchers tend to assess one out of nine abstracts wrongly. 32 Second, the AI tool may not have properly ranked articles from highly relevant to irrelevant. However, given that simulations have previously demonstrated this AI tool’s accuracy, 9–11 this was not considered plausible. Since our study applied, but did not evaluate, the AI tool, we encourage future studies to evaluate the performance of the tool across different scientific disciplines and contexts, since research suggests that the tool’s performance depends on the context, for example, the complexity of the research question. 33 This would not only enrich the knowledge about the AI tool, but also increase certainty about using it. Future studies should also investigate the effects of choices made regarding the amount of prior knowledge provided to the tool, the number of articles defining the stopping criterion, and how duplicate screening is best performed, to guide future users of the tool.

Although various researcher-in-the-loop AI tools for title and abstract screening have been developed over the years, 9 23 34 they often do not develop into usable mature software, 34 which impedes the permanent implementation of AI in research practice. For medical-scientific research practice, it would therefore be helpful if large systematic review institutions, like Cochrane and PRISMA, were to consider ‘officially’ making AI part of systematic reviewing practice. When guidelines on the use of AI in systematic reviews are made available and widely recognised, AI-supported systematic reviews can be uniformly conducted and transparently reported. Only then can we really benefit from AI’s time-saving potential and reduce our research time waste.

Our experience with the AI tool during the title and abstract screening was positive, as it greatly accelerated the literature selection process. However, users should apply appropriate remedies to scenarios that may threaten the methodological quality of the review. We have provided an overview of these scenarios, their pitfalls and remedies, to encourage reliable use and transparent reporting of AI in systematic reviewing. Given the importance of systematic reviews for medical guidelines and practice, and to ensure that conducting them remains feasible in the future, we consider this tool an important addition to the review process.

Ethics approval

Not applicable.


Contributors SHBvD proposed the methodology and conducted the study selection. MGJB-K, CJMD and AL critically reflected on the methodology. MGJB-K and AL contributed substantially to the study selection. CCB, JvdP and CJMD contributed to the study selection. The manuscript was primarily prepared by SHBvD and critically revised by all authors. All authors read and approved the final manuscript.

Funding The systematic review is conducted as part of the RE-SAMPLE project. RE-SAMPLE has received funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 965315).

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.


Using artificial intelligence methods for systematic review in health sciences: A systematic review

Affiliations

  • 1 Department of Pharmacotherapy, College of Pharmacy, University of Utah, Utah, USA.
  • 2 Faculty of Pharmacy, Chiang Mai University, Chiang Mai, Thailand.
  • 3 School of Computing, Robert Gordon University, Aberdeen, Scotland, UK.
  • 4 The Rowett Institute, University of Aberdeen, Aberdeen, Scotland, UK.
  • 5 School of Medicine, Faculty of Health and Medical Sciences, Taylors University, Selangor, Malaysia.
  • 6 School of Pharmacy, Monash University Malaysia, Selangor, Malaysia.
  • 7 IDEAS Center, Veterans Affairs Salt Lake City Healthcare System, Salt Lake City, Utah, USA.
  • PMID: 35174972
  • DOI: 10.1002/jrsm.1553

The exponential increase in published articles makes a thorough and expedient review of literature increasingly challenging. This review delineated automated tools and platforms that employ artificial intelligence (AI) approaches and evaluated the reported benefits and challenges in using such methods. A search was conducted in 4 databases (Medline, Embase, CDSR, and Epistemonikos) up to April 2021 for systematic reviews and other related reviews implementing AI methods. To be included, the review must use any form of AI method, including machine learning, deep learning, neural network, or any other applications used to enable the full or semi-autonomous performance of one or more stages in the development of evidence synthesis. Twelve reviews were included, using nine different tools to implement 15 different AI methods. Eleven methods were used in the screening stages of the review (73%). The rest were divided: two in data extraction (13%) and two in risk of bias assessment (13%). The ambiguous benefits of the data extractions, combined with the reported advantages from 10 reviews, indicate that AI platforms have taken hold with varying success in evidence synthesis. However, the results are qualified by the reliance on the self-reporting of the review authors. Extensive human validation still appears to be required at this stage in implementing AI methods, though further evaluation is needed to define the overall contribution of such platforms in enhancing efficiency and quality in evidence synthesis.

Keywords: artificial intelligence; evidence synthesis; machine learning; systematic reviews.

© 2022 John Wiley & Sons Ltd.

Publication types

  • Systematic Review
  • Artificial Intelligence*
  • Machine Learning
  • Systematic Reviews as Topic*


  • Correspondence
  • Published: 16 January 2023

PRISMA AI reporting guidelines for systematic reviews and meta-analyses on AI in healthcare

  • Giovanni E. Cacciamani   ORCID: orcid.org/0000-0002-8892-5539 1 , 2 , 3 , 4 , 5 ,
  • Timothy N. Chu 1 , 2 , 3 ,
  • Daniel I. Sanford 1 , 2 , 3 ,
  • Andre Abreu 1 , 2 , 3 , 4 , 5 ,
  • Vinay Duddalwar   ORCID: orcid.org/0000-0002-4808-5715 3 , 6 ,
  • Assad Oberai 7 , 8 ,
  • C.-C. Jay Kuo 9 ,
  • Xiaoxuan Liu 10 , 11 , 12 ,
  • Alastair K. Denniston   ORCID: orcid.org/0000-0001-7849-0087 10 , 11 , 12 , 13 , 14 ,
  • Baptiste Vasey 15 , 16 ,
  • Peter McCulloch   ORCID: orcid.org/0000-0002-3210-8273 15 ,
  • Robert F. Wolff 17 ,
  • Sue Mallett   ORCID: orcid.org/0000-0002-0596-8200 18 ,
  • John Mongan 19 , 20 ,
  • Charles E. Kahn Jr   ORCID: orcid.org/0000-0002-6654-7434 21 ,
  • Viknesh Sounderajah 22 ,
  • Ara Darzi   ORCID: orcid.org/0000-0001-7815-7989 22 ,
  • Philipp Dahm 23 ,
  • Karel G. M. Moons 24 ,
  • Eric Topol   ORCID: orcid.org/0000-0002-1478-4729 25 ,
  • Gary S. Collins   ORCID: orcid.org/0000-0002-2772-2316 26 ,
  • David Moher   ORCID: orcid.org/0000-0003-2434-4206 27 ,
  • Inderbir S. Gill 1 , 2 , 3 , 4 , 5 &
  • Andrew J. Hung 1 , 2 , 5  

Nature Medicine, volume 29, pages 14–15 (2023)

  • Research data
  • Translational research

Systematic reviews and meta-analyses play an essential part in guiding clinical practice at the point of care, as well as in the formulation of clinical practice guidelines and health policy 1 , 2 . There are three essential components to an impactful systematic review. First, the design of the study should be based upon a robust research question and search strategy. Second, bias should be minimized by using quality-assessment tools and study-design-specific eligibility criteria. Third, results should be reported transparently through adherence to expert-derived reporting items. Thousands of systematic reviews, including meta-analyses, are produced annually, with an increasing proportion reporting on artificial intelligence (AI) interventions in health care. With this rapid expansion, there is a need for reporting guidelines tailored to AI 3 , 4 , 5 , 6 , 7 that will support high-quality, reproducible, and clinically relevant systematic reviews.

AI is being rapidly integrated into society and medicine. A literature search of studies referencing AI in health care over the past 20 years returned more than 70,000 published articles. Given that interest in AI is reaching an all-time high, new concerns arise regarding the quality of these studies, including a lack of: clear explainability of how AI algorithms function; strong evidence of effectiveness in clinical settings; and standardized reporting within primary studies. Efforts have been made to improve understanding of this technology to allow for critical appraisal of AI interventions and to reduce inconsistencies in how studies are structured, as well as in the reporting of data, methods and results 3 , 5 , 7 . As systematic reviews on AI interventions increase, so does the importance of the transparency and reproducibility of reported data.


References

1. Page, M. J. et al. Syst. Rev. 10, 89 (2021).
2. Moher, D. et al. Epidemiology 22, 128 (2011).
3. Liu, X. et al. Nat. Med. 26, 1364–1374 (2020).
4. Mongan, J. et al. Radiol. Artif. Intell. 2, e200029 (2020).
5. Sounderajah, V. et al. Nat. Med. 26, 807–808 (2020).
6. Vasey, B. et al. Nat. Med. 28, 924–933 (2022).
7. Cruz Rivera, S. et al. Nat. Med. 26, 1351–1363 (2020).
8. The CONSORT-AI and SPIRIT-AI Steering Group. Nat. Med. 25, 1467–1468 (2019).
9. Sounderajah, V. et al. Nat. Med. 27, 1663–1665 (2021).
10. Moher, D. et al. PLoS Med. 7, e1000217 (2010).

Author information

Authors and affiliations

USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

Giovanni E. Cacciamani, Timothy N. Chu, Daniel I. Sanford, Andre Abreu, Inderbir S. Gill & Andrew J. Hung

AI Center at USC Urology, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA

Department of Radiology, University of Southern California, Los Angeles, CA, USA

Giovanni E. Cacciamani, Timothy N. Chu, Daniel I. Sanford, Andre Abreu, Vinay Duddalwar & Inderbir S. Gill

Center for Image-Guided and Focal Therapy for Prostate Cancer, Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

Giovanni E. Cacciamani, Andre Abreu & Inderbir S. Gill

Norris Comprehensive Cancer Center, Institute of Urology, Keck School of Medicine of the University of Southern California, Los Angeles, CA, USA

Giovanni E. Cacciamani, Andre Abreu, Inderbir S. Gill & Andrew J. Hung

USC Radiomics Laboratory, Keck School of Medicine, Department of Radiology, University of Southern California, Los Angeles, CA, USA

Vinay Duddalwar

Department of Aerospace and Mechanical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA

Assad Oberai

Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA

Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA

C.-C. Jay Kuo

University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK

Xiaoxuan Liu & Alastair K. Denniston

Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK

Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK

NIHR Birmingham Biomedical Research Centre, University of Birmingham, Birmingham, UK

Alastair K. Denniston

Health Data Research, London, UK

Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK

Baptiste Vasey & Peter McCulloch

Department of Surgery, Geneva University Hospital, Geneva, Switzerland

Baptiste Vasey

Kleijnen Systematic Reviews Ltd, Escrick, York, UK

Robert F. Wolff

Centre for Medical Imaging, University College London, London, UK

Sue Mallett

Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA

John Mongan

Center for Intelligent Imaging, University of California San Francisco, San Francisco, CA, USA

Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA

Charles E. Kahn Jr

Institute of Global Health Innovation, Imperial College London, London, UK

Viknesh Sounderajah & Ara Darzi

Minneapolis VAMC, Urology Section and University of Minnesota, Department of Urology, Minneapolis, MN, USA

Philipp Dahm

Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, The Netherlands

Karel G. M. Moons

Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK

Gary S. Collins

Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada

David Moher


Corresponding author

Correspondence to Giovanni E. Cacciamani .

Ethics declarations

Competing interests

P.D. serves as coordinating editor of Cochrane Urology . G.S.C. is the director of the UK EQUATOR Centre. D.M. is the director of the Canadian EQUATOR Centre. X.L. is an industry fellow (observer) with Hardian Health. A.J.H. is a consultant for Intuitive. I.S.G. is a consultant for STEBA. J.M. is a consultant for Siemens. C.E.K. receives salary support as editor of Radiology: Artificial Intelligence . V.D. is a consultant for Radmetrix Inc. and Westat Inc., and is an advisory board member to Deeptek Inc. A.K.D. is chair of the Health Security initiative at Flagship Pioneering UK Ltd. The other authors declare no competing interests.


Cacciamani, G.E., Chu, T.N., Sanford, D.I. et al. PRISMA AI reporting guidelines for systematic reviews and meta-analyses on AI in healthcare. Nat Med 29 , 14–15 (2023). https://doi.org/10.1038/s41591-022-02139-w


This article is cited by:

  • Ligero, M., Gielen, B., … Perez-Lopez, R. A whirl of radiomics-based biomarkers in cancer immunotherapy, why is large scale validation still lacking? npj Precision Oncology (2024).
  • Cacciamani, G. E., Chen, A., … Hung, A. J. Artificial intelligence and urology: ethical considerations for urologists and patients. Nature Reviews Urology (2024).
  • Tejani, A. S., Klontzas, M. E., … Kahn, C. E. Updating the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) for reporting AI research. Nature Machine Intelligence (2023).



The APRA

Mastering Systematic Literature Reviews with AI Tools

Instructor: Dr. IQ

About Course

This comprehensive course is designed to provide learners with the knowledge and skills required to conduct high-quality systematic literature reviews using AI tools. The course will guide learners through every stage of the systematic literature review process, from conceptualizing and developing effective research questions, to identifying gaps in literature, developing an effective search strategy, synthesizing literature using AI tools, and finally reporting results using AI tools in accordance with established guidelines. Additionally, the course will provide a thorough understanding of the strengths and limitations of AI tools in literature reviews, and learners will have ample opportunities to apply these tools in practice, through interactive and practical learning methods. By the end of the course, learners will have a firm grasp of the most effective methods to utilize AI tools in their systematic literature reviews for research articles and PhD dissertations.

Teaching Methodology:

The course will be conducted through a combination of live sessions, video tutorials, interactive sessions, and review of and feedback on the work submitted by learners. The live sessions will run over four weeks, with one session per week, delivered via Zoom; they will provide an opportunity for learners to interact with the course instructor and their peers, ask questions, and receive feedback on their progress.

The video tutorials will cover various aspects of conducting literature reviews using AI tools, and learners will be able to access these tutorials at their own pace. The interactive sessions will be designed to allow learners to apply the knowledge and skills gained in the course to their own research projects. Learners will also have the opportunity to receive feedback on their work and engage in peer-to-peer review.

Assessment and Feedback:

Learners will be assessed through a combination of assignments, quizzes, and a final project. The final project will be a literature review using AI tools, which learners will develop and submit for feedback. The instructor will provide feedback on learners’ progress throughout the course, and learners will also have the opportunity to engage in peer-to-peer review.

What Will You Learn?

  • Module 1: Introduction to Systematic Literature Reviews and AI Tools
  • Overview of the course and learning objectives
  • Introduction to systematic literature reviews and their importance in research
  • Types of AI tools available for use in literature reviews
  • Advantages and disadvantages of using AI tools in literature reviews
  • Module 2: Formulating effective research questions for Systematic Review
  • Formulating research questions and identifying search terms using AI tools
  • Designing search strategies and selecting appropriate databases
  • Conducting effective searches using AI tools
  • Prompt Engineering for Developing effective research questions using ChatGPT and AI
  • Module 3: Conducting a Systematic Literature Review using AI Tools
  • Screening literature for relevance using AI tools
  • Extracting data from selected literature using AI tools
  • Analyzing and synthesizing data using AI tools
  • Prompt Engineering for summarizing, comparing and synthesizing literature using ChatGPT and AI tools
  • Module 4: Reporting a Systematic Literature Review using AI Tools
  • Using AI tools to generate summary reports and visualizations
  • Preparing research articles for high-impact journals
  • Examples and article templates
  • Tools included
  • RStudio (Biblioshiny)
  • Microsoft Excel

Course Content

  • Module 1: Introduction to Systematic Literature Reviews and AI Tools
    • Live Session 1: Introduction to the course and working template overview
    • Introduction to systematic literature review
  • Module 2: Formulating Effective Research Questions for Systematic Review
    • Fundamentals of research questions in systematic literature review
    • Developing a research question using AI
    • Live Session 2 (Friday): Using ChatGPT for developing the introduction
    • Live Session 2 (Sunday): Using ChatGPT for developing the introduction
    • From research questions to keywords
  • Module 3: Conducting a Systematic Literature Review using AI Tools
    • Live Session 3: Literature search using PRISMA guidelines and descriptive analysis
    • Live Session 3 (Sunday): PRISMA guidelines and descriptive analysis
    • Data filtration in Microsoft Excel
    • Microsoft Excel: how to merge files into one sheet (see the sketch after this list)
    • How to install the Biblioshiny R package
    • Working with the Biblioshiny R package for bibliometrics
    • Literature clustering using thematic maps in Biblioshiny
  • Module 4: Reporting a Systematic Literature Review using AI Tools
    • Live Session 4 (Friday): Using AI for writing within themes/clusters
    • How to generate themes (Biblioshiny)
    • Merging clusters into themes
    • Live Session 4 (Sunday): How to use Jenni.ai and writing the conclusion
  • Additional learning material
    • Bibliometrics using VOSviewer
    • How to use ChatGPT for systematic literature review
    • How to write a high-quality abstract
    • Miro graphics: collaborative brainstorming and idea generation
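As a taste of Module 3's "merge files into one sheet" step, here is a minimal pandas sketch. The `exports` directory, the file names it would contain and the `Title` column are assumptions about the database export format, not part of the course materials:

```python
# Sketch of merging several database export files into one deduplicated sheet.
from pathlib import Path
import pandas as pd

exports = sorted(Path("exports").glob("*.xlsx"))   # hypothetical: scopus.xlsx, wos.xlsx, ...
frames = [pd.read_excel(path) for path in exports]

merged = pd.concat(frames, ignore_index=True)
merged = merged.drop_duplicates(subset=["Title"])  # assumes exports share a 'Title' column
merged.to_excel("merged_records.xlsx", index=False)
```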



Searching for Systematic Reviews & Evidence Synthesis: AI tools in evidence synthesis


Introduction

A variety of AI tools can be used during the systematic review or evidence synthesis process. These may be used to assist with developing a search strategy; locating relevant articles or resources; or during the data screening, data extraction or synthesis stage. They can also be used to draft plain language summaries.

The overall consensus is that AI tools can be very useful at different stages of a systematic or other evidence review, but that it is important to fully understand any biases and weaknesses they may bring to the process. In many cases, new AI tools that previous research has not assessed rigorously should be used in conjunction with existing validated methods. It is also essential to consider ethical, copyright and intellectual property issues, for example if the process involves uploading data or the full text of articles to an AI tool.

Below are some recently published articles on the topic:

  • Alshami, A.; Elsayed, M.; Ali, E.; Eltoukhy, A.E.E.; Zayed, T. Harnessing the Power of ChatGPT for Automating Systematic Review Process: Methodology, Case Study, Limitations, and Future Directions . Systems 2023, 11, 351. https://doi.org/10.3390/systems11070351 Explores the use of ChatGPT in (1) Preparation of Boolean research terms and article collection, (2) Abstract screening and articles categorization, (3) Full-text filtering and information extraction, and (4) Content analysis to identify trends, challenges, gaps, and proposed solutions.
  • Blaizot, A, Veettil, SK, Saidoung, P, et al. Using artificial intelligence methods for systematic review in health sciences: A systematic review. Res Syn Meth. 2022; 13(3): 353-362. doi: 10.1002/jrsm.1553 This review delineated automated tools and platforms that employ artificial intelligence (AI) approaches and evaluated the reported benefits and challenges in using such methods. The authors report the usage of Rayyan; RobotReviewer; EPPI-Reviewer; K-means; SWIFT-Review; SWIFT-Active Screener; Abstrackr; WordStat; Qualitative Data Analysis (QDA) Miner; and NLP, and assess the quality of the reviews which used these tools.
  • Kebede, MM, Le Cornet, C, Fortner, RT.  In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic literature.   Res Syn Meth . 2023; 14(2): 156-172. doi: 10.1002/jrsm.1589 "We aimed to evaluate the performance of supervised machine learning algorithms in predicting articles relevant for full-text review in a systematic review." "Implementing machine learning approaches in title/abstract screening should be investigated further toward refining these tools and automating their implementation"  

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review .  J Clin Epidemiol  2022;  144:  22-42  https://www.jclinepi.com/article/S0895-4356(21)00402-9/fulltext  "The current scoping review identified that LitSuggest, Rayyan, Abstractr, BIBOT, R software, RobotAnalyst, DistillerSR, ExaCT and NetMetaXL have potential to be used for the automation of systematic reviews. However, they are not without limitations. The review also identified other studies that employed algorithms that have not yet been developed into user friendly tools. Some of these algorithms showed high validity and reliability but their use is conditional on user knowledge of computer science and algorithms."

Khraisha Q, Put S, Kappenberg J, Warraitch A, Hadfield K.  Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages .  Res Syn Meth . 2024; 1-11. doi: 10.1002/jrsm.1715 "Although our findings indicate that, currently, substantial caution should be exercised if LLMs are being used to conduct systematic reviews, they also offer preliminary evidence that, for certain review tasks delivered under specific conditions, LLMs can rival human performance."

Mahuli, S., Rai, A., Mahuli, A. et al. Application ChatGPT in conducting systematic reviews and meta-analyses . Br Dent J 235, 90–92 (2023). https://doi.org/10.1038/s41415-023-6132-y Explores using ChatGPT for conducting Risk of Bias analysis and data extraction from a randomised controlled trial.

Ovelman, C., Kugley, S., Gartlehner, G., & Viswanathan, M. (2024). The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study . Cochrane Evidence Synthesis and Methods, 2(2), e12041.  https://onlinelibrary.wiley.com/doi/abs/10.1002/cesm.12041 

Qureshi, R., Shaughnessy, D., Gill, K.A.R.  et al.   Are ChatGPT and large language models “the answer” to bringing us closer to systematic review automation? .  Syst Rev   12 , 72 (2023). https://doi.org/10.1186/s13643-023-02243-z "Our experience from exploring the responses of ChatGPT suggest that while ChatGPT and LLMs show some promise for aiding in SR-related tasks, the technology is in its infancy and needs much development for such applications. Furthermore, we advise that great caution should be taken by non-content experts in using these tools due to much of the output appearing, at a high level, to be valid, while much is erroneous and in need of active vetting."

van Dijk SHB, Brusse-Keizer MGJ, Bucsán CC, et al. Artificial intelligence in systematic reviews: promising when appropriately used . BMJ Open 2023;13:e072254. doi: 10.1136/bmjopen-2023-072254  Suggests how to conduct a transparent and reliable systematic review using the AI tool ‘ASReview’ in the title and abstract screening.

An update on machine learning AI in systematic reviews

June 2023 webinar including a panel discussion exploring the use of machine learning AI in Covidence (screening & data extraction tool).

CLEAR Framework for Prompt Engineering

  • The CLEAR path: A framework for enhancing information literacy through prompt engineering. This article introduces the CLEAR Framework for Prompt Engineering, designed to optimize interactions with AI language models like ChatGPT. The framework encompasses five core principles—Concise, Logical, Explicit, Adaptive, and Reflective—that facilitate more effective AI-generated content evaluation and creation. An example prompt is sketched below. Lo, L. S. (2023). The CLEAR path: A framework for enhancing information literacy through prompt engineering. The Journal of Academic Librarianship, 49(4), 102720.
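To make the principles concrete, here is an invented example (ours, not taken from Lo's article) of a screening-stage prompt written along CLEAR lines:

"You are screening titles and abstracts for a systematic review of telemonitoring in COPD (Explicit context and role). Classify the record below as relevant or irrelevant according to the three inclusion criteria listed after it (Concise, Logical task). If the abstract gives too little information to decide, say 'uncertain' rather than guessing (Adaptive). End with one sentence justifying your decision (Reflective)."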

Selection of AI tools used in Evidence Synthesis

  • Systematic Review Toolbox The Systematic Review Toolbox is an online catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process.
  • Rayyan Free web-tool designed to speed up the process of screening and selecting studies
  • Abstrackr Aids in citation screening. Please note you will need to create a free account before accessing the tool.
  • DistillerSR An online application designed to automate all stages of systematic literature reviews. Priced packages are available (please note we cannot offer support on using this system).
  • ExaCT An information extraction system trained to find key information from scientific clinical trial publications, namely descriptions of the trial's interventions, population, outcome measures, funding sources, and other critical characteristics. Please note you will need to request a free account before accessing the tool.
  • RobotReviewer RobotReviewer is a machine learning system which aims to support evidence synthesis. The demonstration website allows users to upload RCT articles and see automatically determined information concerning the trial conduct (the 'PICO', study design, and whether there is a risk of bias).

Selection of tools to support the automation of systematic reviews (2022)

Khalil H, Ameen D, Zarnegar A. Tools to support the automation of systematic reviews: a scoping review. J Clin Epidemiol. 2022 Apr;144:22-42. doi: 10.1016/j.jclinepi.2021.12.005. Epub 2021 Dec 8. PMID: 34896236. https://www.sciencedirect.com/science/article/pii/S0895435621004029 [accessed 06-11-23].

Summary of validated tools available for each stage of the review (Table 4 in Khalil et al., 2022; reproduced in the guide as a screenshot).

King’s guidance on generative AI for teaching, assessment and feedback

  • King’s guidance on generative AI for teaching, assessment and feedback This comprehensive guidance aims to support the adoption and integration of generative AI at different institutional levels: macro (university), meso (department, programme, module), and micro (individual lecturers, especially those with assessment roles).

Leveraging GPT-4 for Systematic Reviews

Recording of 1 hour webinar exploring Artificial Intelligence (AI) and its potential impact on the process of systematic reviews (August 15th, 2023). Note PICO Portal is a systematic review platform that leverages artificial intelligence to accelerate research and innovation.

Moderator: Dr Greg Martin. Presenters: Eitan Agai, PICO Portal Founder & AI Expert; Riaz Qureshi, University of Colorado Anschutz Medical Campus; Kevin Kallmes, Chief Executive Officer, Cofounder; Jeff Johnson, Chief Design Officer.



Systematic and Literature Reviews



Can Artificial Intelligence (AI) tools such as ChatGPT be used to produce systematic reviews?

ChatGPT can certainly produce a convincing-looking review, but there are a few issues:

  • It hallucinates, and this includes referencing: ChatGPT will produce convincing-looking references that do not refer to an actual source.
  • Its answers may not include recent evidence, depending on when the corpus of literature ChatGPT was trained on dates to, and how often it is retrained or updated, which we don't know.
  • For general enquiries these tools may produce 'good enough' answers, but for systematic reviews there is an expectation of transparency of method, rigour of assessment and so forth. These are absent from ChatGPT answers: we don't know what it searched or how, or how it selected the references it chose to use in its answer.

So this all sounds terrible - as quick and plausible as ChatGPT may be, the response it produces may include false information, false sources and a completely opaque methodology. So is there any way it can be used?

  • This does not extend to writing search strategies. ChatGPT can produce a convincing search strategy, but - surprise! - it has been shown to make up components such as MeSH terms that don't exist. While it handles Boolean operators easily, so far it seems not to make use of functions such as truncation, wildcard characters or proximity searches (in Ovid MEDLINE syntax, for example, a fragment like (screen* adj3 automat*).ti,ab. combines truncation with a proximity operator).
  • Elicit.org. Elicit does not generate answers; rather, it uses the same Large Language Model (LLM) training as ChatGPT to interpret your question. It then searches the 115M papers in the Semantic Scholar Academic Graph database and shows ranked snippets from the best results. It is better used to find research (especially on difficult-to-search topics) and generate ideas than to produce any form of ready-to-go answer. It can be prompted to extract aspects of interest from the results, such as population, outcomes measured or main findings.
  • Perplexity.ai. Perplexity combines AI with web search to produce ready-made answers. It cites its sources, which are real but tend not to be scholarly. Again, it is probably better suited to generating ideas and identifying sources than to making any significant contribution to producing a review.

Of course all of this will change. The use of AI for evidence synthesis is a rapidly developing field, but for clinical use it will still be necessary that syntheses meet the underlying standards of transparency and rigour which are so far absent. Keep this in mind when reading the latest tech hype.

Further Reading

Guidance for Authors, Peer Reviewers, and Editors on Use of AI, Language Models, and Chatbots - JAMA July 2023

Systematic Reviewing and ChatGPT - PICO Portal webinar

What academic research is ChatGPT accessing?  LinkedIn post

How Q&A systems based on large language models (eg GPT4) will change things if they become the dominant search paradigm - 9 implications for libraries.  Blog

Using large language models like GPT to do Q&A over papers (II) — using Perplexity.ai (free) over CORE, Scite.ai, Semantic Scholar etc. domains. Blog

Academic Publishers Are Missing the Point on ChatGPT . Blog - Scholarly Kitchen.

Using artificial intelligence methods for systematic review in health sciences: A systematic review. Res Syn Meth. 2022;13(3):353-362. doi:10.1002/jrsm.1553

  • << Previous: Scoping Reviews
  • Next: Library Support >>
  • Last Updated: Feb 9, 2024 1:35 PM
  • URL: https://libguides.mh.org.au/systematic_and_literature_reviews


System for Systematic Literature Review Using Multiple AI agents: Concept and an Empirical Evaluation


Systematic Literature Reviews (SLRs) have become the foundation of evidence-based studies, enabling researchers to identify, classify, and combine existing studies based on specific research questions. Conducting an SLR is largely a manual process. Over recent years, researchers have made significant progress in automating certain phases of the SLR process, aiming to reduce the effort and time needed to carry out high-quality SLRs. However, there is still a lack of AI-agent-based models that automate the entire SLR process. To this end, we introduce a novel multi-AI-agent model designed to fully automate the process of conducting an SLR. By utilizing the capabilities of Large Language Models (LLMs), our proposed model streamlines the review process, enhancing efficiency and accuracy. The model operates through a user-friendly interface where researchers input their topic, and in response, the model generates a search string used to retrieve relevant academic papers. Subsequently, inclusion and exclusion filtering is applied, focusing on titles relevant to the specific research area. The model then autonomously summarizes the abstracts of these papers, retaining only those directly related to the field of study. In the final phase, the model conducts a thorough analysis of the selected papers in relation to predefined research questions. This paper details the development of the model and its operational framework, and demonstrates how it significantly reduces the time and effort traditionally required for an SLR while ensuring a high level of comprehensiveness and precision. We also conducted an evaluation of the proposed model by sharing it with ten competent software engineering researchers for testing and analysis. The researchers expressed strong satisfaction with the proposed model and provided feedback for further improvement. In the future, we plan to engage 50 practitioners and researchers to evaluate our model. Additionally, we aim to present our model to the audience at the SANER 2024 conference in Rovaniemi (Finland) for further testing, analysis, and feedback collection. The code for this project can be found on the GitHub repository at https://github.com/GPT-Laboratory/SLR-automation.

Keywords: Systematic Literature Review, Large Language Model, AI Agent, Software Engineering

1 Introduction

The Systematic Literature Review (SLR) is a fundamental component of academic research, offering a comprehensive and unbiased overview of existing literature on a specific topic Keele et al. ( 2007 ) . It involves a structured methodology for identifying, evaluating, and synthesizing all relevant research to address clearly defined research questions Kitchenham et al. ( 2009 ) . This process is critical for establishing the context and foundation of new research, identifying gaps in current knowledge, and informing future research directions van Dinter et al. ( 2021 ) . However, conducting an SLR is inherently time-consuming and labor-intensive. It requires meticulous planning, extensive searching, and rigorous screening of large volumes of literature. The complexity and scale of this task, especially in fields with vast and rapidly expanding bodies of work, can be daunting and resource-intensive. The challenge lies not only in the collection of relevant literature but also in the accurate synthesis and interpretation of the gathered data.

The emergence of Large Language Models (LLMs) in Artificial Intelligence (AI) presents new opportunities for automating and streamlining the SLR process Rasheed et al. ( 2024a ) , Rasheed et al. ( 2023 ) . LLMs, trained on extensive datasets of text, are adept at understanding and generating human-like language Carlini et al. ( 2021 ) . They can process and analyze large volumes of text rapidly, offering insights and summaries that would take humans significantly longer to compile. Their ability to understand context and nuances in language makes them particularly useful for tasks like identifying relevant literature, extracting key information, and summarizing research findings Hou et al. ( 2023 ) . By automating the more tedious and repetitive aspects of the SLR process, LLMs can significantly reduce the time and effort required, allowing researchers to focus on the more nuanced aspects of their research Rasheed et al. ( 2024b ) .

In this context, our proposed model utilizes the capabilities of LLMs to automate the whole SLR process. We developed a multi-AI agent model that automates each step of the SLR, from the initial literature search to the final analysis. The model begins with a simple user input: researchers enter their topic into a designated text box. This input is then processed by the LLM, which generates a precise search string tailored to retrieve the most relevant academic papers. The model’s next phase involves an intelligent filtering mechanism. It applies inclusion and exclusion criteria, screening titles and abstracts to retain only those studies that are directly relevant to the specified research area.

The final stage of our model autonomously summarizes the abstracts of the selected papers, ensuring that only content pertinent to the research questions is retained. It introduces a level of precision and consistency in data analysis that is challenging to achieve manually. Finally, the model conducts an in-depth analysis of the selected papers, aligning its findings directly with the research questions. This comprehensive approach ensures that the final output is not only a reflection of the vast array of literature available but also a focused and relevant resource tailored to the specific needs of the researcher. Our model, therefore, stands as a testament to the potential of integrating advanced AI technologies in academic research methodologies.

We also evaluated the efficiency and accuracy of our proposed model by sharing it with ten proficient software engineering researchers for a comprehensive test and analysis. The feedback received was overwhelmingly positive, highlighting the model’s effectiveness and paving the way for further enhancements. Looking ahead, we aim to expand our evaluation by involving 50 additional practitioners and researchers. Furthermore, we intend to showcase our model at the upcoming SANER 2024 conference in Rovaniemi (Finland), seeking to broaden its testing and analysis while gathering valuable feedback from a wider audience. This step is crucial in refining our model and ensuring its applicability and robustness in diverse real-world scenarios.

Our contribution can be summarized as follows:

We propose a novel multi-AI agent model that utilizes LLMs to automate the SLR process, significantly enhancing efficiency and accuracy.

Our model was evaluated by ten experienced software engineering researchers and practitioners, confirming its effectiveness and gathering insights for further refinement.

We plan to extend the evaluation to 50 additional practitioners and researchers and present the model at the SANER 2024 conference in Rovaniemi (Finland) for wider testing and feedback.

2 Related Work

Bartholomew (2002) describes one of the earliest precursors of the SLR: James Lind's systematic clinical trials to identify effective treatments for scurvy. His trials, which rigorously evaluated various potential remedies, notably highlighted the effectiveness of oranges and lemons as the most successful treatments Bartholomew (2002). In the domain of SE research, the SLR approach was introduced by Kitchenham (2004). This framework was instrumental in adapting the principles of systematic reviews, already prevalent in fields like healthcare and social sciences, to the specific challenges and needs of SE research. Following this development, SLRs have become an extensively used practice to support evidence-based material in SE. The success of SLRs in facilitating evidence-based studies has motivated other researchers to adopt this approach in their work Kitchenham et al. (2009). However, undertaking SLRs is often a challenging endeavor, encompassing various activities such as gathering, assessing, and recording evidence. These tasks within SLRs are typically done manually, without the aid of automation or decision support tools, making the process not only time-intensive but also susceptible to errors van Dinter et al. (2021). Many researchers have made progress in automating parts of the SLR process van Dinter et al. (2021).

Current research efforts are primarily focused on refining the SLR process to optimize precision while ensuring high recall, addressing the precision shortcomings often found in existing methods O’Mara-Eves et al. ( 2015 ) . Additionally, there’s a significant push towards reducing human errors, especially since many steps in the review process are highly repetitive Marshall et al. ( 2016 ) . In this context, the works of K.R. Felizardo and J.C. Maldonado are notable. They have explored the shift from traditional, repetitive, and error-prone SLR methods towards the application of visual text mining. This approach, as outlined in their articles Felizardo et al. ( 2012 ) , Felizardo et al. ( 2014 ) , Felizardo et al. ( 2011 ) , Malheiros et al. ( 2007 ) leverages unsupervised learning to assist users in identifying relevant articles, though it does require users to have a background in machine learning and statistics.

Olorisade et al. ( 2016 ) presented an innovative ML model designed to automate the primary study selection process in SLRs, potentially streamlining this critical step and significantly reducing the manual effort involved in sifting through vast quantities of academic literature. Shakeel et al. ( 2018 ) provided valuable insights into potential threats that could arise when automating the SLR process. Feng et al. ( 2017 ) highlighted various text mining techniques currently employed in SLRs, a foundation upon which our tool builds. Significantly, Paynter et al. ( 2016 ) presented a comprehensive report delineating the application of text mining (TM) techniques in automating various stages of the SLR process, including selection, extraction, and updates. This aligns closely with our tool’s objectives. Clark et al. ( 2020 ) demonstrated the feasibility of completing an SLR in a markedly reduced time frame using multiple tools, a precedent for the efficiency our tool aims to achieve.

Michelson and Reuter ( 2019 ) provided an economic analysis and time estimates for SLRs, underscoring the need for automated solutions – a call that our tool directly responds to. In a similar vein, Beller et al. ( 2018 ) not only listed tools useful for automating SLRs but also established eight guidelines that have informed our tool’s development.

Jonnalagadda et al. (2015) detailed methods for data extraction from published reports, which has been instrumental in shaping our tool’s data handling capabilities. Moreover, Marshall and Wallace (2019) and O’Connor et al. (2019) have respectively listed useful tools for systematic reviews and articulated barriers to the adoption of such tools, providing a comprehensive understanding of the current landscape and user hesitance in this domain. A further contribution is O’Mara-Eves et al. (2015), who conducted an SLR on text mining in the automation of SLRs and described the automation potential across different steps in the SLR process. These works have been pivotal in identifying areas where our tool can be most impactful. Additionally, Jaspers et al. (2018) and Thomas et al. (2011) have explored machine learning techniques and the application of TM techniques in automating the SLR process, which have been key influences in our tool’s design. Lastly, the survey by Van Altena et al. (2019), highlighting the limited use of SLR tools among researchers, emphasizes the need for more user-friendly and efficient solutions like the one our tool aims to provide.

Despite these advancements in automating the SLR process, there remains a notable gap in the complete automation of SLRs using LLMs. Addressing this gap, we have developed a novel approach: a multi-agent model based on LLMs. This innovative model is designed to fully automate the SLR process, utilizing the advanced capabilities of LLMs to efficiently manage and synthesize vast amounts of data, which is a significant step forward in the field of automated literature reviews.

3 Research Method

This research aims to investigate how an LLM-based multi-agent model can be utilized to automate the entire process of SLRs. We also outline the methodology for testing and analyzing the capabilities of the proposed model. Below, we discuss how our LLM-based multi-agent model collaborates and performs such tasks. We have formulated the following research questions (RQs):

RQ1. How does an LLM-based multi-agent system transform traditional methodologies to automate the systematic literature review process in SE?

Motivation: The motivation for this research question arises from the need to enhance the efficiency and effectiveness of literature review processes in the rapidly evolving field of SE. Traditional methods of conducting literature reviews are often time-consuming and labor-intensive, potentially leading to delays in research progress and the dissemination of new knowledge. The integration of LLMs promises a paradigm shift, potentially automating and streamlining these processes. By exploring the transformation brought about by an LLM-based multi-agent system, this research seeks to reduce the time and effort required for comprehensive literature reviews and also to increase the accuracy and scope of these reviews. This could result in more timely and informed research outcomes in SE, a field where staying abreast of current trends, methodologies, and discoveries is crucial for technological advancement and innovation.

RQ2. How can the efficiency and accuracy of the proposed LLM-based multi-agent model be evaluated?

Motivation: The motivation behind this research question is based on the critical need to validate and quantify the performance of the proposed model, specifically in conducting SLRs. With the introduction of sophisticated models like LLM-based multi-agent systems, establishing rigorous evaluation criteria to assess their real-world applicability and reliability becomes imperative. This question addresses the necessity of an evaluation approach that can systematically measure the model's efficiency and accuracy in selecting and interpreting relevant literature. Evaluating the proposed model is crucial to ensure that integrating such models into academic workflows enhances, rather than compromises, the quality of research outputs.

3.1 LLM-Based Assisted Systematic Literature Review

This section describes the research methodology for developing an LLM-based multi-agent model. This model is specifically engineered to automate the entire process of conducting SLRs. The innovation lies in its ability to transform a given research topic into a comprehensive review through a series of automated, interconnected steps. Each step is managed by a specialized agent within the model, working collaboratively to ensure a seamless and efficient literature review process. In Figure 1, we illustrate how the agents collaborate with each other to generate a response. Below, we also detail the functionality of each agent within this multi-agent system.

3.1.1 Planner agent

The first agent in our model is dedicated to generating the research questions, their purpose, and the search string. Upon receiving a research objective from the end-user, this agent employs advanced language understanding algorithms to interpret the topic’s key elements; given the research questions and objective, it generates a search string. The LLM-based algorithm, designed for deep semantic understanding, analyzes the topic to extract key concepts, themes, and terminologies. It then utilizes its extensive training on diverse textual data to construct a precise and comprehensive search string. This string is formulated by combining relevant keywords, synonyms, and technical terms that capture the essence of the research question. Furthermore, the algorithm is adept at understanding context and varying semantic structures, enabling it to refine the search string to match specific research domains. The generated search string is crucial in accurately retrieving relevant literature from various academic databases. By ensuring that the initial search is both thorough and focused, the agent significantly enhances the efficiency and quality of the literature collection process, as illustrated in the sketch below. This sets a solid foundation for the subsequent stages of the SLR, where the depth and breadth of the collected literature play a crucial role.
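As a concrete illustration of this planner step, the following minimal sketch prompts an LLM to turn a research objective into a Boolean search string. It assumes the OpenAI Python client (v1+); the prompt wording, model name, and function name are our own illustrative choices, not the authors' published implementation.

```python
# Minimal sketch of the planner step: research objective -> search string.
# Assumes the OpenAI Python client (>= 1.0) and an OPENAI_API_KEY in the
# environment; prompt and model name are illustrative, not the authors' own.
from openai import OpenAI

client = OpenAI()

def generate_search_string(objective: str) -> str:
    prompt = (
        "You are planning a systematic literature review.\n"
        f"Research objective: {objective}\n"
        "Return a single Boolean search string (AND/OR, quoted phrases, "
        "truncation with *) suitable for academic databases."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output aids reproducibility
    )
    return response.choices[0].message.content.strip()

print(generate_search_string(
    "large language models in the software development process"))
```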

3.1.2 Literature identification agent

Following the generation of the search string, research questions, and the purpose of each question, the next agent takes over the task of literature retrieval. This agent is responsible for using the search string to query academic databases and retrieve initial sets of papers that are potentially relevant to the research topic. It employs sophisticated filtering algorithms to manage the vast amount of available data, selecting papers whose titles most closely align with the predefined parameters of the search string. This step is crucial in narrowing down the pool of literature to a manageable size for in-depth review.
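The paper does not name the databases its retrieval agent queries; as one plausible sketch, the snippet below runs the generated search string against the public Semantic Scholar Graph API and collects basic metadata (title, year, venue, DOI) of the kind shown in the paper's demonstration.

```python
# Sketch of the literature-identification step: query a scholarly API with
# the generated search string. The Semantic Scholar Graph API is used here
# as one plausible backend; the paper does not specify its data sources.
import requests

def search_papers(search_string: str, year: str = "2023", limit: int = 10) -> list:
    resp = requests.get(
        "https://api.semanticscholar.org/graph/v1/paper/search",
        params={
            "query": search_string,
            "year": year,                      # restrict publication year
            "limit": limit,                    # cap the result set
            "fields": "title,year,venue,externalIds",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

for paper in search_papers("large language models OR software development"):
    doi = (paper.get("externalIds") or {}).get("DOI", "n/a")
    print(paper["year"], "|", paper["title"], "|", doi)
```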


3.1.3 Data extraction agent

The third agent in our model is tasked with refining the literature using inclusion and exclusion criteria based on the research objectives. Initially, it employs our LLM algorithm to analyze the titles of retrieved papers, discerning their relevance to the research topic. This step involves text analysis, where the LLM algorithm identifies key terms and concepts that align with the research objectives. By applying these predefined rules, the agent effectively filters out irrelevant material, ensuring the literature review remains focused and pertinent to the research questions.

Following the title analysis, the agent proceeds to analyze the abstracts of the selected papers. The LLM algorithm conducts a more in-depth text analysis, evaluating the context, methodologies, and findings in the abstracts to assess their relevance. The final and most comprehensive step involves analyzing the full content of each paper. This thorough examination encompasses the whole paper, allowing the agent to evaluate how each paper’s content and findings relate to the specific research question. The agent extracts key information, answers each research question for the filtered papers, and presents the data in tabular form. It then synthesizes this information to provide a comprehensive overview of the current state of research on the topic. This synthesis is crucial in understanding the broader context and implications of the findings within the selected literature.
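A minimal sketch of this screening step is given below: an LLM is asked to apply stated inclusion/exclusion criteria to a single title and abstract and to return a structured verdict. The JSON-reply convention, prompt, and example criteria are illustrative assumptions, not the authors' published prompts.

```python
# Sketch of LLM-based title/abstract screening against inclusion/exclusion
# criteria. The JSON-verdict convention and prompt are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

def screen_record(title: str, abstract: str, criteria: str) -> dict:
    prompt = (
        f"Inclusion/exclusion criteria:\n{criteria}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        'Reply with JSON only: {"include": true or false, "reason": "..."}'
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # A production system would validate and retry malformed model output.
    return json.loads(response.choices[0].message.content)

verdict = screen_record(
    title="Program repair with large language models",
    abstract="We study automated program repair using LLMs...",
    criteria="Include empirical studies of LLMs in software development; "
             "exclude work outside software engineering.",
)
print(verdict["include"], "-", verdict["reason"])
```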

3.1.4 Data compilation agent

The final agent in our multi-agent model is responsible for analyzing the synthesized data in relation to the research questions and objectives. It assesses trends, identifies gaps in the literature, and draws conclusions based on the aggregated information. This agent also prepares a report that summarizes the findings of the literature review, providing a clear and concise overview of the research landscape for the given topic.
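To make the compilation step concrete, the sketch below tabulates per-paper answers and asks an LLM for a short synthesis of trends and gaps. The table schema and prompt are hypothetical, chosen only to mirror the tabular output described in Section 3.1.3.

```python
# Sketch of the data-compilation step: aggregate per-paper answers into a
# table and request a brief synthesis. Schema and prompt are hypothetical.
import pandas as pd
from openai import OpenAI

client = OpenAI()

def compile_report(answers: list, research_question: str) -> str:
    table = pd.DataFrame(answers)        # one row per included paper
    print(table.to_string(index=False))  # tabular overview for the user
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Research question: {research_question}\n"
                f"Per-paper findings (CSV):\n{table.to_csv(index=False)}\n"
                "Summarize the trends and gaps in two short paragraphs."
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content

print(compile_report(
    [{"paper": "Paper A (2023)", "finding": "LLM-assisted code review"},
     {"paper": "Paper B (2023)", "finding": "LLM-based program repair"}],
    "How have LLMs been utilized in the software development process?",
))
```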

Each agent in the LLM-based multi-agent model plays a vital role in automating the systematic literature review process. From generating search strings to reporting findings, the agents work in a coordinated manner to ensure a thorough, efficient, and accurate review. This methodology represents a significant advancement in the way SLRs are conducted, offering a more streamlined and effective approach to academic research.

3.2 Performance Validation

In this project, we engaged the expertise of 10 researchers to evaluate the efficiency and performance of our proposed model. These professionals were approached from various industries and research groups to ensure a diverse range of perspectives. Our evaluation methodology was anchored in a practice-oriented framework, focusing on rigorous scrutiny of the model’s utility and efficacy by seasoned industry professionals. This approach guaranteed a detailed and perceptive evaluation, factoring in the real-world applications and needs of experts in the corresponding fields. The inclusion of insights from a diverse group of experienced contributors was instrumental in our aim to deliver a comprehensive overview of the model’s functionality and its possible influence in the industry.

3.2.1 Professional-based evaluation

In the validation stage of our research, we collaborated with ten experts from diverse sectors, including academia and industry, to evaluate the efficacy of our LLM-based model. This selection was strategic, aimed at capturing a wide range of perspectives and insights on the model’s performance across various professional contexts. To facilitate a thorough evaluation, each expert was provided with access to the model, accompanied by comprehensive instructions on its intended use and capabilities. This approach ensured that they were adequately equipped to conduct an in-depth assessment of the model within their respective domains.

Participants' selection: Initially, we conducted a search for suitable researchers from various research groups and expanded our outreach to individuals through social networks. We reached out to researchers via their ResearchGate and Google Scholar profiles. From these platforms, we recruited five participants, and the remaining five were identified through social networking sites and professional contacts. Consequently, we assembled a group of ten participants, referred to as P1 to P10 in Table 1, for the evaluation of our proposed model. Table 1 demonstrates that these participants come from a variety of fields, such as Software Engineering and Machine Learning/Deep Learning Development.

Data collection: The methodology employed a systematic approach to gather feedback data. Participants were tasked with integrating the model into their research and providing a topic relevant to their expertise. They then compared the results generated by our model with those obtained manually. To collect their feedback, we implemented a comprehensive feedback mechanism that enabled them to document their experiences, observations, and critiques systematically. This mechanism was geared towards obtaining detailed responses about the model’s efficiency, user-friendliness, and accuracy in data analysis. The format of the feedback was meticulously crafted to extract in-depth qualitative insights, thereby facilitating the assessment of quantifiable elements of the model’s performance.

Data analysis: In the final phase of our validation process, analyzing participant feedback was essential for ongoing model improvement. Participants were allowed to choose any data source for input. We gathered and scrutinized their feedback to understand how the model performed in various scenarios. To evaluate the model’s performance, we used an extensive Likert scale ranging from ’Not Satisfied’ to ’Excellent,’ with intermediate options like ’Fair,’ ’Satisfactory,’ ’Good,’ and ’Very Good.’ This scale provided a nuanced spectrum for evaluation, enabling precise and comprehensive grading of the model’s effectiveness. Our iterative approach was driven by a focus on enhancing the model’s functionality, user experience, and overall efficacy, aiming to fulfill the varied needs and expectations of professionals in qualitative data analysis.

4 Results

This section presents the results obtained from implementing an AI-agent-based model aimed at automating the SLR process in SE. The findings are detailed in accordance with the two primary research questions (RQs) that steered the creation and evaluation of the model.

4.1 LLM Based Multi-Agent Model (RQ1)

Our research introduces an LLM-based multi-agent model that redefines the conventional approach to SLRs in SE. The multi-agent model developed for automating the SLR process has demonstrated its efficacy through a structured and sequential workflow. The process begins with the input of a research topic, as depicted in Figure 2 .


Upon receiving the topic, the model systematically generates a pertinent set of research questions. As illustrated in Figure 2 , it formulated questions like ’How have large language models been utilized in various aspects of the software development process?’ and ’What challenges and limitations exist in the adoption and implementation of large language models in software development?’ These questions play a crucial role in guiding the literature search and analysis process. Subsequent to the research question formulation, the model proceeds to generate a search string. In this case, the search string “large language models OR software development” was created, coupled with a specified year to narrow down the search scope, thereby enhancing the relevance and precision of the search results.


The subsequent phase involves retrieving papers that match the generated search string. This feature of the model is specifically designed to fetch research papers from various databases, as demonstrated by the list of papers shown in Figure 3. For this demonstration, we focused solely on papers published in the year 2023, setting the model to retrieve only 10 papers from that year, all relevant to this field. The tool efficiently compiles relevant information such as the title, author, publication URL, journal name, DOI, paper type, affiliation country, and the affiliation institution. Moreover, the model is equipped with the capability to apply inclusion and exclusion criteria based on titles, which further refines the search results to ensure only the most pertinent literature is considered for review. As shown in Figure 4, only three papers were selected for an in-depth analysis.

Finally, the model extracts data based on the formulated RQs. This advanced feature is exemplified in the demo where detailed answers are provided for the previously generated RQs. For instance, the answer to the first RQ discusses the varied applications of large language models in the software development lifecycle and highlights specific instances of their use, like the inference from the paper “InferLink End-to-End Program Repair with Large Language Models.”


In conclusion, the automated SLR tool has showcased its ability to streamline the laborious process of literature review, from defining the scope of the research to extracting and synthesizing data pertinent to the research questions. The demonstration affirms the model’s potential in significantly reducing the time and effort conventionally required for conducting systematic literature reviews.


4.2 Evaluation Result (RQ2)

The empirical evaluation of our tool was conducted by involving ten researchers and practitioners from diverse backgrounds within the SE community. Their engagement with the model provided a comprehensive perspective on its practical utility and user experience. The feedback was overwhelmingly positive, with 80% of the participants approving the tool’s functionality and recognizing its contribution to simplifying the SLR process.

Despite the general consensus on the model’s efficacy, 20% of the participants recommended improvements. Specific suggestions highlighted the need for a more nuanced interpretation of complex research queries and the generation of more refined search strings. This constructive feedback is invaluable as it directs the focus toward enhancing the model’s interpretative algorithms and its ability to handle ambiguous or multifaceted research questions.

In pursuit of continuous improvement, the model is scheduled for further exposure and evaluation. The SANER 2024 conference in Rovaniemi (Finland) presents an opportunity for a wide array of feedback from the SE research community, which will be instrumental in the model’s iterative development. Furthermore, a large-scale testing initiative is planned, where the model will be disseminated to a group of 50 researchers and practitioners for extensive evaluation.

This forthcoming phase is expected to yield deeper insights into the model’s generalizability and performance across various domains within SE. It will also help to pinpoint any subtle nuances in SLR processes that the model needs to accommodate. The comprehensive feedback will be integral to refining the model, ensuring that the final version is not only effective and efficient but also versatile and user-friendly. The ultimate objective is to deliver a robust, universally applicable tool that standardizes and automates SLRs, contributing to the advancement of research methodologies in SE.

5 Discussions

The results derived from the implementation of our multi-AI agent model for SLR have been both encouraging and insightful. The model successfully automated key components of the SLR process, including the generation of search strings, the selection and filtering of relevant literature, and the summarization of key findings. This automation significantly reduced the time and effort typically required in conducting SLRs, while maintaining, and in some aspects enhancing, the accuracy and comprehensiveness of the review. The model’s ability to process and analyze large volumes of text rapidly, and its precision in identifying relevant studies, demonstrated the substantial potential of integrating LLMs in academic research.

The implications of these results are far-reaching. Firstly, the model presents a valuable tool for researchers across various fields, reducing the barriers to conducting comprehensive literature reviews. This efficiency can accelerate the pace of research and discovery, enabling scholars to focus on more complex and creative aspects of their work. Additionally, the model’s standardization of the SLR process can potentially lead to more consistent and replicable research outcomes, a cornerstone in scientific research. The reduction in manual effort also opens up opportunities for researchers with limited resources or those facing constraints such as time pressures, broadening the scope of who can conduct thorough literature reviews.

Looking ahead, the future impact of our model is poised to be significant. A key milestone will be our attendance at the SANER conference on March 12th, where we will showcase our developed tool to a diverse audience. This event will not only serve as a platform to demonstrate the capabilities of our model but also as a critical opportunity to gather feedback from a wide range of users. This feedback will be invaluable in refining and enhancing the model further. Understanding how the model performs in real-world scenarios and gathering diverse perspectives will enable us to tailor it more closely to the needs of the research community. Following this, we plan to implement updates and improvements based on this feedback, ensuring that our tool remains at the forefront of innovation in SLR automation. The continued development and adaptation of our model in response to user input will ensure its relevance and utility in the ever-evolving landscape of academic research.

In addition, the long-term impact of our work on researchers and the broader academic community is expected to be substantial. Our model represents a paradigm shift in how SLRs are conducted, offering a tool that is not only efficient but also adaptable to the evolving needs of researchers. One significant future impact is the democratization of research. By simplifying the SLR process, our tool makes high-quality literature reviews accessible to a wider range of researchers, including those from institutions with fewer resources or those new to the field. This accessibility could lead to a more diverse range of voices and perspectives in academic research, enriching the field as a whole.

Furthermore, the model’s efficiency in handling large volumes of data makes it an invaluable asset in fields where literature is vast and rapidly growing, such as biomedical research, environmental studies, and technology. Researchers in these fields can stay abreast of the latest developments more effectively, ensuring their work is informed by the most current and comprehensive data available.

In the domain of interdisciplinary research, our model can facilitate the synthesis of information across different fields, potentially leading to novel insights and innovations. By efficiently collating and analyzing diverse sets of literature, the tool can help uncover connections between disciplines that might otherwise be overlooked.

The long-term adaptation of our model based on user feedback and technological advancements will also ensure its ongoing relevance. Continuous updates will allow the model to incorporate the latest AI advancements, further enhancing its capabilities and ensuring it remains a cutting-edge tool for SLR. Moreover, the model’s potential for customization will allow it to cater to the specific needs of different research domains. This customized approach means that the tool can be fine-tuned to deliver more targeted and relevant results, depending on the specific requirements of the research question or field.

6 Limitations

While this study contributes valuable insights to the SE field, several limitations necessitate attention in future iterations of the research. Primarily, the initial search strategy employed for identifying relevant literature was suboptimal. The absence of a comprehensive use of Boolean operators, notably the lack of "AND" in the search strings, potentially compromised the specificity and thoroughness of the literature search, leading to an incomplete representation of the available evidence. This issue underscores the need for a more rigorously defined search strategy to enhance the precision and relevance of retrieved documents.

Furthermore, the methodology exhibited a significant gap in its approach to literature selection, characterized by an absence of clearly defined criteria for primary and secondary exclusion. This oversight likely resulted in a less rigorous filtering process, diminishing the study’s ability to exclude irrelevant or low-quality studies systematically. Implementing explicit inclusion and exclusion criteria will be crucial for improving the reliability and validity of the literature review in subsequent versions of the paper.

Another critical limitation observed was in the data extraction phase. Although data were extracted based on predefined research questions, the reliability of the extracted information is questionable due to the lack of a robust analytical algorithm. The current methodology does not adequately ensure the accuracy and relevance of the extracted data, which is a cornerstone for drawing reliable conclusions. Future iterations of this research will benefit substantially from the integration of analytical algorithms capable of more sophisticated data analysis. Such algorithms should not only extract data more efficiently but also evaluate the quality and applicability of the information in relation to the research objectives.

Addressing these limitations is essential for advancing the research’s contribution to the field. Enhancements in search strategy, literature screening, and data analysis will not only refine the methodological approach but also improve the study’s overall credibility and impact. Future work will focus on fixing these issues to establish a more reliable and comprehensive research framework.

7 Future Work

Addressing the identified limitations presents a pathway for enhancing the comprehensiveness of our research in future iterations. The upcoming version of this paper will aim to implement several key improvements.

Refinement of search strategy: To overcome the limitations posed by an inadequate search string, future work will involve the development of a more sophisticated search strategy. This will include the comprehensive use of Boolean operators, particularly the incorporation of "AND" to ensure the specificity and thoroughness of the literature search. A systematic approach to defining search strings will be adopted to enhance the precision and relevance of retrieved documents.

Implementation of explicit exclusion and inclusion criteria: Recognizing the absence of clearly defined criteria for primary and secondary literature exclusion as a significant gap, future efforts will focus on establishing explicit inclusion and exclusion criteria. This refinement will facilitate a more rigorous and systematic screening process, thereby improving the study’s ability to exclude irrelevant or low-quality studies systematically and ensuring a more reliable and valid literature review.

Advancement of data extraction methods: The preliminary phase highlighted the need for a more reliable data extraction mechanism. To address this, future work will incorporate advanced analytical algorithms designed to ensure the accuracy and relevance of the extracted data. These algorithms will not only facilitate more efficient data extraction but will also provide a means to critically evaluate the quality and applicability of the information in relation to the research objectives. The integration of machine learning and natural language processing techniques will be explored to automate and enhance the data extraction and analysis process.

Enhancement of analytical framework: Acknowledging the limitations in the initial data analysis, future research will aim to develop and implement a more robust analytical framework. This framework will be designed to analyze the extracted data comprehensively, incorporating both qualitative and quantitative methodologies as appropriate. Emphasis will be placed on ensuring the reliability and validity of the findings through rigorous statistical testing and sensitivity analyses.

Broadening of literature scope: To counteract any potential biases or gaps in the literature review caused by the initial search limitations, future research will broaden its scope to include a wider range of databases and grey literature. This expansion will ensure a more comprehensive coverage of the subject matter, encompassing diverse perspectives and emerging research trends.

Stakeholder engagement: Recognizing the value of stakeholder insights in refining research methodologies, future iterations will involve engaging with domain experts, researchers, and practitioners. This engagement will provide critical feedback on the research design, methodologies, and findings, contributing to a more nuanced and impactful research outcome.

By systematically addressing these limitations, future work will significantly enhance the study’s contribution to the field, providing a more robust, comprehensive, and reliable foundation for understanding the research topic. These improvements will not only address the current study’s shortcomings but also set a precedent for methodological rigor in similar research endeavors.

8 Conclusions

The development and implementation of our multi-AI agent model represent a significant advancement in the field of SLR. By integrating the capabilities of LLMs, this research demonstrates a novel approach to automating and optimizing the SLR process. Our model addresses the primary challenges associated with traditional SLR methods: the time-consuming nature of the process and the potential for human error or bias in literature selection and analysis. By automating the initial search, screening, summarization, and analysis phases, the model significantly reduces the manual effort and time required, while also enhancing the accuracy and consistency of the results.

The use of a simple user interface for topic input and subsequent generation of tailored search strings illustrates the model’s user-friendly approach, making complex SLR processes accessible to a broader range of researchers. The inclusion and exclusion filtering mechanism ensures that the literature review remains focused and relevant, directly aligning with the specified research questions. The autonomous summarization of abstracts and the final analytical phase underscore the model’s ability to distill vast amounts of data into clear, relevant information, a task that would be challenging without the aid of advanced AI.

This research contributes to the growing field of AI application in academic research, showcasing how LLMs can be effectively employed to enhance research methodologies. While the model significantly improves efficiency and accuracy, it is important to acknowledge the role of human oversight in guiding and interpreting the results, ensuring that the final output maintains the depth required in scholarly research.

  • Keele et al. [2007] Staffs Keele et al. Guidelines for performing systematic literature reviews in software engineering, 2007.
  • Kitchenham et al. [2009] Barbara Kitchenham, O Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. Systematic literature reviews in software engineering–a systematic literature review. Information and software technology , 51(1):7–15, 2009.
  • van Dinter et al. [2021] Raymon van Dinter, Bedir Tekinerdogan, and Cagatay Catal. Automation of systematic literature reviews: A systematic literature review. Information and Software Technology , 136:106589, 2021.
  • Rasheed et al. [2024a] Zeeshan Rasheed, Muhammad Waseem, Mika Saari, Kari Systä, and Pekka Abrahamsson. Codepori: Large scale model for autonomous software development by using multi-agents. arXiv preprint arXiv:2402.01411 , 2024a.
  • Rasheed et al. [2023] Zeeshan Rasheed, Muhammad Waseem, Kai-Kristian Kemell, Wang Xiaofeng, Anh Nguyen Duc, Kari Systä, and Pekka Abrahamsson. Autonomous agents in software development: A vision paper. arXiv preprint arXiv:2311.18440 , 2023.
  • Carlini et al. [2021] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21) , pages 2633–2650, 2021.
  • Hou et al. [2023] Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620 , 2023.
  • Rasheed et al. [2024b] Zeeshan Rasheed, Muhammad Waseem, Aakash Ahmad, Kai-Kristian Kemell, Wang Xiaofeng, Anh Nguyen Duc, and Pekka Abrahamsson. Can large language models serve as data analysts? a multi-agent assisted approach for qualitative data analysis. arXiv preprint arXiv:2402.01386 , 2024b.
  • Bartholomew [2002] Mary Bartholomew. James Lind's Treatise of the Scurvy (1753). Postgraduate Medical Journal, 78(925):695–696, 2002.
  • Kitchenham [2004] Barbara Kitchenham. Procedures for performing systematic reviews. Keele, UK, Keele University , 33(2004):1–26, 2004.
  • O’Mara-Eves et al. [2015] Alison O’Mara-Eves, James Thomas, John McNaught, Makoto Miwa, and Sophia Ananiadou. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Systematic reviews , 4(1):1–22, 2015.
  • Marshall et al. [2016] Christopher Marshall et al. Tool support for systematic reviews in software engineering . PhD thesis, Keele University, 2016.
  • Felizardo et al. [2012] Katia R Felizardo, Gabriel F Andery, Fernando V Paulovich, Rosane Minghim, and José C Maldonado. A visual analysis approach to validate the selection review of primary studies in systematic reviews. Information and Software Technology , 54(10):1079–1091, 2012.
  • Felizardo et al. [2014] Katia Romero Felizardo, Elisa Yumi Nakagawa, Stephen G MacDonell, and José Carlos Maldonado. A visual analysis approach to update systematic reviews. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering , pages 1–10, 2014.
  • Felizardo et al. [2011] Katia Romero Felizardo, Mehwish Riaz, Muhammad Sulayman, Emilia Mendes, Stephen G MacDonell, and José Carlos Maldonado. Analysing the use of graphs to represent the results of systematic reviews in software engineering. In 2011 25th Brazilian Symposium on Software Engineering , pages 174–183. IEEE, 2011.
  • Malheiros et al. [2007] Viviane Malheiros, Erika Hohn, Roberto Pinho, Manoel Mendonca, and Jose Carlos Maldonado. A visual text mining approach for systematic reviews. In First international symposium on empirical software engineering and measurement (ESEM 2007) , pages 245–254. IEEE, 2007.
  • Olorisade et al. [2016] Babatunde K Olorisade, Ed de Quincey, Pearl Brereton, and Peter Andras. A critical analysis of studies that address the use of text mining for citation screening in systematic reviews. In Proceedings of the 20th international conference on evaluation and assessment in software engineering , pages 1–11, 2016.
  • Shakeel et al. [2018] Yusra Shakeel, Jacob Krüger, Ivonne von Nostitz-Wallwitz, Christian Lausberger, Gabriel Campero Durand, Gunter Saake, and Thomas Leich. (automated) literature analysis: threats and experiences. In Proceedings of the International Workshop on Software Engineering for Science , pages 20–27, 2018.
  • Feng et al. [2017] L. Feng, Y. K. Chiam, and S. K. Lo. Text-mining techniques and tools for systematic literature reviews: a systematic literature review. In 2017 24th Asia-Pacific Software Engineering Conference (APSEC), pages 41–50. IEEE, 2017. https://doi.org/10.1109/apsec
  • Paynter et al. [2016] Robin Paynter, Lionel L Bañez, Elise Berliner, Eileen Erinoff, Jennifer Lege-Matsuura, Shannon Potter, and Stacey Uhl. Epc methods: an exploration of the use of text-mining software in systematic reviews. 2016.
  • Clark et al. [2020] Justin Clark, Paul Glasziou, Chris Del Mar, Alexandra Bannach-Brown, Paulina Stehlik, and Anna Mae Scott. A full systematic review was completed in 2 weeks using automation tools: a case study. Journal of clinical epidemiology , 121:81–90, 2020.
  • Michelson and Reuter [2019] Matthew Michelson and Katja Reuter. The significant cost of systematic reviews and meta-analyses: a call for greater involvement of machine learning to assess the promise of clinical trials. Contemporary clinical trials communications , 16:100443, 2019.
  • Beller et al. [2018] Elaine Beller, Justin Clark, Guy Tsafnat, Clive Adams, Heinz Diehl, Hans Lund, Mourad Ouzzani, Kristina Thayer, James Thomas, Tari Turner, et al. Making progress with the automation of systematic reviews: principles of the international collaboration for the automation of systematic reviews (icasr). Systematic reviews , 7:1–7, 2018.
  • Jonnalagadda et al. [2015] Siddhartha R Jonnalagadda, Pawan Goyal, and Mark D Huffman. Automating data extraction in systematic reviews: a systematic review. Systematic reviews , 4(1):1–16, 2015.
  • Marshall and Wallace [2019] Iain J Marshall and Byron C Wallace. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Systematic reviews , 8:1–10, 2019.
  • O’Connor et al. [2019] Annette M O’Connor, Guy Tsafnat, James Thomas, Paul Glasziou, Stephen B Gilbert, and Brian Hutton. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Systematic reviews , 8(1):1–8, 2019.
  • Jaspers et al. [2018] Stijn Jaspers, Ewoud De Troyer, and Marc Aerts. Machine learning techniques for the automation of literature reviews and systematic reviews in efsa. EFSA Supporting Publications , 15(6):1427E, 2018.
  • Thomas et al. [2011] James Thomas, John McNaught, and Sophia Ananiadou. Applications of text mining within systematic reviews. Research synthesis methods , 2(1):1–14, 2011.
  • Van Altena et al. [2019] AJ Van Altena, R Spijker, and SD Olabarriaga. Usage of automation tools in systematic reviews. Research synthesis methods , 10(1):72–82, 2019.



Rayyan

COLLABORATE ON YOUR REVIEWS WITH ANYONE, ANYWHERE, ANYTIME

Rayyan for students

Save precious time and maximize your productivity with a Rayyan membership. Receive training, priority support, and access features to complete your systematic reviews efficiently.

Rayyan for Librarians

Rayyan Teams+ makes your job easier. It includes VIP Support, AI-powered in-app help, and powerful tools to create, share and organize systematic reviews, review teams, searches, and full-texts.

Rayyan for Researchers


Rayyan makes collaborative systematic reviews faster, easier, and more convenient. Training, VIP support, and access to new features maximize your productivity. Get started now!

Over 500 million reference articles reviewed by research teams, and counting...

Intelligent, scalable and intuitive.

Rayyan understands language, learns from your decisions and helps you work quickly through even your largest systematic literature reviews.


Solutions for Organizations and Businesses


Rayyan Enterprise and Rayyan Teams+ make it faster, easier and more convenient for you to manage your research process across your organization.

  • Accelerate your research across your team or organization and save valuable researcher time.
  • Build and preserve institutional assets, including literature searches, systematic reviews, and full-text articles.
  • Onboard team members quickly with access to group trainings for beginners and experts.
  • Receive priority support to stay productive when questions arise.

RAYYAN SYSTEMATIC LITERATURE REVIEW OVERVIEW


LEARN ABOUT RAYYAN’S PICO HIGHLIGHTS AND FILTERS


Join now to learn why Rayyan is already trusted by more than 250,000 researchers.

Individual Plans / Team Plans

For early career researchers just getting started with research.

Free forever

  • 3 Active Reviews
  • Invite Unlimited Reviewers
  • Import Directly from Mendeley
  • Industry Leading De-Duplication
  • 5-Star Relevance Ranking
  • Advanced Filtration Facets
  • Mobile App Access
  • 100 Decisions on Mobile App
  • Standard Support
  • Revoke Reviewer
  • Online Training
  • PICO Highlights & Filters
  • PRISMA (Beta)
  • Auto-Resolver 
  • Multiple Teams & Management Roles
  • Monitor & Manage Users, Searches, Reviews, Full Texts
  • Onboarding and Regular Training

Professional

For researchers who want more tools for research acceleration.

Per month billed annually

  • Unlimited Active Reviews
  • Unlimited Decisions on Mobile App
  • Priority Support
  • Auto-Resolver

For students who want more tools to accelerate their research.

Per month billed annually

Billed monthly

For a team that wants professional licenses for all members.

Per-user, per month, billed annually

  • Single Team
  • High Priority Support

For teams that want support and advanced tools for members.

  • Multiple Teams
  • Management Roles

For organizations who want access to all of their members.

Annual Subscription

Contact Sales

  • Organizational Ownership
  • For an organization or a company
  • Access to all the premium features such as PICO Filters, Auto-Resolver, PRISMA and Mobile App
  • Store and Reuse Searches and Full Texts
  • A management console to view, organize and manage users, teams, review projects, searches and full texts
  • Highest tier of support – Support via email, chat and AI-powered in-app help
  • GDPR Compliant
  • Single Sign-On
  • API Integration
  • Training for Experts
  • Training Sessions for Students Each Semester
  • More options for secure access control

ANNUAL ONLY

Per-user, billed monthly

Rayyan Subscription

A membership starts with 2 users. You can select the number of additional members that you’d like to add to your membership.


Great usability and functionality. Rayyan has saved me countless hours. I even received timely feedback from staff when I did not understand the capabilities of the system, and was pleasantly surprised with the time they dedicated to my problem. Thanks again!

This is a great piece of software. It has made the independent viewing process so much quicker. The whole thing is very intuitive.

Rayyan makes ordering articles and extracting data very easy. A great tool for undertaking literature and systematic reviews!

Excellent interface to do title and abstract screening. Also helps to keep track of the reasons for exclusion from the review. That too in a blinded manner.

Rayyan is a fantastic tool to save time and improve systematic reviews!!! It has changed my life as a researcher!!! thanks

Easy to use, friendly, has everything you need for cooperative work on the systematic review.

Rayyan makes life easy in every way when conducting a systematic review and it is easy to use.

  • Open access
  • Published: 15 January 2022

Automation of literature screening using machine learning in medical evidence synthesis: a diagnostic test accuracy systematic review protocol

Yuelun Zhang, Siyu Liang, Yunying Feng, Qing Wang, Feng Sun, Shi Chen, Yiying Yang, Huijuan Zhu & Hui Pan

Systematic Reviews, volume 11, article number 11 (2022)


Systematic review is an indispensable tool for optimal evidence collection and evaluation in evidence-based medicine. However, the explosive increase in original literature makes it difficult to accomplish critical appraisal and regular updates. Artificial intelligence (AI) algorithms have been applied to automate the literature screening procedure in medical systematic reviews. These studies used different algorithms and reported results with great variance. It is therefore imperative to systematically review and analyse the automatic methods developed for literature screening and their effectiveness as reported in current studies.

An electronic search will be conducted using the PubMed, Embase, ACM Digital Library, and IEEE Xplore Digital Library databases, supplemented by literature found through a search in Google Scholar, on automatic methods for literature screening in systematic reviews. Two reviewers will independently conduct the primary screening of the articles and data extraction; disagreements will be resolved by discussion with a methodologist. Data will be extracted from eligible studies, including the basic characteristics of the study, information on the training and validation sets, and the function and performance of the AI algorithms, and summarised in a table. The risk of bias and applicability of the eligible studies will be assessed independently by the two reviewers based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2). Quantitative analyses, if appropriate, will also be performed.

Automating the systematic review process can greatly reduce the workload of evidence-based practice. Results from this systematic review will provide an essential summary of the current development of AI algorithms for automatic literature screening in medical evidence synthesis and help to inspire further studies in this field.

Systematic review registration

PROSPERO CRD42020170815 (28 April 2020).


Systematic reviews synthesise the results of multiple original publications to provide clinicians with comprehensive knowledge and the current optimal evidence for answering specific research questions. The major steps of a systematic review are defining a structured review question, developing inclusion criteria, searching databases, screening for relevant studies, collecting data from relevant studies, critically assessing the risk of bias, undertaking meta-analyses where appropriate, and assessing reporting biases [ 1 , 2 , 3 ]. A systematic review aims to provide a complete, exhaustive summary of the current literature relevant to a research question with an objective and transparent approach. Because of these characteristics, systematic reviews, in particular those combining high-quality evidence, which used to sit at the very top of the medical evidence pyramid [ 4 ] and are now regarded as an indispensable tool for viewing evidence [ 5 ], are widely used in the practice of evidence-based medicine.

However, conducting systematic reviews for clinical decision making is time-consuming and labour-intensive: reviewers must perform a thorough search to identify any literature that may be relevant, read through all abstracts of the retrieved records, and identify the potential candidates for further full-text screening [ 6 ]. For original research articles, the median time from publication to first inclusion in a systematic review ranges from 2.5 to 6.5 years [ 7 ]. It usually takes over a year to publish a systematic review from the time of the literature search [ 8 ]. With advances in clinical research, this evidence, and the systematic review conclusions it generates, may be out of date within several years. Given the explosive increase in original research articles, reviewers find it difficult to identify the most relevant evidence in time, let alone update systematic reviews periodically [ 9 ]. Therefore, researchers are exploring automatic methods to improve the efficiency of evidence synthesis while reducing the workload of systematic reviews.

Recent progress in computer science suggests a promising future in which more intelligent work can be accomplished with the aid of automatic technologies, such as pattern recognition and machine learning (ML). As a subset of artificial intelligence (AI), ML uses algorithms to build mathematical models based on training data in order to make predictions or decisions without being explicitly programmed [ 10 ]. Various ML approaches have been introduced in the medical field, such as diagnosis, prognosis, genetic analysis, and drug screening, to support clinical decision making [ 11 , 12 , 13 , 14 ]. For systematic reviews specifically, models for automatic literature screening have been explored to reduce repetitive work and save reviewers' time [ 15 , 16 ].

To date, limited research has focused on automatic methods for biomedical literature screening in the systematic review process. Automated literature classification systems [ 17 ] and hybrid relevance rating models [ 18 ] have been tested on specific datasets, but broader review datasets and performance improvements are still required. To address this gap in knowledge, this article describes the protocol for a systematic review aiming to summarise existing automatic methods for screening relevant biomedical literature in the systematic review process and to evaluate the accuracy of these AI tools.

The primary objective of this review is to assess the diagnostic accuracy of AI algorithms (index test) compared with gold-standard human investigators (reference standard) for screening relevant literature from the records identified by electronic search in a systematic review. The secondary objective is to describe the time and work saved by AI algorithms in literature screening. Additionally, we plan to conduct subgroup analyses to explore potential factors associated with the accuracy of AI algorithms.

Study registration

We prepared this protocol following the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) [ 19 ]. This systematic review has been registered on PROSPERO (Registration number: CRD42020170815, 28 April 2020).

Review question

Our review question was refined using the PRISMA-DTA framework, as detailed in Table 1. In this systematic review, “literatures” refer to the subjects of the diagnostic test (the “participants” in Table 1), and “studies” refer to the studies included in our review.

Inclusion and exclusion criteria

We will include studies in medical research that report a structured study question, describe the source of the training or validation sets, develop or employ AI models for automatic literature screening, and use the screening results from human investigators as the reference standard.

We will exclude traditional clinical studies in human participants, editorials, commentaries, or other non-original reports. Pure methodological studies in AI algorithms without application in evidence synthesis will be excluded as well.

Information source and search strategy

An experienced methodologist will conduct searches in major public electronic medical and computer science databases, including PubMed, Embase, ACM Digital Library, and IEEE Xplore Digital Library, for publications from January 2000 to the present. We set this time range because, to the best of our knowledge, AI algorithms prior to 2000 are unlikely to be applicable in evidence synthesis [ 20 ]. In addition to the database search, we will identify further relevant studies by checking the reference lists of included studies. Related abstracts and preprints will be searched in Google Scholar. There are no language restrictions. We will use free-text words, MeSH/EMTREE terms, IEEE Terms, INSPEC Terms, and the ACM Computing Classification System to develop strategies related to three major concepts: systematic review, literature screening, and AI. Multiple synonyms for each concept will be incorporated into the search. The Systematic Review Toolbox ( http://systematicreviewtools.com/ ) will also be used to detect potential automation methods in medical research evidence synthesis. The detailed search strategy used in PubMed is shown in Supplementary Material 1.
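To make the three-concept structure concrete, a purely hypothetical PubMed-style fragment follows (our illustration only; the protocol's actual strategy, with MeSH terms and many more synonyms per concept, is given in Supplementary Material 1):

    ("systematic review"[tiab] OR "evidence synthesis"[tiab])
    AND ("literature screening"[tiab] OR "citation screening"[tiab]
         OR "study selection"[tiab])
    AND ("artificial intelligence"[tiab] OR "machine learning"[tiab]
         OR "support vector machine"[tiab] OR "text mining"[tiab])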

Study selection

Titles and abstracts from the online electronic databases will be downloaded and imported into EndNote X9.3.2 software (Thomson Reuters, Toronto, Ontario, Canada) for further processing after duplicates are removed.

All studies will be screened independently by two authors based on titles and abstracts. Those that do not meet the inclusion criteria will be excluded, with specific reasons recorded. Disagreements will be resolved by discussion with a methodologist if necessary. After the initial screening, the full texts of potentially relevant studies will be independently reviewed by the two authors to make final inclusion decisions. Conflicts will be resolved in the same way as during the initial screening. Excluded studies will be listed and annotated according to the PRISMA-DTA flowchart.

Data collection

A data collection form will be used for information extraction. Data from the eligible studies will be independently extracted and verified by two investigators. Disagreements will be resolved through discussion and consultation of the original publication. We will also try to contact the authors to collect missing data. If a study does not report detailed accuracy data, or does not provide enough data to calculate them, it will be omitted from the quantitative data synthesis.

The following data will be extracted from the original studies: characteristics of study, information of training set and validation set, and the function and performance of AI algorithms. The definitions of variables in data extraction are shown in Table 2 .

Risk of bias assessment, applicability, and levels of evidence

Two authors will independently assess risk of bias and applicability with a checklist based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) [ 21 ]. The QUADAS-2 contains four domains: patient selection, index test, reference standard, and flow and timing. The risk of bias is classified as “low”, “high”, or “unclear”. Studies with a high risk of bias will be excluded in the sensitivity analysis.

In this systematic review, the “participants” are literatures rather than human subjects, and the index test is the AI model used for automatic literature screening. We will therefore slightly revise the QUADAS-2 to fit our research context (Table 3). We deleted one signalling question from the QUADAS-2: “was there an appropriate interval between index test and reference standard”. The purpose of this question in the original QUADAS-2 is to judge the bias caused by a change of disease status between the index test and the reference test. The “disease status”, i.e., the final inclusion status of a publication in our research context, does not change; thus, there is no such concern.

The levels of the evidence body will be evaluated by the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) framework [ 22 ].

Diagnostic accuracy measures

We will extract data for each study as a two-by-two contingency table from the formal publication text or appendices, or by contacting the main authors, to collect sensitivity, specificity, precision, negative predictive value (NPV), positive predictive value (PPV), negative likelihood ratio (NLR), positive likelihood ratio (PLR), diagnostic odds ratio (DOR), F-measure, and accuracy with 95% CIs. If the outcomes cannot be formulated in a two-by-two contingency table, we will extract the reported performance data. Where possible, we will also assess the area under the curve (AUC), as a two-by-two contingency table may not be available in some scenarios.
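These measures all derive from a single two-by-two table. As a minimal sketch (our own illustration with invented counts, not data from any included study), in Python:

    # Hypothetical two-by-two contingency table: AI screener (index test)
    # versus human reviewers (reference standard). Counts are invented.
    tp, fp = 190, 260   # AI includes: truly relevant / truly irrelevant
    fn, tn = 10, 1540   # AI excludes: truly relevant / truly irrelevant

    sensitivity = tp / (tp + fn)            # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                    # precision
    npv = tn / (tn + fn)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # diagnostic odds ratio
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)

    print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
          f"PPV={ppv:.3f} NPV={npv:.3f} DOR={dor:.1f} "
          f"F1={f_measure:.3f} accuracy={accuracy:.3f}")

Note how the invented counts reproduce the pattern described later in this protocol: high sensitivity with modest specificity and precision.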

Qualitative and quantitative synthesis of results

We will qualitatively describe the application of AI in literature screening and evaluate and compare the accuracy of the AI tools. If there are adequate details and sufficiently homogeneous data for quantitative meta-analysis, we will combine the accuracy of AI algorithms in literature screening using the random-effects Rutter-Gatsonis hierarchical summarised receiver operating characteristic curve (HSROC) model, which is recommended by the Cochrane Collaboration for combining evidence on diagnostic accuracy [ 23 ]. The effect of threshold will be incorporated in the model, allowing heterogeneous thresholds among studies. The combined point estimates of accuracy will be retrieved from the summarised receiver operating characteristic (ROC) curve.
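For reference, a sketch of the standard Rutter-Gatsonis formulation (following the Cochrane handbook's usual notation, not necessarily the authors' exact specification): for study $i$ and group $j$, with $X_{ij} = -1/2$ for truly irrelevant and $X_{ij} = +1/2$ for truly relevant records, the probability $\pi_{ij}$ of a positive screening result is modelled as

$$\operatorname{logit}(\pi_{ij}) = (\theta_i + \alpha_i X_{ij})\,\exp(-\beta X_{ij}), \qquad \theta_i \sim N(\Theta, \sigma_\theta^2), \quad \alpha_i \sim N(\Lambda, \sigma_\alpha^2),$$

where $\theta_i$ is the study-specific threshold, $\alpha_i$ the study-specific accuracy, and $\beta$ a shape parameter that allows accuracy to vary with threshold; the summary ROC curve is traced out from the pooled parameters $\Lambda$ and $\beta$.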

Subgroup analyses and meta-regression will be used to explore between-study heterogeneity. We will explore the following predefined sources of heterogeneity: (1) AI algorithm type; (2) study area of the validation set (targeted specific diseases, interventions, or a general area); (3) searched electronic databases (PubMed, Embase, or others); and (4) the proportion of eligible to original studies (the number of eligible publications identified in the screening step divided by the number of original publications identified during the electronic search). Furthermore, we will analyse possible sources of heterogeneity from both dataset and methodological perspectives as covariates in the HSROC model, following the recommendations of the Cochrane Handbook for Diagnostic Test Accuracy Reviews [ 23 ]. We will regard a factor as a source of heterogeneity if the coefficient of the covariate in the HSROC model is statistically significant. We will not evaluate reporting bias (e.g., publication bias), since the hypotheses underlying the commonly used methods, such as the funnel plot or Egger's test, may not be satisfied in our research context. Data will be analysed using R software, version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria), with a two-tailed type I error probability of 0.05 (α = 0.05).

Systematic reviews have developed rapidly within the last decades and play a key role in enabling the spread of evidence-based practice. A systematic review, though less expensive than primary research, is still time-consuming and labour-intensive. Conducting a systematic review begins with an electronic database search for a specific research question; then at least two reviewers read each abstract of the retrieved records to identify potential candidates for full-text screening. On average, only 2.9% of retrieved records are relevant and included in the final synthesis [ 24 ]; typically, reviewers have to find the proverbial needle in a haystack of irrelevant titles and abstracts. Computational scientists have developed various algorithms for automatic literature screening. Developing an automatic literature screening instrument will save resources and improve the quality of systematic reviews by liberating reviewers from repetitive work. In this systematic review, we aim to describe and evaluate the development process and algorithms used in various AI literature screening systems, in order to build a pipeline for updating existing tools and creating new models.

The accuracy of automatic literature screening instruments varies widely across algorithms and review topics [ 17 ]. Automatic literature screening systems can reach a sensitivity as high as 95%, albeit at the expense of specificity, since reviewers try to include every publication relevant to the topic of the review. As the automatic systems may have low specificity, it is also important to evaluate how much reviewing work the reviewers can save in the screening step. We will therefore not only assess the diagnostic accuracy of AI screening algorithms compared with human investigators, but also collect information on the work saved by AI algorithms in literature screening. Additionally, we plan to conduct subgroup analyses to identify potential factors associated with the accuracy and efficiency of AI algorithms.

As far as we know, this will be the first systematic review to evaluate AI algorithms for automatic literature screening in evidence synthesis. Few systematic reviews have focused on the application of AI algorithms in medical practice, and the search strategies of previously published systematic reviews rarely use specific algorithms as search terms. Most use only general terms such as “artificial intelligence” and “machine learning”, which may miss studies that report only one specific algorithm. To include AI-related studies as completely as possible, our search strategy contains all of the AI algorithms commonly used in the past 50 years, and it was reviewed by an expert in ML. The process of literature screening can be assessed within the framework of a diagnostic test. Findings from this proposed systematic review will provide a comprehensive and essential summary of the application of AI algorithms for automatic literature screening in evidence synthesis. The proposed systematic review may also help to improve and promote automatic methods in evidence synthesis in the future by locating and identifying potential weaknesses in current AI models and methods.

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • AI: Artificial intelligence
  • AUC: Area under the curve
  • DOR: Diagnostic odds ratio
  • GRADE: Grading of Recommendations, Assessment, Development and Evaluations
  • HSROC: Hierarchical summarised receiver operating characteristic curve
  • NLR: Negative likelihood ratio
  • NPV: Negative predictive value
  • PLR: Positive likelihood ratio
  • PPV: Positive predictive value
  • PRISMA-P: Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols
  • QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies
  • ROC: Receiver operating characteristic curve
  • SVM: Support vector machine

1. Higgins J, Thomas J, Chandler J, et al. Cochrane handbook for systematic reviews of interventions, version 6.0 (updated July 2019). Cochrane; 2019.
2. Mulrow CD, Cook D. Systematic reviews: synthesis of best evidence for health care decisions. ACP Press; 1998.
3. Armstrong R, Hall BJ, Doyle J, Waters E. ‘Scoping the scope’ of a cochrane review. J Public Health. 2011;33(1):147–50.
4. Paul M, Leibovici L. Systematic review or meta-analysis? Their place in the evidence hierarchy. Clin Microbiol Infect. 2014;20(2):97–100.
5. Murad MH, Asi N, Alsawas M, Alahdab F. New evidence pyramid. Evid Based Med. 2016;21(4):125.
6. Bigby M. Evidence-based medicine in a nutshell: a guide to finding and using the best evidence in caring for patients. Arch Dermatol. 1998;134(12):1609–18.
7. Bragge P, Clavisi O, Turner T, Tavender E, Collie A, Gruen RL. The global evidence mapping initiative: scoping research in broad topic areas. BMC Med Res Methodol. 2011;11(1):92.
8. Sampson M, Shojania KG, Garritty C, Horsley T, Ocampo M, Moher D. Systematic reviews can be produced and published faster. J Clin Epidemiol. 2008;61(6):531–6.
9. Shojania K, Sampson M, Ansari M, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.
10. Bishop CM. Pattern recognition and machine learning. Springer; 2006.
11. Wang L-Y, Chakraborty A, Comaniciu D. Molecular diagnosis and biomarker identification on SELDI proteomics data by ADTBoost method. Paper presented at: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference; 2006.
12. Cetin MS, Houck JM, Vergara VM, Miller RL, Calhoun V. Multimodal based classification of schizophrenia patients. Paper presented at: 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2015.
13. Sun Y, Loparo K. Information extraction from free text in clinical trials with knowledge-based distant supervision. Paper presented at: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC); 2019.
14. Li M, Lu Y, Niu Z, Wu F-X. United complex centrality for identification of essential proteins from PPI networks. IEEE/ACM Transact Comput Biol Bioinform. 2015;14(2):370–80.
15. Whittington C, Feinman T, Lewis SZ, Lieberman G, Del Aguila M. Clinical practice guidelines: machine learning and natural language processing for automating the rapid identification and annotation of new evidence. J Clin Oncol. 2019;37.
16. Turner MD, Chakrabarti C, Jones TB, et al. Automated annotation of functional imaging experiments via multi-label classification. Front Neurosci. 2013;7:240.
17. Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.
18. Rúbio TR, Gulo CA. Enhancing academic literature review through relevance recommendation: using bibliometric and text-based features for classification. Paper presented at: 2016 11th Iberian Conference on Information Systems and Technologies (CISTI); 2016.
19. Shamseer L, Moher D, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350:g7647.
20. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015;4:78.
21. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
22. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924–6.
23. Macaskill P, Gatsonis C, Deeks J, Harbord R, Takwoingi Y. Cochrane handbook for systematic reviews of diagnostic test accuracy, version 0.9. London: The Cochrane Collaboration; 2010.
24. Sampson M, Tetzlaff J, Urquhart C. Precision of healthcare systematic review searches in a cross-sectional sample. Res Synth Methods. 2011;2(2):119–25.

Acknowledgements

We thank Professor Siyan Zhan (Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center) for her critical comments in designing this study. We also thank Dr. Bin Zhang (Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College) for her critical suggestions in developing the search strategies.

This study will be supported by the Undergraduate Innovation and Entrepreneurship Training Program (Number 202010023001). The sponsors have no role in study design, data collection, data analysis, interpretations of findings, and decisions for dissemination.

Author information

Yuelun Zhang, Siyu Liang, and Yunying Feng contributed equally to this work and should be regarded as co-first authors.

Authors and Affiliations

Medical Research Center, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Yuelun Zhang

Department of Endocrinology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, China

Siyu Liang, Shi Chen, Huijuan Zhu & Hui Pan

Eight-year Program of Clinical Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Yunying Feng, Yiying Yang & Xin He

Research Institute of Information and Technology, Tsinghua University, Beijing, China

Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China


Contributions

H Pan conceived this research. This protocol was designed by YL Zhang, SY Liang, and YY Feng. YY Yang, X He, Q Wang, F Sun, S Chen, and HJ Zhu provided critical suggestions and comments on the manuscript. YL Zhang, SY Liang, and YY Feng wrote the manuscript. All authors read and approved the final manuscript. H Pan is the guarantor for this manuscript.

Corresponding author

Correspondence to Hui Pan.

Ethics declarations

Ethics approval and consent to participate

This research is exempt from ethics approval because the work is carried out on published documents.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table 1. Search strategy for PubMed.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Zhang, Y., Liang, S., Feng, Y. et al. Automation of literature screening using machine learning in medical evidence synthesis: a diagnostic test accuracy systematic review protocol. Syst Rev 11, 11 (2022). https://doi.org/10.1186/s13643-021-01881-5


Received: 20 August 2020

Accepted: 27 December 2021

Published: 15 January 2022

DOI: https://doi.org/10.1186/s13643-021-01881-5

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Evidence-based practice
  • Natural language processing
  • Systematic review
  • Diagnostic test accuracy



We generate robust evidence fast

What is Silvi.ai?

Silvi is an end-to-end screening and data extraction tool supporting Systematic Literature Review and Meta-analysis.

Silvi helps create systematic literature reviews and meta-analyses that follow Cochrane guidelines in a highly reduced time frame, giving a fast and easy overview. It supports the user through the full process, from literature search to data analysis. Silvi connects directly to databases such as PubMed and ClinicalTrials.gov and is always updated with the latest published research. It also supports RIS files, making it possible to upload a search exported from your favorite search engine (e.g., Ovid). Silvi has a tagging system that can be tailored to any project.
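For readers unfamiliar with RIS: it is a plain-text tagged format for bibliographic records that most databases and reference managers can export. A minimal record, with all field values invented for illustration, looks like:

    TY  - JOUR
    AU  - Doe, Jane
    TI  - An invented article title
    JO  - Journal of Examples
    PY  - 2023
    DO  - 10.1000/xyz123
    ER  -

Each line is a two-letter tag followed by two spaces, a hyphen and a space; TY opens a record and ER closes it, so one file can carry an entire search export.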

Silvi is transparent, meaning it documents and stores the choices (and the reasons behind them) the user makes. Whether publishing the results from the project in a journal, sending them to an authority, or collaborating on the project with several colleagues, transparency is optimal to create robust evidence.

Silvi is developed with the user experience in mind. The design is intuitive and easily available to new users. There is no need to become a super-user. However, if any questions should arise anyway, we have a series of super short, instructional videos to get back on track.

To see Silvi in use, watch our short introduction video.


Learn more about Silvi’s specifications here.

"I like that I can highlight key inclusions and exclusions which makes the screening process really quick - I went through 2000+ titles and abstracts in just a few hours"

Eishaan Kamta Bhargava 

Consultant Paediatric ENT Surgeon, Sheffield Children's Hospital

"I really like how intuitive it is working with Silvi. I instantly felt like a superuser."

Henriette Kristensen

Senior Director, Ferring Pharmaceuticals

"The idea behind Silvi is great. Normally, I really dislike doing literature reviews, as they take up huge amounts of time. Silvi has made it so much easier! Thanks."

Claus Rehfeld

Senior Consultant, Nordic Healthcare Group

"AI has emerged as an indispensable tool for compiling evidence and conducting meta-analyses. Silvi.ai has proven to be the most comprehensive option I have explored, seamlessly integrating automated processes with the indispensable attributes of clarity and reproducibility essential for rigorous research practices."

Martin Södermark

M.Sc. Specialist in clinical adult psychology


Silvi.ai was founded in 2018 by Tove Holm-Larsen, Professor in Health Economic Evidence, and Rasmus Hvingelby, an expert in machine learning. The idea for Silvi stemmed from their own research and the need to conduct systematic literature reviews and meta-analyses faster.

The ideas behind Silvi were originally a component of a larger project. In 2016, Tove founded the group “Evidensbaseret Medicin 2.0” in collaboration with researchers from Ghent University, the Technical University of Denmark, the University of Copenhagen, and other experts. EBM 2.0 wanted to optimize evidence-based medicine to its highest potential using big data and artificial intelligence, but needed someone highly skilled in AI.

Around this time, Tove met Rasmus, who shared the same visions. Tove teamed up with Rasmus, and Silvi.ai was created.



A systematic literature review of AI-based digital decision support systems for post-traumatic stress disorder

Associated data

The original contributions presented in this study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Over the last decade, an increase in research on medical decision support systems has been observed. However, compared to other disciplines, decision support systems in mental health are still in the minority, especially for rare diseases like post-traumatic stress disorder (PTSD). We aim to provide a comprehensive analysis of state-of-the-art digital decision support systems (DDSSs) for PTSD.

Based on our systematic literature review of DDSSs for PTSD, we created an analytical framework, using thematic analysis for feature extraction and quantitative analysis of the literature. Based on this framework, we extracted information on the medical domain of the DDSSs, the data used, the technology used for data collection, user interaction, decision-making, user groups, validation, decision type and maturity level. Extracting data for all of these framework dimensions ensures consistency in our analysis and gives a holistic overview of DDSSs.

Research on DDSSs for PTSD is rare and primarily deals with the algorithmic part of DDSSs ( n = 17). Only one DDSS was found to be a usable product. From a data perspective, mostly checklists or questionnaires were used ( n = 9). While the median sample size of 151 was rather low, the average accuracy was 82%. Validation, excluding algorithmic accuracy (like user acceptance), was mostly neglected, as was an analysis concerning possible user groups.

Based on a systematic literature review, we developed a framework covering all parts (medical domain, data used, technology used for data collection, user interaction, decision-making, user groups, validation, decision type and maturity level) of DDSSs. Our framework was then used to analyze DDSSs for post-traumatic stress disorder. We found that DDSSs are not ready-to-use products but are mostly algorithms based on secondary datasets. This shows that there is still a gap between technical possibilities and real-world clinical work.

Introduction

According to Sauter, Digital Decision Support Systems (DDSSs) are computer-based systems that bring together information from various sources, assist in the organization and analysis of information and facilitate the evaluation of assumptions underlying the use of specific models ( 1 ). The concept of decision support systems originated in the 1960s ( 2 ) when researchers began to study computerized methods to assist in decision-making ( 3 – 5 ). Since then, the idea has extended throughout a broad spectrum of domains, one of which is healthcare. This work focuses on decision support systems in mental health, more precisely on decision support systems for PTSD. The American Psychiatric Association defines PTSD as “a psychiatric disorder that can occur in people who have experienced or witnessed a traumatic event such as a natural disaster, a serious accident, a terrorist act, war/combat, rape or other violent personal assault” ( 6 ). People with PTSD experience recurrent thoughts about their traumatic experience that influence their daily life. The lifetime prevalence of PTSD is around 12.5% ( 7 ). However, people suffering from PTSD are often undiagnosed or misdiagnosed, resulting in incorrect, incomplete or missing treatment ( 8 ). To investigate whether DDSSs could be a solution to this problem, we aim to review available decision support systems for PTSD and map their technological approaches in order to understand possible research gaps and obstacles in introducing decision support systems to clinical processes. Since no available reference architecture for decision support systems is applicable to our research, we contribute by introducing a novel framework for decision support systems that can be used to analyze existing systems. Ultimately, this also accelerates the development of new systems by highlighting essential dimensions.

Designers of earlier DDSSs have applied multiple alternative approaches for converting real-world data into something that stimulates better decisions. Information-management-based DDSSs try to organize data into usable presentations; modeling-(or data-analytics)-based DDSSs attempt to apply statistical (learning) methods for finding patterns or calculating indicators; and knowledge-management-based systems apply externally prepared algorithms (expert rules) to find matching data or derive new facts ( 9 ). While AI has been an essential element of DDSSs throughout its history, only recently has a new generation of decision support been facilitated by the availability of powerful computing tools to properly manage big data and to analyze and generate new knowledge. The evaluation of AI’s earlier implementations was limited to the design and development phase; machine learning-based algorithms often do not generalize beyond the training data set ( 10 ). However, studies have still shown the benefits of machine learning algorithms in DDSSs ( 11 – 13 ). Current studies that test the application of healthcare AI algorithms often omit details of DDSS tools that apply AI models. A well-designed DDSS is likely to enable the real-world application of AI technology ( 14 ).

This review aims to contribute by introducing a framework for the features of DDSS implementation in mental health. We aim to identify the prevalent features of the current state of research on DDSS. Often, the development of information systems involves the continuous introduction of new features and quality improvements. We hypothesized that each available article presents only a selection of features, a selection which is dependent on the maturity of the DDSS. Maturity models are increasingly used as a means of benchmarking or self-assessment of development ( 15 ). In healthcare informatics, many maturity models are available [e.g., Hospital Information System Maturity Model ( 16 )], but none of these models strictly provides an informed approach for the assessment of research on decision support systems ( 17 ). The available maturity models instead tend to look at the level of organizational adoption of specific technologies (e.g., how much an organization values data analytics technology) and provide little support for deciding on the readiness of DDSS tools in their early phases of development. As AI is often an essential element of a DDSS, we also explored AI maturity models. AI maturity models mostly look into the level of AI adoption in an organization rather than the maturity of the AI technology itself ( 18 – 20 ).

A DDSS is not a single technology but rather a set of integrated technologies ( 21 – 25 ). Sauser et al. ( 26 ) suggested a measure of System Readiness Level (SRL), which expresses the level of maturity of a system consisting of a set of integrated technologies ( 26 ). Exploring AI technology readiness or maturity, we encountered suggestions to look separately into the AI system’s capacities of integrating existing data sources (machine-machine intelligence), interacting with human users (human-computer intelligence) and applying intelligent reasoning (core cognitive intelligence) ( 27 ).

To have a transparent and objective approach for this literature review, we decided to apply the five stages suggested by Kitchenham’s “Guidelines for performing Systematic Literature Reviews in Software Engineering” ( 28 ):

  • (1) Search Strategy
  • (2) Study Selection
  • (3) Study Quality Assessment
  • (4) Data Extraction
  • (5) Data Synthesis

Research questions

Since our aim is to understand current research on decision support systems for PTSD, this paper is based on two research questions. First, we look for state-of-the-art decision support systems for post-traumatic stress disorder (RQ1). Second, we investigate the component elements of current decision support systems for PTSD (RQ2).

Search strategy

We built a search string based on the research questions identified and applied it to the Scopus abstract and citation database. Scopus was chosen as the primary source because it is the largest abstract and citation database of research literature with 100% MEDLINE coverage ( 29 ). The initial search string consisted of the disease to investigate – post-traumatic stress disorder – its abbreviation PTSD as well as the term “decision support.” To find papers that covered the prediction and classification of PTSD, we also added Artificial Intelligence. In Scopus, we applied the search string to the title, abstract and tags of the research papers. We restricted our search to only include journal articles or conference proceedings in English. We also conducted a manual search using Google Scholar and the web to find additional research; however, this did not bring up any new articles not already covered by our database search and our reference screening process. We formed our search criteria as (“decision support” OR “Artificial Intelligence”) AND [PTSD OR (post AND traumatic AND stress AND disorder)].

We conducted the search in Scopus on 3 March 2021. It resulted in 75 papers; reference screening of the included literature brought up an additional 13 papers. Our search process is visualized in Figure 1 .

Figure 1. Search strategy.

Study selection

The titles and abstracts of the queried articles were analyzed to identify relevant articles from the results of the search string queries. Articles fitting the research questions and meeting the inclusion criteria (see section “inclusion criteria”) as well as the quality criteria (see section “study quality assessment”) were included. Since the goal of this research is to give an overview of the state of the art, we did not put any constraints on study types and designs. To reduce bias in the study selection process, the task was done by two researchers independently. The two result sets were then merged and deviations were discussed among the authors. This resulted in a total set of 17 research papers.

We then repeated this process step to extract relevant studies from the reference lists of the selected articles. This resulted in 13 new research papers.

Inclusion criteria

Table 1 presents the inclusion criteria applied to the articles in our review.

Table 1. Inclusion criteria.

Study quality assessment

Table 2 presents the quality criteria applied to the articles in our review.

Table 2. Quality criteria.

Data extraction and synthesis

Data extraction and synthesis were based on an inductive approach. We applied thematic analysis ( 30 ) to answer our research questions. First, clear, scoped questions for data extraction were formed. Two researchers read through all the articles and iteratively clustered all of the information available on decision support systems into the extraction parameters. These extraction parameters describe how decision support systems work. This process is shown in Figure 2 .

Figure 2. Extraction process.

The answers extracted from the EQs (see Table 3 ) were then combined upon the agreement of the authors to create a feature matrix. The extracted features were then further clustered to create a common terminology that allows further analysis and the possibility to compare results. In the end, we combined the developed extraction questions and the clustered scales of each question into a novel framework for decision support systems in mental health.

Table 3. Extraction questions (EQ).

The selected 30 research articles ( 31 – 60 ) were published between 2001 and 2019. Three articles were published in journals about medical informatics, 10 in computer science journals or proceedings and 17 in medical journals. The following table shows how often each extraction parameter was present and indicates the terminology used in the selected studies. The terminology shown in Table 4 was developed by manual, iterative clustering of the extracted features until the authors were satisfied with the granularity.

Table 4. Terminology extraction.

A framework for digital decision support systems

Based on our aim to find all relevant features of decision support systems in the PTSD area and our systematic literature review results, we propose a multidimensional framework that covers the different areas of DDSS. Each dimension represents one of our extraction parameters. Figure 3 illustrates our framework with the different dimensions of DDSSs. Based on the extracted data, we clustered the terminology to develop scales for dimensions in order to make results better analyzable.

Figure 3. Framework for DDSS.

Input Data: The input data dimension defines the information needed by a decision support system in order to function. Possible data could be structured like socio-demographic information or coded data [for example, with the International Statistical Classification of Diseases and Related Health Problems (ICD) ( 61 ) or the Diagnostic and Statistical Manual of Mental Disorders (DSM) ( 62 )] as well as semi-structured information like patient records or unstructured information like free text or medical images. A combination of different structured, semi-structured and/or unstructured data is also possible.

Technology: The technology dimension describes how the decision support system is implemented. This involves three sub-dimensions:

  • Decision technology: the intelligence or cognition of the system, i.e., the algorithm that powers the decision-making. Examples are machine learning algorithms such as support vector machines, other statistical methods, and rule-based approaches.
  • Interaction technology: the technology needed to interact with other systems or user groups in the clinical process. Interaction technology can be API-based interfaces to other systems, graphical user interfaces (websites, mobile apps) or sensory input such as conversational interfaces (chatbots).
  • Data collection technology: how the data described in the input data dimension are collected, for instance via sensors, questionnaires or chatbots.

Validation: Validation describes how the success of decision support systems is measured.

  • Accuracy: the decision support system is evaluated by how many right or wrong decisions it makes. Examples are accuracy, recall (sensitivity), precision, specificity, area under the curve (AUC) values and F1 scores (the harmonic mean of recall and precision).
  • User acceptance: end-users are involved in the evaluation of the DDSS.
  • Efficacy: the impact of the decision support system is evaluated based on potential benefits.
  • Security: the DDSS is evaluated against security regulations.
  • Legal: the legal compliance of the DDSS is evaluated.

User group: This dimension captures the different user groups interacting with the decision support system in the clinical process.

Medical domain: The medical domain dimension describes the disease for which the decision support system can be applied.

Decision: The following scale defines the decisions a digital decision support system can support:

  • Prediction: the system outputs a risk score based on the likelihood that someone will get a disease.
  • Assessment: the patient is already sick (knowingly or unknowingly).
    – Diagnosis: testing individuals with symptoms and/or suspicion of illness.
    – Screening: testing individuals without specific symptoms.
  • Monitoring: decision support that evaluates symptom severity or treatment progress.
  • Treatment: recommendation or intervention concerning care or therapy.

Maturity: As none of the existing maturity models fits our research, we designed a DDSS maturity model based on the SRL scale ( 26 ), with adaptations specific to healthcare. It introduces an additional gradation to mark the point where human interaction is added to the core AI algorithm. Our maturity levels describe, on a scale from one to seven, how advanced the DDSS is. Not all of the abovementioned dimensions are necessarily present at each maturity level; as the maturity level gets higher, more dimensions are described. (A small illustrative encoding of this scale follows the list below.)

  • 1. Idea without implementation
  • 2. Implementation without real-world interaction (algorithm development)
  • 3. Implementation with real-world interaction but without patient intervention
  • 4. Fully functioning prototype, system triggers real-world action, e.g., clinical trial
  • 5. Operational product (at least one adopter, certified if required)
  • 6. Locally adopted product
  • 7. World-wide adopted product (transformational).
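As a minimal sketch of how this ordinal scale could be encoded for analysis (our own illustration; the class and member names are invented, not taken from any reviewed paper):

    from enum import IntEnum

    class DDSSMaturity(IntEnum):
        # Ordinal encoding of the seven maturity levels above.
        IDEA = 1               # idea without implementation
        ALGORITHM = 2          # implementation without real-world interaction
        REAL_WORLD = 3         # real-world interaction, no patient intervention
        PROTOTYPE = 4          # fully functioning prototype, e.g., clinical trial
        PRODUCT = 5            # operational product (certified if required)
        LOCALLY_ADOPTED = 6    # locally adopted product
        WIDELY_ADOPTED = 7     # world-wide adopted product (transformational)

    # IntEnum preserves the ordinal meaning, so levels compare directly:
    assert DDSSMaturity.PROTOTYPE > DDSSMaturity.ALGORITHM

Keeping the scale ordinal rather than categorical is what allows statements such as "maturity level ≥ 4" in the analysis below.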

Data synthesis input data (EQ1)

The data used by digital decision support systems in the context of PTSD are diverse. Voice data ( 35 , 45 , 46 , 55 ), text data ( 38 , 48 , 50 ), checklists and questionnaires ( 32 , 33 , 37 , 41 – 43 , 52 , 53 , 59 ), biosignals ( 32 , 33 , 36 , 44 , 45 , 51 , 57 ), electronic medical records ( 34 , 47 , 56 ) and secondary data from other clinical studies ( 31 , 40 , 49 , 54 ) are all used. One article used the choices made by a virtual avatar in a role-playing game as input data ( 39 ). Of the 30 publications included in this review, 28 mentioned the sample size of the data used to develop and test their decision support system. The minimum sample size was 10 and the maximum 89,840, with a median (IQR) of 151.5 (54.25 to 656.25). The violin plots (Figures 4 and 5) show the distribution of the sample sizes. The top three outliers (89,840; 89,840; 5,972) were omitted from Figure 5 for better visibility.

Figure 4. Sample size distribution.

Figure 5. Sample size distribution excluding outliers.

Figure 6 shows the data dimension of the studies in our review and indicates how the data used correlate with the average maturity levels of the DDSS. It visualizes the frequency and maturity of DDSSs based on the different data sources.

Figure 6. Data dimension concepts.

Data synthesis implementation (EQ2)

The majority ( n = 15) of the investigated research uses a neural network approach (including support vector machines). In 11 cases, support vector machines (SVMs) were used. Other algorithms used were regressions, decision trees, random forests and rule-based approaches. We observed that 20 research papers did not have or mention any user interaction but worked solely on secondary data. The others used questionnaires or surveys, virtual humans or virtual reality. McWorther et al. proposed using temperature control, aromatherapy and auditory therapy capabilities for user interaction ( 36 ). Concerning maturity levels, AI algorithms are still mostly at maturity level two. Most advanced in terms of maturity were statistical and text mining methods, as indicated in Figure 7. The categories “statistics” and “machine learning” (ML) arose because some studies mentioned only these broad categories without further specifics. An illustrative sketch of the dominant SVM approach follows the figure below.

Figure 7. Decision technology concepts.
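As an illustration of the dominant approach, a minimal SVM classifier sketch in Python with scikit-learn is shown below; the features and labels are invented (loosely styled on questionnaire inputs such as a symptom checklist total), and this is not code from any reviewed system:

    # Minimal sketch of an SVM classifier of the kind many reviewed papers
    # used, trained on invented questionnaire-style features.
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Hypothetical features per case: [checklist total, trauma count, flag]
    X = [[62, 3, 1], [18, 0, 0], [55, 2, 1], [22, 1, 0],
         [70, 4, 1], [15, 0, 0], [48, 2, 0], [66, 3, 1]]
    y = [1, 0, 1, 0, 1, 0, 0, 1]   # 1 = PTSD, 0 = no PTSD (invented labels)

    # Scaling before the kernel SVM is standard practice for numeric inputs.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X, y)
    print(model.predict([[60, 3, 1]]))   # predicted class for a new case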

Data synthesis validation (EQ3)

The majority ( n = 23) of articles validated the accuracy of the DDSS studied. Three articles validated user acceptance, two validated efficacy and three did not mention validation. Comparing algorithmic validation among research papers was difficult, since a variety of scores, such as F1 scores, area under the receiver operating characteristic curve ( 63 ) and overall accuracy, were used, and these cannot be converted into one another. To provide an estimate of how well current DDSSs perform, we extracted all accuracy measurements present in each paper and aggregated each scale individually. The mean accuracy ( n = 11) of the DDSSs is μ = 82.2%, with a median of η = 82% and a standard deviation of σ = 0.095. The mean area under the curve value ( n = 8) is μ = 0.845, with a median of η = 0.84 and a standard deviation of σ = 0.064.
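The aggregation described above amounts to pooling each scale separately. A trivial sketch with invented values (not the review's data):

    import statistics

    # Accuracy-type scores are pooled on their own scale; AUC values would
    # be pooled in a separate list. All values here are invented.
    accuracies = [0.68, 0.75, 0.80, 0.82, 0.84, 0.88, 0.90, 0.93]
    print(statistics.mean(accuracies),
          statistics.median(accuracies),
          statistics.stdev(accuracies))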

Data synthesis user groups (EQ4)

The user groups mentioned were patients, clinicians and supporters of patients; however, the majority of papers did not explicitly mention specific user groups for their systems. Research covering decision support systems with higher maturity levels (four and above) included this information. Research dealing with decision support systems with lower maturity often lacked a clear user group since the process of using the proposed systems was not defined at that stage.

Data synthesis medical domain (EQ5)

In addition to PTSD, which was tackled by all 30 research papers, four investigated depression ( 46 – 48 , 55 ), two anxiety ( 34 , 48 ) and one paranoia ( 58 ).

Data synthesis decisions supported (EQ6)

Research focusing on predicting PTSD or its symptoms was most common ( n = 11). Six papers focused on screening ( 35 , 38 , 45 , 46 , 50 , 55 ) and six on treatment ( 32 , 36 , 43 , 51 , 53 , 56 ). Four papers investigated the diagnosis of PTSD ( 37 , 41 , 52 , 60 ) and five focused on monitoring PTSD ( 33 , 35 , 56 , 58 , 59 ).

Data synthesis maturity level (EQ7)

The decision support systems were ranked according to the maturity scale described in the section “A framework for digital decision support systems”. As noted in answering research question two, the majority of papers work with secondary data. This is supported by the high volume of research at maturity level two. Figure 8 shows the number of articles grouped by maturity level.

Figure 8. Bar chart of maturity levels.

This research highlights the state of the art in digital decision support systems for PTSD based on our proposed framework. We developed the framework to ensure a holistic overview of all features of a DDSS. The dimensions of the framework represent the topics of interest and the choice of features is based on the conceptualization of the terminology extracted from the included articles dimension by dimension.

Concerning the data dimension, we noticed that questionnaires and checklists are still the most common and most mature (see Figure 6 ) input for decision support systems. When examining clinical guidelines like NICE ( 64 ) for diagnosing PTSD, questionnaires and checklists are still the only approach mentioned for diagnostics. Even though some new technologies, such as virtual or augmented reality, were investigated in the research found in this review, we noticed an absence of input parameters based on smartphones or wearables like GPS sensors or accelerometers. We hypothesize that this is due to the short life cycle of modern technologies, making it difficult to offer clinical evidence of their benefits. Questionnaires and checklists, however, have been around for many years and the methodology for administering them has not changed, therefore there is more scientific evidence of their use. Researchers and medical professionals are more likely to research, invest and adopt technology with strong evidence. This could be another reason why DDSSs using new technology are not widely included in clinical processes.

The data dimension also showed that the sample size is on average small and the statistical significance of the results was not proven by the majority of the research articles. Several reasons contribute to this. In general, medical data are hard to obtain for research because secondary use is still not easy with many digital healthcare records and/or applications. Even if data can be obtained, they need to include the right parameters and have a structure that is usable for AI algorithms. Unstructured and text-based information is especially challenging to use for an AI. Further, most available datasets like the Jerusalem Trauma Outreach and Prevention Study do not include data on modern sensors ( 65 ).

The most common AI algorithm found during this literature review was the support vector machine. Over the last few years, SVMs have become a de facto standard because they are easy to use, have good library support for programming and make few assumptions about the training data. We also observed that the number of research items resulting in usable products (maturity level ≥ 4) was low: only three articles. Clinical studies with patient intervention (maturity level ≥ 3) were also relatively rare: nine papers out of 30. One reason for this could be that the small sample sizes of the research items do not provide sufficient evidence for clinical use.

All articles with a maturity level of 4 or more had validation of user acceptance as one focus and clearly defined user groups. Most articles with lower maturity levels did not define user groups, which could indicate a lack of strategic development and difficulties in bringing the research into a clinical setting. Our hypothesis is that interaction with users, or integration into clinical processes, is often much harder to solve than the intelligence of cognition. Still, most papers focus on cognition rather than user interaction; our framework's validation dimension is evidence of this. We found 23 papers evaluating accuracy, which is an evaluation of the AI technology, and five papers evaluating user acceptance or efficacy, meaning that they attempted to improve the current clinical process. Since most papers in our review are of maturity levels 1, 2 or 3 (i.e., algorithm research), they do not include the clinical component necessary for evaluating user acceptance and efficacy. This shows a research gap when it comes to enriching clinical processes with IT. The same goes for evaluating legal and IT-security constraints, which no paper in our review mentioned. Since eHealth systems are increasingly targeted by cyber attacks (66), IT and data security need to be a vital part of the evaluation to allow safe DDSS adoption.

Further research is needed on how the clinical process must be adapted for DDSSs to work, including in the context of the decisions being supported. Most DDSS designers do not fully understand the medical decision process and instead provide decisions in an "IT way." One limitation of this general hypothesis is that our research focuses solely on DDSSs for PTSD. However, the narrow focus on PTSD shows that even in a very well-scoped area, a DDSS is hard to implement.

Since we used an inductive research approach to design our framework based on currently available literature, some important framework dimensions might be missing. One example is that the framework includes many technical aspects of the implementations and fewer organizational and financial perspectives. We encourage further research to include dimensions that describe the adoption of DDSSs in clinical processes.

Introducing our novel framework for DDSSs, we provide a guide for decision support system evaluation. The framework is complementary to other healthcare technology evaluation methods (clinical, organizational, financial) and thus supports the design of comprehensive evaluation systems for DDSSs. Applying the maturity dimension helped us examine which features of a DDSS are present, thereby indicating the steps to take in order to move up in maturity when developing decision support systems. Since the framework was developed from general considerations, it can be applied to decision support systems outside of PTSD or mental health. However, it should be further evaluated to examine whether the terminology suits other domains. Higher maturity scales in particular need additional verification, since only two papers in our review had a maturity level above 4.

Our research aimed to analyze existing decision support systems for PTSD. Based on this goal, we developed a generic framework covering all dimensions of digital decision support systems. Our framework not only accelerates the development and benchmarking of DDSSs, but also acts as the foundation for our systematic literature review. Extracting data for all framework dimensions ensured consistency in our analysis and gives a holistic overview of DDSSs. During our review, we found working DDSS prototypes for PTSD and described their components. However, most of the systems have not been evaluated in production use; they are only algorithmic models based on secondary datasets. This shows that there is still a gap between technical possibilities and actual clinical work. We proposed some possible explanations: small sample sizes, missing domain expertise, and a lack of focus on bringing research to production. This gap should be analyzed further by testing our hypotheses and examining them with data from research on DDSSs for other mental disorders. For now, we conclude that only a rare few DDSSs for PTSD are ready for large-scale adoption in healthcare. The long-promised revolution of AI and ML for diagnosis in psychiatry, at least for PTSD, is yet to come.

Data availability statement

Author contributions

MB: conceptualization, methodology, investigation, resources, data curation, and writing – original draft. JM: investigation, data curation, and writing – original draft. PR: writing, review, editing, and supervision. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Automate your literature review with AI

Shubham Dogra


Traditional methods of literature review are susceptible to error: human bias on the one hand, and on the other the incredibly large amount of scientific research being published today, not to mention all the papers published over the past 100 years. Together they make a heap of information that is humanly impossible to sift through efficiently.

Thanks to artificial intelligence, long and tedious literature reviews are becoming quick and comprehensive. No longer do researchers have to spend endless hours combing through stacks of books and journals.

In this blog post, we'll dive deep into the world of automating your literature review with AI, exploring what a literature review is, why it's so crucial, and how you can harness AI tools to make the process more effective.

What is a literature review?

A literature review is essentially the foundation of a scientific research project, providing a comprehensive overview of existing knowledge on a specific topic. It gives an overview of your chosen topic and summarizes key findings, theories, and methodologies from various sources.

This critical analysis not only showcases the current state of understanding but also identifies gaps and trends in the scientific literature. In addition, it also shows your understanding of your field and can help provide credibility to your research paper .

Types of literature review

There are several types of literature reviews but for the most part, you will come across five versions. These are:

1. Narrative review: A narrative review provides a comprehensive overview of a topic, usually without a strict methodology for selection.

2. Systematic review: Systematic reviews are a strategic synthesis of a topic. This type of review follows a strict plan to identify, evaluate, and critique all relevant research on a topic to minimize bias.

3. Meta-analysis: It is a type of systematic review that uses research data from multiple articles to draw quantitative conclusions about a specific phenomenon.

4. Scoping review: As the name suggests, the purpose of a scoping review is to study a field, highlight the gaps in it, and underline the need for the following research paper.

5. Critical review: A critical literature review assesses and critiques the strengths and weaknesses of existing literature, challenging established ideas and theories.

Benefits of using literature review AI tools

Using literature review AI tools can be a complete game changer in your research. They can make the literature review process smarter and hassle-free. Here are some practical benefits:

Time savings

AI tools for literature review can skim through tons of research papers and find the most relevant ones for your topic in no time, saving you hours of manual searching.

Comprehensive insights

No matter how complex the topic or how long the research papers, AI tools can extract key insights like methodology, datasets, and limitations by simply scanning the abstracts or PDF documents.

Eliminate bias

AI doesn't have favorites. Based on the data it’s fed, it evaluates research papers objectively and reduces as much bias in your literature review as possible.

Faster research questions

AI tools bring loads of research papers together in one place. Some let you create visual maps and connections, helping you identify gaps in the existing literature and arrive at your research question faster.

Consistency

AI tools ensure your review is consistently structured and formatted . They can also check for proper grammar and citation style, which is crucial for scholarly writing.

Multilingual support

Plenty of non-native English-speaking researchers struggle with scientific jargon in English. AI tools with multilingual support help these academics conduct their literature review in their own language.

How to write a literature review with AI

Now that we understand the benefits of a literature review using artificial intelligence, let's explore how you can automate the process. Literature reviews with AI-powered tools can save you countless hours and allow a more comprehensive and systematic approach. Here's one process you can follow:

Choose the right AI tool

Several AI search engines, like Google Scholar, SciSpace, and Semantic Scholar, help you find the most relevant papers semantically, in other words, even without the exact keywords. These tools understand the context of your search query and deliver matching results.

Find relevant research papers

Once you input your research question or keywords into a search engine like Google Scholar, Semantic Scholar, or SciSpace, it scours databases of millions of papers to find relevant articles. You can then narrow the results by time period, journal, number of citations, and other parameters for more accuracy.
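To see what "semantic" means in practice, the toy sketch below embeds a query and a few candidate titles and ranks the titles by cosine similarity. It is only an illustration: the model checkpoint is just a common public one, and the paper titles are invented.

```python
# Toy semantic ranking: embed a query and candidate titles, rank by similarity.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used embedding model

query = "machine learning methods for screening articles in systematic reviews"
titles = [  # invented examples
    "Semi-automated citation screening with support vector machines",
    "A history of the randomized controlled trial",
    "Deep learning for title and abstract triage in evidence synthesis",
]

q_emb = model.encode([query])[0]
t_embs = model.encode(titles)

# Cosine similarity: higher means semantically closer, even with few shared keywords.
sims = t_embs @ q_emb / (np.linalg.norm(t_embs, axis=1) * np.linalg.norm(q_emb))
for score, title in sorted(zip(sims, titles), reverse=True):
    print(f"{score:.3f}  {title}")
```

Notice that the third title can rank highly even though it shares almost no keywords with the query; that is exactly the behaviour a plain keyword search misses.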

Analyze the search results

Now that you have your list of relevant academic papers, the next step is to review the results. Many AI-powered literature review tools provide summaries alongside each paper. Some sophisticated tools also gather key points from multiple papers at once and let you ask questions about the topic, giving you a quicker grasp of the subject and a better understanding of your field.

Organize your collection

Whether you’re writing a literature review or your paper, you will need to keep track of your references. Using AI tools, you can efficiently organize your findings, store them in reference managers, and generate citations automatically, saving you the hassle of formatting references by hand.

Write the literature review

Now that you’ve done your groundwork, you can start writing your literature review. Although you should be doing this yourself, you can use tools like paraphrasers, grammar checkers, and co-writers to help you refine your academic writing and get your point across with more clarity.

Best AI Tools for Literature Review

Since generative AI and ChatGPT came into the picture, there are heaps of AI tools for literature review available out there. Some of the most comprehensive ones are:

SciSpace is a valuable tool to have in your arsenal. It has a repository of 270M+ papers and makes it easy to find research articles. You can also extract key information to compare and contrast multiple papers at the same time. Then, go on to converse with individual papers using Copilot, your AI research assistant.


Research Rabbit

Research Rabbit is a research discovery tool that helps you find new, connected papers using a visual graph. You can essentially create maps around metadata, which helps you not only explore similar papers but also connections between them.

Iris AI is a specialized tool that understands the context of your research question, lets you apply smart filters, and finds relevant papers. Further, you can also extract summaries and other data from papers.

If you don’t already know about ChatGPT , you must be living under a rock. ChatGPT is a chatbot that generates text from a prompt using natural language processing (NLP). You can use it to write the first draft of your literature review, refine your writing, format it properly, put together a research presentation, and much more.

Things to keep in mind when using literature review AI tools

While AI-powered tools can significantly streamline the literature review process, there are a few things you should keep in mind while employing them:

Quality control

Always review the results generated by AI tools. AI is powerful but not infallible, so do further analysis yourself to confirm that the selected research articles are genuinely relevant to your research.

Ethical considerations

Be aware of ethical concerns, such as plagiarism and undisclosed AI writing. The use of AI is still frowned upon in many venues, so do a thorough check of the originality of your work, which is vital for maintaining academic integrity.

Stay updated

The world of AI is ever-evolving. Stay updated on the latest advancements in AI tools for literature review to make the most of your research.

In conclusion

Artificial intelligence is a game-changer for researchers, especially when it comes to literature reviews. It not only saves time but also enhances the quality and comprehensiveness of your work. With the right AI tool and a clear research question in hand, you can build an excellent literature review.





System for Systematic Literature Review Using Multiple AI Agents: Concept and an Empirical Evaluation

13 Mar 2024 · Abdul Malik Sami, Zeeshan Rasheed, Kai-Kristian Kemell, Muhammad Waseem, Terhi Kilamo, Mika Saari, Anh Nguyen Duc, Kari Systä, Pekka Abrahamsson

Systematic Literature Reviews (SLRs) have become the foundation of evidence-based studies, enabling researchers to identify, classify, and combine existing studies based on specific research questions. Conducting an SLR is largely a manual process. Over the previous years, researchers have made significant progress in automating certain phases of the SLR process, aiming to reduce the effort and time needed to carry out high-quality SLRs. However, there is still a lack of AI agent-based models that automate the entire SLR process. To this end, we introduce a novel multi-AI agent model designed to fully automate the process of conducting an SLR. By utilizing the capabilities of Large Language Models (LLMs), our proposed model streamlines the review process, enhancing efficiency and accuracy. The model operates through a user-friendly interface where researchers input their topic, and in response, the model generates a search string used to retrieve relevant academic papers. Subsequently, an inclusive and exclusive filtering process is applied, focusing on titles relevant to the specific research area. The model then autonomously summarizes the abstracts of these papers, retaining only those directly related to the field of study. In the final phase, the model conducts a thorough analysis of the selected papers concerning predefined research questions. We also evaluated the proposed model by sharing it with ten competent software engineering researchers for testing and analysis. The researchers expressed strong satisfaction with the proposed model and provided feedback for further improvement. The code for this project can be found on the GitHub repository at https://github.com/GPT-Laboratory/SLR-automation.
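To visualize the staged flow the abstract describes, here is a hypothetical sketch of such a pipeline. The `llm()` and `search_papers()` functions are stubs standing in for a real LLM client and a real bibliographic search API; this is not the authors' implementation, which lives in the GitHub repository linked above.

```python
# Hypothetical multi-agent SLR pipeline; llm() and search_papers() are stubs.
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    abstract: str

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def search_papers(search_string: str) -> list[Paper]:
    raise NotImplementedError("plug in a bibliographic search API here")

def run_slr(topic: str, research_questions: list[str]) -> str:
    # Stage 1: turn the user's topic into a Boolean search string.
    search_string = llm(f"Write a Boolean search string for the topic: {topic}")
    papers = search_papers(search_string)

    # Stage 2: inclusion/exclusion filtering on titles.
    kept = [p for p in papers
            if llm(f"Is this title relevant to '{topic}'? Answer yes or no: {p.title}")
                  .strip().lower().startswith("yes")]

    # Stage 3: summarize abstracts, keeping only on-topic papers.
    summaries = [llm(f"Summarize for an SLR on '{topic}': {p.abstract}") for p in kept]

    # Stage 4: analyze the surviving summaries against the research questions.
    return llm("Answer these research questions from the summaries:\n"
               + "\n".join(research_questions) + "\n\n" + "\n\n".join(summaries))
```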


Empowering education development through AIGC: A systematic literature review

  • Published: 29 February 2024


  • Xiaojiao Chen
  • Zhebing Hu
  • Chengliang Wang (ORCID: orcid.org/0000-0003-2208-3508)


As an exemplary representative of AIGC products, ChatGPT has ushered in new possibilities for the field of education. Leveraging its robust text generation and comprehension capabilities, it has had a revolutionary impact on pedagogy, learning experiences, personalized education and other aspects. However, to date, there has been no comprehensive review of AIGC technology’s application in education. In light of this gap, this study employs a systematic literature review and selects 134 relevant publications on AIGC’s educational application from four databases: EBSCO, EI Compendex, Scopus, and Web of Science. The study aims to explore the macro development status and future trends in AIGC’s educational application. The following findings emerge: 1) In the field of AIGC’s educational application, the United States is the most active country, and theoretical research dominates the research types in this domain; 2) research on AIGC’s educational application is primarily published in journals and academic conferences in the fields of educational technology and medicine; 3) research topics primarily focus on five themes: AIGC technology performance assessment, AIGC technology instructional application, AIGC technology enhancing learning outcomes, analysis of the advantages and disadvantages of AIGC’s educational application, and the prospects of AIGC’s educational application; 4) through Grounded Theory, the study delves into the core advantages and potential risks of AIGC’s educational application, deconstructing its scenarios and logic; 5) based on a review of the existing literature, the study provides valuable future agendas from both theoretical and practical perspectives. Discussing the future research agenda helps clarify key issues in the integration of AI and education, promoting more intelligent, effective, and sustainable educational methods and tools, which is of great significance for advancing innovation and development in the field of education.


Data availability

The datasets (Coding results) generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Author information

Authors and affiliations

College of Educational Science and Technology, Zhejiang University of Technology, Hangzhou, China

Xiaojiao Chen

College of Foreign Languages, Zhejiang University of Technology, Hangzhou, China

Zhebing Hu

Department of Education Information Technology, Faculty of Education, East China Normal University, Shanghai, China

Chengliang Wang


Corresponding author

Correspondence to Chengliang Wang.

Ethics declarations

Conflict of interest

The authors declare that during the research no commercial or financial ties existed that could be regarded as a potential conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Chen, X., Hu, Z. & Wang, C. Empowering education development through AIGC: A systematic literature review. Educ Inf Technol (2024). https://doi.org/10.1007/s10639-024-12549-7

Download citation

Received: 19 October 2023

Accepted: 05 February 2024

Published: 29 February 2024

DOI: https://doi.org/10.1007/s10639-024-12549-7


  • Artificial intelligence generated content
  • Artificial intelligence
  • Systematic literature review
  • Educational technology

Systematic Review Article

A systematic review and meta-analysis of the diagnosis and surgical management of carcinoid heart disease


  • 1 Department of Cardiology, Lyell McEwin Hospital, SA Health, Elizabeth Vale, SA, Australia
  • 2 Adelaide Medical School, The University of Adelaide, North Terrace Adelaide, SA, Australia
  • 3 SA Health Library Service, Lyell McEwin Hospital, SA Health, Elizabeth Vale, SA, Australia

Introduction: Carcinoid heart disease (CHD), a complication of carcinoid syndrome (CS), is a rare condition that can lead to right sided valvular heart disease and has been traditionally associated with a poor prognosis. We conducted a systematic review and meta-analysis to explore the accuracy of biomarkers and echocardiography in diagnosing CHD amongst patients who are already known to have neuroendocrine tumours and to assess whether surgical management of CHD leads to a reduction in mortality.

Methods: A systematic literature search of MEDLINE, EMBASE, EBM Reviews, Google Scholar, ClinicalTrials.gov was conducted. All studies on patients with carcinoid heart disease (CHD) reporting on biomarkers, echocardiographic and surgical outcomes were included. The National Heart, Lung, and Blood Institute quality assessment tool was used to assess the methodological study quality. Data analysis was performed using Stata Statistical Software and R Studio, and individual meta-analyses were performed for biomarkers, echocardiographic findings, and surgical outcomes.

Results: A total of 36 articles were included in the systematic review analysis. N-terminal pro-brain natriuretic peptide (NT-proBNP) and 5-hydroxyindole acetate (5-HIAA) levels were higher in patients with CHD than in those without CHD. 32% of CS patients had echocardiographic evidence of cardiac involvement, of which 79% involved tricuspid valve abnormalities. Moderate-severe tricuspid regurgitation was the most common echocardiographic abnormality (70% of patients). However, these analyses had substantial heterogeneity due to the high variability of cardiac involvement across studies. Pooled surgical mortality for CHD was 11% at 1 month, 31% at 12 months and 56% at 24 months. When surgical outcomes were assessed longitudinally, the one-month results showed a trend towards lower mortality in more recent surgeries than in earlier years; however, this was not statistically significant.

Discussion: There is not enough data in current literature to determine a clear cut-off value of NTproBNP and 5-HIAA to help diagnose or determine CHD severity. Surgical management of CHD is yet to show significant mortality benefit, and there are no consistent comparisons to medical treatment in current literature.

1 Introduction

Carcinoid heart disease (CHD) is an uncommon complication of carcinoid syndrome (CS), which is a rare syndrome amongst patients with metastatic neuroendocrine tumours (NETs), neoplasms of enterochromaffin cells that secrete bioactive substances ( 1 ). The annual age-adjusted incidence of NETs was reported as 6.98 per 100,000 persons in 2012, and the incidence of CS among NET patients increased from 11% in 2000 to 19% in 2011 ( 1 ).

The typical form of CS is characterized by flushing, abdominal cramps, diarrhea and bronchospasm. CS results from excess secretion by NETs, which can release as many as 40 vasoactive products, predominantly serotonin; it manifests when there is reduced hepatic capacity to metabolize these abnormally secreted vasoactive peptides ( 2 ). Rarely, CS may exist in patients without pre-existing liver metastases, such as those with ovarian and retroperitoneal tumours, where the vasoactive substances enter the systemic circulation via the caval system, bypassing the liver. Atypical CS is rare and mainly occurs in the context of lung NETs, characterized by headache, shortness of breath and extended episodes of flushing. The excess serotonin (amongst other peptides) appears to cause tissue fibrosis in the heart and subsequent CHD ( 1 ).

CHD is usually insidious, and most commonly involves the right side of the heart, as the neurohormonal substances break down in the respiratory system before reaching the left heart unless there is a right-to-left shunt such as via a patent foramen ovale ( 3 ). Right-sided valves are predominantly involved, leading to right heart failure over time.

Because CHD is rare, however, there is no clear pathway for its diagnosis, which still relies strongly on clinical suspicion. The prognosis of patients with CHD is poor (31% survival within 3 years in patients with CHD compared to 69% in those without CHD) ( 4 ). Several biomarkers are well established in association with neuroendocrine tumours, and others with heart failure; although it would be logical for these biomarkers to be elevated in CHD, they have not been systematically evaluated in the literature, nor have cut-off levels that aid diagnosis been established. Echocardiography, the diagnostic tool for CHD, has not been serially analyzed against symptoms, biomarkers and outcomes in a way that could guide disease trajectory and management, such as the optimal timing of surgery. A recent consensus document by the European Neuroendocrine Tumor Society (ENETS) provides a "best practice" proforma recommending what information should be captured at the time of referral, to create a standardized assessment of patients across sites ( 5 ).

There is a paucity of data on the surgical management of CHD. This may be partly due to the overall paucity of data on CHD, and even more so for right-heart surgery in CHD, because tricuspid or pulmonary valve replacement or repair has historically been performed mostly when the patient was already undergoing cardiac bypass for another indication. In recent times, though, the cardiothoracic literature increasingly supports isolated right-sided valvular intervention to improve outcomes ( 6 ). Furthermore, the timing of surgical intervention appears to be a significant variable in the degree of mortality benefit for patients undergoing non-carcinoid right-sided valvular surgery ( 7 ).

Thus, in view of these lessons in the cardiothoracic field, it is necessary to re-visit its applicability to CHD, which for many years was accepted to have a poor prognosis of 2–4 years mortality from time of diagnosis.

Due to the rarity of CHD amongst an already rare cohort of NETs patients, CHD has been difficult to study. To our knowledge, there are no systematic reviews with meta-analyses assessing the optimal method of diagnosis and management of carcinoid heart disease.

This systematic review explores the question of the accuracy of biomarkers and echocardiography in diagnosing carcinoid heart disease amongst patients who are already known to have neuroendocrine tumours (NETs). It also seeks to answer the question of whether surgical management for carcinoid heart disease will improve mortality.

2.1 Study design and search strategy

This systematic review and meta-analysis follows the reporting guidelines outlined in the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). The research question, informed by PECO, is: "In patients with carcinoid syndrome, what is the optimal diagnosis method for carcinoid heart disease to minimise mortality and increase quality of life?" The PECO is included in Supplementary Appendix 1.

2.2 Eligibility criteria

The review focused on studies of patients with CHD that reported on biomarkers, echocardiographic findings, or surgical outcomes. Previous reviews of relevant topics and the bibliographies of the selected manuscripts were also checked for relevant publications. Only studies published in English in peer-reviewed journals were selected; case studies, conference abstracts, reviews, editorials, commentaries, book chapters, and studies published in other languages were excluded.

2.3 Search strategy and screening

We conducted a systematic search of literature on MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, Google Scholar, ClinicalTrials.gov from inception to March 12, 2021 on carcinoid heart disease. An updated search was completed on February 12, 2023. A detailed description of the search strategy is included in Supplementary Appendix 2 .

Two reviewers independently screened the titles and abstracts of all studies. Data extraction was also conducted by two reviewers independently (JN and PA). Methodological quality of each study was assessed using the checklist published by the US National Heart, Lung and Blood Institute for case control studies ( 8 ). Disagreements between reviewers in title abstract screening, full text screening, data extraction and study quality assessment were resolved by discussion within the team.

2.4 Data extraction

Data were extracted from the selected studies on study year, design, country, patient cohort, total number of patients, number of patients with CHD, and the investigation used for measurement.

For two continuous biomarker variables, N-terminal pro-brain natriuretic peptide (NT-proBNP) and 5-hydroxyindole acetate (5-HIAA), the sample size, mean and standard deviation for both the Carcinoid Heart Disease and No Carcinoid Heart Disease groups were included in the analyses. For 5-HIAA, the values of two studies were adjusted to be consistent with ng/L units. A standard deviation of zero was replaced with 0.01.

For 13 dichotomous echocardiographic variables, the proportion and 95% confidence interval (CI) of the proportion were presented in forest plots, for each study and then for all studies combined. For one continuous variable, LVEF in the CHD and No-CHD groups, the mean difference and 95% CI were calculated for each study and then overall. A variable was included in the meta-analysis if at least 2 of the 16 journal articles involved had sufficient values for that variable (i.e., both a numerator and a denominator). When the numerator was zero it was set to 1, and when the numerator equalled the denominator it was set to the denominator minus one, so that important studies were not excluded because extreme proportions could not be calculated.

For five dichotomous CHD surgical mortality variables, the proportion and 95% CI of the proportion are presented for each study and then for all studies combined. If a study had data for CHD mortality at 1, 12, 24, 36 or 60 months, it was included in the corresponding forest plot. Several studies had data for specific time periods, so the years in which surgeries were conducted were included as columns in the forest plots, with some studies contributing multiple rows. Meta-regression was performed for each time-to-death analysis, with mortality proportion (and its associated standard error) as the outcome and, as the predictor, the year of the first included surgery in one model and the year of the last included surgery in another. The mean difference in proportions (i.e., the mean difference in the proportion of CHD mortality across years), 95% CI and p values were obtained from these 10 meta-regressions.
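The meta-regressions were run in Stata; as a rough illustration of the idea (regressing each study's mortality proportion on surgery year, weighting by inverse variance), here is a minimal Python sketch. All the numbers in it are invented.

```python
# Illustrative meta-regression: mortality proportion vs. first surgery year,
# each study weighted by inverse variance (1/SE^2). All numbers are invented.
import numpy as np
import statsmodels.api as sm

year = np.array([1985, 1992, 2000, 2008, 2015])   # first surgery year per study
prop = np.array([0.20, 0.16, 0.12, 0.10, 0.07])   # 1-month mortality proportion
se   = np.array([0.05, 0.04, 0.04, 0.03, 0.03])   # standard error per study

X = sm.add_constant(year)                          # intercept + slope on year
fit = sm.WLS(prop, X, weights=1.0 / se**2).fit()

slope, p_value = fit.params[1], fit.pvalues[1]
print(f"Change in mortality proportion per year: {slope:.4f} (p = {p_value:.3f})")
```

A negative slope with p < 0.05 would support the hypothesis that more recent surgeries carry lower mortality; in the review above, the trend was in that direction but not statistically significant.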

2.5 Data analysis

Data analyses were performed using Stata Statistical Software (Release 15.1; StataCorp LP, College Station, TX) and RStudio (version 1.4.1717). In view of the heterogeneity found for several variables in this meta-analysis, a random-effects model was used throughout. Individual meta-analyses were performed for biomarkers, echocardiographic findings and surgical outcomes. For biomarkers, the pooled mean difference with its 95% CI was used; for echocardiographic findings, the pooled weighted proportion with its 95% CI; and for surgical outcomes, the odds ratio (OR) with its 95% CI, as these outcomes are dichotomous. The I² statistic was used to evaluate heterogeneity (with I² > 50% indicating significant heterogeneity), as was Cochran's Q p value (with p < 0.05 indicating significant heterogeneity). A p value of <0.05 denoted statistical significance.
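For readers unfamiliar with these quantities, the following Python sketch computes a DerSimonian–Laird random-effects pooled estimate together with Cochran's Q and I². The helper function and the input effect sizes are illustrative only and are not taken from the authors' code.

```python
# Minimal sketch of a DerSimonian-Laird random-effects pooled estimate,
# with Cochran's Q and the I^2 heterogeneity statistic, for study effect
# sizes (e.g., mean differences) and their variances. Illustrative only.
import numpy as np
from scipy import stats

def random_effects_pool(effects, variances):
    effects, variances = np.asarray(effects), np.asarray(variances)
    k = len(effects)
    w = 1.0 / variances                          # fixed-effect weights
    theta_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - theta_fe) ** 2)    # Cochran's Q
    df = k - 1
    # DerSimonian-Laird estimate of between-study variance tau^2
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_re = 1.0 / (variances + tau2)              # random-effects weights
    theta_re = np.sum(w_re * effects) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    ci = (theta_re - 1.96 * se_re, theta_re + 1.96 * se_re)
    i2 = max(0.0, (q - df) / q) * 100            # I^2 as a percentage
    p_q = stats.chi2.sf(q, df)                   # Cochran's Q p value
    return theta_re, ci, tau2, i2, p_q

# Example: three hypothetical studies reporting mean differences
md = [700.0, 820.0, 610.0]
var = [90.0 ** 2, 120.0 ** 2, 150.0 ** 2]
pooled, ci, tau2, i2, p_q = random_effects_pool(md, var)
print(f"pooled MD {pooled:.1f}, 95% CI {ci[0]:.1f} to {ci[1]:.1f}, "
      f"tau^2 {tau2:.1f}, I^2 {i2:.1f}%, Q p = {p_q:.3f}")
```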

The proportional meta-analyses were performed with the Metaprop command in Stata, which performs meta-analysis of binomial data. In a random-effects model, the observed differences between the study proportions and their mean cannot be attributed entirely to sampling error; other factors, such as differences in study populations and study designs, may also contribute. Each study is assumed to estimate a different parameter, and the pooled estimate accounts for the heterogeneity among the studies; when the between-study variance is zero, the model reduces to the fixed-effects model.
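A common variance-stabilising choice in proportional meta-analysis, and one of the transforms offered by Metaprop, is the Freeman–Tukey double arcsine; whether the authors used this exact transform is not stated, so the sketch below is illustrative only.

```python
# Minimal sketch of the Freeman-Tukey double-arcsine transform often used
# (e.g., by Stata's metaprop) to stabilise the variance of proportions
# before pooling. Back-transformation of the pooled value is omitted.
# Whether the authors used this exact transform is an assumption.
import numpy as np

def freeman_tukey(x, n):
    """Return the transformed effect and its variance for x events in n."""
    t = 0.5 * (np.arcsin(np.sqrt(x / (n + 1))) +
               np.arcsin(np.sqrt((x + 1) / (n + 1))))
    v = 1.0 / (4 * n + 2)
    return t, v

print(freeman_tukey(8, 25))   # e.g., 8 of 25 patients with the finding
```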

A funnel plot was presented for each variable with more than 10 studies to assess publication bias, and an Egger's test was performed for each such variable to test for small-study effects.
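As a rough illustration of Egger's test, the sketch below regresses the standardised effect on precision; an intercept far from zero suggests funnel-plot asymmetry. The eleven data points are invented to satisfy the more-than-ten-studies rule mentioned above.

```python
# Minimal sketch of Egger's regression test for small-study effects:
# regress the standardised effect (effect / SE) on precision (1 / SE).
# All values are illustrative, not data from the review.
import numpy as np
import statsmodels.api as sm

effect = np.array([0.9, 1.1, 0.7, 1.4, 0.8, 1.0, 1.2, 0.6, 1.3, 0.95, 1.05])
se = np.array([0.30, 0.25, 0.40, 0.45, 0.20, 0.15, 0.35, 0.50, 0.42, 0.18, 0.22])

z = effect / se                   # standardised effect
precision = 1.0 / se
fit = sm.OLS(z, sm.add_constant(precision)).fit()
# The intercept (not the slope) carries the small-study-effects signal.
print(f"intercept {fit.params[0]:.3f}, p = {fit.pvalues[0]:.3f}")
```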

3 Results

The systematic search of the five databases yielded 2,059 results, of which 800 were duplicates; the remaining 1,259 were screened by title and abstract ( Figure 1 ). Ninety-three articles were selected from the initial screening process by two independent reviewers. Of these, thirty-six met the criteria for final inclusion in the systematic review. Reasons for excluding studies are given in Supplementary Appendix 3 . The demographics of each included study are reported in Table 1 . Fourteen studies focused on biomarkers for CHD, and thirteen studies altogether (some from echocardiography-focused papers) had analysable data on biomarkers in CHD; twelve focused on echocardiographic findings, two reported on computerised tomography (CT) findings, and fifteen reviewed surgical outcomes of tricuspid valve surgery in the CHD cohort ( Table 1 ). Three studies focused on the screening process for CHD amongst the population with metastatic neuroendocrine disease.


Figure 1 . Flow chart of study selection.


Table 1 . Studies selected from systematic review of literature.

3.1 Biomarkers

The two biomarkers reviewed were NT-proBNP, a marker known to be elevated in heart failure, and 5-HIAA, known to be elevated with serotonin excretion in CS. Four studies ( Figure 2 ) compared NT-proBNP values in patients with CHD vs. those with CS but no CHD ( 8 – 12 ), and fourteen studies ( Figure 3 ) compared 5-HIAA ( 9 , 10 , 13 – 19 ). NT-proBNP levels were significantly higher in patients with CHD than in those without CHD (mean difference (MD) 731.45, 95% CI 75.79 to 1,387.11; I² = 98.8%, p < 0.001; Figure 2 ). Similarly, 5-HIAA levels were significantly elevated in patients with CHD (MD 253.52, 95% CI 111.07 to 389.96; I² = 99.9%, p < 0.001; Figure 3A ). However, for both meta-analyses, heterogeneity was high, indicating significant variability between studies.


Figure 2 . NT-proBNP in patients with CHD compared to no CHD.


Figure 3 . 5-HIAA in patients with CHD compared to no CHD.

For NT-proBNP, an Egger's test for possible publication bias was not appropriate owing to the small number ( n  = 4) of studies. For the 5-HIAA studies, the funnel plot ( Figure 3B ) shows possible publication bias (10 studies fall outside the funnel), whereas the Egger's test (p = 0.176) does not show small-study effects ( Table 2 ).


Table 2 . Risk of bias within studies from publication and small study bias.

3.2 Echocardiography

Of the sixteen studies reporting echocardiographic findings in patients with metastatic NETs to determine the extent and type of cardiac involvement ( 10 , 13 – 15 , 18 – 24 ), twelve reviewed the percentage of cardiac involvement and eleven explored tricuspid valve abnormality, with seven specifying detection of moderate-severe tricuspid regurgitation ( Table 3 ). Five articles included tricuspid valve thickening and pulmonary regurgitation, and six included pulmonary stenosis or right ventricular enlargement as independent echocardiographic markers. Three papers reviewed right atrial enlargement and tricuspid valve retraction as echocardiographic markers ( Table 3 ).


Table 3 . Echocardiographic markers explored in each study.

The proportion of carcinoid syndrome patients with echocardiographic cardiac involvement was 32% across the studies; however, heterogeneity across studies was significant ( p  < 0.01) ( Figure 4 ). There was no significant difference in left ventricular ejection fraction between patients with and without cardiac involvement (MD 6.23, 95% CI −7.40 to 19.86; I² = 97.7%, p < 0.01; Figure 5 ). Of the patients with echocardiographic cardiac involvement, 79% had tricuspid valve abnormalities (95% CI 0.69–0.90; I² = 91.96%, p < 0.01), of which moderate-severe tricuspid regurgitation was the most common, at 70% (95% CI 0.56–0.84; I² = 81%, p < 0.01) in the pooled studies ( Supplementary Figures S1, S2 ). Tricuspid valve thickening was documented in 56% (95% CI 0.28–0.84; I² = 95.16%, p < 0.01) of the pooled studies, severe tricuspid stenosis in 7% (95% CI 0.01–0.13; I² = 69.6%, p = 0.01), and mild tricuspid regurgitation in 19% (95% CI 0.00–0.38; I² = 85.17%, p < 0.01). However, all of these analyses showed significant heterogeneity ( Supplementary Figures S3–S5 ).


Figure 4 . Echocardiographic involvement of cardiac disease in patients with carcinoid syndrome.


Figure 5 . ( A ) Surgical mortality at 1 month for carcinoid heart disease shown chronologically. ( B ) Pooled surgical mortality over time graphically. ( C ) Pooled surgical mortality over time based on meta-regression.

Significant pulmonary regurgitation was documented in 21% (95% CI 0.06–0.36; I² = 81.04%, p < 0.01) of the pooled CS study cohort and mild pulmonary regurgitation in 40% (95% CI 0.29–0.50; I² = 34.40%, p = 0.19), whilst pulmonary stenosis was noted in 43% (95% CI 0.24–0.63; I² = 90.24%, p < 0.01) ( Supplementary Figures S6–S8 ). Significant mitral regurgitation was less common, documented in 11% (95% CI 0.05–0.17; I² = 55.55%, p = 0.05), as was aortic regurgitation, documented in 10% (95% CI 0.06–0.14; I² = 22.34%, p < 0.01) of the pooled echocardiographic analysis, with low-moderate heterogeneity reported for both analyses ( Supplementary Figures S9, S10 ). The frequency of left-sided valve lesions was more consistent across studies, with results falling within the 95% confidence interval of the funnel plot, than the frequency of right-sided valve lesions ( Supplementary Figure S11 ).

With regard to the cardiac chambers, right atrial enlargement was commonly found (74%), although this estimate was not statistically significant (95% CI 0.45–1.03), whilst right ventricular enlargement, occurring in 43% of the pooled echocardiographic findings, was less common (95% CI 0.23–0.63). Heterogeneity was significant, at I² > 85%, for all of these analyses ( Supplementary Figures S12, S13 ).

3.3 Surgical outcomes

The fifteen studies exploring mortality outcomes of surgery for CHD were published from 1995 to 2020 ( 16 , 18 , 24 – 36 ) and represented 766 surgeries performed from 1981 to 2017 ( Table 1 ). Most were tricuspid valve replacements, although the pooled cohort also included tricuspid valve repairs and multi-valve surgeries with tricuspid and pulmonary valve replacement. Exact surgical techniques were not specified, and in many studies the mortality analysis did not separate isolated tricuspid valve replacements from other CHD surgeries. The number of valve replacements in each study, where recorded, is detailed in Table 4 .


Table 4 . Surgical indications for participants in all studies.

Pooled surgical mortality for CHD was 12% at 1 month, 31% at 12 months, 56% at 24 months, 52% at 36 months, and 65% at 60 months ( Supplementary Figures S14–S18 ). The heterogeneity of these analyses was low to moderate (I² = 54.37%, 40.56% and 0%, respectively). When looking at surgical outcomes over time, the one-month results showed a modest trend towards lower mortality for more recent surgeries ( Figures 5A–C ). At 12 months, surgical mortality did not show a chronological trend towards improvement ( Figure 6 ).


Figure 6 . Surgical mortality at 12 months for carcinoid heart disease shown chronologically.

The risk of bias assessment for the included studies is shown in Table 2 and the supplementary material ( Supplementary Table S1 ). In the statistical analysis of one-month surgical mortality, small-study bias and publication bias were present ( 37 ), as they were in the 5-HIAA analysis ( Figure 3B ). Although the risk of bias appeared to decrease for surgical outcomes at 12, 24 and 36 months compared with 1 month, this cannot be concluded with confidence: funnel plots and Egger's tests for publication bias are not applied to meta-analyses of fewer than ten studies, as was the case here, because with fewer studies the power of the tests is too low to distinguish chance from real asymmetry ( 37 ).

For the same reason, small-study bias and publication bias could not be assessed for NT-proBNP in CHD, nor for each echocardiographic variable in CHD.

4 Discussion

4.1 Summary of findings

This systematic review adds to the current evidence by showing that biomarkers of CS (5-HIAA) and heart failure (NT-proBNP) are further elevated in CHD. However, it also shows that the presentation of CHD, reflected in the heterogeneity across studies and the poor quality of some studies, is so varied that it is difficult to determine a diagnostic cut-off value for CHD for either biomarker.

This review has established the frequency of specific echocardiographic findings in CHD amongst CS and metastatic NET populations. However, these findings are highly heterogeneous across studies; we therefore could not infer specific echocardiographic criteria for CHD from the pooled literature in the form of a meta-analysis.

The most notable and unique contribution of this systematic review is the chronological tracking of mortality after surgical management of the tricuspid valve for CHD. This revealed that, although there is a non-statistically significant trend towards improved surgical outcomes over time, there is not yet clear evidence that surgical management offers lower mortality today than in previous decades. There are also no studies directly comparing surgical with medical management in terms of mortality or morbidity.

The exploration of biomarkers in CHD was limited by the few available studies, as our inclusion criteria admitted only articles reporting biomarker values in a CHD population. Our analysis showed that both NT-proBNP and 5-HIAA were higher in patients with CHD than in those without. Although a trend towards elevated levels in the presence of CHD could be seen for both biomarkers, the data are too varied and heterogeneous to determine a cut-off value for diagnosing cardiac involvement amongst CS or metastatic NET patients. One study in the review, by Bhattacharyya et al., suggested an NT-proBNP cut-off of 260 pg/ml for selecting metastatic NET patients for further investigation with echocardiography ( 8 ), similar to the cut-off of 235–260 pg/ml recommended in the recent clinical guidelines of the ENETS committee. NT-proBNP is considered the most sensitive marker for the presence and severity of CHD and should be measured in all patients with high urinary 5-HIAA, even without CS ( 5 ). The clinical guidelines also specify that urinary 5-HIAA secretion ≥50 µmol is compatible with a diagnosis of CS and is recommended in the screening of all patients. The ENETS guideline highlights that, while urinary 5-HIAA and NT-proBNP are good clinical markers for CS, prognostic markers of aggressive CS and CHD are still required, particularly ones that encompass the broad symptomatology of CS. Our review supports the need for more predictive markers, given the variability of population data across studies for the current biomarkers.

In reviewing the echocardiographic findings, it became clear that CHD is largely diagnosed by echocardiography and only rarely confirmed by histological analysis at biopsy or surgery. Recently, Hofland et al. (2021) devised a synoptic reporting format for echocardiography in CHD, which derives a total carcinoid heart disease score from the TTE examination and may be useful for standardising the care and follow-up of patients with CHD, including referral for surgery. Our findings are in line with those of Hofland and colleagues: our meta-analysis found the tricuspid valve to be the most commonly affected in CHD (79%), whereas their report specified 90%, based on a large study by Bhattacharyya et al. The authors acknowledged the lack of standardised TTE reporting as a major challenge in this setting. Based on our review findings, we also recommend standardised timing of referral of NET patients for echocardiography and cardiology review.

In terms of the surgical literature on CHD, two issues emerged. First, when a study included more than thirty cases, the different types of surgical procedure (tricuspid repair vs. replacement vs. multi-valvular replacement of both tricuspid and pulmonary valves) were not separated in the analysis, affecting the mortality rate, as the severity of surgery and operative risk vary across studies. Second, the only consistent objective outcome available for comparison was mortality. Although the timeframe at which mortality was measured was not consistent across studies, most gave a 30-day mortality outcome and then varied in providing 12-month and up to 144-month data. The trend of declining 30-day mortality over time is in line with the ENETS clinical guidelines, and surgical valve replacement is still considered best practice for managing CHD; however, we have not shown a statistically significant improvement over time. Future studies should compare surgical and non-surgical interventions for CHD. The ENETS guidelines recommend that prognostic indicators of CHD are needed to aid accurate timing of surgical intervention.

4.2 Limitations

This systematic review had major limitations. First and foremost, the majority of the analyses were heterogeneous, owing to variability in population data and the high variability of cardiac involvement. CS is a multifactorial condition in which a broad range of symptoms, physical manifestations and biochemical findings must be considered in the diagnosis; the 2022 ENETS guidance paper for CS and CHD provides a comprehensive guide. Second, our analysis of surgical mortality does not take into account changes in surgical approach over time; however, we believe that the chronological assessment of reduced mortality over time indicates an overall improvement in surgical management. Third, we were not able to adjust for other complexities of CS, including tumour status, CS status, liver synthetic function, presence of right heart failure and nutritional status. While our findings are based on highly heterogeneous studies, they are in line with what is reported in recent guidelines; nevertheless, specific cut-off values cannot be determined unless future studies are uniform in their diagnosis and assessment of CS.

A further limitation of the literature is the lack of trials, randomised or otherwise, comparing medical with surgical management in the same population. An RCT for such a rare condition is understandably difficult; in the absence of RCTs, the next best option would be to pool studies of surgically managed patients and compare them with pooled studies of medically treated patients, controlling for variables including year of diagnosis and year of surgery to account for improvements in both medical and surgical treatments over time. However, this strategy would still not account for the inherent selection bias between those who were and were not offered surgery: the surgical cohort may have been more comorbid, with a worse prognosis, and offered surgery as a last resort, or, conversely, may have had better baseline functional status to be deemed able to survive a major operation. The authors of this systematic review did not conduct such an analysis, as these selection biases would be impossible to discern during the initial search strategy.

4.3 Conclusion

CHD is an important but rare condition with no published RCTs of its diagnosis or management. Whilst both are elevated in the presence of CHD, neither the biomarkers NT-proBNP and 5-HIAA nor a variety of echocardiographic findings could be validated as clinical metrics to diagnose CHD or assess its severity. Studies of surgical intervention for CHD showed a reduction in mortality over time, but there were no consistent comparisons with medical treatment. Although the recent ENETS guideline attempts to improve the clinical diagnosis and management of CHD, the data available to inform it are weak, as demonstrated by our meta-analysis. Large international registries and carefully designed clinical trials in small cohorts are needed to better understand the expected clinical markers at each stage of CHD progression, correlated with morbidity and mortality, to determine the optimal management of this condition, which has a generally poor prognosis when left untreated.

Data availability statement

The original contributions presented in the study are included in the article/ Supplementary Material ; further inquiries can be directed to the corresponding author.

Author contributions

JN: Conceptualization, Formal Analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. PA: Formal Analysis, Investigation, Methodology, Project administration, Supervision, Writing – review & editing. MP: Formal Analysis, Investigation, Project administration, Writing – original draft, Writing – review & editing. DM: Formal Analysis, Investigation, Resources, Validation, Writing – review & editing. MD: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Validation, Writing – review & editing. SE: Data curation, Formal Analysis, Methodology, Resources, Software, Validation, Visualization, Writing – review & editing. PA: Conceptualization, Formal Analysis, Investigation, Supervision, Resources, Validation, Writing – review & editing. MA: Conceptualization, Formal Analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2024.1353612/full#supplementary-material

1. Hayes AR, Davar J, Caplin ME. Carcinoid heart disease: a review. Endocr Metab Clinics . (2018) 47(3):671–82. doi: 10.1016/j.ecl.2018.04.012


2. Taal BG, Visser O. Epidemiology of neuroendocrine tumours. Neuroendocrinology . (2004) 80(Suppl. 1):3–7. doi: 10.1159/000080731


3. Uema D, Alves C, Mesquita M, Nuñez JE, Siepmann T, Angel M, et al. Carcinoid heart disease and decreased overall survival among patients with neuroendocrine tumors: a retrospective multicenter Latin American cohort study. J Clin Med . (2019) 8(3):405. doi: 10.3390/jcm8030405

4. Alves C, Mesquita M, Silva C, Soeiro M, Hajjar L, Riechelmann RP. High tumour burden, delayed diagnosis and history of cardiovascular disease may be associated with carcinoid heart disease. ecancermedicalscience . (2018) 12:1–9. doi: 10.3332/ecancer.2018.879

5. Grozinsky-Glasberg S, Davar J, Hofland J, Dobson R, Prasad V, Pascher A, et al. European Neuroendocrine tumor society (ENETS) 2022 guidance paper for carcinoid syndrome and carcinoid heart disease. J Neuroendocrinol . (2022) 34(7):e13146. doi: 10.1111/jne.13146

6. Hamandi M, Smith RL, Ryan WH, Grayburn PA, Vasudevan A, George TJ, et al. Outcomes of isolated tricuspid valve surgery have improved in the modern era. Ann Thorac Surg . (2019) 108(1):11–5. doi: 10.1016/j.athoracsur.2019.03.004

7. Dreyfus J, Flagiello M, Bazire B, Eggenspieler F, Viau F, Riant E, et al. Isolated tricuspid valve surgery: impact of aetiology and clinical presentation on outcomes. Eur Heart J . (2020) 41(45):4304–17. doi: 10.1093/eurheartj/ehaa643

8. Bhattacharyya S, Toumpanakis C, Caplin ME, Davar J. Usefulness of N-terminal pro–brain natriuretic peptide as a biomarker of the presence of carcinoid heart disease. Am J Cardiol . (2008) 102(7):938–42. doi: 10.1016/j.amjcard.2008.05.047

9. Dobson R, Burgess MI, Banks M, Pritchard DM, Vora J, Valle JW, et al. The association of a panel of biomarkers with the presence and severity of carcinoid heart disease: a cross-sectional study. PLoS One . (2013) 8(9):e73679. doi: 10.1371/journal.pone.0073679

10. Dobson R, Burgess MI, Pritchard DM, Cuthbertson DJ. The clinical presentation and management of carcinoid heart disease. Int J Cardiol . (2014) 173(1):29–32. doi: 10.1016/j.ijcard.2014.02.037

11. Dobson R, Burgess MI, Valle JW, Pritchard DM, Vora J, Wong C, et al. Serial surveillance of carcinoid heart disease: factors associated with echocardiographic progression and mortality. Br J Cancer . (2014) 111(9):1703–9. doi: 10.1038/bjc.2014.468

12. Zuetenhorst JM, Korse CM, Bonfrer JMG, Bakker RH, Taal BG. Role of natriuretic peptides in the diagnosis and treatment of patients with carcinoid heart disease. Br J Cancer . (2004) 90(11):2073–9. doi: 10.1038/sj.bjc.6601816

13. Denney WD, Kemp WE, Anthony LB, Oates JA, Byrd BF. Echocardiographic and biochemical evaluation of the development and progression of carcinoid heart disease. J Am Coll Cardiol . (1998) 32(4):1017–22. doi: 10.1016/S0735-1097(98)00354-4

14. Haugaa KH, Bergestuen DS, Sahakyan LG, Skulstad H, Aakhus S, Thiis-Evensen E, et al. Evaluation of right ventricular dysfunction by myocardial strain echocardiography in patients with intestinal carcinoid disease. J Am Soc Echocardiogr . (2011) 24(6):644–50. doi: 10.1016/j.echo.2011.02.009

15. Himelman RB, Schiller NB. Clinical and echocardiographic comparison of patients with the carcinoid syndrome with and without carcinoid heart disease. Am J Cardiol . (1989) 63(5):347–52. doi: 10.1016/0002-9149(89)90344-5

16. Komoda S, Komoda T, Pavel ME, Morawietz L, Wiedenmann B, Hetzer R, et al. Cardiac surgery for carcinoid heart disease in 12 cases. Gen Thorac Cardiovasc Surg . (2011) 59(12):780–5. doi: 10.1007/s11748-010-0758-9

17. Mansencal N, McKenna WJ, Mitry E, Beauchet A, Pellerin D, Rougier P, et al. Comparison of prognostic value of tissue doppler imaging in carcinoid heart disease versus the value in patients with the carcinoid syndrome but without carcinoid heart disease. Am J Cardiol . (2010) 105(4):527–31. doi: 10.1016/j.amjcard.2009.10.023

18. Mokhles P, van Herwerden LA, de Jong PL, de Herder WW, Siregar S, Constantinescu AA, et al. Carcinoid heart disease: outcomes after surgical valve replacement. Eur J Cardiothorac Surg . (2012) 41(6):1278–83. doi: 10.1093/ejcts/ezr227

19. Møller JE, Pellikka PA, Bernheim AM, Schaff HV, Rubin J, Connolly HM. Prognosis of carcinoid heart disease: analysis of 200 cases over two decades. Circulation . (2005) 112(21):3320–7. doi: 10.1161/CIRCULATIONAHA.105.553750

20. Bhattacharyya S, Toumpanakis C, Burke M, Taylor AM, Caplin ME, Davar J. Features of carcinoid heart disease identified by 2-and 3-dimensional echocardiography and cardiac MRI. Circ Cardiovas Imaging . (2010) 3(1):103–11. doi: 10.1161/CIRCIMAGING.109.886846

21. Lundin L, Landelius J, Andren B, Oberg K. Transoesophageal echocardiography improves the diagnostic value of cardiac ultrasound in patients with carcinoid heart disease. Heart . (1990) 64(3):190–4. doi: 10.1136/hrt.64.3.190

22. Mansencal N, Mitry E, Bachet J-B, Rougier P, Dubourg O. Echocardiographic follow-up of treated patients with carcinoid syndrome. Am J Cardiol . (2010) 105(11):1588–91. doi: 10.1016/j.amjcard.2010.01.017

23. Moyssakis I, Rallidis L, Guida G, Nihoyannopoulos P. Incidence and evolution of carcinoid syndrome in the heart. J Heart Valve Dis . (1997) 6(6):625–30.


24. Nguyen A, Schaff HV, Abel MD, Luis SA, Lahr BD, Halfdanarson TR, et al. Improving outcome of valve replacement for carcinoid heart disease. J Thorac Cardiovasc Surg . (2019) 158(1):99–107.e2. doi: 10.1016/j.jtcvs.2018.09.025

25. Bhattacharyya S, Toumpanakis C, Chilkunda D, Caplin ME, Davar J. Risk factors for the development and progression of carcinoid heart disease. Am J Cardiol . (2011) 107(8):1221–6. doi: 10.1016/j.amjcard.2010.12.025

26. Castillo JG, Filsoufi F, Rahmanian PB, Anyanwu A, Zacks JS, Warner RR, et al. Early and late results of valvular surgery for carcinoid heart disease. J Am Coll Cardiol . (2008) 51(15):1507–9. doi: 10.1016/j.jacc.2007.12.036

27. Connolly HM, Nishimura RA, Smith HC, Pellikka PA, Mullany CJ, Kvols LK. Outcome of cardiac surgery for carcinoid heart disease. J Am Coll Cardiol . (1995) 25(2):410–6. doi: 10.1016/0735-1097(94)00374-Y

28. Connolly HM, Schaff HV, Abel MD, Rubin J, Askew JW, Li Z, et al. Early and late outcomes of surgical treatment in carcinoid heart disease. J Am Coll Cardiol . (2015) 66(20):2189–96. doi: 10.1016/j.jacc.2015.09.014

29. Kuntze T, Owais T, Secknus M-A, Kaemmerer D, Baum R, Girdauskas E. Results of contemporary valve surgery in patients with carcinoid heart disease. J Heart Valve Dis . (2016):356–63.


30. Møller JE, Connolly HM, Rubin J, Seward JB, Modesto K, Pellikka PA. Factors associated with progression of carcinoid heart disease. N Engl J Med . (2003) 348(11):1005–15. doi: 10.1056/NEJMoa021451

31. Mortelmans P, Herregods M-C, Rega F, Timmermans P. The path to surgery in carcinoid heart disease: a retrospective study and a multidisciplinary proposal of a new algorithm. Acta Cardiol . (2019) 74(3):207–14. doi: 10.1080/00015385.2018.1478242

32. Robiolio PA, Rigolin VH, Harrison JK, Lowe JE, Moore JO, Bashore TM, et al. Predictors of outcome of tricuspid valve replacement in carcinoid heart disease. Am J Cardiol . (1995) 75(7):485–8. doi: 10.1016/S0002-9149(99)80586-4

33. Said SM, Burkhart HM, Schaff HV, Johnson JN, Connolly HM, Dearani JA. When should a mechanical tricuspid valve replacement be considered? J Thorac Cardiovasc Surg . (2014) 148(2):603–8. doi: 10.1016/j.jtcvs.2013.09.043

34. Veen KM, Hart EA, Mokhles MM, De Jong PL, De Heer F, Van Boven W-JP, et al. Outcomes after tricuspid valve replacement for carcinoid heart disease: a multicenter study. Structural Heart . (2020) 4(2):122–30. doi: 10.1080/24748706.2019.1706795

35. Yong MS, Kong G, Ludhani P, Michael M, Morgan J, Hofman MS, et al. Early outcomes of surgery for carcinoid heart disease. Heart Lung Circ . (2020) 29(5):742–7. doi: 10.1016/j.hlc.2019.05.183

36. Davar J, Connolly HM, Caplin ME, Pavel M, Zacks J, Bhattacharyya S, et al. Diagnosing and managing carcinoid heart disease in patients with neuroendocrine tumors: an expert statement. J Am Coll Cardiol . (2017) 69(10):1288–304. doi: 10.1016/j.jacc.2016.12.030

37. Deeks J, Higgins J, Altman D, Green S. Part1: cochrane reviews. Part 2: general methods for cochrane reviews. In: Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions Version 51 . London: Cochrane United Kingdom (2011). Electronic chapter 5–9.

Keywords: carcinoid heart disease, carcinoid syndrome, meta-analysis, systematic review, endocrinology

Citation: Namkoong J, Andraweera PH, Pathirana M, Munawar D, Downie M, Edwards S, Averbuj P and Arstall MA (2024) A systematic review and meta-analysis of the diagnosis and surgical management of carcinoid heart disease. Front. Cardiovasc. Med. 11:1353612. doi: 10.3389/fcvm.2024.1353612

Received: 21 December 2023; Accepted: 11 March 2024; Published: 20 March 2024.


© 2024 Namkoong, Andraweera, Pathirana, Munawar, Downie, Edwards, Averbuj and Arstall. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Margaret A. Arstall [email protected]
