
Systematic Reviews: Critical Appraisal by Study Design


Tools for Critical Appraisal of Studies

“The purpose of critical appraisal is to determine the scientific merit of a research report and its applicability to clinical decision making.” 1 Conducting a critical appraisal of a study is imperative to any well-executed evidence review, but the process can be time consuming and difficult. 2 The critical appraisal process also demands rigor: “a methodological approach coupled with the right tools and skills to match these methods is essential for finding meaningful results.” 3 In short, it is a method of differentiating good research from bad research.

Critical Appraisal by Study Design (featured tools)

  • AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews) The original AMSTAR was developed to assess the risk of bias in systematic reviews that included only randomized controlled trials. AMSTAR 2 was published in 2017 and allows researchers to “identify high quality systematic reviews, including those based on non-randomised studies of healthcare interventions.” 5
  • ROBIS (Risk of Bias in Systematic Reviews) ROBIS is a tool designed specifically to assess the risk of bias in systematic reviews. “The tool is completed in three phases: (1) assess relevance (optional), (2) identify concerns with the review process, and (3) judge risk of bias in the review. Signaling questions are included to help assess specific concerns about potential biases with the review.” 6
  • BMJ Framework for Assessing Systematic Reviews This framework provides a checklist that is used to evaluate the quality of a systematic review.
  • CASP (Critical Appraisal Skills Programme) Checklist for Systematic Reviews This CASP checklist is not a scoring system, but rather a method of appraising systematic reviews by considering: 1. Are the results of the study valid? 2. What are the results? 3. Will the results help locally?
  • CEBM (Centre for Evidence-Based Medicine) Systematic Reviews Critical Appraisal Sheet The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • JBI Critical Appraisal Tools, Checklist for Systematic Reviews JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis.
  • NHLBI (National Heart, Lung, and Blood Institute) Study Quality Assessment of Systematic Reviews and Meta-Analyses The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study.
  • RoB 2 (revised tool to assess Risk of Bias in randomized trials) RoB 2 “provides a framework for assessing the risk of bias in a single estimate of an intervention effect reported from a randomized trial,” rather than the entire trial. 7
  • CASP (Critical Appraisal Skills Programme) Randomised Controlled Trials Checklist This CASP checklist considers various aspects of an RCT that require critical appraisal: 1. Is the basic study design valid for a randomized controlled trial? 2. Was the study methodologically sound? 3. What are the results? 4. Will the results help locally?
  • CONSORT (Consolidated Standards of Reporting Trials) Statement The CONSORT checklist includes 25 items to determine the quality of randomized controlled trials. “Critical appraisal of the quality of clinical trials is possible only if the design, conduct, and analysis of RCTs are thoroughly and accurately described in the report.” 8
  • NHLBI (National Heart, Lung, and Blood Institute) Study Quality Assessment of Controlled Intervention Studies The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study.
  • JBI Critical Appraisal Tools Checklist for Randomized Controlled Trials JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis.
  • ROBINS-I (Risk Of Bias in Non-randomized Studies – of Interventions) ROBINS-I is a “tool for evaluating risk of bias in estimates of the comparative effectiveness… of interventions from studies that did not use randomization to allocate units… to comparison groups.” 9
  • NOS (Newcastle-Ottawa Scale) This tool is used primarily to evaluate and appraise case-control or cohort studies.
  • AXIS (Appraisal tool for Cross-Sectional Studies) Cross-sectional studies are frequently used as an evidence base for diagnostic testing, risk factors for disease, and prevalence studies. “The AXIS tool focuses mainly on the presented [study] methods and results.” 10
  • NHLBI (National Heart, Lung, and Blood Institute) Study Quality Assessment Tools for Non-Randomized Studies The NHLBI’s quality assessment tools were designed to assist reviewers in focusing on concepts that are key for critical appraisal of the internal validity of a study. Tools include: Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies; Quality Assessment of Case-Control Studies; Quality Assessment Tool for Before-After (Pre-Post) Studies With No Control Group; Quality Assessment Tool for Case Series Studies
  • Case Series Studies Quality Appraisal Checklist Developed by the Institute of Health Economics (Canada), the checklist is comprised of 20 questions to assess “the robustness of the evidence of uncontrolled, [case series] studies.” 11
  • Methodological Quality and Synthesis of Case Series and Case Reports In this paper, Dr. Murad and colleagues “present a framework for appraisal, synthesis and application of evidence derived from case reports and case series.” 12
  • MINORS (Methodological Index for Non-Randomized Studies) The MINORS instrument contains 12 items and was developed for evaluating the quality of observational or non-randomized studies. 13 This tool may be of particular interest to researchers who would like to critically appraise surgical studies.
  • JBI Critical Appraisal Tools for Non-Randomized Trials JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis. Checklists include: Analytical Cross Sectional Studies; Case Control Studies; Case Reports; Case Series; Cohort Studies
  • QUADAS-2 (a revised tool for the Quality Assessment of Diagnostic Accuracy Studies) The QUADAS-2 tool “is designed to assess the quality of primary diagnostic accuracy studies… [it] consists of 4 key domains that discuss patient selection, index test, reference standard, and flow of patients through the study and timing of the index tests and reference standard.” 14
  • JBI Critical Appraisal Tools Checklist for Diagnostic Test Accuracy Studies JBI Critical Appraisal Tools help you assess the methodological quality of a study and determine the extent to which a study has addressed the possibility of bias in its design, conduct, and analysis.
  • STARD 2015 (Standards for the Reporting of Diagnostic Accuracy Studies) The authors of the standards note that “[e]ssential elements of [diagnostic accuracy] study methods are often poorly described and sometimes completely omitted, making both critical appraisal and replication difficult, if not impossible.” The standards were developed “to help… improve completeness and transparency in reporting of diagnostic accuracy studies.” 15
  • CASP (Critical Appraisal Skills Programme) Diagnostic Study Checklist This CASP checklist considers various aspects of diagnostic test studies, including: 1. Are the results of the study valid? 2. What were the results? 3. Will the results help locally?
  • CEBM (Centre for Evidence-Based Medicine) Diagnostic Critical Appraisal Sheet The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • SYRCLE’s RoB (SYstematic Review Center for Laboratory animal Experimentation’s Risk of Bias) “[I]mplementation of [SYRCLE’s RoB tool] will facilitate and improve critical appraisal of evidence from animal studies. This may… enhance the efficiency of translating animal research into clinical practice and increase awareness of the necessity of improving the methodological quality of animal studies.” 16
  • ARRIVE 2.0 (Animal Research: Reporting of In Vivo Experiments) “The [ARRIVE 2.0] guidelines are a checklist of information to include in a manuscript to ensure that publications [on in vivo animal studies] contain enough information to add to the knowledge base.” 17
  • Critical Appraisal of Studies Using Laboratory Animal Models This article provides “an approach to critically appraising papers based on the results of laboratory animal experiments,” and discusses various “bias domains” in the literature that critical appraisal can identify. 18
  • CEBM (Centre for Evidence-Based Medicine) Critical Appraisal of Qualitative Studies Sheet The CEBM’s critical appraisal sheets are designed to help you appraise the reliability, importance, and applicability of clinical evidence.
  • CASP (Critical Appraisal Skills Programme) Qualitative Studies Checklist This CASP checklist considers various aspects of qualitative research studies, including: 1. Are the results of the study valid? 2. What were the results? 3. Will the results help locally?
  • Quality Assessment and Risk of Bias Tool Repository Created by librarians at Duke University, this extensive listing contains over 100 commonly used risk of bias tools that may be sorted by study type.
  • Latitudes Network A library of risk of bias tools for use in evidence syntheses that provides selection help and training videos.

References & Recommended Reading

1. Kolaski K, Logan LR, Ioannidis JP. Guidance to best tools and practices for systematic reviews. British Journal of Pharmacology. 2024;181(1):180-210.

2. Portney LG. Foundations of clinical research: applications to evidence-based practice. 4th ed. Philadelphia: F.A. Davis; 2020.

3. Fowkes FG, Fulton PM. Critical appraisal of published research: introductory guidelines. BMJ (Clinical research ed). 1991;302(6785):1136-1140.

4. Singh S. Critical appraisal skills programme. Journal of Pharmacology and Pharmacotherapeutics. 2013;4(1):76-77.

5. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Clinical research ed). 2017;358:j4008.

6. Whiting P, Savovic J, Higgins JPT, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. Journal of clinical epidemiology. 2016;69:225-234.

7. Sterne JAC, Savovic J, Page MJ, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ (Clinical research ed). 2019;366:l4898.

8. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting parallel group randomised trials. Journal of clinical epidemiology. 2010;63(8):e1-37.

9. Sterne JA, Hernan MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ (Clinical research ed). 2016;355:i4919.

10. Downes MJ, Brennan ML, Williams HC, Dean RS. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ open. 2016;6(12):e011458.

11. Guo B, Moga C, Harstall C, Schopflocher D. A principal component analysis is conducted for a case series quality appraisal checklist. Journal of clinical epidemiology. 2016;69:199-207.e192.

12. Murad MH, Sultan S, Haffar S, Bazerbachi F. Methodological quality and synthesis of case series and case reports. BMJ evidence-based medicine. 2018;23(2):60-63.

13. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J. Methodological index for non-randomized studies (MINORS): development and validation of a new instrument. ANZ journal of surgery. 2003;73(9):712-716.

14. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of internal medicine. 2011;155(8):529-536.

15. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ (Clinical research ed). 2015;351:h5527.

16. Hooijmans CR, Rovers MM, de Vries RBM, Leenaars M, Ritskes-Hoitinga M, Langendam MW. SYRCLE's risk of bias tool for animal studies. BMC medical research methodology. 2014;14:43.

17. Percie du Sert N, Ahluwalia A, Alam S, et al. Reporting animal research: Explanation and elaboration for the ARRIVE guidelines 2.0. PLoS biology. 2020;18(7):e3000411.

18. O'Connor AM, Sargeant JM. Critical appraisal of studies using laboratory animal models. ILAR journal. 2014;55(3):405-417.

  • Last Updated: Apr 2, 2024 11:58 AM
  • URL: https://libraryguides.mayo.edu/systematicreviewprocess

Systematic Reviews and Meta-Analyses: Critical Appraisal


All included studies must undergo a critical appraisal to evaluate their risk of bias, that is, their internal and external validity.

This step often occurs simultaneously with the Data Extraction phase. It is a vital stage of the systematic review process, upholding the cornerstone goal of reducing bias.


Critical Appraisal 

Critical appraisal is also referred to as quality assessment, risk of bias assessment, and similar variations. Sometimes the critical appraisal phase is confused with the assessment of certainty of evidence; although related, these are independent stages of the systematic review process.

According to the Centre for Evidence-Based Medicine (CEBM):

"Critical appraisal is the process of carefully and systematically assessing the outcome of scientific research (evidence) to judge its trustworthiness, value and relevance in a particular context. Critical appraisal looks at the way a study is conducted and examines factors such as internal validity, generalizability and relevance."

Systematic reviews require a formal, systematic, uniform appraisal of the quality, or risk of bias, of all relevant studies. In a critical appraisal, you are examining the methods, not the results.

Process Details

Use risk of bias tools for this stage; these tools are often formatted as checklists. You can find more about risk of bias tools below. If a refresher on common biases, definitions, and examples would be helpful, check out the Catalogue of Bias from the University of Oxford and CEBM.

Just like the other stages of a systematic review, two reviewers should assess risk of bias in each reference. As such, your team should calculate and report interrater reliability, deciding ahead of time how to resolve conflicts. Oftentimes the critical appraisal occurs at the same time as data extraction.
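Interrater reliability between the two reviewers is commonly reported as Cohen's kappa, which measures agreement beyond what chance alone would produce. A minimal sketch in Python (the study ratings below are hypothetical):

```python
# Cohen's kappa for two reviewers' overall risk-of-bias judgments.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of studies where both reviewers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical per-study judgments from two independent reviewers.
reviewer_1 = ["low", "low", "high", "some", "low", "high"]
reviewer_2 = ["low", "some", "high", "some", "low", "low"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # -> 0.48
```

Values near 1 indicate strong agreement; values near 0 suggest the reviewers' ratings agree no more than chance, a signal to revisit the tool's instructions together before resolving conflicts.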

In addition to the formal risk of bias assessment, your team should also consider meta-biases like publication bias, selective reporting, etc. Search for errata and retractions related to included research, and consider other limitations of and concerns about the included studies and how this may impact the reliability of your review.

Note: Subjectivity of Critical Appraisal 

The critical appraisal is inherently subjective, from the selection of the RoB tool(s) to the final assessment of each study. Therefore, it is important to consider how tools compare, and how this process may impact the results of your review. Check out these studies evaluating risk of bias tools:

Page MJ, McKenzie JE, Higgins JPT. Tools for assessing risk of reporting biases in studies and syntheses of studies: a systematic review. BMJ Open. 2018;8:e019703. doi:10.1136/bmjopen-2017-019703

Losilla J-M, Oliveras I, Marin-Garcia JA, Vives J. Three risk of bias tools lead to opposite conclusions in observational research synthesis. Journal of Clinical Epidemiology. 2018;101:61-72. doi:10.1016/j.jclinepi.2018.05.021

Margulis AV, Pladevall M, Riera-Guardia N, et al. Quality assessment of observational studies in a drug-safety systematic review, comparison of two tools: the Newcastle-Ottawa Scale and the RTI item bank. Clinical Epidemiology. 2014;6:359. doi:10.2147/CLEP.S66677

Select Risk of Bias Tool(s)

When you think of critical appraisal in a systematic review and/or meta-analysis, think of assessing the risk of bias of included studies. The potential biases to consider will vary by study design, so risk of bias tool(s) should be selected based on the designs of the included studies. If you include more than one study design, you'll need more than one risk of bias tool. Whenever possible, select tools developed for a discipline relevant to your topic.

Risk of bias tools  are simply checklists used to consider bias specific to a study design, and sometimes discipline. 

  • Cochrane Risk of Bias Tool  | randomized trials, health
  • Collaboration for Environmental Evidence (CEE) Critical Appraisal Tool, Prototype  | environmental management focused
  • Crowe Critical Appraisal Tool  | mixed methods
  • Meta-QAT  | public health focused
  • Meta-QAT Grey Literature Companion | grey literature
  • Mixed Method Appraisal Tool (MMAT) | mixed method ( more detail )
  • Newcastle-Ottawa Scale | non-randomized studies
  • RTI Item Bank  | observational studies
  • SYRCLE's Risk of Bias Tool | animal studies
  • Quality Checklist for Blogs | blogs
  • Quality Checklist for Podcasts | podcasts

Risk of Bias Toolsets

Risk of bias tool sets  are a series of tools developed by the same group or organization, where each tool addresses a specific study design. The organization is usually discipline specific. Note that many also include a systematic review and/or meta-analysis quality assessment tool, but that these tools will not be useful during this stage as existing reviews will not be folded into your synthesis.

Critical Appraisal Skills Programme (CASP) Checklists include tools for:

  • Randomized Controlled Trials 
  • Qualitative Studies
  • Cohort Study
  • Diagnostic Study
  • Case Control Study
  • Economic Evaluation
  • Clinical Prediction Rule 

National Institutes of Health (NIH) Study Quality Assessment Tools include tools for:

  • Controlled intervention studies
  • Observational cohort and cross-sectional studies
  • Case-control studies
  • Before-after (pre-post) studies without control
  • Case series studies

Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) includes tools for:

  • Case-control
  • Cross-sectional
  • Conference abstracts

Joanna Briggs Institute (JBI) Manual for Evidence Synthesis includes the following tools found in respective relevant chapters:

  • Qualitative research (appendix 2.1)
  • Randomized controlled trials (appendix 3.1)
  • Quasi-experimental studies (non-randomized experimental studies; appendix 3.3)
  • Text and opinion (appendix 4.1)  with explanation (Appendix 4.2)
  • Prevalence studies (appendix 5.1)
  • Cohort studies (appendix 7.1)
  • Case-control studies (appendix 7.2)
  • Case series (appendix 7.3)
  • Case reports (appendix 7.4)
  • Cross sectional studies (appendix 7.5)
  • Diagnostic test accuracy (appendix 9.1)

Latitudes Network 

  • Systematic Reviews ( ROBIS )
  • Randomized Controlled Trials ( RoB 2 ) 
  • Cohort studies - interventions ( ROBINS-I )
  • Cohort studies - exposure ( ROBINS-E ) 
  • Diagnostic accuracy studies ( QUADAS-2 ; QUADAS-C ) 
  • Prognostic accuracy studies ( QUAPAS ) 
  • Prediction models ( PROBAST ) 
  • Reliability studies ( COSMIN )

Risk of Bias Tool Repositories

Risk of bias tool repositories  are curated lists of existing tools - kind of like what we've presented above. Although we update this guide with new tools as we find them, these repositories may contain additional resources:

  • Quality Assessment and Risk of Bias Tool Repository , from Duke University's Medical Center Library & Archives
  • Interactive Tableau Dataset of 68 Risk of Bias Tools , from the National Toxicology Program

Presenting Critical Appraisal Results

Risk of bias within each reference should be presented in a table like the one seen below. Studies are listed along the y-axis and the biases considered (what is addressed by the tool) along the x-axis, such that each row belongs to a study, and each column belongs to a bias (or domain/category of biases).

Example - Graphic representation of risk of bias within each study

It is also best practice to present the bias across the included set of literature (seen below). Each bias or bias category is represented as a row, and each row is associated with a bar showing the percentage of the total included literature that was rated as low risk, some risk, high risk, or unable to determine the risk.

Example - Graphic representation of risk of bias across studies

The images above can be created using the robvis R package, part of the metaverse collection of evidence synthesis packages. You can also create your own graphics without this software.
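The numbers behind the "bias across studies" chart are straightforward to tabulate yourself: for each bias domain, count the share of studies at each judgment level. A minimal sketch (study names, domains, and judgments are hypothetical):

```python
# Tabulate per-domain percentages of risk-of-bias judgments across studies,
# i.e. the data behind a weighted summary bar chart.
from collections import Counter

# Hypothetical judgments: one rating per bias domain, per study.
judgments = {
    "Study A": {"randomization": "low",  "missing data": "some", "measurement": "low"},
    "Study B": {"randomization": "high", "missing data": "low",  "measurement": "low"},
    "Study C": {"randomization": "low",  "missing data": "some", "measurement": "high"},
    "Study D": {"randomization": "low",  "missing data": "low",  "measurement": "low"},
}

def summarize(judgments):
    """Percentage of studies at each risk level, for each bias domain."""
    n = len(judgments)
    domains = {}
    for ratings in judgments.values():
        for domain, level in ratings.items():
            domains.setdefault(domain, Counter())[level] += 1
    return {d: {lvl: 100 * c / n for lvl, c in counts.items()}
            for d, counts in domains.items()}

summary = summarize(judgments)
print(summary["randomization"])  # -> {'low': 75.0, 'high': 25.0}
```

Each domain's dictionary maps directly to one stacked bar: 75% of the hypothetical studies were rated low risk for randomization, 25% high risk.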

Methodological Guidance

  • Health Sciences
  • Animal, Food Sciences
  • Social Sciences
  • Environmental Sciences

Cochrane Handbook  -  Part 2: Core Methods

Chapter 7: Considering bias and conflicts of interest among the included studies

  • 7.2 Empirical evidence of bias
  • 7.3 General procedures for risk-of-bias assessment
  • 7.4 Presentation of assessment of risk of bias
  • 7.5 Summary assessments of risk of bias 
  • 7.6 Incorporating assessment of risk of bias into analyses 
  • 7.7 Considering risk of bias due to missing results
  • 7.8 Considering source of funding and conflict of interest of authors of included studies 

Chapter 8: Assessing risk of bias in a randomized trial

  • 8.2 Overview of RoB 2
  • 8.3 Bias arising from the randomization process
  • 8.4 Bias due to deviations from intended interventions
  • 8.5 Bias due to missing outcome data 
  • 8.6 Bias in measurement of the outcome
  • 8.7 Bias in selection of the reported result
  • 8.8 Differences from the previous version of the tool

Chapter 25:  Risk of bias in non-randomized studies

SYREAF Resources

Step 3: identifying eligible papers.

Conducting systematic reviews of intervention questions II: Relevance screening, data extraction, assessing risk of bias , presenting the results and interpreting the findings.  Sargeant JM, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:39-51. doi: 10.1111/zph.12124. PMID: 24905995

Campbell -  MECCIR

C51. Assessing risk of bias / study quality ( protocol & review / final manuscript )

C52. Assessing risk of bias / study quality in duplicate  ( protocol & review / final manuscript )

C53. Supporting judgements of risk of bias / study quality ( review / final manuscript )

C54. Providing sources of information for risk of bias / study quality assessments ( review / final manuscript )

C55. Differentiating between performance bias and detection bias  ( protocol & review / final manuscript )

C56. If applicable, assessing risk of bias due to lack of blinding for different outcomes ( review / final manuscript )

C57. If applicable, assessing completeness of data for different outcomes ( review / final manuscript )

C58. If applicable, summarizing risk of bias when using the Cochrane Risk of Bias tool ( review / final manuscript )

C59. Addressing risk of bias / study quality in the synthesis  ( review / final manuscript )

C60. Incorporating assessments of risk of bias  ( review / final manuscript )

CEE  -  Guidelines and Standards for Evidence synthesis in Environmental Management

Section 7: Critical appraisal of study validity

CEE Standards for conduct and reporting

7.1.2   Internal validity

7.1.3  External validity 

Reporting in Protocol and Final Manuscript


In the Protocol |  PRISMA-P

Risk of bias in individual studies (Item 14).

...planned approach to assessing risk of bias should include the constructs being assessed and a definition for each, reviewer judgment options (high, low, unclear), the number of assessors ...training, piloting, previous risk of bias assessment experience...method(s) of assessment (independent or in duplicate)...

Protocol for reporting results

" ...summarise risk of bias assessments across studies or outcomes ..."

Protocol for reporting  impact on synthesis

"...describe how risk of bias assessments will be incorporated into data synthesis (that is, subgroup or sensitivity analyses) and their potential influence on findings of the review (Item 15c) in the protocol..."

In the Final Manuscript |  PRISMA

For the critical appraisal stage, PRISMA requires specific items to be addressed in both the methods and results section.

Study Risk of Bias Assessment (Item 11; report in methods )

Essential items.

  • Specify the tool(s) (and version) used to assess risk of bias in the included studies.
  • Specify the methodological domains/components/items of the risk of bias tool(s) used.
  • Report whether an overall risk of bias judgment that summarised across domains/components/items was made, and if so, what rules were used to reach an overall judgment.
  • If any adaptations to an existing tool to assess risk of bias in studies were made (such as omitting or modifying items), specify the adaptations.
  • If a new risk of bias tool was developed for use in the review, describe the content of the tool and make it publicly accessible.
  • Report how many reviewers assessed risk of bias in each study, whether multiple reviewers worked independently (such as assessments performed by one reviewer and checked by another), and any processes used to resolve disagreements between assessors.
  • Report any processes used to obtain or confirm relevant information from study investigators.
  • If an automation tool was used to assess risk of bias in studies, report how the automation tool was used (such as machine learning models to extract sentences from articles relevant to risk of bias), how the tool was trained, and details on the tool's performance and internal validation.

Risk of Bias in Studies (Item 18; report in results )

  • Present tables or figures indicating for each study the risk of bias in each domain/component/item assessed and overall study-level risk of bias.
  • Present justification for each risk of bias judgment—for example, in the form of relevant quotations from reports of included studies.

Additional Items

If assessments of risk of bias were done for specific outcomes or results in each study, consider displaying risk of bias judgments on a forest plot, next to the study results, so that the limitations of studies contributing to a particular meta-analysis are evident (see Sterne et al for an example forest plot).


We host a workshop on critical appraisal each fall; check out our latest recording!

  • Last Updated: Mar 28, 2024 2:54 PM
  • URL: https://guides.lib.vt.edu/SRMA

University of Texas Libraries

Systematic Reviews & Evidence Synthesis Methods

Critical Appraisal


Some reviews require a critical appraisal of each study that makes it through the screening process. This involves a risk of bias assessment and/or a quality assessment. The goal of these reviews is not just to find all of the studies, but to determine their methodological rigor and, therefore, their credibility.

"Critical appraisal is the balanced assessment of a piece of research, looking for its strengths and weaknesses and then coming to a balanced judgement about its trustworthiness and its suitability for use in a particular context." 1

It's important to consider the impact that poorly designed studies could have on your findings and to rule out inaccurate or biased work.

Selection of a valid critical appraisal tool, testing the tool with several of the selected studies, and involving two or more reviewers in the appraisal are good practices to follow.

1. Purssell E, McCrae N. How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students. 1st ed. Springer; 2020.

Evaluation Tools

  • The Appraisal of Guidelines for Research & Evaluation Instrument (AGREE II) The Appraisal of Guidelines for Research & Evaluation Instrument (AGREE II) was developed to address the issue of variability in the quality of practice guidelines.
  • Critical Appraisal Skills Programme (CASP) Checklists Critical Appraisal checklists for many different study types
  • Critical Review Form for Qualitative Studies Version 2, developed out of McMaster University
  • Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS) Downes MJ, Brennan ML, Williams HC, et al. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open 2016;6:e011458. doi:10.1136/bmjopen-2016-011458
  • Downs & Black Checklist for Assessing Studies Downs, S. H., & Black, N. (1998). The Feasibility of Creating a Checklist for the Assessment of the Methodological Quality Both of Randomised and Non-Randomised Studies of Health Care Interventions. Journal of Epidemiology and Community Health (1979-), 52(6), 377–384.
  • GRADE The Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group "has developed a common, sensible and transparent approach to grading quality (or certainty) of evidence and strength of recommendations."
  • Grade Handbook Full handbook on the GRADE method for grading quality of evidence.
  • MAGIC (Making GRADE the Irresistible choice) Clear succinct guidance in how to use GRADE
  • Joanna Briggs Institute. Critical Appraisal Tools "JBI’s critical appraisal tools assist in assessing the trustworthiness, relevance and results of published papers." Includes checklists for 13 types of articles.
  • Latitudes Network This is a searchable library of validity assessment tools for use in evidence syntheses. This website also provides access to training on the process of validity assessment.
  • Mixed Methods Appraisal Tool A tool that can be used to appraise a mix of studies that are included in a systematic review - qualitative research, RCTs, non-randomized studies, quantitative studies, mixed methods studies.
  • RoB 2 Tool Higgins JPT, Sterne JAC, Savović J, Page MJ, Hróbjartsson A, Boutron I, Reeves B, Eldridge S. A revised tool for assessing risk of bias in randomized trials. In: Chandler J, McKenzie J, Boutron I, Welch V (editors). Cochrane Methods. Cochrane Database of Systematic Reviews 2016, Issue 10 (Suppl 1). dx.doi.org/10.1002/14651858.CD201601.
  • ROBINS-I Risk of Bias for non-randomized (observational) studies or cohorts of interventions Sterne J A, Hernán M A, Reeves B C, Savović J, Berkman N D, Viswanathan M et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions BMJ 2016; 355 :i4919 doi:10.1136/bmj.i4919
  • Scottish Intercollegiate Guidelines Network. Critical Appraisal Notes and Checklists "Methodological assessment of studies selected as potential sources of evidence is based on a number of criteria that focus on those aspects of the study design that research has shown to have a significant effect on the risk of bias in the results reported and conclusions drawn. These criteria differ between study types, and a range of checklists is used to bring a degree of consistency to the assessment process."
  • The TREND Statement (CDC) Des Jarlais DC, Lyles C, Crepaz N, and the TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. Am J Public Health. 2004;94:361-366.
  • Assembling the Pieces of a Systematic Review, Chapter 8: Evaluating: Study Selection and Critical Appraisal.
  • How to Perform a Systematic Literature Review, Chapter: Critical Appraisal: Assessing the Quality of Studies.

Other library guides

  • Duke University Medical Center Library. Systematic Reviews: Assess for Quality and Bias
  • UNC Health Sciences Library. Systematic Reviews: Assess Quality of Included Studies
  • Last Updated: Feb 27, 2024 12:53 PM
  • URL: https://guides.lib.utexas.edu/systematicreviews



Critical Appraisal of Studies

Critical appraisal is the process of carefully and systematically examining research to judge its trustworthiness and its value and relevance in a particular context. It provides a framework for evaluating research. During the critical appraisal process, researchers can:

  • Decide whether studies have been undertaken in a way that makes their findings reliable as well as valid and unbiased
  • Make sense of the results
  • Know what these results mean in the context of the decision they are making
  • Determine if the results are relevant to their patients/schoolwork/research

Burls, A. (2009). What is critical appraisal? In What Is This Series: Evidence-based medicine. Available online at  What is Critical Appraisal?

Critical appraisal is included in the process of writing high quality reviews, like systematic and integrative reviews and for evaluating evidence from RCTs and other study designs. For more information on systematic reviews, check out our  Systematic Review  guide.

  • Last Updated: Nov 16, 2023 1:27 PM
  • URL: https://guides.library.duq.edu/critappraise

Critical Appraisal of Quantitative Research

  • Living reference work entry
  • First Online: 12 June 2018
  • Rocco Cavaleri, Sameer Bhole & Amit Arora

Critical appraisal skills are important for anyone wishing to make informed decisions or improve the quality of healthcare delivery. A good critical appraisal provides information regarding the believability and usefulness of a particular study. However, the appraisal process is often overlooked, and critically appraising quantitative research can be daunting for both researchers and clinicians. This chapter introduces the concept of critical appraisal and highlights its importance in evidence-based practice. Readers are then introduced to the most common quantitative study designs and key questions to ask when appraising each type of study. These studies include systematic reviews, experimental studies (randomized controlled trials and non-randomized controlled trials), and observational studies (cohort, case-control, and cross-sectional studies). This chapter also provides the tools most commonly used to appraise the methodological and reporting quality of quantitative studies. Overall, this chapter serves as a step-by-step guide to appraising quantitative research in healthcare settings.

  • Critical appraisal
  • Quantitative research
  • Methodological quality
  • Reporting quality



Author information

Authors and affiliations.

School of Science and Health, Western Sydney University, Campbelltown, NSW, Australia

Rocco Cavaleri & Amit Arora

Sydney Dental School, Faculty of Medicine and Health, The University of Sydney, Surry Hills, NSW, Australia

Sameer Bhole

Discipline of Child and Adolescent Health, Sydney Medical School, The University of Sydney, Westmead, NSW, Australia

Oral Health Services, Sydney Local Health District and Sydney Dental Hospital, NSW Health, Surry Hills, NSW, Australia

Sameer Bhole & Amit Arora


Corresponding author

Correspondence to Rocco Cavaleri .

Editor information

Editors and affiliations.

School of Science & Health, Western Sydney University, Penrith, New South Wales, Australia

Pranee Liamputtong


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this entry

Cite this entry.

Cavaleri, R., Bhole, S., Arora, A. (2018). Critical Appraisal of Quantitative Research. In: Liamputtong, P. (eds) Handbook of Research Methods in Health Social Sciences . Springer, Singapore. https://doi.org/10.1007/978-981-10-2779-6_120-2


Critical Appraisal of Research Articles: Systematic Reviews


What is a Systematic Review?

A systematic review is a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from studies that are included in the review. Statistical methods may or may not be used to analyze and summarize the results of the included studies.

How to Find Systematic Reviews

1. Search the Cochrane Database of Systematic Reviews

2. Using PubMed, either use the 'Systematic Reviews' filter or add this to the end of your search: AND (systematic review[ti])

3. If searching CINAHL , limit by publication type (select "Systematic Review").
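
For scripted searching, the same title filter can be appended programmatically. Below is a minimal sketch (not an official client) that builds an NCBI E-utilities esearch URL for PubMed; the topic string and `retmax` value are illustrative choices, not prescribed by this guide.

```python
# Sketch: construct an NCBI E-utilities esearch URL that restricts a
# PubMed topic search to systematic reviews using the [ti] filter above.
from urllib.parse import urlencode

EUTILS_SEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_systematic_review_query(topic: str) -> str:
    """Return an esearch URL combining a topic with the systematic review title filter."""
    term = f"({topic}) AND (systematic review[ti])"
    return EUTILS_SEARCH + "?" + urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": 20}
    )

url = pubmed_systematic_review_query("acute pancreatitis prognosis")
print(url)
```

Fetching the URL (e.g., with `urllib.request`) returns a JSON list of matching PMIDs; the sketch stops at URL construction so it runs without network access.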

Questions to Ask

1. Is it a systematic review of the right type of studies which are relevant to your question?

2. Does the methods section describe how all the relevant trials were found and assessed? The paper should give a comprehensive account of the sources consulted in the search for relevant papers, the search strategy used to find them, and the quality and relevance criteria used to decide whether to include them in the review.

  • The authors should include hand searching of journals and searching for unpublished literature.
  • Were any obvious databases missed?
  • Did the authors check the reference lists of articles and textbooks?
  • Did they contact experts (to get their list of references checked for completeness and to try to find out about ongoing or unpublished research)?
  • Did they use an appropriate search strategy; were important subject terms missed?
  • Who were the study participants, and how is their disease status defined?
  • What intervention(s) were given, how, and in what setting?
  • How were outcomes assessed?

3. Are the studies consistent, both clinically and statistically?

4. Compare with PRISMA: look at the most recent PRISMA checklist to see how well the authors documented the various preferred reporting items.

Appraisal Checklists for Systematic Reviews

  • Critical Appraisals Skills Programme (CASP)
  • Joanna Briggs Institute


  • Last Updated: Mar 1, 2024 11:56 AM
  • URL: https://guides.himmelfarb.gwu.edu/CriticalAppraisal

  • Methodology
  • Open access
  • Published: 02 April 2024

Critical appraisal of machine learning prognostic models for acute pancreatitis: protocol for a systematic review

Amier Hassan, Brian Critelli, Ila Lahooti, Ali Lahooti, Nate Matzko, Jan Niklas Adams, Lukas Liss, Justin Quion, David Restrepo, Melica Nikahd, Stacey Culp, Lydia Noh, Kathleen Tong, Jun Sung Park, Venkata Akshintala, John A. Windsor, Nikhil K. Mull, Georgios I. Papachristou, Leo Anthony Celi & Peter J. Lee

Diagnostic and Prognostic Research, volume 8, Article number: 6 (2024)


Acute pancreatitis (AP) is an acute inflammatory disorder that is common, costly, and increasing in incidence worldwide, with over 300,000 hospitalizations occurring yearly in the United States alone. As its course and outcomes vary widely, a critical knowledge gap in the field has been the lack of accurate prognostic tools to forecast AP patients’ outcomes. Despite several published studies in the last three decades, the predictive performance of published prognostic models has been found to be suboptimal. Recently, non-regression machine learning (ML) models have garnered intense interest in medicine for their potential for better predictive performance. Each year, an increasing number of AP models are published. However, their methodologic quality relating to transparent reporting and risk of bias in study design has never been systematically appraised. Therefore, through collaboration between a group of clinicians and data scientists with appropriate content expertise, we will perform a systematic review of papers published between January 2021 and December 2023 containing artificial intelligence prognostic models in AP. To systematically assess these studies, the authors will leverage the CHARMS checklist, the PROBAST tool for risk of bias assessment, and the most current version of TRIPOD-AI (Research Registry: http://www.reviewregistry1727).


Introduction

Acute pancreatitis (AP)—characterized by acute inflammation of the pancreas—is the most common cause of gastrointestinal-related hospitalization in the United States, accounting for over two billion dollars in annual healthcare spending [ 1 ]. The etiology of AP is variable, with the most common causes being alcohol and gallstones in adults, and congenital anomalies, trauma, and drugs being more frequently implicated in pediatric patients [ 2 ]. The condition’s natural history is both diverse and unpredictable, ranging from short-term events such as intensive care unit admission, organ failure, and pancreatic gland necrosis to long-term sequelae such as diabetes, exocrine pancreatic dysfunction, malnutrition, recurrent pancreatitis, and chronic pancreatitis [ 3 , 4 ]. Currently, the development of an accurate prognostic model for use in the AP population in research and clinical settings is among the top priorities of the National Institutes of Health [ 5 ]. A variety of potentially effective drugs are in the pipeline for testing in AP, where an accurate model which prognosticates clinically significant developments such as worsening disease severity or mortality would be of crucial importance for cohort enrichment in randomized clinical trials [ 6 ]. Additionally, there is currently a critical need for an accurate prognostic model to use for clinical decision support and for patient counseling [ 7 ].

We have previously shown that the most well-known regression-based prognostic models in AP (e.g., Glasgow criteria, Acute Physiology and Chronic Health Examination (APACHE), Systemic Inflammatory Response Syndrome (SIRS), and the Bedside Index for Severity in Acute Pancreatitis (BISAP), etc.)—which are broadly characterized as models which assume a linear association between predictors and outcome(s)—showed suboptimal predictive performances, highlighting the need for better models [ 7 ]. Machine learning (ML) is one such field that holds great promise in AP prognostication. Broadly defined, ML uses the computer to fit statistical models for datasets where predictors and outcomes have non-linear associations and complex interactions. Some examples of ML techniques include random forests and neural networks. Recent studies have shown these models to purportedly surpass existing regression-based models across multiple predictive performance metrics [ 8 , 9 , 10 ]. However, caution is necessary before high-performing AI models can be fully embraced, as numerous concerns have been documented in different fields of medicine, ranging from methodologic issues and concerning model-building practices to a lack of transparent reporting [ 11 , 12 , 13 ], all of which can negatively influence the generalizability of a model. In contrast to the fields of oncology, cardiology, and surgery, where studies that critically appraise ML prognostic models have started to emerge, there has never been a critical appraisal of ML prognostic models developed for AP [ 14 , 15 , 16 ]. Conducting such an appraisal can help identify common shortcomings of studies and promote improvement in the methodologic rigor of ML prognostic model studies. Herein, we address this unmet need by conducting a systematic review which identifies, describes, and appraises all non-regression ML prognostic models in AP published between January of 2021 and December of 2023.

Aims and objectives

This project aims to identify, describe, and appraise all prognostic models developed through ML in AP published from January 2021 through December 2023. The objective of the review is to critically appraise the prognostic model studies and the developed models in AP in terms of the following: (a) risk of bias in the study design, (b) completeness of reporting in accordance with the standards of the TRIPOD-AI statement, (c) summarize predictive performances of the published ML prognostic models in AP.

To achieve these objectives, we will conduct a systematic review to identify studies published from January 2021 through December 2023 in which a prognostic model was either developed and/or validated (either internally or externally), with or without model updating. This review will include any studies of prospective or retrospective design (including post hoc analyses of clinical trials) that use multiple prognostic factors to predict an individual’s risk of outcomes related to AP. We will assess the included studies for risk of bias using the Prediction Model Risk of Bias Assessment Tool (PROBAST) [ 17 ], use the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist [ 18 ] for data extraction, and assess quality of reporting against the standards of the TRIPOD-AI statement, making this the first systematic review of ML prognostic models in the AP literature to include these tools. We have registered this review at Research Registry (http://www.reviewregistry1727).

The PICOTS framework for our review is presented below.

Participants

The target population of interest comprises adult patients with a diagnosis of AP.

Intervention

We will consider any ML-based prognostic models that have been developed/validated to be used in the AP population.

Comparator

This review seeks to critically appraise all existing ML-based prognostic models published between January 2021 and December 2023 for their risk of bias, completeness of reporting, and predictive performance, as applicable. Therefore, this section is not applicable.

Outcomes

Our primary focus is the methodologic quality of the published ML-based prognostic model studies. However, if sufficient published data are available (i.e., if more than two studies investigated the same ML-based prognostic model predicting the same outcome), meta-analyses of predictive performance will be performed. Examples of outcomes commonly predicted in AP are (1) severity of AP, (2) pancreatic necrosis, and (3) mortality, among others.
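
Where such a meta-analysis is feasible, one conventional approach (a sketch under assumptions, not the authors' specified method) is DerSimonian-Laird random-effects pooling of logit-transformed c-statistics. The study values and standard errors below are invented, and the standard errors are assumed to already be on the logit scale:

```python
# Sketch: DerSimonian-Laird random-effects pooling of c-statistics on the
# logit scale. Inputs are illustrative; `ses` are assumed to be standard
# errors of the logit-transformed c-statistics.
import math

def pool_logit_cstats(cstats, ses):
    """Pool c-statistics on the logit scale; return (pooled c, tau^2)."""
    thetas = [math.log(c / (1 - c)) for c in cstats]   # logit transform
    w = [1 / se**2 for se in ses]                       # inverse-variance weights
    fixed = sum(wi * t for wi, t in zip(w, thetas)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (t - fixed) ** 2 for wi, t in zip(w, thetas))
    df = len(cstats) - 1
    c_term = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_term)
    w_star = [1 / (se**2 + tau2) for se in ses]         # random-effects weights
    pooled = sum(wi * t for wi, t in zip(w_star, thetas)) / sum(w_star)
    return 1 / (1 + math.exp(-pooled)), tau2            # back-transform

c_pooled, tau2 = pool_logit_cstats([0.78, 0.83, 0.71], [0.10, 0.12, 0.09])
print(f"pooled c-statistic = {c_pooled:.3f}, tau^2 = {tau2:.4f}")
```

The logit transform keeps the pooled estimate inside (0, 1), and tau^2 summarizes between-study heterogeneity; in practice dedicated meta-analysis software would be used rather than a hand-rolled function like this.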

Timing and setting

We decided not to place restrictions on the setting (e.g., inpatients or outpatients) or the prediction horizon (how far into the future the model predicts). Given that our primary focus is the methodologic quality of the published studies of ML prognostic models, we opted for an inclusive approach.

Study eligibility criteria

Inclusion criteria

  • Studies with all adult patients (i.e., aged 18 years or older) that contain a prognostic model developed/validated with non-regression ML techniques in AP
  • Studies published in the English language
  • Studies that predict any outcome(s) of AP

Exclusion criteria

  • Studies involving participants with chronic pancreatitis or pancreatic cancer
  • Studies including animals
  • Studies that include post-surgical pancreatitis, which is considered a different disease entity in pancreatology with a different natural history and outcomes
  • Prognostic factor studies without prediction model building
  • Models published only in abstract form, given that this precludes adequate PROBAST assessment
  • Prognostic model studies that predict development of AP instead of outcomes of AP
  • Studies with regression-based model building
  • Review articles

Information sources

We will search the following databases from January 1, 2021, to December 31, 2023: MEDLINE (OvidSP) and EMBASE (OvidSP). We will screen the reference lists of the included studies, relevant review articles, Google Scholar, medRxiv, and practice guidelines. Search strategies are given in Tables 1 and 2. Because ML methodology is rapidly evolving, with newer algorithms quickly outdating models developed as recently as 4 years ago, we will focus this review on studies published in the last 3 years.

Search strategy

We will aim for a broad literature search by targeting studies that focus on investigating prognosis in AP patients, combining validated search strings that are optimized for sensitivity and specificity [ 12 ]. Titles/abstracts and full texts will be screened by two independent reviewers (LN, IL, KT, JP, AH, BC, NM, or AL) using Covidence software, a system designed to aid the conduct of systematic reviews [ 19 ]. Disputes regarding the inclusion of a publication at either stage will be resolved by a third independent reviewer (PJL). The objective nature of our inclusion and exclusion criteria obviates the need for consensus meetings.

Assessment of study quality

Recently, a tool entitled the “Prediction Model Risk of Bias Assessment Tool” (PROBAST) was developed to assess both the risk of bias and the applicability of a prediction model [ 9 ]. Using PROBAST, we will systematically assess the applicability of published prognostic models in AP and their risk of bias. Given the concerns raised about low inter-rater agreement [ 20 ], we have conducted PROBAST rater training: this included weekly meetings over 6 months with an AP content expert who has undergone appropriate PROBAST training by the PROBAST developers (PJL), to discuss every signaling question in the PROBAST domains with examples. When ML content expertise is required to accurately complete PROBAST, the data scientists, led by an ML methodology expert (LAC), will be consulted for a valid risk of bias assessment. This training has been, and continues to be, conducted according to the customized training and guidance described in the literature [ 21 ], which was shown to significantly improve raters’ ability to correctly apply and interpret the PROBAST instrument.

PROBAST includes assessment of participants, predictors, outcomes, and analysis [ 9 ]. The risk of bias assessment will consider study design and sample size, analysis of missing data and continuous variables, prognostic factor selection, data accessibility, and model internal or external validation for all included studies. All studies will be assessed by two independent reviewers utilizing the PROBAST tool, and any disagreements will be settled by a third party (PJL and LAC).
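
The domain judgments are typically rolled up into an overall rating: any domain at high risk makes the overall rating high, and an overall low rating requires every domain to be low. A minimal sketch of that roll-up (illustrative only, not the PROBAST instrument itself):

```python
# Illustrative roll-up of the four PROBAST domain ratings into an overall
# risk-of-bias judgment: any "high" domain -> "high" overall, all "low"
# domains -> "low" overall, anything else -> "unclear".
DOMAINS = ("participants", "predictors", "outcome", "analysis")

def overall_risk_of_bias(ratings: dict) -> str:
    levels = [ratings[d] for d in DOMAINS]
    if any(lv == "high" for lv in levels):
        return "high"
    if all(lv == "low" for lv in levels):
        return "low"
    return "unclear"

print(overall_risk_of_bias(
    {"participants": "low", "predictors": "low",
     "outcome": "unclear", "analysis": "low"}
))  # → unclear
```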

Data elements collected

Data elements listed in the CHARMS checklist will be extracted. Additionally, we will focus on summarizing the results of our appraisal of specific domains of quality. The following domains will be evaluated.

Reporting of the study methods and findings: we will assess for alignment with expected standards of reporting and identify common areas of deficiency. For this purpose, the most recent draft of the TRIPOD-AI checklist will be used, which is publicly available [ 22 ].

Conduct of the study: we will use PROBAST’s framework to assess 4 main domains of a prognostic model study.

The contents of this systematic review will adhere to the TRIPOD-SRMA checklist [ 23 ].

Data reporting

Descriptive statistics including study publication information, sources of data, participant demographics, candidate predictors, outcomes predicted, missing data, model development information, and model evaluation metrics will all be reported in accordance with the CHARMS checklist. The overall risk of bias and the risk of bias in each PROBAST domain will be summarized for all included studies in accordance with the PROBAST developers’ recommendations. Summary statistics of fidelity to the current TRIPOD-AI statement checklist will be reported as well. Fidelity will be measured by assigning 1 point to every item on the TRIPOD-AI checklist that is reported and 0 points when a required item is not reported; the total points will then be divided by the total possible points to give a numeric representation of an article’s fidelity to TRIPOD-AI. When applicable and feasible, a meta-analysis of predictive performance (e.g., c-statistic, sensitivity, specificity, positive and negative predictive value) will be conducted and presented. Just as important, we will also look for measures of calibration (e.g., intercept and calibration slope) to assess the agreement between observed outcomes and the model’s computed predictions.
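
The fidelity calculation described above is a simple proportion of reported items. A minimal sketch, with checklist item names invented for illustration (the real TRIPOD-AI checklist defines the actual items):

```python
# Hedged sketch of the fidelity score: 1 point per reported TRIPOD-AI item,
# 0 per unreported item; fidelity = points earned / points possible.
# Item names below are invented placeholders, not the official checklist.
def tripod_ai_fidelity(item_reported: dict) -> float:
    """Fraction of checklist items reported (1.0 = full adherence)."""
    if not item_reported:
        raise ValueError("empty checklist")
    return sum(item_reported.values()) / len(item_reported)

appraisal = {
    "title": 1, "abstract": 1, "source_of_data": 1,
    "participants": 1, "outcome_definition": 0, "missing_data": 0,
    "model_evaluation": 1, "code_availability": 0,
}
print(f"fidelity = {tripod_ai_fidelity(appraisal):.2f}")  # 5 of 8 items reported
```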

AP is a common and often debilitating gastrointestinal disease, and its incidence is rising worldwide [ 24 ]. Despite over 300 studies in the literature reporting prognostic models for AP, none of the published models are currently used for clinical decision support [ 25 ]. There has been a sharp increase in ML-based prognostic model studies, but they have not been critically appraised for their methodologic quality. It is necessary to appraise the methodologic quality of the published studies in order to promote studies with valid and reproducible results. Furthermore, transparent reporting of methodology will allow other investigators to externally validate existing models. We hope our review will highlight the current quality of methodology reporting and thus serve as a framework for future reviews of ML-derived prognostic models for other diseases in gastroenterology. Additionally, we hope our work emphasizes the importance of collaboration between data scientists and clinicians. As artificial intelligence continues to rapidly transform the world, the role of the clinician must change with it. Neither group could have accomplished this work without the expertise of the other.

Availability of data and materials

Not applicable.

Peery AF, Crockett SD, Murphy CC, Jensen ET, Kim HP, Egberg MD, Lund JL, Moon AM, Pate V, Barnes EL, et al. Burden and cost of gastrointestinal, liver, and pancreatic diseases in the United States: update 2021. Gastroenterology. 2022;162(2):621–44.

Suzuki M, Sai JK, Shimizu T. Acute pancreatitis in children and adolescents. World J Gastrointest Pathophysiol. 2014;5(4):416–26. https://doi.org/10.4291/wjgp.v5.i4.416. PMID: 25400985; PMCID: PMC4231506.

Petrov MS, Yadav D. Global epidemiology and holistic prevention of pancreatitis. Nat Rev Gastroenterol Hepatol. 2019;16(3):175–84.

Xiao AY, Tan ML, Wu LM, Asrani VM, Windsor JA, Yadav D, Petrov MS. Global incidence and mortality of pancreatic diseases: a systematic review, meta-analysis, and meta-regression of population-based cohort studies. Lancet Gastroenterol Hepatol. 2016;1(1):45–55.

Abu-El-Haija M, Gukovskaya AS, Andersen DK, Gardner TB, Hegyi P, Pandol SJ, Papachristou GI, Saluja AK, Singh VK, Uc A, et al. Accelerating the drug delivery pipeline for acute and chronic pancreatitis: summary of the working group on drug development and trials in acute pancreatitis at the National Institute of Diabetes and Digestive and Kidney Diseases Workshop. Pancreas. 2018;47(10):1185–92.

Lee PJ, Papachristou GI. New insights into acute pancreatitis. Nat Rev Gastroenterol Hepatol. 2019;16(8):479–96.

Mounzer R, Langmead CJ, Wu BU, Evans AC, Bishehsari F, Muddana V, Singh VK, Slivka A, Whitcomb DC, Yadav D, et al. Comparison of existing clinical scoring systems to predict persistent organ failure in patients with acute pancreatitis. Gastroenterology. 2012;142(7):1476–82.

Zhou Y, Ge YT, Shi XL, Wu KY, Chen WW, Ding YB, Xiao WM, Wang D, Lu GT, Hu LH. Machine learning predictive models for acute pancreatitis: a systematic review. Int J Med Inform. 2022;157:104641.

Langmead C, Lee PJ, Paragomi P, Greer P, Stello K, Hart PA, Whitcomb DC, Papachristou GI. A novel 5-cytokine panel outperforms conventional predictive markers of persistent organ failure in acute pancreatitis. Clin Transl Gastroenterol. 2021;12(5):e00351.

Fei Y, Gao K, Li W-Q. Artificial neural network algorithm model as powerful tool to predict acute lung injury following to severe acute pancreatitis. Pancreatology. 2018;18(8):892–9.

Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, et al. Systematic review finds “spin” practices and poor reporting standards in studies on machine learning-based prediction models. J Clin Epidemiol. 2023;158:99–110.

Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, et al. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol. 2023;157:120–33.

Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, et al. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol. 2023;154:8–22.

van Smeden M, Heinze G, Van Calster B, Asselbergs FW, Vardas PE, Bruining N, de Jaegere P, Moore JH, Denaxas S, Boulesteix AL, Moons KGM. Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease. Eur Heart J. 2022;43(31):2921–30. https://doi.org/10.1093/eurheartj/ehac238. PMID: 35639667; PMCID: PMC9443991.

Dhiman P, Ma J, Andaur Navarro CL, et al. Risk of bias of prognostic models developed using machine learning: a systematic review in oncology. Diagn Progn Res. 2022;6:13. https://doi.org/10.1186/s41512-022-00126-w .

Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol. 2023;157:120–33. https://doi.org/10.1016/j.jclinepi.2023.03.012. Epub 2023 Mar 17. PMID: 36935090.

Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, Reitsma JB, Kleijnen J, Mallett S. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–8.

Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, Reitsma JB, Collins GS. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744.

Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia. Available at www.covidence.org .

Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, Steyerberg EW, de Jong Y. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol. 2023;159:159–73.

Kaiser I, Pfahlberg AB, Mathes S, Uter W, Diehl K, Steeb T, Heppt MV, Gefeller O. Inter-rater agreement in assessing risk of bias in melanoma prediction studies using the Prediction Model Risk of Bias Assessment Tool (PROBAST): results from a controlled experiment on the effect of specific rater training. J Clin Med. 2023;12(5):1976.

TRIPOD+AI.  https://osf.io/yht3d .

Snell KIE, Levis B, Damen JAA, Dhiman P, Debray TPA, Hooft L, Reitsma JB, Moons KGM, Collins GS, Riley RD. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ. 2023;381:e073538.

Iannuzzi JP, King JA, Leong JH, Quan J, Windsor JW, Tanyingoh D, Coward S, Forbes N, Heitman SJ, Shaheen A-A, et al. Global incidence of acute pancreatitis is increasing over time: a systematic review and meta-analysis. Gastroenterology. 2022;162(1):122–34.

Vege SS, DiMagno MJ, Forsmark CE, Martel M, Barkun AN. Initial medical treatment of acute pancreatitis: American Gastroenterological Association Institute technical review. Gastroenterology. 2018;154(4):1103–39.

Acknowledgements

Author information

Amier Hassan and Brian Critelli are co-first authors.

Authors and Affiliations

Division of Gastroenterology and Hepatology, Weill Cornell Medical College, New York, USA

Amier Hassan, Brian Critelli, Ali Lahooti & Nate Matzko

Division of Gastroenterology and Hepatology, Ohio State University Wexner Medical Center, Columbus, OH, USA

Ila Lahooti, Kathleen Tong, Jun Sung Park, Georgios I. Papachristou & Peter J. Lee

Division of Process and Data Science, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen, Germany

Jan Niklas Adams & Lukas Liss

Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, USA

Justin Quion & David Restrepo

Division of Bioinformatics, Ohio State University Wexner Medical Center, Columbus, USA

Melica Nikahd & Stacey Culp

Northeast Ohio Medical School, Rootstown, USA

Division of Gastroenterology, Johns Hopkins Medical Center, Baltimore, USA

Venkata Akshintala

Department of Surgery, University of Auckland, Auckland, New Zealand

John A. Windsor & Leo Anthony Celi

Division of Hospital Medicine and Penn Medicine Center for Evidence-based Practice, University of Pennsylvania, Philadelphia, USA

Nikhil K. Mull

Division of Critical Care, Beth Israel Medical Center, Boston, USA

Leo Anthony Celi

Contributions

Amier Hassan: drafting, editing, and proofreading of the manuscript; Brian Critelli: drafting, editing, and proofreading of the manuscript; Ila Lahooti, Ali Lahooti, Nate Matzko, Jan Niklas Adams, Lukas Liss, Justin Quion, David Restrepo, Melica Nikahd, Stacey Culp, Lydia Noh, Kathleen Tong, Jun Sung Park, Venkata Akshintala, John A. Windsor, Nikhil K. Mull, Georgios I. Papachristou, and Leo Anthony Celi: direct editing and proofreading of the manuscript; Peter J. Lee: substantial drafting, editing, and proofreading of the manuscript, including its submission form. All authors approved the manuscript for submission.

Corresponding author

Correspondence to Peter J. Lee .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Hassan, A., Critelli, B., Lahooti, I. et al. Critical appraisal of machine learning prognostic models for acute pancreatitis: protocol for a systematic review. Diagn Progn Res 8, 6 (2024). https://doi.org/10.1186/s41512-024-00169-1

Received : 27 October 2023

Accepted : 15 February 2024

Published : 02 April 2024

DOI : https://doi.org/10.1186/s41512-024-00169-1


Diagnostic and Prognostic Research

ISSN: 2397-7523



Published: 08 April 2022

How to appraise the literature: basic principles for the busy clinician - part 1: randomised controlled trials

  • Aslam Alkadhimi 1 ,
  • Samuel Reeves 2 &
  • Andrew T. DiBiase 3  

British Dental Journal volume 232, pages 475–481 (2022)


Critical appraisal is the process of carefully, judiciously and systematically examining research to adjudicate its trustworthiness and its value and relevance in clinical practice. The first part of this two-part series will discuss the principles of critically appraising randomised controlled trials. The second part will discuss the principles of critically appraising systematic reviews and meta-analyses.

Evidence-based dentistry (EBD) is the integration of the dentist's clinical expertise, the patient's needs and preferences and the most current, clinically relevant evidence. Critical appraisal of the literature is an invaluable and indispensable skill that dentists should possess to help them deliver EBD.

This article seeks to act as a refresher and guide for generalists, specialists and the wider readership, so that they can efficiently and confidently appraise research - specifically, randomised controlled trials - that may be pertinent to their daily clinical practice.

Evidence-based dentistry is discussed.

Efficient techniques for critically appraising randomised controlled trials are described.

Important methodological and statistical considerations are explicated.




Author information

Authors and Affiliations

Senior Registrar in Orthodontics, The Royal London Hospital Barts Health NHS Trust and East Kent Hospitals University NHS Foundation Trust, London, UK

Aslam Alkadhimi

Dental Core Trainee, East Kent Hospitals University NHS Foundation Trust, UK

Samuel Reeves

Consultant Orthodontist, East Kent Hospitals University NHS Foundation Trust, UK

Andrew T. DiBiase

Contributions

Aslam Alkadhimi contributed to conceptualisation, literature search, original draft preparation and drafting and critically revising the manuscript; Samuel Reeves contributed to original draft preparation and editing; and Andrew DiBiase contributed to supervision, draft editing and critically revising the manuscript.

Corresponding author

Correspondence to Aslam Alkadhimi .

Ethics declarations

The authors declare no competing interests.

Ethical approval and consent to participate did not apply to this study.

About this article

Cite this article

Alkadhimi, A., Reeves, S. & DiBiase, A. How to appraise the literature: basic principles for the busy clinician - part 1: randomised controlled trials. Br Dent J 232, 475–481 (2022). https://doi.org/10.1038/s41415-022-4096-y

Received : 31 January 2021

Accepted : 25 April 2021

Published : 08 April 2022

Issue Date : 08 April 2022

DOI : https://doi.org/10.1038/s41415-022-4096-y




Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Abstract

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times when thousands of systematic reviews are published monthly [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometrical increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection, given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1 Distinguishing types of research evidence
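For readers who find a decision procedure easier to follow than prose, the scheme's basic distinctions can be sketched in code. The sketch below is purely illustrative: the function name, labels, and error handling are our own, not part of the published scheme.

```python
def classify_evidence(is_primary: bool,
                      data_type: str = "",
                      unit: str = "",
                      allocation: str = "") -> str:
    """Apply the scheme's distinctions in order: primary vs. secondary study,
    then type of data reported, then the two defining design features."""
    if not is_primary:
        return "secondary study"
    # Validate each feature against the scheme's dichotomies
    for value, allowed in [(data_type, {"qualitative", "quantitative"}),
                           (unit, {"group", "single-case"}),
                           (allocation, {"randomized", "non-randomized"})]:
        if value not in allowed:
            raise ValueError(f"unrecognized feature: {value!r}")
    return f"{data_type} {unit} {allocation} primary study"

# An RCT, for example, reports quantitative data from randomized groups
print(classify_evidence(True, "quantitative", "group", "randomized"))
```

The point of the sketch is the ordering of the distinctions, not the labels themselves; an actual review team would record these features during screening and data extraction rather than compute them.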

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of interventions (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, which undermines the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported systematic review may still be biased and flawed while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct

Development.

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms "AMSTAR 2 AND clinical practice guidelines" and "ROBIS AND clinical practice guidelines"; searched 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record, given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience is also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy, and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses, and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. Moreover, PRISMA checklists evaluate how completely an element of review conduct was reported; they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved for developing different types of systematic reviews; however, while inclusion of the suggested elements makes a research question conform to a particular review methodology, it does not necessarily make the question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope allowing them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].
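The database side of this preliminary check can be scripted. As one sketch, PubMed can be queried for existing systematic reviews through the NCBI E-utilities `esearch` endpoint using PubMed's `systematic[sb]` subset filter; registries such as PROSPERO still need to be searched separately. The function name and example topic below are hypothetical, and the code only builds the query URL (actually issuing the request requires network access).

```python
from urllib.parse import urlencode

def existing_reviews_url(topic: str) -> str:
    """Build an NCBI E-utilities esearch URL restricted to PubMed's
    systematic reviews subset for a given topic string."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {
        "db": "pubmed",
        # systematic[sb] limits results to PubMed's systematic review subset
        "term": f"({topic}) AND systematic[sb]",
        "retmode": "json",
    }
    return f"{base}?{urlencode(params)}"

url = existing_reviews_url("exercise AND low back pain")
print(url)
```

A hit count returned by such a query is only a screening signal; the retrieved reviews must still be examined to judge whether they are current and methodologically sound before a replication is ruled in or out.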

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association with attainment of AMSTAR standards in systematic reviews with published prospective protocols has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs are often expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses or provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including only English-language literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. Searches of the gray literature and of trial registries may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trial registries [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even less of a delay; in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of their RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When appropriately conducted, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
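The weighted-average logic described above can be sketched in a few lines of code. This is a hedged illustration only: it shows a fixed-effect inverse-variance pooling, the study estimates and standard errors are hypothetical, and real analyses should rely on established software and statistical expertise.

```python
# Minimal sketch of fixed-effect inverse-variance meta-analysis.
# All study data below are hypothetical; this is not a substitute
# for dedicated meta-analysis software or statistical guidance.
import math

def inverse_variance_pool(estimates, std_errors):
    """Pool study effect estimates via inverse-variance weighting."""
    weights = [1.0 / se**2 for se in std_errors]       # weight = 1 / variance
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))          # SE of the pooled estimate
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Three hypothetical studies reporting log odds ratios
est = [-0.5, -0.2, -0.35]
se = [0.20, 0.15, 0.25]
pooled, pooled_se, ci = inverse_variance_pool(est, se)
print(round(pooled, 3), round(pooled_se, 3))  # → -0.316 0.108
```

Random-effects models, which additionally estimate between-study variance, are generally preferred when heterogeneity is expected; inverse-variance weighting is the common core of both approaches.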

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard approach combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentation of the data in tables and plots. In comparison to narrative description of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
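As a hedged sketch of the acceptable approach described above, vote counting by direction of effect can be paired with an exact sign (binomial) test. The study data here are hypothetical, and the method and its reporting should follow the Cochrane Handbook guidance [ 160 ].

```python
# Sketch of vote counting by direction of effect. Each study
# contributes only the direction of its result, never its
# statistical significance or magnitude. Data are hypothetical.
from math import comb

def vote_count_by_direction(directions):
    """directions: +1 if a study favors the intervention, -1 if not.
    Returns (proportion favoring, two-sided exact sign-test p-value
    against the null of a 50/50 split)."""
    n = len(directions)
    k = sum(1 for d in directions if d > 0)
    # Two-sided exact binomial test: double the smaller tail
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2**n
    p_value = min(1.0, 2 * tail)
    return k / n, p_value

# Hypothetical body of evidence: 7 of 10 studies favor the intervention
prop, p = vote_count_by_direction([+1] * 7 + [-1] * 3)
print(f"{prop:.0%} favor; sign-test p = {p:.3f}")  # → 70% favor; sign-test p = 0.344
```

Because only direction is tallied, this avoids the unacceptable practice of vote counting by statistical significance, in which individually underpowered studies are miscounted as evidence of no effect.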

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Reporting an overall certainty of evidence assessment in a systematic review is an important new standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.
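To make the starting-point mechanics concrete, the following sketch encodes the basic GRADE logic of beginning at high certainty for RCTs (low for NRSI) and moving down or up by level. The numeric encoding is our own simplification: as noted above, actual GRADE ratings reflect judgment along a continuum, not simple arithmetic.

```python
# Illustrative simplification of GRADE's starting-point logic.
# Domain names follow Table 5.1; real ratings require explicit,
# judgment-based rationale rather than mechanical subtraction.
RATINGS = ["very low", "low", "moderate", "high"]

def grade_certainty(randomized, downgrades=0, upgrades=0):
    """downgrades: total levels subtracted across risk of bias,
    inconsistency, indirectness, imprecision, and publication bias.
    upgrades: levels added (chiefly for NRSI) for large effect,
    dose-response gradient, or plausible residual confounding that
    would strengthen confidence in the estimate."""
    start = 3 if randomized else 1        # high for RCTs, low for NRSI
    level = max(0, min(3, start - downgrades + upgrades))
    return RATINGS[level]

# Hypothetical RCT body of evidence with serious risk of bias
# and serious imprecision (one level each)
print(grade_certainty(randomized=True, downgrades=2))  # → low
```

A separate rating is produced for each outcome, so a single review can legitimately report different certainty levels across its outcomes.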

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included in Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus perpetuated. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers, and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the evidence itself that is summarized is sparse, weak and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the paradigm shift to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
