
Write a Critical Review of a Scientific Journal Article

Writing a critical review involves four steps: (1) identify how and why the research was carried out; (2) establish the research context; (3) evaluate the research; and (4) establish the significance of the research.


Read the article(s) carefully and use the questions below to help you identify how and why the research was carried out. Look at the following sections: 

Introduction

  • What was the objective of the study?

Methods

  • What methods were used to accomplish this purpose (e.g., systematic recording of observations, analysis and evaluation of published research, assessment of theory, etc.)?
  • What techniques were used, and how was each technique performed?
  • What kind of data can be obtained using each technique?
  • How are such data interpreted?
  • What kind of information is produced by using the technique?

Results

  • What objective evidence was obtained from the authors’ efforts (observations, measurements, etc.)?
  • What were the results of the study?
  • How was each technique used to obtain each result?
  • What statistical tests were used to evaluate the significance of the conclusions based on numeric or graphic data? (A short illustration follows this list.)

Discussion

  • How did each result contribute to answering the question or testing the hypothesis raised in the introduction?
  • How were the results interpreted? How were they related to the original problem (authors’ view of evidence rather than objective findings)?
  • Were the authors able to answer the question (test the hypothesis) raised?
  • Did the research provide new factual information, a new understanding of a phenomenon in the field, or a new research technique?
  • How was the significance of the work described?
  • Do the authors relate the findings of the study to literature in the field?
  • Did the reported observations or interpretations support or refute observations or interpretations made by other researchers?
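To make the question about statistical tests concrete: when a paper claims that two group means differ significantly, the claim often rests on something like a two-sample t-test. The Python sketch below uses invented numbers purely for illustration; it is not a re-analysis of any particular study.

    # Illustration of the kind of test behind a claim that two group means differ.
    # The measurements are invented for this example.
    from scipy import stats

    control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2]
    treatment = [4.9, 5.1, 4.7, 5.3, 4.8, 5.0]

    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p (conventionally < 0.05) is what "significant" usually means

When reading a paper, ask whether the reported test matches the data type and design (paired vs. independent groups, normal vs. skewed data), not merely whether the p-value is small.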


Once you are familiar with the article, you can establish the research context by asking the following questions:

  • Who conducted the research? What were/are their interests?
  • When and where was the research conducted?
  • Why did the authors do this research?
  • Was this research pertinent only within the authors’ geographic locale, or did it have broader (even global) relevance?
  • Were many other laboratories pursuing related research when the reported work was done? If so, why?
  • For experimental research, what funding sources met the costs of the research?
  • On what prior observations was the research based? What was and was not known at the time?
  • How important was the research question posed by the researchers?


Remember that simply disagreeing with the material is not a critical assessment of it. For example, stating that the sample size is insufficient is not a critical assessment. Describing why the sample size is insufficient for the claims being made in the study would be a critical assessment.

Use the questions below to help you evaluate the quality of the authors’ research:

Title

  • Does the title precisely state the subject of the paper?

Abstract

  • Read the statement of purpose in the abstract. Does it match the one in the introduction?

Acknowledgments

  • Could the source of the research funding have influenced the research topic or conclusions?

Introduction

  • Check the sequence of statements in the introduction. Does all the information lead coherently to the purpose of the study?

Methods

  • Review all methods in relation to the objective(s) of the study. Are the methods valid for studying the problem?
  • Check the methods for essential information. Could the study be duplicated from the methods and information given?
  • Check the methods for flaws. Is the sample selection adequate? Is the experimental design sound?
  • Check the sequence of statements in the methods. Does all the information belong there? Is the sequence of methods clear and pertinent?
  • Was there mention of ethics? Which research ethics board approved the study?

Results

  • Carefully examine the data presented in the tables and diagrams. Does the title or legend accurately describe the content?
  • Are column headings and labels accurate?
  • Are the data organized for ready comparison and interpretation? (A table should be self-explanatory, with a title that accurately and concisely describes content and column headings that accurately describe information in the cells.)
  • Review the results as presented in the text while referring to the data in the tables and diagrams. Does the text complement, and not simply repeat, data? Are there discrepancies between the results in the text and those in the tables?
  • Check all calculations and presentation of data.
  • Review the results in light of the stated objectives. Does the study reveal what the researchers intended?

Discussion

  • Does the discussion clearly address the objectives and hypotheses?
  • Check the interpretation against the results. Does the discussion merely repeat the results?
  • Does the interpretation arise logically from the data, or is it too far-fetched?
  • Have the faults, flaws, or shortcomings of the research been addressed?
  • Is the interpretation supported by other research cited in the study?
  • Does the study consider key studies in the field?
  • What is the significance of the research? Do the authors mention wider implications of the findings?
  • Is there a section on recommendations for future research? Are there other research possibilities or directions suggested?

Consider the article as a whole

  • Reread the abstract. Does it accurately summarize the article?
  • Check the structure of the article (first headings and then paragraphing). Is all the material organized under the appropriate headings? Are sections divided logically into subsections or paragraphs?
  • Are style, logic, clarity, and economy of expression addressed?


After you have evaluated the research, consider whether the research has been successful. Has it led to new questions being asked, or new ways of using existing knowledge? Are other researchers citing this paper?

You should consider the following questions:

  • How did other researchers view the significance of the research reported by your authors?
  • Did the research reported in your article result in the formulation of new questions or hypotheses (by the authors or by other researchers)?
  • Have other researchers subsequently supported or refuted the observations or interpretations of these authors?
  • Did the research make a significant contribution to human knowledge?
  • Did the research produce any practical applications?
  • What are the social, political, technological, and medical implications of this research?
  • How do you evaluate the significance of the research?

To answer these questions, look at review articles to find out how reviewers view this piece of research. Look at research articles and databases like Web of Science to see how other people have used this work. What range of journals have cited this article?

These questions were adapted from the following sources:

Kuyper, B.J. (1991). Bringing up scientists in the art of critiquing research. BioScience, 41(4), 248-250. Wood, J.M. (2003). Research Lab Guide. MICR*3260 Microbial Adaptation and Development Web Site. Retrieved July 31, 2006.



The fundamentals of critically appraising an article

Sneha Chotaliya, Academic Foundation Dentist, London, UK

We are often surrounded by an abundance of research and articles, but their quality and validity can vary massively. Not everything will be of good quality, or even valid. An important part of reading a paper is therefore first assessing it. This is a key skill for all healthcare professionals, as anything we read can impact or influence our practice. It is also important to stay up to date with the latest research and findings.


Chotaliya, S. The fundamentals of critically appraising an article. BDJ Student 29, 12–13 (2022). https://doi.org/10.1038/s41406-021-0275-6


Critically appraising qualitative research

Ayelet Kuper, Lorelei Lingard, and Wendy Levinson, University of Toronto, Toronto, Canada

Six key questions will help readers to assess qualitative research

Summary points

Appraising qualitative research is different from appraising quantitative research

Qualitative research papers should show appropriate sampling, data collection, and data analysis

Transferability of qualitative research depends on context and may be enhanced by using theory

Ethics in qualitative research goes beyond review boards’ requirements to involve complex issues of confidentiality, reflexivity, and power

Over the past decade, readers of medical journals have gained skills in critically appraising studies to determine whether the results can be trusted and applied to their own practice settings. Criteria have been designed to assess studies that use quantitative methods, and these are now in common use.

In this article we offer guidance for readers on how to assess a study that uses qualitative research methods by providing six key questions to ask when reading qualitative research (box 1). However, the thorough assessment of qualitative research is an interpretive act and requires informed reflective thought rather than the simple application of a scoring system.

Box 1 Key questions to ask when reading qualitative research studies

  • Was the sample used in the study appropriate to its research question?
  • Were the data collected appropriately?
  • Were the data analysed appropriately?
  • Can I transfer the results of this study to my own setting?
  • Does the study adequately address potential ethical issues, including reflexivity?

Overall: is what the researchers did clear?

One of the critical decisions in a qualitative study is whom or what to include in the sample—whom to interview, whom to observe, what texts to analyse. An understanding that qualitative research is based in experience and in the construction of meaning, combined with the specific research question, should guide the sampling process. For example, a study of the experience of survivors of domestic violence that examined their reasons for not seeking help from healthcare providers might focus on interviewing a sample of such survivors (rather than, for example, healthcare providers, social services workers, or academics in the field). The sample should be broad enough to capture the many facets of a phenomenon, and limitations to the sample should be clearly justified. Since the answers to questions of experience and meaning also relate to people’s social affiliations (culture, religion, socioeconomic group, profession, etc), it is also important that the researcher acknowledges these contexts in the selection of a study sample.

In contrast with quantitative approaches, qualitative studies do not usually have predetermined sample sizes. Sampling stops when a thorough understanding of the phenomenon under study has been reached, an end point that is often called saturation. Researchers consider samples to be saturated when encounters (interviews, observations, etc) with new participants no longer elicit trends or themes not already raised by previous participants. Thus, to sample to saturation, data analysis has to happen while new data are still being collected. Multiple sampling methods may be used to broaden the understanding achieved in a study (box 2). These sampling issues should be clearly articulated in the methods section.

Box 2 Qualitative sampling methods for interviews and focus groups 9

Examples are for a hypothetical study of financial concerns among adult patients with chronic renal failure receiving ongoing haemodialysis in a single hospital outpatient unit.

Typical case sampling: sampling the most ordinary, usual cases of a phenomenon

The sample would include patients likely to have had typical experiences for that haemodialysis unit and patients who fit the profile of patients in the unit for factors found on literature review. Other typical cases could be found via snowball sampling (see below)

Deviant case sampling: sampling the most extreme cases of a phenomenon

The sample would include patients likely to have had different experiences of relevant aspects of haemodialysis. For example, if most patients in the unit are 60-70 years old and recently began haemodialysis for diabetic nephropathy, researchers might sample the unmarried university student in his 20s on haemodialysis since childhood, the 32-year-old woman with lupus who is now trying to get pregnant, and the 90-year-old who newly started haemodialysis due to an adverse reaction to radio-opaque contrast dye. Other deviant cases could be found via theoretical and/or snowball sampling (see below)

Critical case sampling: sampling cases that are predicted (based on theoretical models or previous research) to be especially information-rich and thus particularly illuminating

The nature of this sample depends on previous research. For example, if research showed that marital status was a major determinant of financial concerns for haemodialysis patients, then critical cases might include patients whose marital status changed while on haemodialysis

Maximum-variation sampling: sampling as wide a range of perspectives as possible to capture the broadest set of information and experiences

The sample would include typical, deviant, and critical cases (as above), plus any other perspectives identified

Confirming-disconfirming sampling: sampling both individuals or texts whose perspectives are likely to confirm the researcher’s developing understanding of the phenomenon under study and those whose perspectives are likely to challenge that understanding

The sample would include patients whose experiences would likely either confirm or disconfirm what the researchers had already learnt (from other patients) about financial concerns among patients in the haemodialysis unit. This could be accomplished via theoretical and/or snowball sampling (see below)

Snowball sampling: sampling participants found by asking current participants in a study to recommend others whose experiences would be relevant to the study

Current participants could be asked to provide the names of others in the unit who they thought, when asked about financial concerns, would either share their views (confirming), disagree with their views (disconfirming), have views typical of patients on their unit (typical cases), or have views different from most other patients on their unit (deviant cases)

Theoretical sampling: sampling individuals or texts whom the researchers predict (based on theoretical models or previous research) would add new perspectives to those already represented in the sample

Researchers could use their understanding of known issues for haemodialysis patients that would, in theory, relate to financial concerns to ensure that the relevant perspectives were represented in the study. For example, if, as the research progressed, it turned out that none of the patients in the sample had had to change or leave a job in order to accommodate haemodialysis scheduling, the researchers might (based on previous research) choose to intentionally sample patients who had left their jobs because of the time commitment of haemodialysis (but who could not do peritoneal dialysis) and others who had switched to jobs with more flexible scheduling because of their need for haemodialysis

It is important that a qualitative study carefully describes the methods used in collecting data. The appropriateness of the method(s) selected to use for the specific research question should be justified, ideally with reference to the research literature. It should be clear that methods were used systematically and in an organised manner. Attention should be paid to specific methodological challenges such as the Hawthorne effect, 1 whereby the presence of an observer may influence participants’ behaviours. By using a technique called thick description, qualitative studies often aim to include enough contextual information to provide readers with a sense of what it was like to have been in the research setting.

Another technique that is often used is triangulation, with which a researcher uses multiple methods or perspectives to help produce a more comprehensive set of findings. A study can triangulate data, using different sources of data to examine a phenomenon in different contexts (for example, interviewing palliative patients who are at home, those who are in acute care hospitals, and those who are in specialist palliative care units); it can also triangulate methods, collecting different types of data (for example, interviews, focus groups, observations) to increase insight into a phenomenon.

Another common technique is the use of an iterative process, whereby concurrent data analysis is used to inform data collection. For example, concurrent analysis of an interview study about lack of adherence to medications among a particular social group might show that early participants seem to be dismissive of the efforts of their local pharmacists; the interview script might then be changed to include an exploration of this phenomenon. The iterative process constitutes a distinctive qualitative tradition, in contrast to the tradition of stable processes and measures in quantitative studies. Iterations should be explicit and justified with reference to the research question and sampling techniques so that the reader understands how data collection shaped the resulting insights.

Qualitative studies should include a clear description of a systematic form of data analysis. Many legitimate analytical approaches exist; regardless of which is used, the study should report what was done, how, and by whom. If an iterative process was used, it should be clearly delineated. If more than one researcher analysed the data (which depends on the methodology used) it should be clear how differences between analyses were negotiated. Many studies make reference to a technique called member checking, wherein the researcher shows all or part of the study’s findings to participants to determine if they are in accord with their experiences. 2 Studies may also describe an audit trail, which might include researchers’ analysis notes, minutes of researchers’ meetings, and other materials that could be used to follow the research process.

The contextual nature of qualitative research means that careful thought must be given to the potential transferability of its results to other sociocultural settings. Though the study should discuss the extent of the findings’ resonance with the published literature, 3 much of the onus of assessing transferability is left to readers, who must decide if the setting of the study is sufficiently similar for its results to be transferable to their own context. In doing so, the reader looks for resonance—the extent that research findings have meaning for the reader.

Transferability may be helped by the study’s discussion of how its results advance theoretical understandings that are relevant to multiple situations. For example, a study of patients’ preferences in palliative care may contribute to theories of ethics and humanity in medicine, thus suggesting relevance to other clinical situations such as the informed consent exchange before treatment. We have explained elsewhere in this series the importance of theory in qualitative research, and there are many who believe that a key indicator of quality in qualitative research is its contribution to advancing theoretical understanding as well as useful knowledge. This debate continues in the literature, 4 but from a pragmatic perspective most qualitative studies in health professions journals emphasise results that relate to practice; theoretical discussions tend to be published elsewhere.

Reflexivity is particularly important within the qualitative paradigm. Reflexivity refers to recognition of the influence a researcher brings to the research process. It highlights potential power relationships between the researcher and research participants that might shape the data being collected, particularly when the researcher is a healthcare professional or educator and the participant is a patient, client, or student. 5 It also acknowledges how a researcher’s gender, ethnic background, profession, and social status influence the choices made within the study, such as the research question itself and the methods of data collection. 6 7

Research articles written in the qualitative paradigm should show evidence both of reflexive practice and of consideration of other relevant ethical issues. Ethics in qualitative research should extend beyond prescriptive guidelines and research ethics boards into a thorough exploration of the ethical consequences of collecting personal experiences and opening those experiences to public scrutiny (a detailed discussion of this problem within a research report may, however, be limited by the practicalities of word count limitations). 8 Issues of confidentiality and anonymity can become quite complex when data constitute personal reports of experience or perception; the need to minimise harm may involve not only protection from external scrutiny but also mechanisms to mitigate potential distress to participants from sharing their personal stories.

In conclusion: is what the researchers did clear?

The qualitative paradigm includes a wide range of theoretical and methodological options, and qualitative studies must include clear descriptions of how they were conducted, including the selection of the study sample, the data collection methods, and the analysis process. The list of key questions for beginning readers to ask when reading qualitative research articles (see box 1) is intended not as a finite checklist, but rather as a beginner’s guide to a complex topic. Critical appraisal of particular qualitative articles may differ according to the theories and methodologies used, and achieving a nuanced understanding in this area is fairly complex.

Further reading

Crabtree F, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, CA: Sage, 1999.

Denzin NK, Lincoln YS, eds. Handbook of qualitative research. 2nd ed. Thousand Oaks, CA: Sage, 2000.

Finlay L, Ballinger C, eds. Qualitative research for allied health professionals: challenging choices. Chichester: Wiley, 2006.

Flick U. An introduction to qualitative research. 2nd ed. London: Sage, 2002.

Green J, Thorogood N. Qualitative methods for health research. London: Sage, 2004.

Lingard L, Kennedy TJ. Qualitative research in medical education. Edinburgh: Association for the Study of Medical Education, 2007.

Mauthner M, Birch M, Jessop J, Miller T, eds. Ethics in qualitative research. Thousand Oaks, CA: Sage, 2002.

Seale C. The quality of qualitative research. London: Sage, 1999.

Silverman D. Doing qualitative research. Thousand Oaks, CA: Sage, 2000.

Journal articles

Greenhalgh T. How to read a paper: papers that go beyond numbers. BMJ 1997;315:740-3.

Mays N, Pope C. Qualitative research: Rigour and qualitative research. BMJ 1995;311:109-12.

Mays N, Pope C. Qualitative research in health care: assessing quality in qualitative research. BMJ 2000;320:50-2.

Popay J, Rogers A, Williams G. Rationale and standards for the systematic review of qualitative literature in health services research. Qual Health Res 1998;8:341-51.

Internet resources

National Health Service Public Health Resource Unit. Critical appraisal skills programme: qualitative research appraisal tool. 2006. www.phru.nhs.uk/Doc_Links/Qualitative%20Appraisal%20Tool.pdf

Cite this as: BMJ 2008;337:a1035


This is the last in a series of six articles that aim to help readers to critically appraise the increasing number of qualitative research articles in clinical journals. The series editors are Ayelet Kuper and Scott Reeves.

For a definition of general terms relating to qualitative research, see the first article in this series.

Contributors: AK wrote the first draft of the article and collated comments for subsequent iterations. LL and WL made substantial contributions to the structure and content, provided examples, and gave feedback on successive drafts. AK is the guarantor.

Funding: None.

Competing interests: None declared.

Provenance and peer review: Commissioned; externally peer reviewed.

References

1. Holden JD. Hawthorne effects and research into professional practice. J Evaluation Clin Pract 2001;7:65-70.
2. Hammersley M, Atkinson P. Ethnography: principles in practice. 2nd ed. London: Routledge, 1995.
3. Silverman D. Doing qualitative research. Thousand Oaks, CA: Sage, 2000.
4. Mays N, Pope C. Qualitative research in health care: assessing quality in qualitative research. BMJ 2000;320:50-2.
5. Lingard L, Kennedy TJ. Qualitative research in medical education. Edinburgh: Association for the Study of Medical Education, 2007.
6. Seale C. The quality of qualitative research. London: Sage, 1999.
7. Wallerstein N. Power between evaluator and community: research relationships within New Mexico’s healthier communities. Soc Sci Med 1999;49:39-54.
8. Mauthner M, Birch M, Jessop J, Miller T, eds. Ethics in qualitative research. Thousand Oaks, CA: Sage, 2002.
9. Kuzel AJ. Sampling in qualitative inquiry. In: Crabtree F, Miller WL, eds. Doing qualitative research. 2nd ed. Thousand Oaks, CA: Sage, 1999:33-45.



How to Write a Literature Review

Critically analyze and evaluate



Ask yourself questions like these about each book or article you include:

  • What is the research question?
  • What is the primary methodology used?
  • How was the data gathered?
  • How is the data presented?
  • What are the main conclusions?
  • Are these conclusions reasonable?
  • What theories are used to support the researcher's conclusions?

Take notes on the articles as you read them and identify any themes or concepts that may apply to your research question.

This sample template (below) may also be useful for critically reading and organizing your articles.

  • Sample Template for Critical Analysis of the Literature
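If you cannot access the downloadable template, the same bookkeeping can be done with any structure that holds one set of answers per article. The Python sketch below is a hypothetical stand-in, with field names based on the questions above; it is not the library's actual template.

    # A hypothetical note-taking structure mirroring the questions above.
    # Field names are illustrative, not the library's actual template.
    from dataclasses import dataclass, field

    @dataclass
    class ArticleNotes:
        citation: str
        research_question: str
        methodology: str
        data_gathering: str
        data_presentation: str
        main_conclusions: str
        conclusions_reasonable: str   # your judgment, with reasons
        supporting_theories: str
        themes: list = field(default_factory=list)  # concepts relevant to your own question

    notes = ArticleNotes(
        citation="Author, A. (2020). Example title. Example Journal, 1(1), 1-10.",
        research_question="...",
        methodology="survey",
        data_gathering="online questionnaire, n = 200",
        data_presentation="descriptive statistics in tables",
        main_conclusions="...",
        conclusions_reasonable="yes; sample matches target population",
        supporting_theories="theory of planned behaviour",
    )
    notes.themes.append("measurement validity")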

Tip: read and annotate PDFs

Opening an article in PDF format in Acrobat Reader will allow you to use "sticky notes" and "highlighting" to make notes on the article without printing it out. Make sure to save the edited file so you don't lose your notes!

Some citation managers, like Mendeley, also have highlighting and annotation features. (Screen capture omitted: a UO Librarian's Mendeley Desktop app showing note, highlight, and color tools, with different colors used for purposes such as marking quotations.)


Succeeding in postgraduate study


1 Important points to consider when critically evaluating published research papers

Simple review articles (also referred to as ‘narrative’ or ‘selective’ reviews), systematic reviews and meta-analyses provide rapid overviews and ‘snapshots’ of progress made within a field, summarising a given topic or research area. They can serve as useful guides, or as current and comprehensive ‘sources’ of information, and can act as a point of reference to relevant primary research studies within a given scientific area. Narrative or systematic reviews are often used as a first step towards a more detailed investigation of a topic or a specific enquiry (a hypothesis or research question), or to establish critical awareness of a rapidly moving field (you will be required to demonstrate this as part of an assignment, an essay or a dissertation at postgraduate level).

The majority of primary ‘empirical’ research papers essentially follow the same structure (abbreviated here as IMRAD). There is a section on Introduction, followed by the Methods, then the Results, which includes figures and tables showing data described in the paper, and a Discussion. The paper typically ends with a Conclusion, and References and Acknowledgements sections.

The Title of the paper provides a concise first impression. The Abstract follows the basic structure of the extended article. It provides an ‘accessible’ and concise summary of the aims, methods, results and conclusions. The Introduction provides useful background information and context, and typically outlines the aims and objectives of the study. The Abstract can serve as a useful summary of the paper, presenting the purpose, scope and major findings. However, simply reading the abstract alone is not a substitute for critically reading the whole article. To really get a good understanding and to be able to critically evaluate a research study, it is necessary to read on.

While most research papers follow the above format, variations do exist. For example, the results and discussion sections may be combined. In some journals the materials and methods may follow the discussion, and in two of the most widely read journals, Science and Nature, the format does vary from the above due to restrictions on the length of articles. In addition, there may be supporting documents that accompany a paper, including supplementary materials such as supporting data, tables, figures, videos and so on. There may also be commentaries or editorials associated with a topical research paper, which provide an overview or critique of the study being presented.

Box 1 Key questions to ask when appraising a research paper

  • Is the study’s research question relevant?
  • Does the study add anything new to current knowledge and understanding?
  • Does the study test a stated hypothesis?
  • Is the design of the study appropriate to the research question?
  • Do the study methods address key potential sources of bias?
  • Were suitable ‘controls’ included in the study?
  • Were the statistical analyses appropriate and applied correctly?
  • Is there a clear statement of findings?
  • Does the data support the authors’ conclusions?
  • Are there any conflicts of interest or ethical concerns?

There are various strategies used in reading a scientific research paper, and one of these is to start with the title and the abstract, then look at the figures and tables, and move on to the introduction, before turning to the results and discussion, and finally, interrogating the methods.

Another strategy (outlined below) is to begin with the abstract and then the discussion, take a look at the methods, and then the results section (including any relevant tables and figures), before moving on to look more closely at the discussion and, finally, the conclusion. You should choose a strategy that works best for you. However, asking the ‘right’ questions is a central feature of critical appraisal, as with any enquiry, so where should you begin? Here are some critical questions to consider when evaluating a research paper.

Look at the Abstract and then the Discussion: Are these accessible and of general relevance or are they detailed, with far-reaching conclusions? Is it clear why the study was undertaken? Why are the conclusions important? Does the study add anything new to current knowledge and understanding? The reasons why a particular study design or statistical method was chosen should also be clear from reading a research paper. What is the research question being asked? Does the study test a stated hypothesis? Is the design of the study appropriate to the research question? Have the authors considered the limitations of their study and have they discussed these in context?

Take a look at the Methods: Were there any practical difficulties that could have compromised the study or its implementation? Were these considered in the protocol? Were there any missing values and, if so, was the number of missing values too large to permit meaningful analysis? Was the number of samples (cases or participants) too small to establish meaningful significance? Do the study methods address key potential sources of bias? Were suitable ‘controls’ included in the study? If controls are missing or not appropriate to the study design, we cannot be confident that the results really show what is happening in an experiment. Were the statistical analyses appropriate and applied correctly? Do the authors point out the limitations of methods or tests used? Were the methods referenced and described in sufficient detail for others to repeat or extend the study?
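The question of whether the number of samples was too small to establish meaningful significance can be checked roughly with a power calculation. The Python sketch below uses the statsmodels library; the effect size and error rates are illustrative assumptions, not values from any real study.

    # Rough check: how many participants per group would a two-arm study need
    # to detect a medium effect (Cohen's d = 0.5) with conventional error rates?
    # All numbers here are illustrative assumptions.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.5,  # assumed standardized difference between groups
        alpha=0.05,       # two-sided false-positive rate
        power=0.8,        # desired chance of detecting the effect
    )
    print(f"approx. {n_per_group:.0f} participants per group")  # about 64

If a paper reports far fewer participants than such a calculation suggests, its null findings in particular should be read cautiously.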

Take a look at the Results section and relevant tables and figures: Is there a clear statement of findings? Were the results expected? Do they make sense? What data supports them? Do the tables and figures clearly describe the data (highlighting trends etc.)? Try to distinguish between what the data show and what the authors say they show (i.e. their interpretation).

Moving on to look in greater depth at the Discussion and Conclusion: Are the results discussed in relation to similar (previous) studies? Do the authors indulge in excessive speculation? Are limitations of the study adequately addressed? Were the objectives of the study met and the hypothesis supported or refuted (and is a clear explanation provided)? Does the data support the authors’ conclusions? Maybe there is only one experiment to support a point. More often, several different experiments or approaches combine to support a particular conclusion. A rule of thumb here is that if multiple approaches and multiple lines of evidence from different directions are presented, and all point to the same conclusion, then the conclusions are more credible. But do question all assumptions. Identify any implicit or hidden assumptions that the authors may have used when interpreting their data. Be wary of data that is mixed up with interpretation and speculation! Remember, just because it is published, does not mean that it is right.

Other points you should consider when evaluating a research paper: Are there any financial, ethical or other conflicts of interest associated with the study, its authors and sponsors? Are there ethical concerns with the study itself? Looking at the references, consider whether the authors have preferentially cited their own previous publications (i.e., needlessly), and whether the list of references is recent (ensuring that the analysis is up to date). Finally, from a practical perspective, you should move beyond the text of a research paper: talk to your peers about it, and consult available commentaries, online links to references and other external sources to help clarify any aspects you don’t understand.

The above can be taken as a general guide to help you begin to critically evaluate a scientific research paper, but only in the broadest sense. Do bear in mind that the way that research evidence is critiqued will also differ slightly according to the type of study being appraised, whether observational or experimental, and each study will have additional aspects that would need to be evaluated separately. For criteria recommended for the evaluation of qualitative research papers, see the article by Mildred Blaxter (1996), available online.

Activity 1 Critical appraisal of a scientific research paper

A critical appraisal checklist, which you can download via the link below, can act as a useful tool to help you to interrogate research papers. The checklist is divided into four sections, broadly covering:

  • some general aspects
  • research design and methodology
  • the results
  • discussion, conclusion and references.

Science perspective – critical appraisal checklist

  • Identify and obtain a research article based on a topic of your own choosing, using a search engine such as Google Scholar or PubMed (for example).
  • The selection criteria for your target paper are as follows: the article must be an open access primary research paper (not a review) containing empirical data, published in the last 2–3 years, and preferably no more than 5–6 pages in length.
  • Critically evaluate the research paper using the checklist provided, making notes on the key points and your overall impression.

Critical appraisal checklists are useful tools to help assess the quality of a study. Assessment of various factors, including the importance of the research question, the design and methodology of a study, the validity of the results and their usefulness (application or relevance), the legitimacy of the conclusions, and any potential conflicts of interest, are an important part of the critical appraisal process. Limitations and further improvements can then be considered.


How to read a paper: critical review

Reading a scientific article is a complex task. The worst way to approach this task is to treat it like the reading of a textbook—reading from title to literature cited, digesting every word along the way without any reflection or criticism.

A critical review (sometimes called a critique, critical commentary, critical appraisal, critical analysis) is a detailed commentary on and critical evaluation of a text. You might carry out a critical review as a stand-alone exercise, or as part of your research and preparation for writing a literature review. The following guidelines are designed to help you critically evaluate a research article.

How to Read a Scientific Article

You should begin by skimming the article to identify its structure and features. As you read, look for the author’s main points.

  • Generate questions before, during, and after reading.
  • Draw inferences based on your own experiences and knowledge.
  • To really improve understanding and recall, take notes as you read.

What is meant by ‘critical’ and ‘evaluation’?

  • To be critical does not mean to criticise in an exclusively negative manner. To be critical of a text means you question the information and opinions in the text, in an attempt to evaluate or judge its worth overall.
  • An evaluation is an assessment of the strengths and weaknesses of a text. In the case of a research article, this should relate to specific criteria. You have to understand the purpose of each section, and be aware of the type of information and evidence needed to make it convincing, before you can judge its overall value to the research article as a whole.


Critically Analyzing Information Sources: Critical Appraisal and Analysis


Initial Appraisal: Reviewing the source

A. Author

  • What are the author's credentials: institutional affiliation (where he or she works), educational background, past writings, or experience? Is the book or article written on a topic in the author's area of expertise? You can use the various Who's Who publications for the U.S. and other countries and for specific subjects and the biographical information located in the publication itself to help determine the author's affiliation and credentials.
  • Has your instructor mentioned this author? Have you seen the author's name cited in other sources or bibliographies? Respected authors are cited frequently by other scholars. For this reason, always note those names that appear in many different sources.
  • Is the author associated with a reputable institution or organization? What are the basic values or goals of the organization or institution?

B. Date of Publication

  • When was the source published? This date is often located on the face of the title page below the name of the publisher. If it is not there, look for the copyright date on the reverse of the title page. On Web pages, the date of the last revision is usually at the bottom of the home page, sometimes every page.
  • Is the source current or out-of-date for your topic? Topic areas of continuing and rapid development, such as the sciences, demand more current information. On the other hand, topics in the humanities often require material that was written many years ago. At the other extreme, some news sources on the Web now note the hour and minute that articles are posted on their site.

C. Edition or Revision

Is this a first edition of this publication or not? Further editions indicate a source has been revised and updated to reflect changes in knowledge, include previously omitted material, and meet the needs of its intended readers. Also, many printings or editions may indicate that the work has become a standard source in the area and is reliable. If you are using a Web source, do the pages indicate revision dates?

D. Publisher

Note the publisher. If the source is published by a university press, it is likely to be scholarly. Although the fact that the publisher is reputable does not necessarily guarantee quality, it does show that the publisher may have high regard for the source being published.

E. Title of Journal

Is this a scholarly or a popular journal? This distinction is important because it indicates different levels of complexity in conveying ideas. If you need help in determining the type of journal, see Distinguishing Scholarly from Non-Scholarly Periodicals. Or you may wish to check your journal title in the latest edition of Katz's Magazines for Libraries (Olin Reference Z 6941 .K21, shelved at the reference desk) for a brief evaluative description.

Critical Analysis of the Content

Having made an initial appraisal, you should now examine the body of the source. Read the preface to determine the author's intentions for the book. Scan the table of contents and the index to get a broad overview of the material it covers. Note whether bibliographies are included. Read the chapters that specifically address your topic. Reading the article abstract and scanning the table of contents of a journal or magazine issue is also useful. As with books, the presence and quality of a bibliography at the end of the article may reflect the care with which the authors have prepared their work.

A. Intended Audience

What type of audience is the author addressing? Is the publication aimed at a specialized or a general audience? Is this source too elementary, too technical, too advanced, or just right for your needs?

B. Objective Reasoning

  • Is the information covered fact, opinion, or propaganda? It is not always easy to separate fact from opinion. Facts can usually be verified; opinions, though they may be based on factual information, evolve from the interpretation of facts. Skilled writers can make you think their interpretations are facts.
  • Does the information appear to be valid and well-researched, or is it questionable and unsupported by evidence? Assumptions should be reasonable. Note errors or omissions.
  • Are the ideas and arguments advanced more or less in line with other works you have read on the same topic? The more radically an author departs from the views of others in the same field, the more carefully and critically you should scrutinize his or her ideas.
  • Is the author's point of view objective and impartial? Is the language free of emotion-arousing words and bias?

C. Coverage

  • Does the work update other sources, substantiate other materials you have read, or add new information? Does it extensively or marginally cover your topic? You should explore enough sources to obtain a variety of viewpoints.
  • Is the material primary or secondary in nature? Primary sources are the raw material of the research process. Secondary sources are based on primary sources. For example, if you were researching Konrad Adenauer's role in rebuilding West Germany after World War II, Adenauer's own writings would be one of many primary sources available on this topic. Others might include relevant government documents and contemporary German newspaper articles. Scholars use this primary material to help generate historical interpretations--a secondary source. Books, encyclopedia articles, and scholarly journal articles about Adenauer's role are considered secondary sources. In the sciences, journal articles and conference proceedings written by experimenters reporting the results of their research are primary documents. Choose both primary and secondary sources when you have the opportunity.

D. Writing Style

Is the publication organized logically? Are the main points clearly presented? Do you find the text easy to read, or is it stilted or choppy? Is the author's argument repetitive?

E. Evaluative Reviews

  • Locate critical reviews of books in a reviewing source, such as Articles & Full Text, Book Review Index, Book Review Digest, and ProQuest Research Library. Is the review positive? Is the book under review considered a valuable contribution to the field? Does the reviewer mention other books that might be better? If so, locate these sources for more information on your topic.
  • Do the various reviewers agree on the value or attributes of the book or has it aroused controversy among the critics?
  • For Web sites, consider consulting the evaluation guidance published by UC Berkeley.


Critical Appraisal of Clinical Research

Azzam Al-Jundi 1 and Salah Sakka 2

1 Professor, Department of Orthodontics, King Saud bin Abdul Aziz University for Health Sciences-College of Dentistry, Riyadh, Kingdom of Saudi Arabia.

2 Associate Professor, Department of Oral and Maxillofacial Surgery, Al Farabi Dental College, Riyadh, KSA.

J Clin Diagn Res 2017;11(5).

Evidence-based practice is the integration of individual clinical expertise with the best available external clinical evidence from systematic research, and with patients’ values and expectations, into the decision-making process for patient care. It is a fundamental skill to be able to identify and appraise the best available evidence in order to integrate it with your own clinical experience and patients’ values. The aim of this article is to provide a robust and simple process for assessing the credibility of articles and their value to your clinical practice.

Introduction

Decisions related to patient values and care are carefully made following an essential process of integrating the best existing evidence, clinical experience and patient preference. Critical appraisal is the process of carefully and systematically examining research to assess its reliability, value and relevance in order to direct professionals in their vital clinical decision making [1].

Critical appraisal is essential to:

  • Combat information overload;
  • Identify papers that are clinically relevant;
  • Support Continuing Professional Development (CPD).

Carrying out Critical Appraisal:

Assessing the research methods used in the study is a prime step in its critical appraisal. This is done using checklists which are specific to the study design.

Standard Common Questions:

  • What is the research question?
  • What is the study type (design)?
  • Selection issues.
  • What are the outcome factors and how are they measured?
  • What are the study factors and how are they measured?
  • What important potential confounders are considered?
  • What is the statistical method used in the study?
  • Statistical results.
  • What conclusions did the authors reach about the research question?
  • Are ethical issues considered?

The Critical Appraisal starts by double checking the following main sections:

I. Overview of the paper:

  • The publishing journal and the year
  • The article title: Does it state key trial objectives?
  • The author(s) and their institution(s)

The presence of a peer review process in journal acceptance protocols also adds robustness to the assessment criteria for research papers and hence would indicate a reduced likelihood of publication of poor quality research. Other areas to consider may include authors’ declarations of interest and potential market bias. Attention should be paid to any declared funding or the issue of a research grant, in order to check for a conflict of interest [2].

II. Abstract: Reading the abstract is a quick way of getting to know the article and its purpose, major procedures and methods, main findings, and conclusions.

  • Aim of the study: It should be clearly written.
  • Materials and Methods: The study design and type of groups, type of randomization process, sample size, gender, age, and procedure rendered to each group and measuring tool(s) should be evidently mentioned.
  • Results: The measured variables with their statistical analysis and significance.
  • Conclusion: It must clearly answer the question of interest.

III. Introduction/Background section:

An excellent introduction will thoroughly include references to earlier work related to the area under discussion and express the importance and limitations of what is already known [2].

- Why was this study considered necessary? What is its purpose? Was the purpose identified before the study, or was a chance result presented as part of 'data searching'?

- What has already been achieved, and how does this study differ?

- Does the scientific approach outline the advantages along with the possible drawbacks associated with the intervention or observations?

IV. Methods and Materials section : Full details of how the study was actually carried out should be given. Precise information is needed on the study design, the population, the sample size and the interventions presented. All measurement approaches should be clearly stated [ 3 ].

V. Results section : This section should clearly reveal what actually happened to the subjects. The results may contain raw data and should explain the statistical analysis; these can be shown in related tables, diagrams and graphs.

VI. Discussion section : This section should thoroughly compare what is already known in the topic of interest with the clinical relevance of what has been newly established. Possible limitations and the need for further studies should also be indicated.

Does it summarize the main findings of the study and relate them to any deficiencies in the study design or problems in the conduct of the study?

  • Does it address any source of potential bias?
  • Are interpretations consistent with the results?
  • How are null findings interpreted?
  • Does it mention how the findings of this study relate to previous work in the area?
  • Can the findings be generalized (external validity)?
  • Does it mention their clinical implications/applicability?
  • To whom are the results/outcomes/findings applicable, and will they affect clinical practice?
  • Does the conclusion answer the study question?
  • Is the conclusion convincing?
  • Does the paper indicate ethics approval?
  • Can you identify potential ethical issues?
  • Do the results apply to the population in which you are interested?
  • Will you use the results of the study?

Once you have answered the preliminary and key questions and identified the research method used, you can incorporate specific questions related to each method into your appraisal process or checklist.

1- What is the research question?

For a study to be valuable, it should address a significant problem within healthcare and provide new or meaningful results. A useful structure for assessing the problem addressed in an article is the Patient/Problem Intervention Comparison Outcome (PICO) method [ 3 ].

P = Patient/Problem/Population: Does the research address a focused question? What is the chief complaint? E.g., disease status, previous ailments, current medications.

I = Intervention: An appropriately and clearly stated management strategy, e.g., a new diagnostic test, treatment or adjunctive therapy.

C = Comparison: A suitable control or alternative, e.g., specific and limited to one alternative choice.

O = Outcomes: The desired results or patient-related consequences have to be identified, e.g., eliminating symptoms, improving function, esthetics.

The clinical question determines which study designs are appropriate. There are five broad categories of clinical questions, as shown in [ Table/Fig-1 ].

[Table/Fig-1]:

Categories of clinical questions and the related study designs.
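To make the PICO structure concrete, here is a minimal sketch (the class and example values are hypothetical, not from the article) showing how a clinical question can be decomposed into its four components:

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """A clinical question decomposed into PICO components."""
    patient: str       # P: patient, problem, or population
    intervention: str  # I: the management strategy under study
    comparison: str    # C: a suitable control or alternative
    outcome: str       # O: the desired, patient-related result

    def as_question(self) -> str:
        return (f"In {self.patient}, does {self.intervention}, "
                f"compared with {self.comparison}, affect {self.outcome}?")

# Hypothetical example of framing a focused clinical question:
q = PICOQuestion(
    patient="adult patients with moderate crowding",
    intervention="clear aligner therapy",
    comparison="fixed appliances",
    outcome="treatment duration and alignment accuracy",
)
print(q.as_question())
```

Writing the question out this way makes it easier to check whether each component is actually specified in the paper being appraised.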

2- What is the study type (design)?

The study design of the research is fundamental to the usefulness of the study.

In a clinical paper, the methodology used to generate the results should be fully explained. In general, all questions about the clinical query, the study design, the subjects and the measures taken to reduce bias and confounding should be adequately and thoroughly explored and answered.

Participants/Sample Population:

Researchers identify the target population they are interested in. A sample is then taken, and results from this sample are generalized to the target population.

The sample should be representative of the target population from which it came. Knowing the baseline characteristics of the sample population is important because this allows researchers to see how closely the subjects match their own patients [ 4 ].

Sample size calculation (Power calculation): A trial should be large enough to have a high chance of detecting a worthwhile effect if it exists. Statisticians can work out before the trial begins how large the sample size should be in order to have a good chance of detecting a true difference between the intervention and control groups [ 5 ].
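As a concrete illustration (a minimal sketch, not part of the original article), the statsmodels library can estimate the per-group sample size from an assumed effect size, significance level and desired power:

```python
# Estimating the sample size needed per group for a two-arm trial.
# Assumed inputs: a standardized effect size (Cohen's d) of 0.5,
# a 5% two-sided significance level, and 80% power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.0f}")  # ~64 per group
```

When appraising a paper, you can compare the reported sample size against this kind of calculation to judge whether the study was adequately powered.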

  • Is the sample defined? Human, animals (type); what population does it represent?
  • Does it mention eligibility criteria, with reasons?
  • Does it mention where and how the sample was recruited, selected and assessed?
  • Does it mention where the study was carried out?
  • Is the sample size justified and correctly calculated? Is it adequate to detect statistically and clinically significant results?
  • Does it mention a suitable study design/type?
  • Is the study type appropriate to the research question?
  • Is the study adequately controlled? Does it mention the type of randomization process? Does it mention the presence of a control group, or explain the lack of one?
  • Are the samples similar at baseline? Is sample attrition mentioned?
  • Does it report the number of participants/specimens at the start of the study, together with details of how many completed it and reasons for any incomplete follow-up?
  • Does it mention who was blinded? Are the assessors and participants blind to the interventions received?
  • Is it mentioned how the data were analysed?
  • Are any measurements taken likely to be valid?

Researchers use measuring techniques and instruments that have been shown to be valid and reliable.

Validity refers to the extent to which a test measures what it is supposed to measure (the extent to which the value obtained represents the object of interest):

  • Soundness and effectiveness of the measuring instrument;
  • What does the test measure?
  • Does it measure what it is supposed to measure?
  • How well, how accurately, does it measure?

Reliability: In research, the term reliability means "repeatability" or "consistency".

Reliability refers to how consistent a test is on repeated measurements. This is especially important if assessments are made on different occasions and/or by different examiners. Studies should state the method used for assessing the reliability of any measurements taken and what the intra-examiner reliability was [ 6 ].
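For instance (a minimal sketch with made-up ratings, not data from any cited study), Cohen's kappa from scikit-learn can quantify how consistently the same examiner scores the same cases on two occasions:

```python
# Intra-examiner reliability: the same examiner rates 10 cases twice.
# Ratings are hypothetical categorical scores (0 = absent, 1 = present).
from sklearn.metrics import cohen_kappa_score

first_pass  = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
second_pass = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]

kappa = cohen_kappa_score(first_pass, second_pass)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 1 indicate high agreement
```

Unlike simple percentage agreement, kappa corrects for the agreement that would be expected by chance alone.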

3- Selection issues:

The following questions should be raised:

  • How were subjects chosen or recruited? If not random, are they representative of the population?
  • What type of blinding (masking) was used: single, double, triple?
  • Is there a control group? How was it chosen?
  • How are patients followed up? Who are the dropouts? Why and how many are there?
  • Are the independent (predictor) and dependent (outcome) variables in the study clearly identified, defined, and measured?
  • Is there a statement about sample size issues or statistical power (especially important in negative studies)?
  • If a multicenter study, what quality assurance measures were employed to obtain consistency across sites?
  • Are there selection biases?

In a case-control study, if exercise habits are to be compared:

  • Are the controls appropriate?
  • Were records of cases and controls reviewed blindly?
  • How were possible selection biases controlled (prevalence bias, admission rate bias, volunteer bias, recall bias, lead time bias, detection bias, etc.)?

In cross-sectional studies:

  • Was the sample selected in an appropriate manner (random, convenience, etc.)?
  • Were efforts made to ensure a good response rate or to minimize the occurrence of missing data?
  • Were reliability (reproducibility) and validity reported?

In an intervention study, how were subjects recruited and assigned to groups? In a cohort study, how many reached final follow-up?

  • Are the subjects representative of the population to which the findings are applied?
  • Is there evidence of volunteer bias? Was there adequate follow-up time?
  • What was the drop-out rate?

Any shortcoming in the methodology can lead to results that do not reflect the truth. If clinical practice is changed on the basis of these results, patients could be harmed.

Researchers employ a variety of techniques to make the methodology more robust, such as matching, restriction, randomization, and blinding [ 7 ].

Bias is the term used to describe an error, at any stage of the study, that was not due to chance. Bias leads to results that deviate systematically from the truth. Because bias cannot be measured, researchers need to rely on good research design to minimize it [ 8 ]. To minimize bias within a study, the sample should be representative of the population. It is also important to consider the sample size and identify whether the study is adequately powered to detect statistically significant results (conventionally, p-values <0.05) [ 9 ].

4- What are the outcome factors and how are they measured?

  • Are all relevant outcomes assessed?
  • Is measurement error an important source of bias?

5- What are the study factors and how are they measured?

  • Are all the relevant study factors included in the study?
  • Have the factors been measured using appropriate tools?

Data Analysis and Results:

- Were the tests appropriate for the data?

- Are confidence intervals or p-values given?

  • How strong is the association between intervention and outcome?
  • How precise is the estimate of the risk?
  • Does it clearly mention the main finding(s) and does the data support them?
  • Does it mention the clinical significance of the result?
  • Are adverse events, or the lack of them, mentioned?
  • Are all relevant outcomes assessed?
  • Was the sample size adequate to detect a clinically/socially significant result?
  • Are the results presented in a way to help in health policy decisions?
  • Is there measurement error?
  • Is measurement error an important source of bias?

Confounding Factors:

A confounder has a triangular relationship with both the exposure and the outcome, but it is not on the causal pathway. It can make it appear as if there is a direct relationship between the exposure and the outcome, or it can mask an association that would otherwise have been present [ 9 ].

6- What important potential confounders are considered?

  • Are potential confounders examined and controlled for?
  • Is confounding an important source of bias?

7- What is the statistical method used in the study?

  • Are the statistical methods described appropriate for comparing participants on primary and secondary outcomes?
  • Are the statistical methods specified in sufficient detail (if you had access to the raw data, could you reproduce the analysis)?
  • Were the tests appropriate for the data?
  • Are confidence intervals or p-values given?
  • Are results presented as absolute risk reduction as well as relative risk reduction? (A worked example follows this list.)
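The distinction between absolute and relative risk reduction matters because relative figures can make a modest benefit sound large. Here is a minimal worked example with illustrative event rates (not data from any real trial):

```python
# Absolute vs relative risk reduction, with number needed to treat (NNT).
# Hypothetical event rates: 10% in the control group, 7% with treatment.
control_event_rate = 0.10
treatment_event_rate = 0.07

arr = control_event_rate - treatment_event_rate  # absolute risk reduction
rrr = arr / control_event_rate                   # relative risk reduction
nnt = 1 / arr                                    # number needed to treat

print(f"ARR: {arr:.1%}")   # 3.0% -- modest in absolute terms
print(f"RRR: {rrr:.1%}")   # 30.0% -- sounds much larger
print(f"NNT: {nnt:.0f}")   # ~33 patients treated to prevent one event
```

A paper that reports only the relative figure is presenting the same result in its most flattering form, which is exactly the kind of thing appraisal should catch.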

Interpretation of p-value:

The p-value is the probability of obtaining a result at least as extreme as the one observed if chance alone were operating. By convention, a p-value of less than 1 in 20 (p<0.05) is regarded as statistically significant.

  • When the p-value is less than the significance level, usually 0.05, we reject the null hypothesis and the result is considered statistically significant. Conversely, when the p-value is greater than 0.05, we conclude that the result is not statistically significant and we fail to reject the null hypothesis.
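As an illustration (a minimal sketch using simulated data rather than any real study), a two-sample t-test produces a p-value that is then compared with the 0.05 significance level:

```python
# Comparing a hypothetical outcome between two groups with a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
control   = rng.normal(loc=50, scale=10, size=40)   # simulated control scores
treatment = rng.normal(loc=56, scale=10, size=40)   # simulated treatment scores

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")
```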

Confidence interval:

Repeating the same trial many times would not yield exactly the same result every time, but on average the results would lie within a certain range. A 95% confidence interval means that if the trial were repeated many times and a confidence interval calculated each time, about 95% of those intervals would contain the true size of the effect.
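A minimal sketch (again with simulated data) of computing a 95% confidence interval for a difference in means:

```python
# 95% confidence interval for a difference in means (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
control   = rng.normal(loc=50, scale=10, size=40)
treatment = rng.normal(loc=56, scale=10, size=40)

diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1)/len(treatment) + control.var(ddof=1)/len(control))
df = len(treatment) + len(control) - 2   # simple approximation for the degrees of freedom
t_crit = stats.t.ppf(0.975, df)          # two-sided 95% critical value

print(f"Difference in means: {diff:.2f}")
print(f"95% CI: ({diff - t_crit*se:.2f}, {diff + t_crit*se:.2f})")
```

A confidence interval conveys both the size of the effect and the precision of the estimate, which a bare p-value does not.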

8- Statistical results:

  • Do the statistical tests answer the research question?
  • Are statistical tests performed and comparisons made (data searching)?

Correct statistical analysis of results is crucial to the reliability of the conclusions drawn from a research paper. Depending on the study design and the sample selection method employed, descriptive or inferential statistical analysis may be carried out on the results of the study.

It is important to identify if this is appropriate for the study [ 9 ].

  • Was the sample size adequate to detect a clinically/socially significant result?
  • Are the results presented in a way to help in health policy decisions?

Clinical significance:

Statistical significance, as shown by the p-value, is not the same as clinical significance. Statistical significance judges whether treatment effects are explicable as chance findings, whereas clinical significance assesses whether treatment effects are worthwhile in real life. Small improvements that are statistically significant might not result in any meaningful clinical improvement. The following questions should always be borne in mind:

  • If the results are statistically significant, do they also have clinical significance?
  • If the results are not statistically significant, was the sample size sufficiently large to detect a meaningful difference or effect?
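One way to look beyond the p-value (a minimal sketch with illustrative numbers) is to report a standardized effect size such as Cohen's d, which helps judge whether a statistically significant difference is large enough to matter clinically:

```python
# Cohen's d: a standardized effect size to accompany the p-value.
import numpy as np

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    a, b = np.asarray(group_a), np.asarray(group_b)
    pooled_var = (((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                  / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical pain scores: a tiny average improvement in a very large trial
# can be statistically significant yet clinically trivial.
treated = np.random.default_rng(2).normal(49.5, 10, 5000)
control = np.random.default_rng(3).normal(50.0, 10, 5000)
print(f"Cohen's d: {cohens_d(control, treated):.3f}")  # ~0.05: a negligible effect
```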

9- What conclusions did the authors reach about the study question?

Conclusions should ensure that any recommendations are supported by the results obtained and stay within the scope of the study. The authors should also address the limitations of the study, their effects on the outcomes, and proposed suggestions for future studies [ 10 ].

  • Are the questions posed in the study adequately addressed?
  • Are the conclusions justified by the data?
  • Do the authors extrapolate beyond the data?
  • Are shortcomings of the study addressed and constructive suggestions given for future research?

Bibliography/References:

Do the citations follow one of the Council of Biological Editors’ (CBE) standard formats?

10- Are ethical issues considered?

If a study involves human subjects, human tissues, or animals, was approval from appropriate institutional or governmental entities obtained? [ 10 , 11 ].

Critical appraisal of RCTs: Factors to look for:

  • Allocation (randomization, stratification, confounders).
  • Follow up of participants (intention to treat).
  • Data collection (bias).
  • Sample size (power calculation).
  • Presentation of results (clear, precise).
  • Applicability to local population.

[ Table/Fig-2 ] summarizes the guidelines for Consolidated Standards of Reporting Trials CONSORT [ 12 ].

[Table/Fig-2]:

Summary of the CONSORT guidelines.

Critical appraisal of systematic reviews: Systematic reviews provide an overview of all primary studies on a topic and try to obtain an overall picture of the results.

In a systematic review, all the primary studies identified are critically appraised and only the best ones are selected. A meta-analysis (i.e., a statistical analysis) of the results from the selected studies may be included. Factors to look for:

  • Literature search (did it include published and unpublished materials as well as non-English language studies? Was personal contact with experts sought?).
  • Quality-control of studies included (type of study; scoring system used to rate studies; analysis performed by at least two experts).
  • Homogeneity of studies.

[ Table/Fig-3 ] summarizes the guidelines for Preferred Reporting Items for Systematic reviews and Meta-Analyses PRISMA [ 13 ].

[Table/Fig-3]:

Summary of PRISMA guidelines.

Critical appraisal is a fundamental skill in modern practice for assessing the value of clinical research and providing an indication of its relevance to the profession. It is a skill set developed throughout a professional career that, through integration with clinical experience and patient preference, permits the practice of evidence-based medicine and dentistry. By following a systematic approach, such evidence can be considered and applied to clinical practice.


Evaluating Research – Process, Examples and Methods

Evaluating Research

Definition:

Evaluating Research refers to the process of assessing the quality, credibility, and relevance of a research study or project. This involves examining the methods, data, and results of the research in order to determine its validity, reliability, and usefulness. Evaluating research can be done by both experts and non-experts in the field, and involves critical thinking, analysis, and interpretation of the research findings.

Research Evaluation Process

The process of evaluating research typically involves the following steps:

Identify the Research Question

The first step in evaluating research is to identify the research question or problem that the study is addressing. This will help you to determine whether the study is relevant to your needs.

Assess the Study Design

The study design refers to the methodology used to conduct the research. You should assess whether the study design is appropriate for the research question and whether it is likely to produce reliable and valid results.

Evaluate the Sample

The sample refers to the group of participants or subjects who are included in the study. You should evaluate whether the sample size is adequate and whether the participants are representative of the population under study.

Review the Data Collection Methods

You should review the data collection methods used in the study to ensure that they are valid and reliable. This includes assessing both the measures and the procedures used to collect the data.

Examine the Statistical Analysis

Statistical analysis refers to the methods used to analyze the data. You should examine whether the statistical analysis is appropriate for the research question and whether it is likely to produce valid and reliable results.

Assess the Conclusions

You should evaluate whether the data support the conclusions drawn from the study and whether they are relevant to the research question.

Consider the Limitations

Finally, you should consider the limitations of the study, including any potential biases or confounding factors that may have influenced the results.

Evaluating Research Methods

Methods for evaluating research include the following:

  • Peer review: Peer review is a process where experts in the field review a study before it is published. This helps ensure that the study is accurate, valid, and relevant to the field.
  • Critical appraisal : Critical appraisal involves systematically evaluating a study based on specific criteria. This helps assess the quality of the study and the reliability of the findings.
  • Replication : Replication involves repeating a study to test the validity and reliability of the findings. This can help identify any errors or biases in the original study.
  • Meta-analysis : Meta-analysis is a statistical method that combines the results of multiple studies to provide a more comprehensive understanding of a particular topic. This can help identify patterns or inconsistencies across studies (a minimal numerical sketch follows this list).
  • Consultation with experts : Consulting with experts in the field can provide valuable insights into the quality and relevance of a study. Experts can also help identify potential limitations or biases in the study.
  • Review of funding sources: Examining the funding sources of a study can help identify any potential conflicts of interest or biases that may have influenced the study design or interpretation of results.
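To illustrate the meta-analysis entry above (a minimal sketch with made-up effect estimates and standard errors, not data from any real review), a fixed-effect pooled estimate can be computed by inverse-variance weighting:

```python
# Fixed-effect meta-analysis by inverse-variance weighting (illustrative numbers).
import numpy as np

effects = np.array([0.30, 0.10, 0.25])   # hypothetical per-study effect estimates
ses     = np.array([0.10, 0.08, 0.15])   # their standard errors

weights = 1 / ses**2                     # more precise studies get more weight
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```

Real meta-analyses also assess heterogeneity between studies and may use random-effects models; this sketch shows only the core weighting idea.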

Example of Evaluating Research

An example evaluation, as a sample for students:

Title of the Study: The Effects of Social Media Use on Mental Health among College Students

Sample Size: 500 college students

Sampling Technique : Convenience sampling

  • Sample Size: The sample size of 500 college students is a moderate sample size, which could be considered representative of the college student population. However, it would be more representative if the sample size was larger, or if a random sampling technique was used.
  • Sampling Technique : Convenience sampling is a non-probability sampling technique, which means that the sample may not be representative of the population. This technique may introduce bias into the study since the participants are self-selected and may not be representative of the entire college student population. Therefore, the results of this study may not be generalizable to other populations.
  • Participant Characteristics: The study does not provide any information about the demographic characteristics of the participants, such as age, gender, race, or socioeconomic status. This information is important because social media use and mental health may vary among different demographic groups.
  • Data Collection Method: The study used a self-administered survey to collect data. Self-administered surveys may be subject to response bias and may not accurately reflect participants’ actual behaviors and experiences.
  • Data Analysis: The study used descriptive statistics and regression analysis to analyze the data. Descriptive statistics provide a summary of the data, while regression analysis is used to examine the relationship between two or more variables. However, the study did not provide information about the statistical significance of the results or the effect sizes.

Overall, while the study provides some insights into the relationship between social media use and mental health among college students, the use of a convenience sampling technique and the lack of information about participant characteristics limit the generalizability of the findings. In addition, the use of self-administered surveys may introduce bias into the study, and the lack of information about the statistical significance of the results limits the interpretation of the findings.

Note: The above example is only a sample for students. Do not copy and paste it directly into your assignment; do your own research for academic purposes.

Applications of Evaluating Research

Here are some of the applications of evaluating research:

  • Identifying reliable sources : By evaluating research, researchers, students, and other professionals can identify the most reliable sources of information to use in their work. They can determine the quality of research studies, including the methodology, sample size, data analysis, and conclusions.
  • Validating findings: Evaluating research can help to validate findings from previous studies. By examining the methodology and results of a study, researchers can determine if the findings are reliable and if they can be used to inform future research.
  • Identifying knowledge gaps: Evaluating research can also help to identify gaps in current knowledge. By examining the existing literature on a topic, researchers can determine areas where more research is needed, and they can design studies to address these gaps.
  • Improving research quality : Evaluating research can help to improve the quality of future research. By examining the strengths and weaknesses of previous studies, researchers can design better studies and avoid common pitfalls.
  • Informing policy and decision-making : Evaluating research is crucial in informing policy and decision-making in many fields. By examining the evidence base for a particular issue, policymakers can make informed decisions that are supported by the best available evidence.
  • Enhancing education : Evaluating research is essential in enhancing education. Educators can use research findings to improve teaching methods, curriculum development, and student outcomes.

Purpose of Evaluating Research

Here are some of the key purposes of evaluating research:

  • Determine the reliability and validity of research findings : By evaluating research, researchers can determine the quality of the study design, data collection, and analysis. They can determine whether the findings are reliable, valid, and generalizable to other populations.
  • Identify the strengths and weaknesses of research studies: Evaluating research helps to identify the strengths and weaknesses of research studies, including potential biases, confounding factors, and limitations. This information can help researchers to design better studies in the future.
  • Inform evidence-based decision-making: Evaluating research is crucial in informing evidence-based decision-making in many fields, including healthcare, education, and public policy. Policymakers, educators, and clinicians rely on research evidence to make informed decisions.
  • Identify research gaps : By evaluating research, researchers can identify gaps in the existing literature and design studies to address these gaps. This process can help to advance knowledge and improve the quality of research in a particular field.
  • Ensure research ethics and integrity : Evaluating research helps to ensure that research studies are conducted ethically and with integrity. Researchers must adhere to ethical guidelines to protect the welfare and rights of study participants and to maintain the trust of the public.

Characteristics to Evaluate in Research

Characteristics to evaluate in research are as follows:

  • Research question/hypothesis: A good research question or hypothesis should be clear, concise, and well-defined. It should address a significant problem or issue in the field and be grounded in relevant theory or prior research.
  • Study design: The research design should be appropriate for answering the research question and be clearly described in the study. The study design should also minimize bias and confounding variables.
  • Sampling : The sample should be representative of the population of interest and the sampling method should be appropriate for the research question and study design.
  • Data collection : The data collection methods should be reliable and valid, and the data should be accurately recorded and analyzed.
  • Results : The results should be presented clearly and accurately, and the statistical analysis should be appropriate for the research question and study design.
  • Interpretation of results : The interpretation of the results should be based on the data and not influenced by personal biases or preconceptions.
  • Generalizability: The study findings should be generalizable to the population of interest and relevant to other settings or contexts.
  • Contribution to the field : The study should make a significant contribution to the field and advance our understanding of the research question or issue.

Advantages of Evaluating Research

Evaluating research has several advantages, including:

  • Ensuring accuracy and validity : By evaluating research, we can ensure that the research is accurate, valid, and reliable. This ensures that the findings are trustworthy and can be used to inform decision-making.
  • Identifying gaps in knowledge : Evaluating research can help identify gaps in knowledge and areas where further research is needed. This can guide future research and help build a stronger evidence base.
  • Promoting critical thinking: Evaluating research requires critical thinking skills, which can be applied in other areas of life. By evaluating research, individuals can develop their critical thinking skills and become more discerning consumers of information.
  • Improving the quality of research : Evaluating research can help improve the quality of research by identifying areas where improvements can be made. This can lead to more rigorous research methods and better-quality research.
  • Informing decision-making: By evaluating research, we can make informed decisions based on the evidence. This is particularly important in fields such as medicine and public health, where decisions can have significant consequences.
  • Advancing the field : Evaluating research can help advance the field by identifying new research questions and areas of inquiry. This can lead to the development of new theories and the refinement of existing ones.

Limitations of Evaluating Research

Limitations of Evaluating Research are as follows:

  • Time-consuming: Evaluating research can be time-consuming, particularly if the study is complex or requires specialized knowledge. This can be a barrier for individuals who are not experts in the field or who have limited time.
  • Subjectivity : Evaluating research can be subjective, as different individuals may have different interpretations of the same study. This can lead to inconsistencies in the evaluation process and make it difficult to compare studies.
  • Limited generalizability: The findings of a study may not be generalizable to other populations or contexts. This limits the usefulness of the study and may make it difficult to apply the findings to other settings.
  • Publication bias: Research that does not find significant results may be less likely to be published, which can create a bias in the published literature. This can limit the amount of information available for evaluation.
  • Lack of transparency: Some studies may not provide enough detail about their methods or results, making it difficult to evaluate their quality or validity.
  • Funding bias : Research funded by particular organizations or industries may be biased towards the interests of the funder. This can influence the study design, methods, and interpretation of results.


Critical Analysis and Evaluation

Many assignments ask you to critique and evaluate a source. Sources might include journal articles, books, websites, government documents, portfolios, podcasts, or presentations.

When you critique, you offer both negative and positive analysis of the content, writing, and structure of a source.

When you evaluate, you assess how successful a source is at presenting information, measured against a standard or certain criteria.

Elements of a critical analysis:

opinion + evidence from the article + justification

Your opinion is your thoughtful reaction to the piece.

Evidence from the article offers some proof to back up your opinion.

The justification is an explanation of how you arrived at your opinion or why you think it's true.

How do you critique and evaluate?

When critiquing and evaluating someone else's writing/research, your purpose is to reach an informed opinion about a source. In order to do that, begin with your own reactions, then question the source, and finally choose precise evaluative language. The following questions can guide you:

  • How do you feel?
  • What surprised you?
  • What left you confused?
  • What pleased or annoyed you?
  • What was interesting?
  • What is the purpose of this text?
  • Who is the intended audience?
  • What kind of bias is there?
  • What was missing?
  • See our resource on analysis and synthesis ( Move From Research to Writing: How to Think ) for other examples of questions to ask.
Precise evaluative words you might use include:

  • sophisticated
  • interesting
  • undocumented
  • disorganized
  • superficial
  • unconventional
  • inappropriate interpretation of evidence
  • unsound or discredited methodology
  • traditional
  • unsubstantiated
  • unsupported
  • well-researched
  • easy to understand
  • Opinion : This article's assessment of the power balance in cities is confusing.
  • Evidence : It first says that the power to shape policy is evenly distributed among citizens, local government, and business (Rajal, 232).
  • Justification : But then it goes on to focus almost exclusively on business. Next, in a much shorter section, it combines the idea of citizens and local government into a single point of evidence. This leaves the reader with the impression that the citizens have no voice at all. It is not helpful in trying to determine the role of the common voter in shaping public policy.

Sample criteria for critical analysis

Sometimes the assignment will specify what criteria to use when critiquing and evaluating a source. If not, consider the following prompts to approach your analysis. Choose the questions that are most suitable for your source.

  • What do you think about the quality of the research? Is it significant?
  • Did the author answer the question they set out to? Did the author prove their thesis?
  • Did you find contradictions to other things you know?
  • What new insight or connections did the author make?
  • How does this piece fit within the context of your course, or the larger body of research in the field?
  • The structure of an article or book is often dictated by standards of the discipline or a theoretical model. Did the piece meet those standards?
  • Did the piece meet the needs of the intended audience?
  • Was the material presented in an organized and logical fashion?
  • Is the argument cohesive and convincing? Is the reasoning sound? Is there enough evidence?
  • Is it easy to read? Is it clear and easy to understand, even if the concepts are sophisticated?
Critically Evaluating Research


Some research reports or assessments will require you to critically evaluate a journal article or piece of research. Below is a guide, with examples, on how to critically evaluate research and how to communicate your ideas in writing.

To develop the skill of critical evaluation, read research articles in psychology with an open mind and be an active reader. Ask questions as you go and see if the answers are provided. Initially skim through the article to gain an overview of the problem, the design, methods, and conclusions. Then read for details and consider the questions provided below for each section of a journal article.

Title

  • Did the title describe the study?
  • Did the key words of the title serve as key elements of the article?
  • Was the title concise, i.e., free of distracting or extraneous phrases?
Abstract

  • Was the abstract concise and to the point?
  • Did the abstract summarise the study’s purpose/research problem, the independent and dependent variables under study, methods, main findings, and conclusions?
  • Did the abstract provide you with sufficient information to determine what the study is about and whether you would be interested in reading the entire article?
Introduction

  • Was the research problem clearly identified?
  • Is the problem significant enough to warrant the study that was conducted?
  • Did the authors present an appropriate theoretical rationale for the study?
  • Is the literature review informative and comprehensive or are there gaps?
  • Are the variables adequately explained and operationalised?
  • Are hypotheses and research questions clearly stated? Are they directional? Do the author’s hypotheses and/or research questions seem logical in light of the conceptual framework and research problem?
  • Overall, does the literature review lead logically into the Method section?
Method

  • Is the sample clearly described, in terms of size, relevant characteristics (gender, age, SES, etc), selection and assignment procedures, and whether any inducements were used to solicit subjects (payment, subject credit, free therapy, etc)?
  • What population do the subjects represent (external validity)?
  • Are there sufficient subjects to produce adequate power (statistical validity)?
  • Have the variables and measurement techniques been clearly operationalised?
  • Do the measures/instruments seem appropriate as measures of the variables under study (construct validity)?
  • Have the authors included sufficient information about the psychometric properties (eg. reliability and validity) of the instruments?
  • Are the materials used in conducting the study or in collecting data clearly described?
  • Are the study’s scientific procedures thoroughly described in chronological order?
  • Is the design of the study identified (or made evident)?
  • Do the design and procedures seem appropriate in light of the research problem, conceptual framework, and research questions/hypotheses?
  • Are there other factors that might explain the differences between groups (internal validity)?
  • Were subjects randomly assigned to groups so there was no systematic bias in favour of one group? Was there a differential drop-out rate from groups so that bias was introduced (internal validity and attrition)?
  • Were all the necessary control groups used? Were participants in each group treated identically except for the administration of the independent variable?
  • Were steps taken to prevent subject bias and/or experimenter bias, eg, blind or double blind procedures?
  • Were steps taken to control for other possible confounds such as regression to the mean, history effects, order effects, etc (internal validity)?
  • Were ethical considerations adhered to, eg, debriefing, anonymity, informed consent, voluntary participation?
  • Overall, does the method section provide sufficient information to replicate the study?
Results

  • Are the findings complete, clearly presented, comprehensible, and well organised?
  • Are data coding and analysis appropriate in light of the study’s design and hypotheses? Are the statistics reported correctly and fully, eg. are degrees of freedom and p values given?
  • Have the assumptions of the statistical analyses been met, eg. does one group have very different variance to the others?
  • Are salient results connected directly to hypotheses? Are there superfluous results presented that are not relevant to the hypotheses or research question?
  • Are tables and figures clearly labelled? Well-organised? Necessary (non-duplicative of text)?
  • If a significant result is obtained, consider effect size. Is the finding meaningful? If a non-significant result is found, could low power be an issue? Were there sufficient levels of the IV?
  • If necessary have appropriate post-hoc analyses been performed? Were any transformations performed; if so, were there valid reasons? Were data collapsed over any IVs; if so, were there valid reasons? If any data was eliminated, were valid reasons given?

Discussion and Conclusion

  • Are findings adequately interpreted and discussed in terms of the stated research problem, conceptual framework, and hypotheses?
  • Is the interpretation adequate? i.e., does it go too far given what was actually done or not far enough? Are non-significant findings interpreted inappropriately?
  • Is the discussion biased? Are the limitations of the study delineated?
  • Are implications for future research and/or practical application identified?
  • Are the overall conclusions warranted by the data and any limitations in the study? Are the conclusions restricted to the population under study or are they generalised too widely?
References

  • Is the reference list sufficiently specific to the topic under investigation and current?
  • Are citations used appropriately in the text?

General Evaluation

  • Is the article objective, well written and organised?
  • Does the information provided allow you to replicate the study in all its details?
  • Was the study worth doing? Does the study provide an answer to a practical or important problem? Does it have theoretical importance? Does it represent a methodological or technical advance? Does it demonstrate a previously undocumented phenomenon? Does it explore the conditions under which a phenomenon occurs?

How to turn your critical evaluation into writing

Example from a journal article.


Critical Appraisal for Health Students


Appraisal of a Qualitative Paper: Top Tips


Critical appraisal of a qualitative paper

This guide, aimed at health students, provides basic-level support for appraising qualitative research papers. It's designed for students who have already attended lectures on critical appraisal. One framework for appraising qualitative research (based on four aspects of trustworthiness) is provided, and there is an opportunity to practise the technique on a sample article.

Support Materials

  • Framework for reading qualitative papers
  • Critical appraisal of a qualitative paper PowerPoint

To practise following this framework for critically appraising a qualitative article, please look at the following article:

Schellekens, M.P.J.  et al  (2016) 'A qualitative study on mindfulness-based stress reduction for breast cancer patients: how women experience participating with fellow patients',  Support Care Cancer , 24(4), pp. 1813-1820.

Critical appraisal of a qualitative paper: practical example.

  • Credibility
  • Transferability
  • Dependability
  • Confirmability

How to use this practical example 

Using the framework, you can have a go at appraising a qualitative paper; we are going to look at the Schellekens et al. (2016) article above.

Step 1. Take a quick look at the article.

Step 2. Click on the Credibility tab above; there are questions to help you appraise the trustworthiness of the article. Read the questions and look for the answers in the article.

Step 3. Click on each question and our answers will appear.

Step 4. Repeat with the other aspects of trustworthiness: transferability, dependability and confirmability.

Questioning the credibility:

  • Who is the researcher? What has been their experience? How well do they know this research area?
  • Was the best method chosen? What method did they use? Was there any justification? Was the method scrutinised by peers? Is it a recognisable method? Was there triangulation (more than one method used)?
  • How was the data collected? Was data collected from the participants at more than one time point? How long were the interviews? Were questions asked to the participants in different ways?
  • Is the research reporting what the participants actually said? Were the participants shown transcripts/notes of the interviews/observations to 'check' for accuracy? Are direct quotes used from a variety of participants?
  • How would you rate the overall credibility?

Questioning the transferability:

  • Was a meaningful sample obtained? How many people were included? Is the sample diverse? How were they selected? Are the demographics given?
  • Does the research cover diverse viewpoints? Do the results include negative cases? Was data saturation reached?
  • What is the overall transferability? Can the research be transferred to other settings?

Questioning the dependability:

  • How transparent is the audit trail? Can you follow the research steps? Are the decisions made transparent? Is the whole process explained in enough detail? Did the researcher keep a field diary? Is there a clear limitations section?
  • Was there peer scrutiny of the research? Was the research plan shown to peers/colleagues for approval and/or feedback? Did two or more researchers independently judge data?
  • How would you rate the overall dependability? Would the results be similar if the study was repeated? How consistent are the data and findings?

Questioning the confirmability:

  • Is the process of analysis described in detail? Is a method of analysis named or described? Is there sufficient detail?
  • Have any checks taken place? Was there cross-checking of themes? Was there a team of researchers?
  • Has the researcher reflected on possible bias? Is there a reflexive diary giving a detailed log of thoughts, ideas and assumptions?
  • How do you rate the overall confirmability? Has the researcher attempted to limit bias?

Questioning the overall trustworthiness:

  • Overall, how trustworthy is the research?

Further information

See Useful resources  for links, books and LibGuides to help with Critical appraisal.



Critical Literature Review : How to Critique a Research Article?

A literature review critically examines the methodologies used in the studies it covers, discussing their strengths and weaknesses. Your review should be critical and analytical, not just descriptive. In this blog, we will look at how to use constructive language when critiquing others' work in your research paper.

1. Should a Literature Review be Critical?

Absolutely! A literature review should indeed be critical. It’s like being a judge in a talent show. Rather than simply describing each performance, you critically evaluate them, highlighting strengths, noting weaknesses, and considering how each act contributes to the show’s overall theme. In the context of a literature review:

Assess the Methodology

Critically analyze the research methods used in the studies you review. Are there limitations or biases in how the research was conducted?

Compare and Contrast

Examine how different studies relate to each other. Do they present conflicting evidence? Do they build upon each other’s work?

Identify Gaps

A critical review identifies what hasn’t been explored or fully understood in your area of study. Highlighting these gaps is crucial for setting the direction for future research.

Theoretical Analysis

Engage with the theoretical frameworks underpinning the research. Are there alternative theories or perspectives that could be applied?

Remember, a critical literature review doesn’t mean being negative about the studies. It’s about providing a thoughtful, in-depth analysis that offers a balanced view of the research landscape. This critical approach adds depth to your review, demonstrating your understanding and engagement with the field.

2. How to Critique a Research Article?

Keep your review academic and objective. Avoid personal bias and ensure your research backs your arguments. Your critique should be evidence-based. Here are some academic phrases from Ref-n-write’s academic phrasebank that you can use to critique research articles.

  • Prior study drawbacks
  • Debates/Controversies
  • Research gap

3. Using Constructive and Diplomatic Language

Try to use constructive and diplomatic language in your literature review. Pay specific attention to your language when you are criticizing other researchers in your field. Let’s look at some examples.

3.1. Example 1

Here, we are pointing out the drawback of previous studies.

Bad (too blunt): None of the previous works [1-4] offer a good solution. The problem is still unsolved.

Good (some credit given to previous research): Despite the success of previous works [1-4] in certain aspects, the problem is still unsolved.

Look at the first statement, it is quite blunt and we are very critical of previous works. Now look at the second statement, we are giving some credit to the previous authors and appreciating their efforts. And then we are making our claim that there is no solution to the problem.

3.2. Example 2

Here is another example. Here we are establishing the research gap.

Bad (too confident): There are no studies in the literature that deal with this problem.

Good (modest language): To the best of our knowledge, there are no studies in the literature that deal with this problem.

In the first statement, we come across as too confident, claiming to be 100% sure that there are no studies in the literature that deal with this issue. Now look at the second statement. We managed to tone it down by using the phrase 'To the best of our knowledge'. The statement now sounds more modest and constructive.

4. Can I use ChatGPT for Critical Literature Review?


Using ChatGPT or similar AI language models to assist in a literature review is intriguing, blending traditional research methods with cutting-edge technology. ChatGPT can quickly provide summaries or explanations of concepts, saving time in the initial stages of research. However, it is essential to understand its limitations.

A literature review is a foundational element in academia, whether you’re crafting a thesis, dissertation, or any significant research project. It’s not just a formality; it’s a strategic exploration that shapes and informs your research journey. When writing your literature review, please keep in mind the things we have discussed in this blog. For further reading, see our blog on literature review phrases .


  • Open access
  • Published: 16 April 2024

Biomedical semantic text summarizer

  • Mahira Kirmani 1 ,
  • Gagandeep Kour 1 ,
  • Mudasir Mohd 2 ,
  • Nasrullah Sheikh 3 ,
  • Dawood Ashraf Khan 4 ,
  • Zahid Maqbool 5 ,
  • Mohsin Altaf Wani 2 &
  • Abid Hussain Wani 2  

BMC Bioinformatics volume  25 , Article number:  152 ( 2024 ) Cite this article


Text summarization is a challenging problem in Natural Language Processing which involves condensing the content of textual documents without losing their overall meaning and information content. In the domain of biomedical research, summaries are critical for efficient data analysis and information retrieval. While several biomedical text summarizers exist in the literature, they often miss out on an essential aspect of text: its semantics.

This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models. We evaluate our approach using ROUGE on a standard dataset and compare it with three state-of-the-art summarizers. Our results show that our approach outperforms existing summarizers.

The usage of semantics can improve summarizer performance and lead to better summaries. Our summarizer has the potential to aid in efficient data analysis and information retrieval in the field of biomedical research.


Introduction

The volume of biomedical documents has increased considerably since medical records were digitized, and patient-doctor interactions also lead to a considerable increase in biomedical papers. More than 3500 documents are added daily to different journals, and PubMed has around 36 million citations and abstracts of biomedical literature [ 1 ]. All these resources provide a valuable source of information to health practitioners, doctors, and researchers. However, retrieving information from this enormous knowledge base is cumbersome and time-consuming. Text summarization is therefore a potential solution to this information overload problem: it condenses information for quick and efficient consumption.

Text summarization is the art of condensing lengthy textual documents into concise versions while retaining the original document’s core meaning and informational value. Doing so results in more digestible and easily comprehensible documents to the reader. This process facilitates the efficient handling of extensive documents. In formal terms, text summarization involves thoroughly analyzing and processing lengthy text documents to distill their essence for immediate consumption. This, in turn, enhances understanding and readability without sacrificing the document’s overall meaning and significance.

Text summarization is helpful because it serves several purposes: it produces smaller documents, reducing the size of the input documents and hence the reading time; it helps produce reports used by commercial companies for easier decision-making; summaries are useful in stock markets and for reviewing financial statements; emails are more easily comprehensible if email summarization is employed [ 2 ]; it is highly beneficial for e-learning systems; and summaries are highly useful in determining the polarity and sentiment of a document [ 3 ]. Hence, we find excellent motivation to work on text summarization and improve the process [ 4 ].

Text summarization is categorized into extractive, abstractive, and hybrid approaches, depending on whether the selected sentences are transformed. If a subset of sentences that are representative of the original document is chosen without any modification of the original sentences, we call this extractive summarization. If some transformation is applied to the selected subset of sentences, we call it abstractive summarization: natural language generation tools are applied to extractive summaries to obtain their abstractive versions, so the summarized sentences are distinct from the original set of sentences. Hybrid summarization applies both extractive and abstractive rules, hence the name.

All three summarization techniques employ stylistic and syntactic rules to obtain summaries. Standard features used are the relative position of a sentence, the length of sentences, and the presence of verb and noun phrases. These features scale up well, but they miss an essential and characteristic feature of textual structure: textual semantics. Semantics form an inherent and crucial feature of the input document, yet existing summarizers overlook it, assuming that statistical features alone are central to summarization.

We use distributional semantic models to capture the semantics of the text. However, directly applying these models to biomedical text processing has limitations. These models are trained on general-domain datasets; thus, applying them to biomedical documents does not yield good results [ 5 ]. Biomedical documents include medical terminology, such as abbreviations, synonyms, and hyponyms, specific to the medical domain, and their word distributions differ from those of documents in other domains. Thus, applying general-domain semantic models to biomedical documents for text summarization yields inaccurate results [ 6 ].

We introduce the application of bio-semantic models for text summarization. These bio-semantic models are extensions of distributional semantic models trained on biomedical datasets. In this study, we primarily focus on summarizing textual papers in the biomedical domain, which are an integral part of the vast biomedical literature.

In this paper, we hypothesize that bio-semantic models, as extensions of distributional semantic models, yield better biomedical text summarizers. We implement text summarizers using these bio-semantic models: we use them to capture semantics from the text by computing the semantic coherence between two textual elements, and then use this coherence as a text summarization feature. Our evaluation shows that the bio-semantic summarizer produces better-quality summaries than summarizers that do not use semantic models for the biomedical domain, achieving state-of-the-art results. This characteristic of our summarizer is novel, considering that no existing summarizer in the biomedical domain uses bio-semantic models to extract semantic features for text summarization.

We propose an extractive summarization technique using bio-semantic models. The proposed bio-semantic summarizing system consists of four steps:

The semantics of the text are used as a feature to obtain text summaries. We use bio-semantic models to obtain semantics, and from them our novel big-vectors . Big-vectors are semantic bag-of-words extensions of the sentences. Each sentence of the input text document is fed to the bio-semantic model to obtain a semantic transformation of the sentence: the words of the sentence are given to the bio-semantic model to retrieve their word vectors, which are then concatenated to obtain a unique big-vector for the sentence.

Next, we use the k-means clustering algorithm to group these big-vectors into distinct clusters.

Next, ranking is performed within each cluster. We use our novel ranking algorithm, which applies sentence scoring functions to rank each sentence in the input document.

The last step chooses the summary sentences among the highest-ranked sentences.

The term ’semantic bag-of-words’ refers to a representation that captures the semantic content of a document or sentence in a manner similar to the traditional bag-of-words model. In a standard bag-of-words approach, the emphasis is on word frequency, treating each word as an isolated unit without considering its semantic relationships with other words. In contrast, the ’semantic bag-of-words’ extends this concept by incorporating semantic information. Each ’big-vector,’ which is an extension of the traditional bag-of-words representation, not only considers the occurrence of individual words but also encodes their semantic meaning. This is achieved by leveraging distributional semantic models that analyze the contextual relationships between words. In essence, ’big-vectors’ serve as semantic extensions of traditional bag-of-words representations, providing a more nuanced understanding of the text by considering the semantic associations between words.

To validate our hypothesis, we performed experiments using two different bio-semantic models. We used the dataset of Givchi et al. [ 5 ], which is publicly available. Footnote 1

In the realm of biomedical text summarization, an extensive body of literature exists, with numerous studies delving into various aspects of this field. These investigations have yielded valuable insights into the challenges and opportunities surrounding the summarization of biomedical documents.

Within this context, it is imperative to recognize that while existing research has significantly contributed to the domain of text summarization, it also reveals some noteworthy gaps and unexplored avenues. Notably, many conventional summarization methods have primarily emphasized statistical and structural aspects of the text, often sidelining a pivotal facet: the intricate semantics of biomedical documents. Despite the rich and nuanced semantics intrinsic to the biomedical domain, previous summarization models have been limited in their ability to harness this semantic wealth.

As a result, a critical literature gap emerges that underscores the need for approaches capable of incorporating and leveraging the semantic intricacies inherent in biomedical texts. The vast and diverse terminology, including medical abbreviations, synonyms, and domain-specific word distributions, poses a unique challenge that necessitates the development of specialized methods. Existing summarizers, trained on more general datasets, often fall short when applied to the idiosyncrasies of biomedical documents.

These considerations lay the foundation for our study’s motivations. In this research, we aim to bridge the gap between conventional summarization methods and the distinctive semantic landscape of biomedical literature. By introducing “bio-semantic” models, extensions of distributional semantic models trained on biomedical datasets, we aspire to enhance the summarization process and elevate it to new levels of effectiveness. Our primary goal is to investigate the potential of these models in improving the quality of biomedical text summarization.

Furthermore, by conducting rigorous experiments and evaluations, we seek to provide empirical evidence that substantiates the superiority of bio-semantic summarization over conventional approaches. Our pursuit of state-of-the-art results is driven by the compelling motivation to offer the biomedical community an advanced summarization tool that optimally captures and conveys the intricate semantics of biomedical documents.

By addressing these literature gaps and motivations, our study contributes to the ongoing evolution of text summarization in the biomedical domain and presents a novel approach that is tailor-made for the unique characteristics of this field.

The results of our summarizer are evaluated using ROUGE (Recall-Oriented Understudy for Gisting Evaluation). ROUGE produces results by comparing ground truth (human summaries) with system-generated summaries, quantifying them through metrics such as precision, recall, and F-score. Three ROUGE scores, namely ROUGE-1, ROUGE-2, and ROUGE-L, are reported for the evaluation of our results. We use these specific ROUGE types because they are standard metrics for evaluating text summarization models, and many previous studies have reported results using them; their widespread adoption enables easy comparison with the existing literature and benchmarking against other models.

We also compare our system with other biomedical summarizers found in the literature. The results confirm our hypothesis and demonstrate the competitive effectiveness of our proposed system.

The rest of the paper is organized as follows. Section “Related works” discusses the state of the art in text summarization, Section “Methodology” describes the methodology employed, Section “Experimental setup and results” presents the results and comparative analysis, and we conclude in Section “Conclusion and future work”.

Related works

This section covers various aspects of text summarization, including the techniques employed by automatic summarization systems, their limitations, and the benefits they offer compared to others. A comprehensive examination of various methods, techniques, and feature selection is conducted to draw conclusions and identify remaining challenges.

Unsupervised techniques

The unsupervised approach to text summarization uses statistical features of the text to summarize documents and is the most widely used approach. The earliest attempt at automatic summarization was made by Luhn in 1958. Luhn assumed that a document consists of various concepts and terms, and that the most important terms for summarization are the most frequent terms in the document: the more frequent a term, the higher its importance for the summary. Luhn also observed that these frequent terms are thematic or descriptive terms and proposed that sentences containing them should be included in summaries [ 7 ].
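For illustration, here is a minimal Python sketch of Luhn-style frequency scoring; the stop-word list, function name, and example sentences are our own, not taken from Luhn or the paper.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "is"}  # illustrative only

def luhn_scores(sentences):
    """Score sentences by the document-wide frequency of their content words,
    following Luhn's intuition that frequent terms mark important sentences."""
    def content_words(sentence):
        return [w.strip(".,;:").lower() for w in sentence.split()
                if w.strip(".,;:").lower() not in STOPWORDS]

    freq = Counter(w for s in sentences for w in content_words(s))
    return [sum(freq[w] for w in content_words(s)) for s in sentences]

doc = ["Exercise improves cardiovascular health.",
       "Regular exercise also improves mood.",
       "The weather was pleasant."]
print(luhn_scores(doc))  # the two exercise sentences outscore the third
```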

Edmundson et al. [ 8 ] extended Luhn’s work by incorporating extra features that improve the scores of relevant sentences for text summarization. They selected the following characteristics to score the sentences in a document: (1) word frequency; (2) the number of title-word occurrences in a sentence; and (3) the position of the sentence in the document, with sentences appearing earlier assumed to be more important. Baxendale et al. [ 9 ] laid the framework for abstractive summarizers, noting that summaries can be created by adding sentences that do not appear in the document’s text.

Unsupervised techniques in the literature include works such as [ 10 , 11 , 12 ], which use different statistical features to rank sentences and produce extractive summaries. Mihalcea et al. introduced TextRank [ 13 ], an algorithm that uses several statistical features and a specialized function that computes a weighted sum to measure the similarity between sentences [ 14 ].
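As an illustration of the TextRank idea, the sketch below builds a sentence-similarity graph and ranks sentences with PageRank. The overlap-based similarity follows Mihalcea and Tarau’s formulation; the use of networkx and the helper names are our assumptions, not part of the original system.

```python
import math
import networkx as nx

def textrank(sentences):
    """Rank sentences by PageRank over a word-overlap similarity graph."""
    tokens = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            overlap = len(tokens[i] & tokens[j])
            if overlap:
                # Overlap normalised by sentence lengths, as in the TextRank
                # paper (the +1 guards against log(1) = 0 for one-word sentences).
                weight = overlap / (math.log(len(tokens[i]) + 1)
                                    + math.log(len(tokens[j]) + 1))
                graph.add_edge(i, j, weight=weight)
    return nx.pagerank(graph, weight="weight")  # sentence index -> score
```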

Query based text summarization

In this technique, summaries are generated by a scoring mechanism: sentences that contain the query terms are scored higher and thus obtain higher ranks, and these high-ranked sentences constitute the summary of the input documents. More specifically, this approach centres on the user’s query terms and their extensions; the user-input query terms form the basis of this type of summarization model.

Higher-scoring sentences and other structural elements form the summarizer’s output. With query-based summarization, the output summaries may consist of extracts drawn from different portions or sections of the input text; the summary is then the union of these extracts. The paper [ 15 ] uses long text queries to solve word-sense disambiguation problems in an Urdu information retrieval system.

To overcome the issue caused by incomplete information within initial queries, [ 16 ] proposed combining a query expansion method with a graph-based summarization strategy; the input text document itself, rather than external sources like WordNet, is used for query expansion. The paper [ 17 ] uses sentence-to-sentence and word-to-sentence mapping for query expansion; they evaluated their system on the DUC 2006 and DUC 2007 datasets, and the evaluation confirms that it outperforms a system without query expansion. Support Vector Regression (SVR) is used by [ 18 ] to find important sentences in the input document for summary generation in response to the user query. [ 19 ] designed a query-based text summarizer using word-sense disambiguation, expanding queries using common-sense knowledge. [ 20 ] uses query-based summarization to obtain slide-based presentations for research articles.

Machine learning based approaches

Recent advances in machine learning algorithms have made automatic text summarization (ATS) applications possible. These techniques involve identifying and extracting suitable features and applying them to the design of an ATS: the algorithms learn a model that then determines which sentences will be part of the summaries. Various machine learning-based ATSs are described in the literature. For instance, [ 21 ] employed Support Vector Machines (SVM) and the Naive Bayes model with relevance, topic, event, and content features. Meanwhile, [ 22 ] developed an extractive summarizer using machine learning techniques and statistical and linguistic data drawn directly from the source document. [ 23 ] utilized hidden Markov models (HMMs), and [ 24 ] enhanced extractive summary outcomes by ordering sentences in the documents.

Neural networks based approaches

Deep learning has gained popularity due to recent technological advancements and decreased memory costs. When appropriate training data are available, neural network summarizers outperform traditional automatic summarizers with minimal human intervention. The paper [ 25 ] presents an overview of the neural network algorithms that form the state of the art in text summarization. Researchers have used neural networks in various forms to develop text summarization systems. For example, [ 26 ] utilized continuous vectors based on neural networks to create an extractive summarizer and achieved better results. The first abstractive summarizer using CNNs was introduced by [ 27 ]; [ 28 ] built on this work by combining CNNs with other neural networks. An RNN with an attentional encoder-decoder was used by [ 29 ] to generate abstractive summaries. COPYNET, a sequence-to-sequence learning technique that copies text segments at specific intervals, was introduced by [ 30 ]. The out-of-vocabulary problem was tackled by [ 31 ] using a neural network with a pointer-generator approach. A Chinese corpus was used by [ 32 ] to generate summaries. [ 33 ] described a neural network that uses a distraction strategy to allow the model to focus on different parts of the input. A semantic relevance-based neural network was used by [ 34 ] to create semantically important summaries of Chinese data. Finally, [ 35 ] used a bi-directional LSTM encoder to generate extractive summaries.

Graph based approaches

Graph-based techniques [ 36 ] use supervised and unsupervised learning schemes to create extractive single-document summaries; the goal is to find relevant sentences by extracting statistical features and then using graphs to determine sentence coherence. [ 37 ] computed word-similarity adjacency networks to attribute authorship, representing text as a graph to identify the author. [ 38 ] employs multi-layer graph approaches for summarizing several documents, where nodes represent sentences and edges represent the coherence between two sentences. The paper [ 39 ] achieved good results on multi-lingual datasets.

Biomedical summarization

With the increased focus on domain-specific NLP applications, automatic text summarization for biomedical documents has recently gained much attention, and various biomedical text summarizers in the literature achieve good results. [ 40 ] designed a deep bidirectional language model that uses contextual embeddings; the system works on context and achieves better results than the baselines. [ 41 ] uses reinforcement learning-based biomedical summarization to summarize biomedical papers from their abstracts as headlines; these headlines are domain-aware abstractive summaries of the input papers. [ 42 ] uses BERT and OpenAI GPT-2 to design a biomedical text summarizer that works from keywords extracted from the original articles. [ 43 ] uses transfer learning in a text-to-text summarization setup; the model uses an RNN with an attention mechanism to perform abstractive summarization of medical documents.

These state-of-the-art systems achieve efficient results and contribute to the knowledge base, leading to efficient text summaries. However, all these systems have drawbacks. Neural network-based summarizers produce good summaries but require large volumes of data and enormous computation time. Machine learning-based systems require feature identification and selection to perform competitively. Unsupervised and graph-based methods tend to miss an essential and fundamental feature: the semantics and meaning of the data. Thus, we propose a domain-specific text summarizer for the biomedical domain that captures the text’s underlying meaning and semantics. The proposed approach uses neural networks and unsupervised algorithms to summarize biomedical documents. The evaluation and analysis of our results confirm that using semantics as a feature for text summarization improves the performance of the system as a whole.

Methodology

In this section, we describe our methodology for constructing our ATS for biomedical documents using bio-semantic models. The novelty of our approach lies in using bio-semantic models to extract semantic features, which we then use alongside other stylistic and statistical features. Figure  1 shows the architecture of our proposed approach.

Figure 1: Overall working of the system

More specifically, our proposed approach consists of the following steps: (1) preprocessing the text for normalization and inconsistency removal; (2) capturing the text’s semantics with distributional semantic models; (3) combining the vectors generated by the distributional semantic models through our novel big-vector generation algorithm to produce dense semantic extensions of the input sentences; (4) clustering semantically similar sentences together, so that each cluster is a coherent representation of the most similar sentences; (5) running the ranking algorithm to score each sentence in each cluster; and (6) normalizing the scores for efficient sentence extraction to obtain the summary.

Preprocessing

Preprocessing is an essential step in our system. The purpose of preprocessing is to clean the data and remove inconsistencies. Preprocessing is performed to make data uniform and processing-ready. We perform the following functions during preprocessing:

Removing unnecessary parts of papers: elements such as the abstract, title, tables, figures, and references are not needed for summarization and are therefore removed from input documents before our system processes them.

Tokenization: the input text is broken into uniform units; we split the text into words for efficient processing.

Removing punctuation and numbers: numbers and punctuation convey no meaningful information content and are thus removed from the input text.

Lowercase: All the text is converted to lowercase for efficient processing.

Lemmatization: lemmatization is a text normalization technique used in Natural Language Processing (NLP) to identify word variations and determine the root of a word. It groups the different inflected forms of a word, which share the same meaning, into the root form. For example, a lemmatization algorithm reduces the word “better” to its root word, or lemma, “good”. The words extracted during tokenization are transformed into their base or root form through lemmatization, ensuring consistency and aiding subsequent analysis. This step is performed using the Stanford CoreNLP package [ 44 ].
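Taken together, the preprocessing pipeline might look like the following sketch. The paper uses Stanford CoreNLP for lemmatization; here we substitute NLTK’s WordNet lemmatizer purely for illustration.

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(text):
    """Lowercase, strip punctuation and numbers, tokenize, and lemmatize."""
    text = text.lower()                    # lowercase for uniformity
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation and numbers
    tokens = nltk.word_tokenize(text)      # break text into words
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens]

print(preprocess("The 2 patients were treated; outcomes improved."))
# -> ['the', 'patient', 'were', 'treated', 'outcome', 'improved']
```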

Capturing semantics using distributional bio-semantic models

Bio-semantic models are extensions of distributional semantic models that are domain-specific, limited to the biomedical domain. They work on the principle of the distributional hypothesis, according to which words used in the same context tend to have similar meanings. These models require no lexical or linguistic analysis and are independent of external information sources for obtaining semantics.

Bio-semantic models are pre-trained language representation models for the biomedical domain, trained on biomedical corpora such as PubMed abstracts and PMC full-text articles. They are used in various biomedical NLP applications, such as Named Entity Recognition, Question-Answering systems [ 45 , 46 ], Neural Machine Translation [ 47 ], and Relation Extraction [ 48 ].

Our approach uses word embeddings to build semantic models that capture the coherence between two textual elements. These models create word embeddings through statistical computations on the contexts in which a word appears, considering the words that occur close to the target word; the resulting dense vector representation of the target word is called a word embedding. The coherence between two elements is measured using these word embeddings, and using them in text summarization helps capture the meaning of, and relationships between, sentences, leading to a summary that preserves the semantic coherence of the original text. Word embeddings are high-dimensional real-valued vectors generated for each word and have proven helpful in NLP tasks such as text classification, sentiment analysis [ 49 ], and machine translation. In high-dimensional vector spaces, these representations and their geometric properties help determine the coherence of various word usages: words that are close in the vector space are syntactically and semantically similar.

We have employed two distributional bio-semantic models, namely BioBERT [ 45 ] and a biomedical extension of Word2Vec [ 50 ], in various experiments to validate our hypothesis. These models are extensions of distributional semantic models. Word2Vec, a two-layer neural network model, excels at producing high-quality text semantics: it takes a word as input and transforms it into a semantic-rich, high-dimensional vector embedding as output.

Word2Vec offers two architectures: Continuous Bag of Words (CBOW) and Skip-gram. The CBOW model predicts a word from its context, while Skip-gram predicts the context surrounding a given word. Moen et al. [ 50 ] crafted 200-dimensional vectors using Word2Vec [ 51 ], leveraging all publication abstracts from PubMed and full-text documents from the PubMed Central Open Access subset. They employed a skip-gram model with a window size of 5, hierarchical softmax training, and a frequent-word subsampling threshold of 0.001 to construct these 200-dimensional vectors.
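A gensim-based sketch of querying such pre-trained biomedical vectors is shown below; the file name is a placeholder for the Moen et al. vectors, which are distributed in word2vec binary format.

```python
from gensim.models import KeyedVectors

# Placeholder file name for the 200-dimensional PubMed/PMC vectors of Moen et al.
kv = KeyedVectors.load_word2vec_format("PubMed-and-PMC-w2v.bin", binary=True)

vector = kv["insulin"]                      # 200-dimensional word embedding
print(kv.most_similar("insulin", topn=5))   # nearest neighbours in vector space
print(kv.similarity("insulin", "glucose"))  # cosine similarity between two terms
```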

BioBERT [ 45 ] (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) represents a specialized extension of BERT [ 52 ] tailored for the biomedical domain. In the subsequent subsection, we introduce our novel Big-vector generation algorithm, which plays a crucial role in capturing semantic information from the input biomedical documents.

Big-vector generation

Our big-vector generation process introduces a novel algorithm for constructing big-vectors from the vectors generated by distributional bio-semantic models. To achieve this, we feed all the words in the input text into the bio-semantic models. This allows us to obtain concatenated vectors for each sentence, effectively creating a comprehensive bag of words represented by a single vector.

Let \(\beta (w)\) denote a function that retrieves a list of the top ’m’ similar words from a given bio-semantic model; mathematically, \(w' = \beta (w) = w'_1 \oplus w'_2 \oplus \ldots \oplus w'_m\) . For a sentence comprising a sequence of ’k’ tokenized words \(W = \{w_1, w_2, w_3, \ldots , w_k\}\) , where ’w’ represents a word from the sentence, we construct a big-vector BGV by concatenating the respective top ’m’ similar words for each word: \(BGV = \{\beta (w_1) \oplus \beta (w_2) \oplus \ldots \oplus \beta (w_k)\}\) .

This process allows us to create meaningful big-vectors that capture the essence of the input text and serve as valuable semantic representations for our summarization tasks.
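A sketch of the big-vector construction is given below, reusing the gensim KeyedVectors object kv from the earlier example. The fixed-length padding for downstream clustering is our assumption; the paper does not specify how unequal sentence lengths are handled.

```python
import numpy as np

def big_vector(sentence_tokens, kv, m=3, max_words=30):
    """Build a big-vector: for each word, apply beta (top-m most similar words
    from the bio-semantic model) and concatenate the resulting embeddings."""
    parts = []
    for w in sentence_tokens[:max_words]:
        if w not in kv:
            continue  # skip out-of-vocabulary words
        for similar_word, _ in kv.most_similar(w, topn=m):
            parts.append(kv[similar_word])
    target = max_words * m * kv.vector_size
    if not parts:
        return np.zeros(target, dtype=np.float32)
    bgv = np.concatenate(parts)
    # Pad/truncate so that k-means sees equal-sized vectors (our assumption).
    return np.pad(bgv, (0, max(0, target - bgv.size)))[:target]
```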

This rich semantic vector is then fed to the clustering algorithm to obtain different clusters representing all the semantic information for that cluster [ 53 ].

In this phase, we employ the K-means clustering algorithm to group semantically rich big-vectors obtained from the input sentences. These big-vectors represent an extension of the sentences as a semantic bag of words. It’s essential to note that these vectors result from applying distributional semantic models to capture the intricate semantic structures within the text.

The K-means clustering algorithm [ 54 ] is utilized to organize the big-vectors into distinct clusters. However, it’s crucial to highlight that the choice of clustering algorithm and parameters plays a pivotal role in shaping the summarization outcome. The algorithm aims to divide the sample space of big-vectors into semantically meaningful clusters, ensuring that each cluster comprises sentences with similar semantic content.

While the semantic content is a priority during clustering, the specific similarity metric employed is an important consideration. In this context, we focus on capturing the semantic relationships between sentences without explicitly considering word positional information.

The clustering process serves two key purposes in enhancing the summarization process. Firstly, it promotes diversity in the summary by ensuring the representation of various semantic dimensions in the original document. Secondly, it facilitates the efficient extraction of the most salient sentences from each cluster, contributing to the overall coherence and informativeness of the final summary.
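Continuing the sketch, the big-vectors can be clustered with scikit-learn’s KMeans; tokenized_sentences, big_vector, and kv are assumed from the earlier sketches, and the number of clusters is a tunable choice rather than something fixed by the method.

```python
import numpy as np
from sklearn.cluster import KMeans

# One big-vector per sentence, stacked into a 2-D array.
bgvs = np.stack([big_vector(tokens, kv) for tokens in tokenized_sentences])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(bgvs)

# labels_[i] is the semantic cluster of sentence i; sentences that share a label
# are treated as conveying similar content during summary selection.
clusters = kmeans.labels_
```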

Algorithm 1: Summarizing algorithm

Ranking algorithm

Our novel ranking algorithm aims to assign sentence ranks according to different surface-level features. The features that we include in our system are the following:

Sentence length: The length of the sentence is directly proportional to its importance. We use sentence length as a feature for our summarizer.

Sentence position: the position of a sentence indicates its importance; sentences appearing earlier in the document tend to be more important. Thus, position is a useful ranking feature for summary generation. The sentence position score is calculated as follows:

\(s^{p}_{i} = \frac{|S| - i + 1}{|S|}\) , where \(s^{p}_{i}\) is the sentence position score of the \(i\text{th}\) sentence of the input document S , and | S | is the number of sentences in the input document.

Frequency (TF-IDF): TF-IDF is a crucial distinguishing feature in any text summarization system, as it locates the most important terms in a document. The ranking algorithm uses this attribute to find the essential words, and thus sentences, in the text: it calculates the TF-IDF score of individual words and then uses these to calculate the TF-IDF score of each sentence, which is the sum of the TF-IDF scores of the individual words in the sentence. Formally, the TF-IDF of a sentence \(s_i\) is calculated as:

\(TFIDF(s_i) = \sum _{w \in s_i} t_f(w)\) , where \(t_f(w)\) is a function that gives the TF-IDF score of a word w .

Verb phrases and noun phrases: the most important sentences in the input text contain both a noun phrase and a verb phrase. An imperative sentence contains only one of these two phrases, and the presence of either is ranked higher by our ranking algorithm.

Proper nouns: Proper nouns contain direct referrals to the subject. Hence, their existence in a sentence increases the importance of the sentence.

Cosine similarity: the cosine similarity measure determines the similarity between two pieces of text and is used as a feature in the ranking process. The cosine similarity between each pair of sentences is calculated, and the higher a sentence’s average cosine similarity, the higher its rank. The average cosine similarity of the \(i^{th}\) sentence, \(s^{c}_{i}\) , is calculated by taking the sum of the cosine similarities between sentence i and all other sentences, divided by the total number of sentences in the document (| S |): \(s^{c}_{i} = \frac{ \sum _{j=1, j \ne i}^{|S|}c(s_i,s_j)}{|S|}\) , where \(c(s_i,s_j)\) is the cosine similarity between sentences i and j .

The total score of a sentence is then calculated by summing its individual normalized feature scores; a sketch combining these features follows. Algorithm  1 explains the proposed system algorithmically, and Table  1 shows example scores calculated for some sentences using the ranking algorithm.
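The sketch below combines the surface features described above (sentence length, position, TF-IDF, and average cosine similarity) with min-max normalization before summation. The exact weighting and normalization scheme, and the use of TF-IDF vectors in place of big-vectors for the cosine feature, are our assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences):
    """Score sentences by summing min-max-normalised surface features."""
    n = len(sentences)
    tfidf = TfidfVectorizer().fit_transform(sentences)

    length = np.array([len(s.split()) for s in sentences], dtype=float)
    position = np.array([(n - i) / n for i in range(n)])  # earlier = higher
    tfidf_score = np.asarray(tfidf.sum(axis=1)).ravel()   # sum of word TF-IDFs
    sim = cosine_similarity(tfidf)
    cosine = (sim.sum(axis=1) - 1.0) / n                  # drop self-similarity

    def norm(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)

    return norm(length) + norm(position) + norm(tfidf_score) + norm(cosine)
```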

Connecting words establish a relationship between two consecutive sentences; words like however, moreover, but, and because are examples. The morphology of a sentence beginning with a connecting word is such that its meaning is incomplete without the sentence it connects to. Thus, if a sentence chosen by our algorithm starts with a connecting word, we make the inclusion of its connecting sentence essential for the summary, irrespective of that sentence’s rank. After the ranking algorithm executes, the sentences are sorted according to their ranks, and the system selects the best-ranked sentences for inclusion in the summary.

This ATS (Automatic Text Summarization) algorithm also reduces redundancy in summaries by identifying and eliminating semantically similar sentences, which it achieves by clustering them into a single cluster. If two sentences in the same cluster have almost identical ranking scores, they are assumed to convey similar meanings and are included only once; the sentence with the higher sentence position score is selected for the summary. This feature is important when summarizing long technical papers in which authors repeat statements in different forms: despite their high ranks, the algorithm can identify and discard such repetitions. A sketch of this filter is given below; Figure  2 shows the overall working of the ranking algorithm.
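A minimal sketch of this redundancy filter follows; the near-tie threshold eps and the function name are our assumptions.

```python
def deduplicate(scores, position_scores, clusters, eps=0.01):
    """Within each cluster, treat sentences with near-equal ranking scores as
    paraphrases and keep only one, preferring the higher position score."""
    kept = []
    # Visit sentences by descending rank, breaking near-ties by position score.
    order = sorted(range(len(scores)),
                   key=lambda i: (-scores[i], -position_scores[i]))
    for i in order:
        is_duplicate = any(clusters[i] == clusters[j]
                           and abs(scores[i] - scores[j]) < eps
                           for j in kept)
        if not is_duplicate:
            kept.append(i)
    return kept  # indices of sentences eligible for the summary
```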

Figure 2: Functioning of the ranking algorithm

Experimental setup and results

This section describes the datasets and the experiments to evaluate the proposed algorithm and presents the recorded results.

To the best of our knowledge, no proper dataset consisting of articles and their corresponding human-generated summaries (reference summaries) exists for the biomedical text domain. Several studies in the literature have used biomedical papers from sources like PubMed and BioMed Central with their corresponding abstracts as reference summaries [ 55 , 56 ], and we adopt a similar approach in this paper. The dataset we use was curated by [ 5 ]: 400 random articles were downloaded from BioMed Central; each paper’s text is used as an input file, and its abstract acts as the reference file for comparison with the system-generated summaries. The dataset is publicly available for download, Footnote 2 so the experiments can be easily replicated. Table  2 shows statistical information about the dataset used in this paper.

We evaluate our approach against the following state-of-the-art baselines:

Graph-based abstractive biomedical text summarization [ 5 ]: Givchi et al. [ 5 ] use a graph-based technique to summarize biomedical text documents. They use frequent itemset mining to identify frequent concept sets, which are represented as graphs, and the shortest path is used to obtain extractive summaries.

Gensim [ 57 ]: the Gensim summarizer is based on the TextRank algorithm [ 13 ], a graph-based ranking algorithm for summarization. In TextRank , the importance of a sentence is determined recursively based on the global state of the graph: the algorithm works by voting, where a vertex linked to other vertices receives votes, and the more votes it receives, the higher its rank.

PyTextRank: a Python implementation of the TextRank algorithm for graph-based summarization, like Gensim, which produces text summaries using feature vectors. The main difference between the two is that PyTextRank uses spaCy for natural language processing and graph building, while Gensim uses its own implementation.

PKUSUMSUM [ 58 ] is a versatile summarization platform written in Java, offering support for multiple languages and incorporating ten different summarization methods. It caters to three primary summarization tasks: Single-document summarization, Multi-document summarization, and Topic-based Multi-document summarization. The platform features a diverse and stable set of summarization techniques, making it a reliable reference system for our evaluation. Noteworthy summarization methods included within the platform are Centroid , LexPageRank , and TextRank . For our evaluation, we have employed the single-document summarization approach, leveraging the LexPageRank method for summarization.

The baseline systems used in this paper are all state-of-the-art ATSs. They achieve good results and are used for comparison with several other systems in the literature. Because they employ different summarization techniques, we can perform an exhaustive and comprehensive comparative analysis of our system. Furthermore, all these systems are publicly available, so the experiments described in our paper are easily repeatable.

Summary evaluation

Out of the 400 input documents in the dataset, we randomly chose 30 to evaluate our system and compare the results with the baselines. We generated summaries for all 30 selected papers using our proposed approach and the baselines. For extensive evaluation, short and long summaries were generated by fixing the summary length to 15% and 25% of the original input document, respectively.

For evaluation, we use the ROUGE  (Recall-Oriented Understudy for Gisting Evaluation) automatic summarization evaluation toolkit [ 59 ]. ROUGE is publicly available for download. Footnote 3 It consists of a set of measures for assessing automated text summaries. ROUGE generates four distinct metrics, namely ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4, by comparing summaries at various levels of granularity. ROUGE-1 (ROUGE-2) uses unigrams (bigrams) to measure the coherence between the system summaries and the reference summaries; ROUGE-L uses the summary-level Longest Common Subsequence (LCS) to match the coherence between the reference and system-generated summaries; and ROUGE-SU4 uses both skip-grams and unigrams. ROUGE evaluates the summaries of our system and the baselines by comparing them against the ground truth, i.e., the reference summaries, and computes different metrics: Precision ( Pr ), Recall ( Rc ), and F-score ( Fs ). The results for all 30 input documents are averaged and presented in Tables  3 and  4 ; Table  3 shows the results for 15% summary length, and Table  4 for 25% summary length.
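The paper uses the ROUGE 2.0 Java toolkit; as an illustrative stand-in, the Python rouge-score package computes the same three metrics, as sketched below (the example sentences are placeholders).

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "exercise improves cardiovascular health in patients"  # ground truth
candidate = "exercise improves patient cardiovascular outcomes"    # system summary

scores = scorer.score(reference, candidate)  # signature: score(target, prediction)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F={s.fmeasure:.2f}")
```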

The evaluation of the results obtained using bio-semantic models to capture semantics for text summarization shows that these models help produce better summaries. We used evaluation techniques similar to those employed in [ 60 ] and [ 61 ]. The results show that the proposed model performs better than the baselines, which we attribute to the fact that the baseline models do not employ semantic features. Our system achieves better precision and F-scores for both short and long summaries. These results confirm our hypothesis that using bio-semantic models to capture the semantics of biomedical documents, and using these semantic features for text summarization, improves the performance of biomedical text summarization.

Figure 3: F-score for 15% summary length

Figure 4: F-score for 25% summary length

We obtained both long and short summaries for extensive evaluation: the length of the longer summaries was restricted to 25% of the original text length, and the shorter summaries to 15%. For the 25% summary length, the macro-averaged precision was 88%, 53%, and 79% for ROUGE-1, ROUGE-2, and ROUGE-L using Bio-BERT, and 81%, 46%, and 71% using Bio-Word2Vec.

The F-scores for ROUGE-1, ROUGE-2, and ROUGE-L at 25% summary length are 43%, 32%, and 46% using Bio-BERT and 43%, 32%, and 42% using Bio-Word2Vec. These scores are higher than the baselines’ and are statistically significant, as measured using a paired t-test. The baselines employ different summarization techniques and algorithms and represent the current state of the art in text summarization; comparing against them therefore allows an exhaustive and comprehensive evaluation of our system. The evaluation and comparison with the baselines demonstrate the competitive efficacy of our proposed approach. Also, the baselines are open source, so the experiments are easily repeatable.

Furthermore, the evaluation and comparison of the results obtained using the 15% summary length show that our system achieves consistently good performance on shorter summaries: our precision, recall, and F-scores are better than those of the baselines.

The competitive superiority of our system is attributed to the use of semantic features in the text summarization process. Hence, we conclude that using semantics as a feature for summarization improves the system’s performance. The proposed system using bio-semantic models outperforms the baselines with better F-scores and precision. The lower recall value in some cases is attributed to the fact that our system discards some statistically important sentences that are semantically similar to sentences already chosen.

Figures  3 and  4 compare our system with the baselines on the 30 randomly selected papers, showing the F-scores obtained for the different ROUGE metrics.

Statistical testing

In addition to presenting our experimental results, we conducted a statistical paired t-test to establish the significance of the outcomes produced by our proposed model. We started by assuming the opposite scenario, known as the null hypothesis: that our proposed method did not yield statistically significant results compared to the baseline methods. We carried out the paired t-test on a set of 30 summary samples, comparing the results of our proposed model with those of the baseline algorithms. The results of this analysis indicate that our proposed model consistently selected more semantically rich sentences than the baseline models.

To further ascertain whether the results obtained from our proposed algorithm are statistically meaningful or merely coincidental in comparison to the baseline methods, we conducted a significance test. During this test, we set a significance level of 5%. In simple terms, we ran 30 samples with a 95% confidence level. The initial null hypothesis, which suggested no significant difference between our proposed model and the baseline model, was rejected in favor of an alternative hypothesis. This alternative hypothesis implies that the outcomes produced by our proposed model are indeed significantly different from those of the baseline model. Therefore, our model consistently generates effective results.
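A paired t-test of per-document scores can be run with SciPy as sketched below; the score values are illustrative placeholders, not the paper’s data.

```python
from scipy.stats import ttest_rel

# ROUGE F-scores for the same documents under the proposed model and a baseline
# (placeholder values; the paper used 30 paired samples).
proposed = [0.46, 0.44, 0.48, 0.43, 0.47, 0.45]
baseline = [0.41, 0.40, 0.44, 0.39, 0.42, 0.41]

t_stat, p_value = ttest_rel(proposed, baseline)
if p_value < 0.05:  # 5% significance level, as in the paper
    print(f"Null hypothesis rejected (t={t_stat:.2f}, p={p_value:.4f})")
```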

Our analysis of F-score values for ROUGE-1, ROUGE-2, and ROUGE-L metrics reveals that the compression rates of 0.05% to 0.5% considered in our experiments contribute to the creation of coherent, less redundant, and diverse summaries, with a strong emphasis on semantic attributes. The p -value, as illustrated in Table  5 , represents the proportion of observations where our model outperformed the baselines.

Conclusion and future work

As stated in the introduction, the rationale behind this research is to use bio-semantic features for the text summarization process to improve the quality of summaries.

This paper presents a novel way of extracting and using bio-semantic features for text summarization. We use bio-semantic models to capture these features and then use them to produce high-quality summaries of biomedical documents. According to our evaluation and comparative analysis, the appropriateness, reliability, and scalability of our summarizer show that the proposed approach performs better than the baselines. The main conclusions of our paper are: (1) bio-semantic models are an excellent choice for capturing semantics in the biomedical domain; (2) semantic features work well in improving the efficacy of the text summarization process; and (3) the use of semantic features improves the overall accuracy of text summarization algorithms. The primary disadvantage of these models is that they are computationally expensive.

Our future research will deal with: (1) improving the ranking algorithm by exploring more semantic features to incorporate, as semantic features tend to improve overall system performance; (2) testing the technique on more than one dataset; and (3) using BLEU (bilingual evaluation understudy) for evaluation alongside ROUGE.

Availability of data and materials

The dataset used and analyzed during this study is included in the paper [ 5 ] and is publicly available and can be downloaded from https://github.com/azadehgivchi/abs_biomed_summary (last accessed on April 19, 2023).

Footnotes 1 and 2: https://github.com/azadehgivchi/abs_biomed_summary/tree/main .

Footnote 3: https://github.com/kavgan/ROUGE-2.0 .

Moradi M, Ghadiri N. Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artif Intell Med. 2018;84:101–16.


Kirmani M, Kaur G, Mohd M. ShortMail: an email summarizer system. Software Impacts. 2023;17:100543.


Mohd M, Wani MA, Khanday HA, Mir UB, Nasrullah S, Maqbool Z, et al. Semantic-summarizer: semantics-based text summarizer for English language text. Software Impacts. 2023;18:100582.

Mohd M, Jan R, Shah M. Text document summarization using word embedding. Expert Syst Appl. 2020;143:112958.

Givchi A, Ramezani R, Baraani-Dastjerdi A. Graph-based abstractive biomedical text summarization. J Biomed Inform. 2022;132:104099.

Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. 2019; CoRR. arXiv:1901.08746 .

Luhn HP. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):159–65.

Edmundson HP, Wyllys RE. Automatic abstracting and indexing–survey and recommendations. Commun ACM. 1961;4(5):226–34.

Baxendale PB. Machine-made index for technical literature-an experiment. IBM J Res Dev. 1958;2(4):354–61.

Afantenos S, Karkaletsis V, Stamatopoulos P. Summarization from medical documents: a survey. Artif Intell Med. 2005;33(2):157–77.

Bhat IK, Mohd M, Hashmy R. SumItUp: a hybrid single-document text summarizer. In: Pant M, Ray K, Sharma TK, Rawat S, Bandyopadhyay A, editors. Soft computing: theories and applications. Singapore: Springer; 2018. p. 619–34.


Mohd M, Shah MB, Bhat SA, Kawa UB, Khanday HA, Wani AH, et al. Sumdoc: a unified approach for automatic text summarization. In: Pant M, Deep K, Bansal JC, Nagar A, Das KN, (eds). In: Proceedings of fifth international conference on soft computing for problem solving. Singapore: Springer Singapore; 2016. p. 333–343.

Mihalcea R, Tarau P. Textrank: Bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing; 2004.

Kirmani M, Kaur G, Mohd M. Analysis of abstractive and extractive summarization methods. Int J Emerg Technol Learn. 2024;19(1).

Shoaib U, Fiaz L, Chakraborty C, Rauf HT. Context-aware Urdu information retrieval system. Trans Asian Low-Resource Lang Inform Process. 2023;22(3):1–19.

Zhao L, Wu L, Huang X. Using query expansion in graph-based approach for query-focused multi-document summarization. Inform Process Manag. 2009;45(1):35–41.

Li W, Li W, Li B, Chen Q, Wu M. The Hong Kong polytechnic university at DUC 2005. In: Proceedings of document understanding conferences. Citeseer; 2005.

Ouyang Y, Li W, Li S, Lu Q. Applying regression models to query-focused multi-document summarization. Inform Process Manag. 2011;47(2):227–37.

Rahman N, Borah B. Improvement of query-based text summarization using word sense disambiguation. Complex Intell Syst. 2020;6(1):75–85.

Sun E, Hou Y, Wang D, Zhang Y, Wang NX. D2S: Document-to-slide generation via query-based text summarization. 2021. arXiv preprint arXiv:2105.03664 .

Wong KF, Wu M, Li W. Extractive summarization using supervised and semi-supervised learning. In: Proceedings of the 22nd international conference on computational linguistics-Volume 1. Association for Computational Linguistics; 2008. p. 985–992.

Neto JL, Freitas AA, Kaestner CA. Automatic text summarization using a machine learning approach. In: Brazilian symposium on artificial intelligence. Springer; 2002. p. 205–215.

Rabiner LR, Juang BH. An introduction to hidden Markov models. IEEE ASSP Magazine. 1986;3(1):4–16.

Conroy JM, O’leary DP. Text summarization via hidden markov models. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM; 2001. p. 406–407.

Dash DP, Kolekar MH, Chakraborty C, Khosravi MR. Review of machine and deep learning techniques in epileptic seizure detection using physiological signals and sentiment analysis. Trans Asian Low-Res Lang Inform Process. 2024;23(1):1–29.

Kågebäck M, Mogren O, Tahmasebi N, Dubhashi D. Extractive summarization using continuous vector space models. In: Proceedings of the 2nd workshop on continuous vector space models and their compositionality (CVSC); 2014. p. 31–39.

Rush AM, Chopra S, Weston J. A neural attention model for abstractive sentence summarization. 2015. arXiv preprint arXiv:1509.00685 .

Chopra S, Auli M, Rush AM. Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of the 2016 conference of the north american chapter of the association for computational linguistics: human language technologies; 2016. p. 93–98.

Nallapati R, Zhou B, Gulcehre C, Xiang B, et al. Abstractive text summarization using sequence-to-sequence rnns and beyond. 2016. arXiv preprint arXiv:1602.06023 .

Gu J, Lu Z, Li H, Li VO. Incorporating copying mechanism in sequence-to-sequence learning. 2016. arXiv preprint arXiv:1603.06393 .

See A, Liu PJ, Manning CD. Get to the point: summarization with pointer-generator networks. 2017. arXiv preprint arXiv:1704.04368 .

Hu B, Chen Q, Zhu F. Lcsts: A large scale chinese short text summarization dataset. 2015. arXiv preprint arXiv:1506.05865 .

Chen Q, Zhu X, Ling Z, Wei S, Jiang H. Distraction-based neural networks for document summarization. 2016. arXiv preprint arXiv:1610.08462 .

Ma S, Sun X, Li W, Li S, Li W, Ren X. Query and output: Generating words by querying distributed word representations for paraphrase generation. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). vol. 1; 2018. p. 196–206.

Paulus R, Xiong C, Socher R. A deep reinforced model for abstractive summarization. 2017. arXiv preprint arXiv:1705.04304 .

Mao X, Yang H, Huang S, Liu Y, Li R. Extractive summarization using supervised and unsupervised learning. Expert Syst Appl. 2019;133:173–81.

Amancio DR, Silva FN, da F Costa L. Concentric network symmetry grasps authors’ styles in word adjacency networks. EPL (Europhys Lett). 2015;110(6):68001.

Tohalino JV, Amancio DR. Extractive multi-document summarization using multilayer networks. Physica A. 2018;503:526–39.

Chakraborty C, Dash TK, Panda G, Solanki SS. Phase-based cepstral features for automatic speech emotion recognition of low resource Indian languages. Trans Asian Low-Res Lang Inform Process. 2022. https://doi.org/10.1145/3563944 .

Moradi M, Ghadiri N. Text summarization in the biomedical domain. 2019. arXiv preprint arXiv:1908.02285 .

Gigioli P, Sagar N, Rao A, Voyles J. Domain-aware abstractive text summarization for medical documents. In: 2018 IEEE International conference on bioinformatics and biomedicine (BIBM). IEEE; 2018. p. 2338–2343.

Kieuvongngam V, Tan B, Niu Y. Automatic text summarization of covid-19 medical research articles using bert and gpt-2. 2020. arXiv preprint arXiv:2006.01997 .

Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020;21(140):1–67.


Manning CD, Surdeanu M, Bauer J, Finkel J, Bethard SJ, McClosky D. The Stanford CoreNLP Natural Language Processing Toolkit. In: Association for computational linguistics (ACL) System Demonstrations; 2014. p. 55–60. Available from: http://www.aclweb.org/anthology/P/P14/P14-5010 .

Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234–40.


Mohd M, Hashmy R. Question classification using a knowledge-based semantic kernel. In: Soft computing: theories and applications: proceedings of SoCTA 2016, Volume 1. Springer; 2018. p. 599–606.

Kim D, Lee J, So CH, Jeon H, Jeong M, Choi Y, et al. A neural named entity recognition and multi-type normalization tool for biomedical text mining. IEEE Access. 2019;7:73729–40.

Lin C, Miller T, Dligach D, Bethard S, Savova G. A BERT-based universal model for both within-and cross-sentence clinical temporal relation extraction. In: Proceedings of the 2nd clinical natural language processing workshop; 2019. p. 65–71.

Mohd M, Javeed S, Wani MA, Khanday HA, Wani AH, Mir UB, et al. poliWeet-Election prediction tool using tweets. Software Impacts. 2023;17:100542.

Moen S, Ananiadou TSS. Distributional semantics resources for biomedical text processing. Proceedings of LBM. 2013;p. 39–44.

Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 . 2013.

Devlin J, Chang MW, Lee K, Toutanova K. Bert: pre-training of deep bidirectional transformers for language understanding. 2018. arXiv preprint arXiv:1810.04805 .

Mohd M, Javeed S, Nowsheena Wani MA, Khanday HA. Sentiment analysis using lexico-semantic features. J Inform Sci. 2022. https://doi.org/10.1177/01655515221124016 .

Hartigan JA, Wong MA. Algorithm AS 136: a k-means clustering algorithm. J Roy Stat Soc: Ser C (Appl Stat). 1979;28(1):100–8.

Mishra R, Bian J, Fiszman M, Weir CR, Jonnalagadda S, Mostafa J, et al. Text summarization in the biomedical domain: a systematic review of recent research. J Biomed Inform. 2014;52:457–67.

Plaza L, Díaz A, Gervás P. A semantic graph-based approach to biomedical summarisation. Artif Intell Med. 2011;53(1):1–14.

Barrios F, López F, Argerich L, Wachenchauzer R. Variations of the similarity function of textrank for automated summarization. arXiv preprint arXiv:1602.03606 . 2016.

Zhang J, Wang T, Wan X. PKUSUMSUM: a Java platform for multilingual document summarization. In: Proceedings of COLING 2016, the 26th International conference on computational linguistics: system demonstrations; 2016. p. 287–291.

Ganesan K. ROUGE 2.0: Updated and improved measures for evaluation of summarization tasks. arXiv preprint arXiv:1803.01937 . 2018.

Dash TK, Chakraborty C, Mahapatra S, Panda G. Mitigating information interruptions by COVID-19 face masks: a three-stage speech enhancement scheme. IEEE Trans Comput Social Syst. 2022. https://doi.org/10.1109/TCSS.2022.3210988 .

Dash TK, Chakraborty C, Mahapatra S, Panda G. Gradient boosting machine and efficient combination of features for speech-based detection of COVID-19. IEEE J Biomed Health Inform. 2022;26(11):5364–71.


Acknowledgements

Not applicable.

Funding

The authors declare that no funding from any organization was received for this research.

Author information

Authors and Affiliations

University Institute of Computing, Chandigarh University, NH-05-Chandigarh-Ludhiana, Mohali, Punjab, India

Mahira Kirmani & Gagandeep Kour

Department of Computer Science, University of Kashmir, South Campus, Anantnag, Jammu and Kashmir, India

Mudasir Mohd, Mohsin Altaf Wani & Abid Hussain Wani

IBM Research, Almaden, 650 Harry Rd, San Jose, CA, 95120, USA

Nasrullah Sheikh

Thndr, The Office 3, One central, DWTC, Dubai, United Arab Emirates

Dawood Ashraf Khan

Department of Computer Science, Government Degree College Bemina, Srinagar, Jammu and Kashmir, India

Zahid Maqbool


Contributions

MK Conceptualization, Data curation, Formal analysis, Writing—original draft, Writing—review & editing. GK Conceptualization, Data curation, Formal analysis. MM Conceptualization, Formal analysis, Writing—original draft. NS Formal analysis, Writing, revision. DAK Guidance. ZM Data curation, Formal analysis. MAW Data curation, Formal analysis. AHW Data curation, Formal analysis, revision.

Corresponding author

Correspondence to Mudasir Mohd .

Ethics declarations

Ethics approval and consent to participate; consent for publication; competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Kirmani, M., Kour, G., Mohd, M. et al. Biomedical semantic text summarizer. BMC Bioinformatics 25 , 152 (2024). https://doi.org/10.1186/s12859-024-05712-x


Received : 15 March 2023

Accepted : 19 February 2024

Published : 16 April 2024

DOI : https://doi.org/10.1186/s12859-024-05712-x


  • Biomedical text summarization
  • Text summarization
  • Text semantics
  • Semantic models




First, do no harm: a call to action to improve the evaluation of harms in clinical exercise research

  • Simon Nørskov Thomsen 1,
  • http://orcid.org/0000-0002-5565-0997 Alejandro Lucia 2,
  • http://orcid.org/0000-0002-5446-5562 Rosalind R Spence 3, 4,
  • Fabiana Braga Benatti 5,
  • Michael J Joyner 6,
  • Ronan Martin Griffin Berg 1, 7,
  • http://orcid.org/0000-0002-8388-5291 Mathias Ried-Larsen 1, 8,
  • Casper Simonsen 1
  • 1 Centre for Physical Activity Research, Rigshospitalet, Copenhagen, Region Hovedstaden, Denmark
  • 2 Faculty of Sport Sciences, Universidad Europea de Madrid, Madrid, Spain
  • 3 Menzies Health Institute Queensland, Griffith University, Brisbane, Queensland, Australia
  • 4 Improving Health Outcomes for People (ihop) Research Group, Brisbane, Queensland, Australia
  • 5 Faculdade de Ciências Aplicadas, Universidade Estadual de Campinas, Limeira, SP, Brazil
  • 6 Department of Anesthesiology & Perioperative Medicine, Mayo Clinic, Rochester, Minnesota, USA
  • 7 Department of Biomedical Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Region Hovedstaden, Denmark
  • 8 Institute of Sports and Clinical Biomechanics, University of Southern Denmark, Odense, Syddanmark, Denmark
  • Correspondence to Dr Casper Simonsen, Centre for Physical Activity Research, Rigshospitalet, Copenhagen, Denmark; casper.simonsen@regionh.dk

https://doi.org/10.1136/bjsports-2023-107579


  • Physical activity

Exercise as medicine has emerged as an independent discipline in clinical research. Over recent decades, numerous randomised controlled trials (RCTs) have documented the beneficial effects of exercise on various patient-related, disease-related and health-related outcomes in clinical populations. 1 Nevertheless, the evaluation of harms in clinical exercise research remains unsatisfactory ( table 1 ). 2 3 For instance, nearly half of all exercise trials do not report harms, and there is evidence of selective non-reporting of harms. 2 4 5 Furthermore, emerging evidence indicates that exercise might increase the risk of serious adverse events in certain populations. 2 We contend that this is concerning: as for any clinical intervention, the benefits of exercise should be carefully balanced against accurate risk estimates of harms to appropriately inform evidence-based clinical use. With this call to action, we aim to improve the evaluation of harms in clinical exercise research.


Table 1. Suboptimal practices of harms collection, analysis and interpretation, as well as their consequences 2 3 5

Update of exercise trial reporting guidelines

The Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network reporting guidelines have been instrumental in improving research reporting. However, we assert that the exercise-specific reporting guidelines do not adequately encompass several critical aspects relevant to clinical exercise prescription. 6 For example, in the Consensus on Exercise Reporting Template, 6 the reporting of harms focuses solely on adverse events (AEs) occurring during exercise. This is problematic for several reasons. First, it reinforces the common practice of monitoring and reporting AEs in the intervention arm only, which reduces controlled trials to single-arm trials for the assessment of harms and thereby precludes comparative analyses. Second, it leads to more frequent assessment of harms in the exercise groups; this increases the number of AEs reported and, if not controlled for, can inflate risk estimates of harms in the exercise groups. Third, it assumes that exercise-related AEs manifest solely during exercise sessions. Yet some exercise-related AEs can exhibit delayed occurrence. For example, exercise can induce a pro-thrombotic environment, particularly in exercise-naïve individuals, thus increasing the risk of cardiovascular events following the completion of exercise. 7
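The second point, differential assessment frequency, is worth making concrete. Below is a minimal simulation sketch in Python (our own illustration; the event rate, visit schedule and recall assumption are invented for the example, not taken from the editorial) showing how monitoring the exercise arm more often than the control arm inflates the count of detected AEs even when the true risk in the two arms is identical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: identical true AE risk in both arms.
true_weekly_ae_rate = 0.05
weeks, n_per_arm = 12, 100

def detected_events(n_patients, n_assessments):
    # True events occur week by week at the same rate in both arms.
    events = rng.binomial(1, true_weekly_ae_rate,
                          size=(n_patients, weeks)).sum(axis=1)
    # Crude recall proxy: sparser assessment means more events go unrecorded.
    recall = min(1.0, n_assessments / weeks)
    return rng.binomial(events, recall).sum()

# Hypothetical schedules: exercise arm is asked about AEs at every weekly
# session; control arm only at 3 scheduled study visits.
exercise_aes = detected_events(n_per_arm, 12)
control_aes = detected_events(n_per_arm, 3)

print(f"Detected AEs - exercise: {exercise_aes}, control: {control_aes}")
# The true risk is identical, yet the intensively monitored arm records
# roughly 4 times as many AEs, so a naive risk ratio is badly inflated.
```

The point of the sketch is only that a between-arm comparison of AE counts confounds true risk with ascertainment intensity unless the assessment schedule is equalised or modelled.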

Finally, common terminology for describing exercise-related harms is required to improve the consistency of harms reporting within and between trials. While such terminology has been developed to define, categorise and grade disease- and treatment-related AEs for several diseases, it may not adequately describe exercise-related AEs. For example, musculoskeletal events (eg, muscle strains, joint pain) are common and often labelled as ‘injuries’. Yet there is currently no universal definition of an injury. 9 Drawing insights from the field of sports injury research, where best practice methods are currently debated, 9 could help advance the collection and reporting of exercise-related AEs.

Stricter trial designs

To ensure minimal harms to study participants, medical research conventionally follows a strict order of phases to sequentially establish dose-limiting toxicities, biological activity and preliminary efficacy before commencing definitive testing. However, in clinical exercise research, several large-scale, phase III RCTs have been conducted despite an absence of evidence on potential harms or effective doses from earlier-phase trials. 10 Adopting a less rigorous testing pathway may be justified in some clinical settings where exercise is widely used in clinical practice and where, based on substantial real-world data, exercise is ‘Generally Recognised As Safe’. In contrast, we advocate a more rigorous trial framework in clinical settings with limited or no existing data, to gain an understanding of potential harms and interactions with the standard treatment. In such explorative trial stages, adaptive trial designs may be adopted to efficiently identify safe and effective doses. 11

Improved analyses and appropriate interpretations

Another critical challenge lies in the analysis of AEs. AEs are inherently multifaceted, and statistical analysis of harms should carefully consider factors such as severity, recurrence, competing risks, time to recurrence and data type; however, current analysis practice in clinical exercise research does not adequately account for these aspects and remains unsatisfactory ( table 1 ). Statistical methods for analysing harms are increasingly available, 12 and we advocate for the development of strategies to support their application among clinical exercise trialists.

As trials are seldom designed to investigate harms, they may lack statistical power to detect differences in rare but clinically important AEs. Nevertheless, hypothesis testing of AEs, often accompanied by an interpretation of ‘no harms’ when p≥0.05, remains common. This practice is susceptible to type II error and, when repeated across trials, can lead to a flawed consensus that exercise is safe. It is arguably unrealistic to power individual trials adequately to establish safety, and we advocate for a paradigm shift in the evaluation of harms. Trialists should be cautious about making safety claims and should instead contribute to data accumulation by reporting harms outcomes alongside the main outcomes. Over time, this will enable the conduct of adequately powered meta-analyses. Finally, epidemiological and real-world data are important complementary sources for detecting rare AEs that are unlikely to occur in small trials. For instance, exercise training is standard of care for several clinical populations, providing a valuable yet largely underused opportunity to generate large-scale real-world datasets.
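To illustrate the type II error problem with numbers (ours, not the editorial's), the following sketch computes the approximate power of a two-proportion z-test to detect a doubling of a rare serious-AE risk in a trial of fairly typical size:

```python
from scipy.stats import norm

# Assumed figures for illustration: 2% serious-AE risk in the control arm,
# 4% with exercise (a doubled risk), 150 participants per arm.
p_ctrl, p_ex, n, alpha = 0.02, 0.04, 150, 0.05

# Standard normal-approximation power for a two-sided two-proportion z-test.
p_bar = (p_ctrl + p_ex) / 2
se_h0 = (2 * p_bar * (1 - p_bar) / n) ** 0.5
se_h1 = (p_ctrl * (1 - p_ctrl) / n + p_ex * (1 - p_ex) / n) ** 0.5
z_crit = norm.ppf(1 - alpha / 2)
power = norm.cdf(((p_ex - p_ctrl) - z_crit * se_h0) / se_h1)

print(f"Power to detect a doubled risk: {power:.0%}")  # roughly 17%
# The type II error rate exceeds 80% here: a non-significant result says
# almost nothing about safety.
```

Under these assumptions, more than 4 in 5 trials would "miss" a genuine doubling of risk, which is exactly why a run of non-significant tests must not be read as evidence of safety.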

Moving forward

If ‘exercise as medicine’ is to succeed as a global initiative to improve the management of multiple chronic diseases, the evaluation of exercise-related harms must be considerably improved. While individual trialists hold responsibility, changing current practices requires collective commitment from the exercise research community, including reviewers, editors and sports medicine organisations. Only through unified efforts can we improve the evaluation of harms in clinical exercise research and responsibly prescribe exercise as medicine.

Ethics statements

Patient consent for publication

Not applicable.



Contributors SNT, MR-L, RMGB and CS conceived the paper. SNT and CS wrote the first draft. All authors critically revised the manuscript and approved the final version. All authors qualify for authorship, and all persons qualifying for authorship are listed as authors.

Funding The authors have not received specific funding for the present research. The Centre for Physical Activity Research (CFAS) is supported by TrygFonden (grants ID: 101390, 20045 and 125132).

Competing interests None declared.

Provenance and peer review Commissioned; externally peer reviewed.



Published on 16.4.2024 in Vol 26 (2024)

Adverse Event Signal Detection Using Patients’ Concerns in Pharmaceutical Care Records: Evaluation of Deep Learning Models


Original Paper

  • Satoshi Nishioka 1, PhD;
  • Satoshi Watabe 1, BSc;
  • Yuki Yanagisawa 1, PhD;
  • Kyoko Sayama 1, MSc;
  • Hayato Kizaki 1, MSc;
  • Shungo Imai 1, PhD;
  • Mitsuhiro Someya 2, BSc;
  • Ryoo Taniguchi 2, PhD;
  • Shuntaro Yada 3, PhD;
  • Eiji Aramaki 3, PhD;
  • Satoko Hori 1, PhD

1 Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan

2 Nakajima Pharmacy, Hokkaido, Japan

3 Nara Institute of Science and Technology, Nara, Japan

Corresponding Author:

Satoko Hori, PhD

Division of Drug Informatics

Keio University Faculty of Pharmacy

1-5-30 Shibakoen

Tokyo, 105-8512

Phone: 81 3 5400 2650

Email: [email protected]

Background: Early detection of adverse events and their management are crucial to improving anticancer treatment outcomes, and listening to patients’ subjective opinions (patients’ voices) can make a major contribution to improving safety management. Recent progress in deep learning technologies has enabled various new approaches for the evaluation of safety-related events based on patient-generated text data, but few studies have focused on the improvement of real-time safety monitoring for individual patients. In addition, no study has yet been performed to validate deep learning models for screening patients’ narratives for clinically important adverse event signals that require medical intervention. In our previous work, novel deep learning models have been developed to detect adverse event signals for hand-foot syndrome or adverse events limiting patients’ daily lives from the authored narratives of patients with cancer, aiming ultimately to use them as safety monitoring support tools for individual patients.

Objective: This study was designed to evaluate whether our deep learning models can screen clinically important adverse event signals that require intervention by health care professionals. The applicability of our deep learning models to data on patients’ concerns at pharmacies was also assessed.

Methods: Pharmaceutical care records at community pharmacies were used for the evaluation of our deep learning models. The records followed the SOAP format, consisting of subjective (S), objective (O), assessment (A), and plan (P) columns. Because they uniquely combine patients’ concerns in the S column with the professional records of the pharmacists, they were considered suitable data for the present purpose. Our deep learning models were applied to the S records of patients with cancer, and the extracted adverse event signals were assessed in relation to medical actions and prescribed drugs.

Results: From 30,784 S records of 2479 patients with at least 1 prescription of anticancer drugs, our deep learning models extracted true adverse event signals with more than 80% accuracy for both hand-foot syndrome (n=152, 91%) and adverse events limiting patients’ daily lives (n=157, 80.1%). The deep learning models were also able to screen adverse event signals that require medical intervention by health care providers. Based on an analysis of the prescribed anticancer drugs, the extracted adverse event signals reflected the side effects of the drugs the patients were taking. “Pain or numbness” (n=57, 36.3%), “fever” (n=46, 29.3%), and “nausea” (n=40, 25.5%) were the most common symptoms among the true adverse event signals identified by the model for adverse events limiting patients’ daily lives.

Conclusions: Our deep learning models were able to screen clinically important adverse event signals that require intervention for the reported symptoms. It was also confirmed that these deep learning models could be applied to patients’ subjective information recorded in pharmaceutical care records accumulated during pharmacists’ daily work.

Introduction

Increasing numbers of people are expected to develop cancers in our aging society [ 1 - 3 ]. Thus, there is increasing interest in how to detect and manage the side effects of anticancer therapies in order to improve treatment regimens and patients’ quality of life [ 4 - 8 ]. The primary approaches for side effect management are “early signal detection and early intervention” [ 9 - 11 ]. Thus, more efficient approaches for this purpose are needed.

It has been recognized that patients’ voices concerning adverse events represent an important source of information. Several studies have indicated that the number, severity, and time of occurrence of adverse events might be underevaluated by physicians [ 12 - 15 ]. Thus, patient-reported outcomes (PROs) have recently received more attention in the drug evaluation process, reflecting patients’ real voices. Various kinds of PRO measures have been developed and investigated in different disease populations [ 16 , 17 ]. Health care authorities have also encouraged the pharmaceutical industry to use PROs for drug evaluation [ 18 , 19 ], and it is becoming more common to take PRO assessment results into consideration for drug marketing approval [ 20 , 21 ]. Similar trends can be seen in the clinical management of individual patients. Thus, health care professionals have an interest in understanding how to appropriately gather patients’ concerns in order to improve safety management and clinical decisions [ 22 - 24 ].

The applications of deep learning for natural language processing have expanded dramatically in recent years [ 25 ]. Since the development of a high-performance deep learning model in 2018 [ 26 ], attempts to apply cutting-edge deep learning models to various kinds of patient-generated text data for the evaluation of safety events or the analysis of unscalable subjective information from patients have been accelerating [ 27 - 31 ]. Most studies have been conducted to use patients’ narrative data for pharmacovigilance [ 27 , 32 - 35 ], while few have been aimed at improvement of real-time safety monitoring for individual patients. In addition, there have been some studies on adverse event severity grading based on health care records [ 36 - 39 ], but none has yet aimed to extract clinically important adverse event signals that require medical intervention from patients’ narratives. It is important to know whether deep learning models could contribute to the detection of such important adverse event signals from concern texts generated by individual patients.

To address this question, we previously developed deep learning models to detect adverse event signals from individual patients with cancer based on patients’ blog articles in online communities, building on earlier natural language processing work [ 40 , 41 ]. One deep learning model focused on the specific symptom of hand-foot syndrome (HFS), a typical side effect of anticancer treatments [ 42 ], and another focused on a broad range of adverse events that impact patients’ activities of daily living [ 43 ]. We showed that our models can achieve good performance scores in targeting adverse event signals. However, that evaluation relied on the same patients’ blog data used for model training, so further evaluation is needed to establish the validity and applicability of the models to other texts of patients’ concerns. In addition, the blog data source did not contain medical information, so it was not feasible to assess whether the models could contribute to the extraction of clinically important adverse event signals.

To address these challenges, we focused on pharmaceutical care records written by pharmacists at community pharmacies. The gold standard format for pharmaceutical care records in Japan is the SOAP (subjective, objective, assessment, plan)-based document that follows the “problem-oriented system” concept proposed by Weed [ 44 ] in 1968. Pharmacists track patients’ subjective concerns in the S column, provide objective information or observations in the O column, give their assessment from the pharmacist perspective in the A column, and suggest a plan for moving forward in the P column [ 45 , 46 ]. We considered that SOAP-based pharmaceutical care records could be a unique data source suitable for further evaluation of our deep learning models because they contain both patients’ concerns and professional health care records by pharmacists, including the medication prescription history with time stamps. Therefore, this study was designed to assess whether our deep learning models could extract clinically important adverse event signals that require intervention by medical professionals from these records. We also aimed to evaluate the characteristics of the models when applied to patients’ subjective information noted in the pharmaceutical care records, as there have been only a few studies on the application of deep learning models to patients’ concerns recorded during pharmacists’ daily work [ 47 - 49 ].

Here, we report the results of applying our deep learning models to patients’ concern text data in pharmaceutical care records, focusing on patients receiving anticancer treatment.

Data Source

The original data source was 2,276,494 pharmaceutical care records for 303,179 patients, created from April 2020 to December 2021 at community pharmacies belonging to the Nakajima Pharmacy Group in Japan [ 50 ]. To focus on patients with cancer, records of patients with at least 1 prescription for an anticancer drug were retrieved by sorting individual drug codes (YJ codes) used in Japan (YJ codes starting with 42 refer to anticancer drugs). Records in the S column (ie, S records) were collected from the patients with cancer as the text data of patients’ concerns for deep learning model analysis.
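As a rough illustration of this filtering step, the sketch below shows how such a cohort could be selected with pandas; the file and column names are hypothetical, since the paper does not describe the record schema in detail, but the YJ-code rule (codes starting with 42 denote anticancer drugs) is as stated above:

```python
import pandas as pd

# Hypothetical schema: one row per record, with the prescribed drug's
# YJ code, the SOAP column the text belongs to, and the text itself.
records = pd.read_csv("pharmaceutical_care_records.csv",
                      dtype={"yj_code": str})

# Patients with at least 1 anticancer prescription (YJ code prefix '42').
cancer_patients = records.loc[
    records["yj_code"].str.startswith("42"), "patient_id"
].unique()

# Keep all S records (patients' own concerns) for those patients.
s_records = records[
    records["patient_id"].isin(cancer_patients) & records["soap"].eq("S")
]
print(f"{len(s_records)} S records for "
      f"{s_records['patient_id'].nunique()} patients")
```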

Deep Learning Models

The deep learning models used for this research were those that we constructed based on patients’ narratives in blog articles posted in an online community and that showed the best performance score in each task in our previous work (ie, a Bidirectional Encoder Representations From Transformers [BERT]–based model for HFS signal extraction [ 42 ] and a T5-based model for adverse event signal extraction [ 43 ]). BERT [ 26 ] and T5 [ 51 ] both belong to a class of deep learning models that has recently shown high performance in several studies [ 29 , 52 ]. Hereafter, we refer to the deep learning model for HFS signals as the HFS model, the model for any adverse event signals as the All AE (ie, all or any adverse events) model, and the model for adverse event signals limited to patients’ activities of daily living as the AE-L (adverse events limiting patients’ daily lives) model. We also confirmed that these deep learning models showed similar or higher performance scores for the HFS, All AE, and AE-L identification tasks on 1000 S records randomly extracted from the data source of this study, compared with the values obtained in our previous work [ 42 , 43 ]. (The sentence-level performance scores from our previous work are comparable because the mean number of words per sentence in that data source, 32.7 [SD 33.9], is close to that of the S records used in this study, 38.8 [SD 29.4].) The method and results of the performance check are described in detail in Multimedia Appendix 1 [ 42 , 43 ]. We applied the deep learning models to all text data in this study without any adjustment of the parameter settings used when constructing them from patient-authored texts in our previous work [ 42 , 43 ].
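The trained checkpoints are not published with the paper, but applying a fine-tuned BERT-style binary classifier to a batch of S records would look roughly like the sketch below (Hugging Face transformers; the model path is a placeholder and the example sentences are invented):

```python
from transformers import pipeline

# 'path/to/hfs-model' stands in for a BERT-based classifier fine-tuned on
# annotated patient texts, as in the authors' earlier work.
hfs_classifier = pipeline("text-classification", model="path/to/hfs-model")

s_texts = [
    "The skin on my palms is red and it hurts when I walk.",
    "No diarrhea, mouth ulcers, or limb pain so far.",
]
for text, pred in zip(s_texts, hfs_classifier(s_texts)):
    # Each prediction is a dict with a label and a confidence score.
    print(f"{pred['label']:>12} ({pred['score']:.2f})  {text}")
```

The second example is the kind of symptom-denial text that, as reported in the Results, tended to produce false positives.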

Evaluation of Extracted S Records by the Deep Learning Models

In this study, we focused on the evaluation of S records that our deep learning models extracted as HFS or AE-L positive. Each positive S record was assessed for whether it was a true adverse event signal, the type of adverse event symptom, and whether or not an intervention was made by health care professionals. We also investigated the kind of anticancer treatment prescribed in connection with each adverse event signal identified in S records.

To assess whether an extracted positive S record was a true adverse event signal, we used the same annotation guidelines as in our previous work [ 43 ]. In brief, each S record was treated as an “adverse event signal” if any untoward medical occurrence happened to the patient, regardless of the cause. For the AE-L model only, if a positive S record was confirmed as an adverse event signal, it was further categorized into 1 or more of the following adverse event symptoms: “fatigue,” “nausea,” “vomiting,” “diarrhea,” “constipation,” “appetite loss,” “pain or numbness,” “rash or itchy,” “hair loss,” “menstrual irregularity,” “fever,” “taste disorder,” “dizziness,” “sleep disorder,” “edema,” or “others.”

For the assessment of interventions by health care professionals and anticancer treatment prescriptions, information from the O, A, and P columns and the drug prescription history in the data source were investigated for the extracted positive S records. Interventions by health care professionals were categorized into one of the following: “adding symptomatic treatment for the adverse event signal,” “dose reduction or discontinuation of causative anticancer treatment,” “consultation with physician,” “others,” or “no intervention (ie, just following up the adverse event signal).” Actions categorized as “others” were further evaluated individually. For this assessment, we also randomly extracted 200 S records and evaluated them in the same way for comparison with the results from the deep learning models. The prescription history of anticancer treatment was analyzed by primary category of mechanism of action (MoA), with subcategories where applicable (eg, target molecule for kinase inhibitors).
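The drug-history analysis is essentially a join-and-count. A minimal pandas sketch (with invented codes and a hypothetical YJ-code-to-MoA lookup; the paper does not publish its mapping) might look like this:

```python
import pandas as pd

# Hypothetical prescriptions (one row per patient-drug pair) and a
# lookup from YJ code to mechanism-of-action (MoA) class.
prescriptions = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "yj_code": ["4223001", "4291012", "4229005", "4223001"],
})
moa_lookup = {
    "4223001": "antimetabolite (fluoropyrimidine)",
    "4291012": "kinase inhibitor (EGFR)",
    "4229005": "aromatase inhibitor",
}
positive_patients = {1, 3}  # patients with a positive S record

moa_counts = (
    prescriptions[prescriptions["patient_id"].isin(positive_patients)]
    .assign(moa=lambda d: d["yj_code"].map(moa_lookup))
    .groupby("moa")["patient_id"]
    .nunique()                      # count patients, not prescriptions
    .sort_values(ascending=False)
)
print(moa_counts)
```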

Applicability Check to Other Text Data Including Patients’ Concerns

To check the applicability of our deep learning models to data from a different source, interview transcripts from patients with cancer were also evaluated. The interview transcripts were created by the Database of Individual Patient Experiences-Japan (DIPEx-Japan) [ 53 ]. DIPEx-Japan divides the interview transcripts into sections for each topic, such as “onset of disease” and “treatment,” and posts the processed texts on its website. Processing is conducted by accredited researchers based on qualitative research methods established by the University of Oxford [ 54 ]. In this study, interview text data created from interviews with 52 patients with breast cancer conducted from January 2008 to October 2018 were used to assess whether our deep learning models can extract adverse event signals from this source. In total, 508 interview transcripts were included with the approval of DIPEx-Japan.

Ethical Considerations

This study was conducted with anonymized data following approval by the ethics committee of the Keio University Faculty of Pharmacy (210914-1 and 230217-1) and in accordance with relevant guidelines and regulations and the Declaration of Helsinki. Informed consent specific to this study was waived due to the retrospective observational design of the study with the approval of the ethics committee of the Keio University Faculty of Pharmacy. To respect the will of each individual stakeholder, however, we provided patients and pharmacists of the pharmacy group with an opportunity to refuse the sharing of their pharmaceutical care records by posting an overview of this study at each pharmacy store or on their web page regarding the analysis using pharmaceutical care records. Interview transcripts from DIPEx-Japan were provided through a data sharing arrangement for using narrative data for research and education. Consent for interview transcription and its sharing from DIPEx-Japan was obtained from the participants when the interviews were recorded.

From the original data source of 2,180,902 pharmaceutical care records for 291,150 patients, S records written by pharmacists for patients with a history of at least 1 prescription of an anticancer drug were extracted. This yielded 30,784 S records for 2479 patients with cancer ( Table 1 ). The mean and median number of words in the S records were 38.8 (SD 29.4) and 32 (IQR 20-50), respectively. We applied our deep learning models, HFS, All AE, and AE-L, to these 30,784 S records for the evaluation of the deep learning models for adverse event signal detection.

For interview transcripts created by DIPEx-Japan, the mean and median number of words were 428.9 (SD 160.9) and 416 (IQR 308-526), respectively, in the 508 transcripts for 52 patients with breast cancer.
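Descriptive statistics like these are straightforward to reproduce. The sketch below computes them for a toy list of records; for the Japanese source texts the authors would need a morphological analyzer such as MeCab to count words, so the whitespace split here is only a stand-in:

```python
import numpy as np

# Toy English stand-ins; real S records are Japanese and need tokenizing.
texts = ["sample record one", "a slightly longer sample record here", "ok"]
counts = np.array([len(t.split()) for t in texts])

q1, median, q3 = np.percentile(counts, [25, 50, 75])
print(f"mean {counts.mean():.1f} (SD {counts.std(ddof=1):.1f}), "
      f"median {median:.0f} (IQR {q1:.0f}-{q3:.0f})")
```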


Application of the HFS Model

First, we applied the HFS model to the S records for patients with cancer. The BERT-based model was used for this research as it showed the best performance score in our previous work [ 42 ].

S Records Extracted as HFS Positive

The S records extracted as HFS positive by the HFS model ( Table 2 ) amounted to 167 (0.5%) records for 119 (4.8%) patients. A majority of the patients had 1 HFS-positive record in their S records (n=91, 76.5%), while 2 (1.7%) patients had as many as 6 HFS-positive records. When we examined whether the extracted S records were true adverse event signals, 152 records were confirmed to be adverse event signals, while the other 15 were false positives. All the false-positive S records were descriptions of the absence of symptoms or confirmation of an improving condition (eg, “no diarrhea, mouth ulcers, or limb pain so far” or “the skin on the soles of my feet has calmed down a lot with this ointment”). Some examples of S records that were predicted as HFS positive by the model are shown in Table S1 in Multimedia Appendix 2 .

The same examination was conducted with interview transcripts from DIPEx-Japan. Only 1 (0.2%) transcript was extracted as HFS positive by the HFS model, and it was a true adverse event signal (100%). The actual transcript extracted as HFS positive is shown in Table S2 in Multimedia Appendix 2 .


Interventions by Health Care Professionals

The 167 S records extracted as HFS positive as well as 200 randomly selected records were checked for interventions by health care professionals ( Figure 1 ). The proportion showing any action by health care professionals was 64.1% for 167 HFS-positive S records compared to 13% for the 200 random S records. Among the actions taken for HFS positives, “adding symptomatic treatment” was the most common, accounting for around half (n=79, 47.3%), followed by “other” (n=18, 10.8%). Most “other” actions were educational guidance from pharmacists, such as instructions on moisturizing, nail care, or application of ointment and advice on daily living (eg, “avoid tight socks”).


Anticancer Drugs Prescribed

The types of anticancer drugs prescribed for HFS-positive patients are summarized based on the prescription histories in Table 3 . For the 152 adverse event signals identified by the HFS model in the previous section, the most common MoA class of anticancer drugs used by the patients was antimetabolite (n=62, 40.8%), specifically fluoropyrimidines (n=59, 38.8%). Kinase inhibitors were next (n=49, 32.2%), with epidermal growth factor receptor (EGFR) inhibitors and multikinase inhibitors as the major subgroups (n=28, 18.4% and n=14, 9.2%, respectively). The third and fourth most common MoA classes were aromatase inhibitors (n=24, 15.8%) and antiandrogen or antiestrogen drugs (n=7, 4.6% each), used for hormone therapy.


Application of the All AE or AE-L model

The All AE and AE-L models were also applied to the same S records for patients with cancer. The T5-based model was used for this research as it gave the best performance score in our previous work [ 43 ].

S Records Extracted as All AE or AE-L Positive

The numbers of S records extracted as positive were 7604 (24.7%) for 1797 patients and 196 (0.6%) for 142 patients for All AE and AE-L, respectively. In the case of All AE, patients tended to have multiple adverse event positives in their S records (n=1315, 73.2% of patients had at least 2 positives). In the case of AE-L, most patients had only 1 AE-L positive (n=104, 73.2%), and the largest number of AE-L positives for 1 patient was 4 (2.8%; Table 4 ).

We focused on the AE-L evaluation due to its greater importance from a medical viewpoint and the lower workload for manual assessment, considering the number of positive S records. Of the 196 AE-L–positive S records, 157 (80.1%) were confirmed to accurately extract adverse event signals, while 39 (19.9%) were false positives that did not include any adverse event signals ( Table 4 ). The 39 false positives were all descriptions of the absence of symptoms or confirmation of an improving condition, showing a similar tendency to the HFS false positives (eg, “The diarrhea has calmed down so far. Symptoms in hands and feet are currently fine” and “No symptoms for the following: upset in stomach, diarrhea, nausea, abdominal pain or stomach cramps, constipation”). Examples of S records that were predicted as AE-L positive are shown in Table S3 in Multimedia Appendix 2 .

The deep learning models were also applied to interview transcripts from DIPEx-Japan in the same manner. They identified 84 (16.5%) and 18 (3.5%) transcripts as All AE or AE-L positive, respectively. Of the 84 All AE–positive transcripts, 73 (86.9%) were true adverse event signals. The All AE false positives (n=11, 13.1%) fell into 1 of the following 3 types: explanations about the disease or its prognosis, stories about when the cancer was discovered, or emotional changes without clear adverse event mentions. With regard to AE-L, all 18 (100%) positives were true adverse event signals (Table S4 in Multimedia Appendix 2 ). Examples of actual transcripts extracted as All AE or AE-L positive are shown in Table S5 in Multimedia Appendix 2 .


Whether or not interventions were made by health care professionals was investigated for the 196 AE-L–positive S records. As in the HFS model evaluation, data from 200 randomly selected S records were used for comparison ( Figure 2 ). In total, 91 (46.4%) records in the 196 AE-L–positive records were accompanied by an intervention, while the corresponding figure in the 200 random records was 26 (13%) records. The most common action in response to adverse event signals identified by the AE-L model was “adding symptomatic treatment” (n=71, 36.2%), followed by “other” (n=11, 5.6%). “Other” included educational guidance from pharmacists, inquiries from pharmacists to physicians, or recommendations for patients to visit a doctor.


The types of anticancer drugs prescribed for patients with adverse event signals identified by the AE-L model were summarized based on the prescription histories ( Table 5 ). For the 157 adverse event signals, the most common MoA class was antimetabolite (n=62, 39.5%), mostly fluoropyrimidines (n=53, 33.8%). Kinase inhibitors (n=31, 19.7%) were the next largest category, with multikinase inhibitors (n=14, 8.9%) as the major subgroup. These were followed by antiandrogens (n=27, 17.2%), antiestrogens (n=10, 6.4%), and aromatase inhibitors (n=10, 6.4%), used for hormone therapy.


Adverse Event Symptoms

For the 157 adverse event signals identified by the AE-L model, the symptoms were categorized according to the predefined guideline in our previous work [ 43 ]. “Pain or numbness” (n=57, 36.3%) accounted for the largest proportion followed by “fever” (n=46, 29.3%) and “nausea” (n=40, 25.5%; Table 6 ). Symptoms classified as “others” included chills, tinnitus, running tears, dry or peeling skin, and frequent urination. When comparing the proportion of the symptoms associated with or without interventions by health care professionals, a trend toward a greater proportion of interventions was observed in “fever,” “nausea,” “diarrhea,” “constipation,” “vomiting,” and “edema” ( Figure 3 , black boxes). On the other hand, a smaller proportion was observed in “pain or numbness,” “fatigue,” “appetite loss,” “rash or itchy,” “taste disorder,” and “dizziness” ( Figure 3 , gray boxes).


This study was designed to evaluate our deep learning models, previously constructed based on patient-authored texts posted in an online community, by applying them to pharmaceutical care records that contain both patients’ subjective concerns and medical information created by pharmacists. Based on the results, we discuss whether these deep learning models can extract clinically important adverse event signals that require medical intervention, and what characteristics they show when applied to data on patients’ concerns in pharmaceutical care records.

Performance for Adverse Event Signal Extraction

The first requirement for the deep learning models is to extract adverse event signals from patients’ narratives precisely. In this study, we evaluated the proportion of true adverse event signals among the positive S records extracted by the HFS or AE-L model. True adverse event signals amounted to 152 (91%) and 157 (80.1%) for the HFS and AE-L models, respectively ( Tables 2 and 4 ). Given that the proportion of true adverse event signals in 200 randomly extracted S records without deep learning models was 54 (27%; categories other than “no adverse event” in Figures 1 and 2 ), the HFS and AE-L models were able to enrich S records with adverse event mentions. Although 15 (9%) records for the HFS model and 39 (19.9%) for the AE-L model were false positives, it was confirmed that all of the false-positive records described a lack of symptoms or confirmation of an improving condition. We consider that such false positives are due to a unique feature of pharmaceutical care records: pharmacists may proactively interview patients about potential side effects of their medications. As the blog data set we used to construct the deep learning models included few such cases (especially comments on a lack of symptoms), our models seemed unable to exclude them correctly. Even though the proportion of true “adverse event” signals extracted from the S records by the HFS or AE-L model was more than 80%, the performance scores for extracting true “HFS” or “AE-L” signals were not as high in the performance check using 1000 randomly extracted S records ( F 1 -scores were 0.50 and 0.22 for true HFS and AE-L signals, respectively; Table S1 in Multimedia Appendix 1 ). We consider that the performance in extracting true HFS and AE-L signals was relatively low because of the short length of the texts in the S records, which provides less context for judging the impact on patients’ daily lives, especially for the AE-L model (the mean word count of the S records was 38.8 [SD 29.4; Table 1 ], similar to the sentence-level tasks in our previous work [ 42 , 43 ]). However, we consider a true adverse event signal proportion of more than 80% a promising outcome, as this is the first attempt to apply our deep learning models to a different source of patients’ concern data, and the extracted positive cases would be worth evaluation by a medical professional, as the potential adverse events could be caused by drugs taken by the patients.
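For reference, the sentence-level scores cited above are standard precision/recall/F1 computations over gold annotations; a minimal scikit-learn sketch (with made-up labels, not the study's data) is:

```python
from sklearn.metrics import precision_recall_fscore_support

# Invented gold labels vs model predictions (1 = true AE-L signal).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average="binary")
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# A modest F1 on the narrow AE-L label can coexist with a high proportion
# of true adverse events among the records flagged positive, as above.
```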

When the deep learning models were applied to DIPEx-Japan interview transcripts containing patients’ concerns, the proportion of true adverse event signals was also more than 80% (All AE: n=73, 86.9%; HFS: n=1, 100%; AE-L: n=18, 100%). The difference between the pharmaceutical care S records and the DIPEx-Japan interview transcripts lay in the nature of the false positives: descriptions of absent or improving symptoms in the S records versus explanations about the disease or its prognosis, stories about when the cancer was discovered, or emotional changes in the interview transcripts. We attribute this to the differing nature of the data sources: the pharmaceutical care records were generated in real time by pharmacists through their daily work, where adverse event signals are proactively monitored, whereas the interview transcripts were based purely on patients’ retrospective memories. Our deep learning models were able to extract true adverse event signals with an accuracy of more than 80% from both text data sources despite this difference. Looking toward future implementation of the deep learning models in society (discussed in the Potential for Deep Learning Model Implementation in Society section), it may be desirable to further adjust the models to reduce false positives depending on the features of the data source.

Identification of Important Adverse Events Requiring Medical Intervention

To assess whether the models could extract clinically important adverse event signals, we investigated interventions by health care professionals connected with the adverse event signals identified by our deep learning models. Of the 200 randomly extracted S records, only 26 (13%) contained adverse event signals leading to any intervention by health care professionals. In contrast, the proportion of signals associated with interventions increased to 107 (64.1%) and 91 (46.4%) in the S records extracted as positive by the HFS and AE-L models, respectively ( Figures 1 and 2 ). These results suggest that both deep learning models can screen clinically important adverse event signals that require intervention from health care professionals. The performance in screening adverse event signals requiring medical intervention was higher for the HFS model than for the AE-L model (n=107, 64.1% vs n=91, 46.4%; Figures 1 and 2 ). Since the target events were specific and narrowly defined (HFS is a typical side effect of some anticancer drugs), we consider that health care providers paid special attention to HFS-related signals and took action proactively. For both deep learning models, similar trends were observed in the actions taken by health care professionals in response to extracted adverse event signals; common actions were attempts to manage adverse event symptoms by symptomatic treatment or other mild interventions, including educational guidance from pharmacists or recommendations for patients to visit a doctor. More direct interventions focused on the causative drugs (ie, “dose reduction or discontinuation of anticancer treatment”) amounted to less than 5%: 7 (4.2%) for the HFS model and 6 (3.1%) for the AE-L model ( Figures 1 and 2 ). Thus, it appears that our deep learning models can contribute to screening mild to moderate adverse event signals that require preventive actions such as symptomatic treatments or professional advice from health care providers, especially for patients who are less sensitive to adverse event signals or who have few opportunities to visit clinics and pharmacies.

Ability to Catch Real Side Effect Signals of Anticancer Drugs

Based on the drug prescription histories associated with S records extracted as HFS or AE-L positive, we investigated the types and durations of anticancer drugs taken by patients experiencing the adverse event signals. For the HFS model, the most common MoA class was antimetabolite (fluoropyrimidine: n=59, 38.8%), followed by kinase inhibitors (n=49, 32.2%, of which EGFR inhibitors and multikinase inhibitors accounted for n=28, 18.4% and n=14, 9.2%, respectively) and aromatase inhibitors (n=24, 15.8%; Table 3 ). Fluoropyrimidines and multikinase inhibitors are known to be typical HFS-inducing drugs [ 55 - 58 ], suggesting that the HFS model accurately extracted HFS side effect signals derived from these drugs. Note that symptoms such as acneiform rash, xerosis, eczema, paronychia, nail changes, arthralgia, or stiffness of limb joints, which are common side effects of EGFR inhibitors or aromatase inhibitors [ 59 , 60 ], might have been extracted because their expressions closely resemble those of HFS signals. Looking at the MoA of anticancer drugs for patients with adverse event signals identified by the AE-L model, antimetabolite (fluoropyrimidine) was again the most common (n=53, 33.8%), followed by kinase inhibitors (n=31, 19.7%) and antiandrogens (n=27, 17.2%; Table 5 ). Since the AE-L model targets a broad range of adverse event symptoms, it is difficult to rationalize the relationship between the adverse event signals and the types of anticancer drugs. However, the types of anticancer drugs presumably correspond closely to the standard treatments for the patients’ cancer types. Based on the prescribed anticancer drugs, we can infer that a large percentage of the patients had breast or lung cancer, indicating that our study results were based on data from such a population. Thus, a possible direction for expanding this research would be to adjust the deep learning models by additional training with expressions for typical side effects associated with standard treatments of other cancer types. To interpret these results correctly, it should be noted that we could not investigate anticancer treatments conducted outside of the pharmacies (eg, the time-course relationship with intravenously administered drugs would be missed, as administration takes place at hospitals). To further evaluate how useful this model is for side effect signal monitoring in patients with cancer, comprehensive medical information for the eligible patients would be required.

Suitability of the Deep Learning Models for Specific Adverse Event Symptoms

Among the adverse event signals identified by the AE-L model, the type of symptom was categorized according to a predefined annotation guideline that we previously developed [ 43 ]. The most frequently recorded adverse event signals were “pain or numbness” (n=57, 36.3%), followed by “fever” (n=46, 29.3%) and “nausea” (n=40, 25.5%; Table 6 ). Symptoms classified as “others” included chills, tinnitus, watery eyes, dry or peeling skin, and frequent urination. Since the pharmaceutical care records contained information about interventions by health care professionals, the frequency of the presence or absence of interventions for each symptom was examined. A trend toward a greater proportion of interventions was observed for “fever,” “nausea,” “diarrhea,” “constipation,” “vomiting,” and “edema” ( Figure 3 , black boxes). There seem to be 2 possible explanations: these symptoms are of high importance and require early medical intervention, or effective symptomatic treatments are available for them in clinical practice, making medical intervention an easy option. On the other hand, a smaller proportion of adverse event signals resulted in interventions for “pain or numbness,” “fatigue,” “appetite loss,” “rash or itchy,” “taste disorder,” and “dizziness” ( Figure 3 , gray boxes). The reason may be the lack of effective symptomatic treatments or the difficulty of judging whether the severity of these symptoms justifies medical intervention by health care providers. In either case, there may be room for improvement in the quality of medical care for these symptoms. We expect that our research will contribute to improving the quality of safety monitoring in clinical practice by supporting adverse event signal detection in a cost-effective manner.
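The symptom-by-intervention comparison behind Figure 3 amounts to a normalized cross-tabulation; a small pandas sketch with invented rows:

```python
import pandas as pd

# Invented per-signal rows: symptom category and whether any intervention
# by a health care professional followed.
signals = pd.DataFrame({
    "symptom": ["fever", "fever", "nausea", "pain or numbness",
                "pain or numbness", "fatigue"],
    "intervention": [True, True, True, False, True, False],
})

# Proportion of signals with vs without intervention, per symptom.
proportions = pd.crosstab(
    signals["symptom"], signals["intervention"], normalize="index"
).rename(columns={True: "with intervention", False: "no intervention"})
print(proportions.round(2))
```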

Potential for Deep Learning Model Implementation in Society

Although we evaluated our deep learning models using pharmaceutical care records in this study, the main target of future implementation of our deep learning models in society would be narrative texts that patients write directly to record their daily experiences. For example, applying these deep learning models to electronic media where patients record their daily experiences of living with disease (eg, health care–related e-communities and disease diary applications) could enable information about adverse event signal onset to be provided to health care providers in a timely manner. Adverse event signals could automatically be identified and shared with health care providers based on the concern texts that patients post to any platform. Such a system would have the advantage that health care providers can efficiently grasp safety-related events that patients experience outside of clinic visits, allowing more focused or personalized interactions with patients at their clinic visits. However, consideration should be given to avoiding an excessive burden on health care providers. For instance, limiting the sharing of adverse event signals to those of high severity, or summarizing adverse event signals over a week rather than sharing each one in real time, may be reasonable approaches for medical staff. We also need to consider how to encourage patients to record their daily experiences using electronic tools. Not only technical progress and support but also the establishment of an ecosystem in which both patients and medical staff can feel a benefit will be required. Prospective studies with deep learning models to follow up patients in the long term and evaluate outcomes will be needed. We primarily considered patient-authored texts as targets for implementation, but our deep learning models may also be worth applying to medical data that include patients’ subjective concerns, such as pharmaceutical care S records. As this study confirmed that our deep learning models are applicable to patients’ concern texts tracked by pharmacists, it should be possible to use them to analyze other “patient voice-like” medical text data that have not been actively investigated so far.

Limitations

First, the major limitation of this study was that we were not able to collect complete medical information for the patients. Although we designed this study to analyze patients’ concerns extracted by the deep learning models and their relationship with the medical information contained in the pharmaceutical care records, some information could not be tracked (eg, history of medical interventions or anticancer treatment at hospitals, as well as diagnoses of patients’ primary cancers). Second, there might be a data creation bias in the S records of patients’ concerns written by pharmacists; for example, symptoms that have little impact on intervention decisions might be less likely to be recorded. It should also be noted that the characteristics of S records may not be consistent across different community pharmacies.

Conclusions

Our deep learning models were able to screen clinically important adverse event signals that require intervention by health care professionals from patients’ concerns in pharmaceutical care records. Thus, these models have the potential to support real-time adverse event monitoring of individual patients taking anticancer treatments in an efficient manner. We also confirmed that these deep learning models constructed based on patient-authored texts could be applied to patients’ subjective information recorded by pharmacists through their daily work. Further research may help to expand the applicability of the deep learning models for implementation in society or for analysis of data on patients’ concerns accumulated in professional records at pharmacies or hospitals.

Acknowledgments

This work was supported by Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research (KAKENHI; grant 21H03170) and Japan Science and Technology Agency, Core Research for Evolutional Science and Technology (CREST; grant JPMJCR22N1), Japan. Mr Yuki Yokokawa and Ms Sakura Yokoyama at our laboratory advised SN about the structure of pharmaceutical care records. This study would not have been feasible without the high quality of pharmaceutical care records created by many individual pharmacists at Nakajima Pharmacy Group through their daily work.

Data Availability

The data sets generated and analyzed during this study are available from the corresponding author on reasonable request.

Authors' Contributions

SN and SH designed the study. SN retrieved the subjective records of patients with cancer from the data source for the application of deep learning models and organized other data for subsequent evaluations. SN ran the deep learning models with the support of SW. SN, YY, and KS checked the adverse event signals for each subjective record that was extracted as positive by the models for hand-foot syndrome or adverse events limiting patients’ daily lives and evaluated the adverse event signal symptoms, details of interventions taken by health care professionals, and types of anticancer drugs prescribed for patients based on available data from the data source. HK and SI advised on the study concept and process. MS and RT provided pharmaceutical records at their community pharmacies along with advice on how to use and interpret them. SY and EA supervised the natural language processing research as specialists. SH supervised the study overall. SN drafted and finalized the paper. All authors reviewed and approved the paper.

Conflicts of Interest

SN is an employee of Daiichi Sankyo Co, Ltd. All other authors declare no conflicts of interest.

Multimedia Appendix 1: Performance evaluation of deep learning models.

Multimedia Appendix 2: Examples of S records and sample interview transcripts.

  1. Global cancer observatory: cancer over time. World Health Organization. URL: https://gco.iarc.fr/overtime/en [accessed 2023-07-02]
  2. Mattiuzzi C, Lippi G. Current cancer epidemiology. J Epidemiol Glob Health. 2019;9(4):217-222.
  3. Montazeri F, Komaki H, Mohebi F, Mohajer B, Mansournia MA, Shahraz S, et al. Editorial: disparities in cancer prevention and epidemiology. Front Oncol. 2022;12:872051.
  4. Lasala R, Santoleri F. Association between adherence to oral therapies in cancer patients and clinical outcome: a systematic review of the literature. Br J Clin Pharmacol. 2022;88(5):1999-2018.
  5. Pudkasam S, Polman R, Pitcher M, Fisher M, Chinlumprasert N, Stojanovska L, et al. Physical activity and breast cancer survivors: importance of adherence, motivational interviewing and psychological health. Maturitas. 2018;116:66-72.
  6. Markman M. Chemotherapy-associated neurotoxicity: an important side effect-impacting on quality, rather than quantity, of life. J Cancer Res Clin Oncol. 1996;122(9):511-512.
  7. Jitender S, Mahajan R, Rathore V, Choudhary R. Quality of life of cancer patients. J Exp Ther Oncol. 2018;12(3):217-221.
  8. Di Nardo P, Lisanti C, Garutti M, Buriolla S, Alberti M, Mazzeo R, et al. Chemotherapy in patients with early breast cancer: clinical overview and management of long-term side effects. Expert Opin Drug Saf. 2022;21(11):1341-1355.
  9. Cuomo RE. Improving cancer patient outcomes and cost-effectiveness: a Markov simulation of improved early detection, side effect management, and palliative care. Cancer Invest. 2023;41(10):858-862.
  10. Pulito C, Cristaudo A, La Porta C, Zapperi S, Blandino G, Morrone A, et al. Oral mucositis: the hidden side of cancer therapy. J Exp Clin Cancer Res. 2020;39(1):210.
  11. Bartal A, Mátrai Z, Szûcs A, Liszkay G. Main treatment and preventive measures for hand-foot syndrome, a dermatologic side effect of cancer therapy. Magy Onkol. 2011;55(2):91-98.
  12. Basch E, Jia X, Heller G, Barz A, Sit L, Fruscione M, et al. Adverse symptom event reporting by patients vs clinicians: relationships with clinical outcomes. J Natl Cancer Inst. 2009;101(23):1624-1632.
  13. Basch E. The missing voice of patients in drug-safety reporting. N Engl J Med. 2010;362(10):865-869.
  14. Fromme EK, Eilers KM, Mori M, Hsieh YC, Beer TM. How accurate is clinician reporting of chemotherapy adverse effects? A comparison with patient-reported symptoms from the Quality-of-Life Questionnaire C30. J Clin Oncol. 2004;22(17):3485-3490.
  15. Liu L, Suo T, Shen Y, Geng C, Song Z, Liu F, et al. Clinicians versus patients subjective adverse events assessment: based on patient-reported outcomes version of the common terminology criteria for adverse events (PRO-CTCAE). Qual Life Res. 2020;29(11):3009-3015.
  16. Churruca K, Pomare C, Ellis LA, Long JC, Henderson SB, Murphy LED, et al. Patient-reported outcome measures (PROMs): a review of generic and condition-specific measures and a discussion of trends and issues. Health Expect. 2021;24(4):1015-1024.
  17. Pérez-Alfonso KE, Sánchez-Martínez V. Electronic patient-reported outcome measures evaluating cancer symptoms: a systematic review. Semin Oncol Nurs. 2021;37(2):151145.
  18. Patient-reported outcome measures: use in medical product development to support labeling claims. U.S. Food & Drug Administration. 2009. URL: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/patient-reported-outcome-measures-use-medical-product-development-support-labeling-claims [accessed 2023-11-26]
  19. Appendix 2 to the guideline on the evaluation of anticancer medicinal products in man—the use of patient-reported outcome (PRO) measures in oncology studies—scientific guideline. European Medicines Agency. 2016. URL: https://www.ema.europa.eu/en/appendix-2-guideline-evaluation-anticancer-medicinal-products-man-use-patient-reported-outcome-pro [accessed 2023-11-26]
  20. Weber SC. The evolution and use of patient-reported outcomes in regulatory decision making. RF Q. 2023;3(1):4-9.
  21. Teixeira MM, Borges FC, Ferreira PS, Rocha J, Sepodes B, Torre C. A review of patient-reported outcomes used for regulatory approval of oncology medicinal products in the European Union between 2017 and 2020. Front Med (Lausanne). 2022;9:968272.
  22. Newell S, Jordan Z. The patient experience of patient-centered communication with nurses in the hospital setting: a qualitative systematic review protocol. JBI Database System Rev Implement Rep. 2015;13(1):76-87.
  23. Yagasaki K, Takahashi H, Ouchi T, Yamagami J, Hamamoto Y, Amagai M, et al. Patient voice on management of facial dermatological adverse events with targeted therapies: a qualitative study. J Patient Rep Outcomes. 2019;3(1):27.
  24. Giardina TD, Korukonda S, Shahid U, Vaghani V, Upadhyay DK, Burke GF, et al. Use of patient complaints to identify diagnosis-related safety concerns: a mixed-method evaluation. BMJ Qual Saf. 2021;30(12):996-1001.
  25. Lauriola I, Lavelli A, Aiolli F. An introduction to deep learning in natural language processing: models, techniques, and tools. Neurocomputing. 2022;470:443-456.
  26. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. 2019. Presented at: 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; June 2-7, 2019; Minneapolis, MN, USA. p. 4171-4186.
  27. Dreisbach C, Koleck TA, Bourne PE, Bakken S. A systematic review of natural language processing and text mining of symptoms from electronic patient-authored text data. Int J Med Inform. 2019;125:37-46.
  28. Sim JA, Huang X, Horan MR, Stewart CM, Robison LL, Hudson MM, et al. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: a systematic review. Artif Intell Med. 2023;146:102701.
  29. Weissenbacher D, Banda JM, Davydova V, Estrada-Zavala D, Gascó Sánchez L, Ge Y, et al. Overview of the seventh social media mining for health applications (#SMM4H) shared tasks at COLING 2022. 2022. Presented at: Proceedings of the Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task; October 12-17, 2022; Gyeongju, Republic of Korea. p. 221-241. URL: https://aclanthology.org/2022.smm4h-1.54/
  30. Matsuda S, Ohtomo T, Okuyama M, Miyake H, Aoki K. Estimating patient satisfaction through a language processing model: model development and evaluation. JMIR Form Res. 2023;7(1):e48534.
  31. Yu D, Vydiswaran VGV. An assessment of mentions of adverse drug events on social media with natural language processing: model development and analysis. JMIR Med Inform. 2022;10(9):e38140.
  32. Liu X, Chen H. A research framework for pharmacovigilance in health social media: identification and evaluation of patient adverse drug event reports. J Biomed Inform. 2015;58:268-279.
  33. Nikfarjam A, Sarker A, O'Connor K, Ginn R, Gonzalez G. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J Am Med Inform Assoc. 2015;22(3):671-681.
  • Kakalou C, Dimitsaki S, Dimitriadis VK, Natsiavas P. Exploiting social media for active pharmacovigilance: the PVClinical social media workspace. Stud Health Technol Inform. 2022;290:739-743. [ CrossRef ] [ Medline ]
  • Bousquet C, Dahamna B, Guillemin-Lanne S, Darmoni SJ, Faviez C, Huot C, et al. The adverse drug reactions from patient reports in social media project: five major challenges to overcome to operationalize analysis and efficiently support pharmacovigilance process. JMIR Res Protoc. 2017;6(9):e179. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Young IJB, Luz S, Lone N. A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis. Int J Med Inform. 2019;132:103971. [ CrossRef ] [ Medline ]
  • Jacobsson R, Bergvall T, Sandberg L, Ellenius J. Extraction of adverse event severity information from clinical narratives using natural language processing. Pharmacoepidemiol Drug Saf. 2017;26(S2):37. [ FREE Full text ]
  • Liang C, Gong Y. Predicting harm scores from patient safety event reports. Stud Health Technol Inform. 2017;245:1075-1079. [ CrossRef ] [ Medline ]
  • Jiang G, Wang L, Liu H, Solbrig HR, Chute CG. Building a knowledge base of severe adverse drug events based on AERS reporting data using semantic web technologies. Stud Health Technol Inform. 2013;192(1-2):496-500. [ CrossRef ] [ Medline ]
  • Usui M, Aramaki E, Iwao T, Wakamiya S, Sakamoto T, Mochizuki M. Extraction and standardization of patient complaints from electronic medication histories for pharmacovigilance: natural language processing analysis in Japanese. JMIR Med Inform. 2018;6(3):e11021. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Watanabe T, Yada S, Aramaki E, Yajima H, Kizaki H, Hori S. Extracting multiple worries from breast cancer patient blogs using multilabel classification with the natural language processing model bidirectional encoder representations from transformers: infodemiology study of blogs. JMIR Cancer. 2022;8(2):e37840. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nishioka S, Watanabe T, Asano M, Yamamoto T, Kawakami K, Yada S, et al. Identification of hand-foot syndrome from cancer patients' blog posts: BERT-based deep-learning approach to detect potential adverse drug reaction symptoms. PLoS One. 2022;17(5):e0267901. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nishioka S, Asano M, Yada S, Aramaki E, Yajima H, Yanagisawa Y, et al. Adverse event signal extraction from cancer patients' narratives focusing on impact on their daily-life activities. Sci Rep. 2023;13(1):15516. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Weed LL. Medical records that guide and teach. N Engl J Med. 1968;278(11):593-600. [ CrossRef ] [ Medline ]
  • Podder V, Lew V, Ghassemzadeh S. SOAP Notes. Treasure Island, FL. StatPearls Publishing; 2023.
  • Shenavar Masooleh I, Ramezanzadeh E, Yaseri M, Sahere Mortazavi Khatibani S, Sadat Fayazi H, Ali Balou H, et al. The effectiveness of training on daily progress note writing by medical interns. J Adv Med Educ Prof. 2021;9(3):168-175. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Grothen AE, Tennant B, Wang C, Torres A, Sheppard BB, Abastillas G, et al. Application of artificial intelligence methods to pharmacy data for cancer surveillance and epidemiology research: a systematic review. JCO Clin Cancer Inform. 2020;4:1051-1058. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ohno Y, Kato R, Ishikawa H, Nishiyama T, Isawa M, Mochizuki M, et al. Using the natural language processing system MedNER-J to analyze pharmaceutical care records. medRxiv. Preprint posted online on October 2, 2023. [ CrossRef ]
  • Ranchon F, Chanoine S, Lambert-Lacroix S, Bosson JL, Moreau-Gaudry A, Bedouch P. Development of artificial intelligence powered apps and tools for clinical pharmacy services: a systematic review. Int J Med Inform. 2023;172:104983. [ CrossRef ] [ Medline ]
  • Nakajima Pharmacy. URL: https://www.nakajima-phar.co.jp/ [accessed 2023-12-07]
  • Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020;21(140):1-67. [ FREE Full text ]
  • Pathak A. Comparative analysis of transformer based language models. Comput Sci Inf Technol. 2021.:165-176. [ FREE Full text ] [ CrossRef ]
  • DIPEx Japan. URL: https://www.dipex-j.org/ [accessed 2024-02-04]
  • Herxheimer A, McPherson A, Miller R, Shepperd S, Yaphe J, Ziebland S. Database of patients' experiences (DIPEx): a multi-media approach to sharing experiences and information. Lancet. 2000;355(9214):1540-1543. [ CrossRef ] [ Medline ]
  • Lara PE, Muiño CB, de Spéville BD, Reyes JJ. Hand-foot skin reaction to regorafenib. Actas Dermosifiliogr. 2016;107(1):71-73. [ CrossRef ]
  • Zaiem A, Hammamia SB, Aouinti I, Charfi O, Ladhari W, Kastalli S, et al. Hand-foot syndrome induced by chemotherapy drug: case series study and literature review. Indian J Pharmacol. 2022;54(3):208-215. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • McLellan B, Ciardiello F, Lacouture ME, Segaert S, Van Cutsem E. Regorafenib-associated hand-foot skin reaction: practical advice on diagnosis, prevention, and management. Ann Oncol. 2015;26(10):2017-2026. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ai L, Xu Z, Yang B, He Q, Luo P. Sorafenib-associated hand-foot skin reaction: practical advice on diagnosis, mechanism, prevention, and management. Expert Rev Clin Pharmacol. 2019;12(12):1121-1127. [ CrossRef ] [ Medline ]
  • Tenti S, Correale P, Cheleschi S, Fioravanti A, Pirtoli L. Aromatase inhibitors-induced musculoskeletal disorders: current knowledge on clinical and molecular aspects. Int J Mol Sci. 2020;21(16):5625. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lacouture ME, Melosky BL. Cutaneous reactions to anticancer agents targeting the epidermal growth factor receptor: a dermatology-oncology perspective. Skin Therapy Lett. 2007;12(6):1-5. [ FREE Full text ] [ Medline ]

Abbreviations

Edited by G Eysenbach; submitted 25.12.23; peer-reviewed by CY Wang, L Guo; comments to author 24.01.24; revised version received 14.02.24; accepted 09.03.24; published 16.04.24.

©Satoshi Nishioka, Satoshi Watabe, Yuki Yanagisawa, Kyoko Sayama, Hayato Kizaki, Shungo Imai, Mitsuhiro Someya, Ryoo Taniguchi, Shuntaro Yada, Eiji Aramaki, Satoko Hori. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

Further reading

  1. Critical appraisal of published research papers

    Critical appraisal of a research paper is defined as "the process of carefully and systematically examining research to judge its trustworthiness, value and relevance in a particular context." Since the scientific literature is expanding rapidly, with more than 12,000 articles added to the MEDLINE database per week, critical appraisal is increasingly important.

  2. Write a Critical Review of a Scientific Journal Article

    Use guiding questions to evaluate the quality of the authors' research. Does the title precisely state the subject of the paper? Does the statement of purpose in the abstract match the one in the introduction? Could the source of the research funding have influenced the research topic?

  3. Evaluating Research in Academic Journals: A Practical Guide to Realistic Evaluation

    Evaluating Research in Academic Journals is a guide for students who are learning how to evaluate reports of empirical research published in academic journals.

  4. Critically reviewing literature: A tutorial for new researchers

    For example, the author's critical review of the definition of consumer agency uncovered that many papers in consumer research, even those with agency in the title, did not define agency. Research students usually find the task of critically evaluating the literature challenging; this tutorial explains the nature and purposes of a critical review.

  5. PDF Planning and writing a critical review

    A critical review (also called a critique, critical appraisal, or critical analysis) is a detailed commentary on and critical evaluation of a text. You might carry out a critical review as a stand-alone exercise, or as part of your research and preparation for writing a literature review. These guidelines are designed to help you critically evaluate a research article.

  6. The fundamentals of critically appraising an article

    In a nutshell, when appraising an article you are assessing its relevance, methods, and validity; the strengths and weaknesses of the paper; and its relevance to your specific circumstances.

  7. Critical appraisal

    The critical appraisal of individual studies occurs within the broader goal of exploring how a body of work contributes to knowledge, policy, and practice. Methods exist to help reviewers assess how the research they have examined can contribute to practice and real-world impact.

  8. Critically appraising qualitative research

    Six key questions will help readers to assess qualitative research. Over the past decade, readers of medical journals have gained skills in critically appraising studies to determine whether the results can be trusted and applied to their own practice settings. Criteria have been designed to assess studies that use quantitative methods, and these are now in common use.

  9. Critical evaluation of publications

    Critical evaluation is the process of examining research for the strength or weakness, validity, relevance, and usefulness of its findings. The sheer volume of available information, and the difficulty of distinguishing what is relevant, make critical appraisal a primary need.

  10. 5. Critically Analyze and Evaluate

    Take notes on the articles as you read them and identify any themes or concepts that may apply to your research question. A template may also be useful for critically reading and organizing your articles.

  11. 1 Important points to consider when critically evaluating published

    Critically evaluate the research paper using the checklist provided, making notes on the key points and your overall impression. Critical appraisal checklists are useful tools for assessing the quality of a study, covering factors such as the importance of the research question and the design and methodology of the study.

  12. How to read a paper, critical review

    To be critical of a text means questioning the information and opinions in it, in an attempt to evaluate or judge its overall worth. An evaluation is an assessment of the strengths and weaknesses of a text; in the case of a research article, it should relate to specific criteria, and you have to understand the purpose of each section.

  13. Critical Appraisal and Analysis

    Primary sources are the raw material of the research process. Secondary sources are based on primary sources. For example, if you were researching Konrad Adenauer's role in rebuilding West Germany after World War II, Adenauer's own writings would be one of many primary sources available on this topic.

  14. Critical Analysis: The Often-Missing Step in Conducting Literature

    Perhaps as this method of research becomes more refined, critical reflection will become an expectation of authors.

  15. PDF Critical appraisal of a journal article

    Critical appraisal is essential to combat information overload and to identify papers that are clinically relevant. For continuing professional development (CPD), critical appraisal is a requirement for the evidence-based medicine component of many membership exams.

  16. A guide to critical appraisal of evidence : Nursing2020 Critical Care

    Critical appraisal is the assessment of research studies' worth to clinical practice. Critical appraisal, the heart of evidence-based practice, involves four phases: rapid critical appraisal, evaluation, synthesis, and recommendation. This article reviews each phase and provides examples, tips, and caveats to help evidence appraisers.

  17. PDF Step-by-step guide to critiquing research. Part 1: quantitative research

    Critiquing the literature, critical analysis, reviewing the literature, and evaluation and appraisal of the literature are in essence the same thing (Bassett and Bassett, 2003). Terminology in research can be confusing for the novice reader, where a term like 'random' refers to an organized manner of selecting items or participants.

  18. Critical Appraisal of Clinical Research

    Critical appraisal is the process of carefully and systematically examining research to assess its reliability, value, and relevance in order to guide professionals in their clinical decision-making. It is also essential for continuing professional development (CPD).

  19. Critical Analysis of Clinical Research Articles: A Guide for Evaluation

    Critical evaluation is used to identify the strengths and weaknesses of an article, in order to evaluate the usefulness and validity of research results.

  20. Evaluating Research

    Evaluating research refers to the process of assessing the quality, credibility, and relevance of a research study or project. This involves examining the methods, data, and results of the research in order to determine its validity, reliability, and usefulness. Evaluating research can be done by both experts and non-experts.

  21. Critical Analysis and Evaluation

    Many assignments ask you to critique and evaluate a source. Sources might include journal articles, books, websites, government documents, portfolios, podcasts, or presentations. When you critique, you offer both negative and positive analysis of the content, writing, and structure of a source.

  22. Writing Tips: Critically Evaluating Research

    To develop the skill of critical evaluation, read research articles in psychology with an open mind and be an active reader. Ask questions as you go and see whether the answers are provided. Initially, skim the article to gain an overview of the problem, design, methods, and conclusions; then read for details.

  23. Critical Appraisal of a qualitative paper

    This guide, aimed at health students, provides basic-level support for appraising qualitative research papers. It is designed for students who have already attended lectures on critical appraisal, and provides a framework for appraising qualitative research based on four aspects of trustworthiness.

  24. Critical Literature Review: How to Critique a Research Article?

    Lack of Critical Analysis: AI cannot critically analyze academic texts with the depth and nuance that a human researcher can. It lacks the ability to evaluate the quality or bias in research studies. Potential for Inaccuracies: AI models can sometimes generate incorrect or misleading information, requiring careful fact-checking.
