
  • © 2012

Principles of Research Methodology

A Guide for Clinical Investigators

  • Phyllis G. Supino,
  • Jeffrey S. Borer

Cardiovascular Medicine, SUNY Downstate Medical Center, Brooklyn, USA

Based on a highly regarded and popular lecture series on research methodology

Comprehensive guide written by experts in the field

Emphasizes the essentials and fundamentals of research methodologies


Table of contents (13 chapters)

Front Matter

Overview of the Research Process

  • Phyllis G. Supino

Developing a Research Problem

  • Phyllis G. Supino, Helen Ann Brown Epstein

The Research Hypothesis: Role and Construction

Design and Interpretation of Observational Studies: Cohort, Case–Control, and Cross-Sectional Designs

  • Martin L. Lesser

Fundamental Issues in Evaluating the Impact of Interventions: Sources and Control of Bias

Protocol Development and Preparation for a Clinical Trial

  • Joseph A. Franciosa

Data Collection and Management in Clinical Research

  • Mario Guralnik

Constructing and Evaluating Self-Report Measures

  • Peter L. Flom, Phyllis G. Supino, N. Philip Ross

Selecting and Evaluating Secondary Data: The Role of Systematic Reviews and Meta-analysis

  • Lorenzo Paladino, Richard H. Sinert

Sampling Methodology: Implications for Drawing Conclusions from Clinical Research Findings

  • Richard C. Zink

Introductory Statistics in Medical Research

  • Todd A. Durham, Gary G. Koch, Lisa M. LaVange

Ethical Issues in Clinical Research

  • Eli A. Friedman

How to Prepare a Scientific Paper

  • Jeffrey S. Borer

Back Matter

Principles of Research Methodology: A Guide for Clinical Investigators is the definitive, comprehensive guide to understanding and performing clinical research. Designed for medical students, physicians, basic scientists involved in translational research, and other health professionals, this indispensable reference also addresses the unique challenges and demands of clinical research and offers clear guidance in becoming a more successful member of a medical research team and critical reader of the medical research literature. The book covers the entire research process, from the conception of the research problem to the publication of findings. It presents concepts comprehensively and concisely, in a manner that is relevant and engaging to read. The text combines theory and practical application to familiarize the reader with the logic of research design and hypothesis construction, the importance of research planning, the ethical basis of human subjects research, the basics of writing a clinical research protocol and scientific paper, the logic and techniques of data generation and management, and the fundamentals and implications of various sampling techniques and alternative statistical methodologies. Organized in thirteen easy-to-read chapters, the text emphasizes the importance of clearly defined research questions and well-constructed hypotheses (reinforced throughout the various chapters) for informing methods and guiding data interpretation. The authors are prominent medical scientists and methodologists with extensive personal experience in biomedical investigation and in teaching key aspects of research methodology to medical students, physicians, and other health professionals; they expertly integrate theory with examples and employ language that is clear and useful for a general medical audience.
A major contribution to the methodology literature, Principles of Research Methodology: A Guide for Clinical Investigators is an authoritative resource for all individuals who perform research, plan to perform it, or wish to understand it better.


Book Title : Principles of Research Methodology

Book Subtitle : A Guide for Clinical Investigators

Editors : Phyllis G. Supino, Jeffrey S. Borer

DOI : https://doi.org/10.1007/978-1-4614-3360-6

Publisher : Springer New York, NY

eBook Packages : Medicine , Medicine (R0)

Copyright Information : Springer Science+Business Media, LLC 2012

Hardcover ISBN : 978-1-4614-3359-0 Published: 22 June 2012

Softcover ISBN : 978-1-4939-4292-3 Published: 23 August 2016

eBook ISBN : 978-1-4614-3360-6 Published: 22 June 2012

Edition Number : 1

Number of Pages : XVI, 276

Topics : Oncology , Cardiology , Internal Medicine , Endocrinology , Neurology


Methodological standards for qualitative and mixed methods patient centered outcomes research

  • Bridget Gaglio, senior program officer 1,
  • Michelle Henton, program manager 1,
  • Amanda Barbeau, program associate 1,
  • Emily Evans, research health science specialist 2,
  • David Hickam, director of clinical effectiveness and decision sciences 1,
  • Robin Newhouse, dean 3,
  • Susan Zickmund, research health scientist and professor 4 5
  • 1 Patient-Centered Outcomes Research Institute, 1828 L Street, Suite 900, Washington, DC, 20036, USA
  • 2 Veterans Health Administration, United States Department of Veterans Affairs, Washington, DC, USA
  • 3 Indiana University School of Nursing, Indianapolis, IN, USA
  • 4 United States Department of Veterans Affairs, Salt Lake City, UT, USA
  • 5 University of Utah School of Medicine, Salt Lake City, UT, USA
  • Correspondence to: B Gaglio bgaglio{at}pcori.org
  • Accepted 20 October 2020

The Patient-Centered Outcomes Research Institute’s (PCORI) methodology standards for qualitative methods and mixed methods research help ensure that research studies are designed and conducted to generate the evidence needed to answer patients’ and clinicians’ questions about which methods work best, for whom, and under what circumstances. This set of standards focuses on factors pertinent to patient centered outcomes research, but it is also useful for providing guidance for other types of clinical research. The standards can be used to develop and evaluate proposals, conduct the research, and interpret findings. The standards were developed following a systematic process: survey the range of key methodological issues and potential standards, narrow inclusion to standards deemed most important, draft preliminary standards, solicit feedback from a content expert panel and the broader public, and use this feedback to develop final standards for review and adoption by PCORI’s board of governors. This article provides an example on how to apply the standards in the preparation of a research proposal.

Rigorous methodologies are critical for ensuring the trustworthiness of research results. This paper describes the process for synthesizing the current literature that provides guidance on the use of qualitative and mixed methods in health research, and the process for developing methodology standards for qualitative and mixed methods used in patient centered outcomes research. Patient centered outcomes research is comparative clinical effectiveness research that aims to evaluate the clinical outcomes resulting from alternative clinical or care delivery approaches for fulfilling specific health and healthcare needs. By focusing on outcomes that are meaningful to patients, studies on patient centered outcomes research strengthen the evidence base and inform the health and healthcare decisions made by patients, clinicians, and other stakeholders.

The methods used in patient centered outcomes research are diverse and often include qualitative methodologies. Broadly, qualitative research is a method of inquiry used to generate and analyze open ended textual data to enhance the understanding of a phenomenon by identifying underlying reasons, opinions, and motivations for behavior. Many different methodologies can be used in qualitative research, each with its own set of frameworks and procedures. 1 This multitude of qualitative approaches allows investigators to select and combine methods that fit the specific needs associated with the aims of the study.

Qualitative methods can also be used to supplement and understand quantitative results; the integration of these approaches for scientific inquiry and evaluation is known as mixed methods. 2 This type of approach is determined a priori, because the research question drives the choice of methods, and draws on the strengths of both quantitative and qualitative approaches to resolve complex and contemporary issues in health services. This strategy is achieved by integrating qualitative and quantitative approaches at the design, methods, interpretation, and reporting levels of research. 3 Table 1 lists definitions of qualitative methods, mixed methods, and patient centered outcomes research. The methodology standards described here are intended to improve the rigor and transparency of investigations that include qualitative and mixed methods. The standards apply to designing projects, conducting the studies, and reporting the results. Owing to its focus on patient centered outcomes research, this article is not intended to be a comprehensive summary of the difficulties encountered in the conduct of qualitative and mixed methods research.

Terms and definitions used in the development of the Patient-Centered Outcomes Research Institute’s (PCORI) qualitative and mixed methods research methodology standards


Summary points

Many publications provide guidance on how to use qualitative and mixed methods in health research

The methodological standards reported here and adopted by Patient-Centered Outcomes Research Institute (PCORI) synthesize and refine various recommendations to improve the design, conduct, and reporting of patient centered, comparative, clinical effectiveness research

PCORI has developed and adopted standards that provide guidance on key areas where research applications and research reports have been deficient in the plans for and use of qualitative and mixed methods in conducting patient centered outcomes research

The standards provide guidance to health researchers to ensure that studies of this research are designed and conducted to generate valid evidence needed to analyze patients’ and clinicians’ questions about what works best, for whom, and under what circumstances

Established by the United States Congress in 2010 13 and reauthorized in 2019, 14 the Patient-Centered Outcomes Research Institute (PCORI) funds scientifically rigorous comparative effectiveness research, previously defined as patient centered outcomes research, to improve the quality and relevance of evidence that patients, care givers, clinicians, payers, and policy makers need to make informed healthcare decisions. Such decisions might include choices about which prevention strategies, diagnostic methods, and treatment options are most appropriate based on personal preferences and unique patient characteristics.

PCORI’s focus on patient centeredness and stakeholder engagement in research has generated increased interest in and use of methodologies of qualitative and mixed methods research within comparative effectiveness research studies. Qualitative data have a central role in understanding the human experience. As with any research, the potential for these studies to generate high integrity, evidence based information depends on the quality of the methods and approaches that were used. PCORI’s authorizing legislation places a unique emphasis on ensuring scientific rigor, including the creation of a methodology committee that develops and approves methodology standards to guide PCORI funded research. 13 The methodology committee consists of 15 individuals who were appointed by the Comptroller General of the US and the directors of the Agency for Healthcare Research and Quality and the National Institutes of Health. The members of the committee are medical and public health professionals with expertise in study design and methodology for comparative effectiveness research or patient centered outcomes research ( https://www.pcori.org/about-us/governance/methodology-committee ).

The methodology committee began developing its initial group of methodology standards in 2012 (with adoption by PCORI’s board of governors that year). Since then, the committee has revised and expanded the standards based on identified methodological issues and input from stakeholders. Before the adoption of the qualitative and mixed methods research standards, the PCORI methodology standards consisted of 56 individual standards in 13 categories. 15 The first five categories of the standards are crosscutting and relevant to most studies on patient centered outcomes research, while the other eight categories are applicable depending on a study’s purpose and design. 15

Departures from good research practices are partially responsible for weaknesses in the quality and subsequent relevance of research. The PCORI methodology standards provide guidance that helps to ensure that studies on patient centered outcomes research are designed and conducted to generate the evidence needed to answer patients’ and clinicians’ questions about what works best, for whom, and under what circumstances. These standards do not represent a complete, comprehensive set of all requirements for high quality patient centered outcomes research; rather, they cover topics that are likely to contribute to improvements in quality and value. Specifically, the standards focus on selected methodological issues that have substantial deficiencies or inconsistencies regarding how available methods are applied in practice. These methodological issues might include a lack of rigor or inappropriate use of approaches for conducting patient centered outcomes research. As a research funder, PCORI uses the standards in the scientific review of applications, monitoring of funded research projects, and evaluation of final reports of research findings.

Use of qualitative methods has become more prevalent over time. Based on a PubMed search in June 2020 (search terms “qualitative methods” and “mixed methods”), the publication of qualitative and mixed methods studies has grown steadily from 1980 to 2019. From 1980 to 1989, 63 qualitative and 110 mixed methods papers were identified. From 1990 to 1999, the number of qualitative and mixed methods papers was 420 and 58, respectively; from 2010 to 2019, these numbers increased to 5481 and 17 031, respectively. The prominent increase in publications in recent years could be associated with more sophisticated indexing methods in PubMed as well as the recognition that both qualitative and mixed methods research are important approaches to scientific inquiry within the health sciences. These approaches allow investigators to obtain a more detailed perspective and to incorporate patients’ motivations, beliefs, and values.

Although the use of qualitative and mixed methods research has increased, consensus regarding definitions and application of the methods remains elusive, reflecting wide disciplinary variation. 16 17 Many investigators and organizations have attempted to resolve these differences by proposing guidelines and checklists that help define essential components. 12 16 18 19 20 21 22 23 24 25 26 27 28 29 For example, Treloar et al 20 offer direction for qualitative researchers in designing and publishing research by providing a 10 point checklist for assessing the quality of qualitative research in clinical epidemiological studies. Tong et al 22 provide a 32 item checklist to help investigators report important aspects of the research process for interviews and focus groups such as the study team, study methods, context of the study, findings, analysis, and interpretations.

The goal of the PCORI Methodology Standards on Qualitative and Mixed Methods is to provide authoritative guidance on the use of these methodologies in comparative effectiveness research and patient centered outcomes research. The purpose of these types of research is to improve the clinical evidence base and, particularly, to help end users understand how the evidence provided by individual research studies can be applied to particular clinical circumstances. Use of qualitative and mixed methods can achieve this goal but can also introduce specific issues that need to be captured in PCORI’s methodological guidance. The previously published guidelines generally have a broader focus and different points of emphasis.

This article describes the process for synthesizing the current literature that provides guidance on the use of qualitative and mixed methods in health research, and for developing methodology standards for qualitative and mixed methods used in patient centered outcomes research. We then provide an example showing how to apply the standards in the design of a patient centered outcomes research application.

Methodology standards development process

Literature review and synthesis

The purpose of the literature review was to identify published journal articles that defined criteria for rigorous qualitative and mixed methods research in health research. With the guidance of PCORI’s medical librarian, we designed and executed searches in PubMed, and did four different keyword searches for both qualitative and mixed methods (eight searches in total; supplemental table 1). We aimed to identify articles that provided methodological guidance rather than studies that simply used the methods.

We encountered two major challenges. First, qualitative and mixed methods research has a broad set of perspectives. 30 31 Second, some medical subject headings (MeSH terms) in our queries were not introduced until recently (eg, “qualitative methods” introduced in 2003, “comparative effectiveness” introduced in 2010), which required us to search for articles by identifying a specific qualitative method (eg, interviews, focus groups) to capture the literature before 2003 ( table 1 ). These challenges could have led to missed publications. To refine and narrow our search results, we applied the following inclusion criteria:

  • Articles on health services or clinical research, published in English, and published between 1 January 1990 and 14 April 2017
  • Articles that proposed or discussed a guideline, standard, framework, or set of principles for conducting rigorous qualitative and mixed methods research
  • Articles that described or discussed the design, methods for, or reporting of qualitative and mixed methods research.

The search queries identified 1933 articles (1070 on qualitative methods and 863 on mixed methods). The initial citation lists were reviewed, and 204 duplicates were removed. Three authors (BG, MH, and AB) manually reviewed the 1729 remaining article abstracts. Titles and abstracts were independently evaluated by each of the three reviewers using the inclusion criteria. Disagreements were adjudicated by an in-person meeting to determine which articles to include. This initial round of review yielded 212 references, for which the full articles were obtained. The full articles were reviewed using the same inclusion and exclusion criteria as the abstracts. Most of these articles were studies that had used a qualitative or mixed methods approach but were only reporting on the results of the completed research. Therefore, these articles were not able to inform the development of standards for conducting qualitative and mixed methods research and they were excluded, resulting in the final inclusion of 56 articles (supplemental table 2). Following the original search, the literature was scanned for new articles providing guidance on qualitative and mixed methods, resulting in four articles being added to the final set of literature. These articles come from psychology and health psychology specialties and seek to provide not only minimal standards in relation to qualitative and mixed methods research but also standards for best practice that apply across a wide range of fields. 32 33 34 35
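The screening counts reported above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the authors' workflow; the class and field names are hypothetical, and the exclusion counts at each stage are derived here by subtraction rather than reported in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningFlow:
    """Counts reported in the text for each stage of the literature review.

    The class and its field names are illustrative assumptions.
    """
    qualitative_hits: int
    mixed_methods_hits: int
    duplicates_removed: int
    full_text_reviewed: int
    included_after_full_text: int
    added_from_later_scan: int

    @property
    def identified(self) -> int:
        # Total records returned by the search queries.
        return self.qualitative_hits + self.mixed_methods_hits

    @property
    def abstracts_screened(self) -> int:
        # Records left for title and abstract review after deduplication.
        return self.identified - self.duplicates_removed

    @property
    def excluded_at_abstract(self) -> int:
        # Derived by subtraction; not a figure stated in the article.
        return self.abstracts_screened - self.full_text_reviewed

    @property
    def excluded_at_full_text(self) -> int:
        # Derived by subtraction; not a figure stated in the article.
        return self.full_text_reviewed - self.included_after_full_text

    @property
    def final_set(self) -> int:
        # Articles included after full text review plus later additions.
        return self.included_after_full_text + self.added_from_later_scan


flow = ScreeningFlow(
    qualitative_hits=1070,
    mixed_methods_hits=863,
    duplicates_removed=204,
    full_text_reviewed=212,
    included_after_full_text=56,
    added_from_later_scan=4,
)

print(flow.identified, flow.abstracts_screened, flow.final_set)
```

The first two printed values reproduce the 1933 identified articles and the 1729 abstracts reviewed; the third is the size of the final literature set implied by adding the four later articles to the 56 included ones.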

Initial set of methodology standards

Using an abstraction form that outlined criteria for qualitative and mixed methods manuscripts and research proposals, we abstracted the articles to identify key themes, recommendations, and guidance under each criterion. Additional information was noted when considered relevant. A comprehensive document was created to include the abstractions and notes for all articles. This document outlined the themes in the literature related to methodological guidance. We began with the broadest set of themes organized into 11 major domains: the theoretical approach, research topics, participants, data collection, analysis and interpretation, data management, validity and reliability, presentation of results, context of research, impact of the researchers (that is, reflexivity), and mixed methods. As our goal was to distill the themes into broad standards that did not overlap with pre-existing PCORI methodology standards, we initially condensed the themes into six qualitative and three mixed methods standards. Following discussion among members of the working group, some standards were combined and two were dropped because of substantial overlap with each other or with previously developed PCORI methodology standards.

The key themes identified from the abstracted information were used as the foundation for the first draft of the new methodology standards. We then further discussed the themes as a team and removed redundancies, refined the labeling of themes, and removed themes deemed extraneous through a team based adjudication process. The draft standards were presented to PCORI’s methodology committee to solicit feedback. Revisions were made on the basis of this feedback.

Expert panel one day workshop

A one day expert panel workshop was held in Washington, DC, on 18 January 2018. Ten individuals regarded as international leaders in qualitative and mixed methods were invited to attend—including those who had created standards previously or had a substantial number of peer reviewed publications reporting qualitative and mixed methods in health research; had many years’ experience as primary researchers; and had served as editors of major textbooks and journals. The panel was selected on the basis of their influence and experience in these methodologies as well as their broad representation from various fields of study. The representation of expertise spanned the fields of healthcare, anthropology, and the social sciences (supplemental table 3).

Before the meeting, we emailed the panel members the draft set of qualitative and mixed methods standards, PCORI’s methodology standards document, and the background document describing how the draft standards had been developed. At the meeting, the experts provided extensive feedback, including their recommendations regarding what needs to be done well when using these methodological approaches. The panel emphasized that when conducting mixed methods research, this approach should be selected a priori, based on the research question, and that integration of the mixed approaches is critical at all levels of the research process (from inception to data analysis). The panel emphasized that when conducting qualitative research, flexibility and reflexive iteration should be maintained throughout the process—that is, the sampling, data collection, and data analysis. The main theme from the meeting was that the draft standards were not comprehensive enough to provide guidance for studies on patient centered outcomes research or comparative effectiveness research that involved qualitative and mixed methods. After the conclusion of the workshop, feedback and recommendations were synthesized, and the draft standards were reworked in the spring of 2018 ( fig 1 ). This work resulted in a new set of four qualitative methods standards and three mixed methods standards representing the unique features of each methodology that were not already included in the methodology standards previously adopted by PCORI.

Fig 1

Process of development and adoption of the Patient-Centered Outcomes Research Institute’s (PCORI) methodology standards on qualitative and mixed methods research


Continued refinement and approval of methodology standards

In late spring 2018, the revised draft methodology standards were presented to PCORI’s methodology committee first by sharing a draft of the standards and then via oral presentation. Feedback from the methodology committee centered around eliminating redundancy in the standards proposed (both across the draft standards and in relation to the previously adopted categories of standards) and making the standards more actionable. The areas where the draft standards overlapped with the current standards were those for formulating research questions, for patient centeredness, and for data integrity and rigorous analyses. Each draft standard was reviewed and assessed by the methodology committee members and the staff workgroup to confirm its unique contribution to PCORI’s methodology standards. After this exercise, each remaining standard was reworded to be primarily action guiding (rather than explanatory). This version of proposed standards was approved by the methodology committee to be sent to PCORI’s board of governors for a vote to approve for public comment. The board of governors approved the standards to be posted for public comment.

The public comment period hosted on PCORI’s website ( https://www.pcori.org/engagement/engage-us/provide-input/past-opportunities-provide-input ) was held from 24 July 2018 to 21 September 2018. Thirty nine comments were received from nine different stakeholders—seven health researchers, one training institution, and one professional organization. Based on the public comments, minor wording changes were made to most of the draft standards. The final version of the standards underwent review by both the methodology committee and PCORI’s board of governors. The board voted to adopt the final version of the standards on 26 February 2019 ( table 2 ).

Patient-Centered Outcomes Research Institute’s (PCORI) methodology standards for qualitative methods and mixed methods

Application of methodology standards in research design

The standards can be used across the research continuum, from research design and application development through conduct of the research to reporting of research findings. We provide an example for researchers of how these standards can be used in the preparation of a research application (table 3).

Guidance for researchers on how to use Patient-Centered Outcomes Research Institute’s (PCORI) methodology standards for qualitative and mixed methods research in application preparation

QM-1: State the qualitative approach to research inquiry, design, and conduct

Many research proposals on patient centered outcomes research or comparative effectiveness research propose the use of qualitative methods but lack adequate description of and justification for the qualitative approach that will be used. Often the rationale for using qualitative methods is not tied back to the applicable literature and the identified evidence gap, missing the opportunity to link the importance of the approach in capturing the human experience or patient voice in the research aims. The approach to inquiry should be explicitly stated along with the rationale and a description of how it ties to the research question(s). The research proposal should clearly define how the qualitative approach will be operationalized and how it supports the choice of methods for participant recruitment, data collection, and analysis. Moreover, procedures for data collection should be stated, as well as the types of data to be collected, when data will be collected (that is, one point in time v longitudinal), data management, codebook development, intercoder reliability process, data analysis, and procedures for ensuring full confidentiality.
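As an illustration of what a stated intercoder reliability process might include, the sketch below computes Cohen's kappa for two coders who each assigned one code to the same text segments. The standard does not prescribe a statistic; the choice of kappa, the function name, and the example codes are assumptions made for illustration only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two coders on the same segments."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(coder_a)
    # Proportion of segments on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned by two coders to eight interview segments.
coder_a = ["barrier", "barrier", "facilitator", "barrier",
           "facilitator", "facilitator", "barrier", "facilitator"]
coder_b = ["barrier", "facilitator", "facilitator", "barrier",
           "facilitator", "barrier", "barrier", "facilitator"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

A proposal would typically also say what happens when agreement falls short, for example how disagreements are discussed and how the codebook is revised; that part of the process is not captured by a single statistic.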

QM-2: Select and justify appropriate qualitative methods and sampling strategy

While the number of participants who will be recruited for focus groups or in-depth interviews is usually described, the actual sampling strategy is often not stated. The description of the sampling strategy should state how it aligns with the qualitative approach, how it relates to the research question(s), and the variation in sampling that might occur over the course of the study. Furthermore, most research proposals state that data will be collected until thematic saturation is reached, but how this will be determined is omitted. As such, this standard outlines the information essential for understanding who is participating in the study and aims to reduce the likelihood of making unsupported statements, emphasizing transparency in the criteria used to determine the stopping point for recruitment and data collection.
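One way to make a saturation criterion transparent is to write it down as an explicit stopping rule. The sketch below uses a rule that is assumed here purely for illustration: recruitment stops once a fixed number of consecutive interviews yields no code that has not been seen before. The function name, the default run length, and the example codes are hypothetical, and other definitions of saturation are equally defensible.

```python
from typing import Iterable, Optional, Set


def saturation_point(codes_per_interview: Iterable[Set[str]],
                     run_length: int = 3) -> Optional[int]:
    """Return the 1-based interview number at which the stopping rule is met.

    Rule (an illustrative assumption): stop after `run_length` consecutive
    interviews that add no previously unseen code. Returns None if the rule
    is never met in the data supplied.
    """
    if run_length < 1:
        raise ValueError("run_length must be at least 1.")
    seen: Set[str] = set()
    quiet_run = 0  # consecutive interviews with no new codes so far
    for index, codes in enumerate(codes_per_interview, start=1):
        new_codes = codes - seen
        seen |= codes
        quiet_run = 0 if new_codes else quiet_run + 1
        if quiet_run >= run_length:
            return index
    return None


# Hypothetical codes identified in seven successive interviews.
interviews = [
    {"cost", "access"},
    {"access", "trust"},
    {"cost"},
    {"stigma"},
    {"trust"},
    {"access", "cost"},
    {"stigma"},
]

print(saturation_point(interviews))
```

Stating the rule in this form makes the two choices a reviewer would want to see explicit: what counts as new information, and how long a run without it is required before data collection ends.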

QM-3: Link the qualitative data analysis, interpretations, and conclusions to the study question

Qualitative analysis transforms data into information that can be used by the relevant stakeholder. It is a process of reviewing, synthesizing, and interpreting data to describe and explain the phenomena being studied. The interpretive process occurs at many points in the research process. It begins with making sense of what is heard and observed during data gathering, and then builds understanding of the meaning of the data through data analysis. This is followed by development of a description of the findings that makes sense of the data, in which the researcher’s interpretation of the findings is embedded. Many research proposals state that the data will be coded but do not make clear who will do the coding, what their qualifications are, or what process they will follow. Very little, if any, description is provided as to how conclusions will be drawn and how they will be related to the original data. This standard highlights the need for detailed information on the analytical and interpretive processes for qualitative data and their relationship to the overall study.

QM-4: Establish trustworthiness and credibility of qualitative research

The qualitative research design should incorporate elements demonstrating validity and reliability, which are also known by terms such as trustworthiness and credibility. Studies with qualitative components can use several approaches to help ensure the validity and reliability of their findings, including audit trail, reflexivity, negative or deviant case analysis, triangulation, or member checking (see table 1 for definitions).

MM-1: Specify how mixed methods are integrated across design, data sources, and/or data collection phases

This standard requires investigators to declare and support their intent to use a mixed methods approach a priori in order to avoid a haphazard approach to the design and resulting data. Use of mixed methods can enhance the study design by drawing on the strengths of both quantitative and qualitative research, as investigators are afforded the use of multiple data collection tools rather than being restricted to one approach. Mixed methods research designs have three key factors: integration of data, relative timing, and implications of linkages for methods in each component. Additionally, the standards for mixed methods, quantitative, and qualitative methodologies must be met in the design, implementation, and reporting stages. This is different from a multimethod research design in which two or more forms of data (qualitative, quantitative, or both) are used to resolve different aspects of the research question independently and are not integrated.

MM-2: Select and justify appropriate mixed methods sampling strategy

Mixed methods research aims to contribute insights and knowledge beyond that obtained from quantitative or qualitative methods only, which should be reflected in the sampling strategies as well as in the design of the study and the research plan. Qualitative and quantitative components can occur simultaneously or sequentially, and researchers must select and justify the most appropriate mixed method sampling strategy and demonstrate that the desired number and type of participants can be achieved with respect to the available time, cost, research team skillset, and resources. Those sampling strategies that are unique to mixed methods (eg, interdependent, independent, and combined) should focus on the depth and breadth of information across research components.

MM-3: Integrate data analysis, data interpretations, and conclusions

Qualitative and quantitative data often are analyzed in isolation, with little thought given to when these analyses should occur or how the analysis, interpretation, and conclusions integrate with one another. There are multiple approaches to integration in the analysis of qualitative and quantitative data (eg, merging, embedding, and connecting). As such, the approach to integration should determine the priority of the qualitative and quantitative components, as well as the temporality with which analysis will take place (eg, sequentially, or concurrently; iterative or otherwise). Either a priori or emergently, where appropriate, researchers should define these characteristics, identify the points of integration, and explain how integrated analyses will proceed with respect to the two components and the selected approach.

The choice between multiple options for prevention, diagnosis, and treatment of health conditions presents a considerable challenge to patients, clinicians, and policy makers as they seek to make informed decisions. Patient centered outcomes research focuses on the pragmatic comparison of two or more health interventions to determine what works best for which patients and populations in which settings. 5 The use of qualitative and mixed methods research can enable more robust capture and understanding of information from patients, caregivers, clinicians, and other stakeholders in research, thereby improving the strength, quality, and relevance of findings. 4

Despite extensive literature on qualitative and mixed methods research in general, the use of these methodologies in the context of patient centered outcomes research or comparative effectiveness research continues to grow and requires additional guidance. This guidance could facilitate the appropriate design, conduct, analysis, and reporting of these approaches. For example, the need to include multiple stakeholder perspectives, to understand how an intervention was implemented across multiple settings, or to document the clinical context so that decision makers can evaluate whether findings would be transferable to their respective settings poses unique challenges to the rigor and agility of qualitative and mixed methods approaches.

PCORI’s methodology standards for qualitative and mixed methods research represent an opportunity for further strengthening the design, conduct, and reporting of patient centered outcomes research or comparative effectiveness research by providing guidance that encompasses the broad range of methods that stem from various philosophical assumptions, disciplines, and procedures. These standards directly affect factors related to methodological integrity, accuracy, and clarity as identified by PCORI staff, methodology committee members, and merit reviewers in studies on patient centered outcomes research or comparative effectiveness research. The standards are presented at a level accessible to researchers new to qualitative and mixed methods research; however, they are not a substitute for appropriate expertise.

The challenges of ensuring rigorous methodology in the design and conduct of research are not unique to qualitative and mixed methods research, because the imperative to increase value and reduce waste in research design, conduct, and analysis is widely recognized. 36 Consistent with such efforts, PCORI recognizes the importance of continued methodological development and evaluation and is committed to listening to the research community and providing updated guidance based on methodological advances and research needs. 37

Acknowledgments

We thank the Patient-Centered Outcomes Research Institute’s (PCORI) methodology committee during this work (Naomi Aronson, Ethan Bach, Stephanie Chang, David Flum, Cynthia Girman, Steven Goodman (chairperson), Mark Helfand, Michael S Lauer, David O Meltzer, Brian S Mittman, Sally C Morton, Robin Newhouse (vice chairperson), Neil R Powe, and Adam Wilcox); and Frances K Barg, Benjamin F Crabtree, Deborah Cohen, Michael Fetters, Suzanne Heurtin-Roberts, Deborah K Padgett, Janice Morse, Lawrence A Palinkas, Vicki L Plano Clark, and Catherine Pope, for participating in the expert panel meeting and consultation.

Contributors: BG led the development of the methodology standards and wrote the first draft of the paper. MH, AB, SZ, EE, DH, and RN made a substantial contribution to all stages of developing the methodology standards. BG, SZ, MH, and AB drafted the methodology standards. DH, EE, and RN gave critical insights into PCORI’s methodology standards development processes and guidance. SZ served as qualitative methods consultant to the workgroup. BG provided project leadership and guidance. MH and AB facilitated the expert panel meeting. SZ is senior author. BG and SZ are the guarantors of this work and accept full responsibility for the finished article and controlled the decision to publish. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: No funding was used to support this work. All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of PCORI, its board of governors or methodology committee. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.

Competing interests: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/conflicts-of-interest/ and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

The lead author affirms that the manuscript is an honest, accurate, and transparent account of the work being reported; that no important aspects of the study have been omitted; and that any discrepancies from the work as planned have been explained.

Provenance and peer review: Not commissioned; externally peer reviewed.

Patient and public involvement: Patients and stakeholders were invited to comment on the draft standards during the public comment period held from 24 July 2018 to 21 September 2018. Comments were reviewed and revisions made accordingly. Development of the standards, including the methods, was presented at two PCORI board of governors’ meetings, which are open to the public, recorded, and posted on the PCORI website.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

  • Collins CS ,
  • Stockton CM
  • Creswell JW ,
  • Plano Clark VL
  • Fetters MD ,
  • Creswell JW
  • ↵ Patient-Centered Outcomes Research Institute. Patient-centered outcomes research. 2010-19. https://www.pcori.org/research-results/about-our-research/research-we-support .
  • Institute of Medicine
  • Crabtree BF ,
  • Klassen AC ,
  • Plano Clark VL ,
  • Clegg Smith KC ,
  • Office of Behavioral and Social Sciences Research
  • ↵ Patient Protection and Affordable Care Act, Pub. L. No. 111-148 Stat. 119 (March 23, 2010).
  • ↵ Further Consolidated Appropriations Act, 2020, Pub. L. No. 116-94 (20 December 2019).
  • ↵ Patient-Centered Outcomes Research Institute (PCORI). PCORI methodology standards. 2011-19. https://www.pcori.org/research-results/about-our-research/research-methodology/pcori-methodology-standards .
  • Molina-Azorin JF
  • Chapple A ,
  • Treloar C ,
  • Champness S ,
  • Simpson PL ,
  • Higginbotham N
  • Cesario S ,
  • Santa-Donato A
  • Sainsbury P ,
  • Flemming K ,
  • McInnes E ,
  • Davidoff F ,
  • Batalden P ,
  • Stevens D ,
  • Mooney SE ,
  • SQUIRE development group
  • Gagnon MP ,
  • Griffiths F ,
  • Johnson-Lafleur J
  • ↵ National Cancer Institute. Qualitative methods in implementation science. 2018. https://cancercontrol.cancer.gov/sites/default/files/2020-09/nci-dccps-implementationscience-whitepaper.pdf
  • O’Brien BC ,
  • Harris IB ,
  • Beckman TJ ,
  • ↵ National Institute for Health and Clinical Excellence. The guidelines manual. Appendix H: Methodology checklist: qualitative studies. https://www.nice.org.uk/process/pmg6/resources/the-guidelines-manual-appendices-bi-2549703709/chapter/appendix-h-methodology-checklist-qualitative-studies .
  • Crabtree BF
  • Johnson RB ,
  • Onwuegbuzie AJ ,
  • American Psychological Association
  • Levitt HM ,
  • Bamberg M ,
  • Josselson R ,
  • Suárez-Orozco C
  • Motulsky SL ,
  • Morrow SL ,
  • Ponterotto JG
  • Bishop FL ,
  • Horwood J ,
  • Chilcot J ,
  • Ioannidis JPA ,
  • Greenland S ,
  • Hlatky MA ,
  • ↵ Patient-Centered Outcomes Research Institute. The PCORI methodology report. 2019. https://www.pcori.org/sites/default/files/PCORI-Methodology-Report.pdf .


Cardiovascular Innovations and Applications


Common Statistical Methods and Reporting of Results in Medical Research

  • Review article

Statistical analysis is critical in medical research. The objective of this article is to summarize the appropriate use and reporting of commonly used statistical methods in medical research, on the basis of existing statistical guidelines and the authors’ experience in reviewing manuscripts, to provide recommendations for statistical applications and reporting.

Main article text

Introduction.

In medical research, statistical analysis is essential, and it involves two aspects: correct application of statistical methods, and correct presentation of statistical results. The former ensures the reliability of results [ 1 – 3 ], and the latter is equally important in the publication of articles. Non-standard results may not clearly express the authors’ intentions and may increase the difficulty of future utilization of articles by researchers.

Although many methods for medical statistical analysis exist, clinical studies commonly use comparisons of multiple groups (such as t test, analysis of variance (ANOVA) or chi-square test), correlation analysis and regression analysis (such as linear regression or logistic regression) [ 4 – 7 ]. Although these methods are not complicated, they are among the most error-prone in practical applications. Many suggestions or guidelines have been made regarding statistical reports [ 8 – 12 ], primarily in scientific research design and data preprocessing, such as population selection, variable selection, randomization and outliers. In contrast to previous studies, this article describes the correct application and presentation of statistical results for the comparison of multiple groups, correlation analysis, regression analysis and survival analysis, according to the given study purpose, to provide a reference for clinical researchers.

Descriptive Methods

Descriptive statistics.

The most commonly used descriptive statistics for quantitative data are the mean, standard deviation, median and interquartile range (from Q1, the 25th percentile, to Q3, the 75th percentile). The statistics for qualitative data primarily comprise the frequency, proportion and rate. For quantitative data, the data distribution must be considered. If the data are normally distributed, reporting as mean (standard deviation, SD) or mean ± SD is recommended. If the data do not follow a normal distribution, reporting as median (Q1–Q3) is recommended, e.g., 135 (128–143).
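The two reporting formats above can be illustrated with a minimal sketch. The function name `describe` and its parameters are hypothetical, and the decision about normality is assumed to be made by the analyst beforehand. Quartiles are computed with the default method of Python's `statistics.quantiles`; other software may use different quartile definitions and give slightly different values.

```python
import statistics

def describe(values, normal, decimals=1):
    """Format a quantitative variable as 'mean (SD)' or 'median (Q1–Q3)'.

    `normal` is the analyst's judgement about the distribution (an assumption
    of this sketch); it is not tested here.
    """
    if normal:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        return f"{mean:.{decimals}f} ({sd:.{decimals}f})"
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    median = statistics.median(values)
    return f"{median:.{decimals}f} ({q1:.{decimals}f}–{q3:.{decimals}f})"

print(describe([1, 2, 3, 4, 5], normal=True))
print(describe([1, 2, 3, 4, 5, 6, 7], normal=False, decimals=0))
```

The `decimals` argument should follow the precision of the raw measurements, in line with the rounding advice given later in this section.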

Reporting Descriptive Statistics

The general tabular form for statistical description is shown in Table 1 [ 13 ].

Characteristics of the Participants at Baseline † .

†Plus–minus values are means ± SD. No significant differences were observed between groups ( P >0.05) in any baseline characteristics.

*The number of patients included in each analysis is provided if it differs from the total number in the trial group.

#The total percentage of classified variables may not be 100%, owing to rounding in the calculation.

The following aspects must be emphasized for statistical descriptions:

Basic principles of the statistical description table: First, the group factor is usually used as the column head, and the characteristics being compared are listed in the leftmost column of the table (stub column), because many baseline characteristics are usually present. Second, the corresponding units of measurement (such as ng/ml, or years for age) should be listed for the different variables. Providing this information is particularly important for variables with multiple units of measurement. Third, any further explanation, if required, is usually indicated at the bottom of the table. For example, Table 1 may state why the sum of percentages does not equal 100%. Fourth, the number of cases in each group should be listed in the table. If the number of missing cases varies among variables, the number of missing cases should be listed for each variable. Fifth, if a categorical variable has two clearly defined categories, and one of them is of greater interest, only that category may be listed. For example, the variable of previous conception in Table 1 is divided into yes and no, and only the frequency with the percentage of the yes category is listed.

How many decimal places should be retained for quantitative data? No clear rules exist regarding this issue. For example, the Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines [ 14 ] recommend rounding to a reasonable extent for ease of comprehension and simplicity. The European Association of Science Editors (EASE) guidelines [ 15 ] recommend providing numbers with two to three significant digits. Habibzadeh has provided additional suggestions [ 16 ]: the precision for reporting of each statistic depends on how that statistic is derived; moreover, the number of decimal places reported for the mean, SD, median and IQR in scientific reports should not exceed the precision of the measurement in the raw data. We recommend following this suggestion, such that the number of decimal places depends on the precision of the original data. For example, if the measurement precision of a red blood cell count is one digit after the decimal point, and the hemoglobin level is an integer, the following could be reported: “the mean (standard deviation) red blood cell count is 4.7 (0.4)×10¹²/L, and the mean (standard deviation) hemoglobin is 136 (12) g/L.”

How many decimal places should be reported for percentages? In most cases, percentages can be reported with one decimal place, and two decimals can be used for the main variables of interest. If the number of cases in the denominator is less than 100, reporting the percentage as an integer, without decimal places, has been recommended [ 17 , 18 ]. For example, if 20 of 80 people were positive, the data can be reported as follows: “20 (25%) of 80 people had positive outcomes.” The reason is that, when the denominator is less than 100, each one-case change in the numerator shifts the percentage by more than 1%, so decimal places would suggest a precision that the data do not have.

For percentage reporting, first, if the total number of cases is too small (e.g., the denominator is less than 20), some articles have recommended not reporting percentages at all, because they can easily be misleading [ 19 , 20 ]. For example, if six of ten cases are effective, the conclusion that “60% of cases are effective” is not convincing. If a percentage is reported in such cases, the number of cases or the 95% confidence interval ( CI ) should be reported with it, and conclusions should be drawn carefully. Second, if a rate is the main research focus, reporting the 95% CI is recommended, to reflect the precision of the results. If only the rate is reported, the information provided is insufficient. For example, for the same incidence rate of 30%, the 95% CI for 30 of 100 cases is 21–39%, and that for 3000 of 10000 cases is 29.1–30.9%. The second interval is one tenth as wide as the first.
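The two intervals quoted above can be reproduced with the normal-approximation (Wald) interval. The text does not say which interval method was used, so the Wald choice here is an assumption that happens to match the quoted figures; the function name `wald_ci` is hypothetical. For small samples or proportions near 0 or 1, other intervals (such as the Wilson interval) are generally considered more reliable.

```python
import math

def wald_ci(events, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

for events, n in [(30, 100), (3000, 10000)]:
    lo, hi = wald_ci(events, n)
    print(f"{events}/{n}: {100 * lo:.1f}% to {100 * hi:.1f}%")
```

Because the half-width scales with 1/√n, a 100-fold increase in sample size narrows the interval by a factor of 10, which is the difference in precision described above.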

Must statistics and P values be reported for baseline comparisons? The requirements depend on the study design. For randomized controlled trials, reporting P values is not recommended, because such trials are randomly grouped, and randomization ensures that any differences between groups are by definition due to chance. In this case, statistical analysis is unnecessary and illogical [ 21 – 24 ]. For observational studies, however, owing to the lack of randomization, group differences may occur because of the selection of cases or exposures. Therefore, statistical analysis can be performed, and the statistics and P values can be reported.

How should the percentage of classified variables be presented? The percentages of categorical variables are usually displayed in two ways (as shown in Table 2 ), which convey different meanings. In Table 2 , when the total amount of the row is 100%, the incidence rate in men and women is emphasized. When the total of the column is 100%, the data indicate the proportions of men and women in the case and control groups.

Two Ways to Present Percentages.

The general principle for displaying percentages is that the total percentage for each group variable is 100%. As shown in Table 2 , if gender is used as a group variable, the total percentage for each row should be 100%. If the outcome (case or control) is the group variable, the total percentage of each column should be 100%.

Methods for Comparison of Groups

The comparison of groups can be used not only for the main research variables but also for the baseline characteristics. In experimental studies, cohort studies, case-control studies and cross-sectional surveys, comparison of groups can be used according to different purposes, and the meaning of the groups in various study types differs [ 25 – 27 ]. In experimental studies, the groups are usually intervention and non-intervention groups; in cohort studies, the groups are usually exposed and non-exposed groups; and in case-control studies, the groups are case and control groups.

Introduction to Methods

A variety of methods can be used for the comparison of groups [ 28 ]. Common methods and applications are shown in Table 3 .

Descriptive Statistics and Methods for Comparing Multiple Groups.

Presentation of Results

In most cases, because more than one research variable is compared between groups, the variables are listed in the leftmost column of the table, and the group variable is displayed as a column spanner, as shown in Table 4 .

Presentation of a Comparison of Two Groups.

Correlation Analysis

Methods for correlation analysis.

The use of correlation coefficients depends on the data type and data distribution [ 29 – 31 ]. In general, the Pearson correlation coefficient can be used for quantitative data conforming to a normal distribution; Spearman correlation can be used for quantitative data that do not follow a normal distribution. The correlation analysis between two nominal variables can be described by, e.g., the Pearson contingency coefficient or phi coefficient. The correlation between ordinal variables can be described by, e.g., the Kendall correlation coefficient or Spearman correlation coefficient.

Correlation Reporting

In cases with only several variables, reporting the mean, SD, correlation coefficient and 95% CI is recommended ( Table 5 ). If many variables are present, the mean and SD may not be listed, but reporting the correlation coefficient and 95% CI , instead of the correlation coefficient and P value, is recommended. Reporting values to two decimal places is recommended for the correlation coefficient and its 95% CI .

Mean, SD and Correlation Analysis for Four Variables.

When describing correlation analysis results, directly reporting the correlation coefficient is recommended, without subjectively describing the correlation as high, moderate or low. For example, “the correlation coefficient between OAI and AHI is 0.67 (0.62–0.71)” is recommended rather than “there is a high correlation between OAI and AHI.”
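As a minimal sketch of the recommended format, the following computes a Pearson correlation coefficient and an approximate 95% CI using the Fisher z transformation. The source does not state how its intervals were obtained, so the Fisher method is an assumption here, and the function names are hypothetical. The interval is valid only for |r| < 1 and n > 3.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z=1.96):
    """Approximate 95% CI for r via the Fisher z transformation."""
    zr = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(zr - z * se), math.tanh(zr + z * se)

def report(r, n):
    """Format as 'r (lower–upper)' with two decimal places."""
    lo, hi = fisher_ci(r, n)
    return f"{r:.2f} ({lo:.2f}–{hi:.2f})"

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.5, 3.9, 5.4, 5.8]
print(report(pearson_r(x, y), len(x)))
```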

Regression Analysis

In medical research, regression analysis is commonly used in three applications [ 32 ]: (1) exploring risk factors, (2) correcting confounding factors and (3) establishing predictive models. The commonly used regression analysis methods are linear regression, logistic regression, Poisson regression and Cox regression, which correspond to dependent variables that are continuous data, categorical data, count data and survival data, respectively.

Methods and Methodology

Before the application of regression models, the relevant assumptions must be met [ 33 ]. A linear regression model must satisfy the LINE assumptions, that is, linearity, independence, normality and equal variance; ordinal logistic regression must meet the proportional odds assumption; and Cox regression must meet the proportional hazards (PH) assumption.

A regression model should report different content according to the research purpose [ 34 ]. For example, for analyses aimed at correcting confounding factors, the main research factors and confounding factors must be clearly stated. For analyses aimed at exploring risk factors, the method for variable screening (such as the stepwise regression method or optimal subset method) must be explained. For analyses aimed at establishing a predictive model, the indicators used to reflect the goodness of fit of the model must also be explained; these indicators may include R-squared, the Akaike information criterion, the Bayesian information criterion, root mean square error, area under the ROC curve and 95% CI , specificity or sensitivity.

In linear regression analysis, the parameter estimate and its 95% CI , standardized regression coefficient (preferred if the numerical units of the respective variables are substantially different), standard error, t value and P value must usually be reported. If space is limited, reporting at least the parameter estimate and its 95% CI , instead of the parameter estimate and P value, is recommended. In logistic regression, the parameter estimate, standard error, Wald χ², P value, and odds ratio ( OR ) with 95% CI must usually be reported. If space is limited, reporting at least the OR and its 95% CI is recommended. The reporting form for Poisson regression and Cox regression is similar to that for logistic regression, but the OR is replaced by the risk ratio and hazard ratio ( HR ), respectively.
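The relationship between the reported quantities in logistic regression can be sketched as follows: the OR is the exponential of the parameter estimate, and a Wald-type 95% CI is obtained by exponentiating the estimate plus or minus 1.96 standard errors. The function name and the example numbers are hypothetical; statistical software may report profile-likelihood intervals instead, which can differ slightly.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, for illustration only
or_, lo, hi = odds_ratio_ci(beta=0.693, se=0.10)
print(f"OR {or_:.2f} (95% CI {lo:.2f}–{hi:.2f})")
```

The same calculation applies to the hazard ratio from a Cox model, with the Cox coefficient and its standard error in place of the logistic ones.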

Table 6 and Table 7 show routine reporting of linear regression and logistic regression results, respectively.

Results of Linear Regression for Life Satisfaction Rating.

Results of Logistic Regression for Cardiovascular Disease.

In a regression model, polytomous variables must be noted, such as living status in Table 6 . In most cases, polytomous variables should be included in the form of dummy variables with a pre-specified reference category [ 35 ], and the comparison results between other categories and the reference category should be reported. As shown in Table 6 , compared with that of living alone (reference category), the life satisfaction rating of living with a spouse (excluding other family) is 4.262 higher on average, that of living with family (excluding spouse) is 1.946 higher on average, and that of living with a spouse and family is 4.748 higher on average. As shown in Table 7 , compared with that of the <35 year age group, the OR values for cardiovascular disease in the 35–55 year age group and >55 year age group are 3.34 and 3.61, respectively.
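Dummy coding with a reference category can be sketched as below. The function name `dummy_code` and the category labels are hypothetical; the point is only that a variable with k categories yields k − 1 indicator columns, and the reference category is the row in which all indicators are 0.

```python
def dummy_code(values, reference):
    """Expand a categorical variable into 0/1 indicators, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

living = ["alone", "spouse", "family", "spouse_and_family", "alone"]
for row in dummy_code(living, reference="alone"):
    print(row)
```

Each regression coefficient for an indicator is then interpreted as the difference (or, after exponentiation, the ratio) between that category and the reference category.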

Log-Rank Test for Survival Time.

Survival Analysis

Survival analysis is a series of analytic processes [ 36 ] including description, comparison of groups and regression analysis.

For the description of survival data, the survival rate and median survival time are usually estimated with the Kaplan-Meier method [ 37 ]. For comparison of groups of survival data, the log-rank test and Gehan-Breslow-Wilcoxon test are commonly used. The log-rank test, which tends to perform best toward the right side of the survival curve, is often used when the PH assumption is met [ 38 ], whereas the Gehan-Breslow-Wilcoxon test, which tends to perform best on the left side of the survival curve, is the fallback method when the PH assumption fails. For regression analysis of survival data, Cox regression, a semi-parametric method, is widely used, but the assumption of PH must be satisfied [ 39 ].

When survival analysis methods are introduced in an article, the following should be noted. (1) The starting time (such as follow-up after surgery) and outcome (such as death) should be clearly defined. (2) Statistical description indicators, usually the median survival time and its 95% CI , should be stated. Sometimes the median follow-up time may also be stated. (3) The estimation method for survival rate, such as the Kaplan-Meier method, should be stated. Of note, the Kaplan-Meier method is a method for estimating survival, not a statistical inference method. For example, it can be said that “the survival rate is estimated by Kaplan-Meier method” or “the Kaplan-Meier survival curve is drawn,” but it cannot be said that “the Kaplan-Meier method is used to compare the survival curves of the two groups.” (4) The method used for statistical inference should be stated. For example, the log-rank test should not be used if a clear intersection is present between survival curves. Cox regression should meet the PH assumption; otherwise, a non-PH model should be considered.
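To make the distinction in point (3) concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimate. It only describes survival; it performs no comparison between groups. The function names and the toy data are hypothetical, and the sketch omits confidence intervals, which dedicated survival analysis software would provide.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = outcome observed (e.g., death), 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve

def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(curve, median_survival(curve))
```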

Reporting of Results

Statistical description.

The follow-up profile, such as the number of cases in each group or the number lost to follow-up, should be stated. The median survival time and its 95% CI should also be reported; for example, “the median survival times of the three groups were 5.7 (3.7–8.0) months, 7.1 (4.6–7.9) months and 7.9 (2.3–13.0) months.” Sometimes, depending on the purpose of the study, the survival rate at a fixed time point (95% CI ) can also be reported. For example, “the 1-year Kaplan-Meier survival rates in the treatment group and the control group were estimated to be 0.677 (0.588–0.766) and 0.206 (0.173–0.239), respectively.”

Reporting the survival curve of the main analysis indicators ( Figure 1 ) is strongly recommended because it can visually indicate the changes in the survival rates in two or more groups. If possible, the number of people at risk at different follow-up times in each group in the survival curve should be reported. At the bottom of Figure 1 , the number at risk in each of the three dose groups at 0, 10, 20 and 30 months is shown. Reporting the 95% confidence band is recommended if only one survival curve is shown. Of note, the survival curve corresponds to the confidence band rather than the confidence interval. The confidence interval is the interval for each time point, and the confidence band is the interval of the entire survival function.

Survival Curves for Three Groups.

Statistical Inference

When comparing survival data between groups, the median survival time should be reported if only one grouping variable is compared. The results of statistical analysis can be stated as text in the results, such as “the median survival times of the treatment group and the control group are 280 (159–352) days and 99 (67–151) days, respectively, and the difference between groups is statistically significant ( χ² = 16.126, P <0.001).” If multiple grouping variables are present, the results of each variable should be displayed in a table, as shown in Table 8 .

If Cox regression is used for multivariable analysis, the test output of the PH assumption must first be reported to validate that the model is applicable, followed by reporting of the results of the regression analysis. For Cox regression, the parameter estimate, standard error, Wald χ², P value, HR and its 95% CI must usually be reported. If space is limited, reporting at least the HR and its 95% CI is recommended. The reported results of Cox regression multivariable analysis are given in Table 9 .

Results of Cox Regression for Survival Time.

We provide a summary of the appropriate application and reporting of commonly used statistical methods, such as comparison of groups, correlation analysis, regression analysis and survival analysis. These recommendations do not include all statistical methods, nor do they establish a comprehensive standard. Instead, they are aimed at providing suggestions for clinical researchers, to avoid statistical application errors in medical articles. No single document can cover all statistical methods. Clinical researchers should consult a statistician with experience if necessary.

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Citation Information



This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 Unported License (CC BY-NC 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc/4.0/ .


Published on 22.2.2024 in Vol 26 (2024)

Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis

Authors of this article:


Original Paper

  • Dongxiao Gu 1, PhD
  • Qin Wang 1, MD
  • Yidong Chai 1, PhD
  • Xuejie Yang 1, PhD
  • Wang Zhao 1, PhD
  • Min Li 1, MD
  • Oleg Zolotarev 2, PhD
  • Zhengfei Xu 1, MD
  • Gongrang Zhang 1, PhD

1 School of Management, Hefei University of Technology, Hefei, China

2 Russian New University, Moscow, Russian Federation

Corresponding Author:

Dongxiao Gu, PhD

School of Management, Hefei University of Technology

193 Tunxi Road

Hefei, 230009

Phone: 86 13866167367

Email: [email protected]

Background: Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhalation irritants. Analyzing the potential risk factors that can trigger AR can provide reference material for individuals to use to reduce its occurrence in their daily lives. Nowadays, social media is a part of daily life, with an increasing number of people using at least 1 platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and experience the same afflictions. Notably, these channels promote the ability to share health information.

Objective: This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR based on these social media comments. The main questions were as follows: How many comments contained AR risk factor information? How many categories can these risk factors be summarized into? How do these risk factors trigger AR?

Methods: This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier. Furthermore, cluster analysis enabled a closer look into the opinions expressed in the category, namely gaining insight into how risk factors trigger AR.

Results: Our classifier identified more comments containing risk factors than the other classification models, with an accuracy rate of 96.1% and a recall rate of 96.3%. In general, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common risk factors. We gained insight into the risk factors expressed in each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body’s immune system and lead to the development of allergies.

Conclusions: Our approach can handle the amount of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for individuals to reduce AR in their daily lives. The experimental data also provide a potential pathway that triggers AR. This finding can guide the development of management plans and interventions for AR.

Introduction

Over the past few decades, the prevalence of chronic diseases has increased significantly, becoming a global public health concern. The World Health Organization has listed allergic diseases as one of the disease types that require priority research and prevention in the 21st century [ 1 ]. Allergic rhinitis (AR) is a common, multifactorial chronic disease that is induced by environmental conditions or certain genes [ 2 ]. AR not only has a significant impact on individuals’ sleep, social life, and work attendance but also triggers comorbidities such as conjunctivitis, atopic dermatitis, and asthma [ 3 ]. Large-scale epidemiological survey data showed that AR currently affects a large number of people in China alone [ 4 ], with an estimated prevalence of 15% to 20% worldwide [ 5 ]. The direct and indirect costs associated with the management of AR are also a significant burden on society. For instance, the total cost of AR in Sweden, with a population of 9.5 million, was estimated at €1.3 billion (US $1.41 billion) annually [ 6 ]. These unexpectedly high costs could be related to the high prevalence of the disease, in combination with the previously often underestimated indirect costs that arise from reduced work efficiency and absenteeism and the potential costs associated with treating AR comorbidities [ 6 ].

Currently, there is no cure for AR, and individuals need to avoid the disease risk factors such as exposure to allergens and inhalation irritants [ 7 ] during the long self-management process. Therefore, identifying AR risk factors can provide a reference for patients to help reduce the condition in their daily lives [ 8 ].

Many studies have sought to identify AR risk factors. These studies recruited participants with symptoms of AR and control participants without AR symptoms from a specific age group or a particular geographical area and collected demographic information, lifestyle habits, family history, comorbidities, and residential areas through questionnaires. Subsequently, they used correlation methods to explore the relationship between these data and AR, aiming to identify the risk factors for AR within the specified age group or geographical area [ 9 ]. However, these studies have 2 limitations. First, they specifically target certain age groups or geographical areas, and questionnaires can only gather data on specific pieces of information. Owing to the constraints of questionnaire surveys, it is challenging to identify potential risk factors that may be present in individuals’ daily lives. As a result, the risk factors identified through survey-based studies are limited in scope and incomplete, and they provide limited insights for a broader patient population. Second, the survey-based approach demands a commitment to long-term investigation and a substantial effort to collect representative responses [ 10 ]. In contrast, collecting information from social media platforms can cover large geographical areas at a comparatively low cost [ 10 ]. Social media platforms allow users to share experiences and opinions on various topics [ 11 , 12 ], including personal health issues [ 13 ]. Over time, highly unstructured and implicit knowledge has been generated in communities where users frequently participate [ 14 , 15 ], which can provide daily health records that are difficult to obtain from traditional questionnaire surveys. Therefore, social media can become a potential source of information for identifying risk factors for diseases such as AR [ 16 ].

Text-mining techniques are an effective tool for using voluminous social media data [ 17 ]. Some studies have combined social media data analysis to obtain knowledge about disease risk factors [ 18 , 19 ]. However, the abovementioned studies on disease risk factors used only shallow text features such as the number of social media text items and word cooccurrences, which are not conducive to identifying disease risk factors in the context of colloquial and diverse user expressions [ 20 ]. In this study, we designed a text-processing framework to automatically identify risk factors from social media data [ 21 ]. We used social media comments to construct a natural language processing–based AR risk factor identification method, aiming to tackle the problems of omission and low accuracy in traditional disease-related information identification methods that rely solely on shallow text features such as word frequency.

To be more specific, we developed an AR risk factor identification method that integrates pretrained word embeddings with text convolutional neural networks (CNNs). The Word2vec algorithm has proven to be superior in text vector representation [ 20 ]. This is a prediction-based approach that predicts the neighboring words that are most likely to appear within a window size around a center word in a corpus, resulting in high-dimensional vector representations that capture semantic aggregation. As social media users may mention related topics, such as symptoms and treatments, when describing risk factors in their comments, we used a local context window to achieve better semantic aggregation of AR risk factors, a method that has been demonstrated to be effective for such aggregation. In addition, using the Skip-gram model to train word pairs enables the incorporation of word thematic information, thus improving attention to risk factor phrases. The convolutional network can convolve the text in the word vector dimension and extract critical information through the max-pooling layer operation. In addition, this study used a clustering method with review mechanisms to concentrate on a large amount of text that contains risk factors within the observable range, thereby ensuring the usefulness of the content obtained through text mining.

Our main contributions were as follows:

  • First, this study proposed a framework (TopicS-ClusterREV) based on natural language processing for identifying the risk factors of AR. We used pretrained word embeddings and text convolutional networks to process social media text. Our model can identify more risk factors from social media comments with high accuracy and recall. To the best of our knowledge, this is the first study to use natural language processing techniques to identify risk factors for AR in social media comments.
  • Second, this study proposes a topic-enhanced word-embedding model. TopicS enhances the thematic information of words by adding a task that predicts the theme to which the center word belongs. This generates high-dimensional word vector representations with semantic aggregation and theme enhancement. We trained 2 types of word vectors using both the Skip-gram and TopicS models and separately input them into each risk factor classifier. The results showed that TopicS outperformed the baseline on the text classification task, demonstrating the effectiveness of our topic-enhanced word-embedding model.
  • Finally, we introduced automatic and manual review mechanisms to improve the single-pass algorithm, which allowed us to effectively identify and focus on a large amount of text that contains risk factors within the observable range. We ultimately identified 28 categories of risk factors including the common risk factors that lead to most individuals developing symptoms and previously overlooked risk factors that were not within the scope of previous research.

Identification of AR Risk Factors Through Surveys

AR has become a major global issue with a substantial increase in its prevalence in recent years. In Europe, the prevalence of AR among Danish adults progressively increased from 19% to 32% over the past 3 decades [ 22 ]. Understanding the risk factors, such as genetic, environmental, and lifestyle factors, helps in the management of AR, thus motivating many studies to focus on identifying potential risk factors. These studies are summarized in Table 1 . From Table 1 , we observed that the previous studies were based on survey methods, including cross-sectional surveys, cohort studies, and case-control studies.

a We searched for the literature related to AR risk factors and presented 9 papers from the past decade to showcase the methods and the identified risk factors.

These studies typically recruited participants with symptoms of AR and control participants without AR symptoms from a specific age group or a particular geographical area, collected demographic information through questionnaires, and then conducted correlation analysis, such as logistic regression, to explore the relationship between those data and AR [ 32 ]. For instance, Gao et al [ 9 ] conducted a cross-sectional survey to investigate the prevalence and risk factors of adult self-reported AR in the plain lands and hilly areas of Shenmu City in China and analyzed the differences between regions. The web-based questionnaire covered demographic factors, smoking status, comorbid allergic disorders, family history of allergies, and place of residence. Unconditional logistic regression analysis was used to screen for factors influencing AR. They found regional differences in the prevalence of AR and identified genetic and environmental factors as important risk factors associated with AR. However, these studies have 2 limitations. First, they specifically targeted certain age groups or geographical areas, and questionnaires can only gather data on specific pieces of information. Owing to the constraints of questionnaire surveys, it is challenging to identify potential risk factors that may be present in individuals’ daily lives. As a result, the risk factors identified through survey-based studies are limited in scope and incomplete, and they may provide limited insights for a broader patient population. Second, the survey-based approach demands a commitment to long-term investigation and a massive effort to collect representative responses [ 10 ].

Identification of Disease Risk Factors From Social Media Through Text Mining

Social media sites provide a convenient way for users to continuously update their day-to-day activities, which allows large groups of people to create and share information, opinions, and experiences about health conditions through web-based discussion [ 11 ]. Hence, social media can be considered a new data source to assess population health. As shown in Table 2 , some studies have combined text-mining techniques to classify and summarize voluminous social media data to obtain knowledge about chronic disease risk factors. Zhang and Ram [ 33 ] extracted behavioral features from the Twitter posts of users with asthma using keywords from an existing knowledge base. Griffis et al [ 34 ] collected 25,000 tweets that did or did not mention diabetes, identified 5000 common words, used logistic regression to determine which common words were high-frequency expressions of diabetes, and finally grouped these high-frequency words using latent Dirichlet allocation to obtain the risk factors for diabetes. Schäfer et al [ 35 ] used syntactic analysis to identify portions of risk factors occurring before or after causal terms, grouped these portions using latent Dirichlet allocation, and obtained the risk factors for gastric discomfort. Pradeepa et al [ 19 ] clustered stroke-related tweets using the Probability Neural Network and used the Apriori algorithm to identify frequent word sets related to risk, thereby identifying risk factors for stroke. In addition to the aforementioned approaches that use shallow text features such as keywords, frequent word sets, high-frequency words, and syntactic features for disease risk factor identification, other studies [ 36 - 38 ] trained risk factor classifiers using machine learning methods such as Naive Bayes, Maximum Entropy Model, and Naive Bayes Classifier–Term Frequency Inverse Document Frequency. These classifiers predict the presence of risk factors in text based on discrete vector representations such as bag-of-words and n-gram.

a We searched for studies related to identifying disease risk factors based on social media data. We found 7 papers from the past decade, highlighting the social media platforms, data, methods, features, diseases, and risk factors involved in research.

b LDA: latent Dirichlet allocation.

c MLP: multilayer perceptron.

The current methods for identifying disease risk factors on social media fall into 2 categories: shallow text feature methods and discrete word vector representations. Shallow text feature techniques often fail to capture important risk factors, resulting in low accuracy, whereas discrete word vector approaches struggle to keep up with the dynamic vocabulary of social media text, missing new words and trending expressions and thus inadequately representing the information conveyed.

Word Embedding and Text Classification Based on Deep Learning

Natural language processing technology promotes text analysis based on social media comments [ 39 ]. Depending on the training corpus, this technology can learn the deeper semantic features of the comment text, as well as features that are consistent with the current context, and so supply a better text vector representation to downstream classification tasks. Some researchers have used large-scale pretrained language models [ 40 ], global matrix decomposition [ 41 ], and local context windows [ 42 ] for text vector representation. Local context windows are more suitable for semantically aggregating AR risk factors [ 43 ]. Skip-gram and Continuous Bag-of-Words Model (CBOW) are prediction-based methods that learn the semantic representation of a center word by predicting the most likely neighboring words within a window size in a corpus. When users narrate risk factors in their comments, they may also mention symptoms, treatments, and other topics. These global contexts may dilute the key features of risk factor expressions. CBOW averages the context words to predict the target word and tends to predict high-frequency words in the corpus. In contrast, Skip-gram gives each word a chance to be a center word, making it better at predicting rare words compared with CBOW [ 44 ]. Therefore, in situations where social media users express a wide variety of ideas, the Skip-gram model can yield satisfactory outcomes. Moreover, the Skip-gram approach uses word pair training, which facilitates the incorporation of topic information into words [ 45 ], resulting in the generation of high-dimensional word vectors that feature semantic aggregation and topic enhancement. Therefore, we selected Skip-gram as the word-embedding model for our study.

Text classification has evolved to deep learning models, mainly including CNN-based models [ 46 ], recurrent neural network (RNN)–based models [ 47 ], and transformer models [ 48 ]. For the CNN algorithm, convolutional networks can convolve text on the word vector dimensions and extract key information through pooling layer operations. Consequently, this algorithm is capable of using essential data for classification tasks. Therefore, we used TextCNN for classifier training and evaluated the performance of RNN and transformer models on this task.

The framework used in this study consisted of 3 parts, as shown in Figure 1 . The first part was data collection and processing, aimed at obtaining a clean data set. The second part was risk factor identification, which included the proposed TopicS method and training of a risk factor classifier. The implementation steps were as follows: (1) semiautomatically constructing a risk factor topic dictionary, (2) generating high-dimensional word vectors enhanced by TopicS-generated topics, and (3) vectorizing annotated text and training a risk factor classifier. The third part was text clustering and keyword extraction, which used the ClusterREV method to cluster the identified risk factors and extract keywords from every category.


Zhihu is a Chinese social media platform where people discuss topics in a web-based forum format. In May 2022, the Zhihu subcommunity allergic rhinitis had 1.04 million discussions. The posts on this social media platform allow other users to comment [ 49 ], and people can explain their situations to provide support or seek help effectively. Therefore, these comments provide a rich source of data for investigating the risk factors reported by different users [ 50 ]. In this study, we trained domain-specific word representations based on experimental data. A relatively domain-specific input corpus [ 51 ] is better at extracting meaningful semantic relations than a generic pretrained language model [ 52 ]. We crawled all the data from May 2012 to May 2022 under the topic allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments, including the post ID, comment ID, and post and comment content.

In this study, we preprocessed the data through regular expression matching, stop word removal, and word segmentation. First, we removed special symbols, such as URLs and emoticons, from the comments through regular expression matching and stop word removal to reduce the interference of noise with the text analysis task. Then, we compiled a dictionary of 169 specialized terms, including types of AR, medications, and comorbidities, to reduce the probability of incorrect word segmentation. After word segmentation, we obtained a lexicon of 68,863 words and ranked the words according to the number of occurrences. We found that the top 10,000 words accounted for 94.83% of the total words, suggesting that many words recurred and a relatively simple word vector could effectively train the model [ 53 ]. This supports our decision to use Skip-gram as the foundational model.
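
A coverage figure such as the 94.83% quoted above is the share of all word occurrences accounted for by the most frequent words. The sketch below shows one way to compute it, assuming the comments have already been segmented into tokens; the function name and the toy token list are hypothetical.

```python
from collections import Counter

def top_k_coverage(tokens, k):
    """Share of all token occurrences covered by the k most frequent words."""
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / sum(counts.values())

# Toy token list standing in for the segmented corpus (illustration only).
tokens = "mite pollen mite dust mite pollen cat".split()
print(top_k_coverage(tokens, 2))
```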

We observed ultrashort comment noise in the comments (eg, “Thank you!”). It is important to note that these ultrashort comments do not include any personal medical information. The ultrashort comments were filtered, resulting in 33,039 valid comments. This operation can effectively minimize the impact of noise on downstream text classification tasks. Table S1 in Multimedia Appendix 1 presents the examples of valid comments.
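
Filtering of ultrashort comments can be sketched as a simple length check. The text does not state the cutoff used, so the threshold below (`min_chars=10`) is a hypothetical value, as is the function name; punctuation and whitespace are ignored when counting characters.

```python
def drop_ultrashort(comments, min_chars=10):
    """Keep only comments with at least min_chars letters or digits.

    min_chars is a hypothetical cutoff chosen for illustration.
    """
    kept = []
    for text in comments:
        content = "".join(ch for ch in text if ch.isalnum())
        if len(content) >= min_chars:
            kept.append(text)
    return kept

comments = ["Thank you!", "Pollen in spring makes my nose run."]
print(drop_ultrashort(comments))
```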

Supervised learning requires labeled data, so the comments were labeled before the model was trained end to end. If a comment directly mentions an allergen or indicates a condition that leads to the appearance or worsening of symptoms, the comment is labeled as 1, indicating the presence of risk factors, as shown in Figure 2 .


We randomly chose 2030 comments from the 33,039 comments, and 3 researchers labeled each comment as containing or not containing risk factors. To ensure high interannotator consistency, all 3 researchers annotated all 2030 comments. In cases with uncertainty in labeling, the 3 researchers discussed and arrived at a final label. After annotating and eliminating comments with religiously controversial content, 2000 labeled comments remained, consisting of 996 comments containing risk factors and 1004 comments not containing risk factors. The data set was divided into a 90% training set and a 10% test set. The 90% training set was further divided into 10 subsets, with 9 subsets used for training and the remaining subset used for validation, performing 10-fold cross-validation.
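
The split described above can be sketched as follows: hold out 10% of the labeled comments for testing, divide the remaining 90% into 10 subsets, and rotate the validation subset across 10 rounds. The function names, the random seed, and the use of integer IDs in place of comments are assumptions made for illustration.

```python
import random

def split_and_folds(items, test_fraction=0.1, n_folds=10, seed=0):
    """Shuffle, hold out a test set, and cut the training set into folds."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds

def cross_validation_rounds(folds):
    """Yield (training, validation) pairs, using each fold once for validation."""
    for i, validation in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

# Integer IDs stand in for the 2000 labeled comments.
train, test, folds = split_and_folds(range(2000))
rounds = list(cross_validation_rounds(folds))
print(len(train), len(test), len(folds), len(rounds[0][0]), len(rounds[0][1]))
```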

Topic Dictionary Construction

We used a combination of manual labeling and similarity calculation to identify keywords related to risk factors and then constructed a table of topic words using a semiautomated approach. The process of constructing the dictionary is depicted in Textbox 1 and is as follows: (1) label 400 randomly selected comments as described in the Annotation section, thereby obtaining 198 comments with risk factors; (2) extract risk factor phrases from the annotated comments; (3) obtain the risk factor topic word list; (4) remove duplicates from the word list and use the remaining words in the current topic as seed words, word_set ; (5) use Skip-gram to find the most similar words to expand the topic words; (6) repeat steps 3 through 5 to further expand the topic words; and (7) finally, obtain the topic words for the risk factor. A large weight was assigned to the risk factor topic words. Table S2 in Multimedia Appendix 1 shows examples of the risk factor topic dictionary.

Input: annotated comments

Output: topic dictionary

1. d_i = select annotated data

2. p_i = extract risk factor phrases from d_i

3. for w in p_i: list_i.append(w)

4. word_set = set(list_i)

5. for w in word_set: word_i.update(Skip-gram.most_similar(w, topn=n))

6. repeat steps 3, 4, and 5
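
The expansion loop in Textbox 1 can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the authors' implementation: `most_similar` is a hypothetical callable standing in for the nearest-neighbor lookup of a trained Skip-gram model, the toy neighbor table replaces real embeddings, and the manual screening of candidate words is omitted.

```python
def expand_topic_words(seed_words, most_similar, topn=5, rounds=2):
    """Iteratively grow a topic word set from seed words.

    most_similar -- callable (word, topn) -> list of similar words; a stand-in
                    for the nearest-neighbour lookup of a trained Skip-gram model
    """
    topic_words = set(seed_words)
    frontier = set(seed_words)
    for _ in range(rounds):
        candidates = set()
        for word in frontier:
            candidates.update(most_similar(word, topn))
        frontier = candidates - topic_words  # only newly found words are expanded next
        if not frontier:
            break
        topic_words |= frontier
    return topic_words

# Toy neighbour table standing in for a trained embedding model (hypothetical).
NEIGHBOURS = {
    "pollen": ["catkin", "spring"],
    "catkin": ["willow", "pollen"],
    "mite": ["dust", "bedding"],
}

def toy_most_similar(word, topn):
    return NEIGHBOURS.get(word, [])[:topn]

print(sorted(expand_topic_words(["pollen", "mite"], toy_most_similar)))
```

Tracking a separate frontier avoids modifying the word set while iterating over it and stops the loop once no new words are found.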

Ethical Considerations

As the use of text data from social media involves user privacy, this study adopted the following steps for deidentification: (1) We removed user account information and retained only anonymous comment information. (2) We used regular expressions to match and delete URLs and email addresses in the comments. (3) During the annotation process, annotators received only text that did not involve personal information. To evaluate the quality of deidentification, we randomly selected 500 text items for manual inspection and did not find any instances containing personal identity information. Our data are sourced from public discussions on Zhihu, a social media platform that can be accessed without registration. We followed strict ethical research protocols similar to the guidelines by Eysenbach and Till [ 54 ]. In addition, to protect the anonymity of participants, we have implemented measures including the removal of user information and avoiding verbatim quotations to prevent identification through search engines, protecting the privacy and security of personal data. It should be mentioned that our study was focused on the post level; we do not anticipate any negative ethical impact from our analysis.
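
Step 2 of the deidentification procedure can be sketched with two regular expressions. The patterns below are illustrative assumptions rather than the expressions used in the study; the URL pattern is restricted to ASCII characters so that it does not swallow adjacent Chinese text that is not separated by spaces.

```python
import re

# Hypothetical patterns for illustration; real data may need broader coverage.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[\w\-./?%&=#:~+]+", re.IGNORECASE | re.ASCII)
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def deidentify(text):
    """Delete URLs and email addresses, then collapse leftover whitespace."""
    text = URL_PATTERN.sub("", text)
    text = EMAIL_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

sample = "See https://example.com/post or mail me at someone@example.com please"
print(deidentify(sample))
```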

Topic-Enhanced Word Embedding

TopicS performed 2 tasks during training, as shown in Figure 3 . The first task was to predict the neighboring words within the window of the central word. The second task was to predict the topic of the central word; the topic dictionary used for this purpose is described in the Topic Dictionary Construction section.
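
The two prediction tasks can be pictured through the training examples they consume: for every central word, the neighboring words inside the window (first task) and the topic of the central word looked up in the topic dictionary (second task). The sketch below only builds such examples and does not train a model; the function name, the "other" label for words outside the dictionary, and the toy data are hypothetical.

```python
def training_examples(tokens, window, topic_of):
    """Yield (center, context_words, topic) for every position in a token list.

    topic_of -- dict mapping a word to its topic label; words missing from the
                dictionary get the hypothetical label "other"
    """
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right, topic_of.get(center, "other")

tokens = ["spring", "pollen", "makes", "me", "sneeze"]
topic_dictionary = {"pollen": "risk_factor", "spring": "risk_factor"}
examples = list(training_examples(tokens, 1, topic_dictionary))
for example in examples:
    print(example)
```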

The specific formula calculations for the loss function design, parameter updates, and error backpropagation of TopicS are explained subsequently.

First, we defined the loss function. Each word in the corpus was used in turn as the central word of a sliding window of size $c$. Let $S$ be the training sequence $(w_1, w_2, \ldots, w_T)$, where $w_i$ denotes the $i$th word in the sequence and the subscript $T$ represents the total number of unique words in the corpus. In addition to predicting the context words of the central word, we must also predict the topic score of the central word. Therefore, the loss function comprised 2 parts, $L_{cont}$ and $L_{topic}$, and the overall loss was denoted by $L_s$. Our training objective was to minimize the overall loss $L_s$.

Finally, we can update the word representation.
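A two-part objective of this kind can be illustrated with a small sketch. The precise definitions of the context and topic terms, and the way they are combined, are not given in this text, so the cross-entropy context term, the squared-error topic term, and the mixing weight `alpha` below are all assumptions chosen for illustration, not the authors' formulation.

```python
import math


def softmax_cross_entropy(scores: list[float], target: int) -> float:
    """Negative log-probability of the target index under a softmax."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def combined_loss(
    context_scores: list[list[float]],
    context_targets: list[int],
    topic_pred: float,
    topic_true: float,
    alpha: float = 0.5,  # hypothetical weight between the two tasks
) -> float:
    # Context part: one prediction per neighbouring word in the window.
    l_cont = sum(
        softmax_cross_entropy(s, t) for s, t in zip(context_scores, context_targets)
    )
    # Topic part: squared error on the topic score of the central word.
    l_topic = (topic_pred - topic_true) ** 2
    return alpha * l_cont + (1 - alpha) * l_topic


loss = combined_loss([[0.0, 0.0]], [0], topic_pred=1.0, topic_true=1.0)
```

In this toy call the topic prediction is exact, so the whole loss comes from the context term, which is half of ln 2 for a uniform two-word vocabulary.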

Text Classification

In this study, we chose TextCNN as the classification model. In the risk factor identification task, a few key pieces of semantic information carry most of the signal, and TextCNN can exploit this key information for classification at low computational cost. We represented the manually annotated text as a vector matrix using the high-dimensional word vectors trained by the TopicS model, which aggregate local contextual and topic information, and used this matrix as the input to the TextCNN model. The TextCNN algorithm then applies convolutional kernels of different sizes, each operating over a fixed window of adjacent word representations, to extract multiple n-gram text features and capture local information. Because our input word vectors incorporate the topic information of words, the most important features produced by the convolution operation can be extracted using the maximum pooling operation, as shown in Figure 4 .
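The convolution and maximum pooling steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not the trained model: the embeddings and kernels are made-up numbers, a ReLU activation is assumed, and each sentence is assumed to contain at least as many tokens as the kernel height.

```python
def conv_max_pool(embeddings: list[list[float]], kernel: list[list[float]]) -> float:
    """Slide one kernel of height h over the sentence and max-pool the result."""
    h = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - h + 1):
        window = embeddings[start:start + h]
        # Element-wise product of the kernel with h adjacent word vectors.
        activation = sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        feature_map.append(max(0.0, activation))  # ReLU (assumed)
    return max(feature_map)  # keep only the strongest n-gram response


# Four tokens with 2-dimensional embeddings (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_kernel = [[1.0, 0.0], [0.0, 1.0]]
trigram_kernel = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

features = [conv_max_pool(sentence, k) for k in (bigram_kernel, trigram_kernel)]
```

Each kernel contributes one pooled feature, so kernels of several heights yield a fixed-length vector regardless of sentence length; in a full TextCNN this vector feeds a classification layer.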

Clustering With a Review Mechanism

The clustering task is to group similar risk factors. In this study, a large amount of text containing risk factors was clustered into a number of categories small enough to inspect manually, making the content easier to comprehend. This study enhances the single-pass algorithm and integrates it with a manual review to cluster the risk factors identified in the text classification, ensuring the validity of the clustering results. The main concept of single-pass clustering [ 55 ] is to match text items based on their similarity values without the need to determine the number of clusters in advance. This makes it suitable for clustering tasks with an unknown number of clusters. However, traditional single-pass clustering traverses the text items only once, so text items that enter early are compared only with the categories that exist at that point. Their similarity to those early topics can fall slightly below the threshold, in which case they are set up as new categories even though they may fit a category that takes shape later, which ultimately degrades the clustering result.

As shown in Figure 5 , we improved the single-pass algorithm by retraversing the categories that were clustered separately after all the text items had been traversed to handle any missed text. After the automated clustering was completed, we conducted a manual review to ensure the reliability of the clustering.
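A minimal sketch of single-pass clustering followed by a second traversal of the singleton categories is given below. The Jaccard similarity over token sets, the single-linkage comparison, and the threshold value are assumptions made for illustration; the study does not specify them here, and the manual review step is not modeled.

```python
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def best_match(tokens: set, clusters: list, threshold: float):
    """Return the most similar cluster at or above the threshold, or None."""
    best, best_sim = None, threshold
    for cluster in clusters:
        sim = max(jaccard(tokens, member) for member in cluster)
        if sim >= best_sim:
            best, best_sim = cluster, sim
    return best


def single_pass(texts: list, threshold: float) -> list:
    clusters = []
    for tokens in texts:
        best = best_match(tokens, clusters, threshold)
        if best is None:
            clusters.append([tokens])  # start a new category
        else:
            best.append(tokens)
    return clusters


def review_singletons(clusters: list, threshold: float) -> list:
    """Second traversal: try to place categories that ended up with one item."""
    result = [list(c) for c in clusters if len(c) > 1]
    for (tokens,) in (c for c in clusters if len(c) == 1):
        best = best_match(tokens, result, threshold)
        if best is None:
            result.append([tokens])
        else:
            best.append(tokens)
    return result


docs = [
    {"pollen", "spring"},
    {"sneeze", "morning", "wind"},
    {"pollen", "spring", "sneeze", "morning", "wind"},
]
first_pass = single_pass(docs, threshold=0.35)
reviewed = review_singletons(first_pass, threshold=0.35)
```

In this toy run the first item is stranded as its own category because the third item joins the second category instead; the review pass then merges it, leaving a single category of 3 items.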

Moreover, this study uses a keyword cloud visualization of category content to quickly convey the themes and characteristics of each cluster and to compare clusters with one another. We selected TextRank [ 56 ] to extract category keywords. TextRank considers only the voting scores of words within a single document, so common words that frequently appear in that document easily obtain high scores [ 57 ]. We treated each category as a single document for keyword extraction. As risk factors appear more frequently in categories, TextRank can effectively extract risk factors and surrounding words, preserving category content information as much as possible and reflecting the true content of the risk factors.
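The voting idea behind TextRank can be sketched as PageRank over a word co-occurrence graph. The window size, damping factor, iteration count, and toy tokens below are illustrative assumptions; the study's actual settings are not reported here.

```python
from collections import defaultdict


def textrank_keywords(
    tokens: list[str],
    window: int = 2,
    damping: float = 0.85,
    iterations: int = 30,
    topk: int = 3,
) -> list[str]:
    """Rank words by PageRank-style voting on an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it inside the window.
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        # Each word splits its score evenly among its neighbours.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)[:topk]


# One category treated as a single document (made-up tokens).
category_tokens = ["pollen", "spring", "pollen", "sneeze", "pollen", "wind", "pollen", "eyes"]
keywords = textrank_keywords(category_tokens)
```

Because "pollen" co-occurs with every other word in this toy category, it collects the most votes and ranks first, which mirrors why frequently mentioned risk factors surface as category keywords.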

In this section, we present the performance of the classifier and the findings based on the categorization of all the comments in the clean data set using the classifier. Our approach involved visualizing the clustering results of the risk factors to comprehend the primary elements of these factors. We also explored the pathogenic mechanisms associated with these risk factors.

Classifier Performance

We used standard text-mining evaluation metrics, namely accuracy, precision, recall, and F 1 -score, to evaluate performance. Precision is the proportion of comments flagged by the model as containing risk factors that actually contain them, and recall is the proportion of comments in the test set that actually contain risk factors and are flagged by the model. As we aimed to identify as many AR risk factors as possible to provide comprehensive references for individuals, recall was more important than precision in our study.
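For reference, the 4 metrics can be computed from binary labels as in the following sketch, where 1 marks a comment that contains a risk factor. The labels are made up for illustration and are not taken from the study's test set.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # risk factor found
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarm
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # risk factor missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1, 0, 1],
)
```

Here one risk factor comment is missed and one comment is wrongly flagged, so all 4 metrics equal 0.75; lowering the decision threshold of a classifier typically trades precision for the higher recall that this study prioritizes.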

We tested 7 word-embedding dimensions ranging from 100 to 400. Table 3 displays the classification results of the TextCNN classification model with Skip-gram and TopicS word vectors at each of the 7 dimensions. In addition, TextRNN and transformer models were evaluated with TopicS or Skip-gram at the same 7 word-embedding dimensions, as shown in Tables S3 and S4 in Multimedia Appendix 1 ; the classification models performed better when the word-embedding dimension was 100 or 150, as shown in Table 4 , which reports the results at the best-performing dimensions. Because this study conducted word representation learning on a domain-specific corpus, a low dimensionality was sufficient to represent the features of the corpus [ 58 ]. Moreover, TopicS not only improved precision but also significantly increased recall for all 3 models, as shown in Table 4 .

Table 3 footnotes: (a) TopicS is the topic-enhanced word-embedding model proposed in this paper; (b) italics indicate the metrics on which TopicS outperforms Skip-gram.

Table 4 footnotes: (a) Embed_size is the word-embedding size; (b) italics indicate the metrics on which TopicS outperforms Skip-gram for each model.

Table 4 shows that TextCNN had the highest accuracy and recall among the 3 classification models. The highest accuracy achieved by our classification model was 0.9594, obtained with a 150-dimension word-embedding representation from TopicS. In other words, TextCNN can detect more risk factors and minimize the loss of risk factors resulting from classification errors. The CNN model can extract key information similar to n-grams in sentences, and combining TopicS with TextCNN enhances topic information and achieves an aggregation effect. TextCNN was also the simplest of the models to implement and consumed the fewest resources. Our model examined 30,372 comments and identified 5221 comments containing risk factors.

Risk Factor Clustering Results

We clustered the text items obtained from the text classification into 28 categories and extracted keywords from each category to better understand the content. Table 5 shows the top 5 categories and their corresponding keywords. The complete list can be found in Table S5 in Multimedia Appendix 1 . We used category 1 as an example to explain the category formation process and demonstrate the validity of the qualitative results. As shown in Table 5 , we labeled category 1 as Season based on the analysis of keyword weights and related comments. The comments in this category focused on seasonally induced AR, with factors such as changes in the weather during seasonal transitions and colder temperatures during winter, which can exacerbate symptoms. We also counted the number of text items in each category and found that season, region, mites, and weather changes were common risk factors for most patients. In addition, patients’ unhealthy lifestyle habits were important risk factors that have also been widely reported in research investigations. Furthermore, most patients reported experiencing symptoms at specific times (eg, “morning”), but researchers have paid little attention to the timing of symptom occurrence (which we refer to as time points).

Possible Pathways by Which Several Risk Factors Trigger AR

We referred to the relevant literature on the risk factors associated with AR to confirm whether the extracted risk factors were consistent with the general medical consensus. Our findings are novel compared with those in the literature [ 59 ]. Previous survey-based studies have explored only the correlation between risk factors and AR, whereas our experimental data provide insight into the potential pathogenesis of reported risk factors. The following section provides a theoretical discussion of potential pathways for several risk factors that trigger AR:

  • Season : (1) seasonal risk factors are manifested in pollen allergens. Tree allergens such as elm and cypress pollen are prevalent in early spring, followed by ash, pine, and birch pollen in late spring. In summer, grasses, artemisia, and flowering plants grow vigorously owing to increased rainfall, leading to increased pollen spread from these plants. In autumn, weeds account for the largest proportion of pollen allergens. (2) Different climatic conditions in different seasons contribute to the development of allergies. For example, in early spring, frequent cold and high-pressure air activity in East Asia causes intense atmospheric circulation, resulting in alternating hot and cold temperatures that impair the immune regulatory function of the human body, leading to increased allergy attacks. In autumn, changeable weather, large temperature differences, and sunlight and UV radiation can stimulate allergic reactions in people with weak lungs or those who are prone to AR. In addition, seasonal changes and increasing temperature differences between day and night can disrupt the human immune system.
  • Poor habits : major keywords for this topic were “smoking,” “staying up late,” and “resistance.” (1) Habits such as staying up late, lack of exercise, smoking, and alcohol abuse can weaken immunity and resistance. Gangl et al [ 60 ] found that smoking can reduce the integrity and barrier function of respiratory epithelial cells, thereby making smokers more susceptible to allergens. (2) An irregular diet can damage the spleen and stomach, which is also a key factor in the development of AR. (3) The frequent use of air conditioning in summer can cause nasal mucosa irritation owing to temperature fluctuations. Long-term exposure to adverse stimuli can cause dryness of the nasal cavity and weaken the resistance of the mucosal epithelium, which may lead to AR.
  • Allergens : we grouped clusters that included mites, plants, food, animals, and mold as allergens. (1) The findings of this study suggest that dust mites are the primary allergen, and exposure to a certain concentration of indoor dust mites can lead to AR. The ideal humidity level for dust mite growth is between 75% and 80%, and dust mites tend to thrive during spring and autumn and in warm and humid environments. Studies have shown that a large number of dust mites may be attached to uncleaned air conditioning filters, confirming that air conditioning is an important route of transmission for household dust mites [ 61 ]. (2) Allergenic pollen species are closely related to regions and seasons, and some regions now provide pollen concentration and allergy index broadcasts based on meteorological conditions, which is highly convenient for individuals experiencing allergy. (3) Food allergens such as milk, eggs, wheat, soybeans, and peanuts can also trigger AR. (4) Apart from dust mites, other perennial indoor allergens include animal dander, cockroach excrement, and molds.
  • Outdoor environment : this topic had “dust,” “air quality,” “trust,” and “allergen” as high scoring words. (1) Various substances present in the outdoor environment can trigger AR. Industrialization has increased the content of aromatic hydrocarbon particles, ethanol, and formaldehyde in diesel exhaust, which can damage the mucous membrane and serve as a strong stimulus for AR attacks. (2) Air pollution can affect the distribution of allergens such as mold and pollen. In hazy weather, allergens tend to stay in the air longer, increasing the chance and duration of contact with the human body and leading to AR. (3) High winds can raise dust, pollen, mites, bacteria, and other allergenic factors, increasing their concentration in the air and making it easier to trigger AR.
  • Time points : patients with AR are more likely to experience symptoms at 2 specific time points, morning and evening. Schenkel [ 62 ] assessed the severity of 4 nasal symptoms (sneezing, blockage, runny nose, and nasal itch) at different times of the day, revealing that morning and evening symptoms were the most severe. This may be because the circadian rhythm, pollen concentration, or personal behavior exacerbates the symptoms. In the evening, when the wind subsides, pollen settles closer to the ground and can be inhaled more easily. In addition, because humans rest at night in a horizontal position, nasal ventilation may be more difficult, leading to more severe symptoms. In the morning, low temperatures can cause congestion and swelling of the nasal mucosa because of the temperature difference between the environment and the body. This cluster had words such as “evening,” “get up early,” and “nose” as highly rated words.

This theoretical discussion regarding the potential pathway of risk factors that trigger AR can guide the development of detailed AR intervention measures. For example, patients with AR can pay attention to pollen concentration and temperature changes and adjust their outings and clothing accordingly based on the characteristics of the season; they can set the air conditioner to turn on or off based on their waking time to reduce the inhalation of cold air when waking up. Furthermore, they can adjust their sleeping position to reduce the frequency of nighttime symptoms.

Principal Findings

This study aimed to identify the risk factors for AR based on social media comments. To do so, a data set of comments related to AR was collected, processed, and analyzed. The data set covered a consecutive period from May 2012 to May 2022. Overall, this analysis provided new insights into three main questions: (1) How many comments contained AR risk factor information? (2) How many categories can these risk factors be summarized into? (3) How do these risk factors trigger AR?

In assessing the identification of AR risk factors, we found that TopicS enhanced both precision and recall. TextCNN outperformed other models, achieving an accuracy of 0.9594 with a 150-dimension TopicS embedding. Analyzing 30,372 comments, our model pinpointed 5221 comments with risk factors. Categorizing the text items led to 28 distinct categories, with seasonal factors, regional variations, mites, weather changes, and unhealthy lifestyle habits emerging as common risks.

Furthermore, our research into AR risk factors suggested how these risk factors may trigger AR and uncovered risk factors that affected individuals frequently report but that remain underresearched. Seasonal changes, especially during spring and autumn, increase exposure to pollen allergens, with varying climatic conditions affecting the development of allergies. Poor habits, such as smoking, irregular sleep, and frequent use of air conditioning, compromise immunity and heighten AR susceptibility. Dust mites, influenced by humidity, stand out as a primary allergen, with food items and indoor factors, such as animal dander, also triggering AR. Industrial pollutants and outdoor environmental factors amplify AR risk. Notably, AR symptoms intensify during mornings and evenings, which is likely influenced by circadian rhythms and environmental factors.

Limitations and Future Work

This study has some limitations. It relied on self-reported social media data, which lack detailed information about the individuals who posted them. Our statistics showed that seasonal factors, regional variations, mites, weather changes, and unhealthy lifestyle habits emerged as common risk factors, which is consistent with the findings of other studies based on surveys. Although social media may lack in-depth patient information, it provides an effective way of collecting a broad range of data. Social media data can be gathered 24 hours a day and offer an efficient way to rapidly add new knowledge to the risk factor knowledge base. In the future, our framework can be expanded in 2 ways. First, the framework can track the development trends and changes in AR risk factors by leveraging real-time internet data sets. Second, the framework can be generalized and extended to detect patterns, trends, and risk factors for other chronic diseases such as type 2 diabetes.

Conclusions

In this model improvement study, we proposed a topic-enhanced word-embedding model to improve the accuracy and recall of text classification, with the aim of uncovering, from social media data, less common or other types of risk factors that have not been previously reported. The risk factors identified in this study can be a helpful reference for people with AR to reduce the development of the disease in their daily lives. This study establishes a knowledge base of potential risk factors for individuals who may not be aware of the factors that could trigger their symptoms. Patients can compare their lifestyle habits and medical history to identify their risk factors, which could help reduce the frequency of episodes and prevent the decline in their quality of life caused by blindly avoiding potential triggers. Our findings demonstrate the practicality and feasibility of using social media data for investigating disease knowledge. These findings may provide guidance for the development of management plans and interventions for AR.

Acknowledgments

The data set collection and analysis of this research were partially supported by the National Natural Science Foundation of China (grants 72131006, 72071063, and 72271082); Anhui Provincial Key Research and Development Plan Project (grant 2022i01020003); and the Fundamental Research Funds for the Central Universities (grant JS2023ZSPY0063).

Data Availability

The data sets generated and analyzed during this study are available from the corresponding author upon reasonable request.

Authors' Contributions

DG conceptualized and investigated the study. QW drafted the methodology, performed the software analysis, and prepared the original draft. YC reviewed and edited the draft. XY completed the investigation. WZ drafted the methodology and supervised the study. ML supervised the study. ZX conceptualized the study. GZ and ZO supervised the study.

Conflicts of Interest

None declared.

Multimedia Appendix 1: Examples of social media text, topic dictionary examples, word-embedding dimension parameters with TextRNN, word-embedding dimension parameters with transformer, and social media category distribution and visualization.

Word cloud 1.

Word cloud 2.

Word cloud 3.

Word cloud 4.

Word cloud 5.

References

1. Pawankar R, Baena-Cagnani CE, Bousquet J, Walter Canonica G, Cruz AA, Kaliner MA, et al. State of world allergy report 2008: allergy and chronic respiratory diseases. World Allergy Org J. 2008;1:S4-17.
2. Krishna MT, Mahesh PA, Vedanthan PK, Mehta V, Moitra S, Christopher DJ. The burden of allergic diseases in the Indian subcontinent: barriers and challenges. Lancet Glob Health. Apr 2020;8(4):e478-e479.
3. Greiner AN, Hellings PW, Rotiroti G, Scadding GK. Allergic rhinitis. Lancet. Dec 17, 2011;378(9809):2112-2122.
4. Wang XD, Zheng M, Lou HF, Wang CS, Zhang Y, Bo MY, et al. An increased prevalence of self-reported allergic rhinitis in major Chinese cities from 2005 to 2011. Allergy. Aug 13, 2016;71(8):1170-1180.
5. Price D, Smith P, Hellings P, Papadopoulos N, Fokkens W, Muraro A, et al. Current controversies and challenges in allergic rhinitis management. Expert Rev Clin Immunol. Aug 29, 2015;11(11):1205-1217.
6. Cardell LO, Olsson P, Andersson M, Welin KO, Svensson J, Tennvall GR, et al. TOTALL: high cost of allergic rhinitis-a national Swedish population-based questionnaire study. NPJ Prim Care Respir Med. Feb 04, 2016;26(1):15082.
7. Terreehorst I, Hak E, Oosting AJ, Tempels-Pavlica Z, de Monchy JG, Bruijnzeel-Koomen CA, et al. Evaluation of impermeable covers for bedding in patients with allergic rhinitis. N Engl J Med. Jul 17, 2003;349(3):237-246.
8. Zhang Y, Zhang L. Increasing prevalence of allergic rhinitis in China. Allergy Asthma Immunol Res. Mar 2019;11(2):156-169.
9. Gao H, Niu Y, Wang Q, Shan G, Ma C, Wang H, et al. Analysis of prevalence and risk factors of adult self-reported allergic rhinitis and asthma in plain lands and hilly areas of Shenmu City, China. Front Public Health. Jan 4, 2021;9:749388.
10. Li L, Zhou J, Ma Z, Bensi MT, Hall MA, Baecher GB. Dynamic assessment of the COVID-19 vaccine acceptance leveraging social media data. J Biomed Inform. May 2022;129:104054.
11. Castillo A, Benitez J, Llorens J, Luo X. Social media-driven customer engagement and movie performance: theory and empirical evidence. Decis Support Syst. Jun 2021;145:113516.
12. Stieglitz S, Dang-Xuan L. Emotions and information diffusion in social media: sentiment of microblogs and sharing behavior. J Manag Inf Syst. Dec 08, 2014;29(4):217-248.
13. Kumar A, Srinivasan K, Wen-Huang C, Zomaya AY. Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data. Inf Process Manag. Jan 2020;57(1):102141.
14. Liu X, Wang GA, Fan W, Zhang Z. Finding useful solutions in online knowledge communities: a theory-driven design and multilevel analysis. Inf Syst Res. Sep 2020;31(3):731-752.
15. Lindelöf G, Aledavood T, Keller B. Dynamics of the negative discourse toward COVID-19 vaccines: topic modeling study and an annotated data set of Twitter posts. J Med Internet Res. Apr 12, 2023;25:e41319.
16. Zhu J, Li Z, Zhang X, Zhang Z, Hu B. Public attitudes toward anxiety disorder on Sina Weibo: content analysis. J Med Internet Res. Apr 04, 2023;25:e45777.
17. Shin D, He S, Lee GM, Whinston AB, Cetintas S, Lee KC. Enhancing social media analysis with visual data analytics: a deep learning approach. MIS Q. Dec 1, 2020;44(4):1459-1492.
18. Paul M, Dredze M. You are what you tweet: analyzing Twitter for public health. Proc Int AAAI Conf Web Social Media. Aug 03, 2021;5(1):265-272.
19. Pradeepa S, Manjula KR, Vimal S, Khan MS, Chilamkurti N, Luhach AK. DRFS: detecting risk factor of stroke disease from social media using machine learning techniques. Neural Process Lett. Jun 09, 2020;55(4):3843-3861.
20. Xie J, Liu X, Zeng DD, Fang X. Understanding medication nonadherence from social media: a sentiment-enriched deep learning approach. MIS Q. Feb 25, 2022;46(1):341-372.
21. Navale V, McAuliffe M. The integration of a canonical workflow framework with an informatics system for disease area research. Data Intell. 2022;4(2):186-195.
22. Zhang Y, Lan F, Zhang L. Advances and highlights in allergic rhinitis. Allergy. Nov 17, 2021;76(11):3383-3389.
23. Chiang TY, Yuan TH, Shie RH, Chen CF, Chan CC. Increased incidence of allergic rhinitis, bronchitis and asthma, in children living near a petrochemical complex with SO2 pollution. Environ Int. Nov 2016;96:1-7.
24. Kurganskiy A, Creer S, de Vere N, Griffith GW, Osborne NJ, Wheeler BW, PollerGEN Consortium, et al. Predicting the severity of the grass pollen season and the effect of climate change in Northwest Europe. Sci Adv. Mar 26, 2021;7(13):eabd7658.
25. Lee JY, Lee J, Huh DA, Moon KW. Association between environmental exposure to phthalates and allergic disorders in Korean children: Korean National Environmental Health Survey (KoNEHS) 2015-2017. Int J Hyg Environ Health. Sep 2021;238:113857.
26. Paciência I, Cavaleiro Rufo J, Silva D, Mendes F, Farraia M, Delgado L, et al. Effects of indoor endocrine-disrupting chemicals on childhood rhinitis. J Investig Allergol Clin Immunol. Jun 18, 2020;30(3):195-197.
27. Saulyte J, Regueira C, Montes-Martínez A, Khudyakov P, Takkouche B. Active or passive exposure to tobacco smoking and allergic rhinitis, allergic dermatitis, and food allergy in adults and children: a systematic review and meta-analysis. PLoS Med. Mar 11, 2014;11(3):e1001611.
28. Kong IG, Rhee CS, Lee JW, Yim H, Kim MJ, Choi Y, et al. Association between perceived stress and rhinitis-related quality of life: a multicenter, cross-sectional study. J Clin Med. Aug 19, 2021;10(16):3680.
29. Han YY, Forno E, Gogna M, Celedón JC. Obesity and rhinitis in a nationwide study of children and adults in the United States. J Allergy Clin Immunol. May 2016;137(5):1460-1465.
30. Kanazawa J, Masuko H, Yatagai Y, Sakamoto T, Yamada H, Kitazawa H, et al. Association analyses of eQTLs of the TYRO3 gene and allergic diseases in Japanese populations. Allergol Int. Jan 2019;68(1):77-81.
31. Alm B, Goksör E, Pettersson R, Möllborg P, Erdes L, Loid P, et al. Antibiotics in the first week of life is a risk factor for allergic rhinitis at school age. Pediatr Allergy Immunol. Aug 09, 2014;25(5):468-472.
32. Ho CL, Wu WF. Risk factor analysis of allergic rhinitis in 6-8 year-old children in Taipei. PLoS One. Apr 2, 2021;16(4):e0249572.
33. Zhang W, Ram S. A comprehensive analysis of triggers and risk factors for asthma based on machine learning and large heterogeneous data sources. MIS Q. Jan 01, 2020;44(1):305-349.
34. Griffis H, Asch DA, Schwartz HA, Ungar L, Buttenheim AM, Barg FK, et al. Using social media to track geographic variability in language about diabetes: analysis of diabetes-related tweets across the United States. JMIR Diabetes. Jan 26, 2020;5(1):e14431.
35. Schäfer F, Faviez C, Voillot P, Foulquié P, Najm M, Jeanne JF, et al. Mapping and modeling of discussions related to gastrointestinal discomfort in French-speaking online forums: results of a 15-year retrospective infodemiology study. J Med Internet Res. Nov 03, 2020;22(11):e17247.
36. Oyebode O, Orji R. Detecting factors responsible for diabetes prevalence in Nigeria using social media and machine learning. In: Proceedings of the 15th International Conference on Network and Service Management (CNSM). 2019. Presented at: 15th International Conference on Network and Service Management (CNSM); October 21-25, 2019; Halifax, NS.
37. Ramsingh J, Bhuvaneswari V. A big data framework to analyze risk factors of diabetes outbreak in Indian population using a map reduce algorithm. In: Proceedings of the Second International Conference on Intelligent Computing and Control Systems (ICICCS). 2018. Presented at: Second International Conference on Intelligent Computing and Control Systems (ICICCS); June 14-15, 2018; Madurai, India.
38. Ramsingh J, Bhuvaneswari V. An integrated multi-node Hadoop framework to predict high-risk factors of diabetes mellitus using a multilevel MapReduce based fuzzy classifier (MMR-FC) and modified DBSCAN algorithm. Appl Soft Comput. Sep 2021;108:107423.
39. Alswedani S, Mehmood R, Katib I, Altowaijri SM. Psychological health and drugs: data-driven discovery of causes, treatments, effects, and abuses. Toxics. Mar 20, 2023;11(3):102681.
40. Chung JE, Mustapha IZ, Li J, Gu X. Discourse about human papillomavirus (HPV)-associated oropharyngeal cancer (OPC) on Twitter: lessons for public health education about OPC and dental care. Public Health Pract (Oxf). Jun 2022;3:100239.
41. Neisani Samani Z, Karimi M, Alesheikh A. Environmental and infrastructural effects on respiratory disease exacerbation: a LBSN and ANN-based spatio-temporal modelling. Environ Monit Assess. Jan 04, 2020;192(2):90.
42. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. arXiv. Preprint posted online October 16, 2013.
43. Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B. Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2014. Presented at: 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); June 22-27, 2014; Baltimore, MD.
44. Yilmaz S, Toklu S. A deep learning analysis on question classification task using Word2vec representations. Neural Comput Appl. Jan 21, 2020;32(7):2909-2928.
45. Shi B, Fu Z, Bing L, Lam W. Learning domain-sensitive and sentiment-aware word embeddings. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. Presented at: 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); July 15-20, 2018; Melbourne, Australia.
46. Kim Y. Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014. Presented at: 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); October 25-29, 2014; Doha, Qatar.
47. Zaremba W, Sutskever I, Vinyals O. Recurrent neural network regularization. arXiv. Preprint posted online September 8, 2014.
48. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. arXiv. Preprint posted online June 12, 2017.
49. Zhang K, Moe W. Measuring brand favorability using large-scale social media data. Inf Syst Res. Dec 2021;32(4):1128-1139.
50. Abbasi A, Li J, Adjeroh D, Abate M, Zheng W. Don’t mention it? Analyzing user-generated content signals for early adverse event warnings. Inf Syst Res. Sep 2019;30(3):1007-1028.
51. Guo Q, Chen W, Wan H. AOL4PS: a large-scale data set for personalized search. Data Intell. 2021;3(4):548-567.
52. Khatua A, Khatua A, Cambria E. A tale of two epidemics: contextual Word2Vec for classifying Twitter streams during outbreaks. Inf Process Manag. Jan 2019;56(1):247-257.
53. Gu D, Li M, Yang X, Gu Y, Zhao Y, Liang C, et al. An analysis of cognitive change in online mental health communities: a textual data analysis based on post replies of support seekers. Inf Process Manag. Mar 2023;60(2):103192.
54. Eysenbach G, Till JE. Ethical issues in qualitative research on internet communities. BMJ. Nov 10, 2001;323(7321):1103-1105.
55. Shen D, Yang Q, Sun JT, Chen Z. Thread detection in dynamic text message streams. In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. 2006. Presented at: SIGIR '06; August 6-11, 2006; Seattle, WA.
56. Mihalcea R, Tarau P. TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004. Presented at: 2004 Conference on Empirical Methods in Natural Language Processing; July 25-26, 2004; Barcelona, Spain. URL: https://aclanthology.org/W04-3252.pdf
57. Long S, Yan L. Rank-IDF: a statistical and network based feature words selection in big data text analysis. In: Proceedings of the 2020 5th International Conference on Mathematics and Artificial Intelligence. 2020. Presented at: ICMAI '20; April 10-13, 2020; Chengdu, China.
58. Wright AP, Jones CM, Chau DH, Matthew Gladden R, Sumner SA. Detection of emerging drugs involved in overdose via diachronic word embeddings of substances discussed on social media. J Biomed Inform. Jul 2021;119:103824.
59. Zhou J, Zhang Q, Zhou S, Li X, Zhang X. Unintended emotional effects of online health communities: a text mining-supported empirical study. MIS Q. Mar 01, 2023;47(1):195-226.
60. Gangl K, Reininger R, Bernhard D, Campana R, Pree I, Reisinger J, et al. Cigarette smoke facilitates allergen penetration across respiratory epithelium. Allergy. Mar 23, 2009;64(3):398-405.
61. Liu Z, Bai Y, Ji K, Liu X, Cai C, Yu H, et al. Detection of Dermatophagoides farinae in the dust of air conditioning filters. Int Arch Allergy Immunol. Aug 10, 2007;144(1):85-90.
62. Schenkel EJ. Effect of desloratadine on the control of morning symptoms in patients with seasonal and perennial allergic rhinitis. Allergy Asthma Proc. Nov 01, 2006;27(6):465-472.

Edited by A Mavragani; submitted 19.04.23; peer-reviewed by X Liu, Y Cao; comments to author 12.10.23; revised version received 30.10.23; accepted 03.01.24; published 22.02.24.

©Dongxiao Gu, Qin Wang, Yidong Chai, Xuejie Yang, Wang Zhao, Min Li, Oleg Zolotarev, Zhengfei Xu, Gongrang Zhang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.02.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

February 20, 2024

Research team develops universal and accurate method to calculate how proteins interact with drugs

by Institute of Organic Chemistry and Biochemistry of the CAS

Can it take just a few minutes to calculate how proteins interact with drugs?

A research team from the Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences / IOCB Prague has developed a novel computational method that can accurately describe how proteins interact with molecules of potential drugs and can do so in a mere tens of minutes. This new quantum-mechanical scoring function can thus markedly expedite the search for new drugs. The research has been published in the journal Nature Communications.

The study demonstrates that this is the first universally applicable method of its kind. IOCB Prague computational experts tested it on 10 proteins of different levels of structural complexity, each binding a large variety of small molecules (usually referred to as ligands). They then compared their results not only with those of other corresponding methods, but also with findings of laboratory experiments, and both comparisons turned out very favorably.

"Of course, we are not the only ones working on this. There are several such methods. Usually, however, their speed is offset by low accuracy whereas more accurate calculations can take several days. Our methods are unique in that they can process information about large molecular systems within tens of minutes while retaining the benefits of much more demanding quantum-mechanical calculations," explains Jan Řezáč, corresponding author of the article from the Non-Covalent Interactions group led by Prof. Pavel Hobza.

Experts from this group have been studying intermolecular interactions for a long time. In this research they focus mainly on biomolecules, and the results of their work directly bear on the computer-aided design of drugs. The reason is that when scientists work toward a new drug, they often look for molecules that bind strongly to a particular protein.

Identifying them, however, is akin to finding needles in a haystack, as large numbers of molecules have to be tested to set apart those that show promise. This considerably slows down the discovery of medicinal substances and makes it more expensive. By predicting the strength of protein–ligand binding, and thus singling out molecules that best satisfy a defined set of criteria, computational chemists spare the work of experimenters, which, in turn, significantly accelerates drug discovery.

Journal information: Nature Communications

Provided by Institute of Organic Chemistry and Biochemistry of the CAS


  • Open access
  • Published: 16 February 2024

Application of ALSO course in standardized training Resident in Obstetric

  • Li Zhiyue & Lu Dan

BMC Medical Education volume 24, Article number: 151 (2024)

To explore the teaching effect of the Advanced Life Support in Obstetrics (ALSO) course in the standardized training of obstetrics residents.

Sixty obstetrics residents trained between January 2021 and December 2022 were randomly divided into two groups, an observation group and a control group. The observation group was taught with the ALSO teaching method, and the control group with the traditional teaching method. The teaching effect was evaluated by a theoretical examination, the direct observation of procedural skills (DOPS) scale and the mini clinical evaluation (Mini-CEX) scale.

The theoretical examination scores of the observation group were significantly higher than those of the control group (P < 0.05). The DOPS scores for pre-procedural preparation, safe analgesia, technique of procedure, aseptic technique, seeking help when necessary, post-procedural management, communication skills, humanistic care and overall performance were higher in the observation group than in the control group (P < 0.05). The Mini-CEX scores for organization efficiency, humanistic qualities, manipulative skills, clinical judgment, medical interviewing skills and overall clinical competence were also higher in the observation group than in the control group (P < 0.05).

Conclusions

The ALSO teaching method works well in the standardized training of obstetrics residents and merits further in-depth research and wider application.

Introduction

Standardized training of residents, as an important part of postgraduate medical education, is a necessary stage in training highly competent professionals [1]. Obstetrics and gynecology, particularly obstetrics, is characterized by high risk, emergency conditions, and strong integration; therefore, residency plays an essential role in the development of a clinical medicine graduate into an obstetrics clinician [2, 3]. The traditional model of medical education is still founded on classroom lectures, and trainees are frequently passive recipients, which is not conducive to the development of their clinical problem analysis and practical skills [4]. The Advanced Life Support in Obstetrics (ALSO) course is a clinically based, interprofessional, multidisciplinary educational program. It trains obstetricians in the most recent international obstetrical knowledge and skills, teaches rational methods for managing critical, severe, and emergency obstetrical conditions, and standardizes obstetrical operations, with the ultimate goal of enhancing obstetric care [5]. In this study, we analyzed the impact of the ALSO course on the standardized training of obstetric residents, to provide a foundation for future in-depth research on the standardization of resident training.

Materials and methods

Participants.

The study subjects were 60 resident doctors who received Standardized Residency Training in Obstetrics and Gynecology in our department between January 2021 and December 2022. Using the random number table method, they were divided into a control group (n = 30), which received the traditional teaching method, and an observation group (n = 30), which received the ALSO teaching method. None of the participants in either group had prior work experience. There were no statistically significant differences between the two groups in gender, age, education, or training year (P > 0.05).
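
The allocation step can be illustrated with a minimal sketch. It is not the authors' procedure: the study used a printed random number table, whereas this example substitutes Python's seeded pseudorandom generator, and the function name `allocate_two_groups`, the participant IDs 1 to 60, and the seed value are all hypothetical.

```python
import random


def allocate_two_groups(participant_ids, seed=2021):
    """Shuffle participant IDs and split them into two equal-sized groups.

    Assumption: a seeded pseudorandom shuffle stands in for the random
    number table named in the text; the seed value is arbitrary.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is needed for equal groups")
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": sorted(ids[:half]), "observation": sorted(ids[half:])}


# Hypothetical IDs 1..60, matching the sample size reported in the text.
groups = allocate_two_groups(range(1, 61))
print(len(groups["control"]), len(groups["observation"]))
```

Fixing the seed makes the allocation repeatable for auditing, which is the role the published table row and starting position would play in a table-based allocation.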

Teaching methods

The control group was trained using traditional teaching methods. The instructor first presented the theoretical knowledge of the selected teaching content using blackboard writing, teaching aids, and multimedia, and then led the residents through skill demonstrations, such as physical examination and history collection, in the clinical skill center. The residents practiced independently after the demonstration.

The observation group adopted the ALSO teaching method; the specific content of the ALSO courses for this group is shown in Table S1. The courses were divided into a theoretical class and a hands-on practical component, with required reading and lectures, slide presentations, and practical exercises for each chapter.

The ALSO teaching procedure is illustrated here with the example of shoulder dystocia. First, the instructor introduced the participants to the learning objectives of the course, including recognition of risk factors and identification and standardized management of shoulder dystocia. Then, the instructor demonstrated the management of shoulder dystocia using the HELPERR mnemonic, which consists of H (call for help), E (evaluate for episiotomy), L (legs), P (suprapubic pressure), E (enter maneuvers), R (remove the posterior arm), and R (roll the patient). Thereafter, clinical scenarios were set up for the participants to practice on simulated individuals. Finally, the participants reviewed the process based on the contents of the records, and the instructor conducted an evaluation, highlighting deficiencies and strengths, answering any concerns raised by the participants, and providing a general assessment. All of the interviews were recorded.

Evaluation of teaching effect

After the training, theoretical and operational assessments were conducted. (1) Theoretical assessment of obstetrics: The theory test consisted mainly of objective questions centered on practical clinical application. The full score was 100 points: 80 points for eighty single-choice questions and 20 points for four short-answer questions. The Spearman correlation coefficient was calculated to assess the split-half reliability of the theoretical test; the split-half coefficient was 0.803, indicating that the test had high reliability. (2) Operational assessment of obstetrics: Direct Observation of Procedural Skills (DOPS) rating scales were used to assess procedural skills (understanding of indications, obtaining informed consent, pre-procedure preparation, appropriate level of pain relief, technical ability, aseptic technique, seeking help where appropriate, post-procedure management, communication skills, consideration of the patient, and overall ability to perform the procedure). The Mini Clinical Evaluation (Mini-CEX) was used to assess comprehensive clinical ability (history taking, physical examination skills, organization efficiency, humanistic qualities, clinical operating ability, clinical judgement, health education, communication skills, and overall clinical competency). Both tests were marked on a 9-point scale (1–3: to be strengthened, 4–6: up to standard, 7–9: excellent).
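
The split-half check and the 9-point banding described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the text does not say how the test was split, so an odd/even item split is assumed; no Spearman-Brown correction is applied because none is mentioned; and the function names and the sample marks are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


def split_half_spearman(item_scores):
    """Spearman correlation between odd-item and even-item half totals.

    item_scores holds one list of per-item marks per examinee.
    Assumption: odd/even split; the source does not state the split used.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return pearson(average_ranks(odd_totals), average_ranks(even_totals))


def band(score):
    """Map a 1-9 DOPS or Mini-CEX mark to the band named in the text."""
    if not 1 <= score <= 9:
        raise ValueError("score must be between 1 and 9")
    if score <= 3:
        return "to be strengthened"
    if score <= 6:
        return "up to standard"
    return "excellent"


# Hypothetical marks for five examinees on six one-point items.
marks = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
]
print(round(split_half_spearman(marks), 3), band(7))
```

Spearman's coefficient is computed here as the Pearson correlation of the ranks, which handles tied half totals through average ranks.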

The study protocol was approved by the Ethics Committee of Northern Jiangsu People’s Hospital (ethical review: 202078). All subjects provided informed consent prior to their inclusion in the study. All the procedures performed in this study were in accordance with the principles of the Declaration of Helsinki.

Statistical analysis

SPSS 22.0 (SPSS Inc, Chicago, IL) was used for the data analysis. Count data were expressed as frequencies, and measurement data were expressed as mean ± standard deviation. The data to be compared were normally distributed. Count data were compared between the two groups using the chi-square test, and measurement data were compared using the independent-samples t test. P < 0.05 was considered statistically significant.
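
The two tests named above can be sketched in plain Python. This is a teaching sketch, not a reproduction of the SPSS output: it assumes the equal-variance (pooled) form of the t test and the Pearson chi-square statistic without continuity correction for a 2×2 table, it compares the statistics against standard table critical values instead of computing P values, and all scores in the example are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Critical values from standard tables (two-sided alpha = 0.05).
T_CRIT_DF58 = 2.002    # Student's t with 58 degrees of freedom (30 + 30 - 2)
CHI2_CRIT_DF1 = 3.841  # chi-square with 1 degree of freedom (2x2 table)


def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 count table, no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical theory-test scores for two groups of 30 residents.
observation_scores = [80 + (i % 7) for i in range(30)]
control_scores = [76 + (i % 7) for i in range(30)]
t_value, df = pooled_t(observation_scores, control_scores)
print(df, abs(t_value) > T_CRIT_DF58)

# Hypothetical counts: rows are groups, columns are a binary characteristic.
baseline_table = [[12, 18], [14, 16]]
print(chi_square_2x2(baseline_table) > CHI2_CRIT_DF1)
```

Comparing against tabulated critical values keeps the sketch free of external libraries; a statistics package would report the exact P value instead.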

Results

Basic characteristics and information

As shown in Table 1, there were no significant differences between the control group and the observation group in terms of gender, age, educational background or grades (P > 0.05).

Theoretical assessment of obstetrics

The theoretical knowledge scores in the ALSO group were significantly higher than those in the traditional group (P < 0.05, Table 2).

Comparison of items in DOPS tests in the traditional group and ALSO group

Based on the assessment results, the residents in the ALSO group performed significantly better than those in the traditional group on the DOPS items of pre-procedural preparation, safe analgesia, technique of procedure, aseptic technique, seeking help when necessary, post-procedural management, communication skills, humanistic care and overall performance (P < 0.05, Table 3).

Comparison of items in Mini-CEX tests in the traditional group and ALSO group

The results of the Mini-CEX assessment are shown in Table 4. The ALSO group performed better than the traditional group in most aspects, including organization efficiency, humanistic qualities, clinical operating ability, clinical judgement, health education, communication skills and overall clinical competency (P < 0.05).

Discussion

The standardized resident training program in China is a crucial component of the ongoing education and system-based clinical training of students who have completed fundamental theoretical coursework at a medical college [6]. Obstetrics is distinguished by its dynamic clinical developments and complexity, and it demands a strongly practical orientation [7]. Consequently, it is of paramount importance to provide obstetrics residents with comprehensive training in clinical thinking and operative skills. In China, the traditional approach to educating residents relies on textbooks, participation in grand rounds, and guidance from experienced specialists [8]. This pattern tends to produce only passive learning and limited participation. Furthermore, the traditional curriculum emphasizes individual fundamental operational skills while neglecting comprehensive instruction in the trainee’s adaptive clinical proficiency [9, 10]. Accordingly, it is imperative to investigate a novel instructional framework for obstetric residency training that can make teaching more effective and adaptable, ultimately improving the quality of standardized training of obstetrics residents.

The ALSO course was first developed by the American Academy of Family Physicians in 1991 and was introduced into China in 2002. The course emphasizes the integration of theoretical and practical instruction, which distinguishes it from conventional teaching methods. It prioritizes standardized simulation of practical training and pays close attention to students’ learning outcomes. By engaging in hands-on model teaching activities, students are equipped with the necessary skills to address obstetric issues. The course aims to help clinical practitioners manage critical, serious, and emergency obstetric cases effectively by employing appropriate and rational approaches. The health system and medical culture of China are very different from those in other countries [9]. Although the ALSO course has been validated and used in many countries for many years, this is the first study to assess its use in an obstetric residency program in China.

In this study, compared with the traditional group, the ALSO group achieved higher theoretical scores and showed greater proficiency across most assessment items in DOPS and Mini-CEX. In terms of training content, the course covered diagnosis and treatment of vaginal bleeding during the third trimester, analysis of electronic fetal heart monitoring, emergency handling of cord prolapse, management of shoulder dystocia, etc. The entire training process is a multi-system crossover that integrates numerous knowledge points and skills into actual clinical scenarios. The content is generally consistent with a 2019 study in Novi Sad, Serbia [11]. There are no additional fees for our ALSO course, which makes the program more acceptable for residents with low income in China. At this stage, it is appropriate to provide obstetrics residents, who have acquired comprehensive theoretical knowledge during their undergraduate education, with training in the ALSO program. These residents may face unexpected changes in patients’ conditions upon entering the training base, and their limited personal experience may hinder their ability to manage clinical situations promptly and effectively. During the training, trainees interacted positively with the instructor and with one another, readily raised questions, practiced the operations repeatedly, and showed enthusiasm and interest in learning. Based on the training impact observed in both theoretical and practical contexts, the trainees gained substantial benefits, particularly in clinical reasoning and collaboration. The trainees widely acknowledged a notable improvement in these two proficiencies, which conventional teaching tends to neglect and which trainees urgently need to develop.
A study in Australia showed that participants gained significantly more confidence and perceived knowledge after completing the ALSO course, which is similar to our findings [12]. The ALSO teaching method differs from the traditional spoon-feeding teaching method by integrating theory and practice. It emphasizes standardized and systematic practical training during the teaching process, thereby fostering the trainees’ practical problem-solving skills. This approach effectively compensates for the limitations of classroom teaching. The ALSO training model in our study is broadly in agreement with that used in many developing and developed countries [5, 11].

The limitations of our study are its small sample size and single-center design. A further large-scale study should be planned to confirm the applicability of the method. Furthermore, this instructional approach requires a substantial commitment of time and effort from clinical teachers for pre-class preparation, which may conflict with their demanding clinical responsibilities. Consequently, it is imperative to involve more educators in collaborative teaching efforts to promote the ALSO program.

In summary, the use of the ALSO course in the standardized training of obstetrics residents has the potential to significantly improve residents’ theoretical and practical performance and, ultimately, to enhance the overall quality of their standardized training.

Data availability

Data that support the findings of this study are within the article. Further data are available from the corresponding author upon reasonable request.

References

1. Johnson GJ, Kilpatrick CC, Zaritsky E, Woodbury E, Boller M, Burton M, Asfaw T, Ratan BM. Training the Next Generation of obstetrics and Gynecology leaders, a multi-institutional needs Assessment. J Surg Educ. 2021;78(6):1965–72.

2. Alston MJ, Autry A, Wagner SA, Kohl-Thomas BM, Ehrig J, Allshouse AA, Gottesfeld M, Stephenson-Famy A. Attitudes of trainees in Obstetrics and Gynecology regarding the structure of Residency Training. Obstet Gynecol. 2019;134:22S–8.

3. Dotters-Katz S, Gray B, Heine RP, Propst K. Resident Education in Complex Obstetric procedures: are we adequately preparing tomorrow’s obstetricians? Obstet Gynecol. 2019;134:42S–2.

4. Marlier M, Chevreau J, Gagneur O, Sergent F, Gondry J, Foulon A. Practice and expectations regarding simulation for residents in obstetrics and gynecology. J Gynecol Obstet Hum Reprod. 2022;51(3):6.

5. McGready R, Rijken MJ, Turner C, Than HH, Tun NW, Min AM, Hla S, Wai NS, Proux K, Min TH, et al. A mixed methods evaluation of Advanced Life Support in Obstetrics (ALSO) and Basic Life support in obstetrics (BLSO) in a resource-limited setting on the Thailand-Myanmar border. Wellcome open Res. 2021;6:94.

6. Deng GW, Zhao D, Lio J, Chen XY, Ma XP, Liang L, Feng CP. Strategic elements of residency training in China: transactional leadership, self-efficacy, and employee-orientation culture. BMC Med Educ. 2019;19(1):8.

7. Phillips JL, Heneka N, Bhattarai P, Fraser C, Shaw T. Effectiveness of the spaced education pedagogy for clinicians’ continuing professional development: a systematic review. Med Educ. 2019;53(9):886–902.

8. Chen Q, Li M, Wu N, Peng X, Tang GM, Cheng H, Hu LL, Yang B, Liao ZL. A survey of resident physicians’ perceptions of competency-based education in standardized resident training in China: a preliminary study. BMC Med Educ. 2022;22(1):9.

9. Wang H, Liu Y. Thoughts on standardized training and teaching of physicians in residency. Asian J Surg. 2022;45(9):1732–3.

10. Pan GC, Zheng W, Liao SC. Qualitative study of the learning and studying process of resident physicians in China. BMC Med Educ. 2022;22(1):12.

11. Namak SY, Vejnovic A, Vejnovic TR, Moore JB, Kirk JK. Change in knowledge and Preferred scenario responses after completion of the Advanced Life Support in Obstetrics Course in Serbia. Fam Med. 2019;51(10):850–3.

12. Walker LJM, Fetherston CM, McMurray A. Perceived changes in the knowledge and confidence of doctors and midwives to manage obstetric emergencies following completion of an Advanced Life Support in Obstetrics course in Australia. Aust N Z J Obstet Gynaecol. 2013;53(6):525–31.

Acknowledgements

The author would like to acknowledge and thank all faculty members and students who volunteered to participate in this study.

This work was supported by the Program of the Chinese Society of Medical Education (No. 2018B-N02129).

Author information

Authors and affiliations.

Clinical Medical College of Yangzhou University, 225001, Yangzhou, China

Li Zhiyue & Lu Dan

Contributions

Li Zhiyue: manuscript writing and data management. Lu Dan: project development, manuscript editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Lu Dan.

Ethics declarations

Ethics approval and consent to participate.

The study protocol was approved by the Ethics Committee of Northern Jiangsu People’s Hospital (ethical review: 202078). All subjects provided informed consent prior to their inclusion in the study. All the procedures performed in this study were in accordance with the principles of the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Zhiyue, L., Dan, L. Application of ALSO course in standardized training Resident in Obstetric. BMC Med Educ 24, 151 (2024). https://doi.org/10.1186/s12909-024-05126-6

Received: 31 August 2023

Accepted: 02 February 2024

Published: 16 February 2024

DOI: https://doi.org/10.1186/s12909-024-05126-6

  • Standardization training of residents
  • ALSO teaching method
  • Clinical skill and operation

BMC Medical Education

ISSN: 1472-6920

Singapore Med J, v.59(12); 2018

Qualitative research essentials for medical education

Sayra M Cristancho

1 Department of Surgery and Faculty of Education, Schulich School of Medicine and Dentistry, Western University, Canada

2 Centre for Education Research and Innovation, Schulich School of Medicine and Dentistry, Western University, Canada

Mark Goldszmidt

3 Department of Medicine, Schulich School of Medicine and Dentistry, Western University, Canada

Lorelei Lingard

Christopher Watling

4 Postgraduate Medical Education, Schulich School of Medicine and Dentistry, Western University, Canada

This paper offers a selective overview of the increasingly popular paradigm of qualitative research. We consider the nature of qualitative research questions, describe common methodologies, discuss data collection and analysis methods, highlight recent innovations and outline principles of rigour. Examples are provided from our own and other authors’ published qualitative medical education research. Our aim is to provide both an introduction to some qualitative essentials for readers who are new to this research paradigm and a resource for more experienced readers, such as those who are currently engaged in a qualitative research project and would like a better sense of where their work sits within the broader paradigm.

INTRODUCTION

Are you a medical education researcher engaged in qualitative research and wondering if you are on the right track? Are you contemplating a qualitative research project and not sure how to get started? Are you reading qualitative manuscripts and making guesses about their quality? This paper offers a selective overview of the increasingly popular domain of qualitative research. We consider the nature of qualitative research questions, describe common methodologies, discuss data collection and analysis methods, highlight recent innovations, and outline principles of rigour. The aim of this paper is to educate newcomers through introductory explanations while stimulating more experienced researchers through attention to current innovations and emerging debates.

WHAT IS QUALITATIVE RESEARCH?

Qualitative research is naturalistic; the natural setting – not the laboratory – is the source of data. Researchers go where the action is; to collect data, they may talk with individuals or groups, observe their behaviour and their setting, or examine their artefacts.( 1 ) As defined by leading qualitative researchers Denzin and Lincoln, qualitative research studies social and human phenomena in their natural settings, attempting to make sense of or interpret these phenomena in terms of the meanings participants bring to them.( 2 )

Because qualitative research situates itself firmly in the world it studies, it cannot aim for generalisability. Its aim is to understand, rather than erase, the influence of context, culture and perspective. Good qualitative research produces descriptions, theory or conceptual understanding that may be usefully transferred to other contexts, but users of qualitative research must always carefully consider how the principles unearthed might unfold in their own distinct settings.

WHAT QUESTIONS ARE APPROPRIATE FOR QUALITATIVE RESEARCH?

Meaningful education research begins with compelling questions. Research methods translate curiosity into action, facilitating exploration of those questions. Methods must be chosen wisely; some questions lend themselves to certain methodological approaches and not to others.

In recent years, qualitative research methods have become increasingly prominent in medical education. The reason is simple: some of the most pressing questions in the field require qualitative approaches for meaningful answers to be found.

Qualitative research examines how things unfold in real world settings. While quantitative research approaches that dominate the basic and clinical sciences focus on testing hypotheses, qualitative research explores processes, phenomena and settings ( Box 1 ). For example, the question “Does the introduction of a mandatory rural clerkship increase the rate of graduates choosing to practise in rural areas? ” demands a quantitative approach. The question embeds a hypothesis – that a mandatory rural clerkship will increase the rate of graduates choosing to practise in rural areas – and so the research method must test whether or not that hypothesis is true. But the question “ How do graduating doctors make choices about their practice location? ” demands a qualitative approach. The question does not embed a hypothesis; rather, it explores a process of decision-making.

[Box 1: Qualitative research questions]

Many issues in medical education could be examined from either a quantitative or qualitative approach; one approach is not inherently superior. The questions that drive the research as well as the products that derive from it are, however, fundamentally different. Consider two approaches to studying the issue of online learning. A quantitative researcher might ask, “ What is the effect of an online learning module on medical students’ end-of-semester OSCE [objective structured clinical examination] scores? ”, while a qualitative researcher might ask “ How do medical students make choices about using online learning resources? ” Although the underlying issue is the same – the phenomenon of online learning in medical school – the studies launched by these questions and the products of those studies will look very different.

WHAT ARE QUALITATIVE METHODOLOGIES AND WHY ARE THEY IMPORTANT?

Executing rigorous qualitative research requires an understanding of methodology – the principles and procedures that define how the research is approached. Far from being monolithic, the world of qualitative research encompasses a range of methodologies, each with distinctive approaches to inquiry and characteristic products. Methodologies are informed by the researcher’s epistemology – that is, their theory of knowledge. Epistemology shapes how researchers approach the researcher’s role, the participant-researcher relationship, forms of data, analytical procedures, measures of research quality, and representation of results in analysis and writing.( 3 )

In medical education, published qualitative work includes methodologies such as grounded theory, phenomenology, ethnography, case study, discourse analysis, participatory action research and narrative inquiry, although the list is growing as the field embraces researchers with diverse disciplinary backgrounds. This paper seeks neither to exhaustively catalogue all qualitative methodologies nor to comprehensively describe any of them. Rather, we present a subset, with the aim of familiarising readers with their fundamental goals. In this article, we briefly introduce four common methodologies used in medical education research ( Box 2 ). Using one topic, professionalism, we illustrate how each methodology might be applied and how its particular features would yield different insights into that topic.

[Box 2: Common qualitative methodologies in medical education]

Grounded theory

Arguably the most frequently used methodology in medical education research today, grounded theory seeks to understand social processes. Core features of grounded theory include iteration, in which data collection and analysis take place concurrently with each informing the other, and a reliance on theoretical sampling to explore patterns as they emerge.( 4 ) While many different schools of grounded theory exist, they share the aim of generating theory that is grounded in empirical data.( 5 ) Theory, in this type of research, can be thought of as a conceptual understanding of the process under study, ideally affording a useful explanatory power. For example, if one were interested in the development of professionalism among senior medical students during clerkship, one might design a grounded theory study around the following question: “ What aspects of clerkship support or challenge professional behaviour among senior medical students? ” The resulting product would be a conceptual rendering of how senior medical students navigate thorny professionalism issues, which might in turn be useful to curriculum planners.

Phenomenology

This methodology begins with a phenomenon of interest and seeks to understand the subjective lived experience of that phenomenon.( 6 ) Core features of phenomenology include a focus on the individual experience (typically pursued through in-depth interviewing and/or examinations of personal narratives), inductive analysis and a particular attention to reflexivity.( 7 ) Phenomenological researchers typically enumerate their own ideas and preconceptions about the phenomenon under study and consider how these perceptions might influence their interpretation of data.( 8 ) A phenomenological study around professionalism in senior medical students, for example, might involve interviewing several students who have experienced a professionalism lapse about that experience. The resulting product might be an enhanced understanding of the emotional, social and professional implications of this phenomenon from the student’s perspective, which might in turn inform wellness or resilience strategies.

Ethnography

Ethnography aims to understand people in their contexts, exploring the influence of culture, social organisation and shared values on how people behave – their routines and rituals. Core features of ethnography include reliance on direct observation as a data source, and the use of sustained immersive engagement in the setting of interest in order to understand social dynamics from within.( 9 , 10 ) An ethnographic approach to studying how professional attitudes develop in senior medical students might gather data through observations of ward rounds, team meetings and clinical teaching sessions over a period of time. The resulting product – called an ethnography – would describe how professional values are socialised in junior learners in clinical settings, which could assist educators in understanding how the clinical experiences they programme for their learners are influencing their professional development.

Case study

Case study research seeks an in-depth understanding of an individual case (or series of cases) that is illustrative of a problem of interest. Like clinical case studies, the goal is not generalisation but a thorough exploration of one case, in hopes that the fruits of that exploration may prove useful to others facing similar problems. Core features of case study research include: thoughtful bounding or defining of the scope of the case at the outset; collection of data from multiple sources, ranging from interviews with key players to written material in policy documents and websites; and careful attention to both the phenomenon of interest and its particular context.( 11 ) A specific professionalism challenge involving medical students could provide fodder for a productive case study. For example, if a medical school had to discipline several students for inappropriately sharing personal patient information on social media, a case study might be undertaken. The ‘case’ would be the incident of social media misuse at a single medical school, and the data gathered might include interviews with students and school officials, examination of relevant policy documents, examination of news media coverage of the event, and so on. The product of this research might trigger similar institutions to carefully consider how they might approach – or prevent – a similar problem.

As these four examples illustrate, methodology is the backbone of qualitative research. Methodology shapes the way the research question is asked, defines the characteristics of an appropriate sample, and governs the way the data collection and analysis procedures are organised. The researcher’s role is also distinctive in each methodology; for instance, in constructivist grounded theory, the researcher actively constructs the theory,( 12 ) while in phenomenology, the researcher attempts to manage his or her ‘pre-understandings’ through either bracketing them off or being reflexive about them.( 13 ) Interested readers may wish to consult the reference list for recently published examples of research using grounded theory,( 14 ) phenomenology,( 15 ) ethnography( 16 ) and case study approaches( 17 ) in order to appreciate how researchers deploy these methodologies to tackle compelling questions in contemporary medical education.

WHAT ARE SOME COMMON METHODS OF QUALITATIVE DATA COLLECTION?

The most common methods of qualitative data collection are interview – talking to participants about their experiences relevant to the research question, and observation – watching participants while they are having those experiences. Depending on the research questions explored, a research design might combine interviews and observations.

Interview-based methods

Interviews are typically used for situations where a guided conversation with relevant participants can help provide insight into their lived experiences and how they view and interpret the world around them. Interviews are also particularly useful for exploring past events that cannot be replicated or phenomena where direct observation is impossible or unfeasible.

Participants may be interviewed individually or in groups. Focus group interviews are used when the researcher’s topic of interest is best explored through a guided, interactive discussion among the participants themselves. Therefore, when focus groups are used, the sample is conceptualised at the level of the group – three focus groups of five people constitutes a sample of three interactive discussions, not 15 individual participants. Because they centre on the group discussion and dynamic, focus groups are less well-suited for topics that are sensitive, highly personal or perceived to be culturally inappropriate to discuss publicly.( 18 )

Unlike quantitative interviews, where a set of structured, closed-ended (e.g. yes/no) questions are asked in the same order with the same wording every time, qualitative interviews typically involve a semi-structured design where a list of open-ended questions serves to guide, but not constrain, the interview. Therefore, at the interviewer’s discretion, the questions and their sequence may vary from interview to interview. This judgement is made based on both the interviewer’s understanding of the phenomenon under exploration and the emerging dynamic between the interviewer and participant.

The primary goal of a qualitative interview is to get the participants to think carefully about their experience and relate it to the interviewer with rich detail. Getting good data from interviewing relies on using creative strategies to avoid the common trap of getting politically correct answers – often called ‘cover stories’– or answers that are superficial rather than deep and reflective.( 19 ) A common design error occurs when researchers are overly explicit in their questioning, such as asking “ What are the top five criteria you use to assess student professionalism? ” A better approach involves questions that ask participants to describe what they do in practice, with follow-up probes that extend beyond the specific experience described. For example, starting with “ Tell me about a recent experience where you assessed a student’s professionalism ” allows the participant to relay an experience, to which the interviewer can respond with probes such as “ What was tricky about that? ” or “ How typical is that experience? ”

Another common strategy for prompting participants to engage in rich reflection on their experience and perceptions is to use vignettes as discussion prompts. Vignettes are often artificial scenarios presented to participants to read or watch on video, about which they are then asked probing questions.( 20 ) However, vignettes can also be used to recreate an authentic situation for the participant to engage with.( 21 ) For instance, in one interview study, we presented participants with a vignette in the form of the research assistant reading aloud a standard patient admission presentation that the interviewees would typically hear from their students on morning ward rounds. We then asked the participants to interact with the interviewer as though he or she was a student who had presented this case on morning rounds. Recreating this interaction in the context of the interview served as a stepping stone to questions such as “ Why did you ask the student ‘x’? ” and “ How would your approach have differed with a different student presenter, e.g. a stronger or weaker one? ”

Direct observation

Observation-based research can involve a wide spectrum of activities, ranging from brief observations of specific tasks (e.g. handover, preoperative team briefings) to prolonged field observations such as those seen in ethnography. When used effectively, direct observation can provide the researcher with powerful insight into the routines of a group.

Getting good data from observational research relies on several key components. First, it is essential to define the scope of the project upfront, because budgets are limited, the amount of detail to be attended to is massive, and any individual or group of observers can attend to only so much of it. Good observational research therefore relies on collaboration between knowledgeable insiders and those with both methodological and theoretical expertise. Sampling demands particular attention; an initial purposive sampling approach is often followed by more targeted, theoretical sampling that is guided by the developing analysis. Observational research also typically involves a mix of data sources, including observational field notes, field interviews and document analysis. Audio and video may be helpful when the studied phenomenon is particularly complex, when nuances of interaction may be missed without the ability to review data, or when precision of verbal and nonverbal interactions is necessary to answer the research question.( 22 )

Field notes are often the dominant data source used for subsequent analysis in observational research. As such, they must be created with great diligence. Usually researchers will jot down brief notes during an observation and afterwards elaborate in as much detail as they can recall. Field notes have an important reflective component. In addition to the factual descriptions, researchers include comments about their feelings, reactions, hunches, speculations and working theories or interpretations. The content of field notes, therefore, usually includes: descriptions of the setting, people and activities; direct quotations or paraphrasing of what people said; and the observer’s reflections.( 23 ) Field notes are time-consuming when done well – even a single hour of observation can lead to several hours of reflective documentation.

An important aspect to consider when designing observation-based research is the ‘observer effect’, also known as the Hawthorne effect, more recently reframed as ‘participant reactivity’ by health professions education researchers Paradis and Sutkin.( 24 ) The Hawthorne effect is conventionally defined as “ when observed participants act differently from how they would act if the observer were not present ”.( 25 ) Researchers have implemented a number of strategies to mitigate this effect, including prolonged embedding of the observer, efforts to ‘fit in’ through dress or comportment, and careful recording of explicit instances of the effect.( 24 ) However, Paradis and Sutkin found that instances of the Hawthorne effect, as conventionally defined, have never been described in qualitative research manuscripts in the health professions education field, perhaps because, as they speculate, healthcare workers and trainees are accustomed to being observed. Based on this, they argued that researchers should worry less about mitigating the Hawthorne effect and instead invest in interpersonal relationships at their study site to mitigate the effects of altered behaviour and draw on theory to make sense of participants’ altered behaviour.( 23 ) Combining interviewing and observation is also common in qualitative research ( Box 3 ).

[Box 3: Combining interviews and observations]

WHAT ARE THE COMMON METHODS OF QUALITATIVE DATA ANALYSIS?

Qualitative data almost invariably takes the form of text; an interview is turned into a transcript and an observation is rendered into a field note. Analysing these qualitative texts is about uncovering meaning, developing understanding and discovering insights relevant to the research question. Analysis is not separated from data collection in qualitative research, and begins with the first interview, the first observation or the first reading of a document. In fact, the iterative nature of data collection and analysis is a hallmark of qualitative research, because it allows the researcher’s emerging insights about the study phenomena to inform subsequent rounds of data collection ( Box 4 ).

[Box 4: The iterative process of analysis]

Data that has been analysed while being collected is both parsimonious and illuminating. However, this process can extend indefinitely. There will always be another person to interview or another observation to record. Deciding when to stop depends on both practical and theoretical concerns. Practical concerns include deadlines and funding. More importantly, the decision should be guided by the theoretical concern of sufficiency.( 26 ) Sufficiency occurs when new data does not produce new insights into the phenomenon, in other words, when you keep hearing and seeing the same things you have heard and seen before.

Qualitative data analysis is primarily inductive and comparative. The overall process of data analysis begins by identifying segments in the data that are responsive to the research question. The next step is to compare one segment with the next, looking for recurring patterns in the data set. During this step, the focus is on sorting the raw data into categories that progressively build a coherent description or explanation of the phenomenon under study. This process of identifying pieces of data and grouping them into categories is called coding.( 14 ) Once a tentative scheme of categories is derived, it is applied to new data to see whether those categories continue to exist or not, or whether new categories arise – this step determines whether sufficiency has been reached. The final step in the analysis is to think about how categories interrelate. At this point, the analysis moves to interpreting the meaning of these categories and their interrelations.( 12 )
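
As a bookkeeping aid only, the check on whether new data still yields new categories can be sketched in a few lines of Python. The interview names, the code labels and the rule of "two consecutive sources with no new codes" are all hypothetical assumptions made for illustration; they do not come from this paper, and such a tally cannot replace the researcher's interpretive judgement about sufficiency.

```python
def new_codes_per_source(coded_sources):
    """For each source, in collection order, return the codes not seen in any earlier source."""
    seen = set()
    result = []
    for name, codes in coded_sources:
        fresh = set(codes) - seen
        result.append((name, fresh))
        seen |= fresh
    return result


def sufficiency_reached(coded_sources, quiet_run=2):
    """True if the last `quiet_run` sources added no new codes (an illustrative rule, not a standard)."""
    tail = new_codes_per_source(coded_sources)[-quiet_run:]
    return len(tail) == quiet_run and all(not fresh for _, fresh in tail)


# Hypothetical coded interviews: (source name, set of codes assigned to its segments).
coded = [
    ("interview_1", {"role_modelling", "time_pressure"}),
    ("interview_2", {"time_pressure", "fear_of_reprisal"}),
    ("interview_3", {"role_modelling", "fear_of_reprisal"}),
    ("interview_4", {"time_pressure"}),
]

after_three = sufficiency_reached(coded[:3])  # interview_2 still added a code
after_four = sufficiency_reached(coded)       # interviews 3 and 4 added nothing new
```

In this toy data set the tally suggests stopping after the fourth interview, but a quiet run of codes says nothing about whether the categories are rich enough, which remains a theoretical decision.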

The process for data analysis laid out in this section is a basic inductive and comparative analysis strategy that is suitable for analysing data for most interpretive qualitative research methodologies, including the four featured in this paper – phenomenology, grounded theory, ethnography and case study – as well as others such as narrative analysis and action research. While each methodology attends to specific procedures, they all share the use of this basic inductive/comparative strategy. Overall, analysis should be guided by methodology, but different analytical procedures can be creatively combined across methodologies, as long as this combining is explicit and intentional.( 27 )

WHAT ARE SOME CURRENT INNOVATIONS IN QUALITATIVE RESEARCH?

Understanding the complex factors that influence clinical practice and medical education is not an easy research task. Many important issues may be difficult for the insider to articulate during interviews and impossible for the outsider to ‘see’ during observation. Innovations to address these challenges include guided walks,( 28 ) photovoice( 29 ) and point-of-view filming.( 30 ) Our own research has drawn intensively on the innovation termed ‘rich pictures’ to explore the features and implications of complexity in medical education.( 31 ) In one study, we asked medical students to draw pictures of clinical cases that they found complex: an exciting case and a frustrating one.( 32 ) Participants were given 30–60 minutes on their own to reflect on the situation and draw their pictures. This was followed by an in-depth interview using the pictures as triggers to explore the phenomenon under study – in this case, students’ experiences of and responses to complexity during their training.

Such innovations hold great promise for qualitative research in medical education. For instance, rich pictures can reveal emotional and organisational dimensions of complex clinical experiences, which are less likely to be emphasised in participants’ traditional interview responses.( 33 ) Methodological innovations, however, bring new challenges: they can be time-intensive for participants and researchers; they require new analytical procedures to be developed; and they necessitate efforts to educate audiences about the rigour and credibility of unfamiliar approaches.

WHAT ARE THE PRINCIPLES OF RIGOUR IN QUALITATIVE RESEARCH?

Like quantitative research, qualitative research has principles of rigour that are used to judge the quality of the work.( 34 ) Here, we discuss principles that appear in most criteria for rigour in the field: reflexivity, adequacy, authenticity, trustworthiness and resonance ( Box 5 ).

[Box 5: Principles of rigour in qualitative research]

The main data collection tool in qualitative research is the researcher. We talk to participants, observe their practices and interpret their documents. Consequently, a critical feature of rigour in qualitative data collection is researcher reflexivity: the ability to consider our own orientations towards the studied phenomenon, acknowledge our assumptions and articulate regularly our impressions of the data.( 35 ) Only this way can we assure others that our subjectivity has been thoughtfully considered and afford them the ability to judge its influence on the work for themselves. Qualitative research does not seek to remove this subjectivity; it treats research perspective as unavoidable and enriching, not as a form of bias to purge.

Every qualitative dataset is an approximation of a complex phenomenon – no study can capture all dimensions and nuances of situated social experiences, such as medical students’ negotiations of professional dilemmas in the clinical workplace. Therefore, two other important criteria of rigour relate to the adequacy and authenticity of the sampled experiences. Did the research focus on the appropriate participants and/or situations? Were the size and scope of the sample adequate to represent the scope of the phenomenon?( 36 ) Was the data collected an authentic reflection of the phenomenon in question? Qualitative researchers should thoughtfully combine different perspectives, methods and data sources (a process called ‘triangulation’) to intensify the richness of their representation.( 37 ) We should endeavour to draw on data in our written reports such that we provide what anthropologist Geertz has termed a sufficiently ‘thick’ description( 38 ) for readers to judge the authenticity of our portrayal of the studied phenomenon.

Qualitative analysis embraces subjectivity: what the researcher ‘sees’ in the data is a product both of what participants told or showed us and of what we were oriented to make of those stories and situations. To some degree, a rhetorician will always see rhetoric and a systems engineer will always see systems. Qualitative analysis should nonetheless be systematic and held to a principle of trustworthiness, which dictates that we should clearly describe: (a) what was done by whom during the inductive, comparative analytical process; (b) how the perspectives of multiple coders were negotiated; (c) how and when theoretical lenses were brought to bear in the iterative process of data collection and analysis; and (d) how discrepant instances in the data – those that fell outside the dominant thematic patterns – were handled.

Finally, the ultimate measure of quality in qualitative research is the resonance of the final product to those who live the social experience under study.( 4 ) As qualitative researchers presenting our work at conferences, we know we have met this bar if our audiences laugh, nod or scowl at the right moments, and if their response at the end is “ You nailed it. That’s my world. But you’ve given me a new way to look at it ”. The situatedness of qualitative research means that its transferability to other contexts is always a matter of the listener/reader’s judgement, based on their consideration of the similarities and differences between the research context and their own. Thus, there is a necessity for qualitative research to sufficiently describe its context, so that consumers of the work have the necessary information to gauge transferability. Ultimately, though, transferability remains an open question, requiring further inquiry to explore the explanatory power of one study’s insights in a new setting.

WHAT ELSE IS THERE TO KNOW?

This overview of qualitative research in medical education is not exhaustive. We have been purposefully selective, discussing in depth some common methodologies and methods, and leaving aside others. We have also passed over important issues such as qualitative research ethics, sampling and writing. There is much, much more for readers to know! Our selectivity notwithstanding, we hope that this paper will provide an accessible introduction to some qualitative essentials for readers who are new to this research domain, and that it may serve as a useful resource for more experienced readers, particularly those who are doing a qualitative research project and would like a better sense of where their work sits within the broader field of qualitative approaches.

  • Open access
  • Published: 16 February 2024

Extraction frequent patterns in trauma dataset based on automatic generation of minimum support and feature weighting

  • Zahra Kohzadi 1 , 2 ,
  • Ali Mohammad Nickfarjam 1 , 2 ,
  • Leila Shokrizadeh Arani 1 , 2 ,
  • Zeinab Kohzadi 3 &
  • Mehrdad Mahdian 4  

BMC Medical Research Methodology volume 24, Article number: 40 (2024)

Background

Data mining has been used to help discover frequent patterns in health data. It is widely used to diagnose and prevent various diseases and to identify the causes and factors affecting them. The aim of the present study is therefore to discover frequent patterns in the data of the Kashan Trauma Registry using a new method.

Methods

We utilized real data from the Kashan Trauma Registry. After pre-processing, frequent patterns and rules were extracted using both the classical Apriori algorithm and the new method. The new method, based on the weight of variables and the harmonic mean, was implemented in Python to calculate the minimum support automatically.

Results

The results showed that, in the new method, the minimum support is generated dynamically and level by level from the feature weights, whereas in the classical Apriori algorithm a single minimum support value is set manually by the user. The new method also performed better than the classical Apriori method in terms of memory consumption, execution time, the number of frequent patterns found and the number of rules generated.

Conclusions

This study found that manually determining the minimum support increases execution time and memory usage, which is not cost-effective, especially when the user does not know the dataset's content. In trauma registries and massive healthcare datasets, the method's ability to uncover common item groups and association rules provides valuable insights. Based on the patterns produced in the trauma data, the following are recommended: care of the elderly by their families, education of the general public on how to deal with accident victims and transport them to hospital, and education of motorcyclists on observing safety precautions when riding.
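
To make the contrast in the abstract concrete, the following is a minimal Python sketch of a level-wise Apriori search in which the minimum support is supplied by a function that is called once per level. The toy transactions, the item names, the `apriori` and `threshold_for_level` names, and above all the rule "use the harmonic mean of the candidate supports at each level" are assumptions made for illustration. The abstract does not give the authors' weighting formula, so this should not be read as their method.

```python
from itertools import combinations
from statistics import harmonic_mean


def apriori(transactions, threshold_for_level):
    """Level-wise frequent-itemset search.

    threshold_for_level(level, supports) returns the minimum support
    (a fraction between 0 and 1) applied to the candidates of that level.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    level = 1
    while candidates:
        # Support = fraction of transactions containing the candidate itemset.
        supports = {c: sum(c <= t for t in transactions) / n for c in candidates}
        supports = {c: s for c, s in supports.items() if s > 0}
        if not supports:
            break
        min_support = threshold_for_level(level, list(supports.values()))
        kept = {c: s for c, s in supports.items() if s >= min_support}
        frequent.update(kept)
        level += 1
        # Join step: merge kept itemsets that differ in exactly one item.
        joined = {a | b for a, b in combinations(kept, 2) if len(a | b) == level}
        # Prune step: every (level - 1)-subset of a candidate must itself be kept.
        candidates = {
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, level - 1))
        }
    return frequent


# Hypothetical toy records, not taken from the Kashan Trauma Registry.
transactions = [
    {"elderly", "fall", "home"},
    {"elderly", "fall", "home"},
    {"motorcycle", "head_injury", "street"},
    {"motorcycle", "head_injury", "street"},
    {"elderly", "fall", "street"},
    {"motorcycle", "fall", "street"},
]

# Classical use: one user-chosen threshold for every level.
fixed = apriori(transactions, lambda level, supports: 0.5)

# Illustrative dynamic variant (an assumption): threshold recomputed per level.
dynamic = apriori(transactions, lambda level, supports: harmonic_mean(supports))
```

With these toy records the fixed threshold of 0.5 keeps six itemsets, while the per-level variant also keeps the pair `{"fall", "street"}`, because its threshold drops at the second level. Passing the threshold as a function keeps the search loop identical in both cases, so only the rule for choosing the minimum support differs.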

Introduction

Trauma poses a significant global health challenge, exerting a profound impact on individuals worldwide and standing as the primary cause of mortality among individuals under the age of 45 [ 1 ]. Notably, over half of fatalities occur within minutes of sustaining an injury, often beyond the reach of immediate medical attention despite the presence of well-established trauma systems [ 2 ]. Findings from the study by Mock et al [ 3 ] revealed a significant correlation between the economic status of a country and mortality rates attributed to trauma. The results indicated that in Ghana, an injured patient faces nearly double the risk of mortality compared to a patient with identical injuries in the United States.

The introduction of trauma care systems in high-income countries has yielded remarkable reductions in both mortality and disability rates. It is estimated that by enhancing trauma care systems on a global scale, approximately one-third of deaths resulting from injuries could be prevented [ 1 ]. Trauma registries were initially conceived as a tool for enhancing the quality of care provided, offering a wealth of valuable clinical information. Typically, these registries encompass key components such as the Abbreviated Injury Scale (AIS), details on prevention measures, pre-hospital care, and post-discharge care [ 4 ]. Typically, trauma registries encompass a wide range of data, covering patient demographics, injury circumstances, pre-hospital care and transportation details, emergency department interventions, in-hospital treatments, anatomical injury descriptions, physiological measurements, complications, outcomes, and patient dispositions. Moreover, these registries are progressively incorporating information on pre-existing medical conditions, which are recognized as significant determinants of outcomes independent of age and injury severity [ 5 ].

The size of data is consistently expanding, and the demand to comprehend extensive and information-rich datasets has risen across various domains such as technology, business, and science. In today's competitive world, where large volumes of data are prevalent, the ability to extract valuable insights from these datasets and utilize that knowledge has become increasingly crucial. The practice of employing computer-based information systems, along with innovative techniques, to uncover knowledge from data is known as data mining [ 6 ]. Data mining plays a crucial role in the healthcare sector by enabling knowledge discovery and pattern identification to facilitate decision-making processes. It stands as a rapidly advancing field focused on extracting valuable and meaningful insights from extensive datasets. Within healthcare, data mining employs analytical methodologies to identify vital information that supports decision-making processes. Its importance spans various areas, including disease detection, prevention, and management, fraud detection in health insurance, reduction of medical care costs, and the development of effective healthcare policies. Additionally, data mining aids researchers in creating recommendation systems, patient health profiles, and overall contributes to improved diagnosis and treatment through the storage and analysis of voluminous healthcare data using database systems [ 7 ].

Data mining, also known as knowledge discovery in databases (KDD), involves the collection and analysis of historical data to identify patterns, relationships, or regularities within large datasets. The results of data mining can provide valuable insights for making informed decisions in the future. With the evolution of KDD, the use of pattern recognition has become integrated into data mining, leading to a decrease in the reliance on standalone pattern recognition techniques [ 8 ].

The data mining industry is actively conducting research in the field of association rule mining [ 9 , 10 ]. In recent times, various algorithms have been suggested for extracting patterns by mining association rules [ 11 ]. The Apriori algorithm is a widely used algorithm for mining association rules in transaction databases. It was the first such technique developed and remains one of the most popular methods for identifying frequent itemsets and interesting associations [ 12 ]. The Apriori algorithm is a classic and pioneering association rule mining algorithm that uses a layer-by-layer iterative search approach to discover relationships between item sets in a database and generate rules [ 13 ].

In the field of health, many studies have used Apriori and association rule mining, for example for heart diseases [ 14 , 15 ], Alzheimer’s disease diagnosis [ 16 ], cancer diagnosis [ 17 ], diabetes medical history [ 18 ], predicting the risk of diabetes mellitus [ 19 ], and chronic inflammatory diseases [ 20 ].

Since trauma registries produce vast amounts of diverse and intricate data, using association rule mining for exploratory analysis can help uncover novel, interesting, and obscure patterns. Several such studies are summarized below.

According to the research of Fagerlind et al [ 21 ], the Swedish Traffic Accident Data Acquisition was utilized to analyze crash circumstances reported by the police and injury information gathered from hospitals during the years 2011 to 2017. By applying the Apriori algorithm, statistical associations between injuries (IBIP) were identified through the analysis of injury data. Out of the 48,544 individuals analyzed, 36,480 (75.1%) had a single recorded injury category, while 12,064 (24.9%) had multiple injuries. The analysis using data mining techniques revealed 77 IBIPs among the multiply injured individuals, and out of these, 16 were linked to only one type of road user.

The study of Karajizadeh et al [ 22 ] is classified as a cohort study, which involved analyzing 549 trauma patients with nosocomial infections who were admitted to Shiraz trauma hospital between 2017 and 2018. The study collected data on various factors such as sex, age, mechanism of injury, body region injured, injury severity score, length of stay, type of intervention, infection day after admission, microorganism cause of infections, and outcomes. Knowledge was extracted from the dataset using association rule mining techniques, and the IBM SPSS Modeler data mining software version 18.0 was used to mine the database of trauma patients with hospital-acquired infections. Their results showed that the following factors were associated with in-hospital mortality at a confidence level of over 71%: age over 65, surgical site infections on the skin, bloodstream infections, injuries caused by car accidents, invasive tracheal intubation procedures, injury severity scores above 16, and multiple injuries.

The objective of the study by Aekwarangkoon et al [ 23 ] was to utilize association rule mining to identify related patterns, and subsequently, to develop a prediction model utilizing ensemble learning techniques to predict high levels of depression and suicide rates among primary school students attending extended opportunity schools in rural Thailand, where incidents of trauma are prevalent. The results of the experiment indicated that the crucial feature items identified in this study surpassed all other previously used items in predicting depression and suicide. Furthermore, the approach proposed in this research can serve as an initial screening process for identifying individuals at risk of depression and suicide.

The research by Finley et al [ 24 ] was conducted with the aim of investigating the potential effects of traumatic brain injury on the capacity to classify and remember visual signals from a subjective perspective. They used the Association Rule Modelling method to measure subjective organization and examined whether the complexity of the rules generated by Association Rule Modelling predicts symbol recall.

In the study of Sarıyer et al [ 25 ], the real-life medical data obtained from an emergency department was analyzed using association rule mining to uncover hidden patterns and relationships between diagnostic test requirements and diagnoses. The diagnoses were classified into 21 categories according to the International Classification of Diseases, while the laboratory tests were grouped into four main categories. The study demonstrated that identifying the correlation between a patient's diagnosis and their required diagnostic tests can enhance decision-making and optimize resource utilization in emergency departments. Furthermore, association rules can aid physicians in treating patients effectively.

However, in all these studies, the classical Apriori algorithm has been used to extract frequent rules and patterns.

Association rule mining comprises a two-step procedure: (i) identifying frequent itemsets within the dataset, and (ii) deriving inferences from these identified itemsets. The identification of frequent itemsets is acknowledged as the computationally most challenging task in this process and has been demonstrated to be NP-Complete [ 26 ].

The essential component rendering association-rule mining feasible is the minimum support threshold, referred to as minsup. Its primary function is to prune the search space and constrain the number of generated rules. Nonetheless, relying solely on a single minsup presupposes that all items in the database share the same nature or exhibit comparable frequencies, which may not accurately represent real-life applications [ 27 , 28 ].

Establishing an inaccurate minimum support (min_sup) threshold can lead to two significant issues: (i) the algorithm's failure to identify correlated patterns, and (ii) a potentially more serious problem, wherein the algorithm may produce misleading patterns that do not genuinely exist [ 29 ].

This is particularly probable when an analyst lacks a comprehensive understanding of the significance of an input parameter in the data mining process or fails to choose optimal parameter values. Such oversights can result in the algorithm's failure to identify highly correlated patterns [ 30 ].

The primary challenge with the Apriori algorithm is choosing the support and confidence thresholds. Apriori generates candidate itemsets level by level and retains as frequent only those that meet a minimum support set by the user. This choice determines how many association rules are produced and what kind they are. In practical applications, users cannot find a suitable minimum support value immediately and must tune it repeatedly. Every time a user modifies the minimum support, the database must be scanned again and the mining algorithm rerun. Also, not all elements in an itemset behave in the same way; some appear regularly and frequently, while others appear occasionally and rarely [ 31 ]. Moreover, if the threshold value is too small, many useless rules are generated, and if it is too large, useful information may be discarded [ 32 ].

This repeated tuning is extremely time-consuming and expensive. It is therefore appealing to design an algorithm that generates minimum supports automatically, with the minimum support threshold adjusted to the different levels of the item sets. Prior work is discussed in the section on related works.

On the other hand, the advantage of using the weight of variables in extracting frequent patterns is stated in various studies [ 33 , 34 ]. According to these studies, an item may exist many times in the database, but it is not very important, as a result, the importance (their weight) of the variables can be effective in extracting frequent patterns.

Therefore, the aim of this study is to extract frequent patterns in the data of Kashan Trauma Center using an improved algorithm based on the creation of automatic minimum support and feature weights.

At the end of this study, the following questions will be answered:

What is the new method to automatically calculate the minimum support in the Apriori algorithm?

What is the effect of weighting variables to produce frequent patterns?

What is the impact of the new method on algorithm execution time, memory consumption, the number of frequent patterns, and the quality of generated rules?

The remaining sections of the paper are organized as follows: Section "Methodology" explains the methodology, including the description of the dataset, the selection of the algorithm, the evaluation, and the implemented framework. Section "Experimental results" presents the findings of the experiments. Finally, Section "Discussion" states the conclusion.

Related works

Table 1 presents a comparative assessment of prior studies, delving into aspects such as their objectives, use of real datasets, introduction of implementation platforms, evaluation indicators, and the delineation of respective advantages and limitations. This comparative overview provides a comprehensive insight into the diverse approaches embraced in the field, fostering a nuanced understanding of the strengths and weaknesses inherent in each study.

The examination of each study involves a critical assessment of its purpose, revealing the specific goals and objectives pursued by the researchers. The utilization of real datasets serves as a significant criterion, indicating the practical applicability and relevance of the proposed methods. Additionally, the implementation platform signifies the technology or programming language employed in the study, offering insights into the technical aspects of the research. Evaluation indicators are crucial for gauging the performance of proposed methods, encompassing metrics such as RAM memory consumption, hard disk space utilization, algorithm execution time, the count of frequently generated patterns, the number of generated rules, and the quality of these rules. A comprehensive analysis of these indicators provides valuable insights into the effectiveness and efficiency of each study. Moreover, it is essential to take into account both the strengths and limitations of each study. Recognizing the strengths aids in identifying innovative aspects and potential contributions, while being cognizant of limitations is crucial for placing the findings in context and pinpointing areas for improvement.

Past studies have attempted to overcome the limitation of a single minimum support and to generate association rules based on the weights of variables. However, the weighting of variables and the automation of the minimum support, which eliminates the need for user intervention, have not been implemented simultaneously.

Therefore, the current study endeavors to incorporate both variable weighting and automated minimum support calculation.

Methodology

In this study, our goal is to improve the calculation of the minimum support and to discover frequent patterns in trauma data by incorporating variable weights.

For this research, data from the Kashan Trauma Centre covering March 2018 to February 2019 were used.

The data pre-processing involved multiple steps:

Noisy and outlier data were removed.

Numerical variables were imputed using the mean, while categorical variables were imputed using the mode.

Min-max normalization was then applied to the data.

Lastly, one-hot encoding was employed to discretize the data.

One-hot encoding is a machine learning approach for representing categorical variables as numbers so that algorithms can process them. This is done by building a binary vector with one element for each category in the variable, in which the element for the observed category is set to one and all other elements are set to zero. Each category thus becomes a separate feature with a value of 0 or 1, depending on whether it was present in the original record or not. With one-hot encoding, machine learning algorithms can handle categorical variables effectively and capture their relationships with other variables [ 38 ]. Table 2 shows the features extracted from the dataset after pre-processing.
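The two transformations described above can be illustrated with a minimal sketch in plain Python. The helper names `min_max` and `one_hot` and the toy values are assumptions made for illustration; they are not taken from the study's code.

```python
def min_max(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return (categories, rows); each row is a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the category observed in this record
        rows.append(row)
    return categories, rows


# Hypothetical toy columns (not registry data).
ages = [20, 35, 80]
transport = ["ambulance", "taxi", "ambulance"]

scaled_ages = min_max(ages)
categories, encoded = one_hot(transport)
```

After encoding, every category such as `ambulance` or `taxi` becomes its own binary column, which is the form in which items are usually passed to an association rule miner.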

Algorithm selection and evaluation

This section provides an overview of association rule mining and the Apriori algorithm.

  • Association rule mining

The goal of the data mining technique known as Association Rule Mining is to extract correlations, common patterns, or special structures from data repositories [ 39 ]. Association rules consist of two sets of items: the antecedent (or left-hand side) and the consequent (or right-hand side). These rules are typically expressed in the form X ⇒ Y, where X represents the antecedent and Y represents the consequent. The purpose of the analysis is to derive association rules that identify the items and cooccurrences of different items that appear frequently [ 40 ].

Strong association rules are those that meet the minimum support and minimum confidence thresholds as defined [ 41 ].

Support : The support of a rule indicates the frequency of its application in a given dataset [ 42 ].

N: Total number of records

Definition of weight support in the present study:

Confidence : The confidence of a rule reflects the proportion of times that items in Y are found in transactions that also contain X [ 43 ].

Definition of weight Confidence in the present study:
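For orientation, the classical (unweighted) measures that the weighted definitions extend can be sketched as follows: the support of X ⇒ Y is the share of the N records that contain every item of X and Y, and the confidence is that support divided by the support of X. The function names and the toy records are assumptions for illustration, and this sketch does not reproduce the weighted variants used in the study.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X) for the rule X => Y."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return support(xy, transactions) / support(x, transactions)


# Hypothetical toy records (not registry data).
transactions = [
    frozenset({"elder", "head_injury", "pedestrian"}),
    frozenset({"elder", "head_injury"}),
    frozenset({"elder", "hip_injury"}),
    frozenset({"teenager", "head_injury", "motorcycle"}),
]

# {elder, head_injury} appears in 2 of 4 records.
rule_support = support({"elder", "head_injury"}, transactions)
# Of the 3 records containing elder, 2 also contain head_injury.
rule_confidence = confidence({"elder"}, {"head_injury"}, transactions)
```

Note that `confidence` divides by the support of the antecedent, so it is only defined for antecedents that occur at least once in the data.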

  • Apriori algorithm

The Apriori algorithm is a well-known approach for identifying frequent patterns in a dataset. These patterns consist of sets of items that occur frequently and exceed a predefined threshold known as the minimum support [ 44 ].

The Apriori algorithm consists of multiple phases or passes [ 41 , 45 ]:

The first step involves generating candidate itemsets, where each k-itemset is created by combining (k-1)-itemsets that were identified in the previous iteration. A common pruning technique used in Apriori involves eliminating k-itemsets whose subsets, containing k-1 items, are not present in any frequent pattern of length k-1.

The next phase of the Apriori algorithm involves calculating the support for each k-itemset candidate. This is accomplished by scanning the entire database to count the number of transactions that contain all the items in the k-itemset candidate. This step is a defining characteristic of the Apriori algorithm: one full database scan is required at every level, up to the length of the longest k-itemset.

To establish a high-frequency pattern, the Apriori algorithm identifies k-item sets that have a support greater than the minimum threshold. These high-frequency patterns consist of sets of k items.

If no new high-frequency patterns are identified, the Apriori algorithm terminates. Otherwise, the algorithm increments k by one and repeats the process from the first phase.
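The four phases above can be sketched in a few lines of Python. This is a minimal illustration of the classical algorithm with a single user-supplied threshold, not the implementation used in the study; the function name `apriori` and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets that reach min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Support counting: one full database scan per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Keep candidates that reach the threshold (>= is the conventional
        # choice; the description above says "greater than").
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        if not level:  # no new frequent itemsets: terminate
            break
        frequent.update(level)
        # Candidate generation for the next level: join frequent k-itemsets
        # into (k+1)-itemsets, then prune any candidate with an infrequent
        # k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy transactions.
transactions = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("abc"),
]
frequent = apriori(transactions, min_support=0.5)
```

With this toy input, all single items and all pairs are frequent, while the triple {a, b, c} occurs in only two of five transactions and is dropped.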

To compare the performance of the new proposed method and the classical algorithm, the following indices were calculated:

RAM memory consumption

The amount of space used on the hard disk

Algorithm execution time

Number of frequently generated patterns

Number of generated rules

Quality of generated rules: To calculate this, the median confidence value was calculated for different item sets in the new proposed method and the classical method.
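A minimal sketch of how some of these indices could be collected in Python is given below. It measures elapsed time with `time.perf_counter`, peak memory of Python allocations with `tracemalloc` (which is not the same as the total RAM of the process), and the median confidence with `statistics.median`. The helper names and the rule triples are hypothetical, and the study does not state which tools were used for measurement.

```python
import statistics
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak


def median_confidence(rules, size):
    """Median confidence of rules whose two sides total `size` items."""
    values = [conf for lhs, rhs, conf in rules if len(lhs) + len(rhs) == size]
    return statistics.median(values) if values else None


# Hypothetical rules as (antecedent, consequent, confidence) triples.
rules = [
    (("elder",), ("head_injury",), 0.80),
    (("pedestrian",), ("elder",), 0.60),
    (("taxi",), ("elder",), 0.70),
    (("elder", "pedestrian"), ("head_injury",), 0.90),
]

# Any callable can be profiled; sorted() stands in for a mining run here.
result, seconds, peak_bytes = profile(sorted, [3, 1, 2])
two_item_median = median_confidence(rules, 2)
```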

Implemented framework

In this section, the new proposed algorithm is explained. The algorithm calculates the weighted support of each item from the item's weight and its number of repetitions. To determine the weighted support of an item set, the number of repetitions of the item set is multiplied by the harmonic mean of the weights of its components. To determine the minimum weighted support at each level, the total weighted support of the item sets at that level is divided by the total number of records. Then, if the weighted support of an item set is greater than the minimum support of that level, the item set proceeds to the next stage as a candidate. The pseudocode of the algorithm is shown in Fig. 1.
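One possible reading of this description is sketched below for a single level. It is an interpretation of the prose, not the authors' code: the function names, the weights, and the toy records are assumptions, and `statistics.harmonic_mean` requires positive weights.

```python
from statistics import harmonic_mean


def weighted_support(itemset, transactions, weights):
    """Occurrence count of `itemset` times the harmonic mean of its item weights."""
    count = sum(1 for t in transactions if itemset <= t)
    return count * harmonic_mean([weights[item] for item in itemset])


def filter_level(candidates, transactions, weights):
    """Return (level_min_support, survivors) for one level of candidates."""
    n = len(transactions)
    supports = {c: weighted_support(c, transactions, weights) for c in candidates}
    # Level threshold: total weighted support of the level divided by the
    # number of records, as described in the text.
    level_min_support = sum(supports.values()) / n
    survivors = {c: s for c, s in supports.items() if s > level_min_support}
    return level_min_support, survivors


# Hypothetical weights (for example information gain values) and records.
weights = {"elder": 0.4, "head_injury": 0.2, "taxi": 0.1}
transactions = [
    frozenset({"elder", "head_injury"}),
    frozenset({"elder", "head_injury", "taxi"}),
    frozenset({"elder"}),
    frozenset({"head_injury"}),
]

level_1 = {frozenset([item]) for item in weights}
min_sup_1, kept_1 = filter_level(level_1, transactions, weights)
pair_support = weighted_support(
    frozenset({"elder", "head_injury"}), transactions, weights
)
```

In this toy run the rare, low-weight item `taxi` falls below the level threshold and is dropped, while `elder` and `head_injury` survive to form level-2 candidates, without the user supplying any threshold.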

Fig. 1 Pseudocode of the new algorithm

The input of the algorithm is the dataset.

The weight value was calculated for all variables using the information gain method.

Variables whose weight was too low were removed.

The improved Apriori algorithm takes feature weights and dataset as input.

The support value of each item was calculated using the suggested formula:

Supp-Items(xi): the weighted support of item set xi

Weight(xi): the information gain value for variable i

Mean Harmonic (Weight(xi)): the harmonic mean of the weights of the variables in item set xi

Numbers of Items (xi): the number of occurrences of item set xi

In this step, the amount of weighted support for each level was calculated using formula 6: the support for each item set was first calculated, then their sum was used to calculate the minimum support for that level.

Sort (Supp-Items(xi))

N: Number of records

According to formula 7, if the amount of support obtained for each item set is greater than the minimum amount of support obtained for that level, that item set will go to the next stage as a candidate item set; otherwise it will be removed.

Then this process continues until no other item set is produced.

The improved Apriori function returns the most frequent patterns based on the weights of the variables and the minimum support value of each level.

Finally, the output of the algorithm is the frequent patterns and rules.
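The weighting and feature-removal steps at the start of this procedure can be sketched as follows. Information gain is always computed with respect to a target variable; the text does not name the target used, so the `outcome` column here is purely an assumption, as are the function names and the toy data. The cutoff of 0.001 is the value mentioned in the complexity discussion of this section.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, target):
    """Entropy of `target` minus its expected entropy after splitting on `feature`."""
    total = len(target)
    groups = {}
    for value, label in zip(feature, target):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(target) - remainder


def select_features(columns, target, threshold=0.001):
    """Weight every column by information gain and drop those below threshold."""
    weights = {name: information_gain(col, target) for name, col in columns.items()}
    return {name: w for name, w in weights.items() if w >= threshold}


# Hypothetical toy data; `outcome` is an assumed target variable.
outcome = ["recovered", "recovered", "died", "died"]
columns = {
    "age_group": ["young", "young", "elder", "elder"],  # fully informative
    "sex": ["m", "f", "m", "f"],                        # uninformative
}
kept = select_features(columns, outcome)
```

Here `age_group` separates the two outcomes perfectly and receives a weight of one bit, while `sex` carries no information about the outcome and is removed.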

We examined the algorithm's essential elements to acquire a deeper understanding of its complexity.

Loading the Dataset (load Dataset) : The process of reading an Excel file and transforming it into a matrix is characterized by a time complexity that scales with the size of the dataset. This can be represented as O(N), where N corresponds to the number of transactions or cells in the dataset.

Calculate Weights (information Gain) : The time complexity associated with computing information gain for a feature can be denoted as O(N ⋅ V), where N represents the number of instances, and V stands for the number of unique values within the feature.

Remove Feature () : The time complexity for eliminating features with weights below 0.001 is typically O(n), where n represents the number of features in the data structure.

Creating Candidate 1-Itemsets : The time complexity is O (N * M), with M representing the average number of items in a transaction.

Scanning the Dataset (scanD) : The nested loops iterate through transactions and candidate itemsets, with a worst-case time complexity of O (N * M * K), where K denotes the length of the candidate itemsets.

Generating Candidate Itemsets (aprioriGen) : The time complexity is contingent on the size of the existing frequent itemsets, with a worst-case scenario reaching O(2^(M-1)), where M represents the length of the frequent itemsets.

Calculating Primary Weights (primariweight) : The time complexity is O (N * M), with M being the mean number of items in a transaction.

Apriori Algorithm Iterations (Apriori) : The iterations entail examining the dataset and producing candidate itemsets. In the worst-case scenario, the time complexity is O (I * N * M * K), where I represents the number of iterations.

Sorting (): The time complexity for sorting a list varies based on the sorting algorithm employed (e.g., Merge Sort: Time Complexity: O (n log n)).

Generating Association Rules (generate Rules) : The time complexity is contingent on the quantity of frequent itemsets and their respective lengths. In the worst-case scenario, it can be expressed as O (F * (M^2)), where F represents the number of frequent itemsets, and M denotes their average length.

The Apriori algorithm's exponential nature results in significant computational costs, particularly when dealing with large datasets or when the minimum support threshold is set at a low value. In conclusion, the comprehensive time complexity of the Apriori algorithm is affected by various factors, including the dataset size (N), the average number of items per transaction (M), the length of candidate itemsets (K), the number of iterations (I), and the quantity of frequent itemsets (F).
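The exponential growth mentioned here can be made concrete with a short sketch: a dataset with M distinct items admits 2^M − 1 non-empty itemsets, which is the upper bound on what a level-wise search may have to consider when the minimum support is very low. The function name is an assumption for illustration.

```python
from itertools import combinations


def count_possible_itemsets(items):
    """Count all non-empty itemsets by enumerating every level k = 1..M."""
    return sum(
        1
        for k in range(1, len(items) + 1)
        for _ in combinations(items, k)
    )


# The number of possible itemsets doubles (plus one) with each added item.
sizes = {m: count_possible_itemsets(range(m)) for m in (5, 10, 15)}
```

Pruning by minimum support is what keeps the number of itemsets actually examined far below this bound in practice.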

Experimental results

Figures 2 , 3 , 4 , 5 , 6 , 7 , 8 and 9 show the performance of the proposed algorithm and the classical Apriori algorithm on trauma data. In Fig.  2 , the number of frequent patterns generated based on the selection of different minimum supports in the classical algorithm of Apriori is shown. With the increase in the minimum support value, the number of frequently generated patterns has decreased. The red dot shows the number of frequent patterns generated using the new proposed algorithm.

Fig. 2 Number of frequent patterns

Fig. 3 The amount of space used on the hard disk (KB)

Fig. 4 The amount of RAM memory (GB)

Fig. 5 Duration of execution

Fig. 7 Confidence in four item sets (Median)

Fig. 8 Confidence in seven item sets (Median)

Fig. 9 Confidence in nine item sets (Median)

In Fig. 3, the hard disk space consumed by the frequent patterns that the classical algorithm generates at different minimum supports is shown. With the increase in minimum support, the consumption of hard disk space has decreased. The red dot in the diagram shows the amount of hard disk space consumed by the new algorithm.

Figures 4, 5, and 6 show the RAM memory consumption, the execution time, and the number of generated rules, respectively.

Figures 7 , 8 , and 9 show the median confidence values in four, seven, and nine item sets, respectively. In all these figures, the marked red dot shows the outputs of the proposed algorithm.

According to the findings, with the increase in the amount of minimum support, the number of generated patterns and rules has decreased, which reduces the amount of execution time and the amount of RAM and hard disk used.

To evaluate the performance of the proposed method compared to the classical algorithm, the quality of the generated rules was also calculated. The quality of the rules in different item sets is very close between the two methods, and in some cases the rules produced by the new proposed method are of better quality than those of the classical algorithm, according to Figs. 7, 8, and 9.

According to the findings of the research, one of the most frequent patterns with 80% confidence in the trauma dataset was:

Elderly patients who were hospitalized for more than 4 days had broken organs, and the cost of their treatment was also high and their insurance type was unclear. They were also taken to the hospital by motorcycle and experienced issues in the head and neck region.

It was also observed in some frequent patterns that patients were transported to the hospital by personal vehicle. Also, patients who did not improve experienced limb fractures.

Some of the most frequent patterns in the trauma data were:

Pedestrian injured in transport accident, Injuries to the head, personal vehicle, Elder, Expensive hospital fees.

Pedestrian injured in transport accident, Injuries to the hip and thigh, taxi, Elder.

Motorcycle rider injured in transport accident, Injuries to the shoulder and upper arm, taxi, worker, Cheap hospital fees.

Motorcycle rider injured in transport accident, Injuries to the head, personal vehicle, students, teenager.

Motorcycle rider injured in transport accident, Injuries to the head, Expensive hospital fees, Hospitalization for more than 4 days.

Pedestrian injured in transport accident, ambulance, One day of hospitalization, Cheap hospital fees, Injuries to the ankle and foot.

Pedestrian injured in transport accident, Injuries to the wrist, illiterate, teenager, Hospitalization for more than 3 days.

According to the findings of the research, the number of frequent patterns and rules produced in the new proposed method is much lower compared to the classic Apriori algorithm. The number of frequent patterns could be calculated for the minimum support greater than 0.045 in the classic Apriori algorithm, while for the minimum support less than 0.045, the calculation of the generated patterns is not cost-effective in terms of time, RAM memory, or even the amount of information generated. Therefore, it is not cost-effective to calculate all frequent patterns with the classic Apriori algorithm. Also, due to the fact that the generated rules are made from frequent patterns and their number will be several times that of the frequent patterns, in the classic Apriori algorithm, calculating all the generated rules will not be cost-effective.

Because the new method takes the importance of variables into account and generates the minimum support level by level, the minimum support differs between levels, whereas in the classic method a single value serves as the minimum support for all levels. In the new proposed method, one value is generated for single itemsets, another for binary itemsets, and further values for the larger itemsets.

Also, the new proposed method weights the variables and calculates the minimum support at all levels automatically, without user intervention. This makes it easy to generate frequent patterns and the resulting rules based on the importance of variables, even for datasets that are unknown to the user.

In [ 21 ], a data mining technique was applied in a novel way to identify IBIPs that were linked to co-occurring injury categories. This analysis demonstrated significant differences in IBIPs between various types of road users, which can provide valuable insights into how injury severity and outcomes may vary. These findings could have important implications for prioritizing crash countermeasures. The authors of [ 22 ] determined that association rule mining could potentially be the optimal method for determining the key factors that impact mortality rates in trauma patients with hospital-acquired infections. Among these factors, advanced age, tracheal intubation, mechanical ventilation, surgical site infections, skin infections, and upper respiratory infections appear to be the most crucial risk factors that contribute to mortality rates. The approach in [ 23 ] differed from previous studies that examined the correlation between high levels of life trauma, depression, and suicide using statistical analysis. Instead, the authors utilized a distinct methodology that identified highly correlated patterns and effects between trauma, depression, and suicide in primary school students. Through the use of FP-Growth association rule mining, this study was able to determine the linked patterns between high-life trauma, depression, and suicide among primary school students attending extended opportunity schools in rural Thailand. The findings revealed a total of 34 associated patterns for high trauma, 14 associated patterns for depression, and 35 associated patterns for suicide. The study in [ 25 ] demonstrated how association rule mining can be used to extract sets of rules between different diagnosis types and laboratory diagnostic test requirements. Real-life data from emergency departments of a large-scale urban hospital were utilized in the research.

Although these studies used the Apriori algorithm, they relied on its classical form. In studies [ 21 , 22 , 23 , 24 , 25 ], a classical Apriori algorithm with manual input of the minimum support was used to extract frequent patterns.

In studies [ 27 , 35 , 36 ] despite the improvement of the classical Apriori algorithm with multiple selections of the minimum support, the minimum support value was still entered by the user, and the variable weights were not considered. Although the calculation of minimum support is automatic in studies [ 32 , 46 ], the variable weights were not taken into account. Despite using the weight to determine the association rules in studies [ 33 ] and [ 34 ], the user enters the minimal support value manually. However, in this study, Variable weighting and automated minimum support determination are employed to generate frequent patterns.

In studies 1, 2, and 3, multiple minimum supports were employed to adapt the minimum support threshold. A notable limitation of this approach is its reliance on the user to determine the minimum supports: although these three studies improve on the classical Apriori method, none of them addresses the challenge of generating the minimum support automatically. Additionally, while real datasets were used in these studies, the details of the algorithm implementation were not clearly reported. The evaluation metrics in study 1 were the Number of Large Item Sets and the Number of Candidate Item Sets; study 2 focused on Runtime, and study 3 used the Number of Frequent Patterns.
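The multiple-minimum-supports idea can be sketched as follows. In the formulation usually attributed to this line of work, each item carries its own minimum item support (MIS) and an itemset must meet the lowest MIS among its items. The MIS values and records below are hypothetical, and the helper names are illustrative rather than taken from the cited studies; note that the MIS table is still supplied by the user.

```python
def itemset_min_support(itemset, mis):
    """Threshold for an itemset: the lowest MIS among its items."""
    return min(mis[item] for item in itemset)

def is_frequent(itemset, transactions, mis):
    """True if the itemset's support meets its item-specific threshold."""
    itemset = set(itemset)
    n = len(transactions)
    count = sum(itemset <= set(t) for t in transactions)
    return count / n >= itemset_min_support(itemset, mis)

# Hypothetical toy records and user-supplied MIS values.
records = [
    {"motorcycle", "limb_fracture"},
    {"motorcycle", "limb_fracture", "elderly"},
    {"motorcycle", "head_injury"},
    {"pedestrian", "limb_fracture"},
]
mis = {"motorcycle": 0.5, "limb_fracture": 0.5, "elderly": 0.2}

# A rare item gets a lower bar, so itemsets containing it can still qualify.
rare_ok = is_frequent({"elderly", "motorcycle"}, records, mis)
common_ok = is_frequent({"motorcycle", "limb_fracture"}, records, mis)
```

Under a single global threshold of 0.5 the itemset containing "elderly" (support 0.25) would be discarded; with a per-item threshold it is retained.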

In study 4, while the authors present a novel approach to efficiently retrieve the top few maximal frequent patterns in order of significance, eliminating the need for the minimum support parameter, they still require users to specify another parameter, namely the desired number of itemsets denoted as “k”. This signifies a form of user intervention in the algorithm.
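The trade-off described here, replacing the minimum support with a count "k", can be illustrated with a deliberately simplified sketch. It ranks all itemsets up to a small size by support and keeps the first k; it does not reproduce the maximal-pattern algorithm of study 4, and the function name, the `max_len` parameter, and the records are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def top_k_itemsets(transactions, k, max_len=2):
    """Return the k itemsets (up to max_len items) with the highest support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(transactions)
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(itemset, count / n) for itemset, count in ranked[:k]]

# Hypothetical toy records.
records = [
    {"motorcycle", "limb_fracture"},
    {"motorcycle", "limb_fracture", "elderly"},
    {"motorcycle", "head_injury"},
    {"pedestrian", "limb_fracture"},
]
top3 = top_k_itemsets(records, k=3)  # k is still chosen by the user
```

No support threshold appears anywhere in the call, but the user must still decide how many itemsets to ask for.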

In study 5, the research strives to introduce an innovative algorithm for association rule mining with the aim of improving computational efficiency and automating the identification of suitable threshold values. The proposed method utilizes the particle swarm optimization algorithm, which initially pursues the optimal fitness value for each particle. However, one of the limitations acknowledged by the study's authors is the absence of consideration for variable weights. The evaluation platform employed was Borland C++ Builder 6, and the assessment criteria included Runtime and the Number of Frequently Generated Patterns.

In studies 6 and 7, despite notable advancements in the automated calculation of the support threshold and the utilization of various statistical indicators for this purpose, there is a notable omission in considering the weight of the variables. The specific platform employed for their study was not disclosed, and their evaluation criteria encompassed the Number of Association Rules, Time Consumption, and the Quality of the Extracted Rules.
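One simple way to derive a support threshold from statistical indicators is sketched below: the supports of the single items are computed and summarized by a statistic such as the mean or the median. Which indicators studies 6 and 7 actually used is not stated in this section, so the choice here is an assumption, as are the function names and the records. Every item counts equally, which is exactly the omission of variable weights noted above.

```python
from statistics import mean, median

def item_supports(transactions):
    """Support of each single item: fraction of transactions containing it."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    return {i: sum(i in t for t in transactions) / n for i in items}

def auto_min_support(transactions, indicator=mean):
    """Summarize the single-item supports with a statistic (assumed choice)."""
    return indicator(list(item_supports(transactions).values()))

# Hypothetical toy records.
records = [
    {"motorcycle", "limb_fracture"},
    {"motorcycle", "limb_fracture", "elderly"},
    {"motorcycle", "head_injury"},
    {"pedestrian", "limb_fracture"},
]
threshold_mean = auto_min_support(records)            # mean of item supports
threshold_median = auto_min_support(records, median)  # median of item supports
```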

In the present study, a novel approach was introduced that applies variable weighting and uses the harmonic mean to calculate the minimum support automatically. The method was implemented in Python. Unlike the previous studies, it takes the weights assigned to variables into account, acknowledging their significance in the mining process.
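The exact formula is not given in this section, so the following is only one plausible reading, offered as a sketch: each item's support is multiplied by a weight, and the harmonic mean of these weighted supports is taken as the minimum support. The weights, the records, and the function names are hypothetical and should not be read as the study's implementation.

```python
from statistics import harmonic_mean

def weighted_supports(transactions, weights):
    """Weight times support for each item (assumed weighting scheme)."""
    n = len(transactions)
    return {item: w * sum(item in t for t in transactions) / n
            for item, w in weights.items()}

def harmonic_min_support(transactions, weights):
    """Harmonic mean of the positive weighted supports."""
    values = [v for v in weighted_supports(transactions, weights).values() if v > 0]
    return harmonic_mean(values)

# Hypothetical toy records and weights.
records = [
    {"motorcycle", "limb_fracture"},
    {"motorcycle", "limb_fracture", "elderly"},
    {"motorcycle", "head_injury"},
    {"pedestrian", "limb_fracture"},
]
weights = {"motorcycle": 1.0, "limb_fracture": 1.0, "elderly": 0.8}
min_support = harmonic_min_support(records, weights)
```

For positive values the harmonic mean never exceeds the arithmetic mean and is pulled toward the smallest values, so a threshold computed this way is less likely to exclude rare but heavily weighted items.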

The evaluation process in this study was comprehensive, involving various indicators to assess the performance of the proposed method. These indicators encompassed RAM memory consumption, the amount of space utilized on the hard disk, algorithm execution time, the number of frequently generated patterns, the number of generated rules, and the quality of the generated rules. Such a multi-faceted evaluation provides a more holistic understanding of the method's effectiveness across different dimensions, addressing not only computational efficiency but also the quality of the extracted patterns and rules.
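Two of these indicators, execution time and memory consumption, can be measured with the Python standard library. The sketch below is a generic measurement helper, not the study's instrumentation; the name `profile` is hypothetical. `tracemalloc` reports only memory allocated through Python's allocator, so it will generally differ from the RAM usage reported by the operating system.

```python
import time
import tracemalloc

def profile(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak

# Example: profile a stand-in workload; a mining function would go here.
result, elapsed, peak = profile(sorted, [3, 1, 2])
```

Because both figures depend on the machine and on the interpreter, they are most useful for comparing two algorithms run on the same hardware and data.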

In the present study, as in other investigations, association rule mining successfully identified frequent patterns in the data obtained from the Kashan Trauma Centre. These patterns have the potential to enhance healthcare outcomes. For instance, the majority of patients who did not improve had limb fractures, so the healthcare team can prioritize patients with fractures. The patients involved in such incidents were predominantly motorcycle riders; hence, there is an opportunity to raise public awareness of the hazards of motorcycle use and to promote appropriate safety gear. Furthermore, in some cases these patients were transported to the hospital in personal vehicles, so it is essential to educate the public about safety precautions when transporting patients in personal vehicles. In some of the patterns produced, the elderly were at risk, so it is suggested that their families be taught how to care for them.
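A statement such as "the majority of patients who did not improve had limb fractures" corresponds to a rule whose confidence exceeds 0.5. The standard rule-quality measures (support, confidence, and lift) can be computed as in the sketch below. The records are invented for illustration and are not taken from the Kashan Trauma Registry; the function name is hypothetical.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= set(t) for t in transactions)
    n_c = sum(consequent <= set(t) for t in transactions)
    n_both = sum((antecedent | consequent) <= set(t) for t in transactions)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Invented toy records, for illustration only.
records = [
    {"not_improved", "limb_fracture"},
    {"not_improved", "limb_fracture", "motorcycle"},
    {"not_improved", "head_injury"},
    {"improved", "limb_fracture"},
    {"improved", "motorcycle"},
]
support, confidence, lift = rule_metrics({"not_improved"}, {"limb_fracture"}, records)
```

A lift above 1 indicates that the consequent is more common among records matching the antecedent than in the dataset as a whole; a rule describes co-occurrence and does not by itself establish cause.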

One limitation of this research is that the algorithm's execution time and memory consumption depend on the hardware. A machine with 48 GB of RAM and a 9th-generation Core i5 processor was used, which was expensive for the researchers. In addition, this study aimed to automate the minimum support rather than to optimize time and memory consumption; nevertheless, both time and memory were reduced compared with the classical Apriori algorithm.

Association rule mining shows potential as a tool for applications in trauma research and treatment. In the examination of trauma registries and large healthcare datasets, its capacity to detect frequent item sets and association rules is particularly relevant and yields insightful findings. By looking for associations and trends within trauma-related data, it can play a crucial role in upgrading trauma treatment systems, detecting risk factors, and forming preventive strategies. Its incorporation into the healthcare industry could improve decision-making procedures, support efficient regulations, and improve patient outcomes. As data mining continues to develop, integrating association rule mining into trauma research offers a chance to advance trauma therapy and ultimately enhance patient wellbeing. Generating frequent patterns in large datasets with the classical Apriori algorithm and a manually selected minimum support is not cost-effective in terms of time and memory consumption. Calculating the minimum support from the weights of the variables and the different levels of item sets can improve the classical algorithm, and the resulting method can be used to extract frequent patterns in various industries, including the health industry, and in large datasets such as trauma registries.

Availability of data and materials

The data generated and analyzed during the current study are not publicly available but may be obtained from the corresponding author upon reasonable request and with permission from the Kashan University of Medical Sciences.

Mock C, Joshipura M, Arreola-Risa C, Quansah R. An estimate of the number of lives that could be saved through improvements in trauma care globally. World J Surg. 2012;36:959–63. https://doi.org/10.1007/s00268-012-1459-6 .

Potenza BM, Hoyt DB, Coimbra R, Fortlage D, Holbrook T, Hollingsworth-Fridlund P, et al. The epidemiology of serious and fatal injury in San Diego County over an 11-year period. J Trauma. 2004;56(1):68–75. https://doi.org/10.1097/01.TA.0000101490.32972.9F .

Mock CN, Jurkovich GJ, Arreola-Risa C, Maier RV. Trauma mortality patterns in three nations at different economic levels: implications for global trauma system development. J Trauma. 1998;44(5):804–14.

Moore L, Clark DE. The value of trauma registries. Injury. 2008;39(6):686–95. https://doi.org/10.1016/j.injury.2008.02.023 .

Morris JA, MacKenzie EJ, Edelstein SL. The effect of preexisting conditions on mortality in trauma patients. JAMA. 1990;263(14):1942–6. https://doi.org/10.1001/jama.1990.03440140068033 .

Jothi N, Husain W. Data mining in healthcare–a review. Procedia Comput Sci. 2015;72:306–13. https://doi.org/10.1016/j.procs.2015.12.145 .

Varghese DP, Tintu P. A survey on health data using data mining techniques. Int Res J Eng Technol. 2015;2(07):2395-0056. https://www.irjet.net/archives/V2/i7/IRJET-V2I7108.pdf .

Panjaitan S, Amin M, Lindawati S, Watrianthos R, Sihotang HT, Sinaga B, editors. Implementation of apriori algorithm for analysis of consumer purchase patterns. Journal of Physics: Conference Series; 2019: IOP Publishing. https://doi.org/10.1088/1742-6596/1255/1/012057 .

Czibula G, Czibula IG, Miholca D-L, Crivei LM. A novel concurrent relational association rule mining approach. Exp Syst Appl. 2019;125:142–56. https://doi.org/10.1016/j.eswa.2019.01.082 .

Nguyen D, Luo W, Phung D, Venkatesh S. LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment. Knowledge-Based Syst. 2018;161:313–28. https://doi.org/10.1016/j.knosys.2018.07.031 .

Liu X, Niu X, Fournier-Viger P. Fast top-k association rule mining using rule generation property pruning. Appl Intell. 2021;51:2077–93. https://doi.org/10.1007/s10489-020-01994-9 .

Yuan X. An improved Apriori algorithm for mining association rules. AIP Conference Proceedings; 2017: AIP Publishing LLC. https://doi.org/10.1063/1.4977361 .

Cai L. Japanese teaching quality satisfaction analysis with improved apriori algorithms under cloud computing platform. Comput Syst Sci Eng. 2020;35(3):183-9. https://cdn.techscience.cn/uploads/attached/file/20200901/20200901013945_36111.pdf .

Domadiya N, Rao UP. Privacy-preserving association rule mining for horizontally partitioned healthcare data: a case study on the heart diseases. Indian Acad Sci. 2018;43:1-9. https://doi.org/10.1007/s12046-018-0916-9 . https://www.ias.ac.in/article/fulltext/sadh/043/08/0127 .

Nahar J, Imam T, Tickle KS, Chen Y-PP. Association rule mining to detect factors which contribute to heart disease in males and females. Exp Syst Appl. 2013;40(4):1086-93. https://doi.org/10.1016/j.eswa.2012.08.028 .

Chaves R, Ramírez J, Gorriz J, Alzheimer's Disease Neuroimaging Initiative. Integrating discretization and association rule-based classification for Alzheimer’s disease diagnosis. Exp Syst Appl. 2013;40(5):1571-8. https://doi.org/10.1016/j.eswa.2012.09.003 .

Wang Y, Wang F, editors. Association rule learning and frequent sequence mining of cancer diagnoses in new york state. Data Management and Analytics for Medicine and Healthcare: Third International Workshop, DMAH 2017, Held at VLDB 2017, Munich, Germany, September 1, 2017, Proceedings 3; 2017: Springer. https://doi.org/10.1007/978-3-319-67186-4_10 .

Khotimah PH, Hamasaki A, Yoshikawa M, Sugiyama O, Okamoto K, Kuroda T. On association rule mining from diabetes medical history. 2018. http://db-event.jpn.org/deim2018/data/papers/169.pdf .

Kamalesh MD, Prasanna KH, Bharathi B, Dhanalakshmi R, Aroul Canessane R, editors. Predicting the risk of diabetes mellitus to subpopulations using association rule mining. Proceedings of the International Conference on Soft Computing Systems: ICSCS 2015, Volume 1; 2016: Springer. https://doi.org/10.1007/978-81-322-2671-0_6 .

Veroneze R, Cruz Tfaile Corbi S, Roque da Silva B, de S. Rocha C, V. Maurer-Morelli C, Perez Orrico SR, et al. Using association rule mining to jointly detect clinical features and differentially expressed genes related to chronic inflammatory diseases. Plos One. 2020;15(10):e0240269. https://doi.org/10.1371/journal.pone.0240269 .

Fagerlind H, Harvey L, Humburg P, Davidsson J, Brown J. Identifying individual-based injury patterns in multi-trauma road users by using an association rule mining method. Accid Anal Prev. 2022;164:106479. https://doi.org/10.1016/j.aap.2021.106479 .

Karajizadeh M, Nasiri M, Yadollahi M, Roozrokh Arshadi Montazer M. Risk Factors Affecting Death from Hospital-Acquired Infections in Trauma Patients: Association Rule Mining. J Health Manag Informatics. 2021;8(1):27-33.

Aekwarangkoon S, Thanathamathee P. Associated patterns and predicting model of life trauma, depression, and suicide using ensemble machine learning. Emerg Sci J. 2022;6:679-93. https://doi.org/10.28991/ESJ-2022-06-04-02 .

Finley J-C, Parente F. Organization and recall of visual information after traumatic brain injury. Brain Injury. 2020;34(6):751–6. https://doi.org/10.1080/02699052.2020.1753113 .

Sarıyer G, Öcal Taşar C. Highlighting the rules between diagnosis types and laboratory diagnostic tests for patients of an emergency department: Use of association rule mining. Health Informatics J. 2020;26(2):1177-93.

Grahne G, Zhu J, editors. High performance mining of maximal frequent itemsets. 6th International workshop on high performance data mining; 2003.

Liu B, Hsu W, Ma Y, editors. Mining association rules with multiple minimum supports. Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining; 1999. https://doi.org/10.1145/312129.312274 .

Tseng M-C, Lin W-Y, editors. Mining generalized association rules with multiple minimum supports. International Conference on Data Warehousing and Knowledge Discovery; 2001: Springer.

Salam A, Khayal MSH. Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst. 2012;30:57–86. https://doi.org/10.1007/s10115-010-0363-3 .

Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining; New York, NY: AAAI Press; 1998. p. 80–6.

Hosseinioun P, Shakeri H, Ghorbanirostam G. Knowledge-Driven decision support system based on knowledge warehouse and data mining by improving apriori algorithm with fuzzy logic. Int J Comput Inf Eng. 2016;10(3):528–33. https://doi.org/10.5281/zenodo.1339201 .

Dahbi A, Balouki Y, Gadi T, editors. Using multiple minimum support to auto-adjust the threshold of support in apriori algorithm. Proceedings of the ninth international conference on soft computing and pattern recognition (SoCPaR 2017); 2018: Springer. https://doi.org/10.1007/978-3-319-76357-6_11 .

Wang W, Yang J, Yu PS, editors. Efficient mining of weighted association rules (WAR). Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining; 2000. https://doi.org/10.1145/347090.347149 .

Tao F, Murtagh F, Farid M, editors. Weighted association rule mining using weighted support and significance framework. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining; 2003. https://doi.org/10.1145/956750.956836 .

Hu Y-H, Chen Y-L. Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism. Decis Supp Syst. 2006;42(1):1–24. https://doi.org/10.1016/j.dss.2004.09.007 .

Kiran RU, Reddy PK, editors. Mining rare association rules in the datasets with widely varying items’ frequencies. Database Systems for Advanced Applications: 15th International Conference, DASFAA 2010, Tsukuba, Japan, April 1-4, 2010, Proceedings, Part I 15; 2010: Springer. https://doi.org/10.1007/978-3-642-12026-8_6 .

Kuo RJ, Chao CM, Chiu Y. Application of particle swarm optimization to association rule mining. Appl Soft Comput. 2011;11(1):326–36. https://doi.org/10.1016/j.asoc.2009.11.023 .

Cerda P, Varoquaux G, Kégl B. Similarity encoding for learning with dirty categorical variables. Machine Learn. 2018;107(8–10):1477–94. https://doi.org/10.1007/s10994-018-5724-2 .

Kotsiantis S, Kanellopoulos D. Association rules mining: A recent overview. International Transactions on Computer Science and Engineering. 2006;32(1):71-82. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=73a19026fb8a6ef5bf238ff472f31100c33753d0 .

Akbaş KE, Kivrak M, Arslan AK, Çolak C, editors. Assessment of association rules based on certainty factor: an application on heart dataset. 2019 International artificial intelligence and data processing symposium (IDAP); 2019: IEEE. https://doi.org/10.1109/IDAP.2019.8875977 .

Han J, Kamber M, Pei J. Data mining: concepts and techniques. 3rd ed. 2012. https://www.academia.edu/download/43034828/Data_Mining_Concepts_And_Techniques_3rd_Edition.pdf .

Li Q, Zhang Y, Kang H, Xin Y, Shi C. Mining association rules between stroke risk factors based on the Apriori algorithm. Technol Health Care. 2017;25(S1):197–205. https://doi.org/10.3233/THC-171322 .

Yousefi L, Swift S, Arzoky M, Sacchi L, Chiovato L, Tucker A, editors. Opening the black box: Exploring temporal pattern of type 2 diabetes complications in patient clustering using association rules and hidden variable discovery. 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS); 2019: IEEE. https://doi.org/10.1109/CBMS.2019.00048 .

Santoso MH. Application of Association Rule Method Using Apriori Algorithm to Find Sales Patterns Case Study of Indomaret Tanjung Anom. Brilliance: Research of Artificial Intelligence. 2021;1(2):54-66. https://doi.org/10.47709/briliance.vxix.xxxx .

Simanjorang RM. Implementation of apriori algorithm in determining the level of printing needs. Data Mining, Image Processing and artificial intelligence. 2020;8(2, Juni):43-8. http://infor.seaninstitute.org/index.php/infokum/article/download/16/20 .

Dahbi A, Jabri S, Balouki Y, Gadi T, editors. Finding Suitable Threshold for Support in Apriori Algorithm Using Statistical Measures. Enabling Machine Learning Applications in Data Science: Proceedings of Arab Conference for Emerging Technologies 2020; 2021: Springer. https://doi.org/10.1007/978-981-33-6129-4_7 .

This study was supported by a grant from the Research Council of Kashan University of Medical Sciences [grant number: 401098]. The authors did not receive any grants from nonprofit organizations or funding agencies either in public or commercial sectors.

Author information

Authors and affiliations.

Health Information Management Research Center, Kashan University of Medical Sciences, Kashan, Iran

Zahra Kohzadi, Ali Mohammad Nickfarjam & Leila Shokrizadeh Arani

Department of Health Information Management and Technology, Allied Medical Sciences Faculty, Kashan University of Medical Sciences, Kashan, Iran

Medical Informatics Department, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Zeinab Kohzadi

Trauma Research Center, Kashan University of Medical Sciences, Kashan, Iran

Mehrdad Mahdian

Contributions

AM. N, ZA. K, M. M: Conceived and designed the analysis; Collected the data; Revision. AM. N, ZA. K, ZE. K: Conceived and designed the analysis; Contributed data or analysis tools; Performed the analysis; Writing; Revision and Editing; Investigation; Methodology. AM. N, ZA. K, ZE. K, L. SH, M. M: Review of Related works in the field of trauma, Writing; Editing and Revision. All authors reviewed and approved the article.

Corresponding author

Correspondence to Ali Mohammad Nickfarjam .

Ethics declarations

Ethics approval and consent to participate.

This article is extracted from a study that was approved by the ethics committee/IRB of the Kashan University of Medical Sciences (Research approval code: 401098, Ethics code: IR.KAUMS.NUHEPM.REC.1401.056).

All methods of the present study were performed in accordance with the relevant guidelines and regulations of the ethical committee of Kashan University of Medical Sciences. Participation was voluntary, the consent was verbal, but all participants responded via email or text message to approve their participation. Informed consent was obtained from all the participants. Participants had the right to withdraw from the study at any time without prejudice.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Kohzadi, Z., Nickfarjam, A.M., Arani, L.S. et al. Extraction frequent patterns in trauma dataset based on automatic generation of minimum support and feature weighting. BMC Med Res Methodol 24 , 40 (2024). https://doi.org/10.1186/s12874-024-02154-0

Received : 03 November 2023

Accepted : 17 January 2024

Published : 16 February 2024

DOI : https://doi.org/10.1186/s12874-024-02154-0

  • Automatic minimum support
  • Trauma registry

BMC Medical Research Methodology

ISSN: 1471-2288

COMMENTS

  1. Methodology for clinical research

    Medical research can be divided into primary and secondary research, where primary research involves conducting studies and collecting raw data, which is then analysed and evaluated in secondary research. The successful deployment of clinical research methodology depends upon several factors.

  2. A tutorial on methodological studies: the what, when, how and why

    Methodological studies - studies that evaluate the design, analysis or reporting of other research-related reports - play an important role in health research. They help to highlight issues in the conduct of research with the aim of improving health research methodology, and ultimately reducing research waste. Main body

  3. PDF HEALTH RESEARCH METHODOLOGY

    Health research methodology: A guide for training in research methods INTRODUCTION This is a revised version of an earlier manual on Health Research Methodology and deals with the basic concepts and principles of scientific research methods with particular attention to research in the health field. The research process is the cornerstone for ...

  4. Home page

    BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research.

  5. Research Methods in Medicine & Health Sciences: Sage Journals

    Research Methods in Medicine & Health Sciences is a peer reviewed journal, publishing rigorous research on established "gold standard" methods and new cutting edge research methods in the health sciences and clinical medicine. This journal is a member of the Committee on Publication Ethics (COPE).

  6. A practical guide for health researchers

    It seeks to develop practical skills, starting with defining the characteristics of a good research question, guiding the reader through translation of that question into a research study, conduct of the research, and finally presentation of the outcome of the research to the world at large.

  7. Principles of Research Methodology: A Guide for Clinical ...

    Principles of Research Methodology: A Guide for Clinical Investigators. Editors: Phyllis G. Supino, Jeffrey S. Borer. Based on a highly regarded and popular lecture series on research methodology. Comprehensive guide written by experts in the field. Emphasizes the essentials and fundamentals of research methodologies. 75k Accesses, 20 Citations

  8. Methodological standards for qualitative and mixed methods patient

    The Patient-Centered Outcomes Research Institute's (PCORI) methodology standards for qualitative methods and mixed methods research help ensure that research studies are designed and conducted to generate the evidence needed to answer patients' and clinicians' questions about which methods work best, for whom, and under what circumstances. This set of standards focuses on factors ...

  9. Research Methodologies in Health Professions Education Publi ...

    The medical education research approach at that time heavily relied on a quantitative methodology that was grounded in empiricism and the scientific method of investigation. In the 1980s, qualitative research paradigms originated from other disciplines such as anthropology and sociology were introduced into medical education.

  10. Health research methodology : a guide for training in research methods

    Health research methodology: a guide for training in research methods. 2nd ed.

  11. Research Methodology in the Medical and Biological Sciences

    The information presented also facilitates communication across conventional disciplinary boundaries, in line with the increasingly multidisciplinary nature of modern research projects. Purchase Research Methodology in the Medical and Biological Sciences - 1st Edition. Print Book & E-Book. ISBN 9780123738745, 9780080552897.

  12. PDF A tutorial on methodological studies: the what, when, how and why

    Methodological studies (studies that evaluate the design, analysis or reporting of other research-related reports) play an important role in health research. They help to highlight issues in the conduct of research with the aim of improving health research methodology, and ultimately reducing research waste.

  13. Statistical Methods in Medical Research: Sage Journals

    Statistical Methods in Medical Research is a highly ranked, peer reviewed scholarly journal and is the leading vehicle for articles in all the main areas of medical statistics and therefore an essential reference for all medical statisticians.

  14. (PDF) Research methodology in medicine for beginners

    Research methodology in medicine for beginners Authors: Apurb Sharma Nepal Mediciti Abstract This presentation is about how to conduct a scientific study. It describes the basics of research....

  15. Common Statistical Methods and Reporting of Results in Medical Research

    Statistical analysis is critical in medical research. The objective of this article is to summarize the appropriate use and reporting of commonly used statistical methods in medical research, on the basis of existing statistical guidelines and the authors' experience in reviewing manuscripts, to provide recommendations for statistical applications and reporting.

  16. Articles

    The Maximum Likelihood Estimator (MLE) for parameters of the gamma distribution is commonly used to estimate models of right-skewed variables such as costs, hospital length of stay, and appointment wait times ... Peter Veazie, Orna Intrator, Bruce Kinosian and Ciaran S. Phibbs. BMC Medical Research Methodology 2023 23 :298.

  17. PDF Recommendations for accurate reporting in medical research statistics

    An important requirement for validity of medical research is sound methodology and statistics, yet this is still often overlooked by medical researchers. Based on the experience of reviewing statistics in more than 1000 manuscripts submitted to The Lancet Group of journals over the past

  18. Journal of Medical Internet Research

    Methods: This study crawled all the data from May 2012 to May 2022 under the topic of allergic rhinitis on Zhihu, obtaining a total of 9628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items for training the risk factor classifier ...

  19. Qualitative Methods in Health Care Research

    The three principal approaches to health research are the quantitative, the qualitative, and the mixed methods approach. The quantitative research method uses data, which are measures of values and counts and are often described using statistical methods which in turn aids the researcher to draw inferences.

  20. Research team develops universal and accurate method to calculate how

    A research team from the Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences / IOCB Prague has developed a novel computational method that can accurately describe how ...

  21. Submission guidelines

    Now you've identified a journal to submit to, there are a few things you should be familiar with before you submit. Make sure you are submitting to the most suitable journal - Aims and scope. Understand the costs and funding options - Fees and funding. Make sure your manuscript is accurate and readable - Language editing services.

  22. Application of ALSO course in standardized training Resident in

    Objective To explore the teaching effect of Advanced Life Support in Obstetrics (ALSO) Course in the standardized training resident in obstetric. Methods 60 residents of obstetrics from January 2021 to December 2022 were randomly divided into two groups, observation group and control group. The experimental group used ALSO teaching method, and the control group used traditional teaching method ...

  23. Qualitative research essentials for medical education

    This paper offers a selective overview of the increasingly popular domain of qualitative research. We consider the nature of qualitative research questions, describe common methodologies, discuss data collection and analysis methods, highlight recent innovations, and outline principles of rigour. The aim of this paper is to educate newcomers ...

  24. Extraction frequent patterns in trauma dataset based on automatic

    Purpose: Data mining has been used to help discover frequent patterns in health data. It is widely used to diagnose and prevent various diseases and to obtain the causes and factors affecting diseases. Therefore, the aim of the present study is to discover frequent patterns in the data of the Kashan Trauma Registry based on a new method. Methods: We utilized real data from the Kashan Trauma ...