Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition: According to LeCompte and Schensul, research data analysis is the process researchers use to reduce data to a story and interpret it to derive insights. The data analysis process reduces a large mass of data into smaller, meaningful fragments.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction, achieved through summarization and categorization, which helps identify and link patterns and themes in the data. The third is the analysis itself, which researchers perform in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that data analysis and data interpretation together represent the application of deductive and inductive logic to research.

Why analyze data in research?

Researchers rely heavily on data, as they have a story to tell or a research problem to solve. Analysis starts with a question, and data is nothing but the answer to that question. But what if there is no question to ask? It is still possible to explore data without a problem in mind; this is called 'data mining', and it often reveals interesting patterns within the data that are worth exploring.

Whatever type of data researchers explore, their mission and their audience's vision guide them in finding the patterns that shape the story they want to tell. While analyzing data, researchers are expected to stay open and remain unbiased toward unexpected patterns, expressions, and results. Sometimes data analysis tells the most unforeseen yet exciting stories that were not anticipated when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.


Types of data in research

Every kind of data has the rare quality of describing things once a specific value is assigned to it. For analysis, these values must be organized, processed, and presented in a given context to be useful. Data comes in different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal interviews, qualitative observation, or open-ended survey questions.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be categorized, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, and similar measures all fall under this type of data. You can present such data in graphs and charts or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: Data presented in groups, where an item included in the categorical data cannot belong to more than one group. Example: a survey respondent describing their living situation, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.
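As a concrete illustration, the chi-square statistic mentioned above can be computed by hand. This is a minimal sketch with hypothetical survey counts (smoking habit vs. marital status); a real analysis would also compare the statistic against the chi-square distribution to obtain a p-value.

```python
# Chi-square test of independence, computed from scratch on a hypothetical
# 2x2 table: smoking habit (rows) vs. marital status (columns).
observed = [
    [30, 10],  # smokers: single, married
    [20, 40],  # non-smokers: single, married
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence, the expected count for each cell is
# (row total * column total) / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

print(round(chi_square, 2))  # 16.67
```

A large statistic relative to the reference distribution (here with one degree of freedom) suggests the two categorical variables are not independent.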


Data analysis in qualitative research

Qualitative data analysis works a little differently from numerical analysis, because qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process, which is why qualitative analysis is typically used for exploratory research.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers read the available data and look for repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
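A word-frequency pass of this kind can be sketched in a few lines; the responses below are hypothetical stand-ins for real transcripts.

```python
from collections import Counter

# Hypothetical open-ended survey responses (qualitative data).
responses = [
    "food insecurity and hunger remain the biggest problem",
    "access to food is limited and hunger is widespread",
    "hunger affects children the most",
]

# Tokenize, drop very short (stop-like) words, and count occurrences.
words = [w for r in responses for w in r.lower().split() if len(w) > 3]
top = Counter(words).most_common(2)
print(top)  # [('hunger', 3), ('food', 2)]
```

The most frequent terms then become candidates for the deeper, manual analysis described above.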


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example, researchers studying the concept of 'diabetes' among respondents might analyze the context in which each respondent used or referred to the word 'diabetes.'
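A keyword-in-context pass can be sketched as collecting a window of words around each occurrence of the keyword; the transcript below is hypothetical.

```python
# Keyword-in-context: gather the words surrounding each use of a keyword so
# the analyst can see how respondents frame it. Hypothetical transcript.
text = ("my father manages his diabetes with diet and i worry my own "
        "diabetes risk will rise as i age")
keyword = "diabetes"
tokens = text.split()
window = 2  # words of context on each side

contexts = [
    " ".join(tokens[max(0, i - window):i + window + 1])
    for i, tok in enumerate(tokens)
    if tok == keyword
]
print(contexts)  # ['manages his diabetes with diet', 'my own diabetes risk will']
```

Each context snippet shows how the keyword is framed, which is exactly what the researcher inspects in this technique.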

The scrutiny-based technique is another highly recommended text analysis method for identifying patterns in qualitative data. Compare-and-contrast is the most widely used method under this technique; it examines how one piece of text is similar to or different from another.

For example, to assess the importance of a resident doctor in a company, the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare-and-contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations in enormous datasets.


Methods used for data analysis in qualitative research

There are several techniques for analyzing data in qualitative research; here are some commonly used methods:

  • Content analysis: The most widely accepted and frequently employed technique for data analysis in research methodology. It can be used to analyze documented information in text, images, and sometimes physical items. The research questions determine when and where to use this method.
  • Narrative analysis: This method is used to analyze content gathered from sources such as personal interviews, field observation, and surveys. Most of the time, the stories and opinions people share are examined for answers to the research questions.
  • Discourse analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this method considers the social context within which the communication between researcher and respondent takes place. Discourse analysis also takes lifestyle and day-to-day environment into account when deriving any conclusion.
  • Grounded theory: When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. It is applied to study data about a host of similar cases occurring in different settings. When using this method, researchers may alter explanations or produce new ones until they arrive at a conclusion.


Data analysis in quantitative research

Preparing data for analysis

The first stage in quantitative research and data analysis is to prepare the data so that raw responses can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to determine whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent answered every question in an online survey, or that the interviewer asked every question devised in the questionnaire
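The completeness stage in particular lends itself to automation. A minimal sketch, with hypothetical required fields and responses:

```python
# Completeness check: flag survey responses with unanswered questions.
# The questionnaire fields and responses below are hypothetical.
required = ["age", "city", "satisfaction"]
responses = [
    {"age": 34, "city": "Pune", "satisfaction": 4},
    {"age": 27, "city": "", "satisfaction": 5},   # blank answer
    {"age": 41, "satisfaction": 3},               # question skipped entirely
]

incomplete = [
    i for i, r in enumerate(responses)
    if any(r.get(q) in (None, "") for q in required)
]
print(incomplete)  # [1, 2] -- responses needing follow-up or exclusion
```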

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in fields incorrectly or skip them accidentally. Data editing is the process wherein researchers confirm that the provided data is free of such errors. They conduct the necessary consistency and outlier checks to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping survey responses and assigning values to them. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish respondents by age. It then becomes easier to analyze small data buckets than to deal with one massive data pile.


Methods used for data analysis in quantitative research

After the data is prepared, researchers can apply various analysis methods to derive meaningful insights. Statistical analysis is the most favored approach for numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods fall into two groups: descriptive statistics, used to describe data, and inferential statistics, used to compare data and draw conclusions.

Descriptive statistics

This method is used to describe the basic features of the various types of data in research. It presents data in a meaningful way, so that patterns in the data start making sense. However, descriptive analysis does not go beyond summarizing the data; any conclusions are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • Used to denote how often a particular event occurs.
  • Researchers use it when they want to show how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • Widely used to summarize a distribution with a single central value.
  • Researchers use this method when they want to show the most common or the average response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • The variance and standard deviation measure how far observed scores typically fall from the mean.
  • These measures identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is, and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • Relies on standardized scores, helping researchers identify the relationship between different scores.
  • Often used when researchers want to compare an individual score against the rest of the distribution.
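The four families of descriptive measures above can be computed with Python's standard library; the scores below are hypothetical.

```python
import statistics

scores = [60, 72, 72, 81, 85, 90, 95]  # hypothetical test scores

# Central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle score
mode = statistics.mode(scores)      # most frequent score

# Dispersion
spread = max(scores) - min(scores)  # range
stdev = statistics.stdev(scores)    # sample standard deviation

# Position: percentile rank of a score (share of scores at or below it)
rank_of_85 = sum(s <= 85 for s in scores) / len(scores) * 100

print(round(mean, 1), median, mode, spread, round(rank_of_85, 1))
```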

In quantitative research, descriptive analysis often yields absolute numbers, but those numbers alone are rarely sufficient to demonstrate the rationale behind them. It is therefore necessary to choose the analysis method best suited to your survey questionnaire and to the story you want to tell. For example, the mean is the best way to demonstrate students' average scores in a school. Rely on descriptive statistics when you intend to keep the findings limited to the provided sample without generalizing them: when you simply want to compare average turnout in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after analyzing a sample collected from that population. For example, you can ask a hundred or so audience members at a movie theater whether they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.
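The movie-theater example corresponds to estimating a population proportion from a sample. A minimal sketch using a normal-approximation 95% confidence interval (the counts are hypothetical):

```python
import math

# 100 moviegoers sampled; 85 say they like the film. Estimate the share of
# the whole audience that likes it, with a 95% confidence interval.
n, liked = 100, 85
p_hat = liked / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
margin = 1.96 * se                       # z multiplier for a 95% interval
low, high = p_hat - margin, p_hat + margin
print(f"{p_hat:.2f} (95% CI {low:.2f}-{high:.2f})")  # 0.85 (95% CI 0.78-0.92)
```

The interval quantifies how far the population share could plausibly be from the sample estimate, which is the essence of inference.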

Here are two significant areas of inferential statistics.

  • Estimating parameters: Takes statistics from the sample data and uses them to say something about a population parameter.
  • Hypothesis testing: Uses sample data to answer survey research questions. For example, researchers might want to know whether a newly launched shade of lipstick is well received, or whether multivitamin capsules help children perform better at games.

These are sophisticated analysis methods used to show the relationship between different variables rather than to describe a single variable. They are used when researchers need something beyond absolute numbers to understand the relationships between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but want to understand the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation makes the analysis seamless by showing the number of males and females in each age category.
  • Regression analysis: To understand the strength of the relationship between two variables, researchers rarely look beyond regression analysis, the primary and most commonly used method, which is also a type of predictive analysis. In this method you have a dependent variable and one or more independent variables, and you work out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free random manner.
  • Frequency tables: This procedure records how often each response or value occurs, summarizing a variable before further testing.
  • Analysis of variance (ANOVA): This statistical procedure tests the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation suggests the research findings are significant. In many contexts, ANOVA testing and variance analysis are used interchangeably.
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to maintain a high standard of research practice. Ideally, researchers possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of a project helps in designing the survey questionnaire, selecting data collection methods, and choosing samples.


  • The primary aim of research data analysis is to derive unbiased insights. Any mistake in collecting data, selecting an analysis method, or choosing an audience sample, or any bias brought to those steps, is liable to produce a biased inference.
  • No amount of sophistication in the analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, and graphical representation.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage: in 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. It is clear that enterprises willing to survive in a hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them a medium to collect data by creating appealing surveys.


Research Methods - Quantitative, Qualitative, and More: Overview

  • Quantitative Research
  • Qualitative Research
  • Data Science Methods (Machine Learning, AI, Big Data)
  • Text Mining and Computational Text Analysis
  • Evidence Synthesis/Systematic Reviews
  • Get Data, Get Help!

About Research Methods

This guide provides an overview of research methods, how to choose and use them, and supports and resources at UC Berkeley. 

As Patten and Newhart note in the book Understanding Research Methods , "Research methods are the building blocks of the scientific enterprise. They are the "how" for building systematic knowledge. The accumulation of knowledge through research is by its nature a collective endeavor. Each well-designed study provides evidence that may support, amend, refute, or deepen the understanding of existing knowledge...Decisions are important throughout the practice of research and are designed to help researchers collect evidence that includes the full spectrum of the phenomenon under study, to maintain logical rules, and to mitigate or account for possible sources of bias. In many ways, learning research methods is learning how to see and make these decisions."

The choice of methods varies by discipline, by the kind of phenomenon being studied and the data being used to study it, by the technology available, and more.  This guide is an introduction, but if you don't see what you need here, always contact your subject librarian, and/or take a look to see if there's a library research guide that will answer your question. 

Suggestions for changes and additions to this guide are welcome! 

START HERE: SAGE Research Methods

Without question, the most comprehensive resource available from the library is SAGE Research Methods. An online guide to this one-stop collection is available, and some helpful links are below:

  • SAGE Research Methods
  • Little Green Books  (Quantitative Methods)
  • Little Blue Books  (Qualitative Methods)
  • Dictionaries and Encyclopedias  
  • Case studies of real research projects
  • Sample datasets for hands-on practice
  • Streaming video--see methods come to life
  • Methodspace - a community for researchers
  • SAGE Research Methods Course Mapping

Library Data Services at UC Berkeley

Library Data Services Program and Digital Scholarship Services

The LDSP offers a variety of services and tools! From this link, check out pages for each of the following topics: discovering data, managing data, collecting data, GIS data, text data mining, publishing data, digital scholarship, open science, and the Research Data Management Program.

Be sure also to check out the visual guide to where to seek assistance on campus with any research question you may have!

Library GIS Services

Other Data Services at Berkeley

  • D-Lab: Supports Berkeley faculty, staff, and graduate students with research in data-intensive social science, including a wide range of training and workshop offerings.
  • Dryad: A simple self-service tool for researchers to use in publishing their datasets. It provides tools for the effective publication of and access to research data.
  • Geospatial Innovation Facility (GIF): Provides leadership and training across a broad array of integrated mapping technologies on campus.
  • Research Data Management: A UC Berkeley guide and consulting service for research data management issues.

General Research Methods Resources

Here are some general resources for assistance:

  • Assistance from ICPSR (must create an account to access): Getting Help with Data, and Resources for Students
  • Wiley Stats Ref for background information on statistics topics
  • Survey Documentation and Analysis (SDA). Program for easy web-based analysis of survey data.

Consultants

  • D-Lab/Data Science Discovery Consultants Request help with your research project from peer consultants.
  • Research data (RDM) consulting Meet with RDM consultants before designing the data security, storage, and sharing aspects of your qualitative project.
  • Statistics Department Consulting Services A service in which advanced graduate students, under faculty supervision, are available to consult during specified hours in the Fall and Spring semesters.

Related Resources

  • IRB / CPHS Qualitative research projects with human subjects often require that you go through an ethics review.
  • OURS (Office of Undergraduate Research and Scholarships) OURS supports undergraduates who want to embark on research projects and assistantships. In particular, check out their "Getting Started in Research" workshops.
  • Sponsored Projects Sponsored projects works with researchers applying for major external grants.
  • Last Updated: Apr 3, 2023 3:14 PM
  • URL: https://guides.lib.berkeley.edu/researchmethods


Research Methods Help Guide

  • Quick Glossary
  • Introduction
  • Quantitative Data
  • Qualitative Data
  • Types of Research
  • Types of Studies
  • Helpful Resources
  • Get Help @ FIU

More Information

  • Qualitative vs Quantitative LibGuide by the Ebling Library, Health Sciences Learning Center at the University of Wisconsin-Madison.
  • Differences Between Qualitative and Quantitative Research Methods Table comparing qualitative and quantitative research methods, created by the Oak Ridge Institute for Science and Education.
  • Nursing Research: Quantitative and Qualitative Research Information provided by the University of Texas Arlington Libraries.


  • Types of Variables From the UF Biostatistics Open Learning Textbook.
  • Qualitative vs Quantitative Methods: Two Opposites that Make a Perfect Match Article discussing the different philosophies behind qualitative and quantitative methods, and an example of how to blend them in the health sciences.

Database Guides

  • ERIC Search Guide by Ramces Marsilli (Last Updated Dec 4, 2023)
  • PsycINFO Guide by Sarah J. Hammill (Last Updated Feb 1, 2024)

Studies can use quantitative data, qualitative data, or both types of data. Each approach has advantages and disadvantages. Explore the resources in the box at the left for more information.

Of the available library databases, only ERIC (for education topics) and PsycINFO (for psychology topics) allow you to limit your results by the type of data a study uses. Hover over the database name below for information on how to do so.

Note: database limits are helpful but not perfect. Rely on your own judgment when determining if data match the type you are seeking.


Quantitative data: numerical data.

  • How to Analyze Quantitative Data

Qualitative data: non-numerical data.

  • How to Analyze Qualitative Data
  • Last Updated: Apr 17, 2024 11:36 AM
  • URL: https://library.fiu.edu/researchmethods



Research Methods | Definition, Types, Examples

Research methods are specific procedures for collecting and analysing data. Developing your research methods is an integral part of your research design . When planning your methods, there are two key decisions you will make.

First, decide how you will collect data . Your methods depend on what type of data you need to answer your research question :

  • Qualitative vs quantitative : Will your data take the form of words or numbers?
  • Primary vs secondary : Will you collect original data yourself, or will you use data that have already been collected by someone else?
  • Descriptive vs experimental : Will you take measurements of something as it is, or will you perform an experiment?

Second, decide how you will analyse the data .

  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.

Table of contents

  • Methods for collecting data
  • Examples of data collection methods
  • Methods for analysing data
  • Examples of data analysis methods
  • Frequently asked questions about methodology

Methods for collecting data

Data are the information that you collect for the purposes of answering your research question. The type of data you need depends on the aims of your research.

Qualitative vs quantitative data

Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.

For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data .

If you want to develop a more mechanistic understanding of a topic, or your research involves hypothesis testing , collect quantitative data .

You can also take a mixed methods approach, where you use both qualitative and quantitative research methods.

Primary vs secondary data

Primary data are any original information that you collect for the purposes of answering your research question (e.g. through surveys , observations and experiments ). Secondary data are information that has already been collected by other researchers (e.g. in a government census or previous scientific studies).

If you are exploring a novel research question, you’ll probably need to collect primary data. But if you want to synthesise existing knowledge, analyse historical trends, or identify patterns on a large scale, secondary data might be a better choice.

Descriptive vs experimental data

In descriptive research , you collect data about your study subject without intervening. The validity of your research will depend on your sampling method .

In experimental research , you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design .

To conduct an experiment, you need to be able to vary your independent variable , precisely measure your dependent variable, and control for confounding variables . If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.


Methods for analysing data

Your data analysis methods will depend on the type of data you collect and how you prepare them for analysis.

Data can often be analysed both quantitatively and qualitatively. For example, survey responses could be analysed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.

Qualitative analysis methods

Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that were collected:

  • From open-ended survey and interview questions, literature reviews, case studies, and other sources that use text rather than numbers.
  • Using non-probability sampling methods .

Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions.

Quantitative analysis methods

Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments).

You can use quantitative analysis to interpret data that were collected either:

  • During an experiment.
  • Using probability sampling methods .

Because the data are collected and analysed in a statistically valid way, the results of quantitative analysis can be easily standardised and shared among researchers.
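As a concrete instance of the correlational side of quantitative analysis, a Pearson correlation coefficient can be computed directly; the two variables below are hypothetical survey measurements.

```python
import math

# Pearson correlation between two numeric variables (hypothetical data):
# hours of sleep vs. self-reported mood score.
sleep_hours = [5, 6, 7, 8, 9]
mood_score = [4, 5, 6, 8, 9]

n = len(sleep_hours)
mx = sum(sleep_hours) / n
my = sum(mood_score) / n
cov = sum((a - mx) * (b - my) for a, b in zip(sleep_hours, mood_score))
sx = math.sqrt(sum((a - mx) ** 2 for a in sleep_hours))
sy = math.sqrt(sum((b - my) ** 2 for b in mood_score))
r = cov / (sx * sy)  # ranges from -1 (inverse) to +1 (direct)
print(round(r, 3))   # 0.991 -- a strong positive association
```

Because the computation is fully standardised, any researcher running it on the same data obtains the same coefficient, which is what makes such results easy to share.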

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to test a hypothesis by systematically collecting and analysing data, while qualitative methods allow you to explore ideas and experiences in depth.

In mixed methods research, you use both qualitative and quantitative data collection and analysis methods to answer your research question.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research.

For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

Statistical sampling allows you to test a hypothesis about the characteristics of a population. There are various sampling methods you can use to ensure that your sample is representative of the population as a whole.
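For example, drawing a simple random sample can be sketched in a few lines of Python. The population of 2,000 student IDs and the seed below are made up for illustration:

```python
import random

def draw_sample(population, n, seed=None):
    """Draw a simple random sample of n individuals without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population: 2,000 student IDs; survey a sample of 100 of them.
students = range(1, 2001)
sample = draw_sample(students, 100, seed=42)
print(len(sample))  # 100
```

Fixing the seed makes the draw reproducible, which is useful when documenting a sampling procedure.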

The research methods you use depend on the type of data you need to answer your research question.

  • If you want to measure something or test a hypothesis, use quantitative methods. If you want to explore ideas, thoughts, and meanings, use qualitative methods.
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables, use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Methodology refers to the overarching strategy and rationale of your research project . It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyse data (e.g. experiments, surveys , and statistical tests ).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section .

In a longer or more complex research project, such as a thesis or dissertation , you will probably include a methodology section , where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.


Data Module #1: What is Research Data?

Types of Research Data

Data may be grouped into four main types based on methods for collection: observational, experimental, simulation, and derived. The type of research data you collect may affect the way you manage that data. For example, data that are hard or impossible to replace (e.g. the recording of an event at a specific time and place) require extra backup procedures to reduce the risk of data loss. Or, if you will need to combine data points from different sources, you will need to follow best practices to prevent data corruption.


Observational Data

Observational data are captured through observation of a behavior or activity. They are collected using methods such as human observation, open-ended surveys, or an instrument or sensor that monitors and records information, such as sensors observing noise levels at the Mpls/St Paul airport. Because observational data are captured in real time, they would be very difficult or impossible to re-create if lost. Image courtesy of https://dorothyjoseph.com


Experimental Data

Experimental data are collected through active intervention by the researcher to produce and measure change, or to create difference when a variable is altered. Experimental data typically allow the researcher to determine a causal relationship and are typically projectable to a larger population. This type of data is often reproducible, but reproducing it can be expensive.


Simulation Data

Simulation data are generated by imitating the operation of a real-world process or system over time using computer test models, for example to predict weather conditions, economic behaviour, chemical reactions, or seismic activity. This method is used to try to determine what would, or could, happen under certain conditions. The test model used is often as important as, or even more important than, the data generated from the simulation.
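As a toy illustration of the idea, here is a minimal Monte Carlo simulation in Python: it imitates a random process (scattering points in the unit square) to estimate a real-world quantity, in this case π. The point count and seed are arbitrary choices for the sketch:

```python
import random

def estimate_pi(n_points, seed=0):
    """Monte Carlo simulation: scatter random points in the unit square
    and count how many land inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The ratio inside/total approximates (pi/4), the quarter circle's area.
    return 4 * inside / n_points

estimate = estimate_pi(100_000)
print(estimate)  # close to 3.14159
```

As the paragraph above notes, the model (here, the assumption of uniform random points) matters as much as the numbers it produces.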


Derived / Compiled Data

Derived data involves using existing data points, often from different data sources, to create new data through some sort of transformation, such as an arithmetic formula or aggregation. For example, combining area and population data from the Twin Cities metro area to create population density data. While this type of data can usually be replaced if lost, it may be very time-consuming (and possibly expensive) to do so.  
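The population-density example can be sketched in a few lines of Python. The area and population figures below are illustrative placeholders, not actual census data:

```python
# Deriving a new data point (population density) from two existing ones.
# The area and population figures here are illustrative only.
area_km2 = {"Minneapolis": 148.9, "St. Paul": 145.5}
population = {"Minneapolis": 429_954, "St. Paul": 311_527}

# The derived value: people per square kilometer for each city.
density = {city: population[city] / area_km2[city] for city in area_km2}
print({city: round(d) for city, d in density.items()})
```

If either source table is lost, the derived values can be recomputed, which is exactly why derived data are replaceable but potentially costly to rebuild.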

  • Last Updated: Feb 2, 2024 1:41 PM
  • URL: https://libguides.macalester.edu/data1

What is Data in Statistics & Types Of Data With Examples

Data forms the bedrock of analysis and decision-making in statistics. Knowing data and its various types is essential for conducting meaningful statistical studies.

This article explores data and types of data in statistics. By understanding these concepts, you will be better equipped to interpret and utilize data effectively in your analysis.

What is Data?

Data encompasses all the information, observations, or measurements you gather through various means, such as surveys, experiments, or observations. It can take different forms, including numbers, text, images, or even sensory inputs like temperature readings or sound waves.

In statistics, data serves as the starting point for analysis. It's what you examine, manipulate, and interpret to draw conclusions or make predictions about a particular phenomenon or population.

What is the Role of Data in Statistics?

Data plays an important role in understanding and drawing conclusions. It forms the foundation for analysis, providing the evidence needed to make informed decisions. Without data, your statistical studies lack the real-world information necessary to be meaningful. 

Exploration is driven forward by examining and interpreting collected data. Through this process, you uncover patterns, relationships, and trends, aiding in making sense of the world around you. Ultimately, data serves as the guiding light, illuminating the path to understanding complex events.

What are the Types of Data in Statistics?

Data types are crucial in statistics because different types require different statistical methods for analysis. For instance, analyzing continuous data requires fundamentally different techniques from analyzing categorical data. Using the wrong method for a particular data type can lead to erroneous conclusions. Therefore, understanding the types of data you're working with enables you to select the appropriate method of analysis, ensuring accurate and reliable results.

In statistical analysis, data is broadly categorized into two main types—qualitative data and quantitative data. Each type has its own characteristics, examples, and applications, which are essential for understanding and interpreting statistical information effectively.

Qualitative Data 

Qualitative data, also known as categorical data, consist of categories or labels that represent qualitative characteristics. It simply categorizes individuals or items based on shared attributes.

There are two types of qualitative data:

Nominal Data

Nominal data are categories without any inherent order. Examples include gender (male, female), types of fruits (apple, banana, orange), and city names (New York, London, Paris). Nominal data are typically analyzed using frequency counts and percentages, for example, counting the number of males and females in a population or the frequency of different types of fruits sold in a specific region.
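A minimal sketch of such frequency counts and percentages in Python, using made-up survey responses:

```python
from collections import Counter

# Hypothetical survey: each respondent names their favourite fruit.
responses = ["apple", "banana", "apple", "orange", "banana", "apple"]

counts = Counter(responses)
percentages = {fruit: 100 * n / len(responses) for fruit, n in counts.items()}
print(counts.most_common(1))  # [('apple', 3)]
print(percentages["apple"])   # 50.0
```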

Ordinal Data

Ordinal data are categories with a natural order or ranking. Examples include survey ratings (poor, fair, good, excellent), educational levels (high school, college, graduate school), and socioeconomic status (low, middle, high). Ordinal data are used for ranking or ordering data, and they can be analyzed using median and mode, as well as non-parametric tests like the Mann-Whitney U test.
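A small Python sketch of median and mode for ordinal data (the ratings are hypothetical). Note that the categories must first be mapped to ranks so their order can be used:

```python
import statistics

# Hypothetical survey ratings on an ordered scale.
scale = ["poor", "fair", "good", "excellent"]
ratings = ["good", "fair", "good", "excellent", "good", "fair", "good"]

# Map each category to its rank so order-aware statistics apply.
ranks = [scale.index(r) for r in ratings]
median_rating = scale[int(statistics.median(ranks))]  # median of the ranks
modal_rating = statistics.mode(ratings)               # most frequent category
print(median_rating, modal_rating)  # good good
```

The mean is deliberately avoided here: the gaps between ordinal categories are not necessarily equal, so rank-based summaries are the safer choice.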

Quantitative Data

Quantitative data, also known as numerical data, consists of numbers representing quantities or measurements. Unlike qualitative data, which categorizes individuals or items based on attributes, quantitative data can be measured and expressed numerically, allowing for mathematical operations and statistical data analysis .

There are two types of Quantitative Data:

Discrete Data

Discrete data are distinct, separate values that can be counted. Examples include the number of students in a class, the count of defects in a product, and the number of goals scored in a game. Discrete data are used for counting and tracking occurrences, and they can be analyzed using measures of central tendency such as mean and median, as well as discrete probability distributions like the Poisson distribution.
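For instance, the Poisson distribution mentioned above has a simple closed form that can be computed directly. The defect rate of 2 per batch is a made-up example:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical example: defects average 2 per batch; chance of exactly 3 defects.
p = poisson_pmf(3, 2.0)
print(round(p, 4))  # 0.1804
```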

Continuous Data

Continuous data can take any value within a range. Examples include height, weight, temperature, and time. Continuous data are used for measurements and observations, and they can be analyzed using mean and median, as well as continuous probability distributions like the normal distribution.
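As an illustration, Python's standard library can work with the normal distribution directly. The mean of 170 cm and standard deviation of 10 cm below are purely illustrative parameters:

```python
from statistics import NormalDist

# Illustrative model: adult heights as normal with mean 170 cm, sd 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Probability that a randomly chosen height falls between 160 cm and 180 cm.
p = heights.cdf(180) - heights.cdf(160)
print(round(p, 3))  # 0.683
```

This is the familiar "about 68% within one standard deviation" property of the normal distribution.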

Difference Between Qualitative vs Quantitative Data

Quantitative and qualitative data exhibit significant differences. The fundamental distinctions are explored in the table below.

Examples of Qualitative Data

Some examples of qualitative data include:

Documents

Documents are a prevalent form of qualitative data, comprising materials such as letters, diaries, blog posts, and digital images. These sources offer valuable insight into various research topics by providing firsthand accounts of individuals' thoughts and experiences, and they are particularly valuable for understanding historical events from unique perspectives. When examining qualitative documents, you use a careful interpretation process to extract meaning from the text, bearing in mind that it may support multiple interpretations.

Case Studies

Case studies are a frequently utilized qualitative research methodology, involving detailed investigation of specific individuals, groups, or events. They offer insights into complex phenomena, shedding light on human thought processes, behaviors, and influencing factors. While valuable, case studies have limitations due to their reliance on a small sample size, potentially leading to a lack of representativeness and researcher bias.

Photographs

Photographs serve as a valuable form of qualitative data, providing insights into various visual aspects of human life, such as clothing, social interactions, and daily activities. They can also document changes over time, such as urban development or product evolution. Apart from their informational value, photographs can evoke emotions and visually capture human behavior complexities.

Audio Recordings

Audio recordings represent raw and unprocessed qualitative data, offering firsthand accounts of events or experiences. They capture spoken language nuances, emotions, and nonverbal cues, making them valuable for research purposes. Audio recordings are commonly used for interviews, focus groups, and studying naturalistic behaviors, albeit requiring meticulous analysis due to their complexity.

Examples of Quantitative Data

Some examples of quantitative data include:

Age in Years

Age commonly serves as a quantitative variable, often recorded in years. Whether precisely documented or categorized broadly (e.g., infancy, adolescence), age is a vital metric in various contexts. It can be represented continuously in units like days, weeks, or months or dichotomously to differentiate between child and adult age groups. Understanding age distribution facilitates demographic analysis and informs decisions across sectors like education and healthcare.

Height Measurement in Feet or Inches

Gathering quantitative data involves various methods. For instance, if you aim to measure the height of a group of individuals, you could utilize a tape measure, ruler, or yardstick to collect data in feet or inches. Once data is gathered, it can be used to compute the average height of the group and discern patterns or trends. For instance, you might observe correlations such as taller individuals tend to have higher weights or gender disparities in average height. Quantitative data proves invaluable for comprehending human behavior and making informed predictions.
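A minimal sketch in Python of the computation described above, using made-up measurements and a from-scratch Pearson correlation between height and weight:

```python
import statistics

# Made-up measurements for five people: height in inches, weight in pounds.
heights = [64, 66, 68, 70, 72]
weights = [120, 135, 150, 165, 180]

mean_height = statistics.fmean(heights)

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(heights, weights)
print(mean_height, round(r, 3))  # 68.0 1.0
```

The fabricated data are perfectly linear, so the correlation comes out at exactly 1; real measurements would give something lower.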

Number of Weeks in a Year

A year comprises 52 weeks, providing a precise and measurable quantity, which exemplifies quantitative data. This type of data is crucial in scientific research because the number of weeks allows for standardized comparisons across studies.  

For instance, you can track changes in a population's health over 52 weeks (a year) and compare those findings to studies that measured health changes over 26 weeks (half a year). This consistency in measurement enables the identification of trends and relationships between variables more effectively, leading to insightful analyses.

Revenue in Dollars

Quantitative data, which is numerical and measurable, encompasses metrics like revenue expressed in any currency. This data type proves invaluable for assessing various aspects of a business, such as a company's financial performance, a website's traffic volume, or product sales quantities. The data is commonly gathered through surveys, experiments, or data analysis, enabling statistical methods to discern trends and correlations.

Distance in Kilometers

Distance measurement is another quintessential example of quantitative data, with the kilometer being a universally accepted unit for long distances. Kilometers provide a manageable scale for expressing distances without requiring unwieldy numbers. For instance, kilometers offer a convenient and widely understood metric when measuring the distance from a source to a destination.

Since statistical analysis hinges on a unified data set, Airbyte can help you bridge the gap. It effortlessly allows you to gather and centralize information, eliminating the hassle of data collection.

Simplify Statistical Data Analysis with Airbyte

Airbyte

Airbyte , a data integration platform, simplifies the process of integrating and replicating data from various sources. Once centralized, this data empowers statisticians to perform in-depth analysis. By eliminating manual data transfer and ensuring consistent data flow, Airbyte saves valuable time and resources. This allows them to focus on what matters most—extracting meaningful insights from the data.

Here’s what Airbyte offers:

  • Connectors: Airbyte has a vast library of pre-built connectors , exceeding 350 sources and applications. This lets you connect to a wide range of data sources effortlessly, eliminating the need for custom development in many cases. ‍
  • Open-source and Customizable: Airbyte is an open-source platform providing access to its source code for transparency and customization. You can modify existing connectors or build entirely new ones using their Connector Development Kit (CDK) . ‍
  • Monitoring and Integrations: Airbyte allows you to seamlessly integrate with monitoring platforms like Datadog, enabling you to keep track of data pipeline health and performance. Additionally, it supports integrations with popular workflow orchestration tools like Airflow, Prefect, and Dagster for streamlined data pipeline management and processing. ‍
  • Security Features: Airbyte takes security seriously. It offers features like dedicated secret stores to store sensitive information. The platform also supports OAuth for secure authentication and role-based access control for user management. ‍
  • PyAirbyte: PyAirbyte , a Python library, lets you programmatically interact with Airbyte's vast library of pre-built connectors. This allows you to automate data integration tasks and leverage Airbyte's extensive functionality through code.

Data and types of data in statistics are significant as they aid in understanding global phenomena and guiding your decision-making process. Statistics data encompass various data types, each with its use cases. However, by comprehending these data types, you can utilize them effectively to obtain the most accurate insights possible.



Choosing the Right Research Methodology: A Guide for Researchers


Choosing an optimal research methodology is crucial for the success of any research project. The methodology you select will determine the type of data you collect, how you collect it, and how you analyse it. Understanding the different types of research methods available along with their strengths and weaknesses, is thus imperative to make an informed decision.

Understanding different research methods:

There are several research methods available depending on the type of study you are conducting, i.e., whether it is laboratory-based, clinical, epidemiological, or survey-based. Some common methodologies include qualitative research, quantitative research, experimental research, survey-based research, and action research. Each method can be chosen and adapted depending on the research hypotheses and objectives.

Qualitative vs quantitative research:

When deciding on a research methodology, one of the key factors to consider is whether your research will be qualitative or quantitative. Qualitative research is used to understand people's experiences, concepts, thoughts, or behaviours. Quantitative research, by contrast, deals with numbers, graphs, and charts, and is used to test or confirm hypotheses, assumptions, and theories.

Qualitative research methodology:

Qualitative research is often used to examine issues that are not well understood, and to gather additional insights on these topics. Qualitative research methods include open-ended survey questions, observations of behaviours described through words, and reviews of literature that has explored similar theories and ideas. These methods are used to understand how language is used in real-world situations, identify common themes or overarching ideas, and describe and interpret various texts. Data analysis for qualitative research typically includes discourse analysis, thematic analysis, and textual analysis. 

Quantitative research methodology:

The goal of quantitative research is to test hypotheses, confirm assumptions and theories, and determine cause-and-effect relationships. Quantitative research methods include experiments, close-ended survey questions, and countable and numbered observations. Data analysis for quantitative research relies heavily on statistical methods.

Analysing qualitative vs quantitative data:

The methods used for data analysis also differ for qualitative and quantitative research. As mentioned earlier, quantitative data is generally analysed using statistical methods and does not leave much room for speculation. It is more structured and follows a predetermined plan. In quantitative research, the researcher starts with a hypothesis and uses statistical methods to test it. Contrarily, methods used for qualitative data analysis can identify patterns and themes within the data, rather than provide statistical measures of the data. It is an iterative process, where the researcher goes back and forth trying to gauge the larger implications of the data through different perspectives and revising the analysis if required.

When to use qualitative vs quantitative research:

The choice between qualitative and quantitative research will depend on the gap that the research project aims to address, and specific objectives of the study. If the goal is to establish facts about a subject or topic, quantitative research is an appropriate choice. However, if the goal is to understand people’s experiences or perspectives, qualitative research may be more suitable. 

Conclusion:

In conclusion, an understanding of the different research methods available, their applicability, advantages, and disadvantages is essential for making an informed decision on the best methodology for your project. If you need any additional guidance on which research methodology to opt for, you can head over to Elsevier Author Services (EAS). EAS experts will guide you throughout the process and help you choose the perfect methodology for your research goals.


Analyst Answers

Data & Finance for Work & Life


Data Analysis: Types, Methods & Techniques (a Complete List)


While the term sounds intimidating, “data analysis” is nothing more than making sense of information in a table. It consists of filtering, sorting, grouping, and manipulating data tables with basic algebra and statistics.

In fact, you don’t need experience to understand the basics. You have already worked with data extensively in your life, and “analysis” is nothing more than a fancy word for good sense and basic logic.

Over time, people have intuitively categorized the best logical practices for treating data. These categories are what we call today types , methods , and techniques .

This article provides a comprehensive list of types, methods, and techniques, and explains the difference between them.

For a practical intro to data analysis (including types, methods, & techniques), check out our Intro to Data Analysis eBook for free.

Descriptive, Diagnostic, Predictive, & Prescriptive Analysis

If you Google “types of data analysis,” the first few results will explore descriptive , diagnostic , predictive , and prescriptive analysis. Why? Because these names are easy to understand and are used a lot in “the real world.”

Descriptive analysis is an informational method, diagnostic analysis explains “why” a phenomenon occurs, predictive analysis seeks to forecast the result of an action, and prescriptive analysis identifies solutions to a specific problem.

That said, these are only four branches of a larger analytical tree.

Good data analysts know how to position these four types within other analytical methods and tactics, allowing them to leverage strengths and weaknesses in each to uproot the most valuable insights.

Let’s explore the full analytical tree to understand how to appropriately assess and apply these four traditional types.

Tree diagram of Data Analysis Types, Methods, and Techniques

Here’s a picture to visualize the structure and hierarchy of data analysis types, methods, and techniques.

[Figure: tree diagram of data analysis types, methods, and techniques]

Note: basic descriptive statistics such as mean , median , and mode , as well as standard deviation , are not shown because most people are already familiar with them. In the diagram, they would fall under the “descriptive” analysis type.
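For readers who want a refresher anyway, here is how those basics look on a small made-up sample, using Python's standard library:

```python
import statistics

# A small made-up sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4.5
print(statistics.mode(data))    # 4
print(statistics.pstdev(data))  # 2.0 (population standard deviation)
```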

Tree Diagram Explained

The highest-level classification of data analysis is quantitative vs qualitative. Quantitative implies numbers while qualitative implies information other than numbers.

Quantitative data analysis then splits into mathematical analysis and artificial intelligence (AI) analysis. Mathematical types then branch into descriptive, diagnostic, predictive, and prescriptive.

Methods falling under mathematical analysis include clustering, classification, forecasting, and optimization. Qualitative data analysis methods include content analysis, narrative analysis, discourse analysis, framework analysis, and/or grounded theory.

Moreover, mathematical techniques include regression, Naïve Bayes, simple exponential smoothing, cohorts, factors, linear discriminants, and more, whereas techniques falling under the AI type include artificial neural networks, decision trees, evolutionary programming, and fuzzy logic. Techniques under qualitative analysis include text analysis, coding, idea pattern analysis, and word frequency.
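To make one of these techniques concrete, here is a minimal sketch of simple linear regression by ordinary least squares on a tiny made-up data set. This is an illustration of the idea, not a full treatment of the technique:

```python
# Made-up data: a roughly linear relationship between x and y.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.1, 6.0, 8.2, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares estimates for the line y ≈ slope * x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
print(round(slope, 2), round(intercept, 2))  # 1.97 0.15
```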

It’s a lot to remember! Don’t worry, once you understand the relationship and motive behind all these terms, it’ll be like riding a bike.

We’ll move down the list from top to bottom and I encourage you to open the tree diagram above in a new tab so you can follow along .

But first, let’s just address the elephant in the room: what’s the difference between methods and techniques anyway?

Difference between methods and techniques

Though often used interchangeably, methods and techniques are not the same. By definition, methods are the processes by which techniques are applied, and techniques are the practical applications of those methods.

For example, consider driving. Methods include staying in your lane, stopping at a red light, and parking in a spot. Techniques include turning the steering wheel, braking, and pushing the gas pedal.

Data sets: observations and fields

It’s important to understand the basic structure of data tables to comprehend the rest of the article. A data set consists of one far-left column containing observations, then a series of columns containing the fields (aka “traits” or “characteristics”) that describe each observation. For example, imagine we want a data table for fruit. It might look like this:
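Such a table can be sketched directly in code. The fruit values below are hypothetical, purely to illustrate the observation/field structure:

```python
# Each row is an observation; every key besides "fruit" is a field
# (trait) describing that observation. Values are hypothetical.
rows = [
    {"fruit": "apple",  "color": "red",    "weight_g": 180},
    {"fruit": "banana", "color": "yellow", "weight_g": 120},
    {"fruit": "grape",  "color": "purple", "weight_g": 5},
]

observations = [row["fruit"] for row in rows]   # the far-left column
fields = [k for k in rows[0] if k != "fruit"]   # the describing columns

print(observations)  # ['apple', 'banana', 'grape']
print(fields)        # ['color', 'weight_g']
```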

Now let’s turn to types, methods, and techniques. Each heading below consists of a description, relative importance, the nature of data it explores, and the motivation for using it.

Quantitative Analysis

  • It accounts for more than 50% of all data analysis and is by far the most widespread and well-known type of data analysis.
  • As you have seen, it holds descriptive, diagnostic, predictive, and prescriptive methods, which in turn hold some of the most important techniques available today, such as clustering and forecasting.
  • It can be broken down into mathematical and AI analysis.
  • Importance : Very high . Quantitative analysis is a must for anyone interested in becoming or improving as a data analyst.
  • Nature of Data: data treated under quantitative analysis is, quite simply, quantitative. It encompasses all numeric data.
  • Motive: to extract insights. (Note: we’re at the top of the pyramid, this gets more insightful as we move down.)

Qualitative Analysis

  • It accounts for less than 30% of all data analysis and is common in social sciences .
  • It can refer to the simple recognition of qualitative elements, which is not analytic in any way, but most often refers to methods that assign numeric values to non-numeric data for analysis.
  • Because of this, some argue that it’s ultimately a quantitative type.
  • Importance: Medium. In general, knowing qualitative data analysis is not common or even necessary for corporate roles. However, for researchers working in social sciences, its importance is very high .
  • Nature of Data: data treated under qualitative analysis is non-numeric. However, as part of the analysis, analysts turn non-numeric data into numbers, at which point many argue it is no longer qualitative analysis.
  • Motive: to extract insights. (This will be more important as we move down the pyramid.)

Mathematical Analysis

  • Description: mathematical data analysis is a subtype of quantitative data analysis that designates methods and techniques based on statistics, algebra, and logical reasoning to extract insights. It stands in opposition to artificial intelligence analysis.
  • Importance: Very High. The most widespread methods and techniques fall under mathematical analysis. In fact, it’s so common that many people use “quantitative” and “mathematical” analysis interchangeably.
  • Nature of Data: numeric. By definition, all data under mathematical analysis are numbers.
  • Motive: to extract measurable insights that can be used to act upon.

Artificial Intelligence & Machine Learning Analysis

  • Description: artificial intelligence and machine learning analyses designate techniques based on the titular skills. They are not traditionally mathematical, but they are quantitative since they use numbers. Applications of AI & ML analysis techniques show promise, but they’re not yet mainstream across the field.
  • Importance: Medium . As of today (September 2020), you don’t need to be fluent in AI & ML data analysis to be a great analyst. BUT, if it’s a field that interests you, learn it. Many believe that in 10 years’ time its importance will be very high .
  • Nature of Data: numeric.
  • Motive: to create calculations that build on themselves in order to extract insights without direct input from a human.

Descriptive Analysis

  • Description: descriptive analysis is a subtype of mathematical data analysis that uses methods and techniques to provide information about the size, dispersion, groupings, and behavior of data sets. This may sound complicated, but just think about mean, median, and mode: all three are types of descriptive analysis. They provide information about the data set. We’ll look at specific techniques below.
  • Importance: Very high. Descriptive analysis is among the most commonly used data analyses in both corporations and research today.
  • Nature of Data: the nature of data under descriptive statistics is sets. A set is simply a collection of numbers that behaves in predictable ways. Data reflects real life, and there are patterns everywhere to be found. Descriptive analysis describes those patterns.
  • Motive: the motive behind descriptive analysis is to understand how numbers in a set group together, how far apart they are from each other, and how often they occur. As with most statistical analysis, the more data points there are, the easier it is to describe the set.
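The familiar descriptive statistics named above can be computed with Python’s standard library. The data set here is made up, just to show each measure:

```python
import statistics

data = [4, 8, 8, 15, 16, 23, 42]  # a hypothetical data set

center = statistics.mean(data)    # where the set is centered
middle = statistics.median(data)  # the midpoint value
common = statistics.mode(data)    # the most frequent value
spread = statistics.stdev(data)   # how far apart the values are

print(center, middle, common, round(spread, 2))
```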

Diagnostic Analysis

  • Description: diagnostic analysis answers the question “why did it happen?” It is an advanced type of mathematical data analysis that manipulates multiple techniques, but does not own any single one. Analysts engage in diagnostic analysis when they try to explain why.
  • Importance: Very high. Diagnostics are probably the most important type of data analysis for people who don’t do analysis because they’re valuable to anyone who’s curious. They’re most common in corporations, as managers often only want to know the “why.”
  • Nature of Data : data under diagnostic analysis are data sets. These sets in themselves are not enough under diagnostic analysis. Instead, the analyst must know what’s behind the numbers in order to explain “why.” That’s what makes diagnostics so challenging yet so valuable.
  • Motive: the motive behind diagnostics is to diagnose — to understand why.

Predictive Analysis

  • Description: predictive analysis uses past data to project future data. It’s very often one of the first kinds of analysis new researchers and corporate analysts use because it is intuitive. It is a subtype of the mathematical type of data analysis, and its three notable techniques are regression, moving average, and exponential smoothing.
  • Importance: Very high. Predictive analysis is critical for any data analyst working in a corporate environment. Companies always want to know what the future will hold — especially for their revenue.
  • Nature of Data: Because past and future imply time, predictive data always includes an element of time. Whether it’s minutes, hours, days, months, or years, we call this time series data . In fact, this data is so important that I’ll mention it twice so you don’t forget: predictive analysis uses time series data .
  • Motive: the motive for investigating time series data with predictive analysis is to predict the future in the most analytical way possible.

Prescriptive Analysis

  • Description: prescriptive analysis is a subtype of mathematical analysis that answers the question “what will happen if we do X?” It’s largely underestimated in the data analysis world because it requires diagnostic and descriptive analyses to be done before it even starts. More than simple predictive analysis, prescriptive analysis builds entire data models to show how a simple change could impact the ensemble.
  • Importance: High. Prescriptive analysis is most common under the finance function in many companies. Financial analysts use it to build models of the financial statements that show how the data will change given alternative inputs.
  • Nature of Data: the nature of data in prescriptive analysis is data sets. These data sets contain patterns that respond differently to various inputs. Data that is useful for prescriptive analysis contains correlations between different variables. It’s through these correlations that we establish patterns and prescribe action on this basis. This analysis cannot be performed on data that exists in a vacuum — it must be viewed on the backdrop of the tangibles behind it.
  • Motive: the motive for prescriptive analysis is to establish, with an acceptable degree of certainty, what results we can expect given a certain action. As you might expect, this necessitates that the analyst or researcher be aware of the world behind the data, not just the data itself.

Clustering Method

  • Description: the clustering method groups data points together based on their relative closeness in order to further explore and treat them based on these groupings. There are two ways to group clusters: intuitively and statistically (e.g., with k-means).
  • Importance: Very high. Though most corporate roles group clusters intuitively based on management criteria, a solid understanding of how to group them mathematically is an excellent descriptive and diagnostic approach to allow for prescriptive analysis thereafter.
  • Nature of Data : the nature of data useful for clustering is sets with 1 or more data fields. While most people are used to looking at only two dimensions (x and y), clustering becomes more accurate the more fields there are.
  • Motive: the motive for clustering is to understand how data sets group and to explore them further based on those groups.
  • Here’s an example set:


Classification Method

  • Description: the classification method aims to separate and group data points based on common characteristics . This can be done intuitively or statistically.
  • Importance: High. While simple on the surface, classification can become quite complex. It’s very valuable in corporate and research environments, but can feel like it’s not worth the work. A good analyst can execute it quickly to deliver results.
  • Nature of Data: the nature of data useful for classification is data sets. As we will see, it can be used on qualitative data as well as quantitative. This method requires knowledge of the substance behind the data, not just the numbers themselves.
  • Motive: the motive for classification is to group data not by mathematical relationships (which would be clustering), but by predetermined outputs. This is why it’s less useful for diagnostic analysis, and more useful for prescriptive analysis.

Forecasting Method

  • Description: the forecasting method uses past time series data to forecast the future.
  • Importance: Very high. Forecasting falls under predictive analysis and is arguably the most common and most important method in the corporate world. It is less useful in research, which prefers to understand the known rather than speculate about the future.
  • Nature of Data: data useful for forecasting is time series data, which, as we’ve noted, always includes a variable of time.
  • Motive: the motive for the forecasting method is the same as that of prescriptive analysis: to confidently estimate future values.

Optimization Method

  • Description: the optimization method maximizes or minimizes values in a set given a set of criteria. It is arguably most common in prescriptive analysis. In mathematical terms, it is maximizing or minimizing a function given certain constraints.
  • Importance: Very high. The idea of optimization applies to more analysis types than any other method. In fact, some argue that it is the fundamental driver behind data analysis. You would use it everywhere in research and in a corporation.
  • Nature of Data: the nature of optimizable data is a data set of at least two points.
  • Motive: the motive behind optimization is to achieve the best result possible given certain conditions.
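As a sketch of the idea, here is a brute-force optimization of a hypothetical objective (profit per unit of two products) under two constraints. The numbers are invented for illustration:

```python
# Maximize profit = 40*a + 30*b, subject to:
#   a + b <= 10  (total production capacity)
#   a <= 6       (cap on product a)
# Brute force over all integer combinations of a and b.
best = None
for a in range(0, 11):
    for b in range(0, 11):
        if a + b <= 10 and a <= 6:       # the constraints
            profit = 40 * a + 30 * b     # the objective function
            if best is None or profit > best[0]:
                best = (profit, a, b)

print(best)  # (360, 6, 4): make 6 units of a and 4 of b
```

Real problems use solvers rather than brute force, but the shape is the same: an objective plus constraints.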

Content Analysis Method

  • Description: content analysis is a method of qualitative analysis that quantifies textual data to track themes across a document. It’s most common in academic fields and in social sciences, where written content is the subject of inquiry.
  • Importance: High. In a corporate setting, content analysis as such is less common. If anything, Naïve Bayes (a technique we’ll look at below) is the closest corporations come to text analysis. However, it is of the utmost importance for researchers. If you’re a researcher, check out this article on content analysis .
  • Nature of Data: data useful for content analysis is textual data.
  • Motive: the motive behind content analysis is to understand themes expressed in a large text.

Narrative Analysis Method

  • Description: narrative analysis is a method of qualitative analysis that quantifies stories to trace themes in them. It differs from content analysis because it focuses on stories rather than research documents, and the techniques used are slightly different from those in content analysis (very nuanced and outside the scope of this article).
  • Importance: Low. Unless you are highly specialized in working with stories, narrative analysis is rare.
  • Nature of Data: the nature of the data useful for the narrative analysis method is narrative text.
  • Motive: the motive for narrative analysis is to uncover hidden patterns in narrative text.

Discourse Analysis Method

  • Description: the discourse analysis method falls under qualitative analysis and uses thematic coding to trace patterns in real-life discourse. That said, real-life discourse is oral, so it must first be transcribed into text.
  • Importance: Low. Unless you are focused on understanding real-world idea sharing in a research setting, this kind of analysis is less common than the others on this list.
  • Nature of Data: the nature of data useful in discourse analysis is first audio files, then transcriptions of those audio files.
  • Motive: the motive behind discourse analysis is to trace patterns of real-world discussions. (As a spooky sidenote, have you ever felt like your phone microphone was listening to you and making reading suggestions? If it was, the method was discourse analysis.)

Framework Analysis Method

  • Description: the framework analysis method falls under qualitative analysis and uses similar thematic coding techniques to content analysis. However, where content analysis aims to discover themes, framework analysis starts with a framework and only considers elements that fall in its purview.
  • Importance: Low. As with the other textual analysis methods, framework analysis is less common in corporate settings. Even in the world of research, only some use it. Strangely, it’s very common for legislative and political research.
  • Nature of Data: the nature of data useful for framework analysis is textual.
  • Motive: the motive behind framework analysis is to understand what themes and parts of a text match your search criteria.

Grounded Theory Method

  • Description: the grounded theory method falls under qualitative analysis and uses thematic coding to build theories around those themes.
  • Importance: Low. Like other qualitative analysis techniques, grounded theory is less common in the corporate world. Even among researchers, you would be hard pressed to find many using it. Though powerful, it’s simply too rare to spend time learning.
  • Nature of Data: the nature of data useful in the grounded theory method is textual.
  • Motive: the motive of grounded theory method is to establish a series of theories based on themes uncovered from a text.

Clustering Technique: K-Means

  • Description: k-means is a clustering technique in which data points are grouped into clusters with the closest means. Though often treated as simple statistics rather than AI, it is an unsupervised learning algorithm that iteratively reevaluates clusters as data points are added. Clustering techniques can be used in diagnostic, descriptive, & prescriptive data analyses.
  • Importance: Very important. If you only take 3 things from this article, k-means clustering should be part of it. It is useful in any situation where n observations have multiple characteristics and we want to put them in groups.
  • Nature of Data: the nature of data is at least one characteristic per observation, but the more the merrier.
  • Motive: the motive for clustering techniques such as k-means is to group observations together and either understand or react to them.
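Here’s a minimal k-means sketch in pure Python, with hypothetical two-field observations and deterministic starting centers (real implementations choose starting centers more carefully and check for convergence):

```python
# Minimal k-means: assign each point to its nearest center, then move
# each center to the mean of its cluster, and repeat.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # squared Euclidean distance to each center
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # move each (non-empty) cluster's center to its mean
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters if c
        ]
    return centers, clusters

# Hypothetical observations with two fields (x, y)
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)  # two centers, one near each visible group
```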

Regression Technique

  • Description: simple and multivariable regressions use one independent variable, or a combination of several, to estimate the relationship with a single dependent variable by fitting constants (coefficients). Regressions are almost synonymous with correlation today.
  • Importance: Very high. Along with clustering, if you only take 3 things from this article, regression techniques should be part of it. They’re everywhere in corporate and research fields alike.
  • Nature of Data: the nature of data used in regressions is data sets with “n” observations and as many variables as are reasonable. It’s important, however, to distinguish between regression data and time series data: you cannot run regressions on time series data without accounting for time. The easier way is to use techniques under the forecasting method.
  • Motive: The motive behind regression techniques is to understand correlations between independent variable(s) and a dependent one.
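A minimal ordinary-least-squares sketch of simple regression, on a made-up perfectly linear data set so the fitted constants are easy to verify by eye:

```python
# Fit y = slope * x + intercept by ordinary least squares.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # hypothetical, perfectly linear data

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 0.0
```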

Naïve Bayes Technique

  • Description: Naïve Bayes is a classification technique that uses simple probability to classify items based on previous classifications. In plain English, the formula reads: “the chance that a thing with trait x belongs to class c equals the chance of seeing trait x within class c, multiplied by the overall chance of class c, divided by the overall chance of trait x.” As a formula, it’s P(c|x) = P(x|c) * P(c) / P(x).
  • Importance: High. Naïve Bayes is a very common, simple classification technique because it’s effective with large data sets and can be applied to any instance in which there is a class. Google, for example, might use it to group webpages for certain search engine queries.
  • Nature of Data: the nature of data for Naïve Bayes is at least one class and at least two traits in a data set.
  • Motive: the motive behind Naïve Bayes is to classify observations based on previous data. It’s thus considered part of predictive analysis.
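The formula above can be computed directly from class and trait counts. The spam/ham observations below are hypothetical:

```python
# P(c|x) = P(x|c) * P(c) / P(x)
# Hypothetical data: (class, has_trait) pairs, where the trait is
# "message contains the word 'offer'".
observations = [
    ("spam", True), ("spam", True), ("spam", True), ("spam", False),
    ("ham", True), ("ham", False), ("ham", False), ("ham", False),
    ("ham", False), ("ham", False),
]

total = len(observations)
p_c = sum(1 for c, _ in observations if c == "spam") / total  # P(c)
p_x = sum(1 for _, x in observations if x) / total            # P(x)
p_x_given_c = (sum(1 for c, x in observations if c == "spam" and x)
               / sum(1 for c, _ in observations if c == "spam"))  # P(x|c)

p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)  # 0.75: a message with "offer" is 75% likely spam
```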

Cohorts Technique

  • Description: cohorts technique is a type of clustering method used in behavioral sciences to separate users by common traits. As with clustering, it can be done intuitively or mathematically, the latter of which would simply be k-means.
  • Importance: Very high. Where k-means is granular, the cohort technique is its higher-level counterpart. In fact, most people are familiar with it as a part of Google Analytics. It’s most common in marketing departments in corporations, rather than in research.
  • Nature of Data: the nature of cohort data is data sets in which users are the observation and other fields are used as defining traits for each cohort.
  • Motive: the motive for cohort analysis techniques is to group similar users and analyze how well you retain them and how they churn.
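A minimal cohort sketch: users (the observations) are grouped by a shared trait, here a hypothetical signup month, and each cohort is then compared on retention:

```python
# Group users into cohorts by signup month, then compute the share
# of each cohort that is still active. All values are hypothetical.
users = [
    {"name": "ana", "signup": "2020-01", "active": True},
    {"name": "ben", "signup": "2020-01", "active": False},
    {"name": "cam", "signup": "2020-02", "active": True},
    {"name": "dia", "signup": "2020-02", "active": True},
]

cohorts = {}
for u in users:
    cohorts.setdefault(u["signup"], []).append(u)

retention = {
    month: sum(u["active"] for u in group) / len(group)
    for month, group in cohorts.items()
}
print(retention)  # {'2020-01': 0.5, '2020-02': 1.0}
```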

Factor Technique

  • Description: the factor analysis technique is a way of grouping many traits into a single factor to expedite analysis. For example, factors can be used as traits for Naïve Bayes classifications instead of more general fields.
  • Importance: High. While not commonly employed in corporations, factor analysis is hugely valuable. Good data analysts use it to simplify their projects and communicate them more clearly.
  • Nature of Data: the nature of data useful in factor analysis techniques is data sets with a large number of fields per observation.
  • Motive: the motive for using factor analysis techniques is to reduce the number of fields in order to more quickly analyze and communicate findings.

Linear Discriminants Technique

  • Description: linear discriminant analysis techniques are similar to regressions in that they use one or more independent variables to determine a dependent variable; however, the linear discriminant technique falls under the classification method since it uses traits as independent variables and class as the dependent variable. In this way, it is both a classifying method AND a predictive method.
  • Importance: High. Though the analyst world speaks of and uses linear discriminants less commonly, it’s a highly valuable technique to keep in mind as you progress in data analysis.
  • Nature of Data: the nature of data useful for the linear discriminant technique is data sets with many fields.
  • Motive: the motive for using linear discriminants is to classify observations that would otherwise be too complex for simple techniques like Naïve Bayes.

Exponential Smoothing Technique

  • Description: exponential smoothing is a technique falling under the forecasting method that uses a smoothing factor on prior data in order to predict future values. It can be linear or adjusted for seasonality. The basic principle behind exponential smoothing is to place a percent weight (a value between 0 and 1 called alpha) on more recent values in a series and a smaller percent weight on less recent values. The formula is f(x) = current period value * alpha + previous period value * (1 - alpha).
  • Importance: High. Most analysts still use the moving average technique (covered next) for forecasting because it’s easy to understand, even though it is less accurate than exponential smoothing. Good analysts will have exponential smoothing techniques in their pocket to increase the value of their forecasts.
  • Nature of Data: the nature of data useful for exponential smoothing is time series data . Time series data has time as part of its fields .
  • Motive: the motive for exponential smoothing is to forecast future values with a smoothing variable.
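The smoothing formula above, written out (note the parenthesized 1 - alpha weight on the previous smoothed value):

```python
# Simple exponential smoothing:
#   smoothed[t] = alpha * value[t] + (1 - alpha) * smoothed[t-1]
def exponential_smoothing(series, alpha):
    smoothed = [series[0]]  # seed with the first observed value
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

print(exponential_smoothing([10, 20, 30], alpha=0.5))  # [10, 15.0, 22.5]
```

A higher alpha reacts faster to recent values; a lower alpha smooths more aggressively.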

Moving Average Technique

  • Description: the moving average technique falls under the forecasting method and uses an average of recent values to predict future ones. For example, to predict rainfall in April, you would take the average of rainfall from January to March. It’s simple, yet highly effective.
  • Importance: Very high. While I’m personally not a huge fan of moving averages due to their simplistic nature and lack of consideration for seasonality, they’re the most common forecasting technique and therefore very important.
  • Nature of Data: the nature of data useful for moving averages is time series data .
  • Motive: the motive for moving averages is to predict future values in a simple, easy-to-communicate way.
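The rainfall example translates almost directly into code (values hypothetical, in mm):

```python
# Forecast the next value as the mean of the last `window` values.
def moving_average(series, window):
    return sum(series[-window:]) / window

rain = [30, 45, 60]  # hypothetical rainfall for Jan, Feb, Mar
print(moving_average(rain, window=3))  # 45.0 — the April forecast
```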

Neural Networks Technique

  • Description: neural networks are a highly complex artificial intelligence technique that replicate a human’s neural analysis through a series of hyper-rapid computations and comparisons that evolve in real time. This technique is so complex that an analyst must use computer programs to perform it.
  • Importance: Medium. While the potential for neural networks is theoretically unlimited, it’s still little understood and therefore uncommon. You do not need to know it by any means in order to be a data analyst.
  • Nature of Data: the nature of data useful for neural networks is data sets of astronomical size, meaning with 100s of 1000s of fields and the same number of rows at a minimum .
  • Motive: the motive for neural networks is to understand wildly complex phenomena and data in order to act on them thereafter.

Decision Tree Technique

  • Description: the decision tree technique uses artificial intelligence algorithms to rapidly calculate possible decision pathways and their outcomes on a real-time basis. It’s so complex that computer programs are needed to perform it.
  • Importance: Medium. As with neural networks, decision trees with AI are too little understood and are therefore uncommon in corporate and research settings alike.
  • Nature of Data: the nature of data useful for the decision tree technique is hierarchical data sets that show multiple optional fields for each preceding field.
  • Motive: the motive for decision tree techniques is to compute the optimal choices to make in order to achieve a desired result.

Evolutionary Programming Technique

  • Description: the evolutionary programming technique uses a series of neural networks, sees how well each one fits a desired outcome, and selects only the best to test and retest. It’s called evolutionary because it resembles the process of natural selection, weeding out weaker options.
  • Importance: Medium. As with the other AI techniques, evolutionary programming just isn’t well-understood enough to be usable in many cases. Its complexity also makes it hard to explain in corporate settings and difficult to defend in research settings.
  • Nature of Data: the nature of data in evolutionary programming is data sets of neural networks, or data sets of data sets.
  • Motive: the motive for using evolutionary programming is similar to decision trees: understanding the best possible option from complex data.

Fuzzy Logic Technique

  • Description: fuzzy logic is a type of computing based on “approximate truths” rather than simple truths such as “true” and “false.” It is essentially two tiers of classification. For example, to say whether “apples are good,” you first need to classify what “good” means (x, y, z). Only then can you say apples are good. Another way to see it: it helps a computer evaluate truth the way humans do: “definitely true, probably true, maybe true, probably false, definitely false.”
  • Importance: Medium. Like the other AI techniques, fuzzy logic is uncommon in both research and corporate settings, which means it’s less important in today’s world.
  • Nature of Data: the nature of fuzzy logic data is huge data tables that include other huge data tables with a hierarchy including multiple subfields for each preceding field.
  • Motive: the motive of fuzzy logic is to replicate human truth valuations in a computer in order to model human decisions based on past data. The obvious possible application is marketing.
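A minimal fuzzy-membership sketch: instead of a hard true/false, “good” is a degree between 0 and 1. The sweetness thresholds below are made up for illustration:

```python
# Map a sweetness score (0-10) to a degree of membership in the
# class "good apple", from 0.0 (definitely false) to 1.0 (definitely true).
# The thresholds 3 and 8 are hypothetical.
def goodness(sweetness):
    if sweetness <= 3:
        return 0.0               # definitely not good
    if sweetness >= 8:
        return 1.0               # definitely good
    return (sweetness - 3) / 5   # "maybe good", by degree

print(goodness(2), goodness(5.5), goodness(9))  # 0.0 0.5 1.0
```

Full fuzzy-logic systems combine many such membership functions with fuzzy AND/OR rules, but each starts from graded truths like this one.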

Text Analysis Technique

  • Description: text analysis techniques fall under the qualitative data analysis type and use text to extract insights.
  • Importance: Medium. Text analysis techniques, like all techniques under the qualitative analysis type, are most valuable for researchers.
  • Nature of Data: the nature of data useful in text analysis is words.
  • Motive: the motive for text analysis is to trace themes in a text across sets of very long documents, such as books.

Coding Technique

  • Description: the coding technique is used in textual analysis to turn ideas into uniform phrases and analyze the number of times and the ways in which those ideas appear. For this reason, some consider it a quantitative technique as well. You can learn more about coding and the other qualitative techniques here .
  • Importance: Very high. If you’re a researcher working in social sciences, coding is THE analysis technique, and for good reason. It’s a great way to add rigor to analysis. That said, it’s less common in corporate settings.
  • Nature of Data: the nature of data useful for coding is long text documents.
  • Motive: the motive for coding is to make tracing ideas on paper more than an exercise of the mind by quantifying them and understanding them through descriptive methods.
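A minimal coding sketch: varied phrasings of an idea map to uniform codes, which are then counted. The codebook and interview snippet are invented; in practice the researcher builds the codebook from the material itself:

```python
# Map phrases to uniform codes, then count occurrences in the text.
codebook = {
    "felt alone": "ISOLATION",
    "no one to talk to": "ISOLATION",
    "lost my job": "ECONOMIC_STRESS",
}

interview = "I felt alone after I lost my job. There was no one to talk to."

counts = {}
for phrase, code in codebook.items():
    counts[code] = counts.get(code, 0) + interview.lower().count(phrase)

print(counts)  # {'ISOLATION': 2, 'ECONOMIC_STRESS': 1}
```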

Idea Pattern Technique

  • Description: the idea pattern analysis technique fits into coding as the second step of the process. Once themes and ideas are coded, simple descriptive analysis tests may be run. Some people even cluster the ideas!
  • Importance: Very high. If you’re a researcher, idea pattern analysis is as important as the coding itself.
  • Nature of Data: the nature of data useful for idea pattern analysis is already coded themes.
  • Motive: the motive for the idea pattern technique is to trace ideas in otherwise unmanageably-large documents.

Word Frequency Technique

  • Description: word frequency is a qualitative technique that stands in opposition to coding and uses an inductive approach to locate specific words in a document in order to understand its relevance. Word frequency is essentially the descriptive analysis of qualitative data because it uses stats like mean, median, and mode to gather insights.
  • Importance: High. As with the other qualitative approaches, word frequency is very important in social science research, but less so in corporate settings.
  • Nature of Data: the nature of data useful for word frequency is long, informative documents.
  • Motive: the motive for word frequency is to locate target words to determine the relevance of a document in question.
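A minimal word-frequency sketch using the standard library (the sentence is invented):

```python
# Count how often each word appears in a text.
from collections import Counter
import re

text = "Data informs decisions. Good data informs good decisions."
words = re.findall(r"[a-z]+", text.lower())  # normalize and tokenize
freq = Counter(words)

print(freq["data"], freq["good"])  # 2 2
```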

Types of data analysis in research

Types of data analysis in research methodology include every item discussed in this article. As a list, they are:

  • Quantitative
  • Qualitative
  • Mathematical
  • Machine Learning and AI
  • Descriptive
  • Diagnostic
  • Predictive
  • Prescriptive
  • Clustering
  • Classification
  • Forecasting
  • Optimization
  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Framework analysis
  • Grounded theory
  • K-means clustering
  • Regression
  • Naïve Bayes
  • Cohorts
  • Factor analysis
  • Linear discriminants
  • Exponential smoothing
  • Moving average
  • Artificial Neural Networks
  • Decision Trees
  • Evolutionary Programming
  • Fuzzy Logic
  • Text analysis
  • Coding
  • Idea Pattern Analysis
  • Word Frequency Analysis

Types of data analysis in qualitative research

As a list, the types of data analysis in qualitative research are the following methods:

  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Framework analysis
  • Grounded theory

Types of data analysis in quantitative research

As a list, the types of data analysis in quantitative research are:

  • Descriptive
  • Diagnostic
  • Predictive
  • Prescriptive

Data analysis methods

As a list, data analysis methods are:

  • Content (qualitative)
  • Narrative (qualitative)
  • Discourse (qualitative)
  • Framework (qualitative)
  • Grounded theory (qualitative)

Quantitative data analysis methods

As a list, quantitative data analysis methods are:

  • Clustering
  • Classification
  • Forecasting
  • Optimization

Tabular View of Data Analysis Types, Methods, and Techniques

About the author.

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in a growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.



NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Qualitative study.

Steven Tenny ; Janelle M. Brannan ; Grace D. Brannan .


Last Update: September 18, 2022 .

  • Introduction

Qualitative research is a type of research that explores and provides deeper insights into real-world problems. [1] Instead of collecting numerical data points or intervening or introducing treatments as in quantitative research, qualitative research helps generate hypotheses and further investigate and understand quantitative data. Qualitative research gathers participants' experiences, perceptions, and behavior. It answers the hows and whys instead of how many or how much. It can be structured as a stand-alone study that relies purely on qualitative data, or as part of mixed-methods research that combines qualitative and quantitative data. This review introduces the reader to some basic concepts, definitions, terminology, and applications of qualitative research.

Qualitative research, at its core, asks open-ended questions whose answers are not easily put into numbers, such as 'how' and 'why'. [2] Because its research questions are open-ended, qualitative research design is often not linear in the way quantitative design is. [2] One of the strengths of qualitative research is its ability to explain processes and patterns of human behavior that can be difficult to quantify. [3] Phenomena such as experiences, attitudes, and behaviors are difficult to capture accurately in numbers, whereas a qualitative approach allows participants themselves to explain how, why, or what they were thinking, feeling, and experiencing at a certain time or during an event of interest. Quantifying qualitative data is certainly possible, but qualitative work looks for themes and patterns that can be difficult to quantify, and it is important to ensure that the context and narrative of qualitative work are not lost by trying to quantify something that is not meant to be quantified.

However, while qualitative research is sometimes placed in opposition to quantitative research, as though the two approaches and their associated philosophical paradigms must 'compete' against each other, they are not opposites, and they are certainly not mutually exclusive. [4] For instance, qualitative research can help expand and deepen understanding of data or results obtained from quantitative analysis. Say a quantitative analysis has determined that there is a correlation between length of stay and level of patient satisfaction; qualitative work can then explore why this correlation exists. This dual-focus scenario shows one way in which qualitative and quantitative research can be integrated.

Examples of Qualitative Research Approaches

Ethnography

Ethnography as a research design has its origins in social and cultural anthropology, and involves the researcher being directly immersed in the participant’s environment. [2] Through this immersion, the ethnographer can use a variety of data collection techniques with the aim of being able to produce a comprehensive account of the social phenomena that occurred during the research period. [2] That is to say, the researcher’s aim with ethnography is to immerse themselves into the research population and come out of it with accounts of actions, behaviors, events, etc. through the eyes of someone involved in the population. Direct involvement of the researcher with the target population is one benefit of ethnographic research because it can then be possible to find data that is otherwise very difficult to extract and record.

Grounded Theory

Grounded Theory is the “generation of a theoretical model through the experience of observing a study population and developing a comparative analysis of their speech and behavior.” [5] As opposed to quantitative research which is deductive and tests or verifies an existing theory, grounded theory research is inductive and therefore lends itself to research that is aiming to study social interactions or experiences. [3] [2] In essence, Grounded Theory’s goal is to explain for example how and why an event occurs or how and why people might behave a certain way. Through observing the population, a researcher using the Grounded Theory approach can then develop a theory to explain the phenomena of interest.

Phenomenology

Phenomenology is defined as the “study of the meaning of phenomena or the study of the particular”. [5] At first glance, it might seem that Grounded Theory and Phenomenology are quite similar, but upon careful examination, the differences can be seen. At its core, phenomenology looks to investigate experiences from the perspective of the individual. [2] Phenomenology is essentially looking into the ‘lived experiences’ of the participants and aims to examine how and why participants behaved a certain way, from their perspective. Herein lies one of the main differences between Grounded Theory and Phenomenology. Grounded Theory aims to develop a theory for social phenomena through an examination of various data sources whereas Phenomenology focuses on describing and explaining an event or phenomena from the perspective of those who have experienced it.

Narrative Research

One of qualitative research’s strengths lies in its ability to tell a story, often from the perspective of those directly involved in it. Reporting on qualitative research involves including details and descriptions of the setting involved and quotes from participants. This detail is called ‘thick’ or ‘rich’ description and is a strength of qualitative research. Narrative research is rife with the possibilities of ‘thick’ description as this approach weaves together a sequence of events, usually from just one or two individuals, in the hopes of creating a cohesive story, or narrative. [2] While it might seem like a waste of time to focus on such a specific, individual level, understanding one or two people’s narratives for an event or phenomenon can help to inform researchers about the influences that helped shape that narrative. The tension or conflict of differing narratives can be “opportunities for innovation”. [2]

Research Paradigm

Research paradigms are the assumptions, norms, and standards that underpin different approaches to research. Essentially, research paradigms are the ‘worldview’ that informs research. [4] It is valuable for researchers, both qualitative and quantitative, to understand what paradigm they are working within, because understanding the theoretical basis of research paradigms allows researchers to understand the strengths and weaknesses of the approach being used and adjust accordingly. Different paradigms have different ontologies and epistemologies. Ontology is defined as the "assumptions about the nature of reality” whereas epistemology is defined as the “assumptions about the nature of knowledge” that inform the work researchers do. [2] It is important to understand the ontological and epistemological foundations of the research paradigm researchers are working within to allow for a full understanding of the approach being used and the assumptions that underpin it as a whole. Further, it is crucial that researchers understand their own ontological and epistemological assumptions about the world in general, because those assumptions will necessarily impact how they interact with research. A discussion of the research paradigm is not complete without describing positivist, postpositivist, and constructivist philosophies.

Positivist vs Postpositivist

To further understand qualitative research, we need to discuss positivist and postpositivist frameworks. Positivism is the philosophy that the scientific method can and should be applied to the social as well as the natural sciences. [4] Essentially, positivist thinking insists that the social sciences should use natural science methods in their research, which stems from the positivist ontology that there is an objective reality fully independent of our individual perceptions of the world. Quantitative research is rooted in positivist philosophy, which can be seen in the value it places on concepts such as causality, generalizability, and replicability.

Conversely, postpositivists argue that social reality can never be one hundred percent explained, only approximated. [4] Indeed, qualitative researchers have insisted that there are “fundamental limits to the extent to which the methods and procedures of the natural sciences could be applied to the social world”, and postpositivist philosophy is therefore often associated with qualitative research. [4] An example of positivist versus postpositivist values in research might be that positivist philosophies value hypothesis-testing, whereas postpositivist philosophies value the ability to formulate a substantive theory.

Constructivist

Constructivism is a subcategory of postpositivism. Most researchers invested in postpositivist research are constructivist as well, meaning they think there is no objective external reality but rather that reality is constructed. Constructivism is a theoretical lens that emphasizes the dynamic nature of our world. “Constructivism contends that individuals’ views are directly influenced by their experiences, and it is these individual experiences and views that shape their perspective of reality”. [6] Essentially, constructivist thought focuses on how ‘reality’ is not a fixed certainty, and experiences, interactions, and backgrounds give people a unique view of the world. Constructivism contends, unlike positivist views, that there is not necessarily an ‘objective’ reality we all experience. This is the ‘relativist’ ontological view that reality and the world we live in are dynamic and socially constructed. Therefore, qualitative scientific knowledge can be inductive as well as deductive. [4]

So why is it important to understand the differences in assumptions that different philosophies and approaches to research have? Fundamentally, the assumptions underpinning the research tools a researcher selects provide an overall base for the assumptions the rest of the research will have and can even change the role of the researcher themselves. [2] For example, is the researcher an ‘objective’ observer such as in positivist quantitative work? Or is the researcher an active participant in the research itself, as in postpositivist qualitative work? Understanding the philosophical base of the research undertaken allows researchers to fully understand the implications of their work and their role within the research, as well as reflect on their own positionality and bias as it pertains to the research they are conducting.

Data Sampling 

The better the sample represents the intended study population, the more likely the researcher is to encompass the varying factors at play. The following are examples of participant sampling and selection: [7]

  • Purposive sampling - selection based on the researcher’s rationale in terms of being the most informative.
  • Criterion sampling - selection based on pre-identified factors.
  • Convenience sampling - selection based on availability.
  • Snowball sampling - selection by referral from other participants or people who know potential participants.
  • Extreme case sampling - targeted selection of rare cases.
  • Typical case sampling - selection based on regular or average participants.
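As an illustration, two of these strategies can be sketched in a few lines of Python. The participant pool, the smoking criterion, and the referral links below are entirely hypothetical and exist only to show the mechanics:

```python
# Hypothetical participant pool: each person has a pre-identified factor
# (smoker or not) and a list of people they could refer.
pool = {
    "Ana": {"smoker": True,  "refers": ["Ben", "Cal"]},
    "Ben": {"smoker": True,  "refers": ["Dee"]},
    "Cal": {"smoker": False, "refers": []},
    "Dee": {"smoker": True,  "refers": []},
    "Eli": {"smoker": False, "refers": ["Ana"]},
}

def criterion_sample(pool, criterion):
    """Criterion sampling: keep everyone matching a pre-identified factor."""
    return [name for name, info in pool.items() if criterion(info)]

def snowball_sample(pool, seeds):
    """Snowball sampling: start with a few participants and follow referrals."""
    selected, frontier = set(), list(seeds)
    while frontier:
        person = frontier.pop()
        if person in selected or person not in pool:
            continue
        selected.add(person)
        frontier.extend(pool[person]["refers"])
    return selected

smokers = criterion_sample(pool, lambda info: info["smoker"])
chain = snowball_sample(pool, seeds=["Ana"])
```

Starting from one seed participant, the snowball sample grows along the referral chain, which is why this strategy suits hard-to-reach populations.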

Data Collection and Analysis

Qualitative research uses several techniques, including interviews, focus groups, and observation. [1] [2] [3] Interviews may be unstructured, with open-ended questions on a topic, where the interviewer adapts to the responses. Structured interviews have a predetermined number of questions that every participant is asked. Interviews are usually conducted one on one and are appropriate for sensitive topics or topics needing in-depth exploration. Focus groups are often held with 8-12 target participants and are used when group dynamics and collective views on a topic are desired. Researchers can be a participant-observer to share the experiences of the subject, or a non-participant or detached observer.

While quantitative research design prescribes a controlled environment for data collection, qualitative data collection may be in a central location or in the environment of the participants, depending on the study goals and design. Qualitative research can generate a large amount of data. Data are transcribed and may then be coded manually or with the use of Computer Assisted Qualitative Data Analysis Software (CAQDAS) such as ATLAS.ti or NVivo. [8] [9] [10]

After the coding process, qualitative research results can take various formats: a synthesis and interpretation presented with excerpts from the data, [11] or themes and theory or model development.
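As a rough illustration of what coding produces, the sketch below tallies hand-assigned codes over a few invented interview excerpts. Real CAQDAS packages such as ATLAS.ti or NVivo manage this process at much larger scale; the excerpts and code labels here are hypothetical:

```python
from collections import Counter

# Hypothetical coded excerpts: each quote has been manually tagged
# with one or more codes by the researcher.
coded_excerpts = [
    ("I started because my friends all smoked.", ["peer pressure"]),
    ("It made me look cool, I guess.", ["image", "peer pressure"]),
    ("I was worried about my health.", ["health"]),
    ("Cigarettes got too expensive.", ["cost"]),
    ("Everyone at the park was doing it.", ["peer pressure"]),
]

def tally_codes(excerpts):
    """Count how often each code appears across all excerpts,
    a first step toward identifying candidate themes."""
    counts = Counter()
    for _, codes in excerpts:
        counts.update(codes)
    return counts

theme_counts = tally_codes(coded_excerpts)
# The most frequent code suggests a dominant candidate theme.
dominant = theme_counts.most_common(1)[0][0]
```

Counting codes is only a starting point; the researcher still interprets the excerpts behind each code so that context and narrative are not lost.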

Dissemination

To standardize and facilitate the dissemination of qualitative research outcomes, the healthcare team can use two reporting standards. The Consolidated Criteria for Reporting Qualitative Research or COREQ is a 32-item checklist for interviews and focus groups. [12] The Standards for Reporting Qualitative Research (SRQR) is a checklist covering a wider range of qualitative research. [13]

Examples of Application

Many times a research question will start with qualitative research, which helps generate the research hypothesis that can then be tested with quantitative methods. After the data are collected and analyzed with quantitative methods, qualitative methods can be used to dive deeper into the data for a better understanding of what the numbers truly mean and their implications. The qualitative methods can then help clarify the quantitative data and refine the hypothesis for future research. Furthermore, qualitative research lets researchers explore subjects that are poorly studied with quantitative methods, including opinions, individuals' actions, and social science research.

A good qualitative study design starts with a goal or objective. This should be clearly defined or stated. The target population needs to be specified. A method for obtaining information from the study population must be carefully detailed to ensure there are no omissions of part of the target population. A proper collection method should be selected which will help obtain the desired information without overly limiting the collected data because many times, the information sought is not well compartmentalized or obtained. Finally, the design should ensure adequate methods for analyzing the data. An example may help better clarify some of the various aspects of qualitative research.

A researcher wants to decrease the number of teenagers who smoke in their community. The researcher could begin by asking current teen smokers why they started smoking through structured or unstructured interviews (qualitative research). The researcher can also get together a group of current teenage smokers and conduct a focus group to help brainstorm factors that may have prevented them from starting to smoke (qualitative research).

In this example, the researcher has used qualitative research methods (interviews and focus groups) to generate a list of ideas of both why teens start to smoke as well as factors that may have prevented them from starting to smoke. Next, the researcher compiles this data. The research found that, hypothetically, peer pressure, health issues, cost, being considered “cool,” and rebellious behavior all might increase or decrease the likelihood of teens starting to smoke.

The researcher creates a survey asking teen participants to rank how important each of the above factors is in either starting smoking (for current smokers) or not smoking (for current non-smokers). This survey provides specific numbers (ranked importance of each factor) and is thus a quantitative research tool.
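A minimal sketch of how such ranked survey data might be aggregated follows; the responses are invented for illustration, with rank 1 meaning most important:

```python
from statistics import mean

# Hypothetical survey responses: each teen ranks five factors
# from 1 (most important) to 5 (least important).
factors = ["peer pressure", "health", "cost", "image", "rebellion"]
responses = [
    {"peer pressure": 1, "health": 2, "cost": 4, "image": 3, "rebellion": 5},
    {"peer pressure": 1, "health": 3, "cost": 5, "image": 2, "rebellion": 4},
    {"peer pressure": 2, "health": 1, "cost": 3, "image": 4, "rebellion": 5},
]

def mean_ranks(responses, factors):
    """Average the rank each factor received; lower means more important."""
    return {f: mean(r[f] for r in responses) for f in factors}

ranks = mean_ranks(responses, factors)
# The factor with the lowest mean rank is the highest-ranked overall.
top_factor = min(ranks, key=ranks.get)
```

The researcher would then focus follow-up qualitative work on the one or two highest-ranked factors, as described in the next paragraphs.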

The researcher can use the results of the survey to focus efforts on the one or two highest-ranked factors. Let us say the researcher found that health was the major factor that keeps teens from starting to smoke, and peer pressure was the major factor that contributed to teens to start smoking. The researcher can go back to qualitative research methods to dive deeper into each of these for more information. The researcher wants to focus on how to keep teens from starting to smoke, so they focus on the peer pressure aspect.

The researcher can conduct interviews and/or focus groups (qualitative research) about what types and forms of peer pressure are commonly encountered, where the peer pressure comes from, and where smoking first starts. The researcher hypothetically finds that peer pressure often occurs after school at the local teen hangouts, mostly the local park. The researcher also hypothetically finds that peer pressure comes from older, current smokers who provide the cigarettes.

The researcher could further explore this observation made at the local teen hangouts (qualitative research) and take notes regarding who is smoking, who is not, and what observable factors are at play for peer pressure of smoking. The researcher finds a local park where many local teenagers hang out and see that a shady, overgrown area of the park is where the smokers tend to hang out. The researcher notes the smoking teenagers buy their cigarettes from a local convenience store adjacent to the park where the clerk does not check identification before selling cigarettes. These observations fall under qualitative research.

If the researcher returns to the park and counts how many individuals smoke in each region of the park, this numerical data would be quantitative research. Based on the researcher's efforts thus far, they conclude that local teen smoking and teenagers who start to smoke may decrease if there are fewer overgrown areas of the park and the local convenience store does not sell cigarettes to underage individuals.

The researcher could try to have the parks department reassess the shady areas to make them less conducive to the smokers or identify how to limit the sales of cigarettes to underage individuals by the convenience store. The researcher would then cycle back to qualitative methods of asking at-risk population their perceptions of the changes, what factors are still at play, as well as quantitative research that includes teen smoking rates in the community, the incidence of new teen smokers, among others. [14] [15]

Qualitative research functions as a standalone research design or in combination with quantitative research to enhance our understanding of the world. Qualitative research uses techniques including structured and unstructured interviews, focus groups, and participant observation to not only help generate hypotheses which can be more rigorously tested with quantitative research but also to help researchers delve deeper into the quantitative research numbers, understand what they mean, and understand what the implications are.  Qualitative research provides researchers with a way to understand what is going on, especially when things are not easily categorized. [16]

  • Issues of Concern

As discussed in the sections above, quantitative and qualitative work differ in many ways, including the criteria for evaluating them. There are four well-established criteria for evaluating quantitative data: internal validity, external validity, reliability, and objectivity. The correlating concepts in qualitative research are credibility, transferability, dependability, and confirmability. [4] [11] The corresponding quantitative and qualitative concepts can be seen below, with the quantitative concept on the left and the qualitative concept on the right:

  • Internal validity--- Credibility
  • External validity---Transferability
  • Reliability---Dependability
  • Objectivity---Confirmability

In conducting qualitative research, ensuring these concepts are satisfied and well thought out can prevent potential issues from arising. For example, just as a researcher will ensure that their quantitative study is internally valid, so should qualitative researchers ensure that their work has credibility.

Indicators such as triangulation and peer examination can help evaluate the credibility of qualitative work.

  • Triangulation: Triangulation involves using multiple methods of data collection to increase the likelihood of getting a reliable and accurate result. For example, in a hypothetical study of a magic act, the result would be more reliable if the researcher also interviewed the magician, the back-stage hand, and the person who "vanished." In qualitative research, triangulation can include using telephone surveys, in-person surveys, focus groups, and interviews, as well as surveying an adequate cross-section of the target demographic.
  • Peer examination: Results can be reviewed by a peer to ensure the data is consistent with the findings.

‘Thick’ or ‘rich’ description can be used to evaluate the transferability of qualitative research whereas using an indicator such as an audit trail might help with evaluating the dependability and confirmability.

  • Thick or rich description is a detailed and thorough description of details, the setting, and quotes from participants in the research. [5] Thick descriptions will include a detailed explanation of how the study was carried out. Thick descriptions are detailed enough to allow readers to draw conclusions and interpret the data themselves, which can help with transferability and replicability.
  • Audit trail: An audit trail provides a documented set of steps of how the participants were selected and the data was collected. The original records of information should also be kept (e.g., surveys, notes, recordings).

One issue of concern that qualitative researchers should take into consideration is observation bias. Here are a few examples:

  • Hawthorne effect: The Hawthorne effect is the change in participant behavior when they know they are being observed. If a researcher was wanting to identify factors that contribute to employee theft and tells the employees they are going to watch them to see what factors affect employee theft, one would suspect employee behavior would change when they know they are being watched.
  • Observer-expectancy effect: Some participants change their behavior or responses to satisfy the researcher's desired effect. This often happens unconsciously, so it is important to eliminate or limit the transmission of the researcher's views.
  • Artificial scenario effect: Some qualitative research occurs in artificial scenarios and/or with preset goals. In such situations, the information may not be accurate because of the artificial nature of the scenario. The preset goals may limit the qualitative information obtained.
  • Clinical Significance

Qualitative research by itself or combined with quantitative research helps healthcare providers understand patients and the impact and challenges of the care they deliver. Qualitative research provides an opportunity to generate and refine hypotheses and delve deeper into the data generated by quantitative research. Qualitative research does not exist as an island apart from quantitative research, but as an integral part of research methods to be used for the understanding of the world around us. [17]

  • Enhancing Healthcare Team Outcomes

Qualitative research is important for all members of the health care team, as all are affected by it. Qualitative research may help develop a theory or a model for health research that can be further explored by quantitative research. Much of qualitative research data acquisition is completed by numerous team members, including social workers, scientists, and nurses. Within each area of the medical field, there is copious ongoing qualitative research, including physician-patient interactions, nursing-patient interactions, patient-environment interactions, health care team function, and patient information delivery.


Disclosure: Steven Tenny declares no relevant financial relationships with ineligible companies.

Disclosure: Janelle Brannan declares no relevant financial relationships with ineligible companies.

Disclosure: Grace Brannan declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Tenny S, Brannan JM, Brannan GD. Qualitative Study. [Updated 2022 Sep 18]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Research Methods In Psychology

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

Research methods in psychology are systematic procedures used to observe, describe, predict, and explain behavior and mental processes. They include experiments, surveys, case studies, and naturalistic observations, ensuring data collection is objective and reliable to understand and explain psychological phenomena.


Hypotheses are statements predicting the results of a study that can be verified or disproved by investigation.

There are four types of hypotheses :
  • Null hypotheses (H0) – these predict that no difference will be found in the results between the conditions. Typically these are written ‘There will be no difference…’
  • Alternative hypotheses (Ha or H1) – these predict that there will be a significant difference in the results between the two conditions. This is also known as the experimental hypothesis.
  • One-tailed (directional) hypotheses – these state the specific direction the researcher expects the results to move in, e.g. higher, lower, more, less. In a correlation study, the predicted direction of the correlation can be either positive or negative.
  • Two-tailed (non-directional) hypotheses – these state that a difference will be found between the conditions of the independent variable but do not state the direction of the difference or relationship. Typically these are written ‘There will be a difference…’

All research has an alternative hypothesis (either a one-tailed or two-tailed) and a corresponding null hypothesis.

Once the research is conducted and the results are analyzed, psychologists must decide between the two hypotheses.

So, if a significant difference is found, the psychologist rejects the null hypothesis and accepts the alternative; if no difference is found, the null hypothesis is retained.
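One standard way to make this accept-or-reject decision concrete is a permutation test, sketched below with made-up scores for two conditions; under the null hypothesis the group labels are exchangeable, so we shuffle them and see how often a difference at least as large as the observed one appears by chance:

```python
import random
from statistics import mean

random.seed(0)  # deterministic for the example

# Hypothetical scores from two conditions of an experiment.
group_a = [12, 15, 14, 16, 18, 17]
group_b = [10, 11, 12, 9, 13, 11]

def permutation_test(a, b, n_perms=5000):
    """Two-tailed permutation test: shuffle the pooled scores and count
    how often the shuffled mean difference is at least as extreme as
    the observed one. The fraction of such shuffles estimates p."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_perms):
        random.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perms

p_value = permutation_test(group_a, group_b)
# A small p-value (conventionally below 0.05) leads us to reject the
# null hypothesis and accept the alternative.
reject_null = p_value < 0.05
```

For a one-tailed (directional) hypothesis, the same sketch would count only shuffles where the signed difference moves in the predicted direction, rather than taking the absolute value.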

Sampling techniques

Sampling is the process of selecting a representative group from the population under study.

Sample Target Population

A sample is the participants you select from a target population (the group you are interested in) to make generalizations about.

Representative means the extent to which a sample mirrors a researcher’s target population and reflects its characteristics.

Generalisability means the extent to which research findings can be applied to the larger population from which the sample was drawn.

  • Volunteer sample : where participants pick themselves through newspaper adverts, noticeboards or online.
  • Opportunity sampling : also known as convenience sampling , uses people who are available at the time the study is carried out and willing to take part. It is based on convenience.
  • Random sampling : when every person in the target population has an equal chance of being selected. An example of random sampling would be picking names out of a hat.
  • Systematic sampling : when a system is used to select participants. Picking every Nth person from all possible participants. N = the number of people in the research population / the number of people needed for the sample.
  • Stratified sampling : when you identify the subgroups and select participants in proportion to their occurrences.
  • Snowball sampling : when researchers find a few participants, and then ask them to find participants themselves and so on.
  • Quota sampling : when researchers will be told to ensure the sample fits certain quotas, for example they might be told to find 90 participants, with 30 of them being unemployed.
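A few of these techniques can be sketched in Python; the population of 20 people and the employed/unemployed split below are hypothetical:

```python
import random

random.seed(42)  # deterministic for the example

# Hypothetical target population of 20 people, half employed.
population = [f"P{i:02d}" for i in range(20)]
employment = {p: ("employed" if i < 10 else "unemployed")
              for i, p in enumerate(population)}

def random_sample(pop, k):
    """Random sampling: every person has an equal chance of selection."""
    return random.sample(pop, k)

def systematic_sample(pop, k):
    """Systematic sampling: pick every Nth person, N = len(pop) // k."""
    n = len(pop) // k
    return pop[::n][:k]

def stratified_sample(pop, groups, k):
    """Stratified sampling: sample each subgroup in proportion to its size."""
    strata, sample = {}, []
    for p in pop:
        strata.setdefault(groups[p], []).append(p)
    for members in strata.values():
        share = round(k * len(members) / len(pop))
        sample.extend(random.sample(members, share))
    return sample

rand_pick = random_sample(population, 5)
sys_pick = systematic_sample(population, 5)          # every 4th person
strat_pick = stratified_sample(population, employment, 6)
```

Because half the hypothetical population is employed, the stratified sample of 6 contains exactly 3 employed and 3 unemployed people, mirroring the population's proportions.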

Experiments always have an independent and dependent variable .

  • The independent variable is the one the experimenter manipulates (the thing that changes between the conditions the participants are placed into). It is assumed to have a direct effect on the dependent variable.
  • The dependent variable is the thing being measured, or the results of the experiment.

variables

Operationalization of variables means making them measurable/quantifiable. We must use operationalization to ensure that variables are in a form that can be easily tested.

For instance, we can’t really measure ‘happiness’, but we can measure how many times a person smiles within a two-hour period. 

By operationalizing variables, we make it easy for someone else to replicate our research. Remember, this is important because we can check if our findings are reliable.
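As a toy illustration of operationalization, the sketch below counts smiles in a hypothetical observation log as the stand-in, measurable definition of happiness described above:

```python
# Hypothetical event log from a two-hour observation session:
# (participant, observed event). 'Happiness' is operationalized
# as the number of 'smile' events recorded per participant.
observations = [
    ("p1", "smile"), ("p1", "frown"), ("p2", "smile"),
    ("p1", "smile"), ("p2", "smile"), ("p2", "smile"),
]

def happiness_score(log, participant):
    """Operationalized measure: count of 'smile' events for a participant."""
    return sum(1 for who, event in log
               if who == participant and event == "smile")

score_p1 = happiness_score(observations, "p1")
score_p2 = happiness_score(observations, "p2")
```

Because the measure is an explicit counting rule, another researcher can apply exactly the same rule to their own observation log, which is what makes the finding replicable.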

Extraneous variables are all variables which are not the independent variable but could affect the results of the experiment.

It can be a natural characteristic of the participant, such as intelligence levels, gender, or age for example, or it could be a situational feature of the environment such as lighting or noise.

Demand characteristics are a type of extraneous variable that occurs when participants work out the aims of the research study and begin to behave in the way they think is expected.

For example, in Milgram’s research , critics argued that participants worked out that the shocks were not real and they administered them as they thought this was what was required of them. 

Extraneous variables must be controlled so that they do not affect (confound) the results.

Randomly allocating participants to their conditions or using a matched pairs experimental design can help to reduce participant variables. 

Situational variables are controlled by using standardized procedures, ensuring every participant in a given condition is treated in the same way.

Experimental Design

Experimental design refers to how participants are allocated to each condition of the independent variable, such as a control or experimental group.
  • Independent design ( between-groups design ): each participant is selected for only one group. With the independent design, the most common way of deciding which participants go into which group is by means of randomization. 
  • Matched participants design : each participant is selected for only one group, but the participants in the two groups are matched for some relevant factor or factors (e.g. ability; sex; age).
  • Repeated measures design ( within groups) : each participant appears in both groups, so that there are exactly the same participants in each group.
  • The main problem with the repeated measures design is that there may well be order effects. Their experiences during the experiment may change the participants in various ways.
  • They may perform better when they appear in the second group because they have gained useful information about the experiment or about the task. On the other hand, they may perform less well on the second occasion because of tiredness or boredom.
  • Counterbalancing is the best way of preventing order effects from disrupting the findings of an experiment, and involves ensuring that each condition is equally likely to be used first and second by the participants.
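The counterbalancing allocation described above can be sketched as a simple alternating assignment — half the participants take condition A first, half condition B first (the participant labels are hypothetical):

```python
def counterbalance(participants):
    """Alternate condition order so each order (A-then-B, B-then-A)
    is used first by half of the participants."""
    schedule = []
    for i, p in enumerate(participants):
        order = ["A", "B"] if i % 2 == 0 else ["B", "A"]
        schedule.append((p, order))
    return schedule

schedule = counterbalance(["P1", "P2", "P3", "P4"])
# P1 and P3 take condition A first; P2 and P4 take condition B first,
# so any order effect is spread equally across both conditions.
```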

If we wish to compare two groups with respect to a given independent variable, it is essential to make sure that the two groups do not differ in any other important way. 

Experimental Methods

All experimental methods involve an IV (independent variable) and a DV (dependent variable).

  • Field experiments are conducted in the everyday (natural) environment of the participants. The experimenter still manipulates the IV, but in a real-life setting. It may be possible to control extraneous variables, though such control is more difficult than in a lab experiment.
  • Natural experiments are when a naturally occurring IV is investigated that isn’t deliberately manipulated, it exists anyway. Participants are not randomly allocated, and the natural event may only occur rarely.

Case studies are in-depth investigations of a person, group, event, or community. It uses information from a range of sources, such as from the person concerned and also from their family and friends.

Many techniques may be used such as interviews, psychological tests, observations and experiments. Case studies are generally longitudinal: in other words, they follow the individual or group over an extended period of time. 

Case studies are widely used in psychology and among the best-known ones carried out were by Sigmund Freud . He conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.

Case studies provide rich qualitative data and have high levels of ecological validity. However, it is difficult to generalize from individual cases as each one has unique characteristics.

Correlational Studies

Correlation means association; it is a measure of the extent to which two variables are related. One of the variables can be regarded as the predictor variable with the other one as the outcome variable.

Correlational studies typically involve obtaining two different measures from a group of participants, and then assessing the degree of association between the measures. 

The predictor variable can be seen as occurring before the outcome variable in some sense. It is called the predictor variable, because it forms the basis for predicting the value of the outcome variable.

Relationships between variables can be displayed on a graph or as a numerical score called a correlation coefficient.


  • If an increase in one variable tends to be associated with an increase in the other, then this is known as a positive correlation .
  • If an increase in one variable tends to be associated with a decrease in the other, then this is known as a negative correlation .
  • A zero correlation occurs when there is no relationship between variables.

After looking at the scattergraph, if we want to be sure that a significant relationship does exist between the two variables, a statistical test of correlation can be conducted, such as Spearman’s rho.

The test will give us a score, called a correlation coefficient . This is a value between -1 and +1, and the closer the score is to -1 or +1, the stronger the relationship between the variables. The coefficient can be positive, e.g. 0.63, or negative, e.g. -0.63.
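Spearman's rho is Pearson's correlation applied to the ranks of the scores rather than the raw scores. A minimal stdlib-only Python sketch (assuming no tied scores):

```python
def rank(scores):
    """Rank scores from 1 (lowest) upward; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly ordered pairs give rho = +1; perfectly reversed give -1.
print(round(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]), 3))  # → 1.0
```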


A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.

Correlation does not prove causation, as a third variable may be involved. 


Interview Methods

Interviews are commonly divided into two types: structured and unstructured.

In a structured interview, a fixed, predetermined set of questions is put to every participant in the same order and in the same way. 

Responses are recorded on a questionnaire, and the researcher presets the order and wording of questions, and sometimes the range of alternative answers.

The interviewer stays within their role and maintains social distance from the interviewee.

In an unstructured interview, there are no set questions; the participant can raise whatever topics they feel are relevant and discuss them in their own way. The interviewer poses follow-up questions in response to the participant's answers.

Unstructured interviews are most useful in qualitative research to analyze attitudes and values.

Though they rarely provide a valid basis for generalization, their main advantage is that they enable the researcher to probe social actors’ subjective point of view. 

Questionnaire Method

Questionnaires can be thought of as a kind of written interview. They can be carried out face to face, by telephone, or post.

The choice of questions is important because of the need to avoid bias or ambiguity in the questions, ‘leading’ the respondent or causing offense.

  • Open questions are designed to encourage a full, meaningful answer using the subject’s own knowledge and feelings. They provide insights into feelings, opinions, and understanding. Example: “How do you feel about that situation?”
  • Closed questions can be answered with a simple “yes” or “no” or specific information, limiting the depth of response. They are useful for gathering specific facts or confirming details. Example: “Do you feel anxious in crowds?”

A questionnaire’s other practical advantages are that it is cheaper than face-to-face interviews and can be used to contact many respondents scattered over a wide area relatively quickly.

Observations

There are different types of observation methods :
  • Covert observation is where the researcher doesn’t tell the participants they are being observed until after the study is complete. This method raises ethical problems around deception and consent.
  • Overt observation is where a researcher tells the participants they are being observed and what they are being observed for.
  • Controlled : behavior is observed under controlled laboratory conditions (e.g., Bandura’s Bobo doll study).
  • Natural : Here, spontaneous behavior is recorded in a natural setting.
  • Participant : Here, the observer has direct contact with the group of people they are observing. The researcher becomes a member of the group they are researching.  
  • Non-participant (aka “fly on the wall”): The researcher does not have direct contact with the people being observed; participants’ behavior is observed from a distance.

Pilot Study

A pilot study is a small-scale preliminary study conducted in order to evaluate the feasibility of the key steps in a future, full-scale project.

A pilot study is an initial run-through of the procedures to be used in an investigation; it involves selecting a few people and trying out the study on them. It is possible to save time, and in some cases, money, by identifying any flaws in the procedures designed by the researcher.

A pilot study can help the researcher spot any ambiguities or confusion in the information given to participants, or problems with the task devised.

Sometimes the task is too hard, and the researcher may get a floor effect, because none of the participants can score at all or can complete the task – all performances are low.

The opposite effect is a ceiling effect, when the task is so easy that all achieve virtually full marks or top performances and are “hitting the ceiling”.

Research Design

In cross-sectional research , a researcher compares multiple segments of the population at the same time.

Sometimes, we want to see how people change over time, as in studies of human development and lifespan. Longitudinal research is a research design in which data-gathering is administered repeatedly over an extended period of time.

In cohort studies , the participants must share a common factor or characteristic such as age, demographic, or occupation. A cohort study is a type of longitudinal study in which researchers monitor and observe a chosen population over an extended period.

Triangulation means using more than one research method to improve the study’s validity.

Reliability

Reliability is a measure of consistency: if a particular measurement is repeated and the same result is obtained, then it is described as being reliable.

  • Test-retest reliability :  assessing the same person on two different occasions which shows the extent to which the test produces the same answers.
  • Inter-observer reliability : the extent to which there is an agreement between two or more observers.

Meta-Analysis

A meta-analysis is a systematic review that involves identifying an aim and then searching for research studies that have addressed similar aims/hypotheses.

This is done by looking through various databases, and then decisions are made about what studies are to be included/excluded.

Strengths: Increases the conclusions’ validity as they’re based on a wider range of evidence.

Weaknesses: Research designs in studies can vary, so they are not truly comparable.

Peer Review

A researcher submits an article to a journal. The choice of the journal may be determined by the journal’s audience or prestige.

The journal selects two or more appropriate experts (psychologists working in a similar field) to peer review the article without payment. The peer reviewers assess: the methods and designs used, originality of the findings, the validity of the original research findings and its content, structure and language.

Feedback from the reviewers determines whether the article is accepted. The article may be: accepted as it is, accepted with revisions, sent back to the author to revise and re-submit, or rejected without the possibility of resubmission.

The editor makes the final decision whether to accept or reject the research report based on the reviewers’ comments/recommendations.

Peer review is important because it prevents faulty data from entering the public domain, provides a way of checking the validity of findings and the quality of the methodology, and is used to assess the research rating of university departments.

Peer review may be an ideal, whereas in practice there are many problems. For example, it slows publication down and may prevent unusual, new work from being published. Some reviewers might use it as an opportunity to prevent competing researchers from publishing work.

Some people doubt whether peer review can really prevent the publication of fraudulent research.

The advent of the internet means that more research and academic comment is being published without official peer review than before, though systems are evolving online where everyone has a chance to offer their opinions and police the quality of research.

Types of Data

  • Quantitative data is numerical data e.g. reaction time or number of mistakes. It represents how much or how long, how many there are of something. A tally of behavioral categories and closed questions in a questionnaire collect quantitative data.
  • Qualitative data is virtually any type of information that can be observed and recorded that is not numerical in nature and can be in the form of written or verbal communication. Open questions in questionnaires and accounts from observational studies collect qualitative data.
  • Primary data is first-hand data collected for the purpose of the investigation.
  • Secondary data is information that has been collected by someone other than the person who is conducting the research e.g. taken from journals, books or articles.

Validity means how well a piece of research actually measures what it sets out to, or how well it reflects the reality it claims to represent.

Validity is whether the observed effect is genuine and represents what is actually out there in the world.

  • Concurrent validity is the extent to which a psychological measure relates to an existing similar measure and obtains close results. For example, a new intelligence test compared to an established test.
  • Face validity : does the test appear, ‘on the face of it’, to measure what it is supposed to measure? This is assessed by ‘eyeballing’ the measuring instrument or by passing it to an expert to check.
  • Ecological validity is the extent to which findings from a research study can be generalized to other settings / real life.
  • Temporal validity is the extent to which findings from a research study can be generalized to other historical times.

Features of Science

  • Paradigm – A set of shared assumptions and agreed methods within a scientific discipline.
  • Paradigm shift – The result of the scientific revolution: a significant change in the dominant unifying theory within a scientific discipline.
  • Objectivity – When all sources of personal bias are minimized so as not to distort or influence the research process.
  • Empirical method – Scientific approaches that are based on the gathering of evidence through direct observation and experience.
  • Replicability – The extent to which scientific procedures and findings can be repeated by other researchers.
  • Falsifiability – The principle that a theory cannot be considered scientific unless it admits the possibility of being proved untrue.

Statistical Testing

A significant result is one where there is a low probability that chance factors were responsible for any observed difference, correlation, or association in the variables tested.

If our test is significant, we can reject our null hypothesis and accept our alternative hypothesis.

If our test is not significant, we retain our null hypothesis and reject our alternative hypothesis. A null hypothesis is a statement of no effect.

In Psychology, we use p < 0.05 (as it strikes a balance between making a type I and II error) but p < 0.01 is used in tests that could cause harm like introducing a new drug.

A type I error is when the null hypothesis is rejected when it should have been accepted (happens when a lenient significance level is used, an error of optimism).

A type II error is when the null hypothesis is accepted when it should have been rejected (happens when a stringent significance level is used, an error of pessimism).
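The decision rule and the two error types above can be summarized in a small sketch (the alpha values are the conventional thresholds mentioned in this section):

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when p < alpha, otherwise retain it."""
    return "reject null" if p_value < alpha else "retain null"

# A lenient alpha raises the risk of a Type I error (rejecting a true
# null); a stringent alpha raises the risk of a Type II error
# (retaining a false null).
print(decide(0.03))               # → reject null  (at p < 0.05)
print(decide(0.03, alpha=0.01))   # → retain null  (at p < 0.01)
```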

Ethical Issues

  • Informed consent is when participants are able to make an informed judgment about whether to take part. However, revealing the full details may cause them to guess the aims of the study and change their behavior.
  • To deal with this, we can gain presumptive consent or ask participants to formally indicate their agreement to take part, but this may undermine the purpose of the study, and there is no guarantee that the participants fully understand what they are agreeing to.
  • Deception should only be used when it is approved by an ethics committee, as it involves deliberately misleading or withholding information. Participants should be fully debriefed after the study but debriefing can’t turn the clock back.
  • All participants should be informed at the beginning that they have the right to withdraw if they ever feel distressed or uncomfortable.
  • Withdrawal can bias the sample, because those who stay may be more obedient, and some may not withdraw because they were given incentives or feel they would spoil the study. Researchers can offer the right to withdraw data after participation.
  • Participants should all have protection from harm . The researcher should avoid risks greater than those experienced in everyday life and they should stop the study if any harm is suspected. However, the harm may not be apparent at the time of the study.
  • Confidentiality concerns the communication of personal information. Researchers should not record any names but use numbers or false names, though full confidentiality is not always possible, as it is sometimes possible to work out who the participants were.



  • Open access
  • Published: 17 April 2024

SPADE: spatial deconvolution for domain specific cell-type estimation

  • Yingying Lu 1 ,
  • Qin M. Chen 2 &
  • Lingling An   ORCID: orcid.org/0000-0001-8273-0776 1 , 3 , 4  

Communications Biology volume  7 , Article number:  469 ( 2024 ) Cite this article


  • Data integration
  • Statistical methods

Understanding gene expression in different cell types within their spatial context is a key goal in genomics research. SPADE (SPAtial DEconvolution), our proposed method, addresses this by integrating spatial patterns into the analysis of cell type composition. This approach uses a combination of single-cell RNA sequencing, spatial transcriptomics, and histological data to accurately estimate the proportions of cell types in various locations. Our analyses of synthetic data have demonstrated SPADE’s capability to discern cell type-specific spatial patterns effectively. When applied to real-life datasets, SPADE provides insights into cellular dynamics and the composition of tumor tissues. This enhances our comprehension of complex biological systems and aids in exploring cellular diversity. SPADE represents a significant advancement in deciphering spatial gene expression patterns, offering a powerful tool for the detailed investigation of cell types in spatial transcriptomics.


Introduction

Spatial transcriptomics is a cutting-edge technology that has fundamentally transformed the field of transcriptomics by enabling studies of gene expression with unprecedented resolution and specificity 1 . The ability to identify the precise location of gene expression within a tissue represents a game-changing development, as it provides fresh avenues for investigating the complex interplay between gene expression and tissue architecture. By profiling the transcriptome at a high resolution in a spatial context, researchers can gain insights into the cellular heterogeneity that underlies normal tissue function or disease states 2 , with significant implications for addressing a broad range of biological and medical questions. For example, spatial transcriptomics has demonstrated great promise in elucidating the cellular basis of brain function 3 , and in enabling precision treatments for heart disease 4 or cancer 5 . Moreover, spatial transcriptomics has shown immense potential for studying the immune system 6 . Profiling the transcriptome of immune cells in various tissues has yielded insights into how the immune system responds to infection and disease 7 . This approach could play a crucial role in shaping the future of immunotherapies for cancer and other diseases, with spatial precision that is critical for effective treatment with minimal non-specific side effects 8 .

Current spatial transcriptomics technologies face limitations in yielding cell type-specific information within a tissue region, thereby prohibiting the complete capture of gene expression patterns at single-cell resolution in space 9 . For instance, imaging-based spatial transcriptomics protocols provide detailed information at a single cell or subcellular level, but they are unable to measure a large number of genes, making them less suitable for exploratory investigations at the transcriptome level 10 . On the other hand, sequencing-based approaches allow for the measurement of gene expression for each spatial location across the entire transcriptome, but this comes at the cost of single-cell resolution 11 . As the compositions of cell types vary between different tissue locations, the data obtained from sequencing may be inconsistent for subsequent analyses. Specifically, when identifying differentially expressed genes across multiple spatial locations, the observed gene expression variations may not solely be influenced by spatial location, but also by differences in the categories or proportions of cell types 12 . Hence, there is a growing need for methodologies that accurately depict and describe the spatial patterns of gene expression variations while accounting for the specificity of individual cell types.

Single-cell RNA sequencing (scRNA-seq) has significantly advanced our understanding of cell heterogeneity and gene expression patterns at an individual cell level 13 . While scRNA-seq reveals intricate details of cellular functions, its limitation lies in not capturing the spatial context of cells within tissues 9 . Addressing this gap, computational deconvolution techniques have emerged, focusing particularly on integrating spatial transcriptomics with single-cell data. This integration is vital for understanding tissue architecture and the spatial distribution of cell types. Several spatially resolved cell type deconvolution techniques have been developed, including SPOTlight 14 , spatialDWLS 15 , RCTD 16 , SpatialDecon 17 , and CARD 12 . SPOTlight utilizes non-negative matrix factorization and non-negative least squares for cell type proportion calculation but neglects location correlations. RCTD leverages single-cell RNA-Seq data for cell type composition deconvolution while accounting for sequencing technology differences, but it does not model spatial patterns. SpatialDWLS extends DWLS 18 , employing a modified weighted least square for cell type composition estimation and uniquely using an enrichment test for cell type determination, but its enrichment score selection is arbitrary and spatial patterns are not considered. SpatialDecon surpasses traditional least-squares methods through log-normal regression and background modeling but overlooks location relationships. CARD incorporates conditional autoregressive modeling for spatial correlation structure consideration but disregards varying cell type identities in spatial patterns. Notably, none of these methods utilize valuable histological information. Overlooking spatial structures can lead to misleading conclusions, as they significantly impact biological functions.

To meet the challenge of cell type deconvolution in spatially resolved transcriptomics data, we have developed SPADE, a deconvolution tool that integrates cell type information derived from scRNA-seq data obtained from corresponding samples to accurately estimate the proportions of diverse cell types. Recognizing the unique characteristics of spatial transcriptomics data, such as the association of particular cell types with specific locations, the correlation between spatial positions and cell types, and the similarity between adjacent locations, we incorporated a cutting-edge spatial domain detection algorithm 19 that capitalizes on gene expression patterns, spatial coordinates, and histological data. To accommodate variations in cell type composition across distinct locations, we implemented an adaptive cell type selection step that efficiently determines the presence of specific cell types within each spot. Our findings substantiate the effectiveness of SPADE through rigorous simulations, wherein we benchmarked it against the existing spatial deconvolution methodologies. Furthermore, we applied SPADE to publicly available spatial transcriptomics studies across various areas, underscoring its utility in deciphering cell type-specific gene expression profiles. The proposed approach constitutes a significant advancement in the field of spatial transcriptomics, facilitating comprehensive and precise analyses of complex, heterogeneous tissue samples.

Overview of SPADE

SPADE methodology involves a three-step approach to estimate the cell type proportions within a spatial domain, as depicted in Fig.  1 and Supplementary Fig.  1 . In the first step, SPADE identifies the spatial domains within a tissue by employing spaGCN 19 , a graph convolutional network specifically designed for spatial transcriptomics data. This integration of gene expression, spatial location, and histology data enables SPADE to identify spatial domains that are spatially coherent in both gene expression and histology. In the second phase, a cell type reference dataset is built from scRNA-seq to guide cell type identification within each domain, employing a Lasso regression algorithm 20 . This algorithm capitalizes on spatial gene expression data and cell type information to determine the optimal number of cell types present within each domain, which is subsequently employed for deconvolution analysis in the ensuing step. Concurrently, scRNA-seq data is adopted to create cell type-specific gene expression profiles, which guide the deconvolution process. In the final step, SPADE calculates the proportions of cell types within each spatial domain by utilizing cell type-specific features. These features consist of genes that are differentially expressed in each cell type. The SPADE analysis output provides the calculated cell type proportions for every spatial location in a given tissue region, which is an essential metric for investigating complex biological systems.

figure 1

SPADE leverages reference single-cell RNA sequencing data to determine the cell type proportion at each location in the sample. To achieve this, SPADE first uses a combination of histology, spatial location, and gene expression information to identify spatial domains within a tissue. Subsequently, it performs a cell type selection for each domain by identifying the specific cell types present. Once the cell type information is obtained, SPADE utilizes scRNA-seq data to perform deconvolution, resulting in the estimation of cell type proportions for every spatial location. The final outcome of SPADE is the calculated cell type proportions for every spatial location. Part of this figure is created with BioRender.com.

Simulation studies

To simulate synthetic spatial gene expression data, we implemented a simulation approach similar to the CARD methodology 12 , leveraging single-cell RNA-seq data. The synthetic data generation involved three steps: (1) generating random proportions for each spatial location within domains using a Dirichlet distribution; these proportions serve as the ground truth, (2) selecting cells from single-cell RNA-seq data within each cell type and summing these counts to produce cell type-specific gene expression data, and (3) aggregating gene expression across all cell types within each location and constructing a gene-by-location matrix as the pseudo-spatial transcriptomic data. More details can be found in Supplementary Fig.  2 . This approach produced synthetic spatial gene expression data similar to real-world data. We conducted two separate simulation experiments generated on different mouse tissues. We also compared SPADE with existing spatial deconvolution methods, including CARD, SPOTlight, RCTD, spatialDWLS, and SpatialDecon.
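Step (1) — drawing random ground-truth proportions from a Dirichlet distribution — can be sketched via the standard gamma-normalization construction; the concentration parameters below are hypothetical, not those used in the paper:

```python
import random

def dirichlet(alpha):
    """Sample a proportion vector from a Dirichlet distribution by
    normalizing independent gamma draws (stdlib only)."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

random.seed(0)
# Hypothetical example: three cell types, symmetric concentration 1.0.
props = dirichlet([1.0, 1.0, 1.0])
# The proportions are non-negative and sum to 1 at each location.
```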

The first simulation involves using mouse olfactory bulb (MOB) data. In this simulation study, we utilized three publicly available datasets to generate spatial transcriptomic data, including a single-cell RNA-seq dataset consisting of 10 cell types of the mouse olfactory bulb 21 , a spatial gene expression dataset for the same area, and corresponding hematoxylin and eosin stain (H&E) image data (Fig.  2 a) 22 . We employed SpaGCN and detected four distinct spatial domains (Fig.  2 b), assigning a dominant cell type to each domain, along with varying numbers of minor cell types. To assess the accuracy of cell type detection, we created a bar plot displaying the true positive and false positive rates for each domain (Fig.  2 c). This visualization highlights SPADE’s capability to achieve the highest true positive rates and lowest false positive rates across all domains. The scatter plot (Fig.  2 d) comparing the estimated and true proportions demonstrates that the SPADE estimation closely aligns with the ground truth, achieving results comparable to those of CARD. To represent the inferred cell type proportions for each spatial location, we employed a spatial scatter pie plot (Fig.  2 e), in which SPADE generated an overall pattern that closely mirrored the true patterns and outperformed competing methods. Finally, to account for the stochastic nature of data generation, we evaluated SPADE and other methods by repeating the simulation ten times with varying proportions. The results are shown in a boxplot (Fig.  2 f) in terms of mean absolute deviation (mAD), root mean squared error (RMSE), and correlation (R). Our results demonstrated that SPADE consistently outperformed other methods, achieving the lowest mAD and RMSE and the highest correlation across all simulations, followed by CARD.
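The evaluation metrics used here — mean absolute deviation (mAD) and root mean squared error (RMSE) between true and estimated proportions — reduce to a few lines; the example vectors are hypothetical:

```python
def mad(true, est):
    """Mean absolute deviation between true and estimated proportions."""
    return sum(abs(t - e) for t, e in zip(true, est)) / len(true)

def rmse(true, est):
    """Root mean squared error between true and estimated proportions."""
    return (sum((t - e) ** 2 for t, e in zip(true, est)) / len(true)) ** 0.5

truth = [0.5, 0.3, 0.2]
estimate = [0.4, 0.35, 0.25]
# mAD = (0.10 + 0.05 + 0.05) / 3 ≈ 0.0667; RMSE ≈ 0.0707.
```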

figure 2

a H&E staining for the mouse olfactory bulb downloaded from 22 . b Spatial domain detection. c True positive and false positive rates for detecting the correct cell types within each domain. d Scatter plot comparing the estimated proportion with the true proportion. Each dot represents the proportion at a location, with a color depicting a cell type. The color code is consistent with the color assigned in e . A 45-degree line indicates the same value for true and estimated proportion. e Spatial scatter pie plot shows the estimated cell-type composition at each spatial location from different deconvolution methods, compared to the true distribution. Colors represent cell types. f Boxplot of performance metrics for 10 simulation replicates. The overall simulation results indicate that SPADE outperformed other methods, achieving the lowest mean Absolute Deviation (mAD), Root Mean Square Error (RMSE), and the highest R. Source data can be found in Supplementary Data  1 .

To further investigate the performance of SPADE, we calculated the mAD and RMSE for each cell type, with the stacked bar plot showing the least deviation in the proportions inferred by SPADE (Supplementary Fig. 3a). Owing to SPADE's effective cell type selection, its mean absolute deviation (Supplementary Fig. 3b) and correlation (Supplementary Fig. 3c) were superior across all cell types and domains in comparison to alternative methods. To visualize the estimation of dominant cell types, we employed a half violin plot (Supplementary Fig. 3d), which indicates that the distribution of dominant cell type proportions estimated by SPADE aligns more closely with the true proportions than those obtained from other methods. We also added different levels of noise when generating the synthetic data and compared the performance of SPADE with other methods on the noisy data. As shown in Supplementary Fig. 4, SPADE not only performs well under noisy conditions but also maintains its superior performance among all compared methods.

In the second simulation study, we generated additional synthetic data from mouse kidney single-cell RNA-seq data 23 and obtained the mouse kidney spatial locations and histology information from 10X Genomics to further assess the robustness of our algorithm. Specifically, we applied SpaGCN and identified three spatial domains. SPADE accurately retrieved the spatial pattern (Fig. 3a, b) compared to other methods (Supplementary Fig. 5), assigning the most precise proportions to the dominant cell type within each spatial domain, as evidenced by Fig. 3c. To evaluate the accuracy of SPADE within each cell type across all locations, we compared the mAD and RMSE between true and inferred proportions to those obtained with other methods. Our analysis revealed that SPADE had the lowest error rate (Fig. 3d). Additionally, we created a scatter plot to compare the estimated cell-type proportions against the true proportions and found that SPADE displayed a close alignment to the 45-degree line (Fig. 3e). Furthermore, we assessed the ability of SPADE to identify the correct cell types within each domain. Our analysis indicated the superior ability of SPADE to detect the correct cell types at spatial locations, as evidenced by the high true positive rate and low false positive rate (Fig. 3f). Finally, we assessed the stability of SPADE's estimation by repeating the simulation ten times. The consistently low deviations and high correlations (Fig. 3g) demonstrate that SPADE is a robust and accurate method for spatial deconvolution, superior to existing methods. To evaluate the ability of SPADE to handle noisy data, we introduced varying levels of noise during the creation of the synthetic data and compared its performance with other methods. The results, shown in Supplementary Fig. 6, reveal that SPADE not only copes well with noisy conditions but also maintains low deviance and high correlation among all compared methods.

Figure 3

a, b Scatter pie plots representing cell type proportions at each location. Each location is depicted by a pie showing the cell type composition, denoted by distinct colors. c Violin-box plot displaying the distribution of the predicted proportions of the dominant cell type within each domain compared with the true proportion. d Stacked barplot exhibiting the mean absolute deviation and root mean square deviation between the true and predicted proportions. e Scatter plot of cell type proportions, where each dot represents the proportion at a location and the color corresponds to the cell type. f True positive and false positive rates for cell type identification within each domain. g Boxplot of performance metrics for 10 simulation replicates. Source data can be found in Supplementary Data 2.

Application of real data on developmental chicken heart

The heart is the first organ to develop during embryogenesis, and interactions among various cell populations play a pivotal role in driving cardiac fate decisions. The heterogeneity of cell types during heart development makes it difficult to study with traditional methods; it is therefore important to explore techniques for predicting cell type heterogeneity during heart development.

During early embryonic development, the heart initially forms as a simple tube and undergoes a series of intricate morphological changes, eventually developing into a fully functional four-chambered heart complete with blood vessels. In previous research, Mantri et al. employed a combination of spatially resolved RNA sequencing and high-throughput single-cell RNA sequencing to investigate the spatial and temporal interactions as well as the regulatory mechanisms involved in the development of the embryonic chicken heart 24 . Their research used chicken embryos to generate over 22,000 single-cell transcriptomes across four pivotal developmental stages, in addition to spatially resolved RNA-seq on 12 heart tissue sections at the same stages, encompassing approximately 700 to nearly 2,000 tissue locations. These stages comprised day 4, an early stage of chamber formation and the initiation of ventricular septation; day 7, when the four-chamber cardiac morphology is initiated; day 10, representing the mid-stage of four-chambered heart development; and day 14, denoting the late stage of four-chamber development.

H&E-stained images document the anatomical progression across the four developmental time points, as presented in Supplementary Fig. 7a. Upon applying SPADE, spatial domains were defined for each time point, revealing the emergence of ventricular separation by day 4, as illustrated in Fig. 4a. From day 7 onwards, the clustering of the distinct chambers was readily discernible, as evidenced by Fig. 4b, c, and d. The estimated cell type proportions for each chamber over the four time points are illustrated in the bar plot in Fig. 4e, which indicates a preponderance of immature myocardial and fibroblast cells on day 4, with a decreasing trend as the heart matures. This pattern is further verified by the scatter pie plot, as presented in Supplementary Fig. 7b. This phenomenon is attributed to the tube-like structure of the chicken heart during early developmental stages, which necessitates the presence and active participation of fibroblast cells in the creation of connective tissue 25 . As the heart develops, fibroblast cells undergo proliferation and differentiation into various types of connective tissue cells. During later stages of development, the number of fibroblast cells in the heart declines, coinciding with its maturation and specialization. However, fibroblast cells continue to play a vital role in maintaining the heart's structure and function throughout the chicken's lifespan 26 , 27 , 28 , 29 . Conversely, the number of cardiomyocyte cells increases significantly during the development of the chicken heart, with the highest rate of proliferation occurring from day 4 to day 7, and slowing down from day 10 to day 14, as shown in Fig. 4e.

Figure 4

a – d Estimated spatial domains at various time points: ( a ) Day 4, ( b ) Day 7, ( c ) Day 10, and ( d ) Day 14. Colors indicate different domains, with an increasing number of domains detected as time progresses: 3 domains were detected on Day 4, while 5 domains were identified on Day 7 and beyond. e Predicted cell type proportions during heart development, with colors representing different cell types. f Comparison of cell type proportions between time points, using a two-sided Wilcoxon rank-sum test to assess differences for pairs of cell types. Asterisks indicate the significance level. g , h Scatter plots displaying the spatial locations of four selected cell types on Day 4 and Day 10, respectively, with each location colored according to the cell type proportion. i , j Correlation plots for cell type colocalization on Day 4 and Day 14, respectively. The size of each dot indicates the magnitude of the absolute correlation. Source data can be found in Supplementary Data 3.

The proliferation of cardiomyocytes is a pivotal process during embryonic heart development, leading to a significant increase in their numbers. Previous studies have demonstrated that the rate of cardiomyocyte proliferation is highest during early developmental stages and gradually decreases as the heart matures 28 , 30 , 31 , 32 , 33 . Our findings, obtained through the application of SPADE, support this notion. Specifically, we observed that immature myocardial cells constitute a subset of cardiomyocytes that are present only during days 4 to 7 of embryonic development (Fig.  4 e). These immature cells undergo differentiation to become mature cardiomyocytes, which is a crucial step for the proper contractile function of the heart 24 .

We aimed to investigate the trends in proportions of various cell types, including cardiomyocytes, vascular endothelial cells, fibroblasts, and endocardial cells. To determine whether the observed changes in proportions were statistically significant, we compared every pair of time points for each cell type (Fig. 4f). Our results indicate that nearly all changes between any two days were statistically significant (two-sided Wilcoxon rank-sum test, p < 0.05). Furthermore, we employed a spatial cell type map (Fig. 4g, h) to visually represent the proportions of each cell type at Day 4 and Day 14. Results for Day 7 and Day 10 can be found in Supplementary Fig. 8a. As expected, at Day 4 both cardiomyocytes and vascular endothelial cells exhibited relatively low proportions (Fig. 4g), while at Day 14 (Fig. 4h) their proportions increased significantly. These findings highlight the dynamic changes in cell type proportions over time, providing crucial insights into the development and function of the studied tissues.

The heart is composed of various cell types, including cardiomyocytes, fibroblasts, endothelial cells, and smooth muscle cells, and undergoes intricate cellular interactions and network formation during development that are crucial for its proper functioning 34 , 35 , 36 , 37 . We utilized cellular colocalization analysis, a key technique in spatial transcriptomics, to quantitatively evaluate how different cell types are positioned and interact within tissue. This approach provides insight into the spatial dynamics of cellular environments, revealing potential interactions and functional relationships between cells. By analyzing the spatial organization and proximity of cell types, we aim to understand their roles in tissue function and development, and how they contribute to overall tissue architecture and intercellular communication 35 , 36 . Our results revealed increased cohesion between cell types, particularly between cardiomyocytes and vascular endothelial cells, as heart development proceeded; this is illustrated in Fig. 4i, j for Day 4 and Day 14, respectively, which show increasingly coherent spatial organization during development. The correlation plots for Day 7 and Day 10 are in Supplementary Fig. 8b. Collectively, our study highlights the significant variability in the spatial organization of cell types across developmental stages and underscores the importance of dynamic interactions among cell types for a comprehensive understanding of heart development, in contrast to the results from other methods (Supplementary Figs. 9 – 13).
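
In practice, colocalization here reduces to correlating estimated proportions across locations. A minimal Python sketch of the idea (our own illustrative helper, not SPADE's code):

```python
import numpy as np

def colocalization(P):
    """Cell type colocalization: Pearson correlation between the estimated
    proportions of every pair of cell types across spatial locations.
    P is a locations x cell-types proportion matrix."""
    return np.corrcoef(P.T)        # cell-type x cell-type correlation matrix
```

Positive entries indicate cell types that tend to co-occur at the same locations; negative entries indicate spatial exclusion.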

Application of real data on human breast cancer

Breast cancer is a complex disease that arises from the uncontrolled growth of malignant cells in the breast tissue, with varying molecular and cellular characteristics among individual patients. The Luminal subtype, which constitutes approximately 70% of all cases, is characterized by the expression of hormone receptors, namely estrogen receptor (ER) and progesterone receptor (PR) 38 . The combination of spatial transcriptomics and single-cell data is proving to be a valuable method for unraveling the complexities of human breast cancer 38 . This method maps gene expression and analyzes single-cell transcriptomes to identify cell types and their interactions in the tumor environment, crucial for understanding cancer progression and treatment effectiveness.

We retrieved the single-cell RNA-seq data as well as the spatial transcriptomics data of primary pre-treatment breast tumor samples from a human breast cancer study 39 . To create a reference for cell types in the SPADE analysis, we utilized scRNA-seq data comprising 9 distinct cell types from breast tumors. This reference was then employed to deconvolve a spatially mapped tumor sample. In the SPADE results (Fig. 5a), a preponderance of cancer epithelial cells is evident, with plasmablasts the next most abundant cell type. A comprehensive examination of cellular composition across spatial locations, depicted in Supplementary Fig. 14, further corroborates the prevalence of cancer epithelial cells at the majority of these sites. In Luminal breast cancer, malignancy typically stems from epithelial cells, which may undergo genetic mutations leading to uncontrolled growth and tumor formation. These malignant epithelial cells often express high levels of hormone receptors, which facilitate response to the growth-promoting effects of estrogen and progesterone 40 , 41 .

Figure 5

a The estimated cell type proportions, with different cell types represented by different colors. b Correlation for every pair of cell type proportions across spatial locations. c The cell type proportions for cancer epithelial cells, cancer-associated fibroblasts (CAFs), and B cells, visualized at each location. d The marker gene expression levels for these three cell types, displayed at each location. Source data can be found in Supplementary Data 4.

Plasmablasts are a type of immune cell that plays a crucial role in the humoral immune response, responsible for antibody production and secretion. Recent evidence has shown that Luminal breast tumors with higher levels of infiltrating plasmablasts have a better prognosis than tumors with lower levels of plasmablasts, suggesting a potential protective role of these cells in Luminal breast cancer 42 , 43 . We observed the colocalization of cancer epithelial cells and immune cells, such as plasmablasts, myeloid cells, and T/B cells, in the tumor microenvironment (Fig. 5b), and noted strong negative correlations between cancer epithelial cells and immune cells in these areas. The presence of tumor-infiltrating lymphocytes (TILs) is an important aspect of cancer epithelial cell and immune cell colocalization. TILs are immune cells that migrate into the tumor microenvironment and are believed to play a crucial role in anti-tumor immunity 42 . In several cancer types, including breast cancer, the presence of TILs has been linked to improved outcomes 43 , 44 , 45 . Furthermore, we investigated the cell type proportions at each location for cancer epithelial cells, cancer-associated fibroblasts (CAFs), and B cells (Fig. 5c), along with their associated marker genes EPCAM (epithelial), FAP (CAFs), and CD55 (B cells) (Fig. 5d). The spatial distribution of cell types corresponded with their marker gene expression, confirming the cell types inferred by SPADE. The results displayed a similar pattern to those of CARD and RCTD, as shown in Supplementary Fig. 15.

Application of real data on mouse visual cortex

The mouse brain, with its millions of neurons, is an ideal model for studying mammalian brain structure and function, especially in the visual cortex. This region, crucial for processing visual information, is organized into layers, each with specialized cell types, making it a good model for human visual cognition research 46 , 47 , 48 , 49 . The visual cortex hosts various neuron types, including excitatory neurons using glutamate and inhibitory neurons using GABA, forming a network for interpreting visual stimuli 50 , 51 . Each neuron type plays a specific role in visual processing, from detecting visual features to integrating complex visual information 52 , 53 , 54 . Understanding these functions and their disruptions can provide insights into neurological and psychiatric disorders 55 .

We implemented a single-cell analysis 56 to identify 30 distinct cell types in the mouse visual cortex. This reference was used to deconvolve the adult mouse brain, which had undergone spatial processing (Fig. 6a). Initially, we divided the mouse brain into 19 regions (Fig. 6b). In these regions, we were able to identify specific layers that correlate with various brain functions. Compared to the other methods (results in Supplementary Fig. 16), SPADE successfully decomposed each brain region into its constituent cell types. The predominant cell type at each location is shown in Fig. 6c. Our focus was particularly on the visual cortex, where we found that most areas were primarily composed of excitatory neurons, followed by inhibitory neurons and oligodendrocytes, as detailed in Fig. 6d. Excitatory neurons, which use the neurotransmitter glutamate to typically enhance neuronal activity, are an integral part of the mouse visual cortex, as of all mammalian brains. These neurons have a central role in transmitting and processing visual data and are found in all layers of the mouse visual cortex, from the deeper layers (layers 5 and 6) to the superficial layers (layers 2 and 3) (Fig. 6e). Differentially expressed genes (Fig. 6f) within each excitatory neuronal subtype further confirmed the corresponding multi-layer structure.

Figure 6

a Original image of the Adult Mouse Brain (Coronal) downloaded from 10x Visium. b Detected spatial domains; colors represent different domains. c SPADE-inferred dominant cell type at each location. d Estimated cell types in the mouse visual cortex; each location is represented by a composition of several cell types. e Four subtypes of excitatory neurons in the mouse visual cortex. f Genes differentially expressed within each excitatory neuronal subtype in the mouse visual cortex. Source data can be found in Supplementary Data 5.

Spatial transcriptomics is a critical tool for investigating gene expression patterns and regional differences within a tissue, but interpreting such data is challenging without knowledge of the specific cell types present in each region. Cell type deconvolution is a computational approach that identifies cell types from gene expression data; applying it to spatial transcriptomics data contextualizes gene expression and yields a deeper understanding of the biological processes occurring within a tissue at the cellular level. While many existing cell type deconvolution methods do not account for the spatial domain structure, SPADE was developed to overcome this limitation. Our method stands out by integrating spatial structures and using a reliable approach for cell type selection. Unlike other techniques, it employs Lasso regression and adaptive thresholding for more accurate and flexible cell type identification. This effectiveness is evident in our results, notably in Fig. 2c, and enhances SPADE's robustness and precision on complex spatial transcriptomics datasets.

The SPADE algorithm effectively predicts cell types and their distribution in tissues, as shown in tests on synthetic mouse datasets. Applied to chicken heart development, human breast cancer, and mouse visual cortex, SPADE revealed insights into cell type development and spatial patterns in diseases. This has promising implications for clinical studies, especially in understanding cancer cell type heterogeneity and informing treatment strategies.

Although the SPADE algorithm has demonstrated superior accuracy, one of its notable challenges is the accurate deconvolution of rare cell types. Our analysis, especially with rare cell types such as the immature cells in the mouse olfactory bulb data, indicated a tendency toward underestimation, a limitation common to current deconvolution methods. This underestimation issue is critical to address in order to improve SPADE's robustness and applicability, particularly in complex biological tissues where rare cell types are crucial to functional significance or disease state.

Moreover, it is important to note that SPADE's performance can be further improved by incorporating better-designed reference datasets. In our study, we utilized a single scRNA-seq dataset to construct the reference, potentially limiting the algorithm's overall efficacy. Advances in scRNA-seq technologies have led to the generation of multiple reference datasets from different platforms or samples obtained from the same tissues. Integration of these diverse scRNA-seq datasets holds the potential to provide a comprehensive and accurate reference set, thereby improving the performance of the SPADE algorithm.

It should also be mentioned that SPADE, while not the most efficient in processing time and memory usage, involves a meticulous process of identifying cell types within each domain before estimating proportions. This methodological aspect, though extending processing time, substantially enhances the accuracy and robustness of our analyses, particularly for complex datasets. This balance between processing efficiency and analytical precision is a key consideration, making SPADE a valuable tool for in-depth spatial gene expression studies. Continuous methodological improvements are necessary. Future studies should explore the use of multiple reference datasets to improve the accuracy and efficacy of SPADE in predicting cell types and their spatial distribution across different tissues.

Spatial domain detection

Spatial domain detection constitutes a critical aspect of spatial transcriptomics, as evidenced by numerous studies 19 , 57 , 58 . A spatial domain encompasses regions that demonstrate spatial coherence in both gene expression and histology. Traditional approaches for identifying these domains depend on clustering algorithms that consider only gene expression, neglecting spatial information and histology 19 . To address this limitation, SpaGCN 19 incorporates gene expression, spatial location, and histology to construct a graph convolutional network, facilitating the identification of spatial domains. The SpaGCN algorithm unfolds in three stages. Initially, information derived from physical location and histology is employed to establish an undirected graph reflecting the relationships between all spots. Subsequently, a graph convolutional network is implemented to integrate gene expression, spatial location, and histological data. Finally, an iterative unsupervised clustering algorithm is applied to segregate spots into distinct spatial domains based on gene expression and histology coherence. Importantly, SpaGCN can also be applied to datasets where histology images are absent; in these situations, it uses spatial gene expression data alone to identify spatial domains, achieving performance comparable to other spatial domain detection approaches. For a comprehensive understanding, refer to the original publication 19 .

Determine the number of cell types for each domain

A crucial difference between bulk deconvolution and spatial deconvolution is that not all cell types are present in all regions. Consequently, identifying which cell types are present at individual locations is crucial for effective cell type deconvolution. A key assumption underlying this approach is that while different locations within the same domain are closely related, they may not contain exactly the same cell types. Instead, each location is thought to contain a similar set of cell types, but the proportions of these cell types can vary from one location to another. To tackle this issue, we leverage a Lasso-regularized generalized linear model 20 , which offers concurrent feature selection and regularization, enforcement of sparsity, computational efficiency, resistance to multicollinearity, and broad applicability across diverse domains. Using the Lasso, cell types are selected for each domain as follows:

$$\hat{\beta} = \underset{\beta}{\arg\min}\; \sum_{i=1}^{M} \Big( y_i - \sum_{j=1}^{K} x_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{K} \lvert \beta_j \rvert ,$$

where y_i is the expression of gene i, x_ij is the expression of gene i for cell type j, and β_j is the coefficient for cell type j. To perform cell type selection, we estimate the cell type coefficients, effectively eliminating a cell type from a given location if its coefficient shrinks to 0. The tuning parameter λ is chosen via 10-fold cross-validation.
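
SPADE performs this selection with Lasso regression in R. The Python sketch below illustrates the idea with a plain coordinate-descent Lasso on a toy reference; the function names and synthetic data are ours, and the cross-validated choice of λ is omitted for brevity:

```python
import numpy as np

def lasso_select(X, y, lam, n_iter=300):
    """Coordinate-descent Lasso on centered data:
    min_b 1/(2n) * ||y - Xb||^2 + lam * ||b||_1.
    Zero coefficients mean the cell type is dropped at this location."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xc.shape
    b = np.zeros(p)
    col_ms = (Xc ** 2).mean(axis=0)              # per-column second moment
    for _ in range(n_iter):
        for j in range(p):
            r = yc - Xc @ b + Xc[:, j] * b[j]    # partial residual without type j
            rho = Xc[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ms[j]
    return b

# toy reference: 5 candidate cell types, 200 genes; this location truly
# mixes types 0 and 2 (weights 0.7 / 0.3)
rng = np.random.default_rng(0)
X = rng.gamma(2.0, 1.0, size=(200, 5))           # gene-by-cell-type reference
y = 0.7 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0.0, 0.05, 200)

beta = lasso_select(X, y, lam=0.05)
selected = beta > 1e-8                           # cell types kept for this domain
```

The L1 penalty drives the coefficients of absent cell types exactly to zero, which is what makes the subsequent binarization step meaningful.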

Upon obtaining the cell-type-associated coefficient matrix for each location within the spatial domains, we transform it into a binary matrix, where each entry holds a value of either 1 or 0. To achieve this, we employ an adaptive thresholding technique 59 that utilizes a 2D convolution with the Fast Fourier Transform (FFT) to filter the coefficient matrix, thus enabling the efficient identification of entries surpassing a specific threshold. In particular, if a coefficient exceeds the filtered value, the corresponding entry is set to 1, whereas entries falling below the threshold are assigned a value of 0. A comprehensive description of these steps can be found in Supplementary Fig.  17 .

Cell type proportion estimation for each location within each domain

The deconvolution problem is solved by finding the cell type proportions that minimize the difference between the estimated and the observed spatial gene expression at each location:

$$\hat{p} = \underset{p}{\arg\min}\; \sum_{i=1}^{M} \Big\lvert\, y_i - \sum_{j \in S} x_{ij}\, p_j \Big\rvert ,$$

subject to p_j ≥ 0 and ∑_{j∈S} p_j = 1, where y_i is the expression of gene i (i = 1…M), x_ij is the expression of gene i for cell type j extracted from the single-cell reference, p_j is the proportion of cell type j, and S is the set of cell types determined for each domain. We select the absolute deviation loss because it is less sensitive to extreme values than the commonly used quadratic loss. The optimization problem is solved using the augmented Lagrangian minimization algorithm implemented by the auglag function in the R package alabama 60 . Owing to the nature of proportions, we not only minimize the nonlinear objective function but also satisfy two constraints: the proportion of each cell type must be nonnegative, and the proportions of all cell types at each location must sum to 1.
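
SPADE solves this constrained problem with auglag from the R package alabama. An equivalent way to view the least-absolute-deviation objective is as a linear program, sketched here in Python for illustration (our own formulation and toy data, not SPADE's code):

```python
import numpy as np
from scipy.optimize import linprog

def deconvolve_location(y, X):
    """Least-absolute-deviation deconvolution on the simplex via an LP:
    min sum_i t_i  s.t.  -t_i <= y_i - (Xp)_i <= t_i,  p >= 0,  sum(p) = 1,
    with auxiliary variables t_i bounding each absolute residual."""
    M, K = X.shape
    c = np.concatenate([np.zeros(K), np.ones(M)])              # minimize sum of t
    A_ub = np.block([[-X, -np.eye(M)],                          #   y - Xp <= t
                     [ X, -np.eye(M)]])                         # -(y - Xp) <= t
    b_ub = np.concatenate([-y, y])
    A_eq = np.concatenate([np.ones(K), np.zeros(M)])[None, :]   # sum(p) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (K + M), method="highs")
    return res.x[:K]

# toy example: 100 genes, 3 cell types, known mixing proportions
rng = np.random.default_rng(1)
X = rng.gamma(2.0, 1.0, size=(100, 3))          # gene-by-cell-type reference
p_true = np.array([0.5, 0.3, 0.2])
y = X @ p_true + rng.normal(0.0, 0.02, 100)
p_hat = deconvolve_location(y, X)
```

The LP view makes the robustness of the absolute deviation loss explicit: each residual contributes linearly rather than quadratically, so a few outlying genes cannot dominate the fit.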

Construct reference

The accurate estimation of cell types is essential for understanding tissue function and identifying cell type specific features. A well-designed cell type reference is crucial for this purpose, and in this study, we utilize single-cell RNA-seq data that contains tissue or samples with a similar phenotype to the spatial transcriptomics data. The scRNA-seq data were first checked for quality based on the commonly used pre-processing workflow from Seurat 61 .

To extract cell type information, we followed the main idea of MuSiC 62 and applied several steps. First, we calculated the cross-cell variation for each gene of each cell type within an individual sample, taking into account cell type and sample-specific library size. To achieve this, we subset the expression data by removing redundant cell type annotations given by the original single-cell study and by removing genes with zero counts. For each sample within each cell type, we scaled the gene expression by library size, calculated by summing all gene counts for each cell. Next, we retained genes satisfying any of three criteria: 1) genes shared between the single-cell and bulk data, 2) commonly used or highly cited cell type biomarkers, and 3) differentially expressed genes (DEGs) obtained by comparing each pair of cell types. To detect DEGs, we used the FindAllMarkers function from Seurat. The resulting table is a gene-by-cell-type expression matrix that can be used in the cell type deconvolution model. For a step-by-step description of the reference construction, please refer to the flowchart in Supplementary Fig. 18.
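
The scaling-and-averaging core of these steps can be condensed as below; this is an illustrative Python sketch (the actual workflow uses Seurat and MuSiC-style processing in R, and the gene filter is passed in here as a precomputed mask rather than derived from markers and DEGs):

```python
import numpy as np

def build_reference(counts, cell_types, keep_genes):
    """Construct a gene-by-cell-type reference: library-size-normalize each
    cell, then average normalized expression within each cell type.
    `counts` is a genes x cells matrix; `cell_types` labels each cell;
    `keep_genes` is a boolean gene filter (shared genes, markers, or DEGs)."""
    lib = counts.sum(axis=0, keepdims=True)            # per-cell library size
    norm = counts / np.maximum(lib, 1)                 # library-size scaling
    types = sorted(set(cell_types))
    ref = np.column_stack([
        norm[:, [i for i, t in enumerate(cell_types) if t == ct]].mean(axis=1)
        for ct in types
    ])
    return ref[keep_genes, :], types
```

The resulting matrix plays the role of x_ij in the selection and proportion-estimation steps above.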

Statistics and reproducibility

All single-cell and spatial transcriptomics data used for the simulations and real datasets are publicly available. The code for the other methods is also publicly accessible; we adhered to the online tutorials for running each method. For our method, we have developed an R package that enables the reproduction of our results. The R package and a tutorial for implementing SPADE are freely available on GitHub ( https://github.com/anlingUA/SPADE ).

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The datasets utilized in this study are publicly available. The spatial MOB, mouse kidney, and mouse brain datasets were obtained from the 10x Visium dataset, which can be accessed at https://www.10xgenomics.com/resources/datasets . The single cell RNA-seq data for the MOB and mouse kidney samples are available through the GEO Series accession numbers GSE162654 and GSE107585, respectively. The spatial transcriptomics data for the developmental chicken heart were downloaded from https://github.com/madhavmantri/chicken_heart/tree/master/data , while the corresponding single cell data can be accessed via the GEO Series accession number GSE149457. The human breast cancer spatial transcriptomic data is available from the Zenodo data repository ( https://doi.org/10.5281/zenodo.4739739 ), and the single cell data can be obtained via the GEO Series accession number GSE176078. Finally, the single cell data for the mouse visual cortex can be accessed through GSE102827.

Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19 , 534–546 (2022).

Walker, B. L., Cang, Z., Ren, H., Bourgain-Chang, E. & Nie, Q. Deciphering tissue structure and function using spatial transcriptomics. Commun. Biol. 5 , 220 (2022).

Close, J. L., Long, B. R. & Zeng, H. Spatially resolved transcriptomics in neuroscience. Nat. Methods 18 , 23–25 (2021).

Roth, R., Kim, S., Kim, J. & Rhee, S. Single-cell and spatial transcriptomics approaches of cardiovascular development and disease. BMB Rep. 53 , 393–399 (2020).

Yu, Q., Jiang, M. & Wu, L. Spatial transcriptomics technology in cancer research. Front. Oncol. 12 , 1019111 (2022).

Hu, B., Sajid, M., Lv, R., Liu, L. & Sun, C. A review of spatial profiling technologies for characterizing the tumor microenvironment in immuno-oncology. Front. Immunol. 13 , 996721 (2022).

Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596 , 211–220 (2021).

Williams, C. G., Lee, H. J., Asatsuma, T., Vento-Tormo, R. & Haque, A. An introduction to spatial transcriptomics for biomedical research. Genome Med. 14 , 68 (2022).

Longo, S. K., Guo, M. G., Ji, A. L. & Khavari, P. A. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat. Rev. Genet. 22 , 627–644 (2021).

Kleino, I., Frolovaitė, P., Suomi, T. & Laura, L. E. Computational solutions for spatial transcriptomics. Comput. Struc. Biotechnol. J. 20 , 4870–4884 (2022).


Chen, J. et al. A comprehensive comparison on cell-type composition inference for spatial transcriptomics data. Brief. Bioinform. 23 , bbac245 (2022).

Ma, Y. & Zhou, X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat. Biotechnol. 40 , 1349–1359 (2022).

Lu, Y., Chen, Q. M. & An, L. Semi-reference based cell type deconvolution with application to human metastatic cancers. NAR Genom. Bioinformatics 5 , 4 (2023).


Elosua-Bayes, M., Nieto, P., Mereu, E., Gut, I. & Heyn, H. Spotlight: seeded nmf regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 49 , e50–e50 (2021).

Dong, R. & Yuan, G.-C. Spatialdwls: accurate deconvolution of spatial transcriptomic data. Genome Biol. 22 , 145 (2021).

Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40 , 517–526 (2022).

Danaher, P. et al. Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data. Nat. Commun. 13 , 385 (2022).

Tsoucas, D. et al. Accurate estimation of cell-type composition from gene expression data. Nat. Commun. 10 , 2975 (2019).

Hu, J. et al. Spagcn: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18 , 1342–1351 (2021).


Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58 , 267–288 (1996).

Tepe, B. et al. Single-cell rna-seq of mouse olfactory bulb reveals cellular heterogeneity and activity-dependent molecular census of adult-born neurons. Cell Rep. 25 , 2689–2703.e3 (2018).

10x Genomics. Adult mouse olfactory bulb: Spatial gene expression dataset by space ranger 2.0.0. https://support.10xgenomics.com (2022).

Park, J. et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 360 , 758–763 (2018).

Mantri, M. et al. Spatiotemporal single-cell rna sequencing of developing chicken hearts identifies interplay between cellular differentiation and morphogenesis. Cell Rep. 12 , 1771 (2021).


Wittig, J. G. & Münsterberg, A. The chicken as a model organism to study heart development. Cold Spring Harb. Perspect. Biol. 12 , a037218 (2020).

Choy, M., Oltjen, S., Ratcliff, D., Armstrong, M. & Armstrong, P. Fibroblast behavior in the embryonic chick heart. Dev. Dyn. Off. Publ. Am. Assoc. Anat. 198 , 97–107 (1993).

Consigli, S. A. & Joseph-Silverstein, J. Immunolocalization of basic fibroblast growth factor during chicken cardiac development. J. Cell. Physiol. 146 , 379–385 (1991).

Tallquist, M. D. Developmental pathways of cardiac fibroblasts. Cold Spring Harb. Perspect. Biol. 12 , a037184 (2020).

Ivey, M. J. & Tallquist, M. D. Defining the cardiac fibroblast. Circ. J. Off. J. Jpn. Circ. Soc. 80 , 2269–2276 (2016).

Guo, Y. & Pu, W. T. Cardiomyocyte maturation. Circ. Res. 126 , 1086–1106 (2020).

Soufan, A. T. et al. Regionalized sequence of myocardial cell growth and proliferation characterizes early chamber formation. Circ. Res. 99 , 545–552 (2006).

Evans-Anderson, H. J., Alfieri, C. M. & Yutzey, K. E. Regulation of cardiomyocyte proliferation and myocardial growth during development by foxo transcription factors. Circ. Res. 102 , 686–694 (2008).

Günthel, M., Barnett, P. & Christoffels, V. M. Development, proliferation, and growth of the mammalian heart. Mol. Ther. 26 , 1599–1609 (2018).

Litviňuková, M. et al. Cells of the adult human heart. Nature 588 , 466–472 (2020).

Ieda, M. Heart development and regeneration via cellular interaction and reprogramming. Keio J. Med. 62 , 99–106 (2013).

Tirziu, D., Giordano, F. J. & Simons, M. Cell communications in the heart. Circulation 122 , 928–937 (2010).

Wittig, J. G. & Münsterberg, A. The early stages of heart development: Insights from chicken embryos. J. Cardiovasc. Dev. Dis . 3 , 12 (2016).

Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53 , 1334–1347 (2021).

Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. [Data set]. Zenodo https://zenodo.org/records/4739739 (2021).

Hinck, L. & Näthke, I. Changes in cell and tissue organization in cancer of the breast and colon. Curr. Opin. Cell Biol. 26 , 87–95 (2013).

Scabia, V. et al. Estrogen receptor positive breast cancers have patient specific hormone sensitivities and rely on progesterone receptor. Nat. Commun. 13 , 3127 (2022).

Nelson, M., Ngamcherdtrakul, W., Luoh, S. & Yantasee, W. Prognostic and therapeutic role of tumor-infiltrating lymphocyte subtypes in breast cancer. Cancer Metastasis Rev. 40 , 519–536 (2021).

Garaud, S. et al. Tumor-infiltrating b cells signal functional humoral immune responses in breast cancer. JCI Insight 4 , e129641 (2019).


Paijens, S., Vledder, A., de Bruyn, M. & Nijman, H. Tumor-infiltrating lymphocytes in the immunotherapy era. Cell Mol. Immunol. 18 , 842–859 (2021).

Li, F. et al. The association between CD8+ tumor-infiltrating lymphocytes and the clinical outcome of cancer immunotherapy: A systematic review and meta-analysis. eClin. Med. 41 , 101134 (2021).

Meechan, D. et al. Modeling a model: Mouse genetics, 22q11.2 deletion syndrome, and disorders of cortical circuit development. Prog. Neurobiol. 130 , 1–28 (2015).

Pessoa, L. & Adolphs, R. Emotion processing and the amygdala: from a ‘low road’ to ‘many roads’ of evaluating biological significance. Nat. Rev. Neurosci. 11 , 773–783 (2010).

Preuss, T. Taking the measure of diversity: comparative alternatives to the model-animal paradigm in cortical neuroscience. Brain Behav. Evol. 55 , 287–299 (2000).

Espinosa, J. & Stryker, M. Development and plasticity of the primary visual cortex. Neuron 75 , 230–49 (2012).

Fee, C., Banasr, M. & Sibille, E. Somatostatin-positive gamma-aminobutyric acid interneuron deficits in depression: Cortical microcircuit and therapeutic perspectives. Biol. Psychiatry 82 , 549–559 (2017).

Yuste, R. & Katz, L. Control of postsynaptic ca2+ influx in developing neocortex by excitatory and inhibitory neurotransmitters. Neuron 6 , 333–344 (1991).

Lamme, V., Supèr, H., Landman, R., Roelfsema, P. & Spekreijse, H. The role of primary visual cortex (v1) in visual awareness. Vision Res. 40 , 1507–1521 (2000).

Epstein, R. The cortical basis of visual scene processing. Visual Cogn. 12 , 954–978 (2005).


Pellicano, E., Gibson, L., Maybery, M., Durkin, K. & Badcock, D. Abnormal global processing along the dorsal visual pathway in autism: a possible mechanism for weak visuospatial coherence? Neuropsychologia 43 , 1044–1053 (2005).

Siddiqi, S., Kording, K., Parvizi, J. & Fox, M. Causal mapping of human brain function. Nat. Rev. Neurosci. 23 , 361–375 (2022).

Hrvatin, S. et al. Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat Neurosci. 21 , 120–129 (2018).

Li, Z. & Zhou, X. Bass: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 23 , 168 (2022).

Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13 , 1739 (2022).

Pau, G., Fuchs, F., Sklyar, O., Boutros, M. & Huber, W. Ebimage–an r package for image processing with applications to cellular phenotypes. Bioinformatics 26 , 979–981 (2010).

Varadhan, R. alabama: Constrained nonlinear optimization https://CRAN.R-project.org/package=alabama (2022). R package version 2022.4-1.

Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33 , 495–502 (2015).

Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10 , 380 (2019).


Acknowledgements

This research was partially supported by the National Institute of Health R01 GM125212, R01 GM126165, and Holsclaw endowment (Q.M.C.); R01 GM139829, P01 AI148104-01A1, and United States Department of Agriculture (ARZT-1361620-H22-149) (L.A.).

Author information

Authors and affiliations.

Interdisciplinary Program in Statistics and Data Science, University of Arizona, Tucson, AZ, 85721, USA

Yingying Lu & Lingling An

College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA

Qin M. Chen

Department of Biosystems Engineering, University of Arizona, Tucson, AZ, 85721, USA

Lingling An

Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, 85721, USA


Contributions

Conceptualization, L.A. and Y.L.; methodology, Y.L. and L.A.; simulation studies, Y.L. and L.A.; real data analysis, Y.L., Q.M.C., and L.A.; writing and revising the manuscript, Y.L., Q.M.C. and L.A.

Corresponding author

Correspondence to Lingling An .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Communications Biology thanks Krishan Gupta and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Debarka Sengupta, Gene Chong and Christina Karlsson Rosenthal. A  peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

  • Peer review file
  • Supplementary information
  • Description of supplementary materials
  • Supplementary data 1
  • Supplementary data 2
  • Supplementary data 3
  • Supplementary data 4
  • Supplementary data 5
  • Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Lu, Y., Chen, Q.M. & An, L. SPADE: spatial deconvolution for domain specific cell-type estimation. Commun Biol 7 , 469 (2024). https://doi.org/10.1038/s42003-024-06172-y


Received : 06 June 2023

Accepted : 10 April 2024

Published : 17 April 2024

DOI : https://doi.org/10.1038/s42003-024-06172-y



Analysis of the impact of terrain factors and data fusion methods on uncertainty in intelligent landslide detection

  • Original Paper
  • Published: 16 April 2024

Cite this article

  • Rui Zhang 1 ,
  • Jichao Lv   ORCID: orcid.org/0000-0003-2082-945X 1 ,
  • Yunjie Yang 1 ,
  • Tianyu Wang 1 &
  • Guoxiang Liu 1  


Current research on deep learning-based intelligent landslide detection modeling has focused primarily on improving and innovating model structures. However, the impact of terrain factors and data fusion methods on the prediction accuracy of models remains underexplored. To clarify the contribution of terrain information to landslide detection modeling, 1022 landslide samples were compiled from Planet remote sensing images and DEM data in the Sichuan–Tibet area. We investigate the impact of digital elevation models (DEMs), remote sensing image fusion, and feature fusion techniques on the landslide prediction accuracy of models. First, we analyze the role of DEM data in landslide modeling using models such as Fast-SCNN, SegFormer, and the Swin Transformer. Next, we use a dual-branch network for feature fusion to assess different data fusion methods. We then conduct both quantitative and qualitative analyses of the modeling uncertainty, including examining the validation set accuracy, test set confusion matrices, prediction probability distributions, segmentation results, and Grad-CAM results. The findings indicate the following: (1) model predictions become more reliable when DEM data are fused with remote sensing images, enhancing the robustness of intelligent landslide detection modeling; (2) dual-branch network feature fusion leads to slightly greater accuracy than data channel fusion; and (3) under consistent data conditions, deep convolutional neural network models and attention-mechanism models show comparable capabilities in predicting landslides. These research outcomes provide valuable references and insights for deep learning-based intelligent landslide detection.
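The two fusion strategies the abstract compares can be illustrated with a minimal NumPy sketch. This is a toy stand-in, not the paper's dual-branch network: the patch size, the global-average-pool "branch", and the random projection are all illustrative assumptions. Channel fusion stacks the normalized DEM onto the RGB bands before any model sees the data; feature fusion runs each modality through its own branch and concatenates the resulting embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs: a 64x64 RGB patch and a co-registered DEM patch
rgb = rng.random((64, 64, 3)).astype(np.float32)
dem = rng.random((64, 64)).astype(np.float32)

# --- Channel fusion: normalize the DEM, stack it as a fourth band ---
dem_norm = (dem - dem.min()) / (dem.max() - dem.min())
channel_fused = np.concatenate([rgb, dem_norm[..., None]], axis=-1)
print(channel_fused.shape)  # (64, 64, 4)

# --- Feature fusion: each branch embeds its input; embeddings concat ---
def toy_branch(x: np.ndarray, out_dim: int) -> np.ndarray:
    """Stand-in for a CNN branch: global average pool + random projection."""
    pooled = x.reshape(-1, x.shape[-1]).mean(axis=0)
    w = rng.standard_normal((pooled.size, out_dim)).astype(np.float32)
    return pooled @ w

feat = np.concatenate([toy_branch(rgb, 16), toy_branch(dem_norm[..., None], 16)])
print(feat.shape)  # (32,)
```

The design difference is where the modalities meet: channel fusion forces one encoder to learn from mixed bands, while feature fusion lets each branch specialize before combination, which is what the dual-branch comparison in the study probes.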



Amankwah SOY, Wang G, Gnyawali K, Hagan DFT, Sarfo I, Zhen D, Nooni IK, Ullah W, Duan Z (2022) Landslide detection from bitemporal satellite imagery using attention-based deep neural networks. Landslides 19:2459–2471. https://doi.org/10.1007/s10346-022-01915-6


Catani F (2021) Landslide detection by deep learning of non-nadiral and crowdsourced optical images. Landslides 18:1025–1044. https://doi.org/10.1007/s10346-020-01513-4

Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale

Dou J, Yunus AP, Bui DT, Merghadi A, Sahana M, Zhu Z, Chen C-W, Han Z, Pham BT (2020) Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan. Landslides 17:641–658. https://doi.org/10.1007/s10346-019-01286-5

Đurić D, Mladenović A, Pešić-Georgiadis M, Marjanović M, Abolmasov B (2017) Using multiresolution and multitemporal satellite data for post-disaster landslide inventory in the Republic of Serbia. Landslides 14:1467–1482. https://doi.org/10.1007/s10346-017-0847-2

Fan X, Scaringi G, Xu Q, Zhan W, Dai L, Li Y, Pei X, Yang Q, Huang R (2018) Coseismic landslides triggered by the 8th August 2017 Ms 7.0 Jiuzhaigou earthquake (Sichuan, China): factors controlling their spatial distribution and implications for the seismogenic blind fault identification. Landslides 15:967–983. https://doi.org/10.1007/s10346-018-0960-x

Fang Z, Wang Y, Peng L, Hong H (2020) Integration of convolutional neural network and conventional machine learning classifiers for landslide susceptibility mapping. Comput Geosci 139:104470. https://doi.org/10.1016/j.cageo.2020.104470

Ghorbanzadeh O, Xu Y, Ghamisi P, Kopp M, Kreil D (2022) Landslide4Sense: reference benchmark data and deep learning models for landslide detection. IEEE Trans Geosci Remote Sensing 60:1–17. https://doi.org/10.1109/TGRS.2022.3215209

Haque U, Da Silva PF, Devoli G, Pilz J, Zhao B, Khaloua A, Wilopo W, Andersen P, Lu P, Lee J, Yamamoto T, Keellings D, Wu J-H, Glass GE (2019) The human cost of global warming: deadly landslides and their triggers (1995–2014). Sci Total Environ 682:673–684. https://doi.org/10.1016/j.scitotenv.2019.03.415


He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition

Ji S, Yu D, Shen C, Li W, Xu Q (2020) Landslide detection from an open satellite imagery and digital elevation model dataset using attention boosted convolutional neural networks. Landslides 17:1337–1352. https://doi.org/10.1007/s10346-020-01353-2

Lei T, Zhang Y, Lv Z, Li S, Liu S, Nandi AK (2019) Landslide inventory mapping from bitemporal images using deep convolutional neural networks. IEEE Geosci Remote Sensing Lett 16:982–986. https://doi.org/10.1109/LGRS.2018.2889307

Li D, Tang X, Tu Z, Fang C, Ju Y (2023a) Automatic detection of forested landslides: a case study in Jiuzhaigou County. China Remote Sensing 15:3850. https://doi.org/10.3390/rs15153850

Li W, Fu Y, Fan S, Xin M, Bai H (2023b) DCI-PGCN: dual-channel interaction portable graph convolutional network for landslide detection. IEEE Trans Geosci Remote Sensing 61:1–16. https://doi.org/10.1109/TGRS.2023.3273623

Liu X, Peng Y, Lu Z, Li W, Yu J, Ge D, Xiang W (2023) Feature-fusion segmentation network for landslide detection using high-resolution remote sensing images and digital elevation model data. IEEE Trans Geosci Remote Sensing 61:1–14. https://doi.org/10.1109/TGRS.2022.3233637

Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows

Lu P, Qin Y, Li Z, Mondini AC, Casagli N (2019) Landslide mapping from multi-sensor data through improved change detection-based Markov random field. Remote Sens Environ 231:111235. https://doi.org/10.1016/j.rse.2019.111235

Lu W, Hu Y, Zhang Z, Cao W (2023) A dual-encoder U-Net for landslide detection using Sentinel-2 and DEM data. Landslides 20(9):1975–1987

Poudel RPK, Liwicki S, Cipolla R (2019) Fast-SCNN: fast semantic segmentation network

Sangelantoni L, Gioia E, Marincioni F (2018) Impact of climate change on landslides frequency: the Esino river basin case study (Central Italy). Nat Hazards 93:849–884. https://doi.org/10.1007/s11069-018-3328-6

Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization, in: 2017 IEEE International Conference on Computer Vision (ICCV). Presented at the 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, pp. 618–626. https://doi.org/10.1109/ICCV.2017.74

Soares LP, Dias HC, Grohmann CH (2020) Landslide segmentation with U-Net: evaluating different sampling methods and patch sizes.

Su Z, Chow JK, Tan PS, Wu J, Ho YK, Wang Y-H (2021) Deep convolutional neural network–based pixel-wise landslide inventory mapping. Landslides 18:1421–1443. https://doi.org/10.1007/s10346-020-01557-6

Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: simple and efficient design for semantic segmentation with transformers

Xu Q, Ouyang C, Jiang T, Yuan X, Fan X, Cheng D (2022) MFFENet and ADANet: a robust deep transfer learning method and its application in high precision and fast cross-scene recognition of earthquake-induced landslides. Landslides 19:1617–1647. https://doi.org/10.1007/s10346-022-01847-1

Yang Z, Xu C, Li L (2022) Landslide detection based on ResU-Net with transformer and CBAM embedded: two examples with geologically different environments. Remote Sensing 14:2885. https://doi.org/10.3390/rs14122885

Yu B, Chen F, Xu C, Wang L, Wang N (2021) Matrix SegNet: a practical deep learning framework for landslide mapping from images of different areas with different spatial resolutions. Remote Sensing 13:3158. https://doi.org/10.3390/rs13163158

Zeng T, Glade T, Xie Y, Yin K, Peduto D (2023a) Deep learning powered long-term warning systems for reservoir landslides. International Journal of Disaster Risk Reduction 94:103820. https://doi.org/10.1016/j.ijdrr.2023.103820

Zeng T, Gong Q, Wu L, Zhu Y, Yin K, Peduto D (2023b) Double-index rainfall warning and probabilistic physically based model for fast-moving landslide hazard analysis in subtropical-typhoon area. Landslides. https://doi.org/10.1007/s10346-023-02187-4

Zeng T, Wu L, Peduto D, Glade T, Hayakawa YS, Yin K (2023c) Ensemble learning framework for landslide susceptibility mapping: different basic classifier and ensemble strategy. Geosci Front 14:101645. https://doi.org/10.1016/j.gsf.2023.101645

Zeng T, Jin B, Glade T, Xie Y, Li Y, Zhu Y, Yin K (2024a) Assessing the imperative of conditioning factor grading in machine learning-based landslide susceptibility modeling: a critical inquiry. CATENA 236:107732. https://doi.org/10.1016/j.catena.2023.107732

Zeng T, Wu L, Hayakawa YS, Yin K, Gui L, Jin B, Guo Z, Peduto D (2024b) Advanced integration of ensemble learning and MT-InSAR for enhanced slow-moving landslide susceptibility zoning. Eng Geol 331:107436. https://doi.org/10.1016/j.enggeo.2024.107436

Zhang X, Yu W, Pun M-O, Shi W (2023) Cross-domain landslide mapping from large-scale remote sensing images using prototype-guided domain-aware progressive representation learning. ISPRS J Photogramm Remote Sens 197:1–17. https://doi.org/10.1016/j.isprsjprs.2023.01.018

Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2015) Learning deep features for discriminative localization

Zhou Y, Xu H, Zhang W, Gao B, Heng PA (2021) C 3 -SemiSeg: contrastive semi-supervised segmentation via cross-set learning and dynamic class-balancing, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Presented at the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, pp. 7016–7025. https://doi.org/10.1109/ICCV48922.2021.00695


Acknowledgements

We wish to express our gratitude to Planet for providing high-resolution remote-sensing imagery.

This research was jointly funded by the National Key Research and Development Program of China (Grant No. 2023YFB2604001) and the National Natural Science Foundation of China (Grant Nos. 42371460, U22A20565, and 42171355).

Author information

Authors and affiliations.

Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu, 611756, Sichuan, China

Rui Zhang, Jichao Lv, Yunjie Yang, Tianyu Wang & Guoxiang Liu


Corresponding author

Correspondence to Jichao Lv .

Ethics declarations

Ethics approval.

Not applicable to studies not involving humans or animals.

Informed consent

Not applicable to studies not involving humans.

Competing interests

The authors declare no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Zhang, R., Lv, J., Yang, Y. et al. Analysis of the impact of terrain factors and data fusion methods on uncertainty in intelligent landslide detection. Landslides (2024). https://doi.org/10.1007/s10346-024-02260-6


Received : 11 January 2024

Accepted : 05 April 2024

Published : 16 April 2024

DOI : https://doi.org/10.1007/s10346-024-02260-6


  • Landslide detection
  • Terrain factors
  • Data fusion
  • Deep learning


FFRDC Research and Development Expenditures: Fiscal Year 2020

General notes.

This report provides data from the 2020 Federally Funded Research and Development Center (FFRDC) Research and Development Survey. This survey is the primary source of information on separately accounted R&D expenditures at FFRDCs in the United States. Conducted annually for university-administered FFRDCs since FY 1953 and all FFRDCs since FY 2001, the survey collects information on R&D expenditures by source of funds and types of research and expenses. The survey is an annual census of the full population of eligible FFRDCs. See https://www.nsf.gov/statistics/ffrdclist/ for the Master Government List of FFRDCs.

Data Tables

Technical Notes

  • Survey overview
  • Key survey information
  • Survey design
  • Data collection and processing methods
  • Survey quality measures
  • Data comparability (changes)
  • Definitions

Purpose. The Federally Funded Research and Development Center (FFRDC) Research and Development Survey is conducted by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF). It is the primary source of information on separately accounted for R&D expenditures at FFRDCs in the United States.

Data collection authority. The information is solicited under the authority of the NSF Act of 1950, as amended, and the America COMPETES Reauthorization Act of 2010. The Office of Management and Budget control number for the FY 2020 FFRDC R&D Survey is 3145-0100, with an expiration date of 31 August 2022.

Survey contractor. ICF.

Survey sponsor. NCSES.

Frequency. Annual.

Initial survey year. 2001.

Reference period. FY 2020.

Response unit. Establishment.

Sample or census. Census.

Population size. 42.

Sample size. The survey is a census of all known eligible FFRDCs.

Target population. All FFRDCs.

Sampling frame. The total survey universe is identified through the NSF Master Government List of FFRDCs (https://www.nsf.gov/statistics/ffrdclist/). NSF is responsible for maintaining this list and queries all federal agencies annually to determine any changes to, additions to, or deletions from the list.

Sample design. The FFRDC R&D Survey is a census of all eligible organizations.

Data collection. The FY 2020 survey announcements were sent by e-mail to all FFRDCs in December 2020. Respondents could choose to complete a questionnaire downloaded from the Web or use a Web-based data collection system to respond to the survey. Every effort was made to maintain close contact with respondents to preserve the consistency and continuity of the resulting data. Survey data reports were available on the survey website for each institution; these reports showed comparisons between the current year and the 2 prior years of data and noted any substantive disparities. Questionnaires were carefully examined for completeness upon receipt. Respondents were sent personalized e-mail messages asking them to provide any necessary revisions before the final processing and tabulation of data. These e-mail messages included a link to the FFRDC R&D Survey Web-based collection system, allowing respondents to view and correct their data online.

Respondents were asked to explain significant differences between current year reporting and established patterns of reporting verified for prior years. They were encouraged to correct prior year data, if necessary. When respondents updated or amended figures from past years, NCSES made corresponding changes to trend data in the 2020 data tables and to the underlying microdata. For accurate historical data, use only the most recently released data tables.
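The year-over-year comparison described above can be sketched as a simple disparity check. This is a hypothetical illustration, not NCSES's actual edit rules: the 20% threshold, the dictionary layout, and the item names are all assumptions made for the example.

```python
def flag_disparities(history, current, threshold=0.20):
    """Flag items whose current-year value deviates from the mean of the
    two prior years by more than `threshold` (as a fractional change).

    `history` maps item -> (year_minus_2, year_minus_1) values;
    `current` maps item -> current-year value, mirroring the survey's
    comparison of the current year against the 2 prior years.
    """
    flags = {}
    for item, value in current.items():
        prior = history.get(item)
        if not prior:
            continue  # no established pattern to compare against
        baseline = sum(prior) / len(prior)
        if baseline == 0:
            continue
        change = (value - baseline) / baseline
        if abs(change) > threshold:
            flags[item] = round(change, 3)
    return flags

# Hypothetical expenditure figures (in $ millions)
history = {"federal_rd": (900.0, 950.0), "nonfederal_rd": (50.0, 52.0)}
current = {"federal_rd": 1200.0, "nonfederal_rd": 53.0}
print(flag_disparities(history, current))  # {'federal_rd': 0.297}
```

Flagged items would then be returned to the respondent for explanation or correction, matching the follow-up workflow the survey describes.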

Mode. Respondents could respond to the survey by completing an Adobe PDF questionnaire downloaded from the Web or by using a Web-based data collection system. All FFRDCs submitted data using the Web-based survey.

Response rates. All 42 FFRDCs included on the NSF Master Government List of FFRDCs during the FY 2020 survey cycle completed the key survey questions.

Data editing. The FFRDC R&D Survey was subject to very little editing; respondents were contacted and asked to resolve possible self-reporting issues themselves. Questionnaires were carefully examined by survey staff upon receipt. These reviews focused on unexplained missing data and explanations provided for changes in reporting patterns. If additional explanations or data revisions were needed, respondents were sent personalized e-mail messages asking them to provide any necessary revisions before the final processing and tabulation of data.

Imputation. No data were imputed for FY 2020.

Weighting. FFRDC R&D Survey data were not weighted.

Variance estimation. No variance estimation techniques were used.

Sampling error. Because the FY 2020 survey was distributed to all organizations in the universe, there was no sampling error.

Coverage error. Given the availability of a comprehensive FFRDC list, there is no known coverage error for this survey. FFRDCs are identified through the NSF Master Government List of FFRDCs. NSF is responsible for maintaining the master list and queries all federal agencies annually to determine changes to, additions to, or deletions from the list.

Nonresponse error. Most FFRDCs have incorporated the data needed to complete most of the survey questions into their record-keeping systems. Twelve FFRDCs chose not to complete Question 5 of the survey, which asks for expenditures by type of cost. Eleven of those FFRDCs are managed by private companies for whom salary information is considered proprietary. One FFRDC, which is managed by a university, could not separate its expenditures by type from those of the managing institution. Other FFRDCs did not answer all sections of Question 5: four FFRDCs could not provide information on software expenditures, and four could not provide data on equipment expenditures. One FFRDC did not report its operating budget (Question 6). One FFRDC could not provide expenditures funded by nonprofit organizations separately from those funded by businesses (Question 1); the combined amount was reported as business-funded.

Measurement error. NCSES discovered during the FY 2011 survey cycle that seven FFRDCs were including capital project expenditures in the R&D totals reported on the survey. Corrections made for the FY 2011 survey cycle lowered total expenditures by $468 million. However, previous years still include an unknown amount of capital expenditures in the total. The amount is estimated to be less than $500 million per year.

Prior to the FY 2011 survey, the five FFRDCs administered by the MITRE Corporation had reported only internally funded R&D expenditures. After discussions with NCSES, these five FFRDCs agreed to report all FY 2011 operating expenditures for R&D and to revise their data for FYs 2008–10.

NCSES discovered during the FY 2013 survey cycle that Los Alamos National Laboratory (LANL) was reporting some expenditures that were not for R&D, as defined by this survey. Corrections made for the FY 2013 survey cycle lowered the laboratory’s total expenditures by $349 million. LANL had also been incorrectly reporting that all its expenditures were for basic research. In corrections made for FY 2013, LANL reported that $1,554 million (91%) of its total research expenditures was for applied research. LANL data from previous years still include an unknown amount of expenditures that were not for R&D and categorize all expenditures as basic research.

Prior to FY 2014, the Aerospace FFRDC reported only expenditures on internal R&D projects. After discussions with NCSES, the Aerospace Corporation agreed to report all R&D expenditures for FY 2014 and provide revised data to include all R&D expenditures for FYs 2010–13. R&D expenditures increased by more than $800 million each year.

During the FY 2014 survey, NCSES discovered that the National Optical Astronomy Observatory had been including data for the National Solar Observatory since FY 2010. The Association of Universities for Research in Astronomy, the administrator of both FFRDCs, provided revised data for both FFRDCs for FYs 2010–13.

During the FY 2016 survey, NCSES discovered that the Judiciary Engineering and Modernization Center was incorrectly classified as an industry-administered FFRDC in FYs 2011–15. This FFRDC is administered by the MITRE Corporation, a nonprofit organization, and should have been classified as a nonprofit-administered FFRDC. The classification was corrected for FY 2016, and the Judiciary Engineering and Modernization Center’s FYs 2011–15 data were reclassified as coming from a nonprofit-administered FFRDC.

Annual data are available for FYs 2001–20. When the review for consistency between each year’s data and submissions in prior years reveals discrepancies, it is sometimes necessary to modify prior year data. For accurate historical data, use only the most recently released data tables. Individuals wishing to analyze trends other than those in the most recent NCSES publication are encouraged to contact the Survey Manager for more information about the comparability of data over time.

Changes in survey coverage and population. Most years, there are some changes to the FFRDC population that may affect trend analyses. FFRDCs have been created, decertified, renamed, or restructured, as described below:

  • On 20 December 2006, the National Biodefense Analysis and Countermeasure Center was created.
  • Prior to FY 2009, the Center for Enterprise Modernization was listed as the Internal Revenue Service FFRDC.
  • On 5 March 2009, the Homeland Security Studies and Analysis Institute and the Homeland Security Systems Engineering and Development Institute were created. These new FFRDCs replaced the Homeland Security Institute.
  • On 1 October 2009, the National Solar Observatory split from the National Optical Astronomy Observatory, with both retaining their FFRDC status.
  • On 2 September 2010, the Judiciary Engineering and Modernization Center was created.
  • Prior to FY 2011, the National Security Engineering Center was listed as C3I FFRDC.
  • On 1 October 2011, the National Astronomy and Ionosphere Center was decertified as an FFRDC.
  • Prior to FY 2012, the Frederick National Laboratory for Cancer Research was listed as the National Cancer Institute at Frederick.
  • On 27 September 2012, the Centers for Medicare and Medicaid Services FFRDC was created. On 15 August 2013, its name was changed to the CMS Alliance to Modernize Healthcare.
  • Prior to FY 2013, the Systems and Analyses Center was listed as the Studies and Analyses Center.
  • On 19 September 2014, the National Cybersecurity Center of Excellence was created.
  • On 15 September 2016, the Homeland Security Operational Analysis Center was created.
  • The Homeland Security Studies and Analysis Institute was phased out on 31 October 2016.
  • In June 2020, the National Optical Astronomy Observatory changed its name to NSF’s National Optical-Infrared Astronomy Research Laboratory, or NSF’s NOIRLab.

Changes in questionnaire. FFRDCs are asked to provide R&D expenditures by source of funding and type of R&D. In FY 2010, NCSES revised the survey to include three new questions requesting expenditures funded by the American Recovery and Reinvestment Act of 2009 (ARRA), expenditures by type of cost, and total operating budget. In FY 2015, NCSES revised the survey to exclude the question requesting expenditures funded by ARRA. In FY 2016, NCSES added a question (Question 2) asking for R&D expenditures funded by seven specific federal agencies. In FY 2019, NCSES added a question (Question 3) asking which federal agencies funded the expenditures reported under Other federal agencies in Question 2.

Changes in reporting procedures or classification. The FFRDC R&D Survey has been conducted annually for university-administered FFRDCs since FY 1953 and for all FFRDCs since FY 2001.

  • Expenditures by federal agency. In Question 2, FFRDCs were asked for the amount of R&D expenditures by federal funding agency. FFRDCs were asked to report expenditures funded by seven specific agencies (Department of Defense; Department of Energy; Department of Health and Human Services, including the National Institutes of Health; Department of Homeland Security; Department of Transportation; National Aeronautics and Space Administration; and National Science Foundation). Any expenditures funded by other federal agencies were reported under Other federal agencies. In Question 3, FFRDCs were asked to list the specific agencies and corresponding expenditures included in Other federal agencies.
  • Expenditures by source. In Question 1, FFRDCs were asked to report their total R&D expenditures by funding source, as defined below:
  • U.S. federal government. Any agency of the U.S. government. Federal funds that were passed through to the reporting institution from another institution were included.
  • State and local government. Any state, county, municipality, or other local government entity in the United States, including state health agencies.
  • Business. Domestic or foreign for-profit organizations. Funds from a company’s nonprofit foundation were not reported here; they were reported under Nonprofit organizations.
  • Nonprofit organizations. Domestic or foreign nonprofit foundations and organizations.
  • All other sources. Sources not reported in other categories, such as funds from foreign governments.
  • Expenditures by type of cost. In Question 5, FFRDCs were asked for expenditures by type of cost, as defined below:
  • Salaries, wages, and fringe benefits. Included compensation for all R&D personnel, whether full time or part time, temporary or permanent, including salaries, wages, and fringe benefits paid from institution funds and from external support.
  • Software purchases. Included payments for all software, both purchases of software packages and license fees for systems.
  • Equipment. Included payments for movable equipment, including ancillary costs such as delivery and setup.
  • Subcontracts. Payments to subcontractors or subrecipients for services on R&D projects.
  • Other direct costs. Other costs that did not fit into one of the above categories, including (but not limited to) travel, computer usage fees, and supplies.
  • Indirect costs. Included all indirect costs (overhead) associated with R&D projects.
  • Expenditures by type of R&D. In Question 4, FFRDCs were asked for the amount of federal and nonfederal R&D expenditures by type of R&D, as defined below:
  • Basic research. Experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundations of phenomena and observable facts, without any particular application or use in view.
  • Applied research. Original investigation undertaken to acquire new knowledge. It is directed primarily toward a specific, practical aim or objective.
  • Experimental development. Systematic work, drawing on knowledge gained from research and practical experience and producing additional knowledge, which is directed to producing new products or processes or to improving existing products or processes.
  • Fiscal year. FFRDCs were asked to report data for their fiscal year (or financial year).
  • R&D. The FFRDC R&D Survey requested data from FFRDCs on their R&D, defined as systematic study directed toward fuller knowledge or understanding of the subject studied. R&D included basic research, applied research, and experimental development (see Expenditures by type of R&D above for additional information). R&D did not include outreach or nonresearch training programs. Respondents were also asked to exclude capital projects (i.e., construction or renovation of research facilities) from reported expenditures.
  • R&D expenditures. FFRDCs were asked to report all current operating expenditures for activities specifically organized to produce R&D outcomes, including those funded by external sponsors or separately budgeted and accounted for by the organization using internal funds. Expenditures included indirect costs, equipment, software, clinical trials, and subcontract expenditures.
  • Total operating budget. Total executed operating budget for the FFRDC, excluding capital construction costs.

Acknowledgments and Suggested Citation


Michael T. Gibbons of the National Center for Science and Engineering Statistics (NCSES) developed and coordinated this report under the guidance of John Jankowski, NCSES Program Director, and under the leadership of Emilda B. Rivers, NCSES Director; Vipin Arora, NCSES Deputy Director; and Matthew Williams, NCSES Acting Chief Statistician.

Under contract to NCSES, ICF conducted the survey and prepared the tables. ICF staff members who made significant contributions include Kathryn Harper, Project Director; Sherri Mamon, Deputy Project Director; Jennifer Greer, Data Management Lead; Sindhura Geda, Data Management Specialist; Bridget Beavers, Data Management Specialist; Carolyn Bennett, Data Collection Manager; Cameron Shanton, Data Collection Specialist; Melinda Scott, Data Collection Specialist; David Greene, Survey Systems Lead; Vladimer Shioshvili, Software Application Engineer. Publication processing support was provided by Devi Mishra, Catherine Corlies, and Tanya Gore (NCSES).

NCSES thanks the FFRDCs that provided information for this report.

National Center for Science and Engineering Statistics (NCSES). 2020. FFRDC Research and Development Expenditures: Fiscal Year 2020. NSF 22-304. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/pubs/nsf22304/.

Report Author

Michael T. Gibbons, Survey Manager, Research and Development Statistics Program, NCSES. Tel: (703) 292-4590. E-mail: [email protected]

National Center for Science and Engineering Statistics, Directorate for Social, Behavioral and Economic Sciences, National Science Foundation, 2415 Eisenhower Avenue, Suite W14200, Alexandria, VA 22314. Tel: (703) 292-8780. FIRS: (800) 877-8339. TDD: (800) 281-8749. E-mail: [email protected]

Read more about the source: FFRDC Research and Development Survey .

Browse the data collection for R&D Expenditures at Federally Funded R&D Centers .


Research Methodology – Types, Examples and Writing Guide


Definition:

Research Methodology refers to the systematic and scientific approach used to conduct research, investigate problems, and gather data and information for a specific purpose. It involves the techniques and procedures used to identify, collect, analyze, and interpret data to answer research questions or solve research problems. It also encompasses the philosophical and theoretical frameworks that guide the research process.

Structure of Research Methodology

Research methodology formats can vary depending on the specific requirements of the research project, but the following is a basic example of a structure for a research methodology section:

I. Introduction

  • Provide an overview of the research problem and the need for a research methodology section
  • Outline the main research questions and objectives

II. Research Design

  • Explain the research design chosen and why it is appropriate for the research question(s) and objectives
  • Discuss any alternative research designs considered and why they were not chosen
  • Describe the research setting and participants (if applicable)

III. Data Collection Methods

  • Describe the methods used to collect data (e.g., surveys, interviews, observations)
  • Explain how the data collection methods were chosen and why they are appropriate for the research question(s) and objectives
  • Detail any procedures or instruments used for data collection

IV. Data Analysis Methods

  • Describe the methods used to analyze the data (e.g., statistical analysis, content analysis)
  • Explain how the data analysis methods were chosen and why they are appropriate for the research question(s) and objectives
  • Detail any procedures or software used for data analysis

V. Ethical Considerations

  • Discuss any ethical issues that may arise from the research and how they were addressed
  • Explain how informed consent was obtained (if applicable)
  • Detail any measures taken to ensure confidentiality and anonymity

VI. Limitations

  • Identify any potential limitations of the research methodology and how they may impact the results and conclusions

VII. Conclusion

  • Summarize the key aspects of the research methodology section
  • Explain how the research methodology addresses the research question(s) and objectives

Research Methodology Types

Types of Research Methodology are as follows:

Quantitative Research Methodology

This is a research methodology that involves the collection and analysis of numerical data using statistical methods. This type of research is often used to study cause-and-effect relationships and to make predictions.

Qualitative Research Methodology

This is a research methodology that involves the collection and analysis of non-numerical data such as words, images, and observations. This type of research is often used to explore complex phenomena, to gain an in-depth understanding of a particular topic, and to generate hypotheses.

Mixed-Methods Research Methodology

This is a research methodology that combines elements of both quantitative and qualitative research. This approach can be particularly useful for studies that aim to explore complex phenomena and to provide a more comprehensive understanding of a particular topic.

Case Study Research Methodology

This is a research methodology that involves in-depth examination of a single case or a small number of cases. Case studies are often used in psychology, sociology, and anthropology to gain a detailed understanding of a particular individual or group.

Action Research Methodology

This is a research methodology that involves a collaborative process between researchers and practitioners to identify and solve real-world problems. Action research is often used in education, healthcare, and social work.

Experimental Research Methodology

This is a research methodology that involves the manipulation of one or more independent variables to observe their effects on a dependent variable. Experimental research is often used to study cause-and-effect relationships and to make predictions.

Survey Research Methodology

This is a research methodology that involves the collection of data from a sample of individuals using questionnaires or interviews. Survey research is often used to study attitudes, opinions, and behaviors.

Grounded Theory Research Methodology

This is a research methodology that involves the development of theories based on the data collected during the research process. Grounded theory is often used in sociology and anthropology to generate theories about social phenomena.

Research Methodology Example

An Example of Research Methodology could be the following:

Research Methodology for Investigating the Effectiveness of Cognitive Behavioral Therapy in Reducing Symptoms of Depression in Adults

Introduction:

The aim of this research is to investigate the effectiveness of cognitive-behavioral therapy (CBT) in reducing symptoms of depression in adults. To achieve this objective, a randomized controlled trial (RCT) will be conducted using a mixed-methods approach.

Research Design:

The study will follow a pre-test and post-test design with two groups: an experimental group receiving CBT and a control group receiving no intervention. The study will also include a qualitative component, in which semi-structured interviews will be conducted with a subset of participants to explore their experiences of receiving CBT.

Participants:

Participants will be recruited from community mental health clinics in the local area. The sample will consist of 100 adults aged 18–65 who meet the diagnostic criteria for major depressive disorder. Participants will be randomly assigned to either the experimental group or the control group.

Intervention:

The experimental group will receive 12 weekly sessions of CBT, each lasting 60 minutes. The intervention will be delivered by licensed mental health professionals who have been trained in CBT. The control group will receive no intervention during the study period.

Data Collection:

Quantitative data will be collected through the use of standardized measures such as the Beck Depression Inventory-II (BDI-II) and the Generalized Anxiety Disorder-7 (GAD-7). Data will be collected at baseline, immediately after the intervention, and at a 3-month follow-up. Qualitative data will be collected through semi-structured interviews with a subset of participants from the experimental group. The interviews will be conducted at the end of the intervention period, and will explore participants’ experiences of receiving CBT.

Data Analysis:

Quantitative data will be analyzed using descriptive statistics, t-tests, and mixed-model analyses of variance (ANOVA) to assess the effectiveness of the intervention. Qualitative data will be analyzed using thematic analysis to identify common themes and patterns in participants’ experiences of receiving CBT.
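To illustrate the group comparison described above, here is a minimal sketch of an independent-samples comparison using Welch's t statistic, computed in pure Python. The BDI-II scores below are invented placeholder values, not data from this study, and in practice a statistics package (e.g., SciPy or R) would also supply degrees of freedom and a p-value:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Invented post-intervention BDI-II scores (lower = fewer depressive symptoms);
# placeholder numbers for illustration only.
cbt_group = [12, 9, 15, 8, 11, 10, 14, 7, 13, 9]
control_group = [22, 19, 25, 18, 21, 24, 20, 23, 17, 26]

t = welch_t(cbt_group, control_group)
print(round(t, 2))  # a large negative t suggests lower scores in the CBT group
```

Welch's variant is used here rather than the pooled-variance t-test because it does not assume the two groups have equal variances, which is rarely guaranteed in clinical samples.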

Ethical Considerations:

This study will comply with ethical guidelines for research involving human subjects. Participants will provide informed consent before participating in the study, and their privacy and confidentiality will be protected throughout the study. Any adverse events or reactions will be reported and managed appropriately.

Data Management:

All data collected will be kept confidential and stored securely using password-protected databases. Identifying information will be removed from qualitative data transcripts to ensure participants’ anonymity.

Limitations:

One potential limitation of this study is that it only focuses on one type of psychotherapy, CBT, and may not generalize to other types of therapy or interventions. Another limitation is that the study will only include participants from community mental health clinics, which may not be representative of the general population.

Conclusion:

This research aims to investigate the effectiveness of CBT in reducing symptoms of depression in adults. By using a randomized controlled trial and a mixed-methods approach, the study will provide valuable insights into the mechanisms underlying the relationship between CBT and depression. The results of this study will have important implications for the development of effective treatments for depression in clinical settings.

How to Write Research Methodology

Writing a research methodology involves explaining the methods and techniques you used to conduct research, collect data, and analyze results. It’s an essential section of any research paper or thesis, as it helps readers understand the validity and reliability of your findings. Here are the steps to write a research methodology:

  • Start by explaining your research question: Begin the methodology section by restating your research question and explaining why it’s important. This helps readers understand the purpose of your research and the rationale behind your methods.
  • Describe your research design: Explain the overall approach you used to conduct research. This could be a qualitative or quantitative research design, experimental or non-experimental, case study or survey, etc. Discuss the advantages and limitations of the chosen design.
  • Discuss your sample: Describe the participants or subjects you included in your study. Include details such as their demographics, sampling method, sample size, and any exclusion criteria used.
  • Describe your data collection methods: Explain how you collected data from your participants. This could include surveys, interviews, observations, questionnaires, or experiments. Include details on how you obtained informed consent, how you administered the tools, and how you minimized the risk of bias.
  • Explain your data analysis techniques: Describe the methods you used to analyze the data you collected. This could include statistical analysis, content analysis, thematic analysis, or discourse analysis. Explain how you dealt with missing data, outliers, and any other issues that arose during the analysis.
  • Discuss the validity and reliability of your research: Explain how you ensured the validity and reliability of your study. This could include measures such as triangulation, member checking, peer review, or inter-coder reliability.
  • Acknowledge any limitations of your research: Discuss any limitations of your study, including any potential threats to validity or generalizability. This helps readers understand the scope of your findings and how they might apply to other contexts.
  • Provide a summary: End the methodology section by summarizing the methods and techniques you used to conduct your research. This provides a clear overview of your research methodology and helps readers understand the process you followed to arrive at your findings.
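The missing-data and outlier handling mentioned in the data-analysis step above can be sketched in code. This is one common approach among many (listwise deletion plus a z-score cutoff); the scores, the `clean_scores` helper, and the cutoff value are all invented for illustration:

```python
import statistics

def clean_scores(scores, z_cutoff=2.0):
    """Drop missing entries (None), then drop values more than z_cutoff
    standard deviations from the mean. A cutoff of 2.0 is used here for
    this tiny sample; 3.0 is a common convention for larger samples."""
    observed = [s for s in scores if s is not None]  # listwise deletion
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [s for s in observed if abs(s - mean) <= z_cutoff * sd]

# Invented questionnaire scores: None marks a missing response; 95 is a
# hypothetical data-entry error that the z-score rule should flag.
raw = [14, 12, None, 15, 13, 95, 11, None, 12]
print(clean_scores(raw))
```

Whatever rule is chosen, the methodology section should state it explicitly (and report how many cases were dropped), since different choices can change the results.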

When to Write Research Methodology

Research methodology is typically written after the research proposal has been approved and before the actual research is conducted. It should be written prior to data collection and analysis, as it provides a clear roadmap for the research project.

The research methodology is an important section of any research paper or thesis, as it describes the methods and procedures that will be used to conduct the research. It should include details about the research design, data collection methods, data analysis techniques, and any ethical considerations.

The methodology should be written in a clear and concise manner, and it should be based on established research practices and standards. It is important to provide enough detail so that the reader can understand how the research was conducted and evaluate the validity of the results.

Applications of Research Methodology

Here are some of the applications of research methodology:

  • To identify the research problem: Research methodology is used to identify the research problem, which is the first step in conducting any research.
  • To design the research: Research methodology helps in designing the research by selecting the appropriate research method, research design, and sampling technique.
  • To collect data: Research methodology provides a systematic approach to collect data from primary and secondary sources.
  • To analyze data: Research methodology helps in analyzing the collected data using various statistical and non-statistical techniques.
  • To test hypotheses: Research methodology provides a framework for testing hypotheses and drawing conclusions based on the analysis of data.
  • To generalize findings: Research methodology helps in generalizing the findings of the research to the target population.
  • To develop theories: Research methodology is used to develop new theories and modify existing theories based on the findings of the research.
  • To evaluate programs and policies: Research methodology is used to evaluate the effectiveness of programs and policies by collecting data and analyzing it.
  • To improve decision-making: Research methodology helps in making informed decisions by providing reliable and valid data.

Purpose of Research Methodology

Research methodology serves several important purposes, including:

  • To guide the research process: Research methodology provides a systematic framework for conducting research. It helps researchers to plan their research, define their research questions, and select appropriate methods and techniques for collecting and analyzing data.
  • To ensure research quality: Research methodology helps researchers to ensure that their research is rigorous, reliable, and valid. It provides guidelines for minimizing bias and error in data collection and analysis, and for ensuring that research findings are accurate and trustworthy.
  • To replicate research: Research methodology provides a clear and detailed account of the research process, making it possible for other researchers to replicate the study and verify its findings.
  • To advance knowledge: Research methodology enables researchers to generate new knowledge and to contribute to the body of knowledge in their field. It provides a means for testing hypotheses, exploring new ideas, and discovering new insights.
  • To inform decision-making: Research methodology provides evidence-based information that can inform policy and decision-making in a variety of fields, including medicine, public health, education, and business.

Advantages of Research Methodology

Research methodology has several advantages that make it a valuable tool for conducting research in various fields. Here are some of the key advantages of research methodology:

  • Systematic and structured approach: Research methodology provides a systematic and structured approach to conducting research, which ensures that the research is conducted in a rigorous and comprehensive manner.
  • Objectivity: Research methodology aims to ensure objectivity in the research process, which means that the research findings are based on evidence and not influenced by personal bias or subjective opinions.
  • Replicability: Research methodology ensures that research can be replicated by other researchers, which is essential for validating research findings and ensuring their accuracy.
  • Reliability: Research methodology aims to ensure that the research findings are reliable, which means that they are consistent and can be depended upon.
  • Validity: Research methodology ensures that the research findings are valid, which means that they accurately reflect the research question or hypothesis being tested.
  • Efficiency: Research methodology provides a structured and efficient way of conducting research, which helps to save time and resources.
  • Flexibility: Research methodology allows researchers to choose the most appropriate research methods and techniques based on the research question, data availability, and other relevant factors.
  • Scope for innovation: Research methodology provides scope for innovation and creativity in designing research studies and developing new research techniques.

Research Methodology vs. Research Methods

In short, research methodology is the overall strategy and rationale for a study: the framework that explains why particular approaches were chosen. Research methods are the specific techniques and procedures (such as surveys, interviews, experiments, or statistical tests) used to collect and analyze data within that framework.

About the Author


Muhammad Hassan

Researcher, Academic Writer, Web developer

You may also like

Research Paper Citation

How to Cite Research Paper – All Formats and...

Data collection

Data Collection – Methods Types and Examples

Delimitations

Delimitations in Research – Types, Examples and...

Research Paper Formats

Research Paper Format – Types, Examples and...

Research Process

Research Process – Steps, Examples and Tips

Research Design

Research Design – Types, Methods and Examples


