Evaluation Research Design: Examples, Methods & Types

busayo.longe

As you engage in tasks, you will need to take intermittent breaks to determine how much progress has been made and whether any changes need to be made along the way. This is very similar to what organizations do when they carry out evaluation research.

The evaluation research methodology has become one of the most important approaches for organizations as they strive to create products, services, and processes that speak to the needs of target users. In this article, we will show you how your organization can conduct successful evaluation research using Formplus.

What is Evaluation Research?

Also known as program evaluation, evaluation research is a common research design that entails carrying out a structured assessment of the value of resources committed to a project or specific goal. It often adopts social research methods to gather and analyze useful information about organizational processes and products.  

As a type of applied research, evaluation research is typically associated with real-life scenarios within organizational contexts. This means that the researcher will need to leverage common workplace skills, including interpersonal skills and teamwork, to arrive at objective research findings that will be useful to stakeholders.

Characteristics of Evaluation Research

  • Research Environment: Evaluation research is conducted in the real world; that is, within the context of an organization. 
  • Research Focus: Evaluation research is primarily concerned with measuring the outcomes of a process rather than the process itself. 
  • Research Outcome: Evaluation research is employed for strategic decision making in organizations. 
  • Research Goal: The goal of program evaluation is to determine whether a process has yielded the desired result(s). 
  • This type of research protects the interests of stakeholders in the organization. 
  • It often represents a middle-ground between pure and applied research. 
  • Evaluation research is both detailed and continuous. It pays attention to performance and outcomes rather than mere description. 
  • Research Process: This research design utilizes qualitative and quantitative research methods to gather relevant data about a product or action-based strategy. These methods include observation, tests, and surveys.

Types of Evaluation Research

The Encyclopedia of Evaluation (Mathison, 2004) treats forty-two different evaluation approaches and models ranging from “appreciative inquiry” to “connoisseurship” to “transformative evaluation”. Common types of evaluation research include the following: 

  • Formative Evaluation

Formative evaluation or baseline survey is a type of evaluation research that involves assessing the needs of the users or target market before embarking on a project.  Formative evaluation is the starting point of evaluation research because it sets the tone of the organization’s project and provides useful insights for other types of evaluation.  

  • Mid-term Evaluation

Mid-term evaluation entails assessing how far a project has come and determining whether it is in line with the set goals and objectives. Mid-term reviews allow the organization to determine if a change or modification of the implementation strategy is necessary, and they also serve as a means of tracking the project's progress.

  • Summative Evaluation

This type of evaluation is also known as end-term evaluation or project-completion evaluation, and it is conducted immediately after the completion of a project. Here, the researcher examines the value and outputs of the program within the context of the projected results.

Summative evaluation allows the organization to measure the degree of success of a project. Such results can be shared with stakeholders, target markets, and prospective investors. 

  • Outcome Evaluation

Outcome evaluation is primarily target-audience oriented because it measures the effects of the project, program, or product on the users. This type of evaluation views the outcomes of the project through the lens of the target audience and it often measures changes such as knowledge-improvement, skill acquisition, and increased job efficiency. 

  • Appreciative Inquiry

Appreciative inquiry is a type of evaluation research that pays attention to result-producing approaches. It is predicated on the belief that an organization will grow in whatever direction its stakeholders pay primary attention to, so that if all the attention is focused on problems, problems are what the organization will keep finding.

In carrying out appreciative inquiry, the researcher identifies the factors directly responsible for the positive results realized in the course of a project, analyzes the reasons for these results, and intensifies the utilization of these factors.

Evaluation Research Methodology 

There are four major evaluation research methods, namely: output measurement, input measurement, impact assessment, and service quality.

  • Output/Performance Measurement

Output measurement is a method employed in evaluation research that shows the results of an activity undertaken by an organization. In other words, performance measurement pays attention to the results achieved by the resources invested in a specific activity or organizational process.

More than investing resources in a project, organizations must be able to track the extent to which these resources have yielded results, and this is where performance measurement comes in. Output measurement allows organizations to pay attention to the effectiveness and impact of a process rather than just the process itself. 

Key indicators of performance measurement include user satisfaction, organizational capacity, market penetration, and facility utilization. In carrying out performance measurement, organizations must identify the parameters that are relevant to the process in question, their industry, and their target markets.
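
For a sense of what this looks like in practice, the sketch below rolls a handful of collected responses up into simple output indicators such as reach, average satisfaction, market penetration, and cost per user. It is only a minimal illustration: the field names, satisfaction scale, and target figures are hypothetical placeholders, not part of any particular survey platform.

```python
# Minimal sketch: turning raw survey responses into output/performance indicators.
# All field names and target figures below are hypothetical placeholders.

responses = [
    {"satisfaction": 4, "is_new_customer": True},   # satisfaction on a 1-5 scale
    {"satisfaction": 5, "is_new_customer": False},
    {"satisfaction": 3, "is_new_customer": True},
]

TARGET_MARKET_SIZE = 1_000   # assumed size of the addressable market
PROJECT_COST = 5_000.0       # assumed total spend on the project

reach = len(responses)                                  # overall reach of the project
avg_satisfaction = sum(r["satisfaction"] for r in responses) / reach
market_penetration = reach / TARGET_MARKET_SIZE         # share of the target market reached
cost_per_user = PROJECT_COST / reach                    # rough cost-effectiveness figure

print(f"Reach: {reach} users")
print(f"Average satisfaction: {avg_satisfaction:.2f} / 5")
print(f"Market penetration: {market_penetration:.1%}")
print(f"Cost per user reached: {cost_per_user:.2f}")
```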

5 Examples of Performance Evaluation Research Questions

  • What is the cost-effectiveness of this project?
  • What is the overall reach of this project?
  • How would you rate the market penetration of this project?
  • How accessible is the project? 
  • Is this project time-efficient? 


  • Input Measurement

In evaluation research, input measurement entails assessing the amount of resources committed to a project or goal in an organization. This is one of the most common indicators in evaluation research because it allows organizations to track their investments.

The most common indicator of input measurement is the budget, which allows organizations to evaluate and limit expenditure for a project. It is also important to measure non-monetary investments such as human capital, that is, the number of persons needed for successful project execution, as well as production capital.

5 Examples of Input Evaluation Research Questions

  • What is the budget for this project?
  • What is the timeline of this process?
  • How many employees have been assigned to this project? 
  • Do we need to purchase new machinery for this project? 
  • How many third-parties are collaborators in this project? 


  • Impact/Outcomes Assessment

In impact assessment, the evaluation researcher focuses on how the product or project affects target markets, both directly and indirectly. Outcomes assessment is somewhat challenging because many times, it is difficult to measure the real-time value and benefits of a project for the users. 

In assessing the impact of a process, the evaluation researcher must pay attention to the improvement recorded by the users as a result of the process or project in question. Hence, it makes sense to focus on cognitive and affective changes, expectation-satisfaction, and similar accomplishments of the users. 

5 Examples of Impact Evaluation Research Questions

  • How has this project affected you? 
  • Has this process affected you positively or negatively?
  • What role did this project play in improving your earning power? 
  • On a scale of 1-10, how excited are you about this project?
  • How has this project improved your mental health? 


  • Service Quality

Service quality is an evaluation research method that accounts for any differences between the expectations of the target markets and their impression of the project undertaken. Hence, it pays attention to the overall service quality assessment carried out by the users.

It is not uncommon for organizations to build the expectations of target markets as they embark on specific projects. Service quality evaluation allows these organizations to track the extent to which the actual product or service delivery fulfills those expectations.
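
To make the expectation-versus-impression comparison concrete, here is a minimal sketch of a gap calculation. It assumes you ask respondents to rate both what they expected and what they experienced on the same scale; the question labels and scores are hypothetical.

```python
# Minimal sketch: service quality as the gap between expectation and perception.
# Assumes both are rated on the same 1-10 scale; all data below is hypothetical.

ratings = {
    # question: (average expectation score, average perception score)
    "response time":    (9.0, 7.5),
    "staff courtesy":   (8.0, 8.5),
    "issue resolution": (9.5, 8.0),
}

for question, (expected, perceived) in ratings.items():
    gap = perceived - expected   # negative gap = service fell short of expectations
    print(f"{question}: gap {gap:+.1f}")

overall_gap = sum(p - e for e, p in ratings.values()) / len(ratings)
print(f"Overall service quality gap: {overall_gap:+.2f}")
```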

5 Service Quality Evaluation Questions

  • On a scale of 1-10, how satisfied are you with the product?
  • How helpful was our customer service representative?
  • How satisfied are you with the quality of service?
  • How long did it take to resolve the issue at hand?
  • How likely are you to recommend us to your network?


Uses of Evaluation Research 

  • Evaluation research is used by organizations to measure the effectiveness of activities and identify areas needing improvement. Findings from evaluation research are key to project and product advancements and are very influential in helping organizations realize their goals efficiently.     
  • The findings arrived at from evaluation research serve as evidence of the impact of the project embarked on by an organization. This information can be presented to stakeholders, customers, and can also help your organization secure investments for future projects. 
  • Evaluation research helps organizations to justify their use of limited resources and choose the best alternatives. 
  •  It is also useful in pragmatic goal setting and realization. 
  • Evaluation research provides detailed insights into projects embarked on by an organization. Essentially, it allows all stakeholders to understand multiple dimensions of a process, and to determine strengths and weaknesses. 
  • Evaluation research also plays a major role in helping organizations to improve their overall practice and service delivery. This research design allows organizations to weigh existing processes through feedback provided by stakeholders, and this informs better decision making. 
  • Evaluation research is also instrumental to sustainable capacity building. It helps you to analyze demand patterns and determine whether your organization requires more funds, upskilling or improved operations.

Data Collection Techniques Used in Evaluation Research

In gathering useful data for evaluation research, the researcher often combines quantitative and qualitative research methods . Qualitative research methods allow the researcher to gather information relating to intangible values such as market satisfaction and perception. 

On the other hand, quantitative methods are used by the evaluation researcher to assess numerical patterns, that is, quantifiable data. These methods help you measure impact and results; although they may not serve for understanding the context of the process. 

Quantitative Methods for Evaluation Research

  • Surveys

A survey is a quantitative method that allows you to gather information about a project from a specific group of people. Surveys are largely context-based and limited to target groups who are asked a set of structured questions in line with the predetermined context.

Surveys usually consist of closed-ended questions that allow the evaluation researcher to gain insight into several variables, including market coverage and customer preferences. Surveys can be carried out physically using paper forms or online through data-gathering platforms like Formplus.

  • Questionnaires

A questionnaire is a common quantitative research instrument deployed in evaluation research. Typically, it is an aggregation of different types of questions or prompts which help the researcher to obtain valuable information from respondents. 

  • Polls

A poll is a common method of opinion-sampling that allows you to weigh the perception of the public about issues that affect them. The best way to achieve accuracy in polling is by conducting polls online using platforms like Formplus.

Polls are often structured as Likert questions and the options provided always account for neutrality or indecision. Conducting a poll allows the evaluation researcher to understand the extent to which the product or service satisfies the needs of the users. 
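
As a brief illustration of how such poll responses can be summarized, the sketch below tallies hypothetical Likert-style answers, keeps the neutral option visible, and reports a simple mean score. The labels and counts are invented.

```python
from collections import Counter

# Minimal sketch: summarizing Likert-style poll responses (hypothetical data).
SCALE = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

responses = ["Agree", "Neutral", "Strongly agree", "Agree", "Disagree", "Agree"]

counts = Counter(responses)
total = len(responses)

for label in SCALE:
    share = counts.get(label, 0) / total
    print(f"{label:<17} {counts.get(label, 0):>2}  ({share:.0%})")

# A simple numeric summary: map each label to 1-5 and average.
score = sum(SCALE.index(r) + 1 for r in responses) / total
print(f"Mean score: {score:.2f} / 5")
```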

Qualitative Methods for Evaluation Research

  • One-on-One Interview

An interview is a structured conversation involving two participants, usually the researcher and the user or a member of the target market. One-on-one interviews can be conducted in person, via telephone, or through video conferencing apps like Zoom and Google Meet.

  • Focus Groups

A focus group is a research method that involves interacting with a limited number of persons within your target market, who can provide insights on market perceptions and new products. 

  • Qualitative Observation

Qualitative observation is a research method that allows the evaluation researcher to gather useful information from the target audience through a variety of subjective approaches. This method is more in-depth than quantitative observation because it deals with a smaller sample size, and it also utilizes inductive analysis.

  • Case Studies

A case study is a research method that helps the researcher to gain a better understanding of a subject or process. Case studies involve in-depth research into a given subject, to understand its functionalities and successes. 

How to Use the Formplus Online Form Builder for an Evaluation Survey

  • Sign into Formplus

In the Formplus builder, you can easily create your evaluation survey by dragging and dropping preferred fields into your form. To access the Formplus builder, you will need to create an account on Formplus. 

Once you do this, sign in to your account and click on “Create Form” to begin.


  • Edit Form Title

Click on the field provided to input your form title, for example, “Evaluation Research Survey”.


Click on the edit button to edit the form.

Add Fields: Drag and drop preferred form fields into your form in the Formplus builder inputs column. There are several field input options for surveys in the Formplus builder. 


Edit the fields as needed, click on “Save”, and preview your form.

  • Form Customization

With the form customization options in the form builder, you can easily change the look of your form and make it more unique and personalized. Formplus allows you to change your form theme, add background images, and even change the font according to your needs.


  • Multiple Sharing Options

Formplus offers multiple form-sharing options which enable you to easily share your evaluation survey with survey respondents. You can use the direct social media sharing buttons to share your form link on your organization’s social media pages.

You can send out your survey form as email invitations to your research subjects too. If you wish, you can share your form’s QR code or embed it on your organization’s website for easy access. 

Conclusion  

Conducting evaluation research allows organizations to determine the effectiveness of their activities at different phases. This type of research can be carried out using qualitative and quantitative data collection methods including focus groups, observation, telephone and one-on-one interviews, and surveys. 

Online surveys created and administered via data collection platforms like Formplus make it easier for you to gather and process information during evaluation research. With Formplus’ multiple form-sharing options, it is even easier for you to gather useful data from target markets.



Learn how to develop a ToC-based evaluation


  • Evaluation Goals and Planning
  • Identify ToC-Based Questions
  • Choose an Evaluation Design
  • Select Measures

Choose an Appropriate Evaluation Design

Once you’ve identified your questions, you can select an appropriate evaluation design. Evaluation design refers to the overall approach to gathering information or data to answer specific research questions.

There is a spectrum of research design options—ranging from small-scale feasibility studies (sometimes called road tests) to larger-scale studies that use advanced scientific methodology. Each design option is suited to answer particular research questions.

The appropriate design for a specific project depends on what the project team hopes to learn from a particular implementation and evaluation cycle. Generally, as projects and programs move from small feasibility tests to later stage studies, methodological rigor increases.


In other words, you’ll use more advanced tools and processes that allow you to be more confident in your results. Sample sizes get larger, the number of measurement tools increases, and assessments are often standardized and norm-referenced (designed to compare an individual’s score to a particular population).

In the IDEAS Framework, evaluation is an ongoing, iterative process. The idea is to investigate your ToC one domain at a time, beginning with program strategies and gradually expanding your focus until you’re ready to test the whole theory. Returning to the domino metaphor, we want to see if each domino in the chain is falling the way we expect it to.

Feasibility Study

Begin by asking:

“Are the program strategies feasible and acceptable?”

If you’re designing a program from scratch and implementing it for the first time, you’ll almost always need to begin by establishing feasibility and acceptability. However, suppose you’ve been implementing a program for some time, even without a formal evaluation. In that case, you may have already established feasibility and acceptability simply by demonstrating that the program is possible to implement and that participants feel it’s a good fit. If that’s the case, you might be able to skip over this step, so to speak, and turn your attention to the impact on targets, which we’ll go over in more detail below. On the other hand, for a long-standing program being adapted for a new context or population, you may need to revisit its feasibility and acceptability.

The appropriate evaluation design for answering questions about feasibility and acceptability is typically a feasibility study with a relatively small sample and a simple data collection process.

In this phase, you would collect data on program strategies, including:

  • Fidelity data (is the program being implemented as intended?)
  • Feedback from participants and program staff (through surveys, focus groups, and interviews)
  • Information about recruitment and retention
  • Participant demographics (to learn about who you’re serving and whether you’re serving who you intended to serve)

Through fast-cycle iteration, you can use what you learn from a feasibility study to improve the program strategies.

Pilot Study

Once you have evidence to suggest that your strategies are feasible and acceptable, you can take the next step and turn your attention to the impact on targets by asking:

“Is there evidence to suggest that the targets are changing in the anticipated direction?”

The appropriate evaluation design to begin to investigate the impact on targets is usually a pilot study. With a somewhat larger sample and more complex design, pilot studies often gather information from participants before and after they participate in the program. In this phase, you would collect data on program strategies and targets. Note that in each phase, the focus of your evaluation expands to include more domains of your ToC. In a pilot study, in addition to data on targets (your primary focus), you’ll want to gather information on strategies to continue looking at feasibility and acceptability.

In this phase, you would collect data on:

  • Program strategies (to continue monitoring feasibility and acceptability)
  • Targets (your primary focus)

Later-Stage Study

Once you’ve established feasibility and acceptability and have evidence to suggest your targets are changing in the expected direction, you’re ready to ask:

“Is there evidence to support our full theory of change?”

In other words, you’ll simultaneously ask:

  • Do our strategies continue to be feasible and acceptable?
  • Are the targets changing in the anticipated direction?
  • Are the outcomes changing in the anticipated direction?
  • Do the moderators help explain variability in impact?

The appropriate evaluation design for investigating your entire theory of change is a later-stage study, with a larger sample and more sophisticated study design, often including some kind of control or comparison group. In this phase, you would collect data on all domains of your ToC: strategies, targets, outcomes, and moderators.

Common Questions

There may be cases where it does make sense to skip the earlier steps and move right to a later-stage study. But in most cases, investigating your ToC one domain at a time has several benefits. First, later-stage studies are typically costly in terms of time and money. By starting with a relatively small and low-cost feasibility study and working toward more rigorous evaluation, you can ensure that time and money will be well spent on a program that’s more likely to be effective. If you were to skip ahead to a later-stage study, you might be disappointed to find that your outcomes aren’t changing because of problems with feasibility and acceptability, or because your targets aren’t changing (or aren’t changing enough).

Many programs do gather data on outcomes without looking at strategies and targets. One challenge with that approach is that if you don’t see evidence of impact on program outcomes, you won’t be able to know why that’s the case. Was there a problem with feasibility, and the people implementing the program weren’t able to deliver the program as it was intended? Was there an issue with acceptability, and participants tended to skip sessions or drop out of the program early? Maybe the implementation went smoothly, but the strategies just weren’t effective at changing your targets, and that’s where the causal chain broke down. Unless you gather data on strategies and targets, it’s hard to know what went wrong and what you can do to improve the program’s effectiveness.  

Section 4. Selecting an Appropriate Design for the Evaluation


When you hear the word “experiment,” it may call up pictures of people in long white lab coats peering through microscopes. In reality, an experiment is just trying something out to see how or why or whether it works. It can be as simple as putting a different spice in your favorite dish, or as complex as developing and testing a comprehensive effort to improve child health outcomes in a city or state.

Academics and other researchers in public health and the social sciences conduct experiments to understand how environments affect behavior and outcomes, so their experiments usually involve people and aspects of the environment. A new community program or intervention is an experiment, too, one that a governmental or community organization engages in to find out a better way to address a community issue. It usually starts with an assumption about what will work – sometimes called a theory of change - but that assumption is no guarantee. Like any experiment, a program or intervention has to be evaluated to see whether it works and under what conditions.

In this section, we’ll look at some of the ways you might structure an evaluation to examine whether your program is working, and explore how to choose the one that best meets your needs. These arrangements for discovery are known as experimental (or evaluation) designs.

What do we mean by a design for the evaluation?

Every evaluation is essentially a research or discovery project. Your research may be about determining how effective your program or effort is overall, which parts of it are working well and which need adjusting, or whether some participants respond to certain methods or conditions differently from others. If your results are to be reliable, you have to give the evaluation a structure that will tell you what you want to know. That structure – the arrangement of discovery – is the evaluation’s design.

The design depends on what kinds of questions your evaluation is meant to answer.

Some of the most common evaluation (research) questions include:

  • Does a particular program or intervention – whether an instructional or motivational program, improving access and opportunities, or a policy change – cause a particular change in participants’ or others’ behavior, in physical or social conditions, health or development outcomes, or other indicators of success?
  • What component(s) and element(s) of the program or intervention were responsible for the change?
  • What are the unintended effects of an intervention, and how did they influence the outcomes?
  • If you try a new method or activity, what happens?
  • Will the program that worked in another context, or the one that you read about in a professional journal, work in your community, or with your population, or with your issue?

If you want reliable answers to evaluation questions like these, you have to ask them in a way that will show you whether you actually got results, and whether those results were in fact due to your actions or the circumstances you created, or to other factors.  In other words, you have to create a design for your research – or evaluation – to give you clear answers to your questions.  We’ll discuss how to do that later in the section.

Why should you choose a design for your evaluation?

An evaluation may seem simple: if you can see progress toward your goal by the end of the evaluation period, you’re doing OK; if you can’t, you need to change. Unfortunately, it’s not that simple at all. First, how do you measure progress? Second, if there seems to be none, how do you know what you should change in order to increase your effectiveness? Third, if there is progress, how do you know it was caused by (or contributed to by) your program, and not by something else? And finally, even if you’re doing well, how will you decide what you could do better, and what elements of your program can be changed or eliminated without affecting success? A good design for your evaluation will help you answer important questions like these.

Some specific reasons for spending the time to design your evaluation carefully include:

  • So your evaluation will be reliable.  A good design will give you accurate results. If you design your evaluation well, you can trust it to tell you whether you’re actually having an effect, and why. Understanding your program to this extent makes it easier to achieve and maintain success.
  • So you can pinpoint areas you need to work on, as well as those that are successful . A good design can help you understand exactly where the strong and weak points of your program or intervention are, and give you clues as to how they can be further strengthened or changed for the greatest impact.
  • So your results are credible . If your evaluation is designed properly, others will take your results seriously. If a well-designed evaluation shows that your program is effective, you’re much more likely to be able to convince others to use similar methods, and to convince funders that your organization is a good investment.
  • So you can identify factors unrelated to what you’re doing that have an effect – positive or negative – on your results and on the lives of participants.  Participants’ histories, crucial local or national events, the passage of time, personal crises, and many other factors can influence the outcome of a program or intervention for better or worse. A good evaluation design can help you to identify these, and either correct for them if you can, or devise methods to deal with or incorporate them.
  • So you can identify unintended consequences (both positive and negative) and correct for them . A good design can show you all of what resulted from your program or intervention, not just what you expected. If you understand that your work has consequences that are negative as well as positive, or that it has more and/or different positive consequences than you anticipated, you can adjust accordingly.
  • So you’ll have a coherent plan and organizing structure for your evaluation . It will be much easier to conduct your evaluation if it has an appropriate design. You’ll know better what you need to do in order to get the information you need. Spending the time to choose and organize an evaluation design will pay off in the time you save later and in the quality of the information you get.

When should you choose a design for your evaluation?

Once you’ve determined your evaluation questions and gathered and organized all the information you can about the issue and ways to approach it, the next step is choosing a design for the evaluation. Ideally, this all takes place at the beginning of the process of putting together a program or intervention.  Your evaluation should be an integral part of your program , and its planning should therefore be an integral part of the program planning.

That’s the ideal; now let’s talk about reality. If you’re reading this, the chances are probably at least 50-50 that you’re connected to an underfunded government agency or to a community-based or non-governmental organization, and that you’re planning an evaluation of a program or intervention that’s been running for some time – months or even years.

Even if that’s true, the same guidelines apply. Choose your questions, gather information, choose a design, and then go on through the steps presented in this chapter. Evaluation is important enough that you won’t really be accomplishing anything by taking shortcuts in planning it. If your program has a cycle, then it probably makes sense to start your evaluation at the beginning of it – the beginning of a year or a program phase, where all participants are starting from the same place, or from the beginning of their involvement.

If that’s not possible – if your program has a rolling admissions policy, or provides a service whenever people need it – and participants are all at different points, that can sometimes present research problems. You may want to evaluate the program’s effects only with new participants, or with  another specific group. On the other hand, if your program operates without a particular beginning and end, you may get the best picture of its effectiveness by evaluating it as it is, starting whenever you’re ready. Whatever the case, your design should follow your information gathering and synthesis.

Who should be involved in choosing a design?

If you’re a regular Tool Box user, and particularly if you’ve been reading this chapter, you know that the Tool Box team generally recommends a participatory process – involving both research and community partners, including all those with an interest in the program or who are affected by it – in planning and implementation. Choosing a design for evaluation presents somewhat of an exception to this policy, since scientific or evaluation partners may have a much clearer understanding of what is required to conduct research, and of the factors that may interfere with it.

As we’ll see in the “how-to” part of this section, there are a number of considerations that have to be taken into account to gain accurate information that actually tells you what you want to know. Graduate students generally take courses to gain the knowledge they need to conduct research well, and even some veteran researchers have difficulty setting up an appropriate research design. That doesn’t mean a community group can’t learn to do it, but rather that the time they would have to spend on acquiring background knowledge might be too great. Thus, it makes the most sense to assign this task (or at the very least its coordination) to an individual or small group with experience in research and evaluation design. Such a person can not only help you choose among possible designs, but explain what each design entails, in time, resources, and necessary skills, so that you can judge its appropriateness and feasibility for your context.

How do you choose a design for your evaluation?

How do you go about deciding what kind of research design will best serve the purposes of your evaluation?

The answer to that question involves an examination of four areas:

  • The nature of the research questions you are trying to answer
  • The challenges to the research, and the ways they can be resolved or reduced
  • The kinds of research designs that are generally used, and what each design entails
  • The possibility of adapting a particular research design to your program or situation – what the structure of your program will support, what participants will consent to, and what your resources and time constraints are

We’ll begin this part of the section with an examination of the concerns research designs should address, go on to considering some common designs and how well they address those concerns, and end with some guidelines for choosing a design that will both be possible to implement and give you the information you need about your program.

Note : in this part of the section, we’re looking at evaluation as a research project.  As a result, we’ll use the term “research” in many places where we could just as easily have said, for the purposes of this section, “evaluation.”  Research is more general, and some users of this section may be more concerned with research in general than evaluation in particular.

Concerns research designs should address

The most important consideration in designing a research project – except perhaps for the value of the research itself – is whether your arrangement will provide you with valid information. If you don’t design and set up your research project properly, your findings won’t give you information that is accurate and likely to hold true with other situations.  In the case of an evaluation, that means that you won’t have a basis for adjusting what you do to strengthen and improve it.

Here’s a far-fetched example that illustrates this point.  If you took children’s heights at age six, then fed them large amounts of a specific food for three years – say carrots – and measured them again at the end of the period, you’d probably find that most of them were considerably taller at nine years than at six.  You might conclude that it was eating carrots that made the children taller because your research design gave you no basis for comparing these children’s growth to that of other children.

There are two kinds of threats to the validity of a piece of research . They are usually referred to as threats to internal validity (whether the intervention produced the change) and threats to external validity (whether the results are likely to apply to other people and situations).

Threats to internal validity

These are threats (or alternative explanations) to your claim that what you did caused changes in the direction you were aiming for. They are generally posed by factors operating at the same time as your program or intervention that might have an effect on the issue you’re trying to address. If you don’t have a way of separating their effects from those of your program, you can’t tell whether the observed changes were caused by your work, or by one or more of these other factors. They’re called threats to internal validity because they’re internal to the study – they have to do with whether your intervention – and not something else – accounted for the difference.

There are several kinds of threats to internal validity:

  • History.  Both participants’ personal histories – their backgrounds, cultures, experiences, education, etc. – and external events that occur during the research period – a disaster, an election, conflict in the community, a new law – may influence whether or not there’s any change in the outcomes you’re concerned with.
  • Maturation . This refers to the natural physical, psychological, and social processes that take place as time goes by. The growth of the carrot-eating children in the example above is a result of maturation, for instance, as might be a decline in risky behavior as someone passed from adolescence to adulthood, the development of arthritis in older people, or participants becoming tired during learning activities towards the end of the day.
  • The effects of testing or observation on participants . The mere fact of a program’s existence, or of their taking part in it, may affect participants’ behavior or attitudes, as may the experience of being tested, videotaped, or otherwise observed or measured.
  • Changes in measurement .  An instrument – a blood pressure cuff or a scale, for instance – can change over time, or different ones may not give the same results. By the same token, observers – those gathering information – may change their standards over time, or two or more observers may disagree on the observations.
  • Regression toward the mean . This is a statistical term that refers to the fact that, over time, the very high and very low scores on a measure (a test, for instance) often tend to drift back toward the average for the group. If you start a program with participants who, by definition, have very low or high levels of whatever you’re measuring – reading skill, exposure to domestic violence, particular behavior toward people of other races or backgrounds, etc. – their scores may end up closer to the average over the course of the evaluation period even without any program.
  • The selection of participants . Those who choose participants may slant their selection toward a particular group that is more or less likely to change than a cross-section of the population from which the group was selected. (A good example is that of employment training programs that get paid according to the number of people they place in jobs. They’re more likely to select participants who already have all or most of the skills they need to become employed, and neglect those who have fewer skills... and who therefore most need the service.) Selection can play a part when participants themselves choose to enroll in a program (self-selection), since those who decide to participate are probably already motivated to make changes. It may also be a matter of chance: members of a particular group may, simply by coincidence, share a characteristic that will set their results on your measures apart from the norm of the population you’re drawing from.
Selection can also be a problem when two groups being compared are chosen by different standards.  We’ll discuss this further below when we deal with control or comparison groups.
  • The loss of data or participants . If too little information is collected about participants, or if too many drop out well before the research period is over, your results may be based on too little data to be reliable. This also arises when two groups are being compared. If their losses of data or participants are significantly different, comparing them may no longer give you valid information.
  • The nature of change . Often, change isn’t steady and even. It can involve leaps forward and leaps backward before it gets to a stable place – if it ever does. (Think of looking at the performance of a sports team halfway through the season. No matter what its record is at that moment, you won’t know how well it will finish until the season is over.)  Your measurements may take place over too short a period or come at the wrong times to track the true course of the change or lack of change that’s occurring.
  • A combination of the effects of two or more of these . Two or more of these factors may combine to produce or prevent the changes your program aims to produce. A language-study curriculum that is tested only on students who already speak two or more languages runs into problems with both participants’ history – all the students have experience learning languages other than their own – and selection – you’ve chosen students who are very likely to be successful at language learning.

Threats to external validity

These are factors that affect your ability to apply your research results in other circumstances – to increase the chances that your program and its results can be reproduced elsewhere or with other populations. If, for instance, you offer parenting classes only to single mothers, you can’t assume, no matter how successful they appear to be, that the same classes will work as well with men.

Threats to external validity (or generalizability) may be the result of the interactions of other factors with the program or intervention itself, or may be due to particular conditions of the program.

Some examples :

  • Interaction of testing or data collection and the program or intervention .  An initial test or observation might change the way participants react to the program, making a difference in final outcomes.  Since you can’t assume that another group will have the same reaction or achieve similar final outcomes as a result, external validity or generalizability of the findings becomes questionable.
  • Interaction of selection procedures and the program or intervention .  If the participants selected or self-selected are particularly sensitive to the methods or purpose of the program, it can’t be assumed to be effective with participants who are less sensitive or ready for the program.
Parents who’ve been threatened by the government with the loss of their children due to child abuse may be more receptive to learning techniques for improving their parenting, for example, than parents who are under no such pressure.
  • The effects of the research arrangements . Participants may change behavior as a result of being observed, or may react to particular individuals in ways they would be unlikely to react to others.
A classic example here is that of a famous baboon researcher, Irven DeVore, who after years of observing troupes of baboons, realized that they behaved differently when he was there than when he wasn’t.  Although his intent was to observe their natural behavior, his presence itself constituted an intervention, making the behavior of the baboons he was observing different from that of a troupe that was not observed.
  • The interference of multiple treatments or interventions . The effects of a particular program can be changed when participants are exposed to it beforehand in a different context, or are exposed to another before or at the same time as the one being evaluated. This may occur when participants are receiving services from different sources, or being treated simultaneously for two or more health issues or other conditions.
Given the range of community programs that exist, there are many possibilities here. Adults might be members of a high school completion class while participating in a substance use recovery program.  A diabetic might be treated with a new drug while at the same time participating in a nutrition and physical activity program to deal with obesity.  Sometimes, the sequence of treatments or services in a single program can have the same effect, with one influencing how participants respond to those that follow, even though each treatment is being evaluated separately.

Common research designs

Many books have been written on the subject of research design. While they contain too much material to summarize here, there are some basic designs that we can introduce. The important differences among them come down to how many measurements you’ll take, when you will take them, and how many groups of what kind will be involved.

Program evaluations generally look for the answers to three basic questions:

  • Was there any change – in participants’ or others’ behavior, in physical or social conditions, or in outcomes or indicators of success – during the evaluation period?
  • Was whatever change took place – or the lack of change – caused by your program, intervention, or effort?
  • What, in your program or outside it, actually caused or prevented the change?

As we’ve discussed, changes and improvement in outcomes may have been caused by some or all of your intervention, or by external factors. Participants’ or the community’s history might have been crucial. Participants may have changed as a result of simply getting older and more mature or more experienced in the world – often an issue when working with children or adolescents. Environmental factors – events, policy change, or conditions in participants’ lives – can often facilitate or prevent change as well. Understanding exactly where the change came from, or where the barriers to change reside, gives you the opportunity to adjust your program to take advantage of or combat those factors.

If all you had to do was to measure whatever behavior or condition you wanted to influence at the beginning and end of the evaluation, choosing a design would be an easy task. Unfortunately, it’s not quite that simple – there are those nasty threats to validity to worry about. We have to keep them in mind as we look at some common research designs.

Research designs, in general, differ in one or both of two ways: the number and timing of the measurements they use; and whether they look at single or multiple groups.  We’ll look at single-group designs first, then go on to multiple groups.

Before we go any further, it is helpful to have an understanding of some basic research terms that we will be using in our discussion.

Researchers usually refer to your first measurement(s) or observation(s) – the ones you take before you start your program or intervention – as a baseline measure or baseline observation, because it establishes a baseline – a known level – to which you compare future measurements or observations.

Some other important research terms:

  • Independent variables are the program itself and/or the methods or conditions that the researcher – in this case, you – wants to evaluate. They’re called variables because they can change – you might have chosen (and might still choose) other methods. They’re independent because their existence doesn’t depend on whether something else occurs: you’ve chosen them, and they’ll stay consistent throughout the evaluation period.
  • Dependent variables are whatever may or may not change as a result of the presence of the independent variable(s). In an evaluation, your program or intervention is the independent variable. (If you’re evaluating a number of different methods or conditions, each of them is an independent variable.) Whatever you’re trying to change is the dependent variable. (If you’re aiming at change in more than one behavior or outcome, each type of change is a different dependent variable.) They’re called dependent variables because changes in them depend on the action of the independent variable... or something else.
  • Measures are just that – measurements of the dependent variables. They usually refer to procedures whose results can be translated into numbers, and may take the form of community assessments, observations, surveys, interviews, or tests. They may also count incidents or measure the amount of the dependent variable (number or percentage of children who are overweight or obese, violent crimes per 100,000 population, etc.).
  • Observations might involve measurement, or they might simply record what happens in specific circumstances: the ways in which people use a space, the kinds of interactions children have in a classroom, the character of the interactions during an assessment. For convenience, researchers often use "observation" to refer to any kind of measurement, and we’ll use the same convention here.

Pre- and post- single-group design

The simplest design is also probably the least accurate and desirable: the pre (before) and post (after) measurement or observation. This consists of simply measuring whatever you’re concerned with in one group – the infant mortality rate, unemployment, water pollution – applying your intervention to that group or community, and then observing again. This type of design assumes that a difference in the two observations will tell you whether there was a change over the period between them, and also assumes that any positive change was caused by the intervention.

In most cases, a pre-post design won’t tell you much, because it doesn’t really address any of the research concerns we’ve discussed. It doesn’t account for the influence of other factors on the dependent variable, and it doesn’t tell you anything about trends of change or the progress of change during the evaluation period – only where participants were at the beginning and where they were at the end. It can help you determine whether certain kinds of things have happened – whether there’s been a reduction in the level of educational attainment or the amount of environmental pollution in a river, for instance – but it won’t tell you why. Despite its limitations, taking measures before and after the intervention is far better than no measures.

Even looking at something as seemingly simple to measure pre and post as blood pressure (in a heart disease prevention program) is questionable.  Blood pressure may be lower at the final observation than at the initial one, but that tells you nothing about how much it may have gone up and down in between. If the readings were taken by different people, the change may be due in part to differences in their skill, or to how relaxed each was able to make participants feel.  Familiarity with the program could also have reduced most participants’ blood pressure from the pre- to the post-measurement, as could some other factor that wasn’t specifically part of the independent variable being evaluated.
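
To make the pre-post idea concrete, here is a minimal sketch of how a single group’s before-and-after measurements might be compared, using invented blood pressure readings and SciPy’s paired t-test as one possible analysis. Even a clear drop in such a comparison tells you nothing about the trend in between, or about the threats to validity discussed above.

```python
from scipy import stats

# Minimal sketch: pre- and post- single-group comparison (hypothetical data).
# Systolic blood pressure before and after a heart disease prevention program.
pre  = [148, 152, 139, 160, 145, 150, 142, 155]
post = [140, 147, 138, 151, 141, 149, 137, 150]

t_stat, p_value = stats.ttest_rel(pre, post)   # paired t-test on the same participants
mean_change = sum(b - a for a, b in zip(pre, post)) / len(pre)

print(f"Mean change: {mean_change:+.1f} mmHg")
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
# Even a significant drop here cannot, by itself, rule out maturation, testing
# effects, or measurement changes - the threats to internal validity above.
```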

Interrupted time series design with a single group (simple time series)

An interrupted time series uses repeated measures before and after the delayed implementation of the independent variable (e.g., the program) to help rule out other explanations. This relatively strong design – with comparisons within the group – addresses most threats to internal validity.

The simplest form of this design is to take repeated observations, implement the program or intervention, and then observe a number of times during the evaluation period, including at the end. This method is a great improvement over the pre- and post- design in that it tracks the trend of change, and can therefore help show whether it was actually the independent variable that caused any change. It can also help to identify the influence of external factors, for example, when the dependent variable shows significant change before the intervention is implemented.

Another possibility for this design is to implement more than one independent variable, either by trying two or more, one after another (often with a break in between), or by adding each to what came before. This gives a picture not only of the progress of change, but can show very clearly what causes change. That gives an evaluator the opportunity not only to adjust the program, but to drop elements that have no effect.

There are a number of variations on the interrupted time series theme, including varying the observation times; implementing the independent variable repeatedly; and implementing one independent variable, then another, then both together to evaluate their interaction.

In any variety of interrupted time series design, it’s important to know what you’re looking for.  In an evaluation of a traffic fatality control program in the United Kingdom that focused on reducing drunk driving, monthly measurements seemed to show only a small decline in fatal accidents. When the statistics for weekends, when there were most likely to be drunk drivers on the road, were separated out, however, they showed that the weekend fatality rate dropped sharply with the implementation of the program, and stayed low thereafter. Had the researchers not realized that that might be the case, the program might have been stopped, and the weekend accident rate would not have been reduced.
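
As an illustration of the simple time series idea, the sketch below fits a basic segmented regression to invented monthly accident counts, estimating the change in level when the program begins. The numbers are hypothetical and statsmodels is just one convenient way to fit such a model.

```python
import numpy as np
import statsmodels.api as sm

# Minimal sketch: interrupted time series with a single group (hypothetical data).
# Monthly fatal-accident counts; the intervention begins at month 12.
counts = np.array([30, 28, 31, 29, 32, 30, 29, 31, 30, 28, 31, 30,   # pre
                   24, 23, 22, 24, 21, 23, 22, 20, 22, 21, 20, 21])  # post
months = np.arange(len(counts))
after = (months >= 12).astype(int)            # 1 once the program is in place

# Segmented regression: an overall time trend plus a level shift at the intervention.
X = sm.add_constant(np.column_stack([months, after]))
model = sm.OLS(counts, X).fit()

print(model.params)    # [intercept, time trend, level shift at intervention]
print(f"Estimated level shift: {model.params[2]:.1f} accidents per month")
```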

Interrupted time series design with multiple groups (multiple baseline/time series)

This has the same possibilities as the single time series design, with the added wrinkle of using repeated measures with one or more other groups (so-called multiple baselines). By using multiple baselines (groups), the external validity or generality of the findings is enhanced – we can see if the effects occur with different groups or under different conditions.

This multiple time series design – typically a staggered introduction of the intervention with different groups or communities – gives the researcher more opportunities:

  • You can try a method or program with two or more groups from the same population
  • You can try a particular method or program with different populations, to see if it’s effective with others
  • You can vary the timing or intensity of an intervention with different groups
  • You can test different interventions at the same time
  • You can try the same two or more interventions with each of two groups, but reverse their order to see if sequencing makes any difference

Again, there are more variations possible here.
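
To illustrate the staggered introduction that a multiple baseline design relies on, here is a minimal sketch with two hypothetical groups, each starting the intervention at a different time. If scores change only after each group’s own start date, the case that the intervention caused the change is strengthened.

```python
# Minimal sketch: multiple-baseline (staggered) time series with two groups.
# All scores and start times below are hypothetical.

groups = {
    # group name: (month the intervention starts, monthly scores)
    "Community A": (4, [10, 11, 10, 12, 16, 18, 19, 20, 21, 22]),
    "Community B": (7, [11, 10, 12, 11, 10, 12, 11, 17, 19, 21]),
}

for name, (start, scores) in groups.items():
    pre_mean = sum(scores[:start]) / start
    post_mean = sum(scores[start:]) / (len(scores) - start)
    print(f"{name}: pre {pre_mean:.1f} -> post {post_mean:.1f} "
          f"(intervention began month {start})")

# If scores rise only after each group's own start date, that strengthens the
# case that the intervention - not an outside event - produced the change.
```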

Control group design

A common way to evaluate the effects of an independent variable is to use a control group.  This group is usually similar to the participant group, but either receives no intervention at all, or receives a different intervention with the same goal as that offered to the participant group. A control group design is usually the most difficult to set up – you have to find appropriate groups, observe both on a regular basis, etc. – but is generally considered to be the most reliable.

The term control group comes from the attempt to control outside and other influences on the dependent variable. If everything about the two groups except their exposure to the program being evaluated averages out to be the same, then any differences in results must be due to that exposure. The term comparison group is more modest; it typically refers to a community or group matched for similar levels of the problem/goal and for relevant characteristics of the community or population (e.g., education, poverty).

The gold standard here is the randomized control group, one that is selected totally at random, either from among the population the program or intervention is concerned with – those at risk for heart disease, unemployed males, young parents – or, if appropriate, the population at large. A random group eliminates the problems of selection we discussed above, as well as issues that might arise from differences in culture, race, or other factors.

A control group that’s carefully chosen will have the same characteristics as the intervention group (the focus of the evaluation). If, for instance, the two groups come from the same pool of people with a particular health condition, and are chosen at random either to be treated in the conventional way or to try a new approach, it can be assumed that – since they were chosen at random from the same population – both groups will be subject, on average, to the same outside influences, and will have the same diversity of backgrounds. Thus, if there is a significant difference in their results, it is fairly safe to assume that the difference comes from the independent variable – the type of intervention, and not something else.
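
As a simple illustration, here is a sketch of how a program might assign people from a single pool to an intervention group and a randomized control group. The participant list and the fixed random seed are hypothetical; the point is only that chance, not staff judgment, determines who receives the intervention first.

```python
# A sketch of random assignment from a single participant pool into an
# intervention group and a control group. The participant names are hypothetical.
import random

participants = [f"participant_{i:03d}" for i in range(1, 41)]  # a pool of 40 people

rng = random.Random(2024)     # fixed seed so the assignment can be documented
shuffled = participants[:]
rng.shuffle(shuffled)

midpoint = len(shuffled) // 2
intervention_group = sorted(shuffled[:midpoint])
control_group = sorted(shuffled[midpoint:])

print(len(intervention_group), "assigned to intervention")
print(len(control_group), "assigned to control")
```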

The difficulty for governmental and community-based organizations is to find or create a randomized control group. If the program has a long waiting list, it may be able to create a control group by selecting at random those who will receive the intervention first. That in itself creates problems, in that people often drop off waiting lists out of frustration or for other reasons. On the other hand, being included in the evaluation may help to keep them on the list, by giving them a closer connection to the program and making them feel valued.

An ESOL (English as a Second or Other Language) program in Boston with a three-year waiting list addressed the problem by offering those on the waiting list a different option.  They received videotapes to use at home, along with biweekly tutoring by advanced students and graduates of the program.  Thus, they became a comparison group with a somewhat different intervention that, as expected, was less effective than the program itself, but was more effective than none, and kept them on the waiting list. It also gave them a head start once they got into the classes, with many starting at a middle rather than at a beginning level.

When there’s no waiting list or similar group to draw from, community organizations often end up using a comparison group – one composed of participants in another place or program, whose members’ characteristics, backgrounds, and experience may or may not be similar to those of the participant group. That circumstance can raise some of the same selection problems seen when there is no control group at all. If the only potential comparisons involve very different groups, it may be better to use a design that doesn’t involve a control group at all – such as an interrupted time series design – where the comparison is within, not between, groups.

Groups may look similar, but may differ in an important way.  Two groups of participants in a substance use intervention program, for instance, may have similar histories, but if one program is voluntary and the other is not, the results aren’t likely to be comparable.  One group will probably be more motivated and less resentful than the other, and composed of people who already know they have a potential problem. The motivation and determination of their participants, rather than the effectiveness of the two programs, may influence the amount of change observed. This issue may come up in a single-group design as well.  A program that may, on average, seem to be relatively ineffective may prove, on close inspection, to be quite effective with certain participants – those of a specific educational background, for instance, or with particular life experiences. Looking at results with this in mind can be an important part of an evaluation, and give you valuable and usable information.

Choosing a design

This section’s discussion of research designs is in no way complete. It’s meant to provide an introduction to what’s available. There are literally thousands of books and articles written on this topic, and you’ll probably want more information. There are, for instance, a number of statistical methods that can compensate for less-than-perfect designs, since few community groups have the resources to assemble a randomized control group or to implement two or more similar programs to see which works better. Given this, the material that follows is meant only as broad guidelines. We don’t attempt to be specific about what kind of design you need in what circumstances, but only try to suggest some things to think about in different situations.

Help is available from a number of directions. Much can be found on the Internet (see the “Resources” part of this section for a few sites); there are numerous books and articles (the classic text on research design is also cited in “Resources”); and universities are a great resource, both through their libraries and through faculty and graduate students who might be interested in what you’re doing and be willing to help with your evaluation. Use any and all of these to find what will work best for you. Funders may also be willing either to provide technical assistance for evaluations, or to include money in your grant or contract specifically to pay for a professional evaluation.

Your goal in evaluating your effort is to get the most reliable and accurate information possible, given your evaluation questions, the nature of your program, what your participants will consent to, your time constraints, and your resources. The important thing here is not to set up a perfect research study, but to design your evaluation to get real information, and to be able to separate the effects of external factors from the effects of your program. So how do you go about choosing the best design that will be workable for you? The steps are the considerations listed in the first sentence of this paragraph: your evaluation questions, the nature of your program, what participants will consent to, your time constraints, and your resources.

Consider your evaluation questions

What do you need to know? If the intent of your evaluation is simply to see whether something specific happened, it’s possible that a simple pre-post design will do. If, as is more likely, you want to know both whether change has occurred, and if it has, whether it has in fact been caused by your program, you’ll need a design that helps to screen out the effects of external influences and participants’ backgrounds.

For many community programs, a control or comparison group is helpful, but not absolutely necessary. Think carefully about the frequency and timing of your observations and the amount and kinds of information you can collect. With repeated measures, you can get quite an accurate picture of the effectiveness of your program from a simple time series design. Single-group interrupted time series designs, which are often the most workable for small organizations, can give you a very reliable evaluation if they’re structured well. That generally means obtaining multiple baseline observations (enough to set a trend) before the program begins; observing often and documenting your observations carefully (often with both quantitative data – expressed in numbers – and qualitative data – expressed in records of incidents and of what participants did and said); and including during-intervention and follow-up observations to see whether effects are maintained.

In many of these situations, a multiple-group interrupted time series design is quite possible, in the form of a “naturally occurring” experiment. If your program includes two or more groups or classes, each working toward the same goals, you have the opportunity to stagger the introduction of the intervention across the groups. This comparison with (and across) groups allows you to screen out such factors as the facilitator’s ability and community influences (assuming all participants come from the same general population). You could also try different methods or time sequences, to see which works best.

In some cases, the real question is not whether your method or program works, but whether it works better than other methods or programs you could be using. Teaching a skill – for instance, employment training, parenting, diabetes management, conflict resolution – often falls into this category. Here, you need a comparison of some sort. While evaluations of some of these – medical treatment, for example – may require a control group, others can be compared to data from the field, to published results of other programs, or, by using community-level indicators, to measurements in other communities.

There are community programs where the bottom line is very simple.  If you’re working to control water pollution, your main concern may be the amount of pollution coming out of effluent pipes, or the amount found in the river.  Your only measure of success may be keeping pollution below a certain level, which means that regular monitoring of water quality is the only evaluation you need.  There are probably relatively few community programs where evaluation is this easy – you might, for instance, want to know which of your pollution-control activities is most effective – but if yours is one, a simple design may be all you need.

Consider the nature of your program

What does your program look like, and what is it meant to do?  Does it work with participants in groups, or individually, for instance? Does it run in cycles – classes or workshops that begin and end on certain dates, or a time-limited program that participants go through only once? Or can participants enter whenever they are ready and stay until they reach their goals?  How much of the work of the program is dependent on staff, and how much do participants do on their own?  How important is the program context – the way staff, participants, and others treat one another, the general philosophy of the program, the physical setting, the organizational culture?  (The culture of an organization consists of accepted and traditional ways of doing things, patterns of relationships, how people dress, how they act toward and communicate with one another, etc.)

  • If you work with participants in groups, a multiple-group design – either interrupted time series or control group – might be easier to use.  If you work with participants individually, perhaps a simple time series or a single group design would be appropriate.
  • If your program is time-limited – either one-time-only, or with sessions that follow one another – you’ll want a design that fits into the schedule, and that can give you reliable results in the time you have. One possibility is to use a multiple group design, with groups following one another session by session. The program for each group might be adjusted, based on the results for the group before, so that you could test new ideas each session.
  • If your program has no clear beginning and end, you’re more likely to need a single group design that considers participants individually, or by the level of their baseline performance. You may also have to compensate for the fact that participants may be entering the program at different levels, or with different goals.
A proverb says that you never step in the same river twice, because the water that flows past a fixed point is always changing. The same is true of most community programs. Someone coming into a program at a particular time may have a totally different experience than a similar person entering at a different time, even though the operation of the program is the same for both. A particular participant may encourage everyone around her, and create an overwhelmingly positive atmosphere different from that experienced by participants who enter the program after she has left, for example. It’s very difficult to control for this kind of difference over time, but it’s important to be aware that it can, and often does, exist, and may affect the results of a program evaluation.
  • If the organizational or program context and culture are important, then you’ll probably want to compare your results with participants to those in a control group in a similar situation where those factors are different, or are ignored.

There is, of course, a huge range of possibilities here: nearly any design can be adapted to nearly any situation in the right circumstances. This material is meant only to give you a sense of how to start thinking about the issue of design for an evaluation.

Consider what your participants (and staff) will consent to

In addition to the effect that it might have on the results of your evaluation, you might find that a lot of observation can raise protests from participants who feel their privacy is threatened, or from already-overworked staff members who see adding evaluation to their job as just another burden. You may be able to overcome these obstacles, or you may have to compromise – fewer or different kinds of observations, a less intrusive design – in order to be able to conduct the evaluation at all.

There are other reasons that participants might object to observation, or at least intense observation. Potential for embarrassment, a desire for secrecy (to keep their participation in the program from family members or others), even self-protection (in the case of domestic violence, for instance) can contribute to unwillingness to be a participant in the evaluation. Staff members may have some of the same concerns.

There are ways to deal with these issues, but there’s no guarantee that they’ll work. One is to inform participants at the beginning about exactly what you’re hoping to do, listen to their objections, and meet with them (more than once, if necessary) to come up with a satisfactory approach. Staff members are less likely to complain if they’re involved in planning the evaluation, and thus have some say over the frequency and nature of observations. The same is true for participants. Treating everyone’s concerns seriously and including them in the planning process can go a long way toward assuring cooperation.

Consider your time constraints

As we mentioned above, the important thing here is to choose a design that will give you reasonably reliable information. In general, your design doesn’t have to be perfect, but it does have to be good enough to give you a reasonably good indication that changes are actually taking place, and that they are the result of your program. Just how precise you can be is at least partially controlled by the limits placed on your time by funding, program considerations, and other factors.

Time constraints may also be imposed. Some of the most common:

  • Program structure. An evaluation may make the most sense if it’s conducted to correspond with a regular program cycle.
  • Funding. If you are funded only for a pilot project, for example, you’ll have to conduct your evaluation within the time span of the funding, and soon enough to show that your program is successful enough to be refunded. A time schedule for evaluation may be part of your grant or contract, especially if the funder is paying for it.
  • Participants’ schedules. A rural education program may need to stop for several months a year to allow participants to plant and tend crops, for instance.
  • The seriousness of the issue. A delay in understanding whether a violence prevention program is effective may cost lives.
  • The availability of professional evaluators. Perhaps the evaluation team can only work during a particular time frame.

Consider your resources

Strategic planners often advise that groups and organizations consider resources last: otherwise they’ll reject many good ideas because they’re too expensive or difficult, rather than trying to find ways to make them work with the resources at hand. Resources include not only money, but also space, materials and equipment, personnel, and skills and expertise. Often, one of these can substitute for another: a staff person with experience in research can take the place of money that would be used to pay a consultant, for example. A partnership with a nearby university could get you not only expertise, but perhaps needed equipment as well.

The lesson here is to begin by determining the best design possible for your purposes, without regard to resources. You may have to settle for somewhat less, but if you start by aiming for what you want, you’re likely to get a lot closer to it than if you assume you can’t possibly get it.

The way you design your evaluation research will have a lot to do with how accurate and reliable your results are, and how well you can use them to improve your program or intervention. The design should be one that best addresses key threats to internal validity (whether the intervention caused the change) and external validity (the ability to generalize your results to other situations, communities, and populations).

Common research designs – such as interrupted time series or control group designs – can be adapted to various situations, and combined in various ways to create a design that is both appropriate and feasible for your program. It may be necessary to seek help from a consultant, a university partner, or simply someone with research experience to identify a design that fits your needs.

A good design will address your evaluation questions, and take into consideration the nature of your program, what program participants and staff will agree to, your time constraints, and the resources you have available for evaluation. It often makes sense to consider resources last, so that you won’t reject good ideas because they seem too expensive or difficult. Once you’ve chosen a design, you can often find a way around a lack of resources to make it a reality.

Online Resources 

Bridging the Gap: The role of monitoring and evaluation in Evidence-based policy-making  is a document provided by UNICEF that aims to improve relevance, efficiency and effectiveness of policy reforms by enhancing the use of monitoring and evaluation.

Effective Nonprofit Evaluation is a briefing paper written for TCC Group. Pages 7 and 8 give specific information related to designing an effective evaluation.

From the Introduction to Program Evaluation for Public Health Programs, this resource from CDC on Focus the Evaluation Design offers suggestions for tailoring questions to evaluate the efficiency, cost-effectiveness, and attribution of a program. This guide offers a variety of program evaluation-related information.

Chapter 3 of the GAO Designing Evaluations handbook focuses on the process of selecting an evaluation design.  This handbook provided by the U.S. Government Accountability Office provides information on various topics related to program evaluation.

Interrupted Time Series Quasi-Experiments  is an essay by Gene Glass, from Arizona State University, on time series experiments, distinction between experimental and quasi-experimental approaches, etc.

The Magenta Book - Guidance for Evaluation  provides an in-depth look at evaluation. Part A is designed for policy makers. It sets out what evaluation is, and what the benefits of good evaluation are. It explains in simple terms the requirements for good evaluation, and some straightforward steps that policy makers can take to make a good evaluation of their intervention more feasible. Part B is more technical, and is aimed at analysts and interested policy makers. It discusses in more detail the key steps to follow when planning and undertaking an evaluation and how to answer evaluation research questions using different evaluation research designs. It also discusses approaches to the interpretation and assimilation of evaluation evidence.

Practical Challenges of Rigorous Impact Evaluation in International Governance NGOs: Experiences and Lessons from The Asia Foundation explores program evaluation at the international level.

Research Design Issues for Evaluating Complex Multicomponent Interventions in Neighborhoods and Communities  is from the Promise Neighborhoods Research Consortium. The article discusses challenges and offers approaches to evaluation that are likely to result in adoption and maintenance of effective and replicable multicomponent interventions in high-poverty neighborhoods.

Research Methods  is a text by Dr. Christopher L. Heffner that focuses on the basics of research design and the critical analysis of professional research in the social sciences from developing a theory, selecting subjects, and testing subjects to performing statistical analysis and writing the research report.

Research Methods Knowledge Base  is a comprehensive web-based textbook that provides useful, comprehensive, relatively simple explanations of how statistics work and how and when specific statistical operations are used and help to interpret data.

A Second Look at Research in Natural Settings  is a web-version of a PowerPoint presentation by Graziano and Raulin.

The W.K. Kellogg Foundation Evaluation Handbook provides a framework for thinking about evaluation as a relevant and useful program tool. Chapters 5, 6, and 7 under the “Implementation” heading provide detailed information on determining data collection methods, collecting data, and analyzing and interpreting data.

Print Resources

Campbell, D., & Stanley, J. (1963, 1966). Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.

Fawcett, S., et al. (2008). Community Toolbox Curriculum Module 12: Evaluating the initiative. Work Group for Community Health and Development, University of Kansas. Community Tool Box Curriculum.

Roscoe, J. (1969). Fundamental Research Statistics for the Behavioral Sciences. New York, NY: Holt, Rinehart & Winston.

Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and Quasi-experimental Designs for Generalized Causal Inference. Houghton Mifflin.


What Is a Research Design | Types, Guide & Examples

Published on June 7, 2021 by Shona McCombes. Revised on November 20, 2023 by Pritha Bhandari.

A research design is a strategy for answering your   research question  using empirical data. Creating a research design means making decisions about:

  • Your overall research objectives and approach
  • Whether you’ll rely on primary research or secondary research
  • Your sampling methods or criteria for selecting subjects
  • Your data collection methods
  • The procedures you’ll follow to collect data
  • Your data analysis methods

A well-planned research design helps ensure that your methods match your research objectives and that you use the right kind of analysis for your data.

Table of contents

  • Step 1: Consider your aims and approach
  • Step 2: Choose a type of research design
  • Step 3: Identify your population and sampling method
  • Step 4: Choose your data collection methods
  • Step 5: Plan your data collection procedures
  • Step 6: Decide on your data analysis strategies
  • Frequently asked questions about research design

  • Introduction

Before you can start designing your research, you should already have a clear idea of the research question you want to investigate.

There are many different ways you could go about answering this question. Your research design choices should be driven by your aims and priorities—start by thinking carefully about what you want to achieve.

The first choice you need to make is whether you’ll take a qualitative or quantitative approach.

Qualitative research designs tend to be more flexible and inductive , allowing you to adjust your approach based on what you find throughout the research process.

Quantitative research designs tend to be more fixed and deductive , with variables and hypotheses clearly defined in advance of data collection.

It’s also possible to use a mixed-methods design that integrates aspects of both approaches. By combining qualitative and quantitative insights, you can gain a more complete picture of the problem you’re studying and strengthen the credibility of your conclusions.

Practical and ethical considerations when designing research

As well as scientific considerations, you need to think practically when designing your research. If your research involves people or animals, you also need to consider research ethics .

  • How much time do you have to collect data and write up the research?
  • Will you be able to gain access to the data you need (e.g., by travelling to a specific location or contacting specific people)?
  • Do you have the necessary research skills (e.g., statistical analysis or interview techniques)?
  • Will you need ethical approval ?

At each stage of the research design process, make sure that your choices are practically feasible.


Within both qualitative and quantitative approaches, there are several types of research design to choose from. Each type provides a framework for the overall shape of your research.

Types of quantitative research designs

Quantitative designs can be split into four main types.

  • Experimental and quasi-experimental designs allow you to test cause-and-effect relationships.
  • Descriptive and correlational designs allow you to measure variables and describe relationships between them.

With descriptive and correlational designs, you can get a clear picture of characteristics, trends and relationships as they exist in the real world. However, you can’t draw conclusions about cause and effect (because correlation doesn’t imply causation ).

Experiments are the strongest way to test cause-and-effect relationships without the risk of other variables influencing the results. However, their controlled conditions may not always reflect how things work in the real world. They’re often also more difficult and expensive to implement.

Types of qualitative research designs

Qualitative designs are less strictly defined. This approach is about gaining a rich, detailed understanding of a specific context or phenomenon, and you can often be more creative and flexible in designing your research.

Common types of qualitative design include case studies, ethnography, and grounded theory. They often have similar approaches in terms of data collection, but focus on different aspects when analyzing the data.

Your research design should clearly define who or what your research will focus on, and how you’ll go about choosing your participants or subjects.

In research, a population is the entire group that you want to draw conclusions about, while a sample is the smaller group of individuals you’ll actually collect data from.

Defining the population

A population can be made up of anything you want to study—plants, animals, organizations, texts, countries, etc. In the social sciences, it most often refers to a group of people.

For example, will you focus on people from a specific demographic, region or background? Are you interested in people with a certain job or medical condition, or users of a particular product?

The more precisely you define your population, the easier it will be to gather a representative sample.

Sampling methods

Even with a narrowly defined population, it’s rarely possible to collect data from every individual. Instead, you’ll collect data from a sample.

To select a sample, there are two main approaches: probability sampling and non-probability sampling . The sampling method you use affects how confidently you can generalize your results to the population as a whole.

Probability sampling is the most statistically valid option, but it’s often difficult to achieve unless you’re dealing with a very small and accessible population.

For practical reasons, many studies use non-probability sampling, but it’s important to be aware of the limitations and carefully consider potential biases. You should always make an effort to gather a sample that’s as representative as possible of the population.
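
As a small illustration of the probability approach, here is a sketch of drawing a simple random sample from a defined population list in Python. The roster and the sample size are invented; the essential point is that every member of the population has an equal chance of selection.

```python
# A sketch of simple random (probability) sampling from a defined population
# list, e.g., a program's enrollment roster. The roster here is invented.
import random

population = [f"client_{i:04d}" for i in range(1, 501)]   # 500 clients
sample_size = 50

rng = random.Random(7)
sample = rng.sample(population, sample_size)   # each client has an equal chance

print(f"Drew {len(sample)} of {len(population)} clients at random")
```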

Case selection in qualitative research

In some types of qualitative designs, sampling may not be relevant.

For example, in an ethnography or a case study , your aim is to deeply understand a specific context, not to generalize to a population. Instead of sampling, you may simply aim to collect as much data as possible about the context you are studying.

In these types of design, you still have to carefully consider your choice of case or community. You should have a clear rationale for why this particular case is suitable for answering your research question .

For example, you might choose a case study that reveals an unusual or neglected aspect of your research problem, or you might choose several very similar or very different cases in order to compare them.

Data collection methods are ways of directly measuring variables and gathering information. They allow you to gain first-hand knowledge and original insights into your research problem.

You can choose just one data collection method, or use several methods in the same study.

Survey methods

Surveys allow you to collect data about opinions, behaviors, experiences, and characteristics by asking people directly. There are two main survey methods to choose from: questionnaires and interviews .

Observation methods

Observational studies allow you to collect data unobtrusively, observing characteristics, behaviors or social interactions without relying on self-reporting.

Observations may be conducted in real time, taking notes as you observe, or you might make audiovisual recordings for later analysis. They can be qualitative or quantitative.

Other methods of data collection

There are many other ways you might collect data depending on your field and topic.

If you’re not sure which methods will work best for your research design, try reading some papers in your field to see what kinds of data collection methods they used.

Secondary data

If you don’t have the time or resources to collect data from the population you’re interested in, you can also choose to use secondary data that other researchers already collected—for example, datasets from government surveys or previous studies on your topic.

With this raw data, you can do your own analysis to answer new research questions that weren’t addressed by the original study.

Using secondary data can expand the scope of your research, as you may be able to access much larger and more varied samples than you could collect yourself.

However, it also means you don’t have any control over which variables to measure or how to measure them, so the conclusions you can draw may be limited.


As well as deciding on your methods, you need to plan exactly how you’ll use these methods to collect data that’s consistent, accurate, and unbiased.

Planning systematic procedures is especially important in quantitative research, where you need to precisely define your variables and ensure your measurements are high in reliability and validity.

Operationalization

Some variables, like height or age, are easily measured. But often you’ll be dealing with more abstract concepts, like satisfaction, anxiety, or competence. Operationalization means turning these fuzzy ideas into measurable indicators.

If you’re using observations , which events or actions will you count?

If you’re using surveys , which questions will you ask and what range of responses will be offered?

You may also choose to use or adapt existing materials designed to measure the concept you’re interested in—for example, questionnaires or inventories whose reliability and validity has already been established.
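
As a rough illustration of operationalization, the sketch below turns a handful of hypothetical 1–5 Likert items into a single scale score. The item names, the reverse-scored item, and the scoring rule are all assumptions for the example, not an established instrument.

```python
# A sketch of operationalizing an abstract concept (e.g., "satisfaction") as
# the mean of several 1-5 Likert items. The items, the reverse-scored item,
# and the responses are all hypothetical.
REVERSE_SCORED = {"q3"}   # e.g., a negatively worded item

def score_satisfaction(responses: dict[str, int]) -> float:
    """Turn raw 1-5 item responses into a single 1-5 scale score."""
    scores = []
    for item, value in responses.items():
        if item in REVERSE_SCORED:
            value = 6 - value          # flip the negatively worded item
        scores.append(value)
    return sum(scores) / len(scores)

print(score_satisfaction({"q1": 4, "q2": 5, "q3": 2, "q4": 4}))  # prints 4.25
```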

Reliability and validity

Reliability means your results can be consistently reproduced, while validity means that you’re actually measuring the concept you’re interested in.

For valid and reliable results, your measurement materials should be thoroughly researched and carefully designed. Plan your procedures to make sure you carry out the same steps in the same way for each participant.

If you’re developing a new questionnaire or other instrument to measure a specific concept, running a pilot study allows you to check its validity and reliability in advance.

Sampling procedures

As well as choosing an appropriate sampling method , you need a concrete plan for how you’ll actually contact and recruit your selected sample.

That means making decisions about things like:

  • How many participants do you need for an adequate sample size?
  • What inclusion and exclusion criteria will you use to identify eligible participants?
  • How will you contact your sample—by mail, online, by phone, or in person?

If you’re using a probability sampling method , it’s important that everyone who is randomly selected actually participates in the study. How will you ensure a high response rate?

If you’re using a non-probability method , how will you avoid research bias and ensure a representative sample?

Data management

It’s also important to create a data management plan for organizing and storing your data.

Will you need to transcribe interviews or perform data entry for observations? You should anonymize and safeguard any sensitive data, and make sure it’s backed up regularly.

Keeping your data well-organized will save time when it comes to analyzing it. It can also help other researchers validate and add to your findings (high replicability ).

On its own, raw data can’t answer your research question. The last step of designing your research is planning how you’ll analyze the data.

Quantitative data analysis

In quantitative research, you’ll most likely use some form of statistical analysis . With statistics, you can summarize your sample data, make estimates, and test hypotheses.

Using descriptive statistics , you can summarize your sample data in terms of:

  • The distribution of the data (e.g., the frequency of each score on a test)
  • The central tendency of the data (e.g., the mean to describe the average score)
  • The variability of the data (e.g., the standard deviation to describe how spread out the scores are)

The specific calculations you can do depend on the level of measurement of your variables.
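
For instance, the three kinds of descriptive statistics listed above can be computed with nothing more than Python's standard library; the test scores below are invented for illustration.

```python
# A sketch of the descriptive statistics mentioned above, using only the
# Python standard library. The test scores are invented.
from collections import Counter
from statistics import mean, stdev

scores = [55, 60, 60, 65, 70, 70, 70, 75, 80, 85]

print("Distribution:", Counter(scores))        # frequency of each score
print("Central tendency (mean):", mean(scores))
print("Variability (standard deviation):", round(stdev(scores), 2))
```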

Using inferential statistics , you can:

  • Make estimates about the population based on your sample data.
  • Test hypotheses about a relationship between variables.

Regression and correlation tests look for associations between two or more variables, while comparison tests (such as t tests and ANOVAs ) look for differences in the outcomes of different groups.

Your choice of statistical test depends on various aspects of your research design, including the types of variables you’re dealing with and the distribution of your data.
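
As a brief illustration, the sketch below runs two common inferential tests with SciPy: an independent-samples t test comparing two groups, and a Pearson correlation between two variables. All of the numbers are invented, and the choice of tests assumes roughly normally distributed interval data.

```python
# A sketch of two common inferential tests: an independent-samples t test
# (difference between two groups) and a Pearson correlation (association
# between two variables). All numbers are invented for illustration.
from scipy import stats

group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [10, 11, 13, 12, 11, 10, 12, 11]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

hours_attended = [2, 4, 5, 6, 8, 9, 10, 12]
post_test      = [50, 55, 58, 62, 70, 72, 75, 80]
r, p = stats.pearsonr(hours_attended, post_test)
print(f"r = {r:.2f}, p = {p:.4f}")
```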

Qualitative data analysis

In qualitative research, your data will usually be very dense with information and ideas. Instead of summing it up in numbers, you’ll need to comb through the data in detail, interpret its meanings, identify patterns, and extract the parts that are most relevant to your research question.

Two of the most common approaches to doing this are thematic analysis and discourse analysis .

There are many other ways of analyzing qualitative data depending on the aims of your research. To get a sense of potential approaches, try reading some qualitative research papers in your field.


A research design is a strategy for answering your   research question . It defines your overall approach and determines how you will collect and analyze data.

A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data, and that you use the right kind of analysis to answer your questions, utilizing credible sources . This allows you to draw valid , trustworthy conclusions.

Quantitative research designs can be divided into two main categories:

  • Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
  • Experimental and quasi-experimental designs are used to test causal relationships .

Qualitative research designs tend to be more flexible. Common types of qualitative design include case study , ethnography , and grounded theory designs.

The priorities of a research design can vary depending on the field, but you usually have to specify:

  • Your research questions and/or hypotheses
  • Your overall approach (e.g., qualitative or quantitative )
  • The type of design you’re using (e.g., a survey , experiment , or case study )
  • Your data collection methods (e.g., questionnaires , observations)
  • Your data collection procedures (e.g., operationalization , timing and data management)
  • Your data analysis methods (e.g., statistical tests  or thematic analysis )

A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.

A research project is an academic, scientific, or professional undertaking to answer a research question . Research projects can take many forms, such as qualitative or quantitative , descriptive , longitudinal , experimental , or correlational . What kind of research approach you choose will depend on your topic.

Cite this Scribbr article


McCombes, S. (2023, November 20). What Is a Research Design | Types, Guide & Examples. Scribbr. Retrieved April 5, 2024, from https://www.scribbr.com/methodology/research-design/


Step 3: Focus the Evaluation Design

  • Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide


This step covers the following topics:

  • Types of Evaluations
  • Exhibit 3.1 and Exhibit 3.2
  • Determining the Evaluation Focus
  • Are You Ready to Evaluate Outcomes?
  • Illustrating Evaluation Focus Decisions
  • Defining the Specific Evaluation Questions
  • Deciding on the Evaluation Design
  • Standards for Step 3: Focus the Evaluation Design
  • Checklist for Step 3: Focusing the Evaluation Design
  • Worksheet 3A - Focusing the Evaluation in the Logic Model
  • Worksheet 3B - “Reality Checking” the Evaluation Focus

After completing Steps 1 and 2, you and your stakeholders should have a clear understanding of the program and have reached consensus. Now your evaluation team will need to focus the evaluation. This includes determining the most important evaluation questions and the appropriate design for the evaluation. Focusing the evaluation assumes that the entire program does not need to be evaluated at any point in time. Rather, the right evaluation of the program depends on what question is being asked, who is asking the question, and what will be done with the information.

Since resources for evaluation are always limited, this chapter provides a series of decision criteria to help you determine the best evaluation focus at any point in time. These criteria are inspired by the evaluation standards: specifically, utility (who will use the results and what information will be most useful to them) and feasibility (how much time and resources are available for the evaluation).

The logic models developed in Step 2 set the stage for determining the best evaluation focus. The approach to evaluation focus in the CDC Evaluation Framework differs slightly from traditional evaluation approaches. Rather than a summative evaluation, conducted when the program has run its course and asking “Did the program work?”, the CDC framework views evaluation as an ongoing activity over the life of a program that asks, “Is the program working?”

Hence, a program is always ready for some evaluation. Because the logic model displays the program from inputs through activities/outputs through to the sequence of outcomes from short-term to most distal, it can guide a discussion of what you can expect to achieve at a given point in the life of your project. Should you focus on distal outcomes, or only on short- or mid-term ones? Or conversely, does a process evaluation make the most sense right now?


Many different questions can be part of a program evaluation, depending on how long the program has been in existence, who is asking the question, and why the evaluation information is needed. In general, evaluation questions for an existing program [17] fall into one of the following groups:

Implementation/Process

Implementation evaluations (process evaluations) document whether a program has been implemented as intended, and why or why not. In process evaluations, you might examine whether the activities are taking place, who is conducting the activities, who is reached through the activities, and whether sufficient inputs have been allocated or mobilized. Process evaluation is important to help distinguish the causes of poor program performance: was the program a bad idea, or was it a good idea that could not reach the standard for implementation that you set? In all cases, process evaluations measure whether actual program performance was faithful to the initial plan. Such measurements might include contrasting actual and planned performance along all or some of the following:

  • The locale where services or programs are provided (e.g., rural, urban)
  • The number of people receiving services
  • The economic status and racial/ethnic background of people receiving services
  • The quality of services
  • The actual events that occur while the services are delivered
  • The amount of money the project is using
  • The direct and in-kind funding for services
  • The staffing for services or programs
  • The number of activities and meetings
  • The number of training sessions conducted

When evaluation resources are limited, only the most important issues of implementation can be included. Here are some “usual suspects” that compromise implementation and might be considered for inclusion in the process evaluation focus:

  • Transfers of Accountability: When a program’s activities cannot produce the intended outcomes unless some other person or organization takes appropriate action, there is a transfer of accountability.
  • Dosage: The intended outcomes of program activities (e.g., training, case management, counseling) may presume a threshold level of participation or exposure to the intervention.
  • Access: When intended outcomes require not only an increase in consumer demand but also an increase in supply of services to meet it, then the process evaluation might include measures of access.
  • Staff Competency: The intended outcomes may presume well-designed program activities delivered by staff that are not only technically competent but also matched appropriately to the target audience. Measures of the match of staff and target audience might be included in the process evaluation.

Our childhood lead poisoning logic model illustrates such potential process issues. Reducing elevated blood lead levels (EBLL) presumes the house will be cleaned, medical care referrals will be fulfilled, and specialty medical care will be provided. These are transfers of accountability beyond the program to the housing authority, the parent, and the provider, respectively. For provider training to achieve its outcomes, it may presume completion of a three-session curriculum, which is a dosage issue. Case management results in medical referrals, but it presumes adequate access to specialty medical providers. And because lead poisoning tends to disproportionately affect children in low-income urban neighborhoods, many program activities presume cultural competence of the caregiving staff. Each of these components might be included in a process evaluation of a childhood lead poisoning prevention program.

Effectiveness/Outcome

Outcome evaluations assess progress on the sequence of outcomes the program is to address. Programs often describe this sequence using terms like short-term, intermediate, and long-term outcomes, or proximal (close to the intervention) or distal (distant from the intervention). Depending on the stage of development of the program and the purpose of the evaluation, outcome evaluations may include any or all of the outcomes in the sequence, including

  • Changes in people’s attitudes and beliefs
  • Changes in risk or protective behaviors
  • Changes in the environment, including public and private policies, formal and informal enforcement of regulations, and influence of social norms and other societal forces
  • Changes in trends in morbidity and mortality

While process and outcome evaluations are the most common, there are several other types of evaluation questions that are central to a specific program evaluation. These include the following:

  • Efficiency: Are your program’s activities being produced with minimal use of resources such as budget and staff time? What is the volume of outputs produced by the resources devoted to your program?
  • Cost-Effectiveness: Does the value or benefit of your program’s outcomes exceed the cost of producing them?
  • Attribution: Can the outcomes be related to your program, as opposed to other things going on at the same time?

All of these types of evaluation questions relate to part, but not all, of the logic model. Exhibits 3.1 and 3.2 show where in the logic model each type of evaluation would focus. Implementation evaluations would focus on the inputs, activities, and outputs boxes and not be concerned with performance on outcomes. Effectiveness evaluations would do the opposite – focusing on some or all outcome boxes, but not necessarily on the activities that produced them. Efficiency evaluations care about the arrows linking inputs to activities/outputs – how much output is produced for a given level of inputs/resources. Attribution would focus on the arrows between specific activities/outputs and specific outcomes – whether progress on the outcome is related to the specific activity/output.

Determining the correct evaluation focus is a case-by-case decision. Several guidelines inspired by the “utility” and “feasibility” evaluation standards can help determine the best focus.

Utility Considerations

1) What is the purpose of the evaluation?

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the evaluation questions, design, and methods. Some common purposes:

  • Gain new knowledge about program activities
  • Improve or fine-tune existing program operations (e.g., program processes or strategies)
  • Determine the effects of a program by providing evidence concerning the program’s contributions to a long-term goal
  • Affect program participants by acting as a catalyst for self-directed change (e.g., teaching)

2) Who will use the evaluation results?

Users are the individuals or organizations that will employ the evaluation findings. The users will likely have been identified during Step 1 in the process of engaging stakeholders. In this step, you need to secure their input into the design of the evaluation and the selection of evaluation questions. Support from the intended users will increase the likelihood that the evaluation results will be used for program improvement.

3) How will they use the evaluation results?

Many insights on use will have been identified in Step 1. Information collected may have varying uses, which should be described in detail when designing the evaluation. Some examples of uses of evaluation information:

  • To document the level of success in achieving objectives
  • To identify areas of the program that need improvement
  • To decide how to allocate resources
  • To mobilize community support
  • To redistribute or expand the locations where the intervention is carried out
  • To improve the content of the program’s materials
  • To focus program resources on a specific population
  • To solicit more funds or additional partners

4) What do other key stakeholders need from the evaluation?

Of course, the most important stakeholders are those who request or who will use the evaluation results. Nevertheless, in Step 1, you may also have identified stakeholders who, while not using the findings of the current evaluation, have key questions that may need to be addressed in the evaluation to keep them engaged. For example, a particular stakeholder may always be concerned about costs, disparities, or attribution. If so, you may need to add those questions to your evaluation focus.

Feasibility Considerations

The first four questions help identify the most useful focus of the evaluation, but you must also determine whether it is a realistic/feasible one. Three questions provide a reality check on your desired focus:

5) What is the stage of development of the program?

During Step 2, you will have identified the program’s stage of development. There are roughly three stages in program development – planning, implementation, and maintenance – that suggest different focuses. In the planning stage, a truly formative evaluation – who is your target, how do you reach them, how much will it cost – may be the most appropriate focus. An evaluation that included outcomes would make little sense at this stage. Conversely, an evaluation of a program in the maintenance stage would need to include some measurement of progress on outcomes, even if it also included measurement of implementation.

Here are some handy rules to decide whether it is time to shift the evaluation focus toward an emphasis on program outcomes:

  • Sustainability: Political and financial will exists to sustain the intervention while the evaluation is conducted.
  • Fidelity: Actual intervention implementation matches intended implementation. Erratic implementation makes it difficult to know what “version” of the intervention was implemented and, therefore, which version produced the outcomes.
  • Stability: Intervention is not likely to change during the evaluation. Changes to the intervention over time will confound understanding of which aspects of the intervention caused the outcomes.
  • Reach: Intervention reaches a sufficiently large number of clients (sample size) to employ the proposed data analysis. For example, the number of clients needed may vary with the magnitude of the change expected in the variables of interest (i.e., effect size) and the power needed for statistical purposes (see the sample-size sketch after this list).
  • Dosage: Clients have sufficient exposure to the intervention to result in the intended outcomes. Interventions with limited client contact are less likely to result in measurable outcomes, compared to interventions that provide more in-depth intervention.
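
To give a feel for the sample-size arithmetic behind the “Reach” criterion, here is a sketch using the statsmodels power module. The assumed effect size, significance level, and power are illustrative conventions, not recommendations for any particular program.

```python
# A sketch of the sample-size reasoning behind the "Reach" criterion: how many
# clients per group would be needed to detect an assumed effect size with
# adequate statistical power. The effect size, alpha, and power are assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,   # "medium" expected change
                                   alpha=0.05,
                                   power=0.80)
print(f"Clients needed per group: {round(n_per_group)}")   # roughly 64
```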

6) How intensive is the program?

Some programs are wide-ranging and multifaceted. Others may use only one approach to address a large problem. Some programs provide extensive exposure (“dose”) of the program, while others involve participants quickly and superficially. Simple or superficial programs, while potentially useful, cannot realistically be expected to make significant contributions to distal outcomes of a larger program, even when they are fully operational.

7) What are relevant resource and logistical considerations?

Resources and logistics may influence decisions about evaluation focus. Some outcomes are quicker, easier, and cheaper to measure, while others may not be measurable at all. These facts may tilt the decision about evaluation focus toward some outcomes as opposed to others.

Early identification of inconsistencies between utility and feasibility is an important part of the evaluation focus step. But we must also ensure a “meeting of the minds” on what is a realistic focus for program evaluation at any point in time.

The affordable housing example shows how the desired focus might be constrained by reality. The elaborated logic model was important in this case. It clarified that, while program staff were focused on production of new houses, important stakeholders like community-based organizations and faith-based donors were committed to more distal outcomes, such as changes in the life outcomes of families or the outcomes of outside investment in the community. The model led to a discussion of reasonable expectations and, in the end, to expanded evaluation indicators that included some of the more distal outcomes, which led to stakeholders’ greater appreciation of the intermediate milestones on the way to their preferred outcomes.

Because the appropriate evaluation focus is case-specific, let’s apply these focus issues to a few different evaluation scenarios for the childhood lead poisoning prevention (CLPP) program.

At the 1-year mark, a neighboring community would like to adopt your program but wonders, “What are we in for?” Here you might determine that questions of efficiency and implementation are central to the evaluation. You would likely conclude this is a realistic focus, given the stage of development and the intensity of the program. Questions about outcomes would be premature.

At the 5-year mark, the auditing branch of your government funder wants to know, “Did you spend our money well?” Clearly, this requires a much more comprehensive evaluation, and would entail consideration of efficiency, effectiveness, possibly implementation, and cost-effectiveness. It is not clear, without more discussion with the stakeholder, whether research studies to determine causal attribution are also implied. Is this a realistic focus? At year 5, probably yes. The program is a significant investment in resources and has been in existence for enough time to expect some more distal outcomes to have occurred.

Note that in either scenario, you must also consider questions of interest to key stakeholders who are not necessarily intended users of the results of the current evaluation. Here those would be advocates, concerned that families not be blamed for lead poisoning in their children, and housing authority staff, concerned that amelioration include estimates of costs and identification of less costly methods of lead reduction in homes. By year 5, these look like reasonable questions to include in the evaluation focus. At year 1, stakeholders might need assurance that you care about their questions, even if you cannot address them yet.

These focus criteria identify the components of the logic model to be included in the evaluation focus, i.e., these activities, but not these; these outcomes, but not these. At this point, you convert the components of your focus into specific questions, i.e., implementation, effectiveness, efficiency, and attribution. Were my activities implemented as planned? Did my intended outcomes occur? Were the outcomes due to my activities as opposed to something else? If the outcomes occurred at some but not all sites, what barriers existed at less successful locations and what factors were related to success? At what cost were my activities implemented and my outcomes achieved?

Besides determining the evaluation focus and specific evaluation questions, at this point you also need to determine the appropriate evaluation design. Of chief interest in choosing the evaluation design is whether you are being asked to monitor progress on outcomes or whether you are also asked to show attribution—that progress on outcomes is related to your program efforts. Attribution questions may more appropriately be viewed as research as opposed to program evaluation, depending on the level of scrutiny with which they are being asked.

Three general types of research designs are commonly recognized: experimental, quasi-experimental, and non-experimental/observational. Traditional program evaluation typically uses the third type, but all three are presented here because, over the life of the program, traditional evaluation approaches may need to be supplemented with other studies that look more like research.

Experimental designs use random assignment to compare the outcome of an intervention in one or more groups with an equivalent group or groups that did not receive the intervention. For example, you could select a group of similar schools and then randomly assign some schools to receive a prevention curriculum and other schools to serve as controls. All schools have the same chance of being selected as an intervention or control school. Random assignment reduces the chances that the control and intervention schools vary in ways that could influence differences in program outcomes, which allows you to attribute change in outcomes to your program. For example, if the students in the intervention schools delayed onset of the risk behavior longer than students in the control schools, you could attribute that success to your program. However, in community settings it is hard, and sometimes even unethical, to have a true control group.
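To make the mechanics concrete, here is a minimal sketch of how such a school-level randomization and comparison might be set up. All school names and outcome values are hypothetical, and a real trial would also involve power calculations and formal statistical testing.

```python
import random
from statistics import mean

def randomize(units, seed=42):
    """Randomly split a list of unit IDs (e.g., schools) into two equal-sized arms."""
    rng = random.Random(seed)            # fixed seed makes the assignment reproducible and auditable
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

schools = [f"school_{i:02d}" for i in range(1, 13)]
intervention, control = randomize(schools)

# Hypothetical follow-up outcome: % of students reporting delayed onset of the risk behavior.
data_rng = random.Random(7)
outcomes = {s: data_rng.uniform(40, 70) for s in schools}

diff = mean(outcomes[s] for s in intervention) - mean(outcomes[s] for s in control)
print("Intervention arm:", sorted(intervention))
print(f"Mean difference (intervention - control): {diff:.1f} percentage points")
```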

While there are some solutions that preserve the integrity of experimental design, another option is to use a quasi-experimental design. These designs make comparisons between nonequivalent groups and do not involve random assignment to intervention and control groups.

An example would be to assess adults’ beliefs about the harmful outcomes of environmental tobacco smoke (ETS) in two communities, then conduct a media campaign in one of the communities. After the campaign, you would reassess the adults and expect to find a higher percentage of adults believing ETS is harmful in the community that received the media campaign. Critics could argue that other differences between the two communities caused the changes in beliefs, so it is important to document that the intervention and comparison groups are similar on key factors such as population demographics and related current or historical events.
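A back-of-the-envelope way to summarize such a pre/post comparison across the two communities is a difference-in-differences calculation. The sketch below uses hypothetical survey percentages.

```python
# Hypothetical pre/post survey results: % of adults who believe ETS is harmful.
campaign_community = {"pre": 48.0, "post": 63.0}    # received the media campaign
comparison_community = {"pre": 47.0, "post": 51.0}  # no campaign

change_campaign = campaign_community["post"] - campaign_community["pre"]
change_comparison = comparison_community["post"] - comparison_community["pre"]

# The "difference in differences" nets out changes (e.g., secular trends)
# that affected both communities.
did = change_campaign - change_comparison
print(f"Change with campaign:    {change_campaign:+.1f} points")
print(f"Change without campaign: {change_comparison:+.1f} points")
print(f"Difference-in-differences estimate: {did:+.1f} points")
```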

Related to quasi-experimental designs, comparisons of outcome data among states, and between one state and the nation as a whole, are common ways to evaluate public health efforts. Such comparisons will help you establish meaningful benchmarks for progress. States can compare their progress with that of states with a similar investment in their area of public health, or they can contrast their outcomes with the results they could expect if their programs were similar to those of states with a larger investment.

Comparison data are also useful for measuring indicators in anticipation of new or expanding programs. For example, noting a lack of change in key indicators over time prior to program implementation helps demonstrate the need for your program and highlights the comparative progress of states with comprehensive public health programs already in place. A lack of change in indicators can be useful as a justification for greater investment in evidence-based, well-funded, and more comprehensive programs. Between-state comparisons can be highlighted with time–series analyses. For example, questions on many of the larger national surveillance systems have not changed in several years, so you can make comparisons with other states over time, using specific indicators. Collaborate with state epidemiologists, surveillance coordinators, and statisticians to make state and national comparisons an important component of your evaluation.
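As a simple illustration of the kind of state-versus-nation comparison described above, the sketch below contrasts a state's trend on a key indicator with the national trend; all figures are hypothetical.

```python
# Hypothetical annual prevalence (%) of a key indicator, e.g., adult smoking.
years = [2018, 2019, 2020, 2021, 2022]
state = [21.5, 20.8, 20.1, 19.0, 18.2]
nation = [19.9, 19.5, 19.2, 18.9, 18.6]

def total_change(series):
    """Change from the first to the last year of the series."""
    return series[-1] - series[0]

print(f"State change 2018-2022:    {total_change(state):+.1f} points")
print(f"National change 2018-2022: {total_change(nation):+.1f} points")

# A state outpacing the national trend is a meaningful (though not causal) benchmark.
gap_by_year = [round(s - n, 1) for s, n in zip(state, nation)]
print("State minus nation, by year:", dict(zip(years, gap_by_year)))
```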

Observational designs include, but are not limited to, time-series analysis, cross-sectional surveys, and case studies. Periodic cross-sectional surveys (e.g., the Youth Tobacco Survey [YTS] or the Behavioral Risk Factor Surveillance System [BRFSS]) can inform your evaluation. Case studies may be particularly appropriate for assessing changes in public health capacity in disparate population groups. Case studies are applicable when the program is unique, when an existing program is used in a different setting, when a unique outcome is being assessed, or when an environment is especially unpredictable. They can also allow for an exploration of community characteristics and how these may influence program implementation, as well as for identifying barriers to and facilitators of change.

This issue of “causal attribution,” while often a central research question, may or may not need to supplement traditional program evaluation. The field of public health is under increasing pressure to demonstrate that programs are worthwhile, effective, and efficient. During the last two decades, knowledge and understanding about how to evaluate complex programs have increased significantly. Nevertheless, because programs are so complex, the traditional research designs described here may not be a good choice. As the World Health Organization notes, “the use of randomized control trials to evaluate health promotion initiatives is, in most cases, inappropriate, misleading, and unnecessarily expensive.” [18]

Consider the appropriateness and feasibility of less traditional designs (e.g., simple before–after [pretest–posttest] or posttest-only designs). Depending on your program’s objectives and the intended use(s) for the evaluation findings, these designs may be more suitable for measuring progress toward achieving program goals. Even when there is a need to prove that the program was responsible for progress on outcomes, traditional research designs may not be the only or best alternative. Depending on how rigorous the proof needs to be, proximity in time between program implementation and progress on outcomes, or systematic elimination of alternative explanations, may be enough to persuade key stakeholders that the program is making a contribution. While these design alternatives often cost less and require less time, keep in mind that saving time and money should not be the main criteria for selecting an evaluation design. It is important to choose a design that will measure what you need to measure and that will meet both your immediate and long-term needs.
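For a simple before–after (pretest–posttest) design, the analysis can be as modest as a paired comparison of scores collected before and after the program, as in this sketch with hypothetical data.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical knowledge scores (0-100) for the same participants before and after the program.
pre  = [52, 61, 47, 70, 58, 66, 50, 63, 55, 60]
post = [64, 70, 55, 78, 65, 71, 62, 69, 61, 72]

diffs = [b - a for a, b in zip(pre, post)]
mean_gain = mean(diffs)
# Paired t statistic: mean difference divided by its standard error.
t_stat = mean_gain / (stdev(diffs) / sqrt(len(diffs)))

print(f"Mean gain: {mean_gain:.1f} points (paired t = {t_stat:.2f}, n = {len(diffs)})")
# Without a comparison group, this shows progress but cannot by itself
# rule out alternative explanations (maturation, other programs, etc.).
```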

Another alternative to experimental and quasi-experimental models is a goal-based evaluation model, which uses predetermined program goals and the underlying program theory as the standards for evaluation, thus holding the program accountable to prior expectations. The CDC Framework’s emphasis on program description and the construction of a logic model sets the stage for strong goal-based evaluations of programs. In such cases, evaluation planning focuses on the activities; outputs; and short-term, intermediate, and long-term outcomes outlined in a program logic model to direct the measurement activities.

The design you select influences the timing of data collection, how you analyze the data, and the types of conclusions you can make from your findings. A collaborative approach to focusing the evaluation provides a practical way to better ensure the appropriateness and utility of your evaluation design.

  • Define the purpose(s) and user(s) of your evaluation.
  • Identify the use(s) of the evaluation results.
  • Consider stage of development, program intensity, and logistics and resources.
  • Determine the components of your logic model that should be part of the focus given these utility and feasibility considerations.
  • Formulate the evaluation questions to be asked of the program components in your focus, i.e., implementation, effectiveness, efficiency, and attribution questions.
  • Review evaluation questions with stakeholders, program managers, and program staff.
  • Review options for the evaluation design, making sure that the design fits the evaluation questions.

Worksheet 3A – Focusing the Evaluation in the Logic Model

Worksheet 3B – “Reality Checking” the Evaluation Focus

[17] There is another type of evaluation—“formative” evaluation—where the purpose of the evaluation is to gain insight into the nature of the problem so that you can “formulate” a program or intervention to address it. While many steps of the Framework will be helpful for formative evaluation, the emphasis in this manual is on instances wherein the details of the program/intervention are already known even though it may not yet have been implemented.

[18] WHO European Working Group on Health Promotion Evaluation. op cit.



Evaluative research: Key methods, types, and examples

In the last chapter, we learned what generative research means and how it prepares you to build an informed solution for users. Now, let’s look at evaluative research for design and user experience (UX).


What is evaluative research?

Evaluative research is a research method used to evaluate a product or concept and collect data to help improve your solution. It offers many benefits, including identifying whether a product works as intended and uncovering areas for improvement.

Also known as evaluation research or program evaluation, this kind of research is typically introduced in the early phases of the design process to test existing or new solutions. It continues to be employed in an iterative way until the product becomes ‘final’. “With evaluation research, we’re making sure the value is there so that effort and resources aren’t wasted,” explains Nannearl LeKesia Brown, Product Researcher at Figma.

According to Mithila Fox, Senior UX Researcher at Stack Overflow, the evaluation research process includes various activities, like content testing, assessing accessibility or desirability. During UX research, evaluation can also be conducted on competitor products to understand what solutions work well in the current market before you start building your own.

“Even before you have your own mockups, you can start by testing competitors or similar products,” says Mithila. “There’s a lot we can learn from what is and isn't working about other products in the market.”

However, evaluation research doesn’t stop when a new product is launched. For the best user experience, solutions need to be monitored after release and improved based on customer feedback.


Why is evaluative research important?

Evaluative research is crucial in UX design and research, providing insights to enhance user experiences, identify usability issues, and inform iterative design improvements. It helps you:

  • Refine and improve UX: Evaluative research allows you to test a solution and collect valuable feedback to refine and improve the user experience. For example, you can A/B test the copy on your site to maximize engagement with users.
  • Identify areas of improvement: Findings from evaluative research are key to assessing what works and what doesn't. You might, for instance, run usability testing to observe how users navigate your website and identify pain points or areas of confusion.
  • Align your ideas with users: Research should always be a part of the design and product development process . By allowing users to evaluate your product early and often you'll know whether you're building the right solution for your audience.
  • Get buy-in: The insights you get from this type of research can demonstrate the effectiveness and impact of your project. Show this information to stakeholders to get buy-in for future projects.

Evaluative vs. Generative research

The difference between generative research and evaluative research lies in their focus: generative methods investigate user needs for new solutions, while evaluative research assesses and validates existing designs for improvements.

Generative and evaluative research are both valuable decision-making tools in the arsenal of a researcher. They should be similarly employed throughout the product development process as they both help you get the evidence you need.

When creating the research plan, study the competitive landscape, target audience, needs of the people you’re building for, and any existing solutions. Depending on what you need to find out, you’ll be able to determine if you should run generative or evaluative research.

Mithila explains the benefits of using both research methodologies: “Generative research helps us deeply understand our users and learn their needs, wants, and challenges. On the other hand, evaluative research helps us test whether the solutions we've come up with address those needs, wants, and challenges.”

Use generative research to bring forth new ideas during the discovery phase. And use evaluation research to test and monitor the product before and after launch.

The two types of evaluative research

There are two types of evaluative studies you can tap into: summative and formative research. Although summative evaluations are often quantitative, they can also be part of qualitative research.

Summative evaluation research

A summative evaluation helps you understand how a design performs overall. It’s usually done at the end of the design process to evaluate its usability or detect overlooked issues. You can also use a summative evaluation to benchmark your new solution against a prior version or a competitor’s product, and to judge whether the final product needs further refinement. Summative evaluation can also be outcome-focused, assessing impact and effectiveness for specific outcomes, such as how a design influences conversion.

Formative evaluation research

On the other hand, formative research is conducted early and often during the design process to test and improve a solution before arriving at the final design. Running a formative evaluation allows you to test and identify issues in the solutions as you’re creating them, and improve them based on user feedback.

TL;DR: Run formative research to test and evaluate solutions during the design process, and conduct a summative evaluation at the end to evaluate the final product.

Looking to conduct UX research? Check out our list of the top UX research tools to run an effective research study.

5 Key evaluative research methods

“Evaluation research can start as soon as you understand your user’s needs,” says Mithila. Here are five typical UX research methods to include in your evaluation research process:

User surveys

User surveys can provide valuable quantitative insights into user preferences, satisfaction levels, and attitudes toward a design or product. By gathering a large amount of data efficiently, surveys can identify trends, patterns, and user demographics to make informed decisions and prioritize design improvements.
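As a quick illustration, survey responses are often summarized with a mean score and a “top-2-box” share; the sketch below uses hypothetical 5-point satisfaction ratings.

```python
from collections import Counter

# Hypothetical 5-point satisfaction ratings from a product survey.
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 5, 1, 4, 4, 5]

counts = Counter(ratings)
n = len(ratings)
top_2_box = sum(counts[r] for r in (4, 5)) / n   # share of "satisfied" responses (4s and 5s)
mean_score = sum(ratings) / n

print(f"n = {n}, mean = {mean_score:.2f}")
print(f"Top-2-box (4s and 5s): {top_2_box:.0%}")
print("Distribution:", dict(sorted(counts.items())))
```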

Closed card sorting

Closed card sorting helps evaluate the effectiveness and intuitiveness of an existing or proposed navigation structure. By analyzing how participants group and categorize information, researchers can identify potential issues, inconsistencies, or gaps in the design's information architecture, leading to improved navigation and findability.
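One common way to analyze closed card sort results is to count how often participants place pairs of cards in the same category; the sketch below uses a small, entirely hypothetical data set.

```python
from itertools import combinations
from collections import Counter

# Hypothetical closed-sort results: each participant's category -> cards mapping.
sorts = [
    {"Account": ["Profile", "Password"], "Billing": ["Invoices", "Plans"]},
    {"Account": ["Profile", "Password", "Invoices"], "Billing": ["Plans"]},
    {"Account": ["Profile"], "Billing": ["Invoices", "Plans", "Password"]},
]

# Count how often each pair of cards was placed in the same category.
pair_counts = Counter()
for sort in sorts:
    for cards in sort.values():
        for a, b in combinations(sorted(cards), 2):
            pair_counts[(a, b)] += 1

n = len(sorts)
for (a, b), c in pair_counts.most_common():
    print(f"{a} + {b}: grouped together by {c}/{n} participants ({c / n:.0%})")
```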

Tree testing

Tree testing, also known as reverse card sorting, is a research method used to evaluate the findability and effectiveness of information architecture. Participants are given a text-based representation of the website's navigation structure (without visual design elements) and are asked to locate specific items or perform specific tasks by navigating through the tree structure. This method helps identify potential issues such as confusing labels, unclear hierarchy, or navigation paths that hinder users' ability to find information.

Usability testing

Usability testing involves observing and collecting qualitative and/or quantitative data on how users interact with a design or product. Participants are given specific tasks to perform while their interactions, feedback, and difficulties are recorded. This approach helps identify usability issues, areas of confusion, or pain points in the user experience.
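Usability test sessions are typically rolled up into a few headline metrics, such as task completion rate, time on task, and error counts. Here is a minimal sketch with hypothetical session data.

```python
from statistics import mean

# Hypothetical per-participant results for one task in a moderated usability test.
results = [
    {"completed": True,  "seconds": 74,  "errors": 0},
    {"completed": True,  "seconds": 102, "errors": 1},
    {"completed": False, "seconds": 180, "errors": 3},
    {"completed": True,  "seconds": 88,  "errors": 0},
    {"completed": True,  "seconds": 95,  "errors": 2},
]

completion_rate = sum(r["completed"] for r in results) / len(results)
time_on_task = mean(r["seconds"] for r in results if r["completed"])
error_rate = mean(r["errors"] for r in results)

print(f"Task completion rate: {completion_rate:.0%}")
print(f"Mean time on task (successful attempts): {time_on_task:.0f}s")
print(f"Mean errors per participant: {error_rate:.1f}")
```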

A/B testing

A/B testing, also known as split testing, is an evaluative research approach that involves comparing two or more versions of a design or feature to determine which one performs better in achieving a specific objective. Users are randomly assigned to different variants, and their interactions, behavior, or conversion rates are measured and analyzed. A/B testing allows researchers to make data-driven decisions by quantitatively assessing the impact of design changes on user behavior, engagement, or conversion metrics.
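Under the hood, a basic A/B comparison often comes down to a two-proportion z-test on conversion rates. The sketch below uses hypothetical conversion counts and a normal approximation for the p-value.

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approximation
    return p_a, p_b, z, p_value

# Hypothetical results: 120/2400 conversions on variant A vs. 159/2400 on variant B.
p_a, p_b, z, p = two_proportion_z(120, 2400, 159, 2400)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p:.3f}")
```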

This is the value of having a UX research plan before diving into the research approach itself. If we were able to answer the evaluative questions we had, in addition to figuring out if our hypotheses were valid (or not), I’d count that as a successful evaluation study. Ultimately, research is about learning in order to make more informed decisions—if we learned, we were successful.

Nannearl LeKesia Brown, Product Researcher at Figma


Evaluative research question examples

To gather valuable data and make better design decisions, you need to ask the right research questions. Here are some examples of evaluative research questions:

Usability questions

  • How would you go about performing [task]?
  • How was your experience completing [task]?
  • How did you find navigating to [X] page?
  • Based on the previous task, how would you prefer to do this action instead?

Get inspired by real-life usability test examples and discover more usability testing questions in our guide to usability testing.

Product survey questions

  • How often do you use the product/feature?
  • How satisfied are you with the product/feature?
  • Does the product/feature help you achieve your goals?
  • How easy is the product/feature to use?

Discover more examples of product survey questions in our article on product surveys .

Closed card sorting questions

  • Were there any categories you were unsure about?
  • Which categories were you unsure about?
  • Why were you unsure about the [X] category?

Find out more in our complete card sorting guide .

Evaluation research examples

Across UX design, research, and product testing, evaluative research can take several forms. Here are some ways you can conduct evaluative research:

Comparative usability testing

This example of evaluative research involves conducting usability tests with participants to compare the performance and user satisfaction of two or more competing design variations or prototypes.

You’ll gather qualitative and quantitative data on task completion rates, errors, user preferences, and feedback to identify the most effective design option. You can then use the insights gained from comparative usability testing to inform design decisions and prioritize improvements based on user-centered feedback .

Cognitive walkthroughs

Cognitive walkthroughs assess the usability and effectiveness of a design from a user's perspective.

You’ll ask evaluators to step through key tasks from the user’s point of view to identify potential points of confusion, decision-making challenges, or errors. You can then gather insights on user expectations, mental models, and information processing to improve the clarity and intuitiveness of the design.

Diary studies

Conducting diary studies gives you insights into users' experiences and behaviors over an extended period of time.

You provide participants with diaries or digital tools to record their interactions, thoughts, frustrations, and successes related to a product or service. You can then analyze the collected data to identify usage patterns, uncover pain points, and understand the factors influencing the user experience .

In the next chapters, we'll learn more about quantitative and qualitative research, as well as the most common UX research methods . We’ll also share some practical applications of how UX researchers use these methods to conduct effective research.


Chapter 2 | Methodological Principles of Evaluation Design

Evaluation of International Development Interventions

Evaluation approaches and methods do not exist in a vacuum. Stakeholders who commission or use evaluations and those who manage or conduct evaluations all have their own ideas and preferences about which approaches and methods to use. An individual’s disciplinary background, experience, and institutional role influence such preferences; other factors include internalized ideas about rigor and applicability of methods. This guide informs evaluation stakeholders about a range of approaches and methods used in evaluative analysis, provides a quick overview of the key features of each, and thereby helps them identify the approaches and methods that work best in given situations.

Before we present the specific approaches and methods in chapter 3, let us consider some of the key methodological principles of evaluation design that provide the foundations for the selection, adaptation, and use of evaluation approaches and methods in an IEO evaluation setting. To be clear, we focus only on methodological issues here and do not discuss other key aspects of design, such as particular stakeholders’ intended use of the evaluation. The principles discussed in this chapter pertain also to evaluation in general, but they are especially pertinent for designing independent evaluations in an international development context. We consider the following methodological principles to be important for developing high-quality evaluations:

  • Giving due consideration to methodological aspects of evaluation quality in design: focus, consistency, reliability, and validity
  • Matching evaluation design to the evaluation questions
  • Using effective tools for evaluation design
  • Balancing scope and depth in multilevel, multisite evaluands
  • Mixing methods for analytical depth and breadth
  • Dealing with institutional opportunities and constraints of budget, data, and time
  • Building on theory

Let us briefly review each of these in turn.

Giving Due Consideration to Methodological Aspects of Evaluation Quality in Design

Evaluation quality is complex. It may be interpreted in different ways and refer to one or more aspects of quality in terms of process, use of methods, team composition, findings, and so on. Here we will talk about quality of inference: the quality of the findings of an evaluation as underpinned by clear reasoning and reliable evidence. We can differentiate among four broad, interrelated sets of determinants:

  • The budget, data, and time available for an evaluation (see the Dealing with Institutional Opportunities and Constraints of Budget, Data, and Time section);
  • The institutional processes and incentives for producing quality work;
  • The expertise available within the evaluation team in terms of different types of knowledge and experience relevant to the evaluation: institutional, subject matter, contextual (for example, country), methodological, project management, communication; and
  • Overarching principles of quality of inference in evaluation research based on our experience and the methodological literature in the social and behavioral sciences. 1

Here we briefly discuss the final bullet point. From a methodological perspective, quality can be broken down into four aspects: focus, consistency, reliability, and validity.

Focus concerns the scope of the evaluation. Given the nature of the evaluand and the type of questions, how narrowly or widely does one cast the net? Does one look at both relevance and effectiveness issues? How far down the causal chain does the evaluation try to capture the causal contribution of an intervention? Essentially, the narrower the focus of an evaluation, the greater the concentration of financial and human resources on a particular aspect and consequently the greater the likelihood of high-quality inference.

Consistency here refers to the extent to which the different analytical steps of an evaluation are logically connected. The quality of inference is enhanced if there are logical connections among the initial problem statement, rationale and purpose of the evaluation, questions and scope, use of methods, data collection and analysis, and conclusions of an evaluation.

Reliability concerns the transparency and replicability of the evaluation process. 2 The more systematic the evaluation process and the higher the levels of clarity and transparency of design and implementation, the higher the confidence of others in the quality of inference.

Finally, validity is a property of findings. There are many classifications of validity. A widely used typology is the one developed by Cook and Campbell (1979) and slightly refined by Hedges (2017):

  • Internal validity: To what extent is there a causal relationship between, for example, outputs and outcomes?
  • External validity: To what extent can we generalize findings to other contexts, people, or time periods?
  • Construct validity: To what extent is the element that we have measured a good representation of the phenomenon we are interested in?
  • Data analysis validity: To what extent are methods applied correctly and the data used in the analysis adequate for drawing conclusions?

Matching Evaluation Design to the Evaluation Questions

Although it may seem obvious that evaluation design should be matched to the evaluation questions, in practice much evaluation design is still too often methods driven. Evaluation professionals have implicit and explicit preferences and biases toward the approaches and methods they favor. The rise in randomized experiments for causal analysis is largely the result of a methods-driven movement. Although this guide is not the place to discuss whether methods-driven evaluation is justified, there are strong arguments against it. One such argument is that in IEOs (and in many similar institutional settings), one does not have the luxury of being too methods driven. In fact, the evaluation questions, types of evaluands, or types of outcomes that decision makers or other evaluation stakeholders are interested in are diverse and do not lend themselves to one singular approach or method for evaluation. Even for a subset of causal questions, given the nature of the evaluands and outcomes of interest (for example, the effect of technical assistance on institutional reform versus the effect of microgrants on health-seeking behavior of poor women), the availability and cost of data, and many other factors, there is never one single approach or method that is always better than others. For particular types of questions there are usually several methodological options with different requirements and characteristics that are better suited than others. Multiple classifications of questions can be helpful to evaluators in thinking more systematically about this link, such as causal versus noncausal questions, descriptive versus analytical questions, normative versus nonnormative questions, intervention-focused versus systems-based questions, and so on. Throughout this guide, each guidance note presents what we take to be the most relevant questions that the approach or method addresses.

Using Effective Tools for Evaluation Design

Over the years, the international evaluation community in general and institutionalized evaluation functions (such as IEOs) in particular have developed and used a number of tools to improve the quality and efficiency of evaluation design. 3 Let us briefly discuss four prevalent tools.

First, a common tool in IEOs (and similar evaluation functions) is some type of multicriteria approach to justify the strategic selectivity of topics or interventions for evaluation. This could include demand-driven criteria such as potential stakeholder use or supply-driven criteria such as the financial volume or size of a program or portfolio of interventions. Strategic selectivity often goes hand in hand with evaluability assessment (Wholey 1979), which covers such aspects as stakeholder interest and potential use, data availability, and clarity of the evaluand (for example, whether a clear theory of change underlies the evaluand).

A second important tool is the use of approach papers or inception reports. These are stand-alone documents that describe key considerations and decisions regarding the rationale, scope, and methodology of an evaluation. When evaluations are contracted out, the terms of reference for external consultants often contain similar elements. Terms of reference are, however, never a substitute for approach papers or inception reports.

As part of approach papers and inception reports, a third tool is the use of a design matrix. For each of the main evaluation questions, this matrix specifies the sources of evidence and the use of methods. Design matrixes may also be structured to reflect the multilevel nature (for example, global, selected countries, selected interventions) of the evaluation.
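To make the idea tangible, a design matrix can be as simple as one row per evaluation question, listing the evidence sources and methods for each. The sketch below shows two rows with made-up questions, levels, evidence sources, and methods; it is illustrative only, not a template from the guide.

```python
# Hypothetical rows of a design matrix for a multilevel country program evaluation.
design_matrix = [
    {
        "question": "Were planned advisory activities delivered as intended?",
        "level": "intervention",
        "evidence": ["project completion reports", "staff interviews"],
        "methods": ["structured desk review", "semistructured interviews"],
    },
    {
        "question": "Did the portfolio contribute to sector-level outcomes?",
        "level": "country",
        "evidence": ["national statistics", "stakeholder interviews", "prior studies"],
        "methods": ["contribution analysis", "time-series comparison"],
    },
]

for row in design_matrix:
    print(f"- {row['question']}")
    print(f"    level: {row['level']} | methods: {', '.join(row['methods'])}")
    print(f"    evidence: {', '.join(row['evidence'])}")
```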

A fourth tool is the use of external peer reviewers or a reference group. Including external methodological and substantive experts in the evaluation design process can effectively reduce bias and enhance quality.

Balancing Scope and Depth in Multilevel, Multisite Evaluands

Although project-level evaluation continues to be important, at the same time and for multiple reasons international organizations and national governments are increasingly commissioning and conducting evaluations at higher programmatic levels of intervention. Examples of the latter are sector-level evaluations, country program evaluations, and regional or global thematic evaluations. These evaluations tend to have the following characteristics:

  • They often cover multiple levels of intervention, multiple sites (communities, provinces, countries), and multiple stakeholder groups at different levels and sites.
  • They are usually more summative and are useful for accountability purposes, but they may also contain important lessons for oversight bodies, management, operations, or other stakeholders.
  • They are characterized by elaborate evaluation designs.

A number of key considerations for evaluation design are specific to higher-level programmatic evaluations. The multilevel nature of the intervention (portfolio) requires a multilevel design with multiple methods applied at different levels of analysis (such as country or intervention type). For example, a national program to support the health sector in a given country may have interventions relating to policy dialogue, policy advisory support, and technical capacity development at the level of the line ministry while supporting particular health system and health service delivery activities across the country.

Multilevel methods choice goes hand in hand with multilevel sampling and selection issues. A global evaluation of an international organization’s support to private sector development may involve data collection and analysis at the global level (for example, global institutional mapping), the level of the organization’s portfolio (for example, desk review), the level of selected countries (for example, interviews with representatives of selected government departments or agencies and industry leaders), and the level of selected interventions (for example, theory-based causal analysis of advisory services in the energy sector). For efficiency, designs are often “nested”; for example, the evaluation covers selected interventions in selected countries. Evaluation designs may encompass different case study levels, with within-case analysis in a specific country (or regarding a specific intervention) and cross-case (comparative) analysis across countries (or interventions).

A key constraint in this type of evaluation is that one cannot cover everything. Even for one evaluation question, decisions on selectivity and scope are needed. Consequently, strategic questions should address the desired breadth and depth of analysis. In general, the need for depth of analysis (determined by, for example, the time, resources, and triangulation among methods needed to understand and assess one particular phenomenon) must be balanced against the need to generate generalizable claims (through informed sampling and selection). Beyond informed sampling and selection, the generalizability of findings is influenced by the degree of convergence of findings from one or more cases with available existing evidence, or of findings across cases. There is also a clear need for breadth of analysis in an evaluation (looking at multiple questions, phenomena, and underlying factors) to adequately cover its scope. All these considerations require careful reflection in what can be a quite complicated evaluation design process.

Mixing Methods for Analytical Depth and Breadth

Multilevel, multisite evaluations are by definition multimethod evaluations. But the idea of informed evaluation design, or the strategic mixing of methods, applies to essentially all evaluations. According to Bamberger (2012, 1), “Mixed methods evaluations seek to integrate social science disciplines with predominantly quantitative and predominantly qualitative approaches to theory, data collection, data analysis and interpretation. The purpose is to strengthen the reliability of data, validity of the findings and recommendations, and to broaden and deepen our understanding of the processes through which program outcomes and impacts are achieved, and how these are affected by the context within which the program is implemented.” The evaluator should always strive to identify and use the best-suited methods for the specific purposes and context of the evaluation and consider how other methods may compensate for any limitations of the selected methods. Although it is difficult to truly integrate different methods within a single evaluation design, the benefits of mixed methods designs are worth pursuing in most situations. The benefits are not just methodological; through mixed designs and methods, evaluations are better able to answer a broader range of questions and more aspects of each question.

There is an extensive and growing literature on mixed methods in evaluation. One of the seminal articles on the subject (by Greene, Caracelli, and Graham) provides a clear framework for using mixed methods in evaluation that is as relevant as ever. Greene, Caracelli, and Graham (1989) identify the following five principles and purposes of mixing methods:

  • Triangulation: Using different methods to compare findings. Convergence of findings from multiple methods strengthens the validity of findings. For example, a survey on investment behavior administered to a random sample of owners of small enterprises could confirm the findings obtained from semistructured interviews with a purposive sample of representatives of investment companies supporting the enterprises.
  • Initiation: Using different methods to critically question a particular position or line of thought. For example, an evaluator could test two rival theories (with different underlying methods) on the causal relationships between promoting alternative livelihoods in buffer zones of protected areas and protecting biodiversity.
  • Complementarity: Using one method to build on the findings from another method. For example, in-depth interviews with selected households and their individual members could deepen the findings from a quasi-experimental analysis of the relationship between advocacy campaigns and health-seeking behavior.
  • Development: Using one method to inform the development of another. For example, focus groups could be used to develop a contextualized understanding of women’s empowerment, and that understanding could then inform the design of a survey questionnaire.
  • Expansion: Using multiple methods to look at complementary areas. For example, social network analysis could be used to understand an organization’s position in the financial landscape of all major organizations supporting a country’s education sector, while semistructured interviews with officials from the education ministry and related agencies could be used to assess the relevance of the organization’s support to the sector.

Dealing with Institutional Opportunities and Constraints of Budget, Data, and Time

Evaluation is applied social science research carried out within specific institutional requirements, opportunities, and constraints, alongside a range of other practical constraints. Addressing these all-too-common constraints, including budget, data, time, and political ones, involves balancing rigor and depth of analysis with feasibility. In this sense, evaluation clearly distinguishes itself from academic research in several ways:

  • It is strongly linked to an organization’s accountability and learning processes, and there is some explicit or implicit demand-orientation in evaluation.
  • It is highly normative, and evidence is used to underpin normative conclusions about the merit and worth of an evaluand.
  • It puts the policy intervention (for example, the program, strategy, project, corporate process, thematic area of work) at the center of the analysis.
  • It is subject to institutional constraints of budget, time, and data. Even in more complicated evaluations of larger programmatic evaluands, evaluation (especially by IEOs) is essentially about “finding out fast” without compromising too much the quality of the analysis.
  • It is shaped in part by the availability of data already in the organizational system. Such data may include corporate data (financial, human resources, procurement, and so on), existing reporting (financial appraisal, monitoring, [self-] evaluation), and other data and background research conducted by the organization or its partners.

Building on Theory

Interventions are theories, and evaluation is the test (Pawson and Tilley 2001). This well-known reference indicates an influential school of thought and practice in evaluation, often called theory-driven or theory-based evaluation. Policy interventions (programs and projects) rely on underlying theories regarding how they are intended to work and contribute to processes of change. These theories (usually called program theories, theories of change, or intervention theories) are often made explicit in documents but sometimes exist only in the minds of stakeholders (for example, decision makers, evaluation commissioners, implementing staff, beneficiaries). Program theories (whether explicit or tacit) guide the design and implementation of policy interventions and also constitute an important basis for evaluation.

The important role of program theory (or variants thereof) is well established in evaluation. By describing the inner workings of how programs operate (or at least are intended to operate), the use of program theory is a fundamental step in evaluation planning and design. Regardless of the evaluation question or purpose, a central step will always be to develop a thorough understanding of the intervention that is evaluated. To this end, the development of program theories should always be grounded in stakeholder knowledge and informed to the extent possible by social scientific theories from psychology, sociology, economics, and other disciplines. Building program theories on the basis of stakeholder knowledge and social scientific theory supports more relevant and practice-grounded program theories, improves the conceptual clarity and precision of the theories, and ultimately increases the credibility of the evaluation.

Depending on the level of complexity of the evaluand (for example, a complex global portfolio on urban infrastructure support versus a specific road construction project) a program theory can serve as an overall sense-making framework; a framework for evaluation design by linking particular causal steps and assumptions to methods and data; or a framework for systematic causal analysis (for example, using qualitative comparative analysis or process tracing; see chapter 3 ). Program theories can be nested; more detailed theories of selected (sets of) interventions can be developed and used for guiding data collection, analysis, and the interpretation of findings, while the broader theory can be used to connect the different strands of intervention activities and to make sense of the broader evaluand (see also appendix B ).
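As a rough illustration of how a program theory can double as an evaluation framework, the sketch below links each causal step of a hypothetical intervention to the evidence and method that might be used to assess it; the steps, evidence sources, and methods are all illustrative assumptions, not prescriptions from the guide.

```python
# Hypothetical, simplified program theory for a housing-rehabilitation intervention,
# expressed as causal steps linked to the evidence and methods used to assess each step.
program_theory = {
    "inputs":     {"description": "funding, staff, volunteer labor"},
    "activities": {"description": "rehabilitate substandard homes",
                   "evidence": "project records", "method": "desk review"},
    "outputs":    {"description": "number of homes rehabilitated",
                   "evidence": "completion reports", "method": "descriptive statistics"},
    "outcomes":   {"description": "improved housing conditions for families",
                   "evidence": "household survey", "method": "pre/post comparison"},
    "impacts":    {"description": "improved family well-being",
                   "evidence": "long-term follow-up", "method": "contribution analysis"},
}

for step, detail in program_theory.items():
    line = f"{step:<10} {detail['description']}"
    if "method" in detail:
        line += f"  [{detail['evidence']} via {detail['method']}]"
    print(line)
```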

Bamberger, M. 2012. Introduction to Mixed Methods in Impact Evaluation . Impact Evaluation Notes 3 (August), InterAction and the Rockefeller Foundation. https://www.interaction.org/wp-content/uploads/2019/03/Mixed-Methods-in-Impact-Evaluation-English.pdf.

Bamberger, M., J. Rugh, and L. Mabry. 2006. RealWorld Evaluation: Working under Budget, Time, Data, and Political Constraints . Thousand Oaks, CA: SAGE.

Cook, T. D., and D. T. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin.

Greene, J., V. Caracelli, and W. Graham. 1989. “Toward a Conceptual Framework for Mixed-Method Evaluation Designs.” Educational Evaluation and Policy Analysis 11 (3): 209–21.

Hedges, L. V. 2017. “Design of Empirical Research.” In Research Methods and Methodologies in Education , 2nd ed., edited by R. Coe, M. Waring, L. V. Hedges, and J. Arthur, 25–33. Thousand Oaks, CA: SAGE.

Morra Imas, L., and R. Rist. 2009. The Road to Results . Washington, DC: World Bank.

Pawson, R., and N. Tilley. 2001. “Realistic Evaluation Bloodlines.” American Journal of Evaluation 22 (3): 317–24.

Wholey, J. 1979. Evaluation: Promise and Performance. Washington, DC: Urban Institute.

  • For simplification purposes we define method as a particular technique involving a set of principles to collect or analyze data, or both. The term approach can be situated at a more aggregate level, that is, at the level of methodology, and usually involves a combination of methods within a unified framework. Methodology provides the structure and principles for developing and supporting a particular knowledge claim.
  • Development evaluation is not to be confused with developmental evaluation. The latter is a specific evaluation approach developed by Michael Patton.
  • Especially in independent evaluations conducted by independent evaluation units or departments in national or international nongovernmental, governmental, and multilateral organizations. Although a broader range of evaluation approaches may be relevant to the practice of development evaluation, we consider the current selection to be at the core of evaluative practice in independent evaluation.
  • Evaluation functions of organizations that are (to a large extent), structurally, organizationally and behaviorally independent from management. Structural independence, which is the most distinguishing feature of independent evaluation offices, includes such aspects as independent budgets, independent human resource management, and no reporting line to management, but some type of oversight body (for example, an executive board).
  • The latter are not fully excluded from this guide but are not widely covered.
  • Evaluation is defined as applied policy-oriented research and builds on the principles, theories, and methods of the social and behavioral sciences.
  • Both reliability and validity are covered by a broad literature. Many of the ideas about these two principles are contested, and perspectives differ according to different schools of thought (with different underlying ontological and epistemological foundations).
  • A comprehensive discussion of the evaluation process, including tools, processes, and standards for designing, managing, quality assuring, disseminating, and using evaluations is effectively outside of the scope of this guide (see instead, for example, Bamberger, Rugh, and Mabry 2006; Morra Imas and Rist 2009).

Research Design in Evaluation: Choices and Issues


Keith Tones, Sylvia Tilford and Yvonne Keeley Robinson


One of the purposes of this book is to stimulate reflection on the meaning of effectiveness and efficiency in health education. Moreover, although it is not concerned to offer detailed guidance on the measurement of success, it may well be worthwhile reminding readers of what is involved in evaluating programmes. Accordingly this chapter will provide a condensed account of the most common research designs used in evaluation. In so doing it will seek to underline some of the important issues involved in selecting an evaluation strategy and perhaps help potential evaluators make a pragmatic choice of method. It should also serve as a reminder that evaluation is not a mere technical activity: it incorporates philosophical issues and requires understanding of the purpose of research and the meaning of success. Above all, the reader should recall that there are often not inconsiderable difficulties involved in gaining unequivocal answers to the apparently simple question ‘Has this health education programme been effective?’





About this chapter

Tones, K., Tilford, S., Robinson, Y.K. (1990). Research Design in Evaluation: Choices and Issues. In: Health Education. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-3230-3_2


Evaluative research: Methods, types, and examples (2024)

Master evaluative research with our guide, offering a detailed look at methods, types, and real-life examples for a complete understanding.

Product owners and user researchers often grapple with the challenge of gauging the success and impact of their products. 

The struggle lies in understanding what methods and types of evaluative research can provide meaningful insights. 

Empathy is crucial in this process, as identifying user needs and preferences requires a deep understanding of their experiences. 

In this article, we present a concise guide to evaluative research, offering practical methods, highlighting various types, and providing real-world examples. 

By delving into the realm of evaluative research, product owners and user researchers can navigate the complexities of product assessment with clarity and effectiveness.

What is evaluative research?

Evaluative research assesses the effectiveness and usability of products or services. It involves gathering user feedback to measure performance and identify areas for improvement. 

Product owners and user researchers employ evaluative research to make informed decisions. Users' experiences and preferences are actively observed and analyzed to enhance the overall quality of a product. 


This research method aids in identifying strengths and weaknesses, enabling iterative refinement. Through surveys, usability testing, and direct user interaction, evaluative research provides valuable insights. 

It guides product development, ensuring that user needs are met and expectations exceeded. For product owners and user researchers, embracing evaluative research is pivotal in creating successful, user-centric solutions.

Now that we understand what evaluative research entails, let's explore why it holds a pivotal role in product development and user research.

Why is evaluative research important?

Evaluative research holds immense importance for product owners and user researchers as it offers concrete data and feedback to gauge the success of a product or service. 

By identifying strengths and weaknesses, it becomes a powerful tool for informed decision-making, leading to product improvements and enhanced user experiences:

1) Unlocking product potential

Evaluative research stands as a crucial pillar in product development, offering invaluable insights into a product's effectiveness. By actively assessing user experiences, product owners gain a clearer understanding of what works and what needs improvement. 

This process facilitates targeted enhancements, ensuring that products align with user expectations and preferences. In essence, evaluative research empowers product owners to unlock their product's full potential, resulting in more satisfied users and increased market success.

2) Mitigating risk and reducing iteration cycles

For product owners navigating the competitive landscape, mitigating risks is paramount. Evaluative research serves as a proactive measure, identifying potential issues before they escalate. Through systematic testing and user feedback, product owners can pinpoint weaknesses, allowing for timely adjustments. 

This not only reduces the likelihood of costly post-launch issues but also streamlines iteration cycles. By addressing concerns early in the development phase, product owners can refine their offerings efficiently, staying agile in response to user needs and industry dynamics.

3) Enhancing user-centric design

User researchers play a pivotal role in shaping products that resonate with their intended audience. Evaluative research is the compass guiding user-centric design, ensuring that every iteration aligns with user expectations. By actively involving users in the assessment process, researchers gain firsthand insights into user behavior and preferences. 

This information is invaluable for crafting a seamless user experience, ultimately fostering loyalty and satisfaction. In the ever-evolving landscape of user preferences, ongoing evaluative research becomes a strategic tool for user researchers to consistently refine and elevate the design, fostering products that stand the test of time.

With the significance of evaluative research established, the next question is when to conduct it.

When should you conduct evaluative research?

Knowing the opportune moments to conduct evaluative research is vital. Whether in the early stages of development or after a product launch, this research helps pinpoint areas for enhancement:


Prototype stage

During the prototype stage, conducting evaluative research is crucial to gather insights and refine the product. 

Engage users with prototypes to identify usability issues, gauge user satisfaction, and validate design decisions. 

This early evaluation ensures that potential problems are addressed before moving forward, saving time and resources in the later stages of development. 

By actively involving users at this stage, product owners can enhance the user experience and align the product with user expectations.

Pre-launch stage

In the pre-launch stage, evaluative research becomes instrumental in assessing the final product's readiness. 

Evaluate user interactions, uncover any remaining usability concerns, and verify that the product meets user needs. 

This phase helps refine features, optimize user flows, and address any last-minute issues. 

By actively seeking user feedback before launch, product owners can make informed decisions to improve the overall quality and performance of the product, ultimately enhancing its market success.

Post-launch stage

After the product is launched, evaluative research remains essential for ongoing improvement. Monitor user behavior, gather feedback, and identify areas for enhancement. 

This active approach allows product owners to respond swiftly to emerging issues, optimize features based on real-world usage, and adapt to changing user preferences. 

Continuous evaluative research in the post-launch stage helps maintain a competitive edge, ensuring the product evolves in tandem with user expectations, thus fostering long-term success.

Now that we understand the timing of evaluative research, let's distinguish it from generative research and understand their respective roles.

Evaluative vs. generative research

While evaluative research assesses existing products, generative research focuses on generating new ideas. Understanding this dichotomy is crucial for product owners and user researchers to choose the right approach for the specific goals of their projects:


With the differentiation between evaluative and generative research clear, let's delve into the three primary types of evaluative research.

What are the 3 types of evaluative research?

Evaluative research can take various forms. The three main types include formative evaluation, summative evaluation, and outcome evaluation. 


Each type serves a distinct purpose, offering valuable insights throughout different stages of a product's life cycle:

1) Formative evaluation research

Formative evaluation research is a crucial phase in the development process, focusing on improving and refining a product or program. 

It involves gathering feedback early in the development cycle, allowing product owners to make informed adjustments. 

This type of research seeks to identify strengths and weaknesses, providing insights to enhance the user experience. 

Through surveys, usability testing, and focus groups, formative evaluation guides iterative development, ensuring that the end product aligns with user expectations and needs.

2) Summative evaluation research

Summative evaluation research occurs after the completion of a product or program, aiming to assess its overall effectiveness. 

This type of research evaluates the final outcome against predefined criteria and objectives. 

Summative research is particularly relevant for product owners seeking to understand the overall impact and success of their offering. 

Through methods like surveys, analytics, and performance metrics, it provides a comprehensive overview of the product's performance, helping stakeholders make informed decisions about future developments or investments.

3) Outcome evaluation research

Outcome evaluation research delves into the long-term effects and impact of a product or program on its users. 

It goes beyond immediate outcomes, assessing whether the intended goals and objectives have been met over time. 

Product owners can utilize this research to understand the sustained benefits and challenges associated with their offerings. 

By employing methods such as longitudinal studies and trend analysis, outcome evaluation research helps in crafting strategies for continuous improvement and adaptation based on evolving user needs and market dynamics.

Now that we've identified the types, let's explore five key evaluative research methods commonly employed by product owners and user researchers.

5 Key evaluative research methods

Product owners and user researchers utilize a variety of methods to conduct evaluative research. Choosing the right method depends on the specific goals and context of the research:

1) Surveys

Surveys represent a versatile evaluative research method for product owners and user researchers seeking valuable insights into user experiences. These structured questionnaires gather quantitative data, offering a snapshot of user opinions and preferences.

Types of surveys:

Customer satisfaction (CSAT) survey: measures users' satisfaction with a product or service through a straightforward rating scale, typically ranging from 1 to 5.


Net promoter score (NPS) survey: evaluates the likelihood of users recommending a product or service on a scale from 0 to 10, categorizing respondents as promoters, passives, or detractors.


Customer effort score (CES) survey: focuses on the ease with which users can accomplish tasks or resolve issues, providing insights into the overall user experience.

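To make the arithmetic behind these metrics concrete, here is a minimal sketch in Python with hypothetical response data; the numbers and the top-two-box CSAT convention are assumptions, not tied to any particular survey tool.

```python
# Minimal sketch: computing CSAT and NPS from hypothetical survey responses.
csat_responses = [5, 4, 4, 3, 5, 2, 4]      # 1-5 satisfaction ratings
nps_responses = [10, 9, 8, 7, 6, 9, 3, 10]  # 0-10 likelihood-to-recommend ratings

# CSAT is commonly reported as the share of "satisfied" responses (4 or 5).
csat_score = 100 * sum(1 for r in csat_responses if r >= 4) / len(csat_responses)

# NPS = % promoters (9-10) minus % detractors (0-6); passives (7-8) are ignored.
promoters = sum(1 for r in nps_responses if r >= 9)
detractors = sum(1 for r in nps_responses if r <= 6)
nps_score = 100 * (promoters - detractors) / len(nps_responses)

print(f"CSAT: {csat_score:.0f}%  NPS: {nps_score:.0f}")
```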

When to use surveys:

  • Product launches: Gauge initial user reactions and identify areas for improvement.
  • Post-interaction: Capture real-time feedback immediately after a user engages with a feature or completes a task.

2) Closed card sorting

Closed card sorting is a powerful method for organizing and evaluating information architecture. Participants categorize predefined content into predetermined groups, shedding light on users' mental models and expectations.


What closed card sorting entails:

  • Predefined categories: users sort content into categories predetermined by the researcher, allowing for targeted analysis.
  • Quantitative insights: provides quantitative data on how often participants correctly place items in designated categories.
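As a small sketch of the quantitative output a closed card sort can yield, the following computes, for each card, how often participants placed it in the researcher's intended category; all card names, categories, and participant sorts here are hypothetical.

```python
# Sketch: per-card placement agreement from a hypothetical closed card sort.
# Each participant's sort maps card -> chosen category.
intended = {"Invoices": "Billing", "Password reset": "Account", "Refunds": "Billing"}

participant_sorts = [
    {"Invoices": "Billing", "Password reset": "Account", "Refunds": "Account"},
    {"Invoices": "Billing", "Password reset": "Account", "Refunds": "Billing"},
    {"Invoices": "Account", "Password reset": "Account", "Refunds": "Billing"},
]

for card, expected_category in intended.items():
    matches = sum(1 for sort in participant_sorts if sort[card] == expected_category)
    agreement = 100 * matches / len(participant_sorts)
    print(f"{card}: {agreement:.0f}% placed in '{expected_category}'")
```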

When to employ closed card sorting:

  • Information architecture overhaul: ideal for refining and optimizing the structure of a product's content.
  • Prototyping phase: use early in the design process to inform the creation of prototypes based on user expectations.

3) Tree testing

Tree testing is a method specifically focused on evaluating the navigational structure of a product. Participants are presented with a text-based representation of the product's hierarchy and are tasked with finding specific items, highlighting areas where the navigation may fall short.


What tree testing involves:

  • Text-based navigation: users explore the product hierarchy without the influence of visual design, focusing solely on the structure.
  • Task-based evaluation: research participants complete tasks that reveal the effectiveness of the navigational structure.

When to opt for tree testing:

  • Pre-launch assessment: evaluate the effectiveness of the proposed navigation structure before a product release.
  • Redesign initiatives: use when considering changes to the existing navigational hierarchy.

4) Usability testing

Usability testing is a cornerstone of evaluative research, providing direct insights into how users interact with a product. By observing users completing tasks, product owners and user researchers can identify pain points and areas for improvement.


What usability testing entails:

  • Task performance observation: Researchers observe users as they navigate through tasks, noting areas of ease and difficulty.
  • Think-aloud protocol: Participants vocalize their thoughts and feelings during the testing process, providing additional insights.

When to conduct usability testing:

  • Early design phases: Gather feedback on wireframes and prototypes to address fundamental usability concerns.
  • Post-launch iterations: Continuously improve the user experience based on real-world usage and feedback.

5) A/B testing

A/B testing, also known as split testing, is a method for comparing two versions of a webpage or product to determine which performs better. This method allows for data-driven decision-making by comparing user responses to different variations.


What A/B testing involves:

  • Variant comparison: Users are randomly assigned to either version A or version B, and their interactions are analyzed to identify the more effective option.
  • Quantitative metrics: Metrics such as click-through rates, conversion rates, and engagement help assess the success of each variant.
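As one way such metrics might be compared, here is a sketch under assumed conversion counts, using a standard two-proportion z-test rather than any specific experimentation platform's statistics engine.

```python
# Sketch: comparing conversion rates of two A/B variants with a two-proportion z-test.
# Counts are hypothetical.
from math import sqrt
from scipy.stats import norm

conversions_a, visitors_a = 120, 2400   # variant A
conversions_b, visitors_b = 150, 2380   # variant B

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)

se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test

print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}  p = {p_value:.3f}")
```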

When to implement A/B testing:

  • Feature optimization: Compare different versions of a specific feature to determine which resonates better with users.
  • Continuous improvement: Use A/B testing regularly to refine and enhance the product based on user preferences and behavior.

Now that we're familiar with the methods, let's see some practical evaluative research question examples to guide your research efforts.

Evaluative research question examples

The formulation of well-crafted research questions is fundamental to the success of evaluative research. Clear and targeted questions guide the research process, ensuring that valuable insights are gained to inform decision-making and improvements:

Usability evaluation questions:

Usability evaluation is a critical aspect of understanding how users interact with a product or system. It involves assessing the ease with which users can complete tasks and the overall user experience. Here are essential evaluative research questions for usability:

How was your experience completing this task? (Gain insights into the overall user experience and identify any pain points or positive aspects encountered during the task.)

What technical difficulties did you experience while completing the task? (Pinpoint specific technical challenges users faced, helping developers address potential issues affecting the usability of the product.)

How intuitive was the navigation? (Assess the user-friendliness of the navigation system, ensuring that users can easily understand and move through the product.)

How would you prefer to do this action instead? (Encourage users to provide alternative methods or suggestions, offering valuable input for enhancing user interactions and task completion.)

Were there any unnecessary features? (Identify features that users find superfluous or confusing, streamlining the product and improving overall usability.)

How easy was the task to complete? (Gauge the perceived difficulty of the task, helping to refine processes and ensure they align with user expectations.)

Were there any features missing? (Identify any gaps in the product’s features, helping the development team prioritize enhancements based on user needs and expectations.)

Product survey research questions:

Product surveys allow for a broader understanding of user satisfaction, preferences, and the likelihood of recommending a product. Here are evaluative research questions for product surveys:

Would you recommend the product to your colleagues/friends? (Measure user satisfaction and gauge the likelihood of users advocating for the product within their network.)

How disappointed would you be if you could no longer use the feature/product? (Assess the emotional impact of potential disruptions or discontinuation, providing insights into the product's perceived value.)

How satisfied are you with the product/feature? (Quantify user satisfaction levels to understand overall sentiment and identify areas for improvement.)

What is the one thing you wish the product/feature could do that it doesn’t already? (Solicit specific user suggestions for improvements, guiding the product development roadmap to align with user expectations.)

What would make you cancel your subscription? (Identify potential pain points or deal-breakers that might lead users to discontinue their subscription, allowing for proactive mitigation strategies.)

With these example questions in hand, let's look at a case study on evaluative research.

Case study on evaluative research: Spotify


The case study discusses the redesign of Spotify's Your Library feature, a significant change that included the introduction of podcasts in 2020 and audiobooks in 2022. The goal was to accommodate content growth while minimizing negative effects on user experience. The study, presented at the CHI conference in 2023, emphasizes three key factors for the successful launch:

Early involvement: Data science and user research were involved early in the product development process to understand user behaviors and mental models. An ethnographic study explored users' experiences and attitudes towards library organization, revealing the Library as a personal space. Personal prototypes were used to involve users in the evaluation of new solutions, ensuring alignment with their mental models.

Evaluating safely at scale: To address the challenge of disruptive changes, the team employed a two-step evaluation process. First, a beta test allowed a small group of users to try the new experience and provide feedback. This observational data helped identify pain points and guided iterative improvements. Subsequently, A/B testing at scale assessed the impact on key metrics, using non-inferiority testing to ensure the new design was not unacceptably worse than the old one.
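To illustrate the non-inferiority idea in general terms (a hedged sketch with made-up numbers, not Spotify's actual analysis), a new design can be accepted as "not unacceptably worse" when the lower confidence bound of the metric difference stays above a predefined margin.

```python
# Sketch of a non-inferiority check on a conversion-style metric (hypothetical data).
# The new design is "non-inferior" if the lower confidence bound of (new - old)
# stays above -margin, where margin is the largest acceptable drop.
from math import sqrt
from scipy.stats import norm

old_success, old_n = 5200, 10000
new_success, new_n = 5150, 10000
margin = 0.02            # accept at most a 2-percentage-point drop
alpha = 0.05             # one-sided 95% confidence

p_old = old_success / old_n
p_new = new_success / new_n
diff = p_new - p_old

se = sqrt(p_old * (1 - p_old) / old_n + p_new * (1 - p_new) / new_n)
lower_bound = diff - norm.ppf(1 - alpha) * se

print(f"diff = {diff:.3f}, one-sided lower bound = {lower_bound:.3f}")
print("non-inferior" if lower_bound > -margin else "inconclusive / inferior")
```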

Mixed method studies: The study employed a combination of qualitative and quantitative methods throughout the process. This mixed methods approach provided a comprehensive understanding of user behaviors, motivations, and needs. Qualitative research, including interviews, diaries, and observational studies, was conducted alongside quantitative data collection to gain deeper insights at all stages.

More details can be found here: Minimizing change aversion through mixed methods research: a case study of redesigning Spotify’s Your Library

Ingrid Pettersson, Carl Fredriksson, Raha Dadgar, John Richardson, Lisa Shields, Duncan McKenzie

Best tools for evaluative research

Utilizing the right tools is instrumental in the success of evaluative research endeavors. From usability testing platforms to survey tools, having a well-equipped toolkit enhances the efficiency and accuracy of data collection.

Product owners and user researchers can leverage these tools to streamline processes and derive actionable insights, ultimately driving continuous improvement:

1) Blitzllama


Blitzllama stands out as a powerhouse tool for evaluative research, aiding product owners and user researchers in comprehensive testing. Its user-friendly interface facilitates the quick creation of surveys and usability tests, streamlining data collection. With real-time analytics, it offers immediate insights into user behavior. The tool's flexibility accommodates both moderated and unmoderated studies, making it an invaluable asset for product teams seeking actionable feedback to enhance user experiences.

2) Maze

Maze emerges as a top-tier choice for evaluative research, delivering a seamless user testing experience. Product owners and user researchers benefit from its intuitive platform, allowing the creation of interactive prototypes for realistic assessments. Maze excels in remote usability testing, enabling diverse user groups to provide valuable feedback. Its robust analytics provide a deep dive into user journeys, highlighting pain points and areas of improvement. With features like A/B testing and metrics tracking, Maze empowers teams to make informed decisions and iterate rapidly based on user insights.

3) Survicate


Survicate proves to be an essential tool in the arsenal of product owners and user researchers for evaluative research. This versatile survey and feedback platform simplifies the process of gathering user opinions and preferences. Survicate's customization options cater to specific research goals, ensuring targeted and relevant data collection. Real-time reporting and analytics enable quick interpretation of results, facilitating swift decision-making. Whether measuring user satisfaction or testing new features, Survicate’s agility makes it a valuable asset for teams aiming to refine products based on user feedback.

In conclusion, evaluative research equips product owners and user researchers with indispensable tools to enhance product effectiveness. By employing various methods such as usability testing and surveys, they gain valuable insights into user experiences. 

This knowledge empowers swift and informed decision-making, fostering continuous product improvement. The types of evaluative research, including formative, summative, and outcome evaluations, cater to diverse needs, ensuring a comprehensive understanding of user interactions. Real-world examples underscore the practical applications of these methodologies. 

In essence, embracing evaluative research is a proactive strategy for refining products, elevating user satisfaction, and ultimately achieving success in the dynamic landscape of user-centric design.

FAQs related to evaluative research

1) What is evaluative research, and what are some examples?

Evaluative research assesses the effectiveness, efficiency, and impact of programs, policies, products, or interventions. For instance, a company may conduct evaluative research to determine how well a new website design functions for users or to gauge customer satisfaction with a revamped product. Other examples include measuring the success of educational programs or evaluating the effectiveness of healthcare interventions.

2) What are the goals of evaluative research?

The primary goals of evaluative research are to determine the strengths and weaknesses of a program, product, or policy and to provide actionable insights for improvement. Through evaluative research, product owners and UX researchers aim to understand how well their offerings meet user needs, identify areas for enhancement, and make informed decisions based on data-driven findings. Ultimately, the goal is to optimize outcomes and enhance user experiences.

3) What are the three types of evaluation research methods?

Evaluation research employs three main methods: formative evaluation, summative evaluation, and developmental evaluation. Formative evaluation focuses on assessing and improving a program or product during its development stages. Summative evaluation, on the other hand, evaluates the overall effectiveness and impact of a completed program or product. Developmental evaluation is particularly useful in complex or rapidly changing environments, emphasizing real-time feedback and adaptation to emergent circumstances.

4) What is the difference between evaluative and formative research?

Evaluative research and formative research serve distinct purposes in the product development and assessment process. Evaluative research examines the outcomes and impacts of a completed program, product, or policy to determine its effectiveness and inform decision-making for future iterations or improvements. In contrast, formative research focuses on gathering insights during the developmental stages to refine and enhance the program or product before its implementation. While evaluative research assesses the end results, formative research shapes the design and development process along the way.


Evaluation Research: Definition, Methods and Examples



What is evaluation research?

Evaluation research, also known as program evaluation, refers to a research purpose rather than a specific method. It is the systematic assessment of the worth or merit of the time, money, effort, and resources spent in order to achieve a goal.

Evaluation research is closely related to, but slightly different from, more conventional social research. It uses many of the same methods, but because it takes place within an organizational context, it requires team skills, interpersonal skills, management skills, and political astuteness that conventional social research demands far less of. Evaluation research also requires one to keep the interests of the stakeholders in mind.

Evaluation research is a type of applied research, and so it is intended to have some real-world effect. Many methods, such as surveys and experiments, can be used to conduct it. Evaluation research is a rigorous, systematic process that involves collecting data about organizations, processes, projects, services, and/or resources, analyzing that data, and reporting the results. It enhances knowledge and decision-making and leads to practical applications.


Why do evaluation research?

The common goal of most evaluations is to extract meaningful information from the audience and provide valuable insights to evaluators such as sponsors, donors, client groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as valuable if it helps in decision-making. However, evaluation research does not always create an impact that can be applied elsewhere; sometimes it fails to influence short-term decisions. It is equally true that an evaluation that initially seems to have no influence can have a delayed impact when the situation becomes more favorable. In spite of this, there is general agreement that the major goal of evaluation research should be to improve decision-making through the systematic utilization of measurable feedback.

Below are some of the benefits of evaluation research:

  • Gain insights about a project or program and its operations

Evaluation research lets you understand what works and what doesn't: where you were, where you are, and where you are headed. You can identify strengths and areas of improvement, which helps you figure out what to focus on and whether there are any threats to your business. You can also find out whether there are untapped segments of the market.

  • Improve practice

It is essential to gauge your past performance and understand what went wrong in order to deliver better services to your customers. Unless there is two-way communication, there is no way to improve what you have to offer. Evaluation research gives your employees and customers an opportunity to express how they feel and whether there is anything they would like to change. It also lets you modify or adopt a practice in ways that increase the chances of success.

  • Assess the effects

After evaluating your efforts, you can see how well you are meeting objectives and targets. Evaluations let you measure whether the intended benefits are really reaching the target audience and, if so, how effectively.

  • Build capacity

Evaluations help you analyze demand patterns and predict whether you will need more funds, whether skills need upgrading, and how the efficiency of operations can be improved. They let you find gaps in the production-to-delivery chain and possible ways to fill them.

Methods of evaluation research

All market research methods involve collecting and analyzing data, making decisions about the validity of the information, and deriving relevant inferences from it. Evaluation research comprises planning, conducting, and analyzing the results, which includes the use of data collection techniques and the application of statistical methods.

Some popular evaluation methods are input measurement, output or performance measurement, impact or outcomes assessment, quality assessment, process evaluation, benchmarking, standards, cost analysis, organizational effectiveness, program evaluation methods, and LIS-centered methods. There are also a few types of evaluations that do not always result in a meaningful assessment, such as descriptive studies, formative evaluations, and implementation analysis. Evaluation research is concerned chiefly with the information-processing and feedback functions of evaluation.

These methods can be broadly classified as quantitative and qualitative methods.

Quantitative research methods are used to measure anything tangible and answer questions such as:

  • Who was involved?
  • What were the outcomes?
  • What was the price?

The best way to collect quantitative data is through surveys, questionnaires, and polls. You can also create pre-tests and post-tests, review existing documents and databases, or gather clinical data.

Surveys are used to gather the opinions, feedback, or ideas of your employees or customers and consist of various question types. They can be conducted face-to-face, by telephone, by mail, or online. Online surveys do not require human intervention and are far more efficient and practical. You can see the survey results on the research tool's dashboard and dig deeper using filter criteria based on factors such as age, gender, and location. You can also apply survey logic such as branching, quotas, chained surveys, and looping to the questions, reducing the time needed both to create and to respond to the survey, and generate reports that involve statistical formulae and present data that can be readily absorbed in meetings.


Quantitative data measure the depth and breadth of an initiative, for instance, the number of people who participated in a non-profit event or the number of people who enrolled in a new course at a university. Quantitative data collected before and after a program can show its results and impact.
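As a simple, hypothetical illustration of analysing before-and-after program data (one possible approach, not a prescribed one), a paired comparison of pre- and post-program scores might look like this:

```python
# Sketch: paired pre/post comparison of hypothetical program scores.
from scipy.stats import ttest_rel

pre_scores = [52, 61, 47, 58, 66, 50, 55, 63]    # before the program
post_scores = [60, 67, 49, 64, 71, 58, 57, 70]   # same participants, after

t_stat, p_value = ttest_rel(post_scores, pre_scores)
mean_change = sum(b - a for a, b in zip(pre_scores, post_scores)) / len(pre_scores)

print(f"Mean change: {mean_change:.1f} points, t = {t_stat:.2f}, p = {p_value:.3f}")
```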

The accuracy of quantitative data used for evaluation research depends on how well the sample represents the population, the ease of analysis, and the consistency of the data. Quantitative methods can fail if the questions are not framed correctly or the survey is not distributed to the right audience. Also, quantitative data do not provide an understanding of context and may not be apt for complex issues.


Qualitative research methods are used where quantitative methods cannot solve the research problem, i.e., to measure intangible values. They answer questions such as:

  • What is the value added?
  • How satisfied are you with our service?
  • How likely are you to recommend us to your friends?
  • What will improve your experience?


Qualitative data are collected through observation, interviews, case studies, and focus groups. The steps for creating a qualitative study involve examining, comparing, contrasting, and understanding patterns. Analysts draw conclusions by identifying themes, clustering similar data, and finally reducing them to points that make sense.

Observations may help explain behaviors as well as social context that quantitative methods generally do not uncover. Observations of behavior and body language can be made by watching a participant or by recording audio or video. Structured interviews can be conducted with people alone or in a group under controlled conditions, or participants may be asked open-ended qualitative research questions. Qualitative research methods are also used to understand a person's perceptions and motivations.


The strength of focus groups is that group discussion can generate ideas and stimulate memories, with topics cascading as the discussion unfolds. The accuracy of qualitative data depends on how well contextual data explain complex issues and complement quantitative data. Qualitative data help answer "why" and "how" after "what" has been answered. Their limitations for evaluation research are that they are subjective, time-consuming, costly, and difficult to analyze and interpret.


Survey software can be used for both quantitative and qualitative evaluation research methods. You can use the sample questions below and send a survey in minutes using research software. Using a research tool simplifies the process, from creating a survey and importing contacts to distributing the survey and generating reports that aid the research.

Examples of evaluation research

Evaluation research questions lay the foundation of a successful evaluation. They define the topics that will be evaluated. Keeping evaluation questions ready not only saves time and money, but also makes it easier to decide what data to collect, how to analyze it, and how to report it.

Evaluation research questions must be developed and agreed on in the planning stage; however, ready-made research templates can also be used.

Process evaluation research question examples:

  • How often do you use our product in a day?
  • Were approvals taken from all stakeholders?
  • Can you report the issue from the system?
  • Can you submit the feedback from the system?
  • Was each task done as per the standard operating procedure?
  • What were the barriers to the implementation of each task?
  • Were any improvement areas discovered?

Outcome evaluation research question examples:

  • How satisfied are you with our product?
  • Did the program produce intended outcomes?
  • What were the unintended outcomes?
  • Has the program increased the knowledge of participants?
  • Were the participants of the program employable before the course started?
  • Do participants of the program have the skills to find a job after the course ended?
  • Is the knowledge of participants better compared to those who did not participate in the program?


Evaluation design

An evaluation design describes how data will be collected and analysed to answer the Key Evaluation Questions. 

There are different pathways for you as manager depending on who will develop the evaluation design. In most cases your evaluator will develop the evaluation design.  In some cases you will – if you have evaluation expertise and/or the evaluation design has already been developed (for example, in an evaluation that is intended to match an earlier evaluation).

Take into account the following important factors when developing an evaluation design:

  • Available resources and constraints: resources include money, existing data, expertise, and technical equipment; constraints include requirements to use certain common indicators, limited availability of key informants, or barriers to accessing existing data.

[Diagram: the important factors flowing into appropriate evaluation methods and designs]

If an EVALUATOR will develop the evaluation design

  • Engage a competent evaluation expert - internal, external or a combination. (See ' Select an evaluator / evaluation team ' for advice).
  • Work with the expert(s) to ensure they understand important factors that should be taken into account in the evaluation design (see section above)
  • The design should provide details of how data will be collected and analysed. It is often useful to do this in the form of an Evaluation Matrix, which shows how each Key Evaluation Question will be answered.

If YOU (as manager) will develop the evaluation design

  • Understand important factors that should be taken into account in the evaluation design (see section above)
  • Develop an evaluation design that addresses these important factors.
  • Summarise the design in the form of an Evaluation Matrix which shows how each Key Evaluation Question will be answered.
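As a minimal illustration (the content here is hypothetical, not drawn from the guide), one row of an Evaluation Matrix might look like this:

  • Key Evaluation Question: How well did the program reach its intended participants?
  • Data sources: enrolment records; participant intake survey
  • Collection methods: document review; online survey
  • Analysis: descriptive statistics compared against reach targets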

Subsequently, arrange for a technical review of the evaluation design and a review of the design by the evaluation management structure (e.g., steering committee). Ideally this will include representation from primary intended users.

Arranging technical review of the evaluation design

Before finalizing the design, it can be helpful to have a technical review of it by one or more independent evaluators.  It might be necessary to involve more than one reviewer in order to provide expert advice on the specific methods proposed, including specific indicators and measures to be used.  Ensure that the reviewer is experienced in using a range of methods and designs, and well briefed on the context, to ensure they can provide situation specific advice.

Arranging review of the design by the evaluation management structure

In addition to being considered technically sound by experts, it is essential for the evaluation design to be seen as credible by those who are expected to use it.

Get formal organisational review and endorsement of the design by an evaluation steering committee (see ' Identify who will be involved in decisions and what their roles will be ' for possible structures, processes and terms of reference for a steering committee)

Where possible, do a data rehearsal of possible findings with primary intended users. This is a powerful strategy for checking the appropriateness of the design by presenting mock-ups of tables, graphs and quotes that the design might produce. It is best to produce at least 2 different versions – one that would show the program working well and one that would show it not working. Ideally the primary intended users of the evaluation will review these and either confirm the suitability of the design or request amendments to make the potential findings more relevant and credible. (For more information see Patton, MQ (2011) Essentials of Utilization-Focused Evaluation, pp. 309-321.)



Step 3 of EBP: Part 1—Evaluating Research Designs

James W. Drisko

School for Social Work, Smith College, Northampton, MA, USA

Melissa D. Grady

School of Social Service, Catholic University of America, Washington, DC, USA

Step 3 of the EBP process involves evaluating the quality and client relevance of the research results you have located to inform treatment planning. While some useful clinical resources include careful appraisals of research quality, clinicians must critically evaluate both the content included in these summaries and what is excluded or omitted from them. For individual research studies, clinicians must first identify and evaluate the research designs and methods reported. The terminology used to describe research designs in EBM/EBP may not always be consistent with that used in most social work research courses. This chapter provides a review of the key research designs used in EBM and EBP in order to orient clinicians to core terminology found in EBP summaries and reports.

Once you have located some research that can help answer your practice question, Step 3 in the evidence-based medicine (EBM) and evidence-based practice (EBP) decision-making model is to appraise the quality of this research. An initial inspection of materials should help differentiate those that are generally relevant for your purposes from those that are not. Relevance may be initially determined by examining the research question that each study addresses. Studies should have clear and relevant research questions, fitting your practice needs. Once these “apparently relevant” studies are identified, the appraisal shifts to issues of research methodology. Even studies that appear quite relevant initially may later on prove to have important limitations as the details of their methods are explored.

Evaluating the quality of research reports can be a complex process. It involves several components. We will begin by reviewing research designs used in EBP. While many of these designs should be familiar to social workers, they may be described using different terminologies in EBM and EBP research reports (Drisko, 2011 ). Chapter 10.1007/978-3-030-15224-6_7 will review several other methodological steps in appraising research (sampling, defining the treatment or other intervention, test and measures, and statistics). These provide the basis for examining meta-analysis and systematic reviews, two widely used methods for aggregating research results in EBM and EBP, examined in Chap. 10.1007/978-3-030-15224-6_8.

Research design is the first methodological issue a clinical social worker must identify in appraising the quality of a research study. A research design is the orienting plan that shapes and organizes a research project. Different research designs are used for research projects with distinct goals and purposes. Sometimes this is a researcher-determined choice, and other times practical and ethical issues force the use of specific research designs. In EBM/EBP, research designs are one key part of appraising study quality.

While all clinical social workers are introduced to research methods as part of their required course work, most do not make much use of this knowledge after graduation. Doing EBP, however, will require that clinical social workers and other mental health professionals make greater use of their knowledge about evaluating research for practice.

Research designs are so important to EBM/EBP that this chapter will focus on them exclusively. Other very important—and very closely related—aspects of research methods will be examined in the following chapter (sampling, measures, definitions of treatments, and analysis). Our goal is to provide a useful refresher and reference for clinical social workers. For readers who have a basic grasp of research designs and methods, this chapter can serve as a brief review and resource. Still, some terminology, drawn from medicine, will no doubt be unfamiliar. For others who need only an update, this chapter offers it. Many excellent follow-up resources are identified in each section of the chapter.

Research Designs

This review of research designs has three main purposes. First, it will introduce the variety of terminology used in EBP research, which is often drawn from medical research. This terminology sometimes differs from the terminology used in most social work research texts that draw on social sciences research terminology. Second, the strengths and limitations of each research design are examined and compared. Third, the research designs are rank ordered from “strongest” to “weakest” following the EBM/EBP research hierarchy. This allows readers to quickly understand why some research designs are favored in the EBM/EBP literature.

Thyer ( 2011 ) states, quite accurately, that the EBP practice decision-making process does not include any hierarchy of research designs. This is indeed correct. The EBP practice decision-making process states that clinicians should use the “best available evidence.” It does not state that only the results of research with certain types of research designs are to be valued. That is, it is entirely appropriate to use the results of case study research or even “practice wisdom” when no better evidence is available. Yet many organizations and institutions make quite explicit that there is a de facto hierarchy of evidence within EBP. This hierarchy is even clearly stated in the early writing of Dr. Archie Cochrane ( 1972 ), who promoted the use of experimental research knowledge to inform contemporary practice decision-making. Littell ( 2011 ) notes that the Cochrane Collaboration publishes “empty reviews” that report no research results deemed to be of sufficient design quality to guide practice decision-making. This practice contradicts the idea of identifying the best available evidence. In effect, the best available evidence is reduced to evidence generated by experimental research designs. This practice creates confusion about what constitutes the best available evidence for clinicians, policy planners, and researchers.

Some EBP/EBM authors do not report all the best available evidence, but instead report only the experimental evidence that they deem worthy of guiding practice. They make this choice because only well-designed experiments allow attribution of causal relationships to say that an intervention caused observed changes with minimal error. Still, this practice represents some academic and economic politics within EBP research summaries. As discussed in Chapter 2 (10.1007/978-3-030-15224-6_2), there are good arguments for and against this position, but it is not entirely consistent with the stated EBM/EBP practice decision-making model. Clinical social workers should be aware that this difference in viewpoints about the importance of research design quality is not always clearly stated in the EBP literature. Critical, and well-informed, thinking by the clinician is always necessary.

Research designs differ markedly. They have different purposes, strengths, and limitations. Some seek to explore and clarify new disorders or concerns and to illustrate innovative practices. Others seek to describe the characteristics of client populations. Some track changes in clients over time. Still others seek to determine if a specific intervention caused a specific change. While we agree that the EBP practice decision-making process states that clinicians should use "the best available evidence" and not solely evidence derived from experimental results, we will present research designs in a widely used hierarchy drawn from the Oxford University Centre for Evidence-Based Medicine (2009, 2016). This hierarchy does very clearly give greater weight to experimental, randomized controlled trial (RCT) research results. It should be seen as representing a specific point of view, applied for specific purposes. At the same time, such research designs do provide a strong basis for arguing that a treatment caused any changes found, so long as the measures are appropriate, valid, and reliable and the sample tested is of adequate size and variety. Due to the strong internal validity offered by experimental research designs, results based on RCT designs are often privileged in EBM/EBP reports. We will begin this listing with the experimental research designs that allow causal attribution. We will then progress from experiments to quasi-experiments, then move to observational or descriptive research, and end with case studies. The organization of this section follows the format of the research evidence hierarchy created by Oxford University's Centre for Evidence-Based Medicine (2009, 2016).

Types of Clinical Studies

Part 1: Experimental Studies or RCTs

EBP researchers view properly conceptualized and executed experimental studies, also called randomized controlled trials or RCTs, as the strongest form of evidence. RCTs provide internally valid empirical evidence of treatment effectiveness. They are prospective in nature, as they start at the beginning of treatment and follow changes over time (Anastas, 1999). Random assignment of participants symmetrically distributes potential confounding variables and sources of error to each group. Probability samples further provide a suitable foundation for most statistical analytic procedures.

The key benefit of an experimental research design is that it minimizes threats to internal validity (Campbell & Stanley, 1963). This means the conclusions of well-done experiments allow researchers to say that an intervention caused the observed changes. This is why experiments are highly regarded in the EBM/EBP model. The main limitations of experiments are their high cost in money, participation, effort, and time. They may be ethically inappropriate for some studies where random assignment is inappropriate. A final disadvantage is that volunteers willing to participate may not reflect clinical populations well. This may lead to limited external validity, or how well results from controlled experiments can be generalized to less controlled practice settings (Oxford Centre for Evidence-Based Medicine, 2019).

In the European medical literature, experiments and quasi-experiments may alternately be called analytic studies . This is to distinguish them from descriptive studies that, as the name implies, simply describe clinical populations. Analytic studies are those that quantify the relationship between identified variables. Such analytic studies fit well with the PICO or PICOT treatment decision-making model (Oxford Centre for Evidence-Based Medicine, 2019 ).

The Randomized Controlled Trial (RCT) or Classic Experiment

It is a quantitative, prospective, group-based study based on primary data from the clinical environment (Solomon, Cavanaugh, & Draine, 2009 ). Researchers randomly assign individuals who have the same disorder or problem at the start to one of two (or more) groups. Later, the outcomes for each group are compared at the completion of treatment. Since researchers create the two groups by random assignment to generate two very similar groups, the RCT is sometimes called a parallel group design . Usually one group is treated and the other is used as an untreated control group. Researchers sometimes use placebo interventions with the control group. However, researchers may alternately design experiments comparing two or more different treatments where one has been previously demonstrated to produce significantly better results than does an untreated control group. Pre- to post-comparisons demonstrate the changes for each group. Comparison of post-scores across the treated groups allows for demonstration of any greater improvement due to the treatment. Follow-up comparisons may also be undertaken, but this is not a requirement of an experiment.

The experiment or RCT can be summarized graphically as:
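The classic two-group notation is reconstructed below from the symbol definitions that follow; the first row represents the treated group and the second the untreated control group.

R   O1   X   O2
R   O1        O2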

where R stands for random assignment of participants, O1 stands for the pretest assessment (most often with a standardized measure), X represents the intervention given to just one group, and O2 stands for the posttest, done after treatment but using the same measure. There may also be additional follow-up posttests to document how results vary over time; these would be represented as O3, O4, etc. There may be two or more groups under comparison in an RCT. Further, more than one measure of outcome may be used in the same experiment.

In medical studies, particularly of medications or devices, it is possible to blind participants, clinicians, and even researchers to experimental group assignments. The goal is to reduce differences in expectancies that might lead to different outcomes. In effect, either conscious or unconscious bias is limited, strengthening the internal validity of the study results. A double-blind RCT design keeps group assignments unknown to both participants and the treating clinicians. Single-blind experiments keep only the participants unaware of group assignments. Blinding is more possible where placebo pills or devices can be used to hide the nature of the intervention. Blinding is much more difficult in mental health and social service research, where interactions between clients and providers over time are common.

While blinding is common in EBM studies of medications and devices, it is rare in mental health research. There is, however, research that shows that clinical practitioners and researchers may act consciously or unconsciously to favor treatment theories and models that they support (Dana & Loewenstein, 2003 ). This phenomenon is known as attribution bias , in which people invested in a particular theory or treatment model view it more positively than do others. Attribution bias may work consciously or unconsciously to influence study implementation and results. In turn, it is stronger research evidence if clinicians and researchers who do outcome studies are not the originators or promoters of the treatment under study.

The American Psychological Association standards for empirically supported treatments (ESTs) require that persons other than the originators of a treatment do some of the outcome studies used to designate an EST. That is, at least one study not done by the originator of a treatment is required for the EST label. How clinician and researcher biases are assessed in the EBM/EBP model is less clear. However, most Cochrane and Campbell Collaboration systematic reviews do assess and evaluate the potential for bias when the originators of treatments are the only sources of outcome research on their treatments (Higgins & Green, 2018 ; Littell, Corcoran, & Pillai, 2008 ). In addition, all Cochrane and Campbell Collaboration systematic reviews must include a statement of potential conflicts of interest by each of the authors.

It is important to keep in mind that experiments may have serious limitations despite their use of a "strong" research design. Sample size is one such issue. Many clinical studies compare small groups (roughly under 20 people per group). Studies using small samples may lack the statistical power to identify any differences across the groups correctly and fully. That is, for group differences to be identified, a specific sample size is required. The use of an experimental research design alone does not mean that the results will always be valid and meaningful. (We will examine issues beyond research design that impact research quality in the next two chapters.) Still, done carefully, the experimental research design or RCT has many merits in allowing cause-effect attribution.
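To illustrate the sample-size point (a sketch with assumed inputs, not part of the original chapter), a standard power calculation shows roughly how many participants per group a two-arm trial needs to detect a medium-sized effect:

```python
# Sketch: required per-group sample size for a two-arm trial (hypothetical inputs).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # medium effect (Cohen's d)
                                   alpha=0.05,       # two-sided significance level
                                   power=0.80)       # desired power
print(f"About {n_per_group:.0f} participants per group are needed.")
```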

The CONSORT Statement (2010) established standards for the reporting of RCTs. CONSORT is an acronym for "CONsolidated Standards of Reporting Trials." The people who make up the CONSORT group are an international organization of physicians, researchers, methodologists, and publishers. To aid in the reporting of RCTs, CONSORT provides a free 37-item checklist for reporting or assessing the quality of RCTs online at http://www.consort-statement.org/. The CONSORT Statement is available in many different languages. The CONSORT group also provides a free template for a flow chart of the RCT process and statement. These tools can be very helpful to the consumer of experimental research since they serve as guides for assessing the quality of RCTs. A CONSORT flow chart (also called a Quorum chart) is often found in published reports of recent RCTs.

The Randomized Crossover Clinical Trial

It is a prospective, group-based, quantitative, experimental study based on primary data from the clinical environment. Individuals with the same disorder, most often of a chronic or long-term type, are randomly assigned to one of two groups, and treatment is begun for both groups. After a designated period of treatment (sufficient to show positive results), groups are assessed and a “washout” phase is begun in which all treatments are withheld. After the washout period is completed, the treatments for the groups are then switched so that each group receives both treatments. After the second treatment is completed, a second assessment is undertaken. Comparison of outcomes for each treatment at both end points allows for determination of treatment effectiveness on the same groups of patients/clients for both treatments. This strengthens the internal validity of the study. A comparison of active treatment outcomes for all patients is possible. However, if the washout period is not sufficient, there may be carry-over effects from the initial treatment that in turn undermines the validity of the second comparison. Used with medications, there are often lab tests that allow determination of effective washout periods. Secondary effects, such as learning or behavior changes that occur during the initial treatment, may not wash out. Similarly, it may not be possible to wash out learned or internalized cognitions, skills, attitudes, or behaviors. This is a limitation of crossover research designs in mental health and social services.

The merit of crossover designs is that each participant serves as his or her own control, which reduces variance due to individual differences among participants. This may also allow smaller sample sizes while retaining enough statistical power to demonstrate differences. All participants receive both treatments, which benefits them. Random assignment provides a solid foundation for statistical tests. Disadvantages of crossover studies include that all participants receive a placebo or less effective treatment at some point, which may not benefit them immediately. Further, washout periods can be lengthy and curtail active treatment for their duration. Finally, crossover designs cannot be used where the effects of treatment are permanent, such as in educational programs or surgeries.

Crossover trials may also be undertaken with single cases (rather than groups of participants). These are called single-case crossover trials. The basic plan of the single-case crossover trial mimics that used for groups but is used with just a single case. The crossover trial may be represented graphically as:
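A1   B1   A2   B2   A3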

where A1 stands for the initial assessment, B1 represents the first intervention given, A2 represents the next assessment, made at the end of the first intervention (after washout), and B2 stands for the second type of intervention, or the crossover. Finally, A3 represents the assessment of the second intervention, done when it is completed. Note that a washout period is not specifically included in this design but may be added if the researchers choose to do so. Comparison of treatment outcomes for each intervention with the initial baseline assessment allows determination of the intervention effects. More than one measure may be used in the same crossover study.

Since random assignment is not possible with single cases, the results of single-case crossover studies are often viewed as “weaker” than are group study results. However, each individual, each case, serves as its own control. Since the same person is studied, there is usually little reason to assume confounding variables arise due to physiologic changes, personal history, or social circumstances.

It is possible to aggregate the results of single-case designs. This is done by closely matching participants and replicating the single-case study over a number of different participants and settings. This model is known as replication logic , in which similar outcomes over many cases build confidence in the results (Anastas, 1999 ). It is in contrast to sampling logic used in group experimental designs in which potentially confounding variables are assumed to be equally distributed across the study groups through random assignment of participants. In replication logic, repetition over many cases is assumed to include and address potentially confounding variables. If treatment outcomes are positive over many cases, treatment effectiveness may be inferred. In EBM, single-case studies are not designated as providing strong research evidence, but consistent findings from more than ten single-case study outcomes are rated as strong evidence in the American Psychological Association’s designation of empirically supported treatments (ESTs).

The Randomized Controlled Laboratory Study

It is a prospective, group, quantitative, experimental study based on laboratory rather than direct clinical data. These are called analog studies since the lab situation is a good, but not necessarily perfect, replication of the clinical situation. Laboratory studies are widely used in “basic” research since all other variables or influences except the one under study can be controlled or identified. This allows testing of single variables but is unlike the inherent variation found in real-world clinical settings. Randomized controlled laboratory studies are often conducted on animals, where genetics can be controlled or held constant. Ethical issues, of course, limit laboratory tests on humans. Applying the results of laboratory studies in clinical practice has some limitations, as single, “pure” forms of disorders or problems are infrequent and contextual factors can affect treatment delivery and outcome.

Effectiveness vs. Efficacy Studies: Experiments Done in Different Settings

In mental health research, a distinction is drawn between clinical research done in real-world clinical settings and that done much more selectively for research purposes. Experimental studies done in everyday clinical practice settings are called effectiveness studies. Such studies have some potentially serious limitations in that they often include comorbid disorders and may not be able to ensure that treatments are provided fully and consistently. This reduces their internal validity for research purposes. On the other hand, using real-world settings enhances their external validity, meaning that the results are more likely to fit with actual practice with everyday clients and settings. In contrast, more carefully controlled studies that ensure experimental study of just a single disorder are known as efficacy studies. Efficacy studies carefully document that a fully applied treatment for a single, carefully screened disorder is effective (or is not).

One well-known example of a clinical efficacy study is the NIMH Treatment of Depression Collaborative Research Program (Elkin, Shea, Watkins, et al., 1989). This study rigorously compared medication and two forms of psychotherapy for depression. Strict exclusion criteria limited participation to people with depression and no other comorbid disorders. Medication “washouts” were required of all participants. Such efficacy studies emphasize internal validity; they focus on showing that the treatment alone caused any change. The limitation of applying efficacy study results is that real-world practice settings may not be able to take the time and effort needed to identify only clients with a single disorder. Such efforts might make treatment unavailable to people with comorbid disorders, which may not be practical or ethical in many clinical settings. Further, the careful monitoring of treatment fidelity required in efficacy studies may not be possible in many clinical settings (often for reasons of funding and time).

Efficacy studies are somewhat like laboratory research, though the similarity is not exact since they are done in clinical settings, albeit with added controls. Efficacy studies add an extra measure of rigor to clinical research. They do show with great precision that a treatment works for a specific disorder. However, results of efficacy studies may be very difficult to apply fully in everyday clinical practice, given the ethical, funding, and practical constraints of such settings.

Part 2: Quasi-experimental and Cohort Studies—Comparisons Without Random Participant Assignment

Random assignment of participants to treated versus control groups is a way to strengthen internal validity and to limit bias in research results. Random assignment ideally generates (two or more) equivalent groups for the comparison of treatment effects versus an untreated control group. Quasi-experimental research designs lack random assignment but do seek to limit other threats to the internal validity of study results. They are often used where random assignment is unethical or is not feasible for practical reasons.

The Quasi-experimental Study or Cohort Study

In studies of clinical practice in mental health, it is sometimes unethical or impractical to randomly assign participants to treated or control groups. For example, policy-makers may only fund a new type of therapy or a new prevention program for a single community or with payment by only certain types of insurance. In such situations, researchers use existing groups or available groups to examine the impact of interventions. The groups, settings, or communities to be compared are chosen to be as similar as possible in their key characteristics. The goal is to approximate the equivalent groups created by random assignment. Where pre- and post-comparisons are done on such similar groups, such a research design is called a quasi-experiment. The key difference from a true experiment is the lack of random assignment of participants to the treated or control groups.

The quasi-experiment can be summarized graphically as:
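Treated group:      O1   X   O2
Comparison group:   O1        O2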

Once again, O1 stands for the pretest assessment (most often with a standardized measure), X represents the intervention given to just one group, and O2 stands for the posttest, done after treatment, but using the same measure. More than two groups may be included in a quasi-experimental study. There may also be additional follow-up posttests to document how results vary over time. More than one measure may be used in the same quasi-experiment. Note carefully that the key difference from a true experiment is the lack of random assignment of participants.

The lack of random assignment in a quasi-experiment introduces some threats to the internal validity of the study. That is, it may introduce unknown differences across the groups that ultimately affect study outcomes. The purpose of random assignment is to distribute unknown variables or influences to each group as equally as possible. Without random assignment, the studied groups may have important differences that are not equally distributed across the groups. Say, for example, that positive social supports interact with a treatment to enhance its outcome. Without random assignment, the treated group might be biased in that it includes more people with strong social supports than does the control group. The interaction of the treatment with the impact of social supports might make the results appear better than they would have been had random assignment been used. Thus, in some EBM/EBP hierarchies of research evidence, quasi-experimental study results are rated as “weaker” than are results of true experiments or RCTs. That said, they are still useful sources of knowledge and are often the best available research evidence for some treatments and service programs. To reduce potential assignment bias, quasi-experimental studies use “matching,” in which as many characteristics of participants in each group are matched as closely as possible. Of course, matching is only possible where the relevant variables are fully known at the start of the study, as the sketch below illustrates.
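To make the idea of matching concrete, here is a minimal, hypothetical sketch (invented variable names and data, not from the chapter) that pairs each treated participant with a comparison participant of the same sex and the closest age, without replacement; note that it can only use variables that are known and recorded.

```python
# Minimal illustration of matching on known characteristics (hypothetical data).
import pandas as pd

treated = pd.DataFrame({"id": [1, 2, 3], "sex": ["F", "F", "M"], "age": [34, 51, 42]})
comparison_pool = pd.DataFrame({"id": [10, 11, 12, 13],
                                "sex": ["F", "M", "F", "M"],
                                "age": [36, 40, 50, 47]})

matches = []
available = comparison_pool.copy()
for _, person in treated.iterrows():
    # Candidates share the same sex; pick the one closest in age, without replacement.
    candidates = available[available["sex"] == person["sex"]]
    if candidates.empty:
        continue  # no match available on the known variables
    best = (candidates["age"] - person["age"]).abs().idxmin()
    matches.append({"treated_id": person["id"], "control_id": available.loc[best, "id"]})
    available = available.drop(best)

print(pd.DataFrame(matches))
```

Any variable not included in the matching (for example, social support) remains a potential source of bias, which is exactly the limitation described above.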

Advantages of quasi-experimental or cohort studies include their ethical appropriateness, in that participants are not assigned to groups and can make their own treatment choices on an informed basis. Cohort studies are usually less expensive than true experiments, though both may be financially costly. Disadvantages of cohort studies are that potentially confounding variables may be operative but unknown. Further, comparison groups can be difficult to identify. For rare disorders, large samples are required, which can be difficult to obtain, and such studies may take a long time to complete.

The “All or None” Study

The Centre for Evidence-Based Medicine at Oxford University (2009, B13) includes in its rating of evidence the “all or none” research design. This is a research design in which, in very difficult circumstances, clinicians give an intervention to a group of people at high risk of serious harm or death. If essentially all the people who received the intervention improve or survive, while those who do not receive it continue to suffer or die, the inference is that the intervention caused the improvement. This is actually an observational research design, but given the nature of the groups compared, all or none results are viewed as strong evidence that the treatment caused the change. Given their very important effects, such research results are highly valued so long as all, or a very large fraction, of the people who receive the intervention improve. Such designs fit crisis medical issues much better than most mental health issues, so the all or none design is extremely rare in the mental health literature. It nonetheless has a valuable role in informing practice in some situations.

Part 3: Non-interventive Research Designs and Their Purposes

Not all practice research is intended to show that an intervention causes a change. While EBM/EBP hierarchies of research evidence rank most highly those research designs that do show an intervention caused a change, even these studies stand on a foundation built from the results of other types of research. In the EBM/EBP hierarchy, clinicians are reminded that exploratory and descriptive research may not be the best evidence on which to make practice decisions. At the same time, exploratory and descriptive research designs are essential in setting the stage for rigorous and relevant experimental research. These types of studies may also be the “best available evidence” for EBP if experiments are lacking or are of poor quality. Critical thinking is crucial to determining just what constitutes “the best available evidence” in any clinical situation.

The Observational Study

It is a prospective, longitudinal, usually quantitative, tracking study of groups or of individuals with a single disorder or problem (Kazdin, 2010 ). Researchers follow participants over time to assess the course (progression) of symptoms. Participants may be either untreated or treated with a specified treatment. People are not randomly assigned to treated or control groups. Because participants may differ on unknown or unidentified variables, observational studies have potential for bias due to the impact of these other variables. That is, certain variables such as genetic influences or nutrition or positive social support may lead to different outcomes for participants receiving the same treatment (or even no treatment). Some scholars view observational studies as a form of descriptive clinical research that is very helpful in preparing the way for more rigorous experimental studies.

The Longitudinal Study

It is a prospective, quantitative and/or qualitative, observational study ideally based on primary data, tracking a group whose members have had, or will have, exposure to or involvement with specific variables. For example, researchers might track the development of behavioral problems among people following a specific natural disaster or the development of children living in communities with high levels of street violence. In medicine, researchers might track people exposed to the SARS virus. Longitudinal studies help identify the probability of occurrence of a given condition or need within a population over a set time period. While such variables are often stressors, longitudinal cohort studies may also be used to track responses to positive events, such as inoculation programs or depression screening programs.

Graphically a longitudinal study can be represented as:
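X   O1   O2   O3 ...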

Here X stands for exposure to a risk factor and O stands for each assessment. The exposure or event X may either mark the start of the study or may occur while assessments are ongoing. Participants are not randomly assigned, which may introduce biases. Note, too, that there is no control or comparison group, though studies of other people without the target exposure can serve as rough comparison groups.

In contrast to experimental studies with random assignment, participants in longitudinal studies may be selected with unknown strengths or challenges that, over time, affect the study results. Thus, confounding variables can influence longitudinal study results. Over time, loss of participants may also bias study results. For instance, if the more stressed participants drop out of a study, their loss may make the study results appear more positive than they would be if all participants continued to the study’s conclusion. Because longitudinal studies are prospective in design, rather than retrospective, they are often viewed as stronger than case-control studies. Longitudinal studies do not demonstrate cause and effect relationships but can provide strong correlational evidence.

Case-Control Study

It is a retrospective, usually quantitative, observational study often based on secondary data (data already collected, often for different initial purposes). Looking back in time, case-control studies compare the proportion of cases with a potential risk or resiliency factor against the proportion of controls that do not have the same factor. For example, people who have very poor treatment outcomes for their anxiety disorder may be compared with a closely matched group of people who had very positive outcomes. A careful look at their demographic characteristics, medical histories, and mental health histories might identify risk factors that distinguish most people in the two groups. Even rare risk or resiliency factors can often be identified by such studies. Case-control studies are relatively inexpensive but are subject to multiple sources of bias if used to attribute “cause” to the risk or resiliency factors they identify.

Cross-Sectional Study or Incidence Study

These are descriptive, usually quantitative, studies of the relationship between disorders or problems and other factors at a single point in time. Incidence designs are used descriptively in epidemiology. They can be useful for learning baseline information on the incidence of disorders in specific areas. Cross-sectional studies are very valuable in a descriptive manner for policy planning, but they do not demonstrate cause and effect relationships. They are not highly valued in the EBM/EBP research design hierarchy. An example of a cross-sectional study would be to look at the rate of poverty in a community during one month of the year. It is simply a snapshot of how many individuals would be classified as living in poverty during the month of the study. Comparing the number of persons in poverty with the total population of the community gives the prevalence rate (often loosely called an incidence rate) for poverty, as in the sketch below.
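As a small worked illustration with invented numbers, the calculation is a simple proportion:

```python
# Hypothetical snapshot: a cross-sectional count expressed as a rate (invented numbers).
people_in_poverty = 1_200      # residents classified as poor during the study month
community_population = 20_000  # total community population

rate = people_in_poverty / community_population
print(f"Poverty rate for that month: {rate:.1%}")  # 6.0%
```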

The Case Series

It is a descriptive, observational study of a series of cases, typically describing the manifestations, clinical course, and prognosis of a condition. Both qualitative and quantitative data are commonly included. Case series can be used as exploratory research to identify the features and progression of a new or poorly understood disorder. They can be very useful in identifying culture-bound or context-specific aspects of mental health problems. Case series are inherently descriptive in nature, but they are most often based on small and nonrandom samples. The results of case series may not generalize to all potential patients/clients.

Despite its limitations, many scholars point out that the case series is the most frequently found research design in the clinical literature. It may be the type of study most like real-world practice and is a type of study practitioners can undertake easily. In some EBM/EBP research design hierarchies, the case series is among the least valued forms of clinical evidence, as it does not demonstrate that an intervention caused a specific outcome. Case series nonetheless offer a valuable method for making innovative information about new disorders or problems and new treatment methods available at an exploratory and descriptive level.

One example of this type of research design is the Nurses’ Health Study (Colditz, Manson, & Hankinson, 1997). This is a study of female nurses, based at Brigham and Women’s Hospital in Boston, who completed a detailed questionnaire every 2 years on their lifestyle, hormones, exercise, and more. Researchers did not intervene with these women in any way but have used the information compiled by the study over several decades to identify trends in women’s health. These results can then be generalized to other women or used to provide information on health trends that could be explored further through more intervention-based research (Colditz et al., 1997).

The Case Study (or Case Report)

It is a research design using descriptive but “anecdotal” evidence drawn from a single case. The data may be qualitative and/or quantitative. Case studies may be the best research design for the identification of new clinical disorders or problems. They can be very useful forms of exploratory clinical research. They usually include the description of a single case, highlighting the manifestations of the disorder, its clinical course, and outcomes of intervention (if any). Because case studies draw on the experiences of a single case, and often a single clinician, they are often labeled “anecdotal.” This differentiates evidence collected on multiple cases from that based on just a single case. Further, case study reports often lack the systematic pre- and post-assessment found in single-case research designs. The main limitation of the case study is that the characteristics of the single case may, or may not, be similar to other cases in different people and circumstances. Another key limitation is that reporting of symptoms, interventions, course of the problem, and outcomes may be piecemeal. This may be because the disorder is unfamiliar or unique in some way (making it worth publishing about), but since there are few widely accepted standards for case studies, authors provide very different kinds and quality of information to readers.

Case studies offer a valuable method for generating innovative information about new disorders or problems, and even new treatment methods, on an exploratory or formative basis. These ideas may become the starting point for future experimental studies.

We note again that case studies may be “best available evidence” found in an EBP search. If research based on other designs is not available, case study research may be used to guide practice decision-making.

Expert Opinion or Practice Wisdom

The EBM/EBP research design hierarchy reminds clinicians that expert opinion may not (necessarily) have a strong evidence base. This is not to say that the experiences of supervisors, consultants, and talented colleagues have no valuable role in practice. It is simply to point out that they are not always systematic and may not work well for all clients in all situations. As research evidence, unwritten expert opinion lacks planned and systematic testing and control for potential biases. This is why it is the least valued form of evidence in most EBM/EBP evidence hierarchies. Such opinions may still be quite useful and informative to clinicians in specific circumstances. They can point to new ways of thinking and intervening that may be valuable in specific clinical situations and settings.

Resources on Research Design in EBP

Many textbooks offer good introductions to research design issues and offer more illustrations than we do in this chapter. Note, however, that the terminology used in EBM/EBP studies and summaries may not be the same as is used in core social work textbooks. Resources addressing issues in research design are found in Table 6.1 .

More resources on research design

This chapter has reviewed the range of research designs used in clinical research. The different types of research designs have different purposes and different strengths, ranging from exploration and discovery for the least structured designs, such as case studies, to attribution of cause and effect relationships for highly structured experimental designs. This chapter has also explored the research design terminology used in EBM/EBP. Some of this terminology draws heavily on medical research and may be unfamiliar to persons trained in social work or social science research. Still, most key research design concepts can be identified despite differences in terminology. The EBM/EBP research design hierarchy places great emphasis on research designs that can document that a specific treatment caused the changes found after treatment. This is an important step in determining the effectiveness or efficacy of a treatment. Many documents portray experiments, or RCTs, as the best form of evidence upon which to base practice decisions. Critical consumers of research should pay close attention to the kinds of research designs used in the studies they examine for practice application.

Key reviews of outcome research on a specific topic, such as those from the Cochrane Collaboration and Campbell Collaboration, use research design as a key selection criterion for defining high-quality research results. That is, where little or no experimental or RCT research is available, the research summary may indicate there is inadequate research knowledge to point to effective treatments. “Empty” summaries pointing to no high-quality research evidence on some disorders are found in the Cochrane Review database. This reflects their high standards and careful review. It also fails to state just what constitutes the best available evidence. Empty reviews do not aid clinicians and clients in practice decision-making. They simply indicate that clinicians should undertake an article-by-article review of research evidence on their clinical topic. Clinicians must bear in mind that the EBP practice decision-making process promotes the use of “the best available evidence.” If such evidence is not based on experimental research, it should still be used, but used with caution. It is entirely appropriate in the EBP framework to look for descriptive or case study research when there is no experimental evidence available on a specific disorder or concern.

Even when experimental or RCT research designs set the framework for establishing cause and effect relationships, a number of related methodological choices also are important to making valid knowledge claims. These include the quality of sampling, the inclusion of diverse participants in the sample, the quality of the outcome measures used, the definitions of the treatments, and the careful use of the correct statistical tests. Adequate sample size and representativeness are important to generalizing study results to other similar people and settings. Appropriately conceptualized, valid, reliable, and sensitive outcome measures document any changes. How treatments are defined and delivered will have a major impact on the merit and worth of study results. Statistics serve as a decision-making tool to determine if the results are unlikely to have happened by chance alone. All these methods work in tandem to yield valid and rigorous results. These issues will be explored in the next two chapters on Step 3 of the EBP process, further appraising some additional methodological issues in practice research.

  • Anastas, J. W. (1999). Research design for social work and the human services (2nd ed.). New York: Columbia University Press.
  • Campbell, D., & Stanley, J. (1963). Experimental and quasi-experimental designs for research. New York: Wadsworth.
  • Cochrane, A. (1972). Effectiveness and efficiency: Random reflections on health services. London: Nuffield Provincial Hospitals Trust.
  • Colditz, G., Manson, J., & Hankinson, S. (1997). The Nurses’ Health Study: 20-year contribution to the understanding of health among women. Journal of Women's Health, 6(1), 49–62. doi:10.1089/jwh.1997.6.49
  • CONSORT Group. (2010). The CONSORT statement. Retrieved from http://www.consort-statement.org/home/
  • Dana, J., & Loewenstein, G. (2003). A social science perspective on gifts to physicians from industry. Journal of the American Medical Association, 290(2), 252–255. doi:10.1001/jama.290.2.252
  • Drisko, J. (2011). Researching clinical practice. In J. Brandell (Ed.), Theory and practice in clinical social work (2nd ed., pp. 717–738). Thousand Oaks, CA: Sage.
  • Elkin, I., Shea, T., Watkins, J., et al. (1989). National Institute of Mental Health Treatment of Depression Collaborative Research Program: General effectiveness of treatments. Archives of General Psychiatry, 46, 971–982. doi:10.1001/archpsyc.1989.01810110013002
  • Higgins, J., & Green, S. (Eds.). (2018). Cochrane handbook for systematic reviews of interventions, version 6.0 (updated September 2018). The Cochrane Collaboration. Available from http://handbook.cochrane.org
  • Kazdin, A. (2010). Research design in clinical psychology (5th ed.). New York: Pearson.
  • Littell, J. (2011, January 15). Evidence-based practice versus practice-based research. Paper presented at the Society for Social Work and Research, Tampa, FL.
  • Littell, J., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.
  • Oxford University Centre for Evidence-Based Medicine. (2009, April). Asking focused questions. Retrieved from http://www.cebm.net/index.aspx?o=1036
  • Oxford University Centre for Evidence-Based Medicine. (2016, May). The Oxford levels of evidence 2.1. Retrieved from https://www.cebm.net/2016/05/ocebm-levels-of-evidence/
  • Oxford Centre for Evidence-Based Medicine. (2019). Study designs. Retrieved from https://www.cebm.net/2014/04/study-designs/
  • Solomon, P., Cavanaugh, M., & Draine, J. (2009). Randomized controlled trials. New York: Oxford University Press.
  • Thyer, B. (2011, January 15). Evidence-based practice versus practice-based research. Paper presented at the Society for Social Work and Research, Tampa, FL.

Research designs for studies evaluating the effectiveness of change and improvement strategies

M Eccles,1 J Grimshaw,2 M Campbell3

1 Centre for Health Services Research, University of Newcastle upon Tyne, Newcastle upon Tyne, UK
2 Clinical Epidemiology Programme, Ottawa Health Research Institute, Ottawa, Canada
3 Health Services Research Unit, University of Aberdeen, Aberdeen, UK

Correspondence to: Professor M Eccles, Professor of Clinical Effectiveness, Centre for Health Services Research, 21 Claremont Place, Newcastle upon Tyne NE2 4AA, UK; martin.eccles@ncl.ac.uk

The methods of evaluating change and improvement strategies are not well described. The design and conduct of a range of experimental and non-experimental quantitative designs are considered. Such study designs should usually be used in a context where they build on appropriate theoretical, qualitative and modelling work, particularly in the development of appropriate interventions. A range of experimental designs are discussed including single and multiple arm randomised controlled trials and the use of more complex factorial and block designs. The impact of randomisation at both group and individual levels and three non-experimental designs (uncontrolled before and after, controlled before and after, and time series analysis) are also considered. The design chosen will reflect both the needs (and resources) in any particular circumstances and also the purpose of the evaluation. The general principle underlying the choice of evaluative design is, however, simple—those conducting such evaluations should use the most robust design possible to minimise bias and maximise generalisability.

  • quality improvement research
  • clinical trial design

https://doi.org/10.1136/qhc.12.1.47


There is a substantial literature about the design, conduct, and analysis of evaluations of relatively simple healthcare interventions such as drugs. However, the methods of evaluating complex interventions such as quality improvement interventions are less well described. Evaluation informs the choice between alternative interventions or policies by identifying, estimating and, if possible, valuing the advantages and disadvantages of each. 1

There are a number of quantitative designs that could be used to evaluate quality improvement interventions (box 1).

Box 1 Possible quantitative evaluative designs for quality improvement research

Randomised designs
  • Individual patient randomised controlled trials
  • Cluster randomised trials

Non-randomised designs
  • Uncontrolled before and after studies
  • Controlled before and after studies
  • Time series designs

All of these designs attempt to establish general causal relationships across a population of interest. The choice of design will be dependent upon the purpose of the evaluation and the degree of control the researchers have over the delivery of the intervention(s). In general, researchers should choose a design that minimises potential bias (any process at any stage of inference which tends to produce results or conclusions that differ systematically from the truth; also referred to as internal validity) and maximises generalisability (the degree to which the results of a study hold true for situations other than those pertaining to the study, in particular for routine clinical practice; also referred to as external validity). 2, 3

A FRAMEWORK FOR EVALUATING QUALITY IMPROVEMENT INTERVENTIONS

Campbell and colleagues 4 have suggested that the evaluation of complex interventions should follow a sequential approach involving:

  • development of the theoretical basis for an intervention;
  • definition of components of the intervention (using modelling, simulation techniques or qualitative methods);
  • exploratory studies to develop further the intervention and plan a definitive evaluative study (using a variety of methods);
  • definitive evaluative study (using quantitative evaluative methods, predominantly randomised designs).

This framework demonstrates the interrelation between quantitative evaluative methods and other methods; it also makes explicit that the design and conduct of quantitative evaluative studies should build upon the findings of other quality improvement research. However, it represents an idealised framework and, in some circumstances, it is necessary to undertake evaluations without sequentially working through the earlier stages—for example, when evaluating policy interventions that are being introduced without prior supporting evidence.

In this paper we describe quantitative approaches for evaluating quality improvement interventions, focusing on methods for estimating the magnitude of the benefits. We also focus on the evaluation of interventions within systems rather than evaluations of whole systems. We discuss several study designs for definitive evaluative studies including a range of randomised controlled trial designs and three non-randomised or quasi-experimental evaluative designs.

EVALUATIVE DESIGNS

Randomised trials are the gold standard method for evaluating healthcare interventions. 5 They estimate the impact of an intervention through direct comparison with a randomly allocated control group that either receives no intervention or an alternative intervention. 6 The randomisation process is the best way of ensuring that both known and (particularly importantly) unknown factors (confounders) that may independently affect the outcome of an intervention are likely to be distributed evenly between the trial groups. As a result, differences observed between groups can be more confidently ascribed to the effects of the intervention rather than to other factors. The same arguments that are used to justify randomised controlled trials of clinical interventions such as drugs are at least as salient to the evaluations of quality improvement interventions. In particular, given our incomplete understanding of potential confounders relating to organisational or professional performance, it is even more difficult to adjust for these in non-randomised designs.

Cluster randomisation

While it is possible to conduct randomised trials of quality improvement interventions which randomise individual patients, this may not always be ideal. If there is the possibility that the treatment given to control individuals will be affected by an organisation’s or professional’s experience of applying the intervention to other patients in the experimental group, there is a risk of contamination. For example, Morgan et al 7 investigated the effects of computerised reminders for antenatal care. Patients were randomised and physicians received reminders for intervention patients but not control patients. Compliance in intervention patients rose from 83% to 98% over 6 months, while compliance in control patients rose from 83% to 94% over 12 months. This is a probable contamination effect.

If such contamination is likely, the researcher should consider randomising organisations or healthcare professionals rather than individual patients, although data may still be collected about the process and outcome of care at the individual patient level. Such trials, which randomise at one level (organisation or professional) and collect data at a different level (patient), are known as cluster randomised trials. 8, 9 Cluster randomisation has considerable implications for the design, power, and analysis of studies, implications which have frequently been ignored.

Design considerations

The main design considerations concern the level of randomisation and whether to include baseline measurement. Frequently researchers need to trade off the likelihood of contamination at lower levels of randomisation against decreasing numbers of clusters and increasing logistical problems at higher levels of randomisation. For example, in a study of an educational intervention in secondary care settings, potential levels of randomisation would include the individual clinician, the ward, the clinical service or directorate, and the hospital. Randomisation at the level of the hospital would minimise the risk of contamination but dramatically increase the size and complexity of the study due to the greater number of hospitals required. Randomisation at the level of the individual clinician would decrease the number of hospitals required but there may then be a risk of contamination across clinicians working in the same wards or specialty areas.

In situations where relatively few clusters (e.g. hospitals) are available for randomisation, there is increased danger of imbalance in performance between study and control groups due to the play of chance. Baseline measurements can be used to assess adequacy of the allocation process and are also useful because they provide an estimate of the magnitude of a problem. Low performance scores before the intervention may indicate that performance is poor and there is much room for improvement, whereas high performance scores may indicate that there is little room for improvement (ceiling effect). In addition, baseline measures could be used as a stratifying or matching variable or incorporated in the analysis to increase statistical power (see below). These potential benefits have to be weighed against the increased costs and duration of studies incorporating baseline measurements and concerns about testing effects (introduction of potential bias due to sensitisation of the study subjects during baseline measurement). 2

Sample size calculation

A fundamental assumption of the standard statistics used to analyse patient randomised trials is that the outcome for an individual patient is completely unrelated to that for any other patient—they are said to be “independent”. This assumption is violated, however, when cluster randomisation is adopted because two patients within any one cluster are more likely to respond in a similar manner than are two patients from different clusters. For example, the management of patients in a single hospital is more likely to be consistent than management of patients across a number of hospitals. The primary consequence of adopting a cluster randomised design is that it is not as statistically efficient and has lower statistical power than a patient randomised trial of equivalent size.

Sample sizes for cluster randomised trials therefore need to be inflated to adjust for clustering. A statistical measure of the extent of clustering is known as the “intracluster correlation coefficient” (ICC) which is based on the relationship of the between-cluster to within-cluster variance. 10 Table 1 shows a number of ICCs from a primary care study of computerising guidelines for patients with either asthma or stable angina (box 4).

Table 1 ICCs for medical record and prescribing data.

Both the ICC and the cluster size influence the inflation required; the sample size inflation can be considerable especially if the average cluster size is large. The extra numbers of patients required can be achieved by increasing either the number of clusters in the study (the more efficient method 11 ) or the number of patients per cluster. In general, little additional power is gained from increasing the number of patients per cluster above 50. Researchers often have to trade off the logistical difficulties and costs associated with recruitment of extra clusters against those associated with increasing the number of patients per cluster. 12
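This inflation is commonly computed with the design effect, 1 + (average cluster size - 1) × ICC; the sketch below uses assumed values purely for illustration.

```python
# Illustrative sample size inflation for cluster randomisation (assumed inputs).
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Standard variance inflation factor: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc

n_individual = 300   # assumed: patients needed under individual randomisation
icc = 0.05           # assumed intracluster correlation coefficient
cluster_size = 50    # assumed average number of patients per cluster

deff = design_effect(cluster_size, icc)
n_cluster_trial = n_individual * deff
print(f"Design effect: {deff:.2f}")               # 3.45
print(f"Patients needed: {n_cluster_trial:.0f}")  # 1035
print(f"Clusters per arm: {n_cluster_trial / 2 / cluster_size:.1f}")  # about 10
```

Doubling the cluster size in this sketch raises the design effect far more than doubling the number of clusters would, which is why adding clusters is usually the more efficient way to recover power.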

Analysis of cluster randomised trials

There are three general approaches to the analysis of cluster randomised trials: analysis at cluster level; the adjustment of standard tests; and advanced statistical techniques using data recorded at both the individual and cluster level. 9, 13, 14 Cluster level analyses use the cluster as the unit of randomisation and analysis. A summary statistic (e.g. mean, proportion) is computed for each cluster and, as each cluster provides only one data point, the data can be considered to be independent, allowing standard statistical tests to be used. Patient level analyses can be undertaken using adjustments to simple statistical tests to account for the clustering effect. However, this approach does not allow adjustment for patient or practice characteristics. Recent advances in the development and use of new modelling techniques to incorporate patient level data allow the inherent correlation within clusters to be modelled explicitly, and thus a “correct” model can be obtained. These methods can incorporate the hierarchical nature of the data into the analysis. For example, in a primary care setting we may have patients (level 1) treated by general practitioner (level 2) nested within practices (level 3) and may have covariates measured at the patient level (e.g. patient age or sex), the general practitioner level (e.g. sex, time in practice), and at the practice level (e.g. practice size). Which of the methods is better to use is still a topic of debate. The main advantage of such sophisticated statistical methods is their flexibility. However, they require extensive computing time and statistical expertise, both for the execution of the procedures and in the interpretation of the results.
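As a minimal sketch of the model-based approach (assuming Python's statsmodels and an invented data set with outcome, treatment, and cluster columns), a random intercept for each cluster models the within-cluster correlation explicitly.

```python
# Minimal sketch of a cluster-level random intercept model (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_clusters, patients_per_cluster = 20, 30
cluster = np.repeat(np.arange(n_clusters), patients_per_cluster)
treatment = np.repeat(rng.integers(0, 2, n_clusters), patients_per_cluster)  # cluster-level allocation
cluster_effect = np.repeat(rng.normal(0, 1.0, n_clusters), patients_per_cluster)
outcome = 0.5 * treatment + cluster_effect + rng.normal(0, 2.0, n_clusters * patients_per_cluster)

data = pd.DataFrame({"outcome": outcome, "treatment": treatment, "cluster": cluster})

# The random intercept for cluster accounts for within-cluster correlation.
model = smf.mixedlm("outcome ~ treatment", data, groups=data["cluster"])
result = model.fit()
print(result.summary())
```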

No consensus exists as to which approach should be used. The most appropriate analysis option will depend on a number of factors including the research question; the unit of inference; the study design; whether the researchers wish to adjust for other relevant variables at the individual or cluster level (covariates); the type and distribution of outcome measure; the number of clusters randomised; the size of cluster and variability of cluster size; and statistical resources available in the research team. Campbell et al 15 and Mollison et al 16 present worked examples comparing these different analytical strategies.

Possible types of cluster randomised trials

Two arm trials

The simplest randomised design is the two arm trial where each subject is randomised to study or control groups. Observed differences in performance between the groups are assumed to be due to the intervention. Such trials are relatively straightforward to design and conduct and they maximise statistical power (half the sample is allocated to the intervention and half to the control). However, they only provide information about the effectiveness of a single intervention compared with control (or the relative effectiveness of two interventions without reference to a control). Box 2 shows an example of a two arm trial.

Box 2 Two arm trial 17

The trial aimed to assess whether the quality of cardiovascular preventive care in general practice could be improved through a comprehensive intervention implemented by an educational outreach visitor. After baseline measurements, 124 general practices (in the southern half of the Netherlands) were randomly allocated to either intervention or control. The intervention, based on the educational outreach model, comprised 15 practice visits over a period of 21 months and addressed a large number of issues around task delegation, availability of instruments and patient leaflets, record keeping, and follow up routines. Twenty one months after the start of the intervention, post-intervention measurements were performed. The difference between ideal and actual practice in each aspect of organising preventive care was defined as a deficiency score. The primary outcome measure was the difference in deficiency scores before and after the intervention. All practices completed both baseline and post-intervention measurements. The difference in change between intervention and control groups, adjusted for baseline, was statistically significant (p<0.001) for each aspect of organising preventive care. The largest absolute improvement was found for the number of preventive tasks performed by the practice assistant.

Multiple arm trials

The simplest extension to the two arm trial is to randomise groups of professionals to more than two groups—for example, two or more study groups and a control group. Such studies are relatively simple to design and use, and allow head-to-head comparisons of interventions or levels of intervention under similar circumstances. These benefits are, however, compromised by a loss of statistical power; for example, to achieve the same power as a two arm trial, the sample size for a three arm trial needs to be increased by up to 50%.

Factorial designs

Factorial designs allow the evaluation of the relative effectiveness of more than one intervention compared with control. For example, in a 2 × 2 factorial design evaluating two interventions against control, participants are randomised to each intervention (A and B) independently. In the first randomisation the study participants are randomised to intervention A or control. In the second randomisation the same participants are randomised to intervention B or control. This results in four groups: no intervention, intervention A alone, intervention B alone, interventions A and B.

During the analysis of factorial designs it is possible to undertake independent analyses to estimate the effect of the interventions separately 18 ; essentially this design allows the conduct of two randomised trials for the same sample size as a two arm trial. However, these trials are more difficult to operationalise and analyse, they provide only limited power for a direct head-to-head comparison of the two interventions, and the power is diminished if there is interaction between the two interventions. Box 3 shows an example of a factorial design trial that was powered to be able to detect any interaction effects.
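As an illustration of how the two interventions in a 2 × 2 factorial allocation can be analysed independently (hypothetical data, not the trial in box 3), both main effects and their interaction can be estimated in a single regression.

```python
# Sketch: estimating two interventions from a 2x2 factorial design (invented data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
a = rng.integers(0, 2, n)   # randomised to intervention A or not
b = rng.integers(0, 2, n)   # independently randomised to intervention B or not
outcome = 1.0 * a + 0.3 * b + rng.normal(0, 2.0, n)   # assumed true effects

data = pd.DataFrame({"outcome": outcome, "A": a, "B": b})
result = smf.ols("outcome ~ A * B", data=data).fit()  # A, B, and the A:B interaction
print(result.params)  # if A:B is near zero, each intervention can be read on its own
```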

Box 3 Factorial design 19

The trial evaluated the effectiveness of audit and feedback and educational reminder messages in changing general practitioners’ radiology ordering behaviour for lumbar spine and knee x rays. The design was a before and after pragmatic cluster randomised controlled trial using a 2 × 2 factorial design involving 244 practices and six radiology departments in two geographical regions. Each practice was randomised twice, to receive or not receive each of the two interventions. Educational reminder messages were based on national guidelines and were provided on the report of every relevant x ray ordered during the 12 month intervention period. For example, the lumbar spine message read “In either acute (less than 6 weeks) or chronic back pain, without adverse features, x ray is not routinely indicated”. The audit and feedback covered the preceding 6 month period and was delivered to individual general practitioners at the start of the intervention period and again 6 months later. It provided practice level information relating the number of requests made by the whole practice to the number of requests made by all practices in the study. Audit and feedback led to a non-significant reduction of around 1% in x ray requests, while educational reminder messages led to a relative reduction of about 20% in x ray requests.

Balanced incomplete block designs

In guideline implementation research there are a number of non-specific effects which may influence the estimate of the effect of an intervention. These could be positive attention effects from participants knowing that they are the subject of a study, or negative demotivation effects from being allocated to a control rather than an intervention group. Currently, these non-specific effects are grouped together and termed the “Hawthorne effect”. If these are imbalanced across study groups in a quality improvement trial, the resulting estimates of effects may be biased and, as these effects can potentially be of the same order of magnitude as the effects that studies are seeking to demonstrate, there is an advantage to dealing with them systematically. While these effects may be difficult to eliminate, balanced incomplete block designs can be used to equalise such non-specific effects and thereby minimise their impact. 18 An example is shown in box 4.

Box 4 Balanced incomplete block design 20

This study was a before and after pragmatic cluster randomised controlled trial using a 2 × 2 incomplete block design and was designed to evaluate the use of a computerised decision support system (CDSS) in implementing evidence based clinical guidelines for the primary care management of asthma in adults and angina. It was based in 60 general practices in the north east of England and the participants were general practitioners and practice nurses in the study practices and their patients aged 18 years or over and with angina or asthma. The practices were randomly allocated to two groups. The first group received computerised guidelines for the management of angina and provided intervention patients for the management of angina and control patients for the management of asthma. The second received computerised guidelines for the management of asthma and provided intervention patients for the management of asthma and control patients for the management of angina. The outcome measures were adherence to the guidelines, determined by recording of care in routine clinical records, and any subsequent impact measured by patient reported generic and condition specific measures of outcome. There were no significant effects of CDSS on consultation rates, process of care measures (including prescribing), or any quality of life domain for either condition. Levels of use of the CDSS were low.

As doctors in both groups were subject to the same level of intervention, any non-specific effects are equalised across the two groups leaving any resulting difference as being due to the intervention.

Quasi-experimental designs

Quasi-experimental designs are useful where there are political, practical, or ethical barriers to conducting a genuine (randomised) experiment. Under such circumstances, researchers have little control over the delivery of an intervention and have to plan an evaluation around a proposed intervention. A large number of potential designs have been summarised by Campbell and Stanley 2 and Cook and Campbell. 3 Here we discuss the three most commonly used designs in quality improvement studies: (1) uncontrolled before and after studies, (2) controlled before and after studies, and (3) time series designs.

Uncontrolled before and after studies measure performance before and after the introduction of an intervention in the same study site(s) and observed differences in performance are assumed to be due to the intervention. They are relatively simple to conduct and are superior to observational studies, but they are intrinsically weak evaluative designs because secular trends or sudden changes make it difficult to attribute observed changes to the intervention. There is some evidence to suggest that the results of uncontrolled before and after studies may overestimate the effects of quality improvement-like interventions. Lipsey and Wilson 21 undertook an overview of meta-analyses of psychological, educational and behavioural interventions. They identified 45 reviews that reported separately the pooled estimates from controlled and uncontrolled studies, and noted that the observed effects from uncontrolled studies were greater than those from controlled studies. In general, uncontrolled before and after studies should not be used to evaluate the effects of quality improvement interventions and the results of studies using such designs have to be interpreted with great caution.

In controlled before and after studies the researcher attempts to identify a control population of similar characteristics and performance to the study population and collects data in both populations before and after the intervention is applied to the study population. Analysis compares post-intervention performance or change scores in the study and control groups and observed differences are assumed to be due to the intervention.

While well designed controlled before and after studies should protect against secular trends and sudden changes, it is often difficult to identify a comparable control group. Even in apparently well matched control and study groups, performance at baseline often differs. Under these circumstances, “within group” analyses are often undertaken (where change from baseline is compared within both groups separately and where the assumption is made that, if the change in the intervention group is significant and the change in the control group is not, the intervention has had an effect). Such analyses are inappropriate for a number of reasons. Firstly, the baseline imbalance suggests that the control group is not truly comparable and may not experience the same secular trends or sudden changes as the intervention group; thus any apparent effect of the intervention may be spurious. Secondly, there is no direct comparison between study and control groups. 2 Another common analytical problem in practice is that researchers fail to recognise clustering of data when interventions are delivered at an organisational level and data are collected at the individual patient level.

Time series designs attempt to detect whether an intervention has had an effect significantly greater than the underlying secular trend. 3 They are useful in quality improvement research for evaluating the effects of interventions when it is difficult to randomise or identify an appropriate control group—for example, following the dissemination of national guidelines or mass media campaigns (box 5). Data are collected at multiple time points before and after the intervention. The multiple time points before the intervention allow the underlying trend and any cyclical (seasonal) effects to be estimated, and the multiple time points after the intervention allow the intervention effect to be estimated while taking account of the underlying secular trends.

Box 5 Time series analysis 22

An interrupted time series using monthly data for 34 months before and 14 months after dissemination of the guidelines was used to evaluate the effect of postal dissemination of the third edition of the Royal College of Radiologists’ guidelines on general practitioner referrals for radiography. Data were abstracted for the period April 1994 to March 1998 from the computerised administrative systems of open access radiological services provided by two teaching hospitals in one region of Scotland. A total of 117 747 imaging requests from general practice were made during the study period. There were no significant effects of disseminating the guidelines on the total number of requests or on the 18 individual tests examined. Had a simple before and after design been used, we would have erroneously concluded that 11 of the 18 procedures showed significant differences.

The most important influence on the analysis technique is the number of data points collected before the intervention. It is necessary to collect enough data points to be convinced that a stable estimate of the underlying secular trend has been obtained. There are a number of statistical techniques that can be used depending on the characteristics of the data, the number of data points available, and whether autocorrelation is present. 3 Autocorrelation refers to the situation whereby data points collected close in time are likely to be more similar to each other than to data points collected far apart. For example, for any given month the waiting times in hospitals are likely to be more similar to waiting times in adjacent months than to waiting times 12 months previously. Autocorrelation has to be allowed for in the analysis; time series regression models 23 and autoregressive integrated moving average (ARIMA) modelling 3 are methods for dealing with this problem.
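As a minimal sketch of such an analysis (not the radiology data from box 5), the code below fits a segmented regression to simulated monthly counts, representing the intervention with a level-change and a trend-change term and handling autocorrelation with Newey-West (HAC) standard errors; the simulated data and the lag choice are illustrative assumptions only.

```python
# Sketch of a segmented regression for an interrupted time series, with
# autocorrelation handled via Newey-West (HAC) standard errors.
# The monthly counts below are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_pre, n_post = 34, 14                      # months before / after the intervention
n = n_pre + n_post
time = np.arange(n)                          # underlying secular trend
post = (time >= n_pre).astype(float)         # level change after the intervention
time_after = np.where(post == 1, time - n_pre, 0)  # slope change after the intervention

# Simulated monthly counts: baseline trend plus noise, with no true effect.
y = 200 + 0.5 * time + rng.normal(0, 10, n)

X = sm.add_constant(np.column_stack([time, post, time_after]))
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 3})

# Coefficients: constant, pre-intervention trend, level change, trend change.
print(fit.summary(xname=["const", "trend", "level_change", "trend_change"]))
```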

Well designed time series evaluations increase the confidence with which the estimate of effect can be attributed to the intervention, although the design does not provide protection against the effects of other events occurring at the same time as the study intervention, which might also improve performance. Furthermore, it is often difficult to collect sufficient data points unless routine data sources are available. It has been found that many published time series studies have been inappropriately analysed, frequently resulting in an overestimation of the effect of the intervention. 24, 25

Randomised trials should only be considered when there is genuine uncertainty about the effectiveness of an intervention. Whilst they are the optimal design for evaluating quality improvement interventions, they are not without their problems. They can be logistically difficult, especially if the researchers are using complex designs to evaluate more than one intervention or if cluster randomisation—requiring the recruitment of large numbers of clusters—is planned. They are undoubtedly methodologically challenging and require a multidisciplinary approach to adequately plan and conduct. They can also be time consuming and expensive; in our experience a randomised trial of a quality improvement intervention can rarely be completed in less than 2 years.

Critics of randomised trials frequently express concerns that the tight inclusion criteria of trials, or the artificial constraints placed upon participants, limit the generalisability of the findings. While this is a particular concern in efficacy (explanatory) studies of drugs, it is likely to be less of a problem in quality improvement evaluations, which are likely to be inherently pragmatic. 26 Pragmatic studies aim to test whether an intervention is likely to be effective in routine practice by comparing the new procedure against the current regimen; as such they are the most useful trial design for developing policy recommendations. Such studies approximate normal conditions and do not attempt to equalise contextual factors and other effect modifiers in the intervention and control groups. In pragmatic studies, the contextual and effect modifying factors therefore become part of the intervention. Such studies are usually conducted on a predefined study population and withdrawals are included within an “intention to treat” analysis: all subjects initially allocated to the intervention group are analysed as intervention subjects irrespective of whether they received the intervention or not. For example, in an evaluation of a computerised decision support system as a method of delivering clinical guidelines in general practice (box 4), some general practitioners may not have had the computing skills to use the system. In an intention to treat analysis, data from all general practitioners would be included irrespective of whether they could use the system or not; as a result, the estimates of effect would more closely reflect the effectiveness of the intervention in real world settings.
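The following sketch, using invented practice-level records, illustrates the logic of an intention to treat comparison: each practice is summarised in the arm to which it was allocated, and the field recording whether the system was actually used is deliberately ignored.

```python
# Sketch of an intention-to-treat comparison: every practice is analysed in the
# arm it was allocated to, whether or not it actually used the intervention.
# The records below are invented for illustration.
practices = [
    {"id": 1, "allocated": "decision_support", "used_system": True,  "guideline_adherence": 0.72},
    {"id": 2, "allocated": "decision_support", "used_system": False, "guideline_adherence": 0.58},
    {"id": 3, "allocated": "decision_support", "used_system": True,  "guideline_adherence": 0.69},
    {"id": 4, "allocated": "control",          "used_system": False, "guideline_adherence": 0.55},
    {"id": 5, "allocated": "control",          "used_system": False, "guideline_adherence": 0.60},
    {"id": 6, "allocated": "control",          "used_system": False, "guideline_adherence": 0.57},
]

def arm_mean(arm):
    # Group strictly by allocation ("used_system" is deliberately ignored).
    scores = [p["guideline_adherence"] for p in practices if p["allocated"] == arm]
    return sum(scores) / len(scores)

print("Intention-to-treat means:")
print("  decision support:", round(arm_mean("decision_support"), 3))
print("  control:         ", round(arm_mean("control"), 3))
```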

The main limitation of quasi-experimental designs is that the lack of randomised controls threatens internal validity and increases the likelihood of plausible rival hypotheses. Cook and Campbell 3 provide a framework for considering the internal validity of the results of experiments and quasi-experiments when trying to establish causality. They suggest that “Estimating the internal validity of a relationship is a deductive process in which the investigator has to systematically think through how each of the internal validity threats may have influenced the data. Then the investigator has to examine the data to test which relevant threats can be ruled out. . . . When all of the threats can plausibly be eliminated it is possible to make confident conclusions about whether a relationship is probably causal.” Within quasi-experiments there are potentially greater threats to internal validity and less ability to account for them. We believe that the design and conduct of quasi-experimental studies is at least as methodologically challenging as the design and conduct of randomised trials. Furthermore, there has been little development of quasi-experimental methods since Cook and Campbell published their key text “Quasi-experimentation: design and analysis issues for field settings” in 1979. 27 The generalisability of quasi-experimental designs is also uncertain: many quasi-experimental studies are conducted in a small number of study sites which may not be representative of the population to which the researcher wishes to generalise.

Key messages

Whatever design is chosen, it is important to minimise bias and maximise generalisability.

Quantitative designs should be used within a sequence of evaluation building as appropriate on preceding theoretical, qualitative, and modelling work.

There are a range of more or less complex randomised designs.

When using randomised designs it is important to consider the appropriate use of cluster, rather than individual, randomisation. This has implications for both study design and analysis.

Where randomised designs are not feasible, non-randomised designs can be used although they are more susceptible to bias.

CONCLUSIONS

We have considered a range of research designs for studies evaluating the effectiveness of change and improvement strategies. The design chosen will reflect both the needs (and resources) of the particular circumstances and the purpose of the evaluation. The general principle underlying the choice of evaluative design is, however, simple: those conducting such evaluations should use the most robust design possible to minimise bias and maximise generalisability.

Acknowledgments

The Health Services Research Unit is funded by the Chief Scientist Office, Scottish Executive Department of Health. The views expressed are those of the authors and not the funding bodies.

References

1. Russell IT. The evaluation of computerised tomography: a review of research methods. In: Culyer AJ, Horisberger B, eds. Economic and medical evaluation of health care technologies. Berlin: Springer-Verlag, 1983:38–68.
2. Campbell DT, Stanley J. Experimental and quasi-experimental designs for research. Chicago: Rand McNally, 1966.
3. Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Chicago: Rand McNally, 1979.
4. Campbell M, Fitzpatrick R, Haines A, et al. Framework for design and evaluation of complex interventions to improve health. BMJ 2000;321:694–6.
5. Cochrane AL. Effectiveness and efficiency: random reflections on health services. London: Nuffield Provincial Hospitals Trust, 1979.
6. Pocock SJ. Clinical trials: a practical approach. New York: Wiley, 1983.
7. Morgan M, Studney DR, Barnett GO, et al. Computerized concurrent review of prenatal care. Qual Rev Bull 1978;4:33–6.
8. Donner A, Klar N. Design and analysis of cluster randomization trials in health research. London: Arnold, 2000.
9. Murray DM. The design and analysis of group randomised trials. Oxford: Oxford University Press, 1998.
10. Donner A, Koval JJ. The estimation of intraclass correlation in the analysis of family data. Biometrics 1980;36:19–25.
11. Diwan VK, Eriksson B, Sterky G, et al. Randomization by group in studying the effect of drug information in primary care. Int J Epidemiol 1992;21:124–30.
12. Flynn TN, Whitley E, Peters TJ. Recruitment strategies in a cluster randomised trial: cost implications. Stat Med 2002;21:397–405.
13. Donner A. Some aspects of the design and analysis of cluster randomization trials. Appl Stat 1998;47:95–113.
14. Turner MJ, Flannelly GM, Wingfield M, et al. The miscarriage clinic: an audit of the first year. Br J Obstet Gynaecol 1991;98:306–8.
15. Campbell MK, Mollison J, Steen N, et al. Analysis of cluster randomized trials in primary care: a practical approach. Fam Pract 2000;17:192–6.
16. Mollison JA, Simpson JA, Campbell MK, et al. Comparison of analytical methods for cluster randomised trials: an example from a primary care setting. J Epidemiol Biostat 2000;5:339–48.
17. Lobo CM, Frijling BD, Hulscher MEJL, et al. Improving quality of organising cardiovascular preventive care in general practice by outreach visitors: a randomised controlled trial. Prev Med 2003 (in press).
18. Cochran WG, Cox GM. Experimental design. New York: Wiley, 1957.
19. Eccles M, Steen N, Grimshaw J, et al. Effect of audit and feedback, and reminder messages on primary-care radiology referrals: a randomised trial. Lancet 2001;357:1406–9.
20. Eccles M, McColl E, Steen N, et al. A cluster randomised controlled trial of computerised evidence based guidelines for angina and asthma in primary care. BMJ 2002;325:941–7.
21. Lipsey MW, Wilson DB. The efficacy of psychological, educational, and behavioral treatment: confirmation from meta-analysis. Am Psychol 1993;48:1181–209.
22. Matowe L, Ramsay C, Grimshaw JM, et al. Influence of the Royal College of Radiologists’ guidelines on referrals from general practice: a time series analysis. Clin Radiol 2002;57:575–8.
23. Ostrom CW. Time series analysis: regression techniques. London: Sage, 1990.
24. Grilli R, Ramsay CR, Minozzi S. Mass media interventions: effects on health services utilisation. In: Cochrane Collaboration. The Cochrane Library. Oxford: Update Software, 2002.
25. Grilli R, Freemantle N, Minozzi S, et al. Impact of mass media on health services utilisation. In: Cochrane Collaboration. The Cochrane Library. Issue 3. Oxford: Update Software, 1998.
26. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in clinical trials. J Chron Dis 1967;20:648.
27. Shadish WR. The empirical program of quasi-experimentation. In: Bickman, ed. Research design. Thousand Oaks: Sage, 2000.

Evaluative Research Design Examples, Methods, And Questions For Product Managers

Looking for excellent evaluative research design examples?

If so, you’re in the right place!

In this article, we explore various evaluative research methods and best data collection techniques for SaaS product leaders that will help you set up your own research projects.

Sound like it’s worth a read? Let’s get right to it then!

  • Evaluative research gauges how well the product meets its goals at all stages of the product development process.
  • The purpose of generative research is to gain a better understanding of user needs and define problems to solve, while evaluative research assesses how successful your current product or feature is.
  • Evaluation research helps teams validate ideas and estimate how well the product or feature will satisfy user needs, which greatly increases the chances of product success.
  • Formative evaluation research sets the baseline for other kinds of evaluative research and assesses user needs.
  • Summative evaluation research checks how successful the outputs of the process are against its targets.
  • Outcome evaluation research evaluates whether the product has had the desired effect on users’ lives.
  • Quantitative research collects and analyzes numerical data like satisfaction scores or conversion rates to establish trends and interdependencies.
  • Qualitative methods use non-numerical data to understand the reasons behind trends and user behavior.
  • You can use feedback surveys to collect both quantitative and qualitative data from your target audience.
  • A/B testing is a quantitative research method for choosing the best version of a product or feature.
  • Usability testing techniques like session replays or eye-tracking help PMs and designers determine how easy and intuitive the product is to use.
  • Beta testing is a popular technique that enables teams to evaluate the product or feature with real users before its launch.
  • Fake door tests are a popular and cost-effective validation technique.
  • With Userpilot, you can run user feedback surveys and build user segments based on product usage data to recruit participants for interviews and beta testing. Want to see how? Book the demo!

What is evaluative research?

Evaluative research, aka program evaluation or evaluation research, is a set of research practices aimed at assessing how well the product meets its goals.

It takes place at all stages of the product development process, both in the launch lead-up and afterward.

This kind of research is not limited to your own product. You can use it to evaluate your rivals to find ways to get a competitive edge.

Evaluative research vs generative research

Generative and evaluation research have different objectives.

Generative research is used for product and customer discovery. Its purpose is to gain a more detailed understanding of user needs, define the problem to solve, and guide product ideation.

Evaluative research, on the other hand, tests how good your current product or feature is. It assesses customer satisfaction by looking at how well the solution addresses their problems and how usable it is.

Why is conducting evaluation research important for product managers?

Ongoing evaluation research is essential for product success.

It allows PMs to identify ways to improve the product and the overall user experience, validate ideas, and determine how likely the product is to satisfy the needs of target consumers.

Types of evaluation research methods

There are a number of evaluation methods that you can leverage to assess your product. The type of research method you choose will depend on the stage in the development process and what exactly you’re trying to find out.

Formative evaluation research

Formative evaluation research happens at the beginning of the evaluation process and sets the baseline for subsequent studies.

In short, its objective is to assess the needs of target users and the market before you start working on any specific solutions.

Summative evaluation research

Summative evaluation research focuses on how successful the outcomes are.

This kind of research happens as soon as the project or program is over. It assesses the value of the deliverables against the forecast results and project objectives.

Outcome evaluation research

Outcome evaluation research measures the impact of the product on the customer. In other words, it assesses if the product brings a positive change to users’ lives.

Quantitative research

Quantitative research methods use numerical data and statistical analysis. They’re great for establishing cause-effect relationships and tracking trends, for example in customer satisfaction.

In SaaS, we normally use surveys and product usage data tracking for quantitative research purposes.

Qualitative research

Qualitative research uses non-numerical data and focuses on gaining a deeper understanding of the user experience and users’ attitudes toward the product.

In other words, qualitative research is about the ‘why’ behind user satisfaction, or the lack of it. For example, it can shed light on what makes your detractors dissatisfied with the product.

What techniques can you use for qualitative research?

The most popular ones include interviews, case studies, and focus groups.

Best evaluative research data collection techniques

How is evaluation research conducted? SaaS PMs can use a range of techniques to collect quantitative and qualitative data to support the evaluation research process.

User feedback surveys

User feedback surveys are the cornerstone of the evaluation research methodology in SaaS.

There are plenty of tools that allow you to build and customize in-app and email surveys without any coding skills.

You use them to target specific user segments at a time that’s most suitable for what you’re testing. For example, you can trigger them contextually as soon as the users engage with the feature that you’re evaluating.

Apart from quantitative data, like the NPS or CSAT scores, it’s good practice to follow up with qualitative questions to get a deeper understanding of user sentiment towards the feature or product.
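For example, here is a minimal sketch (with invented responses) of how the quantitative and qualitative parts of an NPS survey can be handled together: the scores give you the headline number, and the detractors’ free-text answers point to what needs fixing.

```python
# Sketch: computing a Net Promoter Score from 0-10 survey responses and pairing
# each score with its free-text follow-up. All responses are invented.
responses = [
    (9, "Love the new dashboard"),
    (10, "Setup was painless"),
    (7, "Fine, but reporting is limited"),
    (4, "Too expensive for what it does"),
    (8, "Solid, missing integrations"),
]

scores = [score for score, _ in responses]
promoters = sum(1 for s in scores if s >= 9)
detractors = sum(1 for s in scores if s <= 6)
nps = 100 * (promoters - detractors) / len(scores)
print(f"NPS: {nps:.0f}")

# Qualitative follow-ups from detractors explain the 'why' behind the number.
print([text for score, text in responses if score <= 6])
```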

[Image: in-app feedback survey example]

A/B testing

A/B tests are some of the most common ways of evaluating features, UI elements, and onboarding flows in SaaS. That’s because they’re fairly simple to design and administer.

Let’s imagine you’re working on a new landing page layout to boost demo bookings.

First, you modify one UI element at a time, like the position of the CTA button. Next, you launch the new version and direct half of your user traffic to it, while the remaining 50% of users still use the old version.

As your users engage with both versions, you track the conversion rate. You repeat the process with the other versions to eventually choose the best one.
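If you want to check whether the difference in conversion rates is more than noise, a standard two-proportion z-test is one option. The sketch below uses statsmodels and made-up traffic numbers; it isn’t tied to any particular A/B testing tool.

```python
# Sketch: deciding between two landing-page variants with a two-proportion
# z-test on demo-booking conversions. The traffic and conversion numbers are
# invented for illustration.
from statsmodels.stats.proportion import proportions_ztest

conversions = [184, 221]    # demo bookings on variant A and variant B
visitors = [5012, 4987]     # users routed to each variant

z_stat, p_value = proportions_ztest(conversions, visitors)
rate_a, rate_b = (c / n for c, n in zip(conversions, visitors))
print(f"Variant A: {rate_a:.2%}  Variant B: {rate_b:.2%}")
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
# A small p-value suggests the difference is unlikely to be chance alone;
# otherwise keep collecting traffic or treat the variants as a tie.
```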

[Image: A/B test of two landing page variants]

Usability testing

Usability testing helps you evaluate how easy it is for users to complete their tasks in the product.

There is a range of techniques that you can leverage for usability testing:

  • Guerrilla testing is the easiest to set up. Just head over to a public place like a coffee shop or a mall where your target users hang out. Take your prototype with you and ask random users for their feedback.
  • In the 5-second test, you allow the user to engage with a feature for 5 seconds and interview them about their impressions.
  • First-click testing helps you assess how intuitive the product is and how easy it is for the user to find and follow the happy path.
  • In session replays you record and analyze what the users do in the app or on the website.
  • Eye-tracking uses webcams to record where users look on a webpage or dashboard and presents it in a heatmap for ease of analysis.

As with all the qualitative and quantitative methods, it’s essential to select a representative user sample for your usability testing. Relying exclusively on the early adopters or power users can skew the outcomes.

Beta testing

Beta testing is another popular evaluation research technique. And there’s a good reason for that.

By testing the product or feature prior to the launch with real users, you can gather user feedback and validate your product-market fit.

Most importantly, you can identify and fix bugs that could otherwise damage your reputation and the trust of the wider user population. And if you get it right, your beta testers can spread the word about your product and build up the hype around the launch.

How do you recruit beta testers?

If you’re looking at expanding into new markets, you may opt for users who have no experience with your product. You can find them on sites like Ubertesters, in beta testing communities, or through paid advertising.

Otherwise, your active users are the best bet because they are familiar with the product and they are normally keen to help. You can reach out to them by email or in-app messages.

[Image: beta testing recruitment example]

Fake door testing

Fake door testing is a sneaky way of evaluating your ideas.

Why sneaky? Well, because it kind of involves cheating.

If you want to test if there’s demand for a feature or product, you can add it to your UI or create a landing page before you even start working on it.

Next, you use paid adverts or in-app messages, like the tooltip below, to drive traffic and engagement.

[Image: fake door test tooltip]

By tracking engagement with the feature, it’s easy to determine if there’s enough interest in the functionality to justify the resources you would need to spend on its development.
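A quick back-of-the-envelope check might look like the sketch below. The event counts and the 5% go/no-go threshold are made-up assumptions for illustration, not a recommended benchmark.

```python
# Sketch: judging interest in a fake-door feature by its click-through rate.
# The event counts and the 5% decision threshold are illustrative assumptions.
impressions = 12_400   # users who saw the teaser tooltip or menu entry
clicks = 870           # users who clicked through to the "coming soon" modal

ctr = clicks / impressions
THRESHOLD = 0.05       # hypothetical go/no-go line agreed with stakeholders

print(f"Click-through rate: {ctr:.1%}")
print("Build it" if ctr >= THRESHOLD else "Park the idea for now")
```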

Of course, that’s not the end. If you don’t want to face customer rage and fury, you must always explain why you’ve stooped down to such a mischievous deed.

A modal will do the job nicely. Tell them the feature isn’t ready yet but you’re working on it. Try to placate your users by offering them early access to the feature before everybody else.

In this way, you kill two birds with one stone. You evaluate the interest and build a list of possible beta testers.

[Image: fake door test follow-up modal offering early access]

Evaluation research questions

The success of your evaluation research very much depends on asking the right questions.

Usability evaluation questions

  • How was your experience completing this task?
  • What technical difficulties did you experience while completing the task?
  • How intuitive was the navigation?
  • How would you prefer to do this action instead?
  • Were there any unnecessary features?
  • How easy was the task to complete?
  • Were there any features missing?

Product survey research questions

  • Would you recommend the product to your colleagues/friends?
  • How disappointed would you be if you could no longer use the feature/product?
  • How satisfied are you with the product/feature?
  • What is the one thing you wish the product/feature could do that it doesn’t already?
  • What would make you cancel your subscription?

How Userpilot can help product managers conduct evaluation research

Userpilot is a digital adoption platform . It consists of three main components: engagement, product analytics, and user sentiment layers. While all of them can help you evaluate your product performance, it’s the latter two that are particularly relevant.

Let’s start with the user sentiment. With Userpilot you can create customized in-app surveys that will blend seamlessly into your product UI.

[Image: survey customization in Userpilot]

You can trigger these for all your users or target particular segments.

Where do the segments come from? You can create them based on a wide range of criteria. Apart from demographics or JTBDs, you can use product usage data or survey results. In addition to the quantitative scores, you can also use qualitative NPS responses for this.

Segmentation is also great for finding your beta testers and interview participants. If your users engage with your product regularly and give you high scores in customer satisfaction surveys, they may be happy to spare some of their time to help you.
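As a rough sketch, assuming you’ve exported usage and survey data to work with (the field names and cut-offs below are illustrative assumptions, not Userpilot’s API), shortlisting candidates can be as simple as:

```python
# Sketch: shortlisting beta-test candidates from exported usage and survey
# data. Field names and thresholds are hypothetical, for illustration only.
users = [
    {"email": "a@example.com", "sessions_last_30d": 26, "nps": 9},
    {"email": "b@example.com", "sessions_last_30d": 3,  "nps": 10},
    {"email": "c@example.com", "sessions_last_30d": 31, "nps": 6},
    {"email": "d@example.com", "sessions_last_30d": 19, "nps": 9},
]

beta_candidates = [
    u["email"]
    for u in users
    if u["sessions_last_30d"] >= 15 and u["nps"] >= 9  # regular, happy users
]
print(beta_candidates)
```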

[Image: power users segment in Userpilot]

Evaluative research enables product managers to assess how well the product meets user and organizational needs, and how easy it is to use. When carried out regularly during the product development process, it allows them to validate ideas and iterate on them in an informed way.

If you’d like to see how Userpilot can help your business collect evaluative data, book the demo!

COMMENTS

  1. Evaluation Research Design: Examples, Methods & Types

    Evaluation Research Methodology. There are four major evaluation research methods, namely; output measurement, input measurement, impact assessment and service quality. Output measurement is a method employed in evaluative research that shows the results of an activity undertaking by an organization.

  2. Design and Implementation of Evaluation Research

    Evaluation has its roots in the social, behavioral, and statistical sciences, and it relies on their principles and methodologies of research, including experimental design, measurement, statistical tests, and direct observation. What distinguishes evaluation research from other social science is that its subjects are ongoing social action programs that are intended to produce individual or ...

  3. Choose an Evaluation Design

    Choose an Appropriate Evaluation Design. Once you've identified your questions, you can select an appropriate evaluation design. Evaluation design refers to the overall approach to gathering information or data to answer specific research questions. There is a spectrum of research design options—ranging from small-scale feasibility studies ...

  4. Section 4. Selecting an Appropriate Design for the Evaluation

    If your results are to be reliable, you have to give the evaluation a structure that will tell you what you want to know. That structure - the arrangement of discovery- is the evaluation's design. The design depends on what kinds of questions your evaluation is meant to answer. Some of the most common evaluation (research) questions:

  5. What Is a Research Design

    A research design is a strategy for answering your research question using empirical data. Creating a research design means making decisions about: Your overall research objectives and approach. Whether you'll rely on primary research or secondary research. Your sampling methods or criteria for selecting subjects. Your data collection methods.

  6. Choose an Evaluation Design « Pell Institute

    A research design is simply a plan for conducting research. It is a blueprint for how you will conduct your program evaluation. Selecting the appropriate design and working through and completing a well thought out logic plan provides a strong foundation for achieving a successful and informative program evaluation.

  7. Program Evaluation Guide

    Step 3: Focus the Evaluation Design. Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide. After completing Steps 1 and 2, you and your stakeholders should have a clear understanding of the program and have reached consensus. Now your evaluation team will need to focus the evaluation.

  8. PDF Evaluation Models, Approaches, and Designs

    The question this type of evaluation addresses ... The following are brief descriptions of the most commonly used evaluation (and research) designs. One-Shot Design.In using this design, the evaluator gathers data following an intervention or program. For example, a survey of participants might be ...

  9. Evaluative research: Key methods, types, and examples

    Evaluative research is a type of research used to evaluate a product or concept, and collect data that helps improve your solution. ... is conducted early and often during the design process to test and improve a solution before arriving at the final design. Running a formative evaluation allows you to test and identify issues in the solutions ...

  10. Chapter 2

    We consider the following methodological principles to be important for developing high-quality evaluations: Giving due consideration to methodological aspects of evaluation quality in design: focus, consistency, reliability, and validity. Matching evaluation design to the evaluation questions. Using effective tools for evaluation design.

  11. PDF RESEARCH DESIGN IN EVALUATION: CHOICES AND ISSUES

    process and product public, evaluation is clearly a research activity. It is, however, a research activity with particular characteristics which distinguish it from other forms of research. Jamieson [1] has suggested that evaluation differs from academic research in two important ways. The first concerns the choice of research questions.

  12. Evaluation and Study Designs for Implementation and Quality Improvement

    In this type of evaluation as shown in Figure 25, comparisons are made between health care units receiving an implementation strategy condition versus those not receiving the strategy which serve as a control or implementation as usual group. This is a design well-suited for testing a quality improvement innovation or implementation strategy ...

  13. Evaluative research: Methods, types, and examples (2024)

    Evaluative research assesses the effectiveness and usability of products or services. It involves gathering user feedback to measure performance and identify areas for improvement. Product owners and user researchers employ evaluative research to make informed decisions. Users' experiences and preferences are actively observed and analyzed to ...

  14. PDF RESEARCH DESIGNS FOR PROGRAM EVALUATIONS

    research designs in an evaluation, and test different parts of the program logic with each one. These designs are often referred to as patched-up research designs (Poister, 1978), and usually, they do not test all the causal linkages in a logic model. Research designs that fully test the causal links in logic models often

  15. Evaluation Research: Definition, Methods and Examples

    Evaluation research is a type of applied research, and so it is intended to have some real-world effect. Many methods, such as surveys and experiments, can be used to do evaluation research. The process of evaluation research, from data collection through analysis and reporting, is rigorous and systematic, and involves gathering data about organizations and their programs.

  16. Step 5: Choose Evaluation Design and Methods

    Regardless of your level of expertise, as you plan your evaluation it may be useful to involve another evaluator with advanced training in evaluation and research design and methods. Design refers to the overall structure of the evaluation: how the indicators measured for the evaluation will be compared.

  17. What Is Evaluation?: Perspectives of How Evaluation Differs (or Not) From Research

    Evaluators rely on the same social science research designs, methods, and theory as researchers, but also need interpersonal effectiveness, planning and management skills, and political awareness. Participants who believed evaluation is a subcomponent of research were more likely to describe evaluation as simply a type of research study.

  18. Evaluation Design for Community Health Programs

    The fully experimental design is often considered the gold standard against which other research designs are judged, as it offers a powerful technique for evaluating cause and effect. Even so, fully experimental designs are unusual in evaluation research for rural community health programs.
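
    The defining feature of a fully experimental design is random assignment. As a minimal sketch, assuming a hypothetical list of community sites, random allocation to treatment and control groups could look like this:

        import random

        # Hypothetical community sites to be randomised.
        sites = [f"site_{i:02d}" for i in range(1, 21)]

        rng = random.Random(42)            # fixed seed so the allocation is reproducible
        shuffled = sites[:]
        rng.shuffle(shuffled)

        treatment = sorted(shuffled[:len(shuffled) // 2])
        control = sorted(shuffled[len(shuffled) // 2:])

        print("treatment:", treatment)
        print("control:  ", control)
        # Random assignment is what allows the design to support cause-and-effect claims.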

  19. Evaluation design

    An evaluation design describes how data will be collected and analysed to answer the Key Evaluation Questions. There are different pathways for you as manager depending on who develops the evaluation design: in most cases your evaluator will develop it, but in some cases you will, if you have the necessary evaluation expertise and support.

  20. Evaluation Research

    The regulatory definition under 45 CFR 46.102(d) is: 'Research means a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.' Audience analysis carried out during early planning is a type of formative, user-centered design.

  21. Step 3 of EBP: Part 1—Evaluating Research Designs

    Research design is the first methodological issue a clinical social worker must identify in appraising the quality of a research study. One example of such a design is the Nurses' Health Study (Colditz, Manson, & Hankinson, 1997), a study of female nurses who worked at Brigham and Women's Hospital in Boston.

  22. Research designs for studies evaluating the effectiveness of change and improvement strategies

    The methods of evaluating change and improvement strategies are not well described. This source considers the design and conduct of a range of experimental and non-experimental quantitative designs, which should usually be used in a context where they build on appropriate theoretical, qualitative, and modelling work, particularly in the development of appropriate interventions.

  23. Evaluative Research Design Examples, Methods, And Questions ...

    Evaluative research, aka program evaluation or evaluation research, is a set of research practices aimed at assessing how well the product meets its goals. It takes place at all stages of the product development process, both in the launch lead-up and afterward. This kind of research is not limited to your own product.
