Statology

Statistics Made Easy

Two Sample t-test: Definition, Formula, and Example

A two sample t-test is used to determine whether or not two population means are equal.

This tutorial explains the following:

  • The motivation for performing a two sample t-test.
  • The formula to perform a two sample t-test.
  • The assumptions that should be met to perform a two sample t-test.
  • An example of how to perform a two sample t-test.

Two Sample t-test: Motivation

Suppose we want to know whether or not the mean weight between two different species of turtles is equal. Since there are thousands of turtles in each population, it would be too time-consuming and costly to go around and weigh each individual turtle.

Instead, we might take a simple random sample of 15 turtles from each population and use the mean weight in each sample to determine if the mean weight is equal between the two populations:

Two sample t-test example

However, it’s virtually guaranteed that the mean weight between the two samples will be at least a little different. The question is whether or not this difference is statistically significant. Fortunately, a two sample t-test allows us to answer this question.

Two Sample t-test: Formula

A two-sample t-test always uses the following null hypothesis:

  • $H_0: \mu_1 = \mu_2$ (the two population means are equal)

The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  • $H_1$ (two-tailed): $\mu_1 \neq \mu_2$ (the two population means are not equal)
  • $H_1$ (left-tailed): $\mu_1 < \mu_2$ (population 1 mean is less than population 2 mean)
  • $H_1$ (right-tailed): $\mu_1 > \mu_2$ (population 1 mean is greater than population 2 mean)

We use the following formula to calculate the test statistic t:

Test statistic: $ t = \frac{\overline{x_1} - \overline{x_2}}{s_p\sqrt{1/n_1 + 1/n_2}} $

where $\overline{x_1}$ and $\overline{x_2}$ are the sample means, $n_1$ and $n_2$ are the sample sizes, and $s_p$ is the pooled standard deviation, calculated as:

$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $

where $s_1^2$ and $s_2^2$ are the sample variances.

If the p-value that corresponds to the test statistic t with $n_1 + n_2 - 2$ degrees of freedom is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01), then you can reject the null hypothesis.

Two Sample t-test: Assumptions

For the results of a two sample t-test to be valid, the following assumptions should be met:

  • The observations in one sample should be independent of the observations in the other sample.
  • The data in each sample should be approximately normally distributed.
  • The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test.
  • The data in both samples should be obtained using a random sampling method.

Two Sample t-test: Example

Suppose we want to know whether or not the mean weight between two different species of turtles is equal. To test this, we will perform a two sample t-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose we collect a random sample of turtles from each population with the following information:

  • Sample size $n_1$ = 40
  • Sample mean weight $\overline{x_1}$ = 300
  • Sample standard deviation $s_1$ = 18.5
  • Sample size $n_2$ = 38
  • Sample mean weight $\overline{x_2}$ = 305
  • Sample standard deviation $s_2$ = 16.7

Step 2: Define the hypotheses.

We will perform the two sample t-test with the following hypotheses:

  • $H_0: \mu_1 = \mu_2$ (the two population means are equal)
  • $H_1: \mu_1 \neq \mu_2$ (the two population means are not equal)

Step 3: Calculate the test statistic t.

First, we will calculate the pooled standard deviation $s_p$:

$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(40 - 1)18.5^2 + (38 - 1)16.7^2}{40 + 38 - 2}} = 17.647 $

Next, we will calculate the test statistic t:

$ t = \frac{\overline{x_1} - \overline{x_2}}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{300 - 305}{17.647\sqrt{1/40 + 1/38}} = -1.2508 $
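The arithmetic in Step 3 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pooled_t` is our own choice for illustration and does not come from any statistics package.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled_sd, t, df) for an equal-variance two sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference between the two sample means.
    std_err = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / std_err
    return pooled_sd, t, df

# Turtle summary statistics from Step 1.
sp, t, df = pooled_t(300, 18.5, 40, 305, 16.7, 38)
print(round(sp, 3), round(t, 4), df)
```

The printed values should agree with the hand calculation above (pooled standard deviation near 17.647 and t near -1.2508, with 76 degrees of freedom).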

Step 4: Calculate the p-value of the test statistic t.

According to the T Score to P Value Calculator, the p-value associated with t = -1.2508 and degrees of freedom = $n_1 + n_2 - 2$ = 40 + 38 - 2 = 76 is 0.21484.
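If no calculator is at hand, the two-tailed p-value can be approximated directly from the density of the t distribution. The sketch below is one possible approach, not the method the calculator uses: it integrates the density with Simpson's rule using only the standard library. The helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice this value.
    return 1 - 2 * area_zero_to_t

p = two_sided_p(-1.2508, 76)
print(round(p, 4))
```

The result should be close to the 0.21484 reported above; small differences in the last digits come from the numerical integration.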

Step 5: Draw a conclusion.

Since this p-value is not less than our significance level α = 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mean weight of turtles between these two populations is different.

Note: You can also perform this entire two sample t-test by simply using the Two Sample t-test Calculator.

Additional Resources

The following tutorials explain how to perform a two-sample t-test using different statistical programs:

  • How to Perform a Two Sample t-test in Excel
  • How to Perform a Two Sample t-test in SPSS
  • How to Perform a Two Sample t-test in Stata
  • How to Perform a Two Sample t-test in R
  • How to Perform a Two Sample t-test in Python
  • How to Perform a Two Sample t-test on a TI-84 Calculator

hypothesis testing of two samples

Hey there. My name is Zach Bobbitt. I have a Master of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

JMP | Statistical Discovery.™ From SAS.

Statistics Knowledge Portal

A free online introduction to statistics

The Two-Sample t-Test

What is the two-sample t-test?

The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not.

Is this the same as an A/B test?

Yes, a two-sample t-test is used to analyze the results from A/B tests.

When can I use the test?

You can use the test when your data values are independent, are randomly sampled from two normal populations and the two independent groups have equal variances.

What if I have more than two groups?

Use a multiple comparison method. Analysis of variance (ANOVA) is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.

What if the variances for my two groups are not equal?

You can still use the two-sample t-test. You use a different estimate of the standard deviation.

What if my data isn’t nearly normally distributed?

If your sample sizes are very small, you might not be able to test for normality. You might need to rely on your understanding of the data. When you cannot safely assume normality, you can perform a nonparametric test that doesn’t assume normality.

Using the two-sample t-test

The sections below discuss what is needed to perform the test, how to check the data, how to perform the test, and the statistical details.

What do we need?

For the two-sample t-test, we need two variables. One variable defines the two groups. The second variable is the measurement of interest.

We also have an idea, or hypothesis, that the means of the underlying populations for the two groups are different. Here are a couple of examples:

  • We have students who speak English as their first language and students who do not. All students take a reading test. Our two groups are the native English speakers and the non-native speakers. Our measurements are the test scores. Our idea is that the mean test scores for the underlying populations of native and non-native English speakers are not the same. We want to know if the mean score for the population of native English speakers is different from the people who learned English as a second language.
  • We measure the grams of protein in two different brands of energy bars. Our two groups are the two brands. Our measurement is the grams of protein for each energy bar. Our idea is that the mean grams of protein for the underlying populations for the two brands may be different. We want to know if we have evidence that the mean grams of protein for the two brands of energy bars is different or not.

Two-sample t-test assumptions

To conduct a valid test:

  • Data values must be independent. Measurements for one observation do not affect measurements for any other observation.
  • Data in each group must be obtained via a random sample from the population.
  • Data in each group are normally distributed.
  • Data values are continuous.
  • The variances for the two independent groups are equal.

For very small groups of data, it can be hard to test these requirements. Below, we'll discuss how to check the requirements using software and what to do when a requirement isn’t met.

Two-sample t-test example

One way to measure a person’s fitness is to measure their body fat percentage. Average body fat percentages vary by age, but according to some guidelines, the normal range for men is 15-20% body fat, and the normal range for women is 20-25% body fat.

Our sample data is from a group of men and women who did workouts at a gym three times a week for a year. Then, their trainer measured the body fat. The table below shows the data.

Table 1: Body fat percentage data grouped by gender

You can clearly see some overlap in the body fat measurements for the men and women in our sample, but also some differences. Just by looking at the data, it's hard to draw any solid conclusions about whether the underlying populations of men and women at the gym have the same mean body fat. That is the value of statistical tests – they provide a common, statistically valid way to make decisions, so that everyone makes the same decision on the same set of data values.

Checking the data

Let’s start by answering: Is the two-sample t-test an appropriate method to evaluate the difference in body fat between men and women?

  • The data values are independent. The body fat for any one person does not depend on the body fat for another person.
  • We assume the people measured represent a simple random sample from the population of members of the gym.
  • We assume the data are normally distributed, and we can check this assumption.
  • The data values are body fat measurements. The measurements are continuous.
  • We assume the variances for men and women are equal, and we can check this assumption.

Before jumping into analysis, we should always take a quick look at the data. The figure below shows histograms and summary statistics for the men and women.

Histogram and summary statistics for the body fat data

The two histograms are on the same scale. From a quick look, we can see that there are no very unusual points, or outliers. The data look roughly bell-shaped, so our initial idea of a normal distribution seems reasonable.

Examining the summary statistics, we see that the standard deviations are similar. This supports the idea of equal variances. We can also check this using a test for variances.

Based on these observations, the two-sample t-test appears to be an appropriate method to test for a difference in means.

How to perform the two-sample t-test

For each group, we need the average, standard deviation and sample size. These are shown in the table below.

Table 2: Average, standard deviation and sample size statistics grouped by gender

Without doing any testing, we can see that the averages for men and women in our samples are not the same. But how different are they? Are the averages “close enough” for us to conclude that mean body fat is the same for the larger population of men and women at the gym? Or are the averages too different for us to make this conclusion?

We'll further explain the principles underlying the two-sample t-test in the statistical details section below, but let's first proceed through the steps from beginning to end. We start by calculating our test statistic. This calculation begins with finding the difference between the two averages:

$ 22.29 - 14.95 = 7.34 $

This difference in our samples estimates the difference between the population means for the two groups.

Next, we calculate the pooled standard deviation. This builds a combined estimate of the overall standard deviation. The estimate adjusts for different group sizes. First, we calculate the pooled variance:

$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)} {n_1 + n_2 - 2} $

$ s_p^2 = \frac{((10 - 1)5.32^2) + ((13 - 1)6.84^2)}{(10 + 13 - 2)} $

$ = \frac{(9\times28.30) + (12\times46.82)}{21} $

$ = \frac{(254.7 + 561.85)}{21} $

$ =\frac{816.55}{21} = 38.88 $

Next, we take the square root of the pooled variance to get the pooled standard deviation. This is:

$ \sqrt{38.88} = 6.24 $

We now have all the pieces for our test statistic. We have the difference of the averages, the pooled standard deviation and the sample sizes.  We calculate our test statistic as follows:

$ t = \frac{\text{difference of group averages}}{\text{standard error of difference}} = \frac{7.34}{(6.24\times \sqrt{(1/10 + 1/13)})} = \frac{7.34}{2.62} = 2.80 $

To evaluate the difference between the means in order to make a decision about our gym programs, we compare the test statistic to a theoretical value from the t-distribution. This activity involves four steps:

  • We decide on the risk we are willing to take for declaring a significant difference. For the body fat data, we decide that we are willing to take a 5% risk of saying that the unknown population means for men and women are not equal when they really are. In statistics-speak, the significance level, denoted by α, is set to 0.05. It is a good practice to make this decision before collecting the data and before calculating test statistics.
  • We calculate a test statistic. Our test statistic is 2.80.
  • We find the theoretical value from the t-distribution based on our null hypothesis, which states that the means for men and women are equal. Most statistics books have look-up tables for the t-distribution. You can also find tables online. The most likely situation is that you will use software and will not use printed tables. To find this value, we need the significance level (α = 0.05) and the degrees of freedom. The degrees of freedom (df) are based on the sample sizes of the two groups. For the body fat data, this is: $ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $ The t value with α = 0.05 and 21 degrees of freedom is 2.080.
  • We compare the value of our statistic (2.80) to the t value. Since 2.80 > 2.080, we reject the null hypothesis that the mean body fat for men and women is equal, and conclude that we have evidence that mean body fat in the population differs between men and women.
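The look-up step in the list above can also be done numerically. The sketch below is one possible approach under our own assumptions, not how JMP or a printed table works: it finds the two-sided critical value by bisection on the tail area of the t-distribution, using Simpson's rule for the integral. The helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sided_critical_value(alpha, df):
    """Find c such that P(|T| > c) = alpha, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

critical = two_sided_critical_value(0.05, 21)
t_stat = 2.80                      # test statistic for the body fat data
reject = abs(t_stat) > critical
print(round(critical, 3), reject)
```

With α = 0.05 and 21 degrees of freedom, the computed critical value should match the tabled 2.080 to about three decimal places, so the decision to reject is the same.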

Statistical details

Let’s look at the body fat data and the two-sample t-test using statistical terms.

Our null hypothesis is that the underlying population means are the same. The null hypothesis is written as:

$ H_0:  \mathrm{\mu_1} =\mathrm{\mu_2} $

The alternative hypothesis is that the means are not equal. This is written as:

$ H_a:  \mathrm{\mu_1} \neq \mathrm{\mu_2} $

We calculate the average for each group, and then calculate the difference between the two averages. This is written as:

$\overline{x_1} -  \overline{x_2} $

We calculate the pooled standard deviation. This assumes that the underlying population variances are equal. The pooled variance formula is written as:

$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)}{n_1 + n_2 - 2} $

The formula shows the sample size for the first group as $n_1$ and the second group as $n_2$. The standard deviations for the two groups are $s_1$ and $s_2$. This estimate allows the two groups to have different numbers of observations. The pooled standard deviation is the square root of the variance and is written as $s_p$.

What if your sample sizes for the two groups are the same? In this situation, the pooled estimate of variance is simply the average of the variances for the two groups:

$ s_p^2 = \frac{(s_1^2 + s_2^2)}{2} $
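To see why, substitute a common sample size $n$ for both $n_1$ and $n_2$ in the general pooled variance formula $ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $. The symbol $n$ is introduced here only for this short derivation:

$ s_p^2 = \frac{(n - 1)s_1^2 + (n - 1)s_2^2}{n + n - 2} = \frac{(n - 1)(s_1^2 + s_2^2)}{2(n - 1)} = \frac{s_1^2 + s_2^2}{2} $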

The test statistic is calculated as:

$ t = \frac{(\overline{x_1} -\overline{x_2})}{s_p\sqrt{1/n_1 + 1/n_2}} $

The numerator of the test statistic is the difference between the two group averages. It estimates the difference between the two unknown population means. The denominator is an estimate of the standard error of the difference between the two unknown population means. 

Technical Detail: For a single mean, the standard error is $ s/\sqrt{n} $  . The formula above extends this idea to two groups that use a pooled estimate for s (standard deviation), and that can have different group sizes.

We then compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the body fat data as an example, we set α = 0.05. The degrees of freedom ( df ) are based on the group sizes and are calculated as:

$ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $

The formula shows the sample size for the first group as n 1 and the second group as n 2 .  Statisticians write the t value with α = 0.05 and 21 degrees of freedom as:

$ t_{0.05,21} $

The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are two possible results from our comparison:

  • The absolute value of the test statistic is lower than the t value. You fail to reject the hypothesis of equal means. You conclude that the data do not provide evidence that men and women have different average body fat.
  • The absolute value of the test statistic is higher than the t value. You reject the hypothesis of equal means. You conclude that men and women do not have the same average body fat.

t-Test with unequal variances

When the variances for the two groups are not equal, we cannot use the pooled estimate of standard deviation. Instead, we take the standard error for each group separately. The test statistic is:

$ t = \frac{ (\overline{x_1} -  \overline{x_2})}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $

The numerator of the test statistic is the same. It is the difference between the averages of the two groups. The denominator is an estimate of the overall standard error of the difference between means. It is based on the separate standard error for each group.

The degrees of freedom calculation for the t value is more complex with unequal variances than equal variances and is usually left up to statistical software packages. The key point to remember is that if you cannot use the pooled estimate of standard deviation, then you cannot use the simple formula for the degrees of freedom.
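As an illustration of what software does here, the sketch below computes the unequal-variance test statistic together with the Welch–Satterthwaite approximation for the degrees of freedom. That approximation is the formula commonly used for this test, but it is not spelled out in this article, so treat it as an assumption. The function name `welch_t` is hypothetical, and which mean belongs to which group is also assumed (it only affects the sign of t).

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for the unequal-variance (Welch) t-test."""
    v1 = sd1**2 / n1   # squared standard error of the group 1 mean
    v2 = sd2**2 / n2   # squared standard error of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation (assumed, see the note above).
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Body fat summary statistics used in the pooled calculation earlier.
t, df = welch_t(22.29, 5.32, 10, 14.95, 6.84, 13)
print(round(t, 2), round(df, 2))
```

Because the inputs are rounded summary statistics, the degrees of freedom may differ slightly in the later decimals from what software reports when it works from the raw data.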

Testing for normality

The normality assumption is more important when the two groups have small sample sizes than for larger sample sizes.

Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the body fat data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for men and women, and supports our decision.

 Normal quantile plot of the body fat measurements for men and women

You can also perform a formal test for normality using software. The figure above shows results of testing for normality with JMP software. We test each group separately. Both the test for men and the test for women show that we cannot reject the hypothesis of a normal distribution. We can go ahead with the assumption that the body fat data for men and for women are normally distributed.

Testing for unequal variances

Testing for unequal variances is complex. We won’t show the calculations in detail, but will show the results from JMP software. The figure below shows results of a test for unequal variances for the body fat data.

Test for unequal variances for the body fat data

Without diving into details of the different types of tests for unequal variances, we will use the F test. Before testing, we decide to accept a 10% risk of concluding the variances are unequal when they are actually equal. This means we have set α = 0.10.

Like most statistical software, JMP shows the p-value for a test. This is the probability, assuming the null hypothesis is true, of finding a test statistic at least as extreme as the one observed. It’s difficult to calculate by hand. For the figure above, with the F test statistic of 1.654, the p-value is 0.4561. This is larger than our α value: 0.4561 > 0.10. We fail to reject the hypothesis of equal variances. In practical terms, we can go ahead with the two-sample t-test with the assumption of equal variances for the two groups.
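The F test statistic itself is simple to compute. The sketch below assumes the statistic is the ratio of the larger sample variance to the smaller one, which is a common convention for a two-sided F test; the function name `variance_ratio` is hypothetical. The p-value requires the F distribution and is not computed here.

```python
def variance_ratio(sd_a, sd_b):
    """Ratio of the larger sample variance to the smaller one."""
    var_a, var_b = sd_a**2, sd_b**2
    return max(var_a, var_b) / min(var_a, var_b)

# Standard deviations for the two body fat groups.
f_stat = variance_ratio(5.32, 6.84)
print(round(f_stat, 2))
```

The result is close to the 1.654 shown by the software; the small gap comes from using standard deviations rounded to two decimals.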

Understanding p-values

Using a visual, you can check to see if your test statistic is a more extreme value in the distribution. The figure below shows a t-distribution with 21 degrees of freedom.

t-distribution with 21 degrees of freedom and α = .05

Since our test is two-sided and we have set α = .05, the figure shows that the value of 2.080 “cuts off” 2.5% of the data in each of the two tails. Only 5% of the data overall is further out in the tails than 2.080. Because our test statistic of 2.80 is beyond the cut-off point, we reject the null hypothesis of equal means.

Putting it all together with software

The figure below shows results for the two-sample t-test for the body fat data from JMP software.

Results for the two-sample t-test from JMP software

The results for the two-sample t-test that assumes equal variances are the same as our calculations earlier. The test statistic is 2.79996. The software shows results for a two-sided test and for one-sided tests. The two-sided test is what we want (Prob > |t|). Our null hypothesis is that the mean body fat for men and women is equal. Our alternative hypothesis is that the mean body fat is not equal. The one-sided tests are for one-sided alternative hypotheses, for example, an alternative hypothesis that mean body fat for men is less than that for women.

We can reject the hypothesis of equal mean body fat for the two groups and conclude that we have evidence body fat differs in the population between men and women. The software shows a p-value of 0.0107, which is less than our significance level of 0.05. We decided on a 5% risk of concluding the mean body fat for men and women are different, when they are not. It is important to make this decision before doing the statistical test.

The figure also shows the results for the t-test that does not assume equal variances. This test does not use the pooled estimate of the standard deviation. As was mentioned above, this test also has a complex formula for degrees of freedom. You can see that the degrees of freedom are 20.9888. The software shows a p-value of 0.0086. Again, with our decision of a 5% risk, we can reject the null hypothesis of equal mean body fat for men and women.

Other topics

If you have more than two independent groups, you cannot use the two-sample t-test. You should use a multiple comparison method. ANOVA, or analysis of variance, is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.

What if my data are not from normal distributions?

If your sample size is very small, it might be hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the body fat data, the trainer knows that the underlying distribution of body fat is normally distributed. Even for a very small sample, the trainer would likely go ahead with the t-test and assume normality.

What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test for normality is rejected? In this situation, you can use nonparametric analyses. These types of analyses do not depend on an assumption that the data values are from a specific distribution. For the two-sample t-test, the Wilcoxon rank sum test is a nonparametric test that could be used.
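To give a feel for how a rank-based test works, here is a minimal sketch of the Wilcoxon rank sum statistic with a normal approximation. It is a simplified illustration under stated assumptions: ties get average ranks, no tie or continuity correction is applied, and the two small samples are made-up numbers, not the body fat data. The function name `rank_sum_test` is hypothetical.

```python
import math
from statistics import NormalDist

def rank_sum_test(sample_a, sample_b):
    """Return (rank sum of sample_a, z score, approximate two-sided p)."""
    pooled = sorted(sample_a + sample_b)
    # Map each distinct value to the average of its 1-based rank positions.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    w = sum(rank_of[x] for x in sample_a)
    n_a, n_b = len(sample_a), len(sample_b)
    # Mean and standard deviation of the rank sum under the null hypothesis.
    mean_w = n_a * (n_a + n_b + 1) / 2
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * NormalDist().cdf(-abs(z))
    return w, z, p

# Hypothetical measurements for two small groups.
group_a = [12.1, 15.3, 9.8, 14.0]
group_b = [18.2, 16.5, 20.1, 15.3, 17.7]
w, z, p = rank_sum_test(group_a, group_b)
print(w, round(z, 3), round(p, 3))
```

For samples this small, statistical software would typically use exact tables rather than the normal approximation, so the p-value here is only a rough guide.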

What is a Hypothesis Test for 2 Samples?

Searching the internet for a definition of hypothesis testing for 2 samples brings back a lot of different results. Most of them are a little different. The definitions you will find online usually are disjointed, covering hypothesis testing for independent means, paired means, and proportions. Instead of giving one uniform definition, we’ll take a look at key components that are common to all of the tests, and then some of the specific components and notation.

The Basic Idea

The appearance of these hypothesis tests (in the real world) will be very similar to the tests that we see with one sample. In fact, the examples of hypothesis tests that were in the previous introduction include tests for one sample as well as two samples. The basic structure of these hypothesis tests is very similar to the ones we saw before. You have a problem, a hypothesis, data collection, some computations, and results or conclusions. Some of the notation will be slightly different. The examples below are the same ones we presented in the previous introduction, but here we are highlighting the two-sample variations. The examples with bolded terms are the ones that use 2 samples.

Some Examples of Hypothesis Tests

Example 1: Agility testing in youth football (soccer) players; evaluating reliability, validity, and correlates of newly developed testing protocols.

Reactive agility (RAG) and change of direction speed (CODS) were analyzed in 13U and 15U youth soccer players. “Independent samples t-test indicated significant differences between U13 and U15 in S10 (t-test: 3.57, p < 0.001), S20M (t-test: 3.13, p < 0.001), 20Y (t-test: 4.89, p < 0.001), FS_RAG (t-test: 3.96, p < 0.001), and FS_CODS (t-test: 6.42, p < 0.001), with better performance in U15. Starters outperformed non-starters in most capacities among U13, but only in FS_RAG among U15 (t-test: 1.56, p < 0.05).”

Most of this might seem like gibberish for now, but essentially the two groups were analyzed and compared, with significant differences observed between the groups. This is a hypothesis test for 2 means, independent samples.

Source: https://pubmed.ncbi.nlm.nih.gov/31906269/

Example 2: Manual therapy in the treatment of carpal tunnel syndrome in diabetic patients: A randomized clinical trial

Thirty diabetic patients with carpal tunnel syndrome were split up into two groups. One received physiotherapy modality and the other received manual therapy. “Paired t-test revealed that all of the outcome measures had a significant change in the manual therapy group, whereas only the VAS and SSS changed significantly in the modality group at the end of 4 weeks. Independent t-test showed that the variables of SSS, FSS and MNT in the manual therapy group improved significantly greater than the modality group.”

This is a hypothesis test for matched pairs, sometimes known as 2 means, dependent samples.

Source: https://pubmed.ncbi.nlm.nih.gov/30197774/

Example 3: Omega-3 fatty acids decreased irritability of patients with bipolar disorder in an add-on, open label study

“The initial mean was 63.51 (SD 34.17), indicating that on average, subjects were irritable for about six of the previous ten days. The mean for the last recorded percentage was less than half of the initial score: 30.27 (SD 34.03). The decrease was found to be statistically significant using a paired sample t-test (t = 4.36, 36 df, p < .001).”

Source: https://nutritionj.biomedcentral.com/articles/10.1186/1475-2891-4-6

Example 4: Evaluating the Efficacy of COVID-19 Vaccines

“We reduced all values of vaccine efficacy by 30% to reflect the waning of vaccine efficacy against each endpoint over time. We tested the null hypothesis that the vaccine efficacy is 0% versus the alternative hypothesis that the vaccine efficacy is greater than 0% at the nominal significance level of 2.5%.”

Source: https://www.medrxiv.org/content/10.1101/2020.10.02.20205906v2.full

Example 5: Social Isolation During COVID-19 Pandemic. Perceived Stress and Containment Measures Compliance Among Polish and Italian Residents

“The Polish group had a higher stress level than the Italian group (mean PSS-10 total score 22,14 vs 17,01, respectively; p < 0.01). There was a greater prevalence of chronic diseases among Polish respondents. Italian subjects expressed more concern about their health, as well as about their future employment. Italian subjects did not comply with suggested restrictions as much as Polish subjects and were less eager to restrain from their usual activities (social, physical, and religious), which were more often perceived as “most needed matters” in Italian than in Polish residents.”

Even though the test wording itself does not explicitly state the tests we will study, this is a comparison of means from two different groups, so this is a test for two means, independent samples.

Source: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.673514/full

Example 6: A Comparative Analysis of Student Performance in an Online vs. Face-to-Face Environmental Science Course From 2009 to 2016

“The independent sample t-test showed no significant difference in student performance between online and F2F learners with respect to gender [t(145) = 1.42, p = 0.122].”

Once again, a test of 2 means, independent samples.

Source: https://www.frontiersin.org/articles/10.3389/fcomp.2019.00007/full

But what does it all mean?

That’s what comes next. The examples above span a variety of different types of hypothesis tests. Within this chapter we will take a look at some of the terminology, formulas, and concepts related to Hypothesis Testing for 2 Samples.

Key Terminology and Formulas

Hypothesis: This is a claim or statement about a population, usually focusing on a parameter such as a proportion (%), mean, standard deviation, or variance. We will be focusing primarily on the proportion and the mean.

Hypothesis Test: Also known as a Significance Test or Test of Significance, the hypothesis test is the collection of procedures we use to test a claim about a population.

Null Hypothesis: This is a statement that the population parameter (such as the proportion, mean, standard deviation, or variance) is equal to some value. In simpler terms, the Null Hypothesis is a statement that “nothing is different from what usually happens.” The Null Hypothesis is usually denoted by [latex]H_{0}[/latex], followed by other symbols and notation that describe how the parameter from one population or group is the same as the parameter from another population or group.

Alternative Hypothesis: This is a statement that the population parameter (such as the proportion, mean, standard deviation, or variance) is somehow different from the value involved in the Null Hypothesis. For our examples, “somehow different” will involve the use of [latex]<[/latex], [latex]>[/latex], or [latex]\neq[/latex]. In simpler terms, the Alternative Hypothesis is a statement that “something is different from what usually happens.” The Alternative Hypothesis is usually denoted by [latex]H_{1}[/latex], [latex]H_{A}[/latex], or [latex]H_{a}[/latex], followed by other symbols and notation that describe how the parameter from one population or group is different from the parameter from another population or group.

Significance Level: We previously learned about the significance level as the “left over” stuff from the confidence level. This is still true, but we will now focus more on the significance level as its own value, and we will use the symbol alpha, [latex]\alpha[/latex]. This looks like a lowercase “a,” or a drawing of a little fish. The significance level [latex]\alpha[/latex] is the probability of rejecting the null hypothesis when it is actually true (more on what this means in the next section). The common values are still similar to what we had previously, 1%, 5%, and 10%. We commonly write these as decimals instead, 0.01, 0.05, and 0.10.

Test Statistic:  One of the key components of a hypothesis test is what we call a  test statistic . This is a calculation, sort of like a z-score, that is specific to the type of test being conducted. The idea behind a test statistic, relating it back to science projects, would be like calculations from measurements that were taken. In this chapter we will address the test statistic for 2 proportions, 2 means (independent samples), and matched pairs (2 means from dependent samples). The formulas are listed in the table below:

What the different symbols mean:

Critical Region: The critical region , also known as the rejection region , is the area in the normal (or other) distribution in which we reject the null hypothesis. Think of the critical region  like a target area that you are aiming for. If we are able to get a value in this region, it means we have evidence for the claim.

Critical Value: These are like special z-scores for us; the critical value (or values, sometimes there are two) separates the critical region from the rest of the distribution. The rest of the distribution is the non-target part, or what we are not aiming for. If our value is in this non-target region, we do not have evidence for the claim.

P-Value: This is a special value that we compute. If we assume the null hypothesis is true, the p-value represents the probability that a test statistic is at least as extreme as the one we computed from our sample data; for us the test statistics would be either [latex]z[/latex] or [latex]t[/latex].

Decision Rule for Hypothesis Testing: There are a few ways we can arrive at our decision with a hypothesis test. We can arrive at our conclusion by using confidence intervals, critical values (also known as the traditional method), or p-values. Relating this to a science project, the decision rule would be what we take into consideration to arrive at our conclusion. When we make our decision, the wording will sound a little strange. We’ll say things like “we have enough evidence to reject the null hypothesis” or “there is insufficient evidence to reject the null hypothesis.”

Decision Rule with Critical Values:  If the test statistic is in the critical region, we have enough evidence to reject the null hypothesis. We can also say we have sufficient evidence to support the claim. If the test statistic is not in the critical region, we fail to reject the null hypothesis. We can also say we do not have sufficient evidence to support the claim.

Decision Rule with P-Values: If the p-value is less than or equal to the significance level, we have enough evidence to reject the null hypothesis. We can also say we have sufficient evidence to support the claim. If the p-value is greater than the significance level, we fail to reject the null hypothesis. We can also say we do not have sufficient evidence to support the claim.
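As a rough illustration of the p-value decision rule, here is a minimal Python sketch. It assumes the test statistic is a z-score and uses the standard normal distribution from Python's standard library; the function names `p_value_from_z` and `decide`, and the example statistic 2.1, are made up for this illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two"):
    """Return the p-value for a z test statistic.

    tail is "left", "right", or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1.0 - std_normal.cdf(z)
    if tail == "two":
        # Both tails count as "at least as extreme".
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject when p-value <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical test statistic, for illustration only.
p = p_value_from_z(2.1, tail="two")
print(round(p, 4), "->", decide(p, alpha=0.05))
```

The same `decide` function works for a t statistic once its p-value has been obtained from a t distribution, which this sketch does not cover.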

More About Hypotheses

Writing the Null and Alternative Hypothesis can be tricky. Here are a few examples of claims followed by the respective hypotheses:

Basic Statistics Copyright © by Allyn Leon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Introduction to Data Science I & II

Two sample testing

In many applications there is an interest in comparing two random samples; for example, investigating differences in cholesterol levels between two groups of patients. This is often done using a hypothesis test, hence the name “two sample testing”. This is also called A/B testing.

The natural hypotheses for this situation are:

\(H_0\) : the two samples are generated from the same distribution.

\(H_A\) : the two samples are generated from two different distributions.

The test statistic is normally based on the difference in a specified sample summary; for example, difference in means, or medians, or standard deviations (if we expect the sample to differ in their variability).

We illustrate this with a classic diabetes dataset from the National Institute of Diabetes and Digestive and Kidney Diseases. The subjects of this dataset are females at least 21 years old, and the goal was to predict diabetes status that is summarized in the column called “Outcome”.

We will focus in this example on BMI. Below are boxplots for the two diabetes status groups.

[Figure: boxplots of BMI for the two diabetes status groups]

There are several observations from the above plots:

The distributions of BMI in the two groups seem different; for example, the median BMI is larger in diabetics.

There are some subjects for which the recorded value for BMI is equal to 0; this suggests that missing data were recorded as 0 and we will have to take that into account in our analysis.

Below, we create two arrays that contain the BMI values in the two groups after removing the missing data.

We have two samples here of size \(n_0=491\) and \(n_1=266\) , and the null hypothesis we investigate is: BMI distributions in diabetics and non-diabetics subjects are the same.

The test statistic we will use is the difference in sample medians, with an observed value of 4.2:
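The two steps above (dropping the zero-coded missing values, then taking the difference in medians) might look roughly like the following sketch. The ten BMI values and outcome labels are made up for illustration and are not the real dataset, so the printed difference will not match the 4.2 reported above; the variable names are also assumptions.

```python
import numpy as np

# Made-up illustrative values, NOT the real diabetes dataset.
bmi = np.array([31.2, 24.8, 0.0, 27.5, 40.1, 23.9, 33.4, 29.0, 0.0, 35.6])
outcome = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1])  # 1 = diabetic, 0 = not

# A recorded BMI of 0 is treated as missing and dropped.
valid = bmi > 0
bmi_nondiabetic = bmi[valid & (outcome == 0)]
bmi_diabetic = bmi[valid & (outcome == 1)]

# Test statistic: difference in sample medians between the two groups.
observed_diff = np.median(bmi_diabetic) - np.median(bmi_nondiabetic)
print(len(bmi_nondiabetic), len(bmi_diabetic), observed_diff)
```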

The next step is to obtain an approximation for the sampling distribution of our test statistic. The procedure we implement, called a permutation test, uses the following observations:

If the null hypothesis is true: a BMI value is equally likely to be sampled from diabetics and non-diabetics

If the null hypothesis is true: all rearrangements (permutations) of BMI values among the two groups are equally likely

If the null hypothesis is true: the observed test statistic can be viewed as a sample from the distribution of median differences of permuted BMI values in two groups.

It suggests the following simulation to learn the null distribution for the test statistic:

Shuffle (permute) the BMI values

Assign \(n_0=491\) to “Group A“ and the rest to “Group B“ (to maintain the two sample sizes)

Find the differences between medians of the two shuffled (permuted) groups

The generated distribution and the value of the test statistic are used to calculate a p-value.

We first illustrate how to create shuffled samples and calculate the corresponding test statistic. We use the numpy function random.permutation to create an array that has the same values but with order that is shuffled: the first part of the new array will correspond to the control group.

In the code cell below, we repeat the procedure 5000 times and create an approximation for the distribution of our test statistic that is saved in the array differences .
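A minimal sketch of such a permutation test is given here. It is not the original notebook code: the function name `permutation_test`, the use of NumPy's `Generator.permutation` (rather than the legacy `numpy.random.permutation`), the two-sided p-value definition, and the synthetic stand-in data are all assumptions made for illustration.

```python
import numpy as np


def permutation_test(sample_a, sample_b, statistic=np.median,
                     n_permutations=5000, seed=0):
    """Approximate the null distribution of statistic(b) - statistic(a)."""
    rng = np.random.default_rng(seed)
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    n_a = len(sample_a)
    pooled = np.concatenate([sample_a, sample_b])

    observed = statistic(sample_b) - statistic(sample_a)

    differences = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(pooled)   # same values, shuffled order
        group_a = shuffled[:n_a]             # first n_a values: "Group A"
        group_b = shuffled[n_a:]             # the rest: "Group B"
        differences[i] = statistic(group_b) - statistic(group_a)

    # Two-sided p-value: share of shuffles at least as extreme as observed.
    p_value = np.mean(np.abs(differences) >= abs(observed))
    return observed, differences, p_value


# Synthetic stand-in data (not the BMI data from the text).
data_rng = np.random.default_rng(1)
group_0 = data_rng.normal(loc=30.0, scale=6.0, size=60)
group_1 = data_rng.normal(loc=40.0, scale=6.0, size=40)

observed, differences, p_value = permutation_test(group_0, group_1)
print(observed, p_value)
```

Passing `statistic=np.std` instead of the default `np.median` repeats the same procedure with the difference in standard deviations as the test statistic.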

[Figure: histogram of the permuted median differences]

From the above histogram, we can see that there is strong evidence against the null hypothesis that the distributions of BMI in cases and controls are the same.

Please note that the choice of test statistic could have a big impact on the conclusions from the test. Below, we repeat the procedure using as test statistic the difference in standard deviations of the two samples. There is no evidence, when using this statistic, that the distributions are different.

[Figure: histogram of the permuted differences in standard deviations]

Module 10: Hypothesis Testing With Two Samples

Why It Matters: Hypothesis Testing with Two Samples

The concepts discussed in the module Hypothesis Testing with One Sample can be applied to situations involving two samples. The reason we can do this is due to the following big ideas:

  • Random samples vary. When we use a sample proportion or sample mean to make an inference about a population proportion or population mean, there is uncertainty. For this reason, inference involves probability.
  • Under certain conditions, we can model the variability in sample proportions or sample means with a normal curve. We use the normal curve to make probability-based decisions about population values.
  • We can estimate a population proportion or a population mean with a confidence interval. The confidence interval is an actual sample proportion or sample mean plus or minus a margin of error. We state our confidence in the accuracy of these intervals using probability.
  • We can test a hypothesis about a population proportion or population mean using a sample proportion or a sample mean. Again, we base our conclusion on probability using a P-value. The P-value describes the strength of our evidence in rejecting a hypothesis about the population.

In Hypothesis Testing with Two Samples , we extend these big ideas to make inferences that compare two populations (or two treatments). An example of such an inference follows:

The Abecedarian Early Intervention Project

In the 1970s, the Abecedarian Early Intervention Project studied the long-term effects of early childhood education for poor children.

Research question: Does early childhood education increase the likelihood of college attendance for poor children?

  • Produce Data: Determine what to measure, then collect the data.  In this experiment, researchers selected 111 high-risk infants on the basis of the mothers’ education, family income, and other factors. They randomly assigned 57 infants to receive 5 years of high-quality preschool. The remaining 54 infants were a control group. All children received nutritional supplements, social services, and health care to control the effects of these confounding factors on the outcomes of the experiment.
  • Exploratory Data Analysis: Analyze and summarize the data . By the age of 21 a much higher percentage of the treatment group enrolled in college, 42% vs. 20%.
  • Draw a Conclusion: Use data, probability, and statistical inference to draw a conclusion about the populations . Is this difference statistically significant? In other words, is this difference due to the pre-school experience or due to chance? We will test the claim that a larger proportion of children who attend pre-school will attend college.
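To make the question "is this difference statistically significant?" concrete, here is a hedged sketch of a right-tailed two-proportion z-test using the figures quoted above (57 and 54 children, 42% and 20%). The pooled standard error and the normal approximation are assumptions of this sketch, the percentages are rounded, and the function name `two_proportion_z` is made up; the module's own analysis may differ in detail.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Right-tailed two-proportion z-test using a pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Upper-tail area of the standard normal distribution.
    p_right = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_right


# Treatment group: 42% of 57; control group: 20% of 54 (rounded figures).
z, p_value = two_proportion_z(0.42, 57, 0.20, 54)
print(round(z, 2), round(p_value, 4))
```

A small p-value here would count as evidence for the claim that a larger proportion of children who attend pre-school go on to attend college.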

The following figure summarizes this investigation in the Big Picture.

The Big Picture applied to the Abecedarian Early Intervention Project

  • Provided by : Lumen Learning. License : CC BY: Attribution
  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution


BUS204: Business Statistics


Hypothesis Testing with Two Samples

Read this chapter, which discusses how to compare data from two similar groups. This is useful when, for example, you want to analyze things like how someone's income relates to another sample that you are interested in. Make sure you read the introduction as well as sections 10.1 through 10.6. Attempt the practice problems and homework at the end of the chapter.

Matched or Paired Samples

  • Simple random sampling is used.
  • Sample sizes are often small.
  • Two measurements (samples) are drawn from the same pair of individuals or objects.
  • Differences are calculated from the matched or paired samples.
  • The differences form the sample that is used for the hypothesis test.
  • Either the matched pairs have differences that come from a population that is normal or the number of differences is sufficiently large so that distribution of the sample mean of differences is approximately normal.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean for the differences, μd, is then tested using a Student's-t test for a single population mean with n – 1 degrees of freedom, where n is the number of differences, that is, the number of pairs, not the number of observations.

The null and alternative hypotheses for this test are:

The test statistic is:
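In standard notation, one common way to write the hypotheses and the test statistic for a matched-pairs test is sketched below. The symbols are assumptions of this sketch: \(\bar{x}_d\) is the mean of the sample differences, \(s_d\) is their standard deviation, and \(n\) is the number of pairs. The direction of the alternative depends on the claim being tested.

```latex
% Hypotheses for the population mean of the differences, \mu_d
H_0: \mu_d = 0
\qquad
H_a: \mu_d \neq 0 \quad (\text{or } \mu_d < 0, \text{ or } \mu_d > 0)

% Test statistic, with n - 1 degrees of freedom (\mu_d = 0 under H_0)
t = \frac{\bar{x}_d - \mu_d}{s_d / \sqrt{n}}
```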

Example 10.9

Problem A company has developed a training program for its entering employees because it has become concerned with the results of the six-month employee review. The company hopes that the training program can result in better six-month reviews. Each trainee constitutes a "pair": the score the employee received when first entering the firm and the score given at the six-month review. The difference between the two scores was calculated for each employee, and the means before and after the training program were calculated. The sample mean before the training program was 20.4 and the sample mean after the training program was 23.9. The standard deviation of the differences in the two scores across the 20 employees was 3.8 points. Test at the 10% significance level the null hypothesis that the two population means are equal against the alternative that the training program helps improve the employees' scores.

Solution 1 The first step is to identify this as a two sample case: before the training and after the training. This differentiates this problem from simple one sample issues. Second, we determine that the two samples are "paired". Each observation in the first sample has a paired observation in the second sample. This information tells us that the null and alternative hypotheses should be:

This form reflects the implied claim that the training course improves scores; the test is one-tailed and the claim is in the alternative hypothesis. Because the experiment was conducted as a matched paired sample rather than simply taking scores from people who took the training course and those who didn't, we use the matched pair test statistic:

In order to solve this equation, the individual scores, pre-training course and post-training course need to be used to calculate the individual differences. These scores are then averaged and the average difference is calculated:

From these differences we can calculate the standard deviation across the individual differences:

We can now compare the calculated value of the test statistic, 4.12, with the critical value. The critical value is a Student's t with degrees of freedom equal to the number of pairs, not observations, minus 1. In this case there are 20 pairs, so df = 20 - 1 = 19. For a one-tailed test at the 10% significance level the critical value is 1.328; even the stricter value of 1.729 (5% in one tail) is exceeded. The calculated test statistic is well into the tail of the distribution, and thus we reject the null hypothesis that there is no difference from the training program. The evidence seems to indicate that the training helps employees gain higher scores.
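The arithmetic behind the test statistic can be checked with a few lines of Python. This sketch uses only the summary numbers given in the problem and relies on the fact that the mean of the paired differences equals the difference of the two sample means. The variable names are made up, and the two critical values are standard t-table entries for 19 degrees of freedom, typed in by hand rather than computed.

```python
import math

mean_before = 20.4   # sample mean at entry
mean_after = 23.9    # sample mean at the six-month review
sd_diff = 3.8        # standard deviation of the paired differences
n_pairs = 20         # number of employees (pairs)

# Mean of the differences = difference of the means for paired data.
mean_diff = mean_after - mean_before
standard_error = sd_diff / math.sqrt(n_pairs)
t_stat = mean_diff / standard_error
df = n_pairs - 1

# t-table values for 19 degrees of freedom (upper-tail areas).
t_crit_10 = 1.328    # 10% in one tail
t_crit_05 = 1.729    # 5% in one tail

print(round(t_stat, 2), df, t_stat > t_crit_10, t_stat > t_crit_05)
```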

Example 10.10

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in Table 10.5. A lower score indicates less pain. The "before" value is matched to an "after" value and the differences are calculated. Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.


10.5 Hypothesis Testing for Two Means and Two Proportions


Student Learning Outcomes

  • The student will select the appropriate distributions to use in each case.
  • The student will conduct hypothesis tests and interpret the results.

Supplies

  • The business section from two consecutive days’ newspapers
  • Three small packages of multicolored chocolates
  • Five small packages of peanut butter candies

Increasing Stocks Survey Look at yesterday’s newspaper business section. Conduct a hypothesis test to determine if the proportion of New York Stock Exchange (NYSE) stocks that increased is greater than the proportion of NASDAQ stocks that increased. As randomly as possible, choose 40 NYSE stocks and 32 NASDAQ stocks and complete the following statements.

  • H 0 : _________
  • H a : _________
  • In words, define the random variable.
  • The distribution to use for the test is _____________.
  • Calculate the test statistic using your data.
  • Calculate the p value.
  • Do you reject or not reject the null hypothesis? Why?
  • Write a clear conclusion using a complete sentence.

Decreasing Stocks Survey Randomly pick eight stocks from the newspaper. Using two consecutive days’ business sections, test whether the stocks went down, on average, for the second day.

  • H 0 : ________
  • H a : ________
  • Calculate the p value:

Candy Survey Buy three small packages of multicolored chocolates and five small packages of peanut butter candies (same net weight as the multicolored chocolates). Test whether the mean number of candy pieces per package is the same for the two brands.

  • What distribution should be used for this test?

Shoe Survey Test whether women have, on average, more pairs of shoes than men. Include all forms of sneakers, shoes, sandals, and boots. Use your class as the sample.

  • The distribution to use for the test is ________________.


This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/10-5-hypothesis-testing-for-two-means-and-two-proportions

© Jan 23, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.

What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution . It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistically significant difference between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), also called the significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%.

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formula for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the mean of two samples. It is used to compute the z test statistic. The formulas are given as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).
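The two-sample z and t formulas above share the same structure, so a single helper can illustrate both. This is a sketch under stated assumptions: the function name `two_sample_statistic` and the summary statistics in the example are hypothetical, and the degrees of freedom needed to look up a t critical value are not covered here.

```python
import math


def two_sample_statistic(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """Two-sample test statistic from summary data.

    With population standard deviations this is the z statistic;
    with sample standard deviations it is the t statistic.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Hypothetical summary statistics, for illustration only.
t_stat = two_sample_statistic(mean1=52.0, mean2=48.0,
                              sd1=6.0, sd2=8.0, n1=16, n2=16)
print(t_stat)
```

Under a null hypothesis of equal means, `hypothesized_diff` stays at its default of 0, which corresponds to \(\mu_{1}-\mu_{2}\) = 0 in the formulas.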

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value, then the null hypothesis is rejected.

[Figure: Right Tail Hypothesis Testing]

Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value less than the critical value.

[Figure: Left Tail Hypothesis Testing]

Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional hypothesis testing method. The two-tailed test is used to determine whether the population parameter is different from some value, in either direction. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the test statistic falls in either tail, that is, if its absolute value is greater than the critical value.

[Figure: Two Tail Hypothesis Testing]

Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t or \(\chi^{2}\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg, with a population standard deviation of 15 kg. A sample of 30 men has an average weight of 112.5 kg. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence level is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. This is because the sample size is 30. Furthermore, the sample and population means are known along with the standard deviation.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645, the null hypothesis can be rejected.
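The five steps of this example can be reproduced with Python's standard library. This is a sketch for checking the arithmetic: the variable names are made up, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

mu_0 = 100.0    # value in the null hypothesis
x_bar = 112.5   # sample mean
sigma = 15.0    # population standard deviation
n = 30          # sample size
alpha = 0.05    # significance level (100% - 95%)

std_normal = NormalDist()

# Step 3: right-tailed critical value, area 1 - alpha to its left.
z_crit = std_normal.inv_cdf(1 - alpha)

# Step 4: z test statistic and right-tailed p-value.
z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))
p_value = 1 - std_normal.cdf(z_stat)

# Step 5: decision.
reject = z_stat > z_crit
print(round(z_crit, 3), round(z_stat, 2), reject)
```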

Hypothesis Testing and Confidence Intervals

Confidence levels are closely tied to hypothesis testing because the alpha level can be determined from a given confidence level. Suppose the confidence level is given as 95%. Subtract it from 100%. This gives 100 - 95 = 5% or 0.05. This is the alpha value, and in a one-tailed test all of it lies in a single tail. In a two-tailed test, alpha is split between the two rejection regions, so each tail has an area of 0.05 / 2 = 0.025.
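A short sketch of how the tail areas translate into z critical values follows. The helper name `critical_values` is hypothetical, and the sketch assumes a z test (standard normal distribution); a t test would need the t distribution instead.

```python
from statistics import NormalDist


def critical_values(confidence_level, two_tailed):
    """Return z critical value(s) for a given confidence level."""
    alpha = 1 - confidence_level
    std_normal = NormalDist()
    if two_tailed:
        # Each tail has area alpha / 2.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    # One tail (right-tailed shown) has the whole area alpha.
    return (std_normal.inv_cdf(1 - alpha),)


print(critical_values(0.95, two_tailed=False))
print(critical_values(0.95, two_tailed=True))
```

For a left-tailed test, the single critical value is the negative of the right-tailed one, by the symmetry of the normal curve.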


Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90 lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells has an average weight of 110 lbs and a standard deviation of 18 lbs. Using hypothesis testing, check if the physical trainer's claim can be supported at a 95% confidence level.
    Solution: As the sample size is less than 30 and only the sample standard deviation is known, the t test is used.
    \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90
    \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18, \(\alpha\) = 0.05
    Using the t-distribution table with n - 1 = 4 degrees of freedom, the critical value is 2.132.
    t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{110-90}{\frac{18}{\sqrt{5}}}\) = 2.484
    As 2.484 > 2.132, the null hypothesis is rejected.
    Answer: The average weight of the dumbbells may be greater than 90 lbs.
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced, it is believed that this score will change. In a random sample of 36 students, the mean score was found to be 88. With a 0.05 significance level, is there any evidence to support this claim?
    Solution: This is an example of two-tail hypothesis testing. The z test will be used.
    \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80
    \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10
    \(\alpha\) = 0.05, so each tail has an area of 0.05 / 2 = 0.025.
    The critical value using the normal distribution table is 1.96.
    z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8
    As 4.8 > 1.96, the null hypothesis is rejected.
    Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level, use hypothesis testing to check if this claim is true.
    Solution: The t test will be used.
    \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90
    \(\overline{x}\) = 82, \(\mu\) = 90, n = 6, s = 18
    The critical value from the t table with n - 1 = 5 degrees of freedom is -2.015.
    t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) = -1.088
    As -1.088 > -2.015, we fail to reject the null hypothesis.
    Answer: There is not enough evidence to support the claim.
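The test statistics in the three examples above can be verified with a short script. The helper name `one_sample_statistic` is made up for this sketch; the same formula gives a z statistic when the population standard deviation is used and a t statistic when the sample standard deviation is used.

```python
import math


def one_sample_statistic(x_bar, mu_0, sd, n):
    """One-sample test statistic: (x_bar - mu_0) / (sd / sqrt(n))."""
    return (x_bar - mu_0) / (sd / math.sqrt(n))


t1 = one_sample_statistic(110, 90, 18, 5)   # Example 1 (t test)
z2 = one_sample_statistic(88, 80, 10, 36)   # Example 2 (z test)
t3 = one_sample_statistic(82, 90, 18, 6)    # Example 3 (t test)

# Compare with the critical values quoted in the examples.
print(t1 > 2.132, abs(z2) > 1.96, t3 < -2.015)
```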


FAQs on Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data . The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the test statistic follows a Student's t distribution. It is used when the sample size is less than 30 and the standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).
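
As a minimal sketch of the two sample formula, the function below computes z from summary statistics. The name `two_sample_z` and the input numbers are hypothetical and chosen only for illustration; they do not come from any example on this page.

```python
import math


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Two sample z statistic with known population standard deviations.

    hypothesized_diff is (mu1 - mu2) under the null hypothesis,
    which is 0 when testing that the two means are equal.
    """
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Hypothetical summary statistics for two independent samples
z_two = two_sample_z(mean1=52, mean2=50, sigma1=4, sigma2=3, n1=64, n2=36)
print(f"z = {z_two:.3f}")
```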

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis is rejected when the p value is less than or equal to the alpha level and is not rejected otherwise.
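
The comparison between the p value and the alpha level can be sketched as follows for a two tail z test. The helper names `two_tailed_p_value` and `decide` are assumptions for illustration, and the standard normal distribution comes from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability of a statistic at least as extreme as z in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Compare the p value with the alpha level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


p = two_tailed_p_value(4.8)  # z = 4.8 is the statistic from Example 2
print(decide(p))
```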

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

To get the alpha level for each tail in two tail hypothesis testing, divide \(\alpha\) by 2. This is done because there are two rejection regions, one in each tail of the curve.
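
To see how dividing \(\alpha\) by 2 changes the critical value, here is a small sketch for the z test. The function name `z_critical` is an assumption for illustration; it uses the inverse CDF of the standard normal distribution from Python's standard library.

```python
from statistics import NormalDist


def z_critical(alpha, tails=2):
    """Positive critical z value for a one tail or two tail test.

    For a two tail test, alpha is split evenly between the two
    rejection regions, so each tail gets alpha / 2.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


two_tail = z_critical(0.05, tails=2)  # each tail holds 0.025
one_tail = z_critical(0.05, tails=1)  # the single tail holds 0.05
print(f"two tail: {two_tail:.3f}, one tail: {one_tail:.3f}")
```

For a left tail test the critical value is the negative of the one tail value.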
