Module 10: Hypothesis Testing With Two Samples

Putting It Together: Hypothesis Testing with Two Samples

Let’s summarize.

  • The steps for performing a hypothesis test for two population means with unknown standard deviations are generally the same as the steps for conducting a hypothesis test for one population mean with unknown standard deviation, using a t-distribution.
  • Because the population standard deviations are not known, the sample standard deviations are used for calculations.
  • When the sum of the sample sizes is more than 30, a normal distribution can be used to approximate the Student’s t-distribution.
  • The difference of two proportions is approximately normal if there are at least five successes and five failures in each sample.
  • When conducting a hypothesis test for a difference of two proportions, the random samples must be independent and the population must be at least ten times the sample size.
  • When calculating the standard error for the difference in sample proportions, the pooled proportion must be used.
  • When two measurements (samples) are drawn from the same pair of individuals or objects, the differences from the sample are used to conduct the hypothesis test.
  • The distribution that is used to conduct the hypothesis test on the differences is a t-distribution.
  • Provided by : Lumen Learning. License : CC BY: Attribution
  • Introductory Statistics. Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : https://openstax.org/books/introductory-statistics/pages/1-introduction . License : CC BY: Attribution . License Terms : Access for free at https://openstax.org/books/introductory-statistics/pages/1-introduction


Introduction to Data Science I & II

Two sample testing

In many applications there is an interest in comparing two random samples; for example, investigating differences in cholesterol levels between two groups of patients. This is often done using a hypothesis test, hence the name “two sample testing”. It is also called A/B testing.

The natural hypotheses for this situation are:

\(H_0\) : the two samples are generated from the same distribution.

\(H_A\) : the two samples are generated from two different distributions.

The test statistic is normally based on the difference in a specified sample summary; for example, the difference in means, medians, or standard deviations (if we expect the samples to differ in their variability).

We illustrate this with a classic diabetes dataset from the National Institute of Diabetes and Digestive and Kidney Diseases. The subjects of this dataset are females at least 21 years old, and the goal was to predict diabetes status that is summarized in the column called “Outcome”.

We will focus in this example on BMI. Below are boxplots for the two diabetes status groups.

[Figure: boxplots of BMI for the two diabetes status groups]

There are several observations from the above plots:

The distributions of BMI in the two groups seem different; for example, the median BMI is larger in diabetics.

There are some subjects for which the recorded value for BMI is equal to 0; this suggests that missing data were recorded as 0 and we will have to take that into account in our analysis.

Below, we create two arrays that contain the BMI values in the two groups after removing the missing data.
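The original code cell is not reproduced here. A minimal sketch of this filtering step, using a small made-up list of records rather than the real dataset, might look like the following. The values and the names `records`, `bmi_non_diabetic`, and `bmi_diabetic` are illustrative assumptions, as is the coding of Outcome as 0/1.

```python
# Toy records standing in for the dataset: (BMI, Outcome) pairs.
# Outcome is assumed to be 0 for non-diabetic and 1 for diabetic.
records = [
    (33.6, 1), (26.6, 0), (0.0, 0), (28.1, 0),
    (43.1, 1), (0.0, 1), (25.6, 0), (31.0, 1),
]

# A BMI of 0 is treated as a missing value and dropped.
valid = [(bmi, outcome) for bmi, outcome in records if bmi > 0]

# Split the remaining BMI values by diabetes status.
bmi_non_diabetic = [bmi for bmi, outcome in valid if outcome == 0]
bmi_diabetic = [bmi for bmi, outcome in valid if outcome == 1]

print(len(bmi_non_diabetic), len(bmi_diabetic))
```

Dropping the zeros before splitting keeps the missing-value rule in one place, so both groups are cleaned the same way.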

We have two samples here of size \(n_0=491\) and \(n_1=266\), and the null hypothesis we investigate is: the BMI distributions in diabetic and non-diabetic subjects are the same.

The test statistic we will use is the difference in sample medians, with an observed value of 4.2:

The next step is to obtain an approximation for the sampling distribution of our test statistic. The procedure we implement, called a permutation test, uses the following observations:

If the null hypothesis is true: a BMI value is equally likely to be sampled from diabetics and non-diabetics

If the null hypothesis is true: all rearrangements (permutations) of BMI values among the two groups are equally likely

If the null hypothesis is true: the observed test statistic can be viewed as a sample from the distribution of median differences of permuted BMI values in two groups.

It suggests the following simulation to learn the null distribution for the test statistic:

Shuffle (permute) the BMI values

Assign \(n_0=491\) to “Group A“ and the rest to “Group B“ (to maintain the two sample sizes)

Find the differences between medians of the two shuffled (permuted) groups

The generated distribution and the value of the test statistic are used to calculate a p-value.

We first illustrate how to create shuffled samples and calculate the corresponding test statistic. We use the numpy function random.permutation to create an array that has the same values but with order that is shuffled: the first part of the new array will correspond to the control group.

In the code cell below, we repeat the procedure 5000 times and create an approximation for the distribution of our test statistic that is saved in the array differences.
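Since the original cell is missing from this text, here is a hedged sketch of the whole procedure: shuffle, split, recompute the statistic, and turn the simulated distribution into a p-value. It uses only the standard library, with `random.sample` standing in for numpy's `random.permutation`. The function name `permutation_test`, the one-sided p-value, and the BMI values are assumptions made for illustration.

```python
import random
from statistics import median


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Approximate the null distribution of the difference in medians.

    Returns the observed difference (group_b minus group_a), the list of
    permuted differences, and a one-sided p-value.
    """
    rng = random.Random(seed)
    observed = median(group_b) - median(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    differences = []
    for _ in range(n_permutations):
        # Sampling every element without replacement gives a shuffled copy.
        shuffled = rng.sample(pooled, k=len(pooled))
        # The first n_a values form "Group A", the rest "Group B".
        perm_a, perm_b = shuffled[:n_a], shuffled[n_a:]
        differences.append(median(perm_b) - median(perm_a))

    # Share of permuted differences at least as large as the observed one.
    p_value = sum(d >= observed for d in differences) / n_permutations
    return observed, differences, p_value


# Hypothetical BMI values, not the real data.
non_diabetic = [22.1, 24.5, 25.6, 26.6, 27.3, 28.1, 29.0, 30.2]
diabetic = [30.5, 31.0, 32.9, 33.6, 35.4, 38.2, 43.1]

observed, differences, p_value = permutation_test(non_diabetic, diabetic)
print(observed, p_value)
```

Swapping `median` for another summary, such as the standard deviation, changes the test statistic without changing the rest of the procedure.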

[Figure: histogram of the simulated differences in medians]

From the above histogram, we can see that there is strong evidence against the null hypothesis that the distributions of BMI in cases and controls are the same.

Please note that the choice of test statistic could have a big impact on the conclusions from the test. Below, we repeat the procedure using as test statistic the difference in standard deviations of the two samples. There is no evidence, when using this statistic, that the distributions are different.

[Figure: histogram of the simulated differences in standard deviations]

Introduction to Statistics and Data Science

Chapter 15 Hypothesis Testing: Two Sample Tests

15.1 Two Sample t test

We can also use the t test command to conduct a hypothesis test on data where we have samples from two populations. To introduce this, let’s consider an example from sports analytics: the NBA draft and the value of a lottery pick in the draft. Teams which do not make the playoffs are entered into a lottery to determine the order of the top picks in the draft for the following year. These top 14 picks are called lottery picks.

Using historical data we might want to investigate the value of a lottery pick against those players who were selected outside the lottery.

We can now make a boxplot comparing the career scoring averages of lottery picks and non-lottery picks.

[Figure: boxplot of career points per game for lottery and non-lottery picks]

From this boxplot we notice that the lottery picks tend to have a higher points per game (PPG) average. However, we certainly see many exceptions to this rule. We can also compute the averages of the PTS column for these two groups:

This table once again demonstrates that the lottery picks tend to average more points. However, we might like to test this trend to see if we have sufficient evidence to conclude that it is real (it could also just be a result of sampling error).

15.1.1 Regression analysis

Our first technique for looking for a difference between our two categories is linear regression with a categorical explanatory variable. We fit a regression model of the form: \[PTS=\beta \delta_{\text{ not lottery}}+\alpha\] Where \(\delta_{\text{ not lottery}}\) is equal to one if the draft pick fell outside the lottery and zero otherwise.
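To see what the two coefficients mean, the sketch below fits this model by ordinary least squares on made-up scoring data. With a 0/1 indicator, the intercept \(\alpha\) comes out as the mean of the lottery group and the slope \(\beta\) as the difference in group means (non-lottery minus lottery). The data values and variable names are illustrative assumptions.

```python
from statistics import mean

# Hypothetical career points-per-game values, not the real draft data.
lottery = [12.4, 9.8, 15.1, 7.6, 11.3]
not_lottery = [5.2, 8.1, 4.4, 6.9, 3.7, 7.0]

# Build the indicator (1 = outside the lottery, 0 = lottery pick).
x = [0] * len(lottery) + [1] * len(not_lottery)
y = lottery + not_lottery

# Ordinary least squares for a single explanatory variable.
x_bar, y_bar = mean(x), mean(y)
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
alpha = y_bar - beta * x_bar

print(alpha, mean(lottery))                     # intercept vs. lottery mean
print(beta, mean(not_lottery) - mean(lottery))  # slope vs. difference in means
```

A negative \(\beta\) under this coding therefore means that lottery picks average more points.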

To see if this relationship is real we can form a confidence interval for the coefficients.

From this we can see that lottery picks tend to average more points per game over their careers. The magnitude of this effect is somewhere between 3.5 and 4.7 points more for lottery picks.

15.1.2 Two Sample t test approach

For this we can use the two-sample t-test to compare the means of these two distinct populations.

Here the alternative hypothesis is that the lottery players score more points \[H_A: \mu_L > \mu_{NL}\] thus the null hypothesis is \[H_0: \mu_L \leq \mu_{NL}.\] We can now perform the test in R using the same t.test command as before.

Notice that I used the magic tilde ~ to split the PTS column into the lottery/non-lottery pick subdivisions. I could also do this manually and get the same answer:

The very small p-value here gives strong evidence that the population mean of the lottery picks is greater than the population mean of the non-lottery picks.

The 95% confidence interval also tells us that this difference is rather large (at least 3.85 points).

Conditions for using a two-sample t test:

These are roughly the same as the conditions for using a one sample t test, although we now need to assume that BOTH samples satisfy the conditions.

Must be looking for a difference in the population means (averages)

30 or greater samples in both groups (CLT)

  • If you have fewer than 30 in one sample, you can still use the t test, but you must then assume that the population is roughly mound shaped.

At this point you would probably like to know why we would ever want to do a two sample t test instead of a linear regression?

My answer is that a two sample t test is more robust against a difference in variance between the two groups. Recall that one of the assumptions of simple linear regression is that the variance of the residuals does not depend on the explanatory variable(s). By default R does a type of t test which does not assume equal variance between the two groups. This is the one advantage of using the t test command.
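The unequal-variance test described here is commonly known as Welch’s t test. The sketch below shows how its statistic and approximate degrees of freedom are computed from two samples. It is written in Python with made-up data and a hypothetical helper name `welch_t`. Turning the statistic into a p-value would also need the t distribution, which is left to statistical software.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_1, sample_2):
    """Return the Welch t statistic and its approximate degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    # Per-group variance of the sample mean; the groups are not pooled.
    v1 = variance(sample_1) / n1
    v2 = variance(sample_2) / n2
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical points-per-game values, not the real draft data.
lottery = [12.4, 9.8, 15.1, 7.6, 11.3]
not_lottery = [5.2, 8.1, 4.4, 6.9, 3.7, 7.0]

t, df = welch_t(lottery, not_lottery)
print(t, df)
```

Because each group keeps its own variance estimate, a group with much larger spread does not distort the standard error the way a single pooled variance would.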

15.1.2.1 Paired t test

Let’s say we are trying to estimate the effect of a new training regimen on the 40 yard dash times for soccer players. Before implementing the training regimen we measure the 40 yard dash times of the 30 players. First let’s read this data set into R.

First, we can compare the mean times before and after the training:

We could also make a side by side boxplot of the soccer players’ times before and after the training.

[Figure: side by side boxplot of 40 yard dash times before and after training]

We could do a simple t test to examine whether the mean of the players’ times decreases after the training regimen is implemented. Here we have the alternative hypothesis \(H_a: \mu_b-\mu_a>0\) and thus the null hypothesis \(H_0: \mu_b-\mu_a \leq 0\). Using the two sample t test format in R we have:

Here we cannot reject the null hypothesis that the training had no effect on the players’ sprinting performance. However, we haven’t used all of the information available to us in this scenario. The t test we have just run doesn’t know that we recorded the before and after times for the same players. As far as R knows, the before and after times could come from entirely different players, as if we were comparing one team which received the training with one which didn’t. Therefore, R has to be pretty conservative in its conclusions. The differences between the two groups could be due to many reasons other than the training regimen implemented. Maybe the second set of players just started off being a little bit faster, etc.

The data we collected is actually more powerful because we know the performance of the same players before and after the test. This greatly reduces the number of variables which need to be accounted for in our statistical test. Luckily, we can easily let R know that our data points are paired .

Setting the paired keyword to true lets R know that the two columns should be paired together during the test. We can see that running a paired t test gives us a much smaller p value. Moreover, we can now safely conclude that the new training regimen is effective in at least modestly reducing the 40 yard dash times of the soccer players.
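To make the gain from pairing concrete, the sketch below computes both statistics by hand on a small made-up data set. The paired version is a one-sample t statistic on the per-player differences. The unpaired version treats the columns as unrelated groups. The times and variable names are illustrative assumptions, not the data used in the text.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical 40 yard dash times in seconds for six players.
before = [5.21, 4.98, 5.40, 5.10, 4.87, 5.33]
after = [5.15, 4.95, 5.31, 5.06, 4.85, 5.24]

# Paired approach: one-sample t statistic on the per-player differences.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Unpaired (Welch) approach: treats the two columns as unrelated groups.
se_unpaired = sqrt(stdev(before) ** 2 / n + stdev(after) ** 2 / n)
t_unpaired = (mean(before) - mean(after)) / se_unpaired

print(t_paired, t_unpaired)
```

The numerator is the same in both cases. Pairing helps because player-to-player variation cancels in the differences, which shrinks the standard error.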

This is our first example of the huge subject of experimental design, which is the study of methods for creating data sets that have more power to distinguish differences between groups. Where possible it is better to collect data for the same subjects under two conditions, as this will allow for more powerful statistical analysis of the data (i.e., a paired t test instead of a normal t test).

Whenever the assumptions are met for a paired t test, you will be expected to perform a paired t test in this class.

15.2 Two Sample Proportion Tests

We can also use statistical hypothesis testing to compare proportions between two samples. For example, we might conduct a survey of 100 smokers and 50 non-smokers to see whether they buy organic foods. If we find that 30/100 smokers buy organic and only 11/50 non-smokers buy organic, can we conclude that a larger proportion of smokers than non-smokers buy organic foods? The hypotheses are \(H_a: p_s > p_n\) and \(H_0: p_s \leq p_n\).

In this case we don’t have sufficient evidence to conclude that a larger fraction of smokers buy organic foods. It is common when analyzing survey data to want to compare proportions between populations.

The key assumptions when performing a two-sample proportion test are that we have at least 5 successes and 5 failures in BOTH samples.
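As a rough check on the smokers example, the sketch below computes the pooled two-proportion z statistic and a one-sided p-value from the survey counts. This is a plain z test without a continuity correction, so its p-value need not match the R output exactly. The helper name `two_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (no continuity correction)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Survey counts from the text: 30 of 100 smokers, 11 of 50 non-smokers.
z = two_proportion_z(30, 100, 11, 50)

# One-sided p-value for H_a: p_s > p_n.
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 2))
```

Both groups have at least 5 successes and 5 failures (30 and 70, 11 and 39), so the normal approximation is reasonable here.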

15.3 Extra Example: Birth Weights and Smoking

For this example we are going to use data from a study on the risk factors associated with giving birth to a low-weight baby (sometimes defined as less than 2,500 grams). This data set is another one which is built into R. To load this data for analysis type:

You can view a description of the data by typing ?birthwt once it is loaded. To begin we could look at the raw birth weights of babies whose mothers were smokers versus non-smokers. We can do some EDA on this data using a boxplot:

[Figure: boxplot of birth weight by mother’s smoking status]

From the boxplot we can see that the median birth weight of babies whose mothers smoked was smaller. We can test the data for a difference in the means using a t.test command.

Notice we can use the ~ shorthand to split the data into those two groups faster than filtering. Here we get a small p value meaning we have sufficient evidence to reject the null hypothesis that the mean weight of babies of women who smoked is greater than or equal to those of non-smokers.

Within this data set we also have a column low which classifies whether the baby’s birth weight is considered low using the medical criterion (birth weight less than 2,500 grams):

We can see that smoking gives a higher fraction of low-weight births. However, this could just be due to sampling error so let’s run a proportion test to find out.

Once again we find we have sufficient evidence to reject the null hypothesis that smoking does not increase the risk of a low birth weight.

15.4 Homework

15.4.1 Concept Questions

  • What are the assumptions behind using a two sample proportion test? Hint: these will be the same as for forming a confidence interval for the fraction of a population, except that with two samples the assumption needs to hold for both.
  • What assumptions are required for a two sample t test with small \(N\leq 30\) sample sizes?
  • A paired t test may be used for any two sample experiment (True/False)
  • The power of any statistical test will increase with increasing sample sizes. (True/False)
  • Where possible it is better to collect data on the same individuals when trying to distinguish a difference in the average response to a condition (True/False)
  • The paired t test is a more powerful statistical test than a normal t test (True/ False)

15.4.2 Practice Problems

For each of the scenarios below form the null and alternative hypothesis.

  • We have conducted an educational study on two classrooms of 30 students using two different teaching methods. The first method had 50% of students pass a standardized test, and the classroom using the second teaching method had 60% of the students pass.
  • A basketball coach is extremely superstitious and believes that when he wears his lucky tie the team has a greater chance of winning the game. He comes to you because he is looking to design an experiment to test this belief. If the team has 40 games in the upcoming season, design an experiment and state the null and alternative hypotheses to test the coach’s claim.

For the below question work out the number of errors in the data set.

  • Before the Olympics all athletes are required to submit a urine sample to be tested for banned substances. This is done by estimating the concentration of certain compounds in the urine and is prone to some degree of laboratory error. In addition, the concentrations of these compounds are known to vary with the individual (genetics, diet, etc.). To weigh the evidence present in a drug test the laboratory conducts a statistical test. To ensure they don’t falsely convict athletes of doping they use a significance level of \(\alpha=0.01\). If they test 3000 athletes, all of whom are clean, about how many will be falsely accused of doping? Explain the issue with this procedure.

15.4.3 Advanced Problems

Load the drug_use data set from the fivethirtyeight package. Run a hypothesis test to determine if a larger proportion of 22-23 year olds are using marijuana than 24-25 year olds. Interpret your results statistically and practically.

Import the data set Cavaliers_Home_Away_2016 . Form a hypothesis on whether being home or away for the game had an effect on the proportion of games won by the Cavaliers during the 2016-2017 season, test this hypothesis using a hypothesis test.

Load the data set animal_sleep and compare the average total sleep time (sleep_total column) between carnivores and herbivores (using the vore column to divide the data between the two categories). To begin, make a boxplot to compare the total sleep time between these two categories. Do we have sufficient evidence to conclude that the average total sleep time differs between these groups?

Load the HR_Employee_Attrition data set. We wish to investigate whether the daily rate (pay) has anything to do with whether an employee has quit (the attrition column is “Yes”). To begin, make a boxplot of the DailyRate column split into these Attrition categories. Use the boxplot to help form the null hypothesis for your test and decide on an alternative hypothesis. Conduct a statistical hypothesis test to determine if we have sufficient evidence to conclude that those employees who quit tended to be paid less. Report and interpret the p value for your test.

Load the BirdCaptureData data set. Perform a hypothesis test to determine if the proportion of orange-crowned warblers (SpeciesCode==OCWA) caught at the station is truly less than the proportion of Yellow Warblers (SpeciesCode==YWAR). Report your p value and interpret the results statistically and practically.

(All of Statistics Problem) In 1861, 10 essays appeared in the New Orleans Daily Crescent. They were signed “Quintus Curtius Snodgrass” and one hypothesis is that these essays were written by Mark Twain. One way to look for similarity between writing styles is to compare the proportion of three letter words found in two works. For 8 Mark Twain essays we have:

From 10 Snodgrass essays we have that:

  • Perform a two sample t test to examine these two data sets for a difference in the mean values. Report your p value and a 95% confidence interval for the results.
  • What are the issues with using a t-test on this data?

Consider the analysis of the kidiq data set again.

  • Run a regression with kid_score as the response and mom_hs as the explanatory variable and look at the summary() of your results. Notice the p-value which is reported in the last line of the summary. This “F-test” is a hypothesis test with the null hypothesis that the explanatory variable tells us nothing about the value of the response variable.
  • Perform a t test for a difference in means in the kid_score values based on the mom_hs column. What is your conclusion?
  • Repeat the t test again using the command:


5.5 - Hypothesis Testing for Two-Sample Proportions

We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test follows the same steps as one group.

These notes are going to go into a little bit of math and formulas to help demonstrate the logic behind hypothesis testing for two groups. If this starts to get a little confusing, just skim over it for a general understanding! Remember we can rely on the software to do the calculations for us, but it is good to have a basic understanding of the logic!

We will use the sampling distribution of \(\hat{p}_1-\hat{p}_2\) as we did for the confidence interval.

For a test for two proportions, we are interested in the difference between two groups. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be:

\(H_0\colon p_1-p_2=0\)

Another way to look at it is \(H_0\colon p_1=p_2\). This is worth stopping to think about. Remember, in hypothesis testing, we assume the null hypothesis is true. In this case, it means that \(p_1\) and \(p_2\) are equal. Under this assumption, then \(\hat{p}_1\) and \(\hat{p}_2\) are both estimating the same proportion. Think of this proportion as \(p^*\).

Therefore, the sampling distribution of both proportions, \(\hat{p}_1\) and \(\hat{p}_2\), will, under certain conditions, be approximately normal centered around \(p^*\), with standard error \(\sqrt{\dfrac{p^*(1-p^*)}{n_i}}\), for \(i=1, 2\).

We take this into account by finding an estimate for this \(p^*\) using the two-sample proportions. We can calculate an estimate of \(p^*\) using the following formula:

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\)

This value is the total number in the desired categories \((x_1+x_2)\) from both samples over the total number of sampling units in the combined sample \((n_1+n_2)\).

Putting everything together, if we assume \(p_1=p_2\), then the sampling distribution of \(\hat{p}_1-\hat{p}_2\) will be approximately normal with mean 0 and standard error of \(\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}\), under certain conditions.

\(z^*=\dfrac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...will follow a standard normal distribution.
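As a brief sketch of where the standard error in the denominator comes from: the two samples are independent, so the variances of \(\hat{p}_1\) and \(\hat{p}_2\) add, and under the null hypothesis both share the common proportion \(p^*\):

\(\operatorname{Var}(\hat{p}_1-\hat{p}_2)=\operatorname{Var}(\hat{p}_1)+\operatorname{Var}(\hat{p}_2)=\dfrac{p^*(1-p^*)}{n_1}+\dfrac{p^*(1-p^*)}{n_2}=p^*(1-p^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)\)

Taking the square root gives the standard error, and replacing the unknown \(p^*\) with its estimate \(\hat{p}^*\) gives the denominator of \(z^*\).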

Finally, we can develop our hypothesis test for \(p_1-p_2\).

Hypothesis Testing for Two-Sample Proportions

Conditions :

\(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) are all greater than five

Test Statistic:

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...where \(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\).

The critical values, p-values, and decisions will all follow the same steps as those from a hypothesis test for a one-sample proportion.

BUS204: Business Statistics


Hypothesis Testing with Two Samples

Read this chapter, which discusses how to compare data from two similar groups. This is useful when, for example, you want to analyze things like how someone's income relates to another sample that you are interested in. Make sure you read the introduction as well as sections 10.1 through 10.6. Attempt the practice problems and homework at the end of the chapter.

Matched or Paired Samples

  • Simple random sampling is used.
  • Sample sizes are often small.
  • Two measurements (samples) are drawn from the same pair of individuals or objects.
  • Differences are calculated from the matched or paired samples.
  • The differences form the sample that is used for the hypothesis test.
  • Either the matched pairs have differences that come from a population that is normal or the number of differences is sufficiently large so that the distribution of the sample mean of differences is approximately normal.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are calculated. The differences are the data. The population mean for the differences, μd, is then tested using a Student's-t test for a single population mean with n – 1 degrees of freedom, where n is the number of differences, that is, the number of pairs not the number of observations.

The null and alternative hypotheses for this test are:

The test statistic is:

Example 10.9

Problem A company has developed a training program for its entering employees because it has become concerned with the results of the six-month employee review. The company hopes that the training program can result in better six-month reviews. Each trainee constitutes a "pair": the score the employee received when first entering the firm and the score given at the six-month review. The differences in the two scores were calculated for each employee, and the means before and after the training program were calculated. The sample mean before the training program was 20.4 and the sample mean after the training program was 23.9. The standard deviation of the differences in the two scores across the 20 employees was 3.8 points. Test at the 10% significance level the null hypothesis that the two population means are equal against the alternative that the training program helps improve the employees' scores.

Solution 1 The first step is to identify this as a two sample case: before the training and after the training. This differentiates this problem from simple one sample issues. Second, we determine that the two samples are "paired". Each observation in the first sample has a paired observation in the second sample. This information tells us that the null and alternative hypotheses should be:

This form reflects the implied claim that the training course improves scores; the test is one-tailed and the claim is in the alternative hypothesis. Because the experiment was conducted as a matched paired sample rather than simply taking scores from people who took the training course and those who didn't, we use the matched pair test statistic:

In order to solve this equation, the individual scores, pre-training course and post-training course need to be used to calculate the individual differences. These scores are then averaged and the average difference is calculated:

From these differences we can calculate the standard deviation across the individual differences:

We can now compare the calculated value of the test statistic, 4.12, with the critical value. The critical value is a Student's t with degrees of freedom equal to the number of pairs, not observations, minus 1. In this case there are 20 pairs, so df = 20 - 1 = 19. Because the test is one-tailed at the 10% significance level, the critical value is t α = 1.328; the value 1.729 corresponds to 5% in one tail (t α/2 for a two-tailed test at the 90% confidence level). The calculated test statistic exceeds either value and is well into the tail of the distribution, and thus we reject the null hypothesis that there is no difference from the training program. The evidence seems to indicate that the training aids employees in gaining higher scores.
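The equations for this example did not survive in this text, so the following is a hedged reconstruction of the arithmetic behind the 4.12 figure. It assumes the usual one-sample t statistic applied to the differences: the mean difference divided by the standard deviation of the differences over the square root of the number of pairs. The variable names are illustrative.

```python
from math import sqrt

# Summary statistics given in Example 10.9.
mean_before = 20.4
mean_after = 23.9
sd_diff = 3.8   # standard deviation of the paired differences
n_pairs = 20

# The mean of the differences equals the difference of the means.
mean_diff = mean_after - mean_before

# One-sample t statistic on the differences, with H0: mu_d = 0.
t_stat = (mean_diff - 0) / (sd_diff / sqrt(n_pairs))
df = n_pairs - 1

print(round(t_stat, 2), df)
```

Note that the degrees of freedom come from the 20 pairs, not from the 40 individual scores.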

Example 10.10

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results for randomly selected subjects are shown in Table 10.5. A lower score indicates less pain. The "before" value is matched to an "after" value and the differences are calculated. Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.

Figure 10.9

Example 10.11

Figure 10.10


Inference for Comparing 2 Population Proportions (HT for 2 Proportions)

Now we get to the good stuff! We will need to know how to label the null and alternative hypothesis, calculate the test statistic, and then reach our conclusion using the critical value method or the p-value method.

The Test Statistic for a 2 Proportion Test:

[latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}}[/latex]

What the different symbols mean:

[latex]x_1[/latex] is the number of successes or observations in the first group (not always needed)

[latex]n_1[/latex] is the sample size from the first group (number of people, items, etc… in the study)

[latex]p_1[/latex] is the population proportion for the first group; this will be used in the null and alternative hypotheses as well

[latex]\hat{p_1}[/latex] is the sample proportion (or percentage) for the first group, given by [latex]\hat{p_1} = \frac{x_1}{n_1}[/latex]

[latex]\hat{q_1}[/latex] is what is left over from the sample proportion (or percentage) for the first group, given by [latex]\hat{q_1} = 1 - \hat{p_1}[/latex]

[latex]x_2[/latex] is the number of successes or observations in the second group (not always needed)

[latex]n_2[/latex] is the sample size from the second group (number of people, items, etc… in the study)

[latex]p_2[/latex] is the population proportion for the second group; this will be used in the null and alternative hypotheses as well

[latex]\hat{p_2}[/latex] is the sample proportion (or percentage) for the second group, given by [latex]\hat{p_2} = \frac{x_2}{n_2}[/latex]

[latex]\hat{q_2}[/latex] is what is left over from the sample proportion (or percentage) for the second group, given by [latex]\hat{q_2} = 1 - \hat{p_2}[/latex]

[latex]\bar{p} = \displaystyle \frac{x_1 + x_2}{n_1 + n_2}[/latex] is the pooled sample proportion,  which combines the two sample proportions into a single value

[latex]\bar{q} = 1 - \bar{p}[/latex]

[latex]\alpha[/latex] is the significance level , usually given within the problem, or if not given, we assume it to be 5% or 0.05

Assumptions when conducting a 2 Proportion Test:

  • We have a simple random sample
  • The two samples or groups are independent
  • There are at least 5 successes and at least 5 failures for each of the samples
  • [latex]n\hat{p} \ge 5[/latex] and [latex]n\hat{q} \ge 5[/latex]

Steps to conduct the 2 Proportion Test:

  • Identify all the symbols listed above (all the stuff that will go into the formulas). This includes [latex]x_1[/latex] and [latex]x_2[/latex] (if necessary), [latex]n_1[/latex] and [latex]n_2[/latex], [latex]\hat{p_1}[/latex] and [latex]\hat{q_1}[/latex], [latex]\hat{p_2}[/latex] and [latex]\hat{q_2}[/latex], [latex]\bar{p}[/latex] and [latex]\bar{q}[/latex], and [latex]\alpha[/latex]
  • Identify the null and alternative hypotheses
  • Calculate the test statistic, [latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}}[/latex]
  • Find the critical value(s) OR the p-value OR both
  • Apply the Decision Rule
  • Write up a conclusion for the test

Example 1: Race/Name Resume Study [1]

In this study, investigators created identical mock resumés, which were sent in response to job ads in Chicago and Boston. Each resumé was randomly assigned either a commonly-white or commonly-black name. In total, 246 out of 2445 commonly-white named resumés received a callback and 164 out of 2445 commonly-black named resumés received a callback. Is there compelling evidence to conclude that callback rates are higher for common white names vs. common black names?

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with rates or percents from two samples or groups (the applicants with common white names and those with common black names), so we will conduct a 2 Proportion Test.

  • [latex]x_1 = 246[/latex] is the number of callbacks for applicants with common white names
  • [latex]n_1 = 2445[/latex] is the sample size from the first group; those with common white names
  • [latex]\hat{p_1} = \displaystyle \frac{x_1}{n_1} = \displaystyle \frac{246}{2445} = 0.101[/latex]
  • [latex]\hat{q_1} = 1 - \hat{p_1}\ = 1 - 0.101 = 0.899[/latex]
  • [latex]x_2 = 164[/latex] is the number of callbacks for applicants with common black names
  • [latex]n_2 = 2445[/latex] is the sample size from the second group; those with common black names
  • [latex]\hat{p_2} = \displaystyle \frac{x_2}{n_2} = \displaystyle \frac{164}{2445} = 0.067[/latex]
  • [latex]\hat{q_2} = 1 - \hat{p_2} = 1 - 0.067 = 0.933[/latex]
  • [latex]\bar{p} = \displaystyle \frac{x_1 + x_2}{n_1 + n_2} = \displaystyle \frac{246 + 164}{2445 + 2445} = 0.084[/latex]
  • [latex]\bar{q} = 1 - \bar{p} = 1 - 0.084 = 0.916[/latex]
  • [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
  • [latex]H_{0}: p_1 = p_2[/latex]
  • [latex]H_{A}: p_1 > p_2[/latex]
  • [latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}} = \displaystyle \frac{0.101 - 0.067}{\sqrt{\displaystyle \frac{0.084 \times 0.916}{2445} + \displaystyle \frac{0.084 \times 0.916}{2445}}} = 4.29[/latex]
  • P-Value:  The p-value is found by looking up the test statistic calculated (in this case [latex]z = 4.29[/latex]) in the normal distribution table. We find that this corresponds to a value of [latex]0.9999[/latex]. Since this is a “greater than” test, we subtract from one, and get [latex]p-value = 1 - 0.9999 = 0.0001[/latex].
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is smaller than or equal to the alpha level, we have enough evidence for our claim; otherwise we do not. Here, [latex]p-value = 0.0001[/latex], which is definitely smaller than [latex]\alpha = 0.05[/latex], so we have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value  of [latex]0.0001[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing evidence that the callback rate for common white names is higher than the callback rate for common black names.
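As a rough check on Example 1, the sketch below recomputes the test statistic twice: once with the three-decimal proportions used in the hand calculation, and once at full precision. The helper name `pooled_z` is an assumption for illustration. Under these assumptions the rounded inputs give about 4.29, while the unrounded inputs give about 4.23; the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 246, 2445   # callbacks and sample size, commonly-white names
x2, n2 = 164, 2445   # callbacks and sample size, commonly-black names


def pooled_z(p1_hat, p2_hat, p_bar, n1, n2):
    """z statistic for a difference of two proportions using the pooled proportion."""
    q_bar = 1 - p_bar
    return (p1_hat - p2_hat) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)


# Rounded inputs, as in the hand calculation.
z_rounded = pooled_z(0.101, 0.067, 0.084, n1, n2)

# Full-precision inputs computed directly from the counts.
z_exact = pooled_z(x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2), n1, n2)

# Right-tailed p-value ("greater than" test).
p_value = 1 - NormalDist().cdf(z_exact)

print(f"z (rounded inputs) = {z_rounded:.2f}")
print(f"z (exact inputs)   = {z_exact:.2f}")
print(f"p-value            = {p_value:.6f}")
```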

Example 2: Seat Belt Use in New York and Boston [2]

Police officers in New York City can stop a driver who is not wearing their seat belt. In Boston, police officers can issue citations to drivers for not wearing their seat belts ONLY if the driver has been stopped for another violation. Data from random samples of female Hispanic drivers in 2002 are summarized as follows: in Boston, 68 of the 117 drivers sampled wore their seat belts; in New York, 183 of the 220 drivers sampled wore their seat belts.

Is there compelling evidence to conclude that a smaller rate (or proportion) of drivers wear their seat belts in Boston as compared to New York?

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with rates or percents from two samples or groups (female Hispanic drivers in the two cities), so we will conduct a 2 Proportion Test. We will think of Boston as the first group and New York as the second group.

  • [latex]x_1 = 68[/latex] is the number of female Hispanic drivers in Boston who wore their seat belts
  • [latex]n_1 = 117[/latex] is the sample size from the first group; female Hispanic drivers in Boston who were part of the study
  • [latex]\hat{p_1} = \displaystyle \frac{x_1}{n_1} = \displaystyle \frac{68}{117} = 0.581[/latex]
  • [latex]\hat{q_1} = 1 - \hat{p_1}\ = 1 - 0.581 = 0.419[/latex]
  • [latex]x_2 = 183[/latex] is the number of female Hispanic drivers in New York who wore their seat belts
  • [latex]n_2 = 220[/latex] is the sample size from the second group; female Hispanic drivers in New York who were part of the study
  • [latex]\hat{p_2} = \displaystyle \frac{x_2}{n_2} = \displaystyle \frac{183}{220} = 0.832[/latex]
  • [latex]\hat{q_2} = 1 - \hat{p_2} = 1 - 0.832 = 0.168[/latex]
  • [latex]\bar{p} = \displaystyle \frac{x_1 + x_2}{n_1 + n_2} = \displaystyle \frac{68 + 183}{117 + 220} = 0.745[/latex]
  • [latex]\bar{q} = 1 - \bar{p} = 1 - 0.745 = 0.255[/latex]
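  • [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
  • [latex]H_{0}: p_1 = p_2[/latex]
  • [latex]H_{A}: p_1 < p_2[/latex]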
  • [latex]z = \displaystyle \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\displaystyle \frac{\bar{p} \times \bar{q}}{n_1} + \displaystyle \frac{\bar{p} \times \bar{q}}{n_2}}} = \displaystyle \frac{0.581 - 0.832}{\sqrt{\displaystyle \frac{0.745 \times 0.255}{117} + \displaystyle \frac{0.745 \times 0.255}{220}}} = -5.03[/latex]
  • P-Value:  The p-value is found by looking up the test statistic calculated (in this case [latex]z = -5.03[/latex]) in the normal distribution table. We find that this corresponds to a value of [latex]0.0001[/latex]. Since this is a “less than” test, we keep the value from the table, and get [latex]p-value = 0.0001[/latex].
  • Conclusion: Because our p-value  of [latex]0.0001[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing evidence that there is a lower rate of seat belt use among female Hispanic drivers in Boston as compared to New York.
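Example 2 can also be decided with a critical value instead of a p-value. The sketch below is one way to do that under the same assumptions (pooled standard error, [latex]\alpha = 0.05[/latex], left-tailed alternative). Because it uses unrounded proportions, its statistic comes out near -5.02 rather than the -5.03 obtained from three-decimal inputs.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 68, 117     # Boston: drivers wearing seat belts, sample size
x2, n2 = 183, 220    # New York: drivers wearing seat belts, sample size
alpha = 0.05         # assumed significance level

p_bar = (x1 + x2) / (n1 + n2)          # pooled proportion
q_bar = 1 - p_bar
z = (x1 / n1 - x2 / n2) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)

# For a "less than" test the rejection region is the left tail.
z_crit = NormalDist().inv_cdf(alpha)   # about -1.645
reject = z < z_crit
p_value = NormalDist().cdf(z)          # area to the left of z

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if reject else "fail to reject H0")
```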
  • [1] Adapted from the Skew The Script curriculum (skewthescript.org), licensed under CC BY-NC-SA 4.0
  • [2] Adapted from The Basic Practice of Statistics, 7th Edition, by Moore, Notz, and Fligner

Basic Statistics Copyright © by Allyn Leon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Statology

Statistics Made Easy

Two Sample t-test Calculator

t = -1.608761

p-value (one-tailed) = 0.060963

p-value (two-tailed) = 0.121926


Hey there. My name is Zach Bobbitt. I have a Master of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.


Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.

What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution. It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is the competing claim to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It states that there is a statistically significant difference between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), also called the significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%.

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formulas for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the mean of two samples. It is used to compute the z test statistic. The formulas are given as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).
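The two-sample t formula above can be evaluated directly from summary statistics. The sketch below does so for made-up numbers (all six inputs are hypothetical). It also returns the Welch-Satterthwaite degrees of freedom, which the text does not give; that is an added assumption about which t distribution to compare the statistic against when the two variances are not pooled.

```python
from math import sqrt


def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2, hypothesized_diff=0.0):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    t = ((x_bar1 - x_bar2) - hypothesized_diff) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics for two independent samples.
t, df = two_sample_t(5.2, 1.1, 12, 4.4, 1.4, 15)
print(f"t = {t:.3f}, approximate degrees of freedom = {df:.1f}")
```

The statistic would then be compared with a critical value from a t table at the (rounded down) degrees of freedom.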

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.
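As an illustration of the chi-square statistic \(\sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\) in a test of independence, the sketch below uses a hypothetical 2 × 2 table of observed counts. The expected count in each cell is taken as (row total × column total) / grand total, and the critical value 3.841 is the usual table entry for 1 degree of freedom at a 0.05 significance level.

```python
# Hypothetical observed counts: rows are groups, columns are outcomes.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841   # chi-square table, df = 1, alpha = 0.05

print(f"chi-square = {chi2:.3f} with df = {df}")
print("reject H0" if chi2 > critical_value else "fail to reject H0")
```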

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value then the null hypothesis is rejected.


Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value less than the critical value.


Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional hypothesis testing method. The two-tailed test is used when it needs to be determined whether the population parameter differs from some value in either direction. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the test statistic falls in either tail of the critical region, that is, if its absolute value is greater than the critical value.
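The three rejection rules (right tailed, left tailed, two tailed) can be compared side by side for a z statistic. In the sketch below the function name `rejects_null` and the statistic 1.80 are illustrative assumptions; critical values come from the standard normal distribution.

```python
from statistics import NormalDist


def rejects_null(z, alpha, tail):
    """Apply the critical-value decision rule for a z statistic."""
    std_normal = NormalDist()
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)           # upper critical value
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)               # lower critical value
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    raise ValueError("tail must be 'right', 'left' or 'two'")


z = 1.80   # hypothetical test statistic
results = {tail: rejects_null(z, 0.05, tail) for tail in ("right", "left", "two")}
print(results)
```

With this statistic the right tailed test rejects (1.80 exceeds about 1.645), while the two tailed test does not (1.80 is below about 1.96), which shows why the direction of the alternative must be fixed before looking at the data.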


Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t or \(\chi^{2}\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg, with a standard deviation of 15 kg. A sample of 30 men is chosen and their average weight is 112.5 kg. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence level is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. This is because the sample size is 30. Furthermore, the sample and population means are known along with the standard deviation.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645, the null hypothesis can be rejected.
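The five steps of this example can be reproduced numerically. The sketch below takes the critical value from the standard normal distribution instead of a table and also reports the right-tailed p-value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 100       # hypothesized population mean (kg)
x_bar = 112.5   # sample mean (kg)
sigma = 15      # population standard deviation (kg)
n = 30          # sample size
alpha = 0.05    # 100% - 95%

z = (x_bar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.2e}")
print("reject H0" if z > z_crit else "fail to reject H0")
```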

Hypothesis Testing and Confidence Intervals

Confidence intervals are closely related to hypothesis testing, because the alpha level can be determined from a given confidence level. Suppose the confidence level is given as 95%. Subtracting it from 100% gives 100 - 95 = 5%, or 0.05. This is the alpha value, and in a one-tailed test the whole of it lies in a single tail. In a two-tailed test, the alpha value is split equally between the two tails, so the area in each tail is 0.05 / 2 = 0.025.


Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • Three commonly used tests in hypothesis testing are the z test, t test, and chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells has an average weight of 110lbs and a standard deviation of 18lbs. Using hypothesis testing, check if the physical trainer's claim can be supported at a 95% confidence level. Solution: As the sample size is less than 30, the t-test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90. \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18, \(\alpha\) = 0.05. Using the t-distribution table with n - 1 = 4 degrees of freedom, the critical value is 2.132. t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{110-90}{\frac{18}{\sqrt{5}}}\) = 2.484. As 2.484 > 2.132, the null hypothesis is rejected. Answer: There is enough evidence that the average weight of the dumbbells is greater than 90lbs.
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced, it is believed that this score will change. The scores of 36 randomly selected students were found to have a mean of 88. With a 0.05 significance level, is there any evidence to support this claim? Solution: This is an example of two-tail hypothesis testing. The z test will be used. \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80. \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10. \(\alpha\) = 0.05, so the area in each tail is 0.05 / 2 = 0.025. The critical value using the normal distribution table is 1.96. z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8. As 4.8 > 1.96, the null hypothesis is rejected. Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level, use hypothesis testing to check this claim. Solution: The t test will be used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90. \(\overline{x}\) = 82, \(\mu\) = 90, n = 6, s = 18. The critical value from the t table is -2.015. t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) = -1.089. As -1.089 > -2.015, we fail to reject the null hypothesis. Answer: There is not enough evidence to support the claim.
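The arithmetic in the three examples can be checked with one small helper, since the one-sample z and t statistics share the same form. In this sketch the helper name is an assumption, the critical values are the table values quoted in the solutions, Example 2 uses n = 36 as in its calculation, and Example 3 uses the sample mean of 82 from its problem statement.

```python
from math import sqrt


def one_sample_stat(x_bar, mu0, sd, n):
    """(x_bar - mu0) / (sd / sqrt(n)); a z statistic if sd is sigma, a t statistic if sd is s."""
    return (x_bar - mu0) / (sd / sqrt(n))


t1 = one_sample_stat(110, 90, 18, 5)    # Example 1, right-tailed t test
z2 = one_sample_stat(88, 80, 10, 36)    # Example 2, two-tailed z test
t3 = one_sample_stat(82, 90, 18, 6)     # Example 3, left-tailed t test

reject1 = t1 > 2.132        # t table, 4 degrees of freedom, alpha = 0.05
reject2 = abs(z2) > 1.96    # normal table, alpha / 2 = 0.025 in each tail
reject3 = t3 < -2.015       # t table, 5 degrees of freedom, alpha = 0.05

for name, stat, reject in [("Example 1", t1, reject1),
                           ("Example 2", z2, reject2),
                           ("Example 3", t3, reject3)]:
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{name}: statistic = {stat:.3f}, {decision}")
```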


FAQs on Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data. The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the test statistic follows a Student's t distribution. It is used when the sample size is less than 30 and the standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

In a two tail hypothesis test, the alpha level \(\alpha\) is divided by 2 to give the area of each rejection region. This is done as there are two rejection regions in the curve.


Statistics LibreTexts

10: Hypothesis Testing with Two Samples


You have learned to conduct hypothesis tests on single means and single proportions. You will expand upon that in this chapter. You will compare two means or two proportions to each other. The general procedure is still the same, just expanded. To compare two means or two proportions, you work with two groups. The groups are classified either as independent or matched pairs. Independent groups consist of two samples that are independent, that is, sample values selected from one population are not related in any way to sample values selected from the other population. Matched pairs consist of two samples that are dependent. The parameter tested using matched pairs is the population mean. The parameters tested using independent groups are either population means or population proportions.

  • 10.0: Prelude to Hypothesis Testing with Two Samples This chapter deals with the following hypothesis tests: Independent groups (samples are independent): test of two population means; test of two population proportions. Matched or paired samples (samples are dependent): test of two population means by testing one population mean of differences.
  • 10.1: Two Population Means with Unknown Standard Deviations The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples.
  • 10.2: Two Population Means with Known Standard Deviations Even though this situation is not likely (knowing the population standard deviations is not likely), the following example illustrates hypothesis testing for independent means, known population standard deviations.
  • 10.3: Comparing Two Independent Population Proportions Comparing two proportions, like comparing two means, is common. If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions.
  • 10.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small. Two measurements (samples) are drawn from the same pair of individuals or objects. Differences are calculated from the matched or paired samples. The differences form the sample that is used for the hypothesis test. Either the matched pairs have differences that come from a population that is normal or the number of difference
  • 10.5: Hypothesis Testing for Two Means and Two Proportions (Worksheet) A statistics Worksheet: The student will select the appropriate distributions to use in each case. The student will conduct hypothesis tests and interpret the results.
  • 10.E: Hypothesis Testing with Two Samples (Exercises) These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .

Lecture 14: Hypothesis Test for One Variance

STAT 205: Introduction to Mathematical Statistics

University of British Columbia Okanagan

March 17, 2024

Introduction

We have covered three hypothesis tests for a single sample:

  • Hypothesis test for the mean \(\mu\) with \(\sigma\) known ( \(Z\) - test)
  • Hypothesis tests for the proportion \(p\) ( \(Z\) - test)
  • Hypothesis test for the mean \(\mu\) with \(\sigma\) unknown ( \(t\) -test)

Today we consider hypothesis tests involving the population variance \(\sigma^2\).


Assumptions: \(X_1, X_2, \dots, X_n\) are i.i.d + assumptions in the rhombuses.

In Lecture 7 we saw how to construct a confidence interval for \(\sigma^2\) based on the sampling distribution derived in Lecture 8.

For random samples from normal populations, we know:

\[ \dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]

where \(S^2 = \frac{\sum_{i = 1}^n (X_i - \bar{X})^2}{n-1}\) is the sample variance and \(\chi^2_{n-1}\) is the Chi-squared distribution with \(n-1\) degrees of freedom.

We may wish to test if there is evidence to suggest that the population variance differs from some hypothesized value \(\sigma_0^2\).

As before, we start with a null hypothesis ( \(H_0\) ) that the population variance equals a specified value ( \(\sigma^2 = \sigma_0^2\) )

We test this against the alternative hypothesis \(H_A\) which can either be one-sided ( \(\sigma^2 < \sigma_0^2\) or \(\sigma^2 > \sigma_0^2\) ) or two-sided ( \(\sigma^2 \neq \sigma_0^2\) ).

Test Statistic

Recall that our test statistic is calculated assuming the null hypothesis is true. Hence, if we are testing \(H_0: \sigma^2 = \sigma_0^2\), the test statistic we use is: \[ \chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2} \] where \(\chi^2 \sim \chi^2_{n-1}\).

Chi-square distribution


Assumptions

For the following inference procedures to be valid we require:

  • A simple random sample from the population
  • A normally distributed population (very important, even for large sample sizes)

It is important to note that if the population is not approximately normally distributed, the chi-squared distribution may not accurately represent the sampling distribution of the test statistic.

Critical Region (upper-tailed)


The rejection region associated with an upper-tailed test for the population variance. Note that the critical value will depend on the chosen significance level ( \(\alpha\) ) and the d.f.

Critical Region (lower-tailed)


Critical Region (two-tailed)


Similarly we can find \(p\) -values from Chi-squared tables or R


\(p\)-value for lower-tailed: \[\Pr(\chi^2 < \chi^2_{\text{obs}})\]

\(p\)-value for upper-tailed: \[\Pr(\chi^2 > \chi^2_{\text{obs}})\]

\(p\)-value for two-tailed: \[2\cdot \min \{ \Pr(\chi^2 < \chi^2_{\text{obs}}), \Pr(\chi^2 > \chi^2_{\text{obs}})\}\]
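The three p-value formulas can be sketched without any statistics package. The code below is an illustrative stand-in for table lookups or R, not the lecture's own method: it evaluates the chi-square CDF through the standard series for the regularized lower incomplete gamma function, and the function names are assumptions.

```python
from math import exp, lgamma, log


def chi2_cdf(x, df):
    """Pr(chi-square with df degrees of freedom <= x), via a series expansion."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = total = 1.0
    k = 0
    # Sum y^k / ((a+1)(a+2)...(a+k)) until the terms become negligible.
    while term > 1e-15 * total:
        k += 1
        term *= y / (a + k)
        total += term
    return total * exp(a * log(y) - y - lgamma(a + 1))


def chi2_p_value(chi2_obs, df, tail):
    """p-value for a lower-, upper- or two-tailed chi-square test."""
    lower = chi2_cdf(chi2_obs, df)
    upper = 1 - lower
    if tail == "lower":
        return lower
    if tail == "upper":
        return upper
    if tail == "two":
        return 2 * min(lower, upper)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


# Sanity check against a table entry: 11.0705 cuts off an upper tail of 0.05 when df = 5.
p_check = chi2_p_value(11.0705, 5, "upper")
print(f"upper-tail p-value at 11.0705 with df = 5: {p_check:.4f}")
```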


Exercise 1: Beyond Burger Fat

Beyond Burgers claim to have 18 grams of fat. A random sample of 6 burgers had a mean of 19.45 grams and a standard deviation of 0.85 grams. Suppose that the quality assurance team at the company will only accept a \(\sigma\) of at most 0.5. Use the 0.05 level of significance to test the null hypothesis \(\sigma = 0.5\) against the appropriate alternative.

Distribution of Test Statistic


Under the null hypothesis, the test statistic \(\chi^2 = (n-1)S^2/0.5^2\) follows a chi-square distribution with df = 5.

Critical value


The critical value can be found by determining what value on the chi-square curve with 5 df yields a 5 percent probability in the upper tail (since we are doing an upper-tailed test). In R: qchisq(alpha, df=n-1, lower.tail = FALSE). Verify using a \(\chi^2\) table.

Observed Test Statistic

Compute the observed test statistic which we denote by \(\chi^2_{\text{obs}}\)


Since the observed test statistic falls in the rejection region, i.e. \(\chi^2_{\text{obs}} > \chi^2_{\alpha}\), we reject the null hypothesis in favour of the alternative.

P-value in R


Alternatively we could compute the p-value which in this case is 0.013. Since this is smaller than the alpha-level of 0.05, we reject the null hypothesis in favour of the alternative. Verify using \(\chi^2\) table.

P-value from tables


Using the chi-square distribution table we can see that our observed test statistic falls between two values. We can use the neighbouring values to approximate our p-value.

Approximate P-value


It is clear from the visualization that \[\begin{align} \Pr(\chi^2_{5} > \chi^2_{0.025}) > \Pr(\chi^2_{5} > \chi^2_{\text{obs}})\\ \Pr(\chi^2_{5} > \chi^2_{\text{obs}}) > \Pr(\chi^2_{5} > \chi^2_{0.01}) \end{align}\]

The \(p\) -value, \(\Pr(\chi^2_{5} > 14.45)\) can then be expressed as: \[\begin{align} 0.01 < p\text{-value } < 0.025 \end{align}\]

Since either

  • the \(p\)-value (0.013) is less than \(\alpha\) = 0.05 OR
  • the observed test statistic (\(\chi^2_{\text{obs}}\) = 14.45) is larger than the critical value \(\chi^2_{\alpha}\)

we reject the null hypothesis in favour of the alternative. More specifically, there is very strong evidence to suggest that the population variance \(\sigma^2\) is greater than \(0.5^2\).
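The exercise can be traced numerically as a check. The sketch below assumes the reported 0.85 is the sample standard deviation, since that is the reading that reproduces the observed statistic of 14.45. The critical values are the usual upper-tail chi-square table entries for 5 degrees of freedom, and the bracketing step mirrors the table-based approximation of the p-value.

```python
n = 6          # sample size
s = 0.85       # assumed sample standard deviation (grams)
sigma0 = 0.5   # hypothesized population standard deviation (grams)

chi2_obs = (n - 1) * s ** 2 / sigma0 ** 2

# Upper-tail critical values for df = 5 from a standard chi-square table,
# keyed by the tail area they cut off.
table_df5 = {0.05: 11.070, 0.025: 12.833, 0.01: 15.086, 0.005: 16.750}

reject = chi2_obs > table_df5[0.05]   # upper-tailed test at alpha = 0.05

# Bracket the p-value between the neighbouring table entries.
upper_bound = min(area for area, crit in table_df5.items() if chi2_obs > crit)
lower_bound = max(area for area, crit in table_df5.items() if chi2_obs <= crit)

print(f"observed chi-square = {chi2_obs:.2f}")
print("reject H0" if reject else "fail to reject H0")
print(f"{lower_bound} < p-value < {upper_bound}")
```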

https://irene.vrbik.ok.ubc.ca/quarto/stat205/


Mathematics LibreTexts

9: Hypothesis Testing with Two Samples



  • 9.1: Prelude to Hypothesis Testing with Two Samples This chapter deals with the following hypothesis tests. Independent groups (samples are independent): test of two population means; test of two population proportions. Matched or paired samples (samples are dependent): test of two population means by testing one population mean of differences.
  • 9.2: Comparing Two Independent Population Means (Hypothesis test) The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples.
  • 9.3: Comparing Two Independent Population Proportions (Hypothesis test) Comparing two proportions, like comparing two means, is common. If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions.
  • 9.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small. Two measurements (samples) are drawn from the same pair of individuals or objects. Differences are calculated from the matched or paired samples. The differences form the sample that is used for the hypothesis test. Either the matched pairs have differences that come from a population that is normal or the number of difference
  • 9.5: Hypothesis Testing for Two Means and Two Proportions (Worksheet) A statistics Worksheet: The student will select the appropriate distributions to use in each case. The student will conduct hypothesis tests and interpret the results.
  • 9.E: Hypothesis Testing with Two Samples (Exercises) These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.
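The two independent-group tests in the outline above can be sketched in a few lines. The following is a minimal illustration in Python using only the standard library, not the textbook's own procedure. The function names `welch_t` and `two_proportion_z` are hypothetical, all data are made up for illustration, and the unpooled (Welch) form of the t statistic is an assumption about which variant is intended. The two-proportion statistic uses the pooled proportion in its standard error.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error contributed by sample a
    vb = variance(b) / len(b)   # squared standard error contributed by sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for H0: p1 = p2."""
    p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion of successes
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data, for illustration only.
group_a = [10.0, 12.0, 14.0]
group_b = [8.0, 9.0, 10.0, 11.0]
t_stat, df = welch_t(group_a, group_b)

# Hypothetical counts: 20 successes out of 200 versus 12 out of 200.
z_stat, p_value = two_proportion_z(20, 200, 12, 200)

print(f"Welch t = {t_stat:.3f} with about {df:.2f} degrees of freedom")
print(f"two-proportion z = {z_stat:.3f}, two-sided p-value = {p_value:.3f}")
```

The sketch stops at the t statistic and its degrees of freedom. Turning them into a \(p\)-value needs the CDF of the t-distribution, which the Python standard library does not provide.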

COMMENTS

  1. 10: Hypothesis Testing with Two Samples

    A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. 10.5: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small.

  2. PDF Two Samples Hypothesis Testing

    In a previous learning module, we discussed how to perform hypothesis tests for a single variable \(x\). Here, we extend the concept of hypothesis testing to the comparison of two variables \(x_A\) and \(x_B\). Two Samples Hypothesis Testing when \(n\) is the same for the two samples. Two-tailed paired samples hypothesis test: In engineering ...

  3. Hypothesis Testing

    Step 2: Collect data. For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in. Hypothesis testing example.

  4. Hypothesis Testing for 2 Samples: Introduction

    The mean for the last recorded percentage was less than half of the initial score: 30.27 (SD 34.03). The decrease was found to be statistically significant using a paired sample t-test (t = 4.36, 36 df, p < .001).". This is a hypothesis test for matched pairs, sometimes known as 2 means, dependent samples.

  5. Hypothesis Testing: Two Samples

    The Population Mean: This image shows a series of histograms for a large number of sample means taken from a population. Recall that as more sample means are taken, the mean of these means gets closer to the population mean. In this section, we explore hypothesis testing of two independent population means (and proportions) and also tests for paired samples of population means.

  6. Two-sample hypothesis testing

    In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently obtained from a different given population. The purpose of the test is to determine whether the difference between these two populations is statistically significant. There are a large number of statistical tests that ...

  7. Putting It Together: Hypothesis Testing with Two Samples

    Let's Summarize. The steps for performing a hypothesis test for two population means with unknown standard deviation are generally the same as the steps for conducting a hypothesis test for one population mean with unknown standard deviation, using a t-distribution. Because the population standard deviations are not known, the sample standard deviations are used for calculations.

  8. Two sample testing

    It is often done using a hypothesis test - hence the name "two sample testing". This is also called A/B testing. The natural hypotheses for this situation are: \(H_0\): the two samples are generated from the same distribution. \(H_A\): the two samples are generated from two different distributions. The test statistic is normally based on the ...

  9. Chapter 15 Hypothesis Testing: Two Sample Tests

    15.1.2 Two Sample t test approach. For this we can use the two-sample t-test to compare the means of these two distinct populations. Here the alternative hypothesis is that the lottery players score more points, \(H_A: \mu_L > \mu_{NL}\); thus the null hypothesis is \(H_0: \mu_L \le \mu_{NL}\). We can now perform the test ...

  10. Hypothesis Testing: 2 Means (Independent Samples)

    Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the home run distances), so we will conduct a Test of 2 Means. \(n_1 = 70\) is the sample size for the first group. \(n_2 = 66\) is the sample size for the second group.

  11. 10: Hypothesis Testing with Two Samples

    A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. 10.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small.

  12. 8: Hypothesis Testing with Two Samples

    A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. 8.5: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small.

  13. Two Sample t-test: Definition, Formula, and Example

    Fortunately, a two sample t-test allows us to answer this question. Two Sample t-test: Formula. A two-sample t-test always uses the following null hypothesis: \(H_0: \mu_1 = \mu_2\) (the two population means are equal). The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

  14. 5.5

    5.5 - Hypothesis Testing for Two-Sample Proportions. We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test follows the same steps as one group. These notes are going to go into a little bit of math and formulas to help demonstrate the logic behind hypothesis testing ...

  15. Hypothesis Testing with Two Samples: Matched or Paired Samples

    The data for the test are the differences: {0.2, -4.1, -1.6, -1.8, -3.2, -2, -2.9, -9.6}. The sample mean and sample standard deviation of the differences are \(\bar{x}_d = -3.13\) and \(s_d = 2.91\). Verify these values. Let \(\mu_d\) be the population mean for the differences. We use the subscript \(d\) to denote "differences". Random variable: \(\bar{X}_d\) = the mean difference of the sensory measurements

  16. 9.2: Paired Samples for Two Means

    Examples 9.2.2 and 9.2.4 use the same data set, but one is conducting a hypothesis test and the other is conducting a confidence interval. Notice that the hypothesis test's conclusion was to reject \(H_{o}\) and say that there was a difference in the means, and the confidence interval does not contain the number 0.

  17. Hypothesis Testing: 2 Proportions

    Identify the null and alternative hypotheses. Calculate the test statistic, \(z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}\). Find the critical value(s) OR the p-value OR both. Apply the Decision Rule. Write up a conclusion for the test. Example 1: Race/Name Resume Study [1]

  18. Two Sample t-test Calculator

    If this is not the case, you should instead use the Welch's t-test calculator. To perform a two sample t-test, simply fill in the information below and then click the "Calculate" button. Enter raw data Enter summary data. Sample 1. 301, 298, 295, 297, 304, 305, 309, 298, 291, 299, 293, 304. Sample 2.

  19. Hypothesis Testing

    To obtain the alpha value for a two-tailed hypothesis test, divide this value by 2. This gives 0.05 / 2 = 0.025. Related Articles: Probability and Statistics; Data Handling; Data; Important Notes on Hypothesis Testing. Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.

  20. 10.E: Hypothesis Testing with Two Samples (Exercises)

    Use the following information to answer the next 15 exercises: Indicate if the hypothesis test is for. independent group means, population standard deviations, and/or variances known. independent group means, population standard deviations, and/or variances unknown. matched or paired samples. single mean.

  21. 10: Hypothesis Testing with Two Samples

    10.E: Hypothesis Testing with Two Samples (Exercises) These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax. You have learned to conduct hypothesis tests on single means and single proportions. You will expand upon that in this chapter. You will compare two means or two proportions to each other.

  22. 10: Hypothesis Testing with Two Samples

    A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. 10.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small.

  23. Stat 205

    Under the null hypothesis, the test statistic \(\chi^2 = (n-1)S^2/0.5^2\) follows a chi-square distribution with df = 5. If the alternative hypothesis is true and \(\sigma^2\) is greater than the hypothesized value, then the sample variance \(S^2\) will tend to be greater than the hypothesized value and the test statistic will tend to be large

  24. 9: Hypothesis Testing with Two Samples

    A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions. 9.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small.
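The matched-pairs calculation can be illustrated with the eight differences listed in item 15 above. The following is a minimal sketch in Python using only the standard library. It assumes the null value for the mean difference is 0, which the snippet does not state explicitly, and the variable names are chosen here for illustration.

```python
import math
from statistics import mean, stdev

# Differences quoted in the matched-pairs snippet (item 15).
differences = [0.2, -4.1, -1.6, -1.8, -3.2, -2.0, -2.9, -9.6]

n = len(differences)
mean_d = mean(differences)    # sample mean of the differences
sd_d = stdev(differences)     # sample standard deviation (n - 1 divisor)

# t statistic for H0: mu_d = 0 (assumed null value), with n - 1 degrees of freedom
t_stat = mean_d / (sd_d / math.sqrt(n))
df = n - 1

print(f"mean = {mean_d:.3f}, sd = {sd_d:.2f}, t = {t_stat:.2f}, df = {df}")
```

Only the differences enter the calculation, so the paired test reduces to a one-sample t-test on a single column of numbers. A \(p\)-value would additionally require the CDF of the t-distribution with 7 degrees of freedom, which this sketch does not compute.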