Z-Test for Statistical Hypothesis Testing Explained


The Z-test is a statistical hypothesis test in which the test statistic we are measuring, such as the mean, is assumed to follow a normal distribution.

There are multiple types of Z-tests; however, we’ll focus on the simplest and best-known one, the one-sample mean test. This is used to determine whether the difference between the mean of a sample and the mean of a population is statistically significant.

What Is a Z-Test?

A Z-test is a type of statistical hypothesis test where the test statistic follows a normal distribution.

The name Z-test comes from the Z-score of the normal distribution: a measure of how many standard deviations a raw score or sample statistic is from the population’s mean.

Z-tests are among the most common statistical tests conducted in fields such as healthcare and data science, so they’re an essential concept to understand.

Requirements for a Z-Test

In order to conduct a Z-test, your statistics need to meet a few requirements, including:

  • A sample size greater than 30. We want to ensure our sample mean comes from an approximately normal distribution, and by the central limit theorem, the distribution of the sample mean is approximately normal once the sample contains more than about 30 data points.
  • The standard deviation and mean of the population are known.
  • The sample data is collected/acquired randomly.


Z-Test Steps

There are four steps to complete a Z-test. Let’s examine each one.

4 Steps to a Z-Test

  • State the null hypothesis.
  • State the alternate hypothesis.
  • Choose your critical value.
  • Calculate your Z-test statistic.

1. State the Null Hypothesis

The first step in a Z-test is to state the null hypothesis, H_0. This is what you believe to be true about the population, which could be the mean of the population, μ_0:

H_0: μ = μ_0

2. State the Alternate Hypothesis

Next, state the alternate hypothesis, H_1. This is what you observe from your sample. If the sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

H_1: μ ≠ μ_0

3. Choose Your Critical Value

Then, choose your significance level, α, which determines the critical value used to decide whether you reject or fail to reject the null hypothesis. Typically, for a Z-test we would use a statistical significance of 5 percent, which for a two-tailed test corresponds to z = +/- 1.96 standard deviations from the population’s mean in the normal distribution:

Z-test critical value plot.

This critical value mirrors confidence intervals: a 5 percent significance level corresponds to a 95 percent confidence interval.

4. Calculate Your Z-Test Statistic

Compute the Z-test statistic using the sample mean, μ_1, the population mean, μ_0, the number of data points in the sample, n, and the population’s standard deviation, σ:

Z = (μ_1 − μ_0) / (σ / √n)

If the test statistic is greater (or lower, depending on the test we are conducting) than the critical value, we reject the null hypothesis, because the difference between the sample mean and the population mean is statistically significant.

Another way to think about this: if the sample mean is that far away from the population mean, either the alternative hypothesis is true or the sample is a complete anomaly.


Z-Test Example

Let’s go through an example to fully understand the one-sample mean Z-test.

A school says that its pupils are, on average, smarter than those at other schools. It takes a sample of 50 students whose average IQ is measured to be 110. The population, i.e., the rest of the schools, has an average IQ of 100 and a standard deviation of 20. Is the school’s claim correct?

The null and alternate hypotheses are:

H_0: μ = 100

H_1: μ > 100

Where we are saying that our sample, the school, has a higher mean IQ than the population mean.

Now, this is what’s called a right-sided, one-tailed test, as we are testing whether the sample mean is greater than the population’s mean. So, choosing a significance level of 5 percent, which for a one-tailed test corresponds to a Z-score of 1.645, we can only reject the null hypothesis if our Z-test statistic is greater than 1.645.

If the school claimed its students’ IQs were, on average, 90, then we would use a left-tailed test, as shown in the figure above. We would then only reject the null hypothesis if our Z-test statistic were less than -1.645.

Computing our Z-test statistic, we see:

Z = (110 − 100) / (20 / √50) ≈ 3.54

As 3.54 > 1.645, we have sufficient evidence to reject the null hypothesis, and the data support the school’s claim.
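
To check the arithmetic, here is a minimal sketch in R (R is the language used in the later chapters of this document; the numbers come from the example above):

```r
# One-sample, right-tailed Z-test for the school IQ example
n     <- 50    # sample size
x_bar <- 110   # sample mean IQ
mu_0  <- 100   # population mean IQ
sigma <- 20    # population standard deviation

z <- (x_bar - mu_0) / (sigma / sqrt(n))
z             # about 3.54, comfortably above the one-tailed critical value of 1.645
1 - pnorm(z)  # one-tailed p-value, roughly 0.0002
```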

Hope you enjoyed this article on Z-tests. In this post we only addressed the simplest case, the one-sample mean test. There are other types of Z-tests, but they all follow the same process, just with some small nuances.


Z-test Calculator


This Z-test calculator is a tool that helps you perform a one-sample Z-test on the population's mean. Two forms of this test exist - a two-tailed Z-test and a one-tailed Z-test - and can be used depending on your needs. You can also choose whether the calculator should determine the p-value from the Z-test, or whether you'd rather use the critical value approach!

Read on to learn more about the Z-test in statistics and, in particular, when to use Z-tests, what the Z-test formula is, and whether to use a Z-test vs. a t-test. As a bonus, we give some step-by-step examples of how to perform Z-tests!

You may also check our t-statistic calculator, where you can learn about another essential statistic. If you are also interested in the F-test, check our F-statistic calculator.

A one-sample Z-test is one of the most popular location tests. The null hypothesis is that the population mean value is equal to a given number, \(\mu_0\):

\[ \mathrm{H}_0 : \mu = \mu_0 \]

We perform a two-tailed Z-test if we want to test whether the population mean is not \(\mu_0\):

\[ \mathrm{H}_1 : \mu \ne \mu_0, \]

and a one-tailed Z-test if we want to test whether the population mean is less/greater than \(\mu_0\):

\[ \mathrm{H}_1 : \mu < \mu_0 \quad \text{or} \quad \mathrm{H}_1 : \mu > \mu_0. \]

Let us now discuss the assumptions of a one-sample Z-test.

You may use a Z-test if your sample consists of independent data points and:

the data is normally distributed, and you know the population variance;

the sample is large, and the data follows a distribution which has a finite mean and variance. You don't need to know the population variance.

The reason these two possibilities exist is that we want the test statistic to follow the standard normal distribution \(\mathrm{N}(0, 1)\). In the former case it is an exact standard normal distribution, while in the latter it is approximately so, thanks to the central limit theorem.

The question remains, "When is my sample considered large?" Well, there's no universal criterion. In general, the more data points you have, the better the approximation works. Statistics textbooks recommend having no fewer than 50 data points, while 30 is considered the bare minimum.

Let \(x_1, \ldots, x_n\) be an independent sample following the normal distribution \(\mathrm{N}(\mu, \sigma^2)\), i.e., with a mean equal to \(\mu\) and variance equal to \(\sigma^2\).

We pose the null hypothesis, \(\mathrm{H}_0 : \mu = \mu_0\).

We define the test statistic, \(Z\), as:

\[ Z = \frac{\bar{x} - \mu_0}{\sigma}\sqrt{n} \]

where:

  • \(\bar{x}\) is the sample mean, i.e., \(\bar{x} = (x_1 + \ldots + x_n)/n\);
  • \(\mu_0\) is the mean postulated in \(\mathrm{H}_0\);
  • \(n\) is the sample size; and
  • \(\sigma\) is the population standard deviation.

In what follows, the uppercase \(Z\) stands for the test statistic (treated as a random variable), while the lowercase \(z\) will denote an actual value of \(Z\), computed for a given sample drawn from \(\mathrm{N}(\mu, \sigma^2)\).

If \(\mathrm{H}_0\) holds, then the sum \(S_n = x_1 + \ldots + x_n\) follows the normal distribution with mean \(n\mu_0\) and variance \(n\sigma^2\). As \(Z\) is the standardization (z-score) of \(S_n/n\), we can conclude that the test statistic \(Z\) follows the standard normal distribution \(\mathrm{N}(0, 1)\), provided that \(\mathrm{H}_0\) is true. By the way, we have the z-score calculator if you want to focus on this value alone.

If our data does not follow a normal distribution, or if the population standard deviation is unknown (and thus in the formula for \(Z\) we substitute the population standard deviation \(\sigma\) with the sample standard deviation), then the test statistic \(Z\) is not necessarily normal. However, if the sample is sufficiently large, then the central limit theorem guarantees that \(Z\) is approximately \(\mathrm{N}(0, 1)\).

In the sections below, we explain how to use the value of the test statistic, \(z\), to decide whether you should reject the null hypothesis. Two approaches can be used to arrive at that decision: the p-value approach and the critical value approach - and we cover both of them! Which one should you use? In the past, the critical value approach was more popular because it was difficult to calculate the p-value from a Z-test. However, with the help of modern computers, we can do it fairly easily and with decent precision. In general, you are strongly advised to report the p-value of your tests!

Formally, the p-value is the smallest level of significance at which the null hypothesis could be rejected. More intuitively, the p-value answers the question: provided that I live in a world where the null hypothesis holds, how probable is it that the value of the test statistic will be at least as extreme as the \(z\)-value I've got for my sample? Hence, a small p-value means that your result is very improbable under the null hypothesis, and so there is strong evidence against the null hypothesis - the smaller the p-value, the stronger the evidence.

To find the p-value, you have to calculate the probability that the test statistic, \(Z\), is at least as extreme as the value we've actually observed, \(z\), provided that the null hypothesis is true. (The probability of an event calculated under the assumption that \(\mathrm{H}_0\) is true will be denoted as \(\mathrm{Pr}(\text{event} \mid \mathrm{H}_0)\).) It is the alternative hypothesis which determines what "more extreme" means:

  • Two-tailed Z-test: extreme values are those whose absolute value exceeds \(|z|\), so those smaller than \(-|z|\) or greater than \(|z|\). Therefore, we have:

\[ \text{p-value} = \mathrm{Pr}(Z \le -|z| \mid \mathrm{H}_0) + \mathrm{Pr}(Z \ge |z| \mid \mathrm{H}_0) \]

The symmetry of the normal distribution gives:

\[ \text{p-value} = 2 \, \mathrm{Pr}(Z \ge |z| \mid \mathrm{H}_0) \]

  • Left-tailed Z-test: extreme values are those smaller than \(z\), so \(\text{p-value} = \mathrm{Pr}(Z \le z \mid \mathrm{H}_0)\).
  • Right-tailed Z-test: extreme values are those greater than \(z\), so \(\text{p-value} = \mathrm{Pr}(Z \ge z \mid \mathrm{H}_0)\).

To compute these probabilities, we can use the cumulative distribution function (cdf) of \(\mathrm{N}(0, 1)\), which for a real number \(x\) is defined as:

\[ \Phi(x) = \mathrm{Pr}(Z \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, \mathrm{d}t \]

Also, p-values can be nicely depicted as the area under the probability density function (pdf) of \(\mathrm{N}(0, 1)\), because \(\mathrm{Pr}(Z \le x \mid \mathrm{H}_0) = \Phi(x)\) equals the area under the pdf to the left of \(x\).

With all the knowledge you've got from the previous section, you're ready to learn about Z-tests.

  • Two-tailed Z-test:

\[ \text{p-value} = \Phi(-|z|) + \big(1 - \Phi(|z|)\big) \]

From the fact that \(\Phi(-z) = 1 - \Phi(z)\), we deduce that

\[ \text{p-value} = 2\,\Phi(-|z|) = 2\big(1 - \Phi(|z|)\big) \]

The p-value is the area under the probability distribution function (pdf) both to the left of \(-|z|\) and to the right of \(|z|\):

two-tailed p value

  • Left-tailed Z-test:

\[ \text{p-value} = \Phi(z) \]

The p-value is the area under the pdf to the left of our \(z\):

left-tailed p value

  • Right-tailed Z-test:

\[ \text{p-value} = 1 - \Phi(z) \]

The p-value is the area under the pdf to the right of \(z\):

right-tailed p value
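
If you want to compute these p-values numerically, base R's pnorm implements the cdf \(\Phi\); a minimal sketch:

```r
# p-value of an observed z under each alternative hypothesis
z <- 1.67           # example observed value
2 * pnorm(-abs(z))  # two-tailed: 2 * Phi(-|z|)
pnorm(z)            # left-tailed: Phi(z)
1 - pnorm(z)        # right-tailed: 1 - Phi(z)
```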

The decision as to whether or not you should reject the null hypothesis can now be made at any significance level, \(\alpha\), you desire!

if the p-value is less than or equal to \(\alpha\), the null hypothesis is rejected at this significance level; and

if the p-value is greater than \(\alpha\), then there is not enough evidence to reject the null hypothesis at this significance level.

The critical value approach involves comparing the value of the test statistic obtained for our sample, \(z\), to the so-called critical values. These values constitute the boundaries of regions where the test statistic is highly improbable to lie. Those regions are often referred to as the critical regions, or rejection regions. The decision of whether or not you should reject the null hypothesis is then based on whether or not our \(z\) belongs to the critical region.

The critical regions depend on the significance level, \(\alpha\), of the test, and on the alternative hypothesis. The choice of \(\alpha\) is arbitrary; in practice, the values 0.1, 0.05, and 0.01 are most commonly used.

Once we agree on the value of \(\alpha\), we can easily determine the critical regions of the Z-test:

  • Two-tailed Z-test: reject if \(|z| \ge \Phi^{-1}(1 - \alpha/2)\);
  • Left-tailed Z-test: reject if \(z \le \Phi^{-1}(\alpha)\); and
  • Right-tailed Z-test: reject if \(z \ge \Phi^{-1}(1 - \alpha)\).

To decide the fate of \(\mathrm{H}_0\), check whether or not your \(z\) falls in the critical region:

If yes, then reject \(\mathrm{H}_0\) and accept \(\mathrm{H}_1\); and

If no, then there is not enough evidence to reject \(\mathrm{H}_0\).

As you see, the formulae for the critical values of Z-tests involve the inverse, \(\Phi^{-1}\), of the cumulative distribution function (cdf) of \(\mathrm{N}(0, 1)\).
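
In base R, the inverse cdf \(\Phi^{-1}\) is qnorm, so the critical values above can be sketched as:

```r
alpha <- 0.05
qnorm(1 - alpha/2)  # two-tailed: reject if |z| >= 1.959964
qnorm(alpha)        # left-tailed: reject if z <= -1.644854
qnorm(1 - alpha)    # right-tailed: reject if z >= 1.644854
```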

Our calculator takes care of all these complicated steps:

Choose the alternative hypothesis: two-tailed or left/right-tailed.

In our Z-test calculator, you can decide whether to use the p-value or the critical regions approach. In the latter case, set the significance level, \(\alpha\).

Enter the value of the test statistic, \(z\). If you don't know it, then you can enter some data that will allow us to calculate \(z\) for you:

  • sample mean \(\bar{x}\) (if you have raw data, go to the average calculator to determine the mean);
  • tested mean \(\mu_0\);
  • sample size \(n\); and
  • population standard deviation \(\sigma\) (or sample standard deviation if your sample is large).

Results appear immediately below the calculator.

If you want to find \(z\) based on the p-value, please remember that in the case of two-tailed tests there are two possible values of \(z\): one positive and one negative, and they are opposite numbers. This Z-test calculator returns the positive value in such a case. In order to find the other possible value of \(z\) for a given p-value, just take the number opposite to the value of \(z\) displayed by the calculator.

To make sure that you've fully understood the essence of Z-test, let's go through some examples:

  • A bottle-filling machine dispenses volumes that follow a normal distribution. Its standard deviation, as declared by the manufacturer, is equal to 30 ml. A juice seller claims that the volume poured in each bottle is, on average, one liter, i.e., 1000 ml, but we suspect that in fact the average volume is smaller than that...

Formally, the hypotheses that we set are the following:

\[ \mathrm{H}_0 : \mu = 1000 \text{ ml} \]

\[ \mathrm{H}_1 : \mu < 1000 \text{ ml} \]

We went to a shop and bought a sample of 9 bottles. After carefully measuring the volume of juice in each bottle, we've obtained the following sample (in milliliters):

1020, 970, 1000, 980, 1010, 930, 950, 980, 980.

Sample size: \(n = 9\);

Sample mean: \(\bar{x} = 980 \ \mathrm{ml}\);

Population standard deviation: \(\sigma = 30 \ \mathrm{ml}\);

Test statistic: \(z = (980 - 1000)/30 \times \sqrt{9} = -2\);

And, therefore, \(\text{p-value} = \Phi(-2) \approx 0.0228\).

As \(0.0228 < 0.05\), we conclude that our suspicions aren't groundless; at the most common significance level, 0.05, we would reject the producer's claim, \(\mathrm{H}_0\), and accept the alternative hypothesis, \(\mathrm{H}_1\).
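
The same calculation can be verified in R (a small sketch using the sample above):

```r
# Bottle-filling example: left-tailed one-sample Z-test
x     <- c(1020, 970, 1000, 980, 1010, 930, 950, 980, 980)
mu_0  <- 1000  # mean claimed by the seller, in ml
sigma <- 30    # population standard deviation declared by the manufacturer

z <- (mean(x) - mu_0) / sigma * sqrt(length(x))
z         # -2
pnorm(z)  # left-tailed p-value, about 0.0228
```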

We tossed a coin 50 times. We got 20 tails and 30 heads. Is there sufficient evidence to claim that the coin is biased?

Clearly, our data follows a Bernoulli distribution, with some success probability \(p\) and variance \(\sigma^2 = p(1 - p)\). However, the sample is large, so we can safely perform a Z-test. We adopt the convention that getting tails is a success.

Let us state the null and alternative hypotheses:

\(\mathrm{H}_0 : p = 0.5\) (the coin is fair - the probability of tails is 0.5)

\(\mathrm{H}_1 : p \ne 0.5\) (the coin is biased - the probability of tails differs from 0.5)

In our sample we have 20 successes (denoted by ones) and 30 failures (denoted by zeros), so:

Sample size: \(n = 50\);

Sample mean: \(\bar{x} = 20/50 = 0.4\);

Population standard deviation: \(\sigma = \sqrt{0.5 \times 0.5}\) (because 0.5 is the proportion \(p\) hypothesized in \(\mathrm{H}_0\)). Hence, \(\sigma = 0.5\);

  • And, therefore, \(z = (0.4 - 0.5)/0.5 \times \sqrt{50} \approx -1.41\), which gives \(\text{p-value} = 2\,\Phi(-1.41) \approx 0.1573\).

Since \(0.1573 > 0.1\), we don't have enough evidence to reject the claim that the coin is fair, even at a significance level as large as 0.1. In that case, you may safely toss it to your Witcher or use the coin flip probability calculator to find your chances of getting, e.g., 10 heads in a row (which are extremely low!).

What is the difference between a Z-test and a t-test?

We use a t-test for testing the population mean of a normally distributed dataset that has an unknown population standard deviation. We get it by replacing the population standard deviation in the Z-test statistic formula with the sample standard deviation, which means that this new test statistic follows (provided that \(\mathrm{H}_0\) holds) the Student's t-distribution with \(n - 1\) degrees of freedom instead of \(\mathrm{N}(0, 1)\).

When should I use a t-test over a Z-test?

For large samples, the Student's t-distribution with \(n - 1\) degrees of freedom approaches \(\mathrm{N}(0, 1)\). Hence, as long as there are a sufficient number of data points (at least 30), it does not really matter whether you use the Z-test or the t-test, since the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test instead of the Z-test.

How do I calculate the Z test statistic?

To calculate the Z test statistic:

  • Compute the arithmetic mean of your sample.
  • From this mean, subtract the mean postulated in the null hypothesis.
  • Multiply by the square root of the sample size.
  • Divide by the population standard deviation.
  • That's it, you've just computed the Z test statistic!
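
These steps translate directly into a small R helper; the function name one_sample_z is ours, not part of any library:

```r
# Z statistic from summary data: (x_bar - mu_0) * sqrt(n) / sigma
one_sample_z <- function(x_bar, mu_0, sigma, n) {
  (x_bar - mu_0) * sqrt(n) / sigma
}

one_sample_z(x_bar = 980, mu_0 = 1000, sigma = 30, n = 9)  # -2, as in the bottle example
```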


A z test is a statistical test that is conducted on data that approximately follows a normal distribution. The z test can be performed on one sample, two samples, or on proportions for hypothesis testing. It checks whether the means of two large samples are different when the population variance is known.

A z test can further be classified into left-tailed, right-tailed, and two-tailed hypothesis tests depending upon the parameters of the data. In this article, we will learn more about the z test, its formula, the z test statistic, and how to perform the test for different types of data using examples.

What is Z Test?

A z test is a test that is used to check if the means of two populations are different or not provided the data follows a normal distribution. For this purpose, the null hypothesis and the alternative hypothesis must be set up and the value of the z test statistic must be calculated. The decision criterion is based on the z critical value.

Z Test Definition

A z test is conducted on a population that follows a normal distribution with independent data points and has a sample size that is greater than or equal to 30. It is used to check whether the means of two populations are equal to each other when the population variance is known. The null hypothesis of a z test can be rejected if the z test statistic is statistically significant when compared with the critical value.

Z Test Formula

The z test formula compares the z statistic with the z critical value to test whether there is a difference in the means of two populations. In hypothesis testing , the z critical value divides the distribution graph into the acceptance and the rejection regions. If the test statistic falls in the rejection region then the null hypothesis can be rejected otherwise it cannot be rejected. The z test formula to set up the required hypothesis tests for a one sample and a two-sample z test are given below.

One-Sample Z Test

A one-sample z test is used to check if there is a difference between the sample mean and the population mean when the population standard deviation is known. The formula for the z test statistic is given as follows:

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the sample size.

The algorithm to set up a one sample z test based on the z test statistic is given as follows:

Left Tailed Test:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu < \mu_{0}\)

Decision Criteria: If the z statistic < z critical value then reject the null hypothesis.

Right Tailed Test:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu > \mu_{0}\)

Decision Criteria: If the z statistic > z critical value then reject the null hypothesis.

Two Tailed Test:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu \neq \mu_{0}\)

Decision Criteria: If the absolute value of the z statistic > z critical value then reject the null hypothesis.

Two Sample Z Test

A two sample z test is used to check if there is a difference between the means of two samples. The z test statistic formula is given as follows:

z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\). \(\overline{x_{1}}\), \(\mu_{1}\), \(\sigma_{1}^{2}\) are the sample mean, population mean and population variance respectively for the first sample. \(\overline{x_{2}}\), \(\mu_{2}\), \(\sigma_{2}^{2}\) are the sample mean, population mean and population variance respectively for the second sample.

The two-sample z test can be set up in the same way as the one-sample test. However, this test will be used to compare the means of the two samples. For example, the null hypothesis is given as \(H_{0}\) : \(\mu_{1} = \mu_{2}\).
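
As a sketch, the two-sample statistic can be written as a short R function (the name two_sample_z and the delta0 argument for the hypothesized difference \(\mu_{1} - \mu_{2}\) are our own conventions, not a library API):

```r
# Two-sample Z statistic: ((x1_bar - x2_bar) - delta0) / sqrt(s1^2/n1 + s2^2/n2)
two_sample_z <- function(x1_bar, x2_bar, sigma1, sigma2, n1, n2, delta0 = 0) {
  ((x1_bar - x2_bar) - delta0) / sqrt(sigma1^2 / n1 + sigma2^2 / n2)
}
```

A worked call appears in the teacher example below.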


Z Test for Proportions

A z test for proportions is used to check the difference in proportions. A z test can either be used for one proportion or two proportions. The formulas are given as follows.

One Proportion Z Test

A one proportion z test compares the value of an observed proportion to a theoretical one. The z test statistic for a one proportion z test is given as follows:

z = \(\frac{p-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\). Here, p is the observed value of the proportion, \(p_{0}\) is the theoretical proportion value and n is the sample size.

The null hypothesis is that the population proportion is equal to the theoretical value \(p_{0}\), while the alternative hypothesis is that it is not.
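
A minimal R sketch of this statistic (one_prop_z is a hypothetical helper name):

```r
# One-proportion Z statistic: (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
one_prop_z <- function(p_hat, p0, n) {
  (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
}

one_prop_z(p_hat = 0.4, p0 = 0.5, n = 50)  # about -1.41, e.g. 20 tails in 50 coin tosses
```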

Two Proportion Z Test

A two proportion z test is conducted on two proportions to check if they are the same or not. The test statistic formula is given as follows:

z =\(\frac{p_{1}-p_{2}-0}{\sqrt{p(1-p)\left ( \frac{1}{n_{1}} +\frac{1}{n_{2}}\right )}}\)

where p = \(\frac{x_{1}+x_{2}}{n_{1}+n_{2}}\)

\(p_{1}\) is the proportion of sample 1 with sample size \(n_{1}\) and \(x_{1}\) number of trials.

\(p_{2}\) is the proportion of sample 2 with sample size \(n_{2}\) and \(x_{2}\) number of trials.
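
A sketch of this pooled statistic in R (two_prop_z is a hypothetical name; x1 and x2 are the success counts):

```r
# Two-proportion Z statistic with pooled proportion p = (x1 + x2) / (n1 + n2)
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p  <- (x1 + x2) / (n1 + n2)
  (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
}
```

Example 3 below plugs real numbers into this formula.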

How to Calculate Z Test Statistic?

The most important step in calculating the z test statistic is to interpret the problem correctly. It is necessary to determine which tailed test needs to be conducted and what type of test the z statistic belongs to. Suppose a teacher claims that his section's students will score higher than his colleague's section. The mean score is 22.1 for 60 students belonging to his section with a standard deviation of 4.8. For his colleague's section, the mean score is 18.8 for 40 students and the standard deviation is 8.1. Test his claim at \(\alpha\) = 0.05. The steps to calculate the z test statistic are as follows:

  • Identify the type of test. In this example, the means of two populations have to be compared in one direction thus, the test is a right-tailed two-sample z test.
  • Set up the hypotheses. \(H_{0}\): \(\mu_{1} = \mu_{2}\), \(H_{1}\): \(\mu_{1} > \mu_{2}\).
  • Find the critical value at the given alpha level using the z table. The critical value is 1.645.
  • Determine the z test statistic using the appropriate formula. This is given by z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\). Substitute values in this equation. \(\overline{x_{1}}\) = 22.1, \(\sigma_{1}\) = 4.8, \(n_{1}\) = 60, \(\overline{x_{2}}\) = 18.8, \(\sigma_{2}\) = 8.1, \(n_{2}\) = 40 and \(\mu_{1} - \mu_{2} = 0\). Thus, z = 2.32
  • Compare the critical value and test statistic to arrive at a conclusion. As 2.32 > 1.645 thus, the null hypothesis can be rejected. It can be concluded that there is enough evidence to support the teacher's claim that the scores of students are better in his class.
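
Plugging the example's numbers into the two-sample sketch defined earlier reproduces the statistic:

```r
two_sample_z(x1_bar = 22.1, x2_bar = 18.8,
             sigma1 = 4.8, sigma2 = 8.1,
             n1 = 60, n2 = 40)  # about 2.32, which exceeds 1.645
```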

Z Test vs T-Test

Both the z test and the t-test are univariate tests used on the means of two datasets. The key differences between the two tests are outlined below:

  • A z test is used when the population variance is known and the sample size is large (n ≥ 30); its test statistic follows the standard normal distribution.
  • A t-test is used when the population variance is unknown and is estimated from the sample; its test statistic follows the Student's t distribution with n - 1 degrees of freedom.

Related Articles:

  • Probability and Statistics
  • Data Handling
  • Summary Statistics

Important Notes on Z Test

  • Z test is a statistical test that is conducted on normally distributed data to check if there is a difference in means of two data sets.
  • The sample size should be greater than 30 and the population variance must be known to perform a z test.
  • The one-sample z test checks if there is a difference between the sample and population means.
  • The two sample z test checks if the means of two different groups are equal.

Examples on Z Test

Example 1: A teacher claims that the mean score of students in his class is greater than 82 with a standard deviation of 20. If a sample of 81 students was selected with a mean score of 90 then check if there is enough evidence to support this claim at a 0.05 significance level.

Solution: As the sample size is 81 and population standard deviation is known, this is an example of a right-tailed one-sample z test.

\(H_{0}\) : \(\mu = 82\)

\(H_{1}\) : \(\mu > 82\)

From the z table, the critical value at \(\alpha\) = 0.05 is 1.645.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\)

\(\overline{x}\) = 90, \(\mu\) = 82, n = 81, \(\sigma\) = 20. Substituting these values into the formula, z = (90 - 82) / (20 / √81) = 8 / 2.22 ≈ 3.6.

As 3.6 > 1.645 thus, the null hypothesis is rejected and it is concluded that there is enough evidence to support the teacher's claim.

Answer: Reject the null hypothesis

Example 2: An online medicine shop claims that the mean delivery time for medicines is less than 120 minutes with a standard deviation of 30 minutes. Is there enough evidence to support this claim at a 0.05 significance level if 49 orders were examined with a mean of 100 minutes?

Solution: As the sample size is 49 and population standard deviation is known, this is an example of a left-tailed one-sample z test.

\(H_{0}\) : \(\mu = 120\)

\(H_{1}\) : \(\mu < 120\)

From the z table, the critical value at \(\alpha\) = 0.05 is -1.645. A negative sign is used as this is a left tailed test.

\(\overline{x}\) = 100, \(\mu\) = 120, n = 49, \(\sigma\) = 30. Substituting these values into the formula, z = (100 - 120) / (30 / √49) = -20 / 4.29 ≈ -4.66.

As -4.66 < -1.645 thus, the null hypothesis is rejected and it is concluded that there is enough evidence to support the medicine shop's claim.

Example 3: A company wants to improve the quality of products by reducing defects and monitoring the efficiency of assembly lines. In assembly line A, there were 18 defects reported out of 200 samples while in line B, 25 defects out of 600 samples were noted. Is there a difference in the procedures at a 0.05 alpha level?

Solution: This is an example of a two-tailed two proportion z test.

\(H_{0}\): The two proportions are the same.

\(H_{1}\): The two proportions are not the same.

As this is a two-tailed test the alpha level needs to be divided by 2 to get 0.025.

Using this, the critical value from the z table is 1.96.

\(n_{1}\) = 200, \(n_{2}\) = 600

\(p_{1}\) = 18 / 200 = 0.09

\(p_{2}\) = 25 / 600 = 0.0416

p = (18 + 25) / (200 + 600) = 0.0537

z =\(\frac{p_{1}-p_{2}-0}{\sqrt{p(1-p)\left ( \frac{1}{n_{1}} +\frac{1}{n_{2}}\right )}}\) = 2.62

As 2.62 > 1.96 thus, the null hypothesis is rejected and it is concluded that there is a significant difference between the two lines.
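
To check the arithmetic in R (a sketch mirroring the two_prop_z helper above):

```r
p1 <- 18 / 200                 # 0.09
p2 <- 25 / 600                 # about 0.0417
p  <- (18 + 25) / (200 + 600)  # pooled proportion, about 0.0537
(p1 - p2) / sqrt(p * (1 - p) * (1/200 + 1/600))  # about 2.62
```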




FAQs on Z Test

What is a Z Test in Statistics?

A z test in statistics is conducted on data that is normally distributed to test if the means of two datasets are equal. It can be performed when the sample size is greater than 30 and the population variance is known.

What is a One-Sample Z Test?

A one-sample z test is used when the population standard deviation is known, to compare the sample mean and the population mean. The z test statistic is given by the formula \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

What is the Two-Sample Z Test Formula?

The two sample z test is used when the means of two populations have to be compared. The z test formula is given as \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is a One Proportion Z test?

A one proportion z test is used to check if the value of the observed proportion is different from the value of the theoretical proportion. The z statistic is given by \(\frac{p-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\).

What is a Two Proportion Z Test?

When the proportions of two samples have to be compared then the two proportion z test is used. The formula is given by \(\frac{p_{1}-p_{2}-0}{\sqrt{p(1-p)\left ( \frac{1}{n_{1}} +\frac{1}{n_{2}}\right )}}\).

How Do You Find the Z Test?

The steps to perform the z test are as follows:

  • Set up the null and alternative hypotheses.
  • Find the critical value using the alpha level and z table.
  • Calculate the z statistic.
  • Compare the critical value and the test statistic to decide whether to reject or not to reject the null hypothesis.

What is the Difference Between the Z Test and the T-Test?

A z test is used on large samples (n ≥ 30) with normally distributed data, while a t-test is used on small samples (n < 30) following a Student's t distribution. Both tests are used to check if the means of two datasets are the same.

Introduction to Statistics and Data Analysis

6 Hypothesis Testing: The Z-Test

We’ve all had the experience of standing at a crosswalk waiting staring at a pedestrian traffic light showing the little red man. You’re waiting for the little green man so you can cross. After a little while you’re still waiting and there aren’t any cars around. You might think ‘this light is really taking a long time’, but you continue waiting. Minutes pass and there’s still no little green man. At some point you come to the conclusion that the light is broken and you’ll never see that little green man. You cross on the little red man when it’s clear.

You may not have known this, but you just conducted a hypothesis test. When you arrived at the crosswalk, you assumed that the light was functioning properly, although you will always entertain the possibility that it’s broken. In terms of hypothesis testing, your ‘null hypothesis’ is that the light is working and your ‘alternative hypothesis’ is that it’s broken. As time passes, it seems less and less likely that the light is working properly. Eventually, the probability of the light working given how long you’ve been waiting becomes so low that you reject the null hypothesis in favor of the alternative hypothesis.

This sort of reasoning is the backbone of hypothesis testing and inferential statistics. It’s also the point in the course where we turn the corner from descriptive statistics to inferential statistics. Rather than describing our data in terms of means and plots, we will now start using our data to make inferences, or generalizations, about the population that our samples are drawn from. In this course we’ll focus on standard hypothesis testing where we set up a null hypothesis and determine the probability of our observed data under the assumption that the null hypothesis is true (the much maligned p-value). If this probability is small enough, then we conclude that our data suggests that the null hypothesis is false, so we reject it.

In this chapter, we’ll introduce hypothesis testing with examples from a ‘z-test’, when we’re comparing a single mean to what we’d expect from a population with known mean and standard deviation. In this case, we can convert our observed mean into a z-score for the standard normal distribution. Hence the name z-test.

It’s time to introduce the hypothesis test flow chart. It’s pretty self-explanatory, even if you’re not familiar with all of these hypothesis tests. The z-test is (1) based on means, (2) with only one mean, and (3) where we know \(\sigma\), the standard deviation of the population. Here’s how to find the z-test in the flow chart:

[Figure: hypothesis test flow chart]

6.1 Women’s height example

Let’s work with the example from the end of the last chapter, where we started with the fact that the heights of US women have a mean of 63 inches and a standard deviation of 2.5 inches. We calculated that the average height of the 122 women in Psych 315 is 64.7 inches. We then used the central limit theorem and calculated that the probability of a random sample of 122 heights from this population having a mean of 64.7 or greater is \(2.4868996 \times 10^{-14}\). This is a very, very small number.

Here’s how we do it using R:
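
```r
# One way to compute this (a reconstruction; the original snippet isn't shown):
# P(sample mean >= 64.7) when n = 122, mu = 63, sigma = 2.5
1 - pnorm(64.7, mean = 63, sd = 2.5 / sqrt(122))  # about 2.49e-14
```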

Let’s think of our sample as a random sample of UW psychology students, which is a reasonable assumption since all psychology students have to take a statistics class. What does this sample say about the psychology students that are women at UW compared to the US population? It could be that these psychology students at UW have the same mean and standard deviation as the US population, but our sample just happens to have an unusual number of tall women, but we calculated that the probability of this happening is really low. Instead, it makes more sense that the population that we’re drawing from has a mean that’s greater than the US population mean. Notice that we’re making a conclusion about the whole population of women psychology students based on our one sample.

Using the terminology of hypothesis testing, we first assumed the null hypothesis that UW women psych students have the same mean (and standard deviation) as the US population. The null hypothesis is written as:

\[ H_{0}: \mu = 63 \] In this example, our alternative hypothesis is that the mean of our population is larger than the mean of null hypothesis population. We write this as:

\[ H_{A}: \mu > 63 \]

Next, after obtaining a random sample and calculating the mean, we calculate the probability of drawing a mean this large (or larger) from the null hypothesis distribution.

If this probability is low enough, we reject the null hypothesis in favor of the alternative hypothesis. When our probability allows us to reject the null hypothesis, we say that our observed results are ‘statistically significant’.

In statistics terms, we never say we ‘accept the alternative hypothesis’ as true. All we can say is that we don’t think the null hypothesis is true. I know it’s subtle, but in science we can never prove that a hypothesis is true or not. There’s always the possibility that we just happened to grab an unusual sample from the null hypothesis distribution.

6.2 The hated p<.05

The probability that we obtain our observed mean or greater given that the null hypothesis is true is called the p-value. How improbable is improbable enough to reject the null hypothesis? The p-value for our example above on women’s heights is astronomically low, so it’s clear that we should reject \(H_{0}\) .

The p-value that’s on the border of rejection is called the alpha ( \(\alpha\) ) value. We reject \(H_{0}\) when our p-value is less than \(\alpha\) .

You probably know that the most common value of alpha is \(\alpha = .05\) .

The first publication of this value dates back to Sir Ronald Fisher, in his seminal 1925 book Statistical Methods for Research Workers where he states:

“It is convenient to take this point as a limit in judging whether a deviation is considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant.” (p. 47)

If you read the chapter on the normal distribution, then you should know that 95% of the area under the normal distribution lies within \(\pm\) two standard deviations of the mean. So the probability of obtaining a sample that exceeds two standard deviations from the mean (in either direction) is .05.

6.3 IQ example

Let’s do an example using IQ scores. IQ scores are normalized to have a mean of 100 and a standard deviation of 15 points. Because they’re normalized, they are a rare example of a population which has a known mean and standard deviation. In the next chapter we’ll discuss the t-test, which is used in the more common situation when we don’t know the population standard deviation.

Suppose you have the suspicion that graduate students have higher IQs than the general population. You have enough time to go and measure the IQs of 25 randomly sampled grad students and obtain a mean of 105. Is the difference between this observed mean and 100 statistically significant using an alpha value of \(\alpha = 0.05\)?

Here the null hypothesis is:

\[ H_{0}: \mu = 100\]

And the alternative hypothesis is:

\[ H_{A}: \mu > 100 \]

We know that the parameters for the null hypothesis are:

\[ \mu = 100 \] and \[ \sigma = 15 \]

From this, we can calculate the probability of observing our mean of 105 or higher using the central limit theorem and what we know about the normal distribution:

\[ \sigma_{\bar{x}} = \frac{\sigma_{x}}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3 \] From this, we can calculate the probability of our observed mean using R’s ‘pnorm’ function. Here’s how to do the whole thing in R.
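
```r
# A sketch of the whole calculation (the original snippet isn't shown):
sem <- 15 / sqrt(25)                  # standard error of the mean: 3
(105 - 100) / sem                     # z = 1.6667
1 - pnorm(105, mean = 100, sd = sem)  # one-tailed p-value, about 0.0478
```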

Since our p-value of 0.0478 is (just barely) less than our chosen value of \(\alpha = 0.05\) as our criterion, we reject \(H_{0}\) for this (contrived) example and conclude that our observed mean of 105 is significantly greater than 100, so our study suggests that the average graduate student has a higher IQ than the overall population.

You should feel uncomfortable making such a hard, binary decision for such a borderline case. After all, if we had chosen our second favorite value of alpha, \(\alpha = .01\) , we would have failed to reject \(H_{0}\) . This discomfort is a primary reason why statisticians are moving away from this discrete decision making process. Later on we’ll discuss where things are going, including reporting effect sizes, and using confidence intervals.

6.4 Alpha values vs. critical values

Using R’s qnorm function, we can find the z-score for which only 5% of the area lies above:
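
```r
qnorm(0.95)  # 1.644854
```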

So the probability of a randomly sampled z-score exceeding 1.644854 is less than 5%. It follows that if we convert our observed mean into z-score values, we will reject \(H_{0}\) if and only if our z-score is greater than 1.644854. This value is called the ‘critical value’ because it lies on the boundary between rejecting and failing to reject \(H_{0}\) .

In our last example, the z-score for our observed mean is:

\[ z = \frac{X-\mu}{\frac{\sigma}{\sqrt{n}}} = \frac{105 - 100}{3} = 1.67 \] Our z-score is just barely greater than the critical value of 1.644854, which makes sense because our p-value is just barely less than 0.05.

Sometimes you’ll see textbooks compare critical values to observed scores for the decision-making process in hypothesis testing. This dates back to days when computers were less available and we had to rely on tables instead. There wasn’t enough space in a book to hold complete tables, which prohibited looking up a p-value for any observed value. Instead, only critical values for specific values of alpha were included. If you look at really old papers, you’ll see statistics reported as \(p<.05\) or \(p<.01\) instead of actual p-values for this reason.

It may help to visualize the relationship between p-values, alpha values and critical values like this:

[Figure: one-tailed rejection region with the observed z]

The red shaded region is the upper 5% of the standard normal distribution, which starts at the critical value of z = 1.644854. This is sometimes called the ‘rejection region’. The blue vertical line is drawn at our observed value of z = 1.67. You can see that the blue line falls just inside the rejection region, so we reject \(H_{0}\)!

6.5 One vs. two-tailed tests

Recall that our alternative hypothesis was to reject if our mean IQ was significantly greater than the null hypothesis mean: \(H_{A}: \mu > 100\) . This implies that the situation where \(\mu < 100\) is never even in consideration, which is weird. In science, we’re trying to understand the true state of the world. Although we have a hunch that grad student IQ’s are higher than average, there is always the possibility that they are lower than average. If our sample came up with an IQ well below 100, we’d simply fail to reject \(H_{0}\) and move on. This feels like throwing out important information.

The test we just ran is called a ‘one-tailed’ test because we only reject \(H_{0}\) if our results fall in one tail of the population distribution.

Instead, it might make more sense to reject \(H_{0}\) if we get either an unusually large or small score. This means we need two critical values - one above and one below zero. At first thought, you might think we could just duplicate our critical value from a one-tailed test onto the other side. But that would double the area of the rejection region. That’s not a good thing, because if \(H_{0}\) is true, there’s actually a \(2\alpha\) probability that we’ll draw a score in the rejection region.

Instead, we divide the area into two tails, each containing an area of \(\frac{\alpha}{2}\) . For \(\alpha\) = 0.05, we can find the critical value of z with qnorm:
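
```r
qnorm(1 - 0.05/2)  # 1.959964
```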

So with a two-tailed test at \(\alpha = 0.05\), we reject \(H_{0}\) if our observed z-score is either above z = 1.96 or below z = -1.96. This is that value around 2 that Sir Ronald Fisher was talking about!

Here’s what the critical regions and observed value of z looks like for our example with a two-tailed test:

[Figure: two-tailed critical regions with the observed z]

You can see that splitting the area of \(\alpha = 0.05\) into two halves increased the critical value in the positive direction from 1.64 to 1.96, making it harder to reject \(H_{0}\) . For our example, this changes our decision: our observed value of z = 1.67 no longer falls into the rejection region, so now we fail to reject \(H_{0}\) .

If we now fail to reject \(H_{0}\), what about the p-value? Remember, for a one-tailed test, p = \(\alpha\) if our observed z-score lands right on the critical value of z. The same is true for a two-tailed test. But the critical z-score moved so that the area above it is \(\frac{\alpha}{2}\). So for a two-tailed test, in order to have a p-value of \(\alpha\) when our z-score lands right on the critical value, we need to double the p-value we’d get for a one-tailed test.

For our example, the p-value for the one tailed test was \(p=0.0478\) . So if we use a two-tailed test, our p-value is \((2)(0.0478) = 0.0956\) . This value is greater than \(\alpha\) = 0.05, which makes sense because we just showed above that we fail to reject \(H_{0}\) with a two tailed test.

Which is the right test, one-tailed or two-tailed? Ideally, as scientists, we should be agnostic about the results of our experiment. But in reality, we all know that the results are more interesting if they are statistically significant. So you can imagine that for this example, given a choice between one and two-tailed, we’d choose a one-tailed test so that we can reject \(H_{0}\) .

There are two problems with this. First, we should never adjust our choice of hypothesis test after we observe the data. That would be an example of ‘p-hacking’, a topic we’ll discuss later. Second, most statisticians these days strongly recommend against one-tailed tests. The only reason for a one-tailed test is if there is no logical or physical possibility for a population mean to fall below the null hypothesis mean.


Chapter 10: Hypothesis Testing with Z

Setting Up the Hypotheses

When setting up the hypotheses with z, the parameter is associated with a sample mean (in the previous chapter’s examples, the parameters for the null used 0). Using z is an occasion in which the null hypothesis value is something other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the US, as our null value and test for differences against that. For now, we will focus on testing a value of a single mean against what we expect from the population.

Using birthweight as an example, our null hypothesis takes the form: H 0 : μ = 7.47. Notice that we are testing the value for μ, the population parameter, NOT the sample statistic X̄ (or M). We are referring to the data right now in raw form (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population, drawing on our sample observations. For the research question, we have a mean value from the sample to use; it is observed and used as a comparison against a set point.

As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.

If we expect our obtained sample mean to be above or below the null hypothesis value (knowing which direction), we set a directional hypothesis. Our alternative hypothesis takes the form based on the research question itself. In our example with birthweight, this could be presented as H A : μ > 7.47 or H A : μ < 7.47.

Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birthweight example, this could be set as H A : μ ≠ 7.47.

In working with data for this course, we will need to set a critical value of the test statistic for alpha (α) for use with the test statistic tables in the back of the book. This determines the critical rejection region, which has a set critical value based on α.

Determining Critical Value from α

We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.

When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .


Figure 1. A two-tailed test for a non-directional hypothesis for z; area C is the critical rejection region.

When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .


Figure 2. A one-tailed test for a directional hypothesis (predicting an increase) for z; area C is the critical rejection region.

Determining Cutoff Scores with Two-Tailed Tests

Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis ; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Note we will discuss ways to address effect size (which is related to this challenge of NHST).

When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test has an equally extreme cutoff, but only one side of the distribution is considered.


Figure 3. Critical value differences in one- and two-tail tests.

Let’s review the set critical values for Z.

We discussed z-scores and probability in chapter 8.  If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.

  • A two-tailed test at the 5% level has a critical boundary Z score of +1.96 and -1.96
  • A one-tailed test at the 5% level has a critical boundary Z score of +1.64 or -1.64
  • A two-tailed test at the 1% level has a critical boundary Z score of +2.58 and -2.58
  • A one-tailed test at the 1% level has a critical boundary Z score of +2.33 or -2.33.

Review: Critical values, p-values, and significance level

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H 0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.

Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.


Figure 4: The rejection region for a one-tailed test

The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit (“z-crit”) or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to 0.0505 and z = 1.65 corresponds to 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing then shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.


Figure 5: Two-tailed rejection region

Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.

Calculate the test statistic: Z

Now that we understand setting up the hypotheses and determining the outcome, let’s examine hypothesis testing with z! The next step is to carry out the study and get the actual results for our sample. Central to the hypothesis test is the comparison of the population and sample means. To make our calculation and determine where the sample falls in the hypothesized distribution, we calculate the Z for the sample data.

Make a decision

To decide whether to reject the null hypothesis, we compare our sample’s Z score to the Z score that marks our critical boundary. If our sample Z score falls inside the rejection region of the comparison distribution (is greater than the z-score critical boundary) we reject the null hypothesis.

The formula for our z- statistic has not changed:

z = (X̄ − μ) / (σ / √n)

To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2) and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion, or p-value, will be smaller than the area for α, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.

Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.
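
A tiny sketch of this comparison in R (using the z = 2.5 value the text suggests drawing, and a one-tailed α = .05):

```r
z_obt  <- 2.5              # obtained z statistic (from the text's example)
z_crit <- qnorm(1 - 0.05)  # one-tailed critical value, about 1.645
z_obt > z_crit             # TRUE: z_obt falls in the rejection region, so reject H0
```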


Figure 6. Relation between α, z obt , and p

When the null hypothesis is rejected, the effect is said to be statistically significant. Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

Review: Steps of the Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and though the hypotheses and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.

Step 3: Compute the Test Statistic

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.
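As a compact summary of the four steps, here is a minimal sketch in Python (the function name, the example numbers, and the scipy dependency are our own choices, not part of the textbook):

```python
from math import sqrt
from scipy.stats import norm

def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05, two_tailed=True):
    """Steps 3 and 4 of the procedure: compute z, then make the decision.

    Steps 1 and 2 (stating hypotheses, choosing alpha and directionality)
    are reflected in the arguments passed in.
    """
    se = sigma / sqrt(n)            # standard error of the mean
    z = (x_bar - mu_0) / se         # Step 3: the test statistic
    if two_tailed:
        p = 2 * norm.sf(abs(z))     # area beyond |z| in both tails
    else:
        p = norm.sf(abs(z))         # area beyond |z| in the observed tail
    reject = p < alpha              # Step 4: the decision
    return z, p, reject

# Hypothetical numbers: mu_0 = 100, sigma = 15, n = 40, sample mean 104.
print(one_sample_z_test(104, 100, 15, 40))
```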

Example: Movie Popcorn

Let's see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn't want bags overfilled or underfilled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.

Step 1: State the Hypotheses

Our manager is looking for a difference in the mean cups of popcorn per bag compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We'll always start with the null hypothesis:

H 0 : There is no difference in the cups of popcorn bags from this employee H 0 : μ = 8.00

Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups per bag filled by the new employee. Our assumption of no difference, the null hypothesis, is that this mean is exactly the same as the known population mean value we want it to match, 8.00. Now let's do the alternative:

H A : There is a difference in the cups of popcorn bags from this employee H A : μ ≠ 8.00

In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.

Step 2: Find the Critical Values

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. This will be the criterion we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense.


Figure 7: Rejection region for z* = ±1.96

Step 3: Calculate the Test Statistic

Now we come to our formal calculations. Let's say that the manager collects data and finds that the average cups of this employee's popcorn bags is X̄ = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z:

z = (7.75 − 8.00) / (0.50/√25) = −0.25 / 0.10 = −2.50

So our test statistic is z = −2.50, which we can draw onto our rejection region distribution:


Figure 8: Test statistic location

Step 4: Make the Decision

Looking at Figure 8, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

When we write our conclusion, we write out the words to communicate what it actually means, but we also include the sample mean we calculated (the exact location doesn't matter, just somewhere that flows naturally and makes sense) and the z-statistic and p-value. We don't know the exact p-value, but we do know that because we rejected the null, it must be less than α. For example: based on a sample of 25 bags, this employee's bags hold significantly less popcorn than the average of 8.00 cups (X̄ = 7.75, z = −2.50, p < .05).

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.

For mean differences like we calculated here, our effect size is Cohen’s d :

d = (X̄ − μ) / σ

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weaknesses of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size.

Table 1. Interpretation of Cohen's d (standard conventions)

d = 0.0 to 0.2: negligible effect
d = 0.2 to 0.5: small effect
d = 0.5 to 0.8: medium effect
d = 0.8 or larger: large effect
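As a quick sketch (our own illustration, not part of the text), Cohen's d for the popcorn example works out to a medium-sized effect:

```python
def cohens_d(x_bar, mu, sigma):
    """Cohen's d for a one-sample mean difference: d = (X-bar - mu) / sigma."""
    return (x_bar - mu) / sigma

# Movie popcorn example: d = (7.75 - 8.00) / 0.50 = -0.50, i.e. the bags
# are half a standard deviation light, a medium effect by Table 1.
print(cohens_d(7.75, 8.00, 0.50))
```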

Example: Office Temperature

Let's do another example to solidify our understanding. Let's say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit but is allowed to vary by 1 degree in either direction. You suspect that, as a cost-saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : There is no difference in the average building temperature H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H A : The average building temperature is higher than claimed H A : μ > 74


Now that you have everything set up, you spend one week collecting temperature data. You calculate the average of these scores to be X̄ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (the number of data points you collected):

z = (76.60 − 74.00) / (1.00/√5) = 2.60 / 0.45 = 5.78

This value falls so far into the tail that it cannot even be plotted on the distribution!


Figure 9: Obtained z-statistic

You compare your obtained z-statistic, z = 5.78, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (X̄ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.78, p < .05.

d = (76.60 − 74.00) / 1.00 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!
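To double-check the arithmetic, here is a sketch of ours (assuming scipy is available). The hand calculation above rounds the standard error to 0.45; computing without rounding gives z ≈ 5.81, which leads to the same decision:

```python
from math import sqrt
from scipy.stats import norm

x_bar, mu, sigma, n = 76.60, 74.00, 1.00, 5

z = (x_bar - mu) / (sigma / sqrt(n))  # ~ 5.81 without rounding the SE
p = norm.sf(z)                        # one-tailed (upper) p-value, ~ 3e-9
d = (x_bar - mu) / sigma              # Cohen's d = 2.60

print(z, p, d)
```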

Example: Different Significance Level

First, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value: H 0 : The average score does not differ from the population H 0 : μ = 60

We will assume a two-tailed test: H A : The average score does differ H A : μ ≠ 60

We have seen the critical values for z-tests at the α = 0.05 level of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting off 0.005 (0.01 divided by 2 for a two-tailed test) of the area in the tail, which is z crit * = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, rejecting the null hypothesis will require a much larger effect or a much larger sample size.

We can now calculate our test statistic. The average of 10 scores is M = 60.40 with µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as:

z = (60.40 − 60.00) / (10/√10) = 0.40 / 3.16 = 0.13

Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like: based on a sample of 10 scores (M = 60.40), we fail to reject the null hypothesis that the average score differs from μ = 60.00, z = 0.13, p > 0.01.

Notice two things about the end of the conclusion. First, we wrote that p is greater than, rather than less than, the significance level. This is because we failed to reject the null hypothesis. We don't know exactly what the p-value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians suggest always calculating an effect size anyway, to guard against missing a meaningful effect (a Type II error). Although the result was not significant here, d = (60.4 − 60)/10 = 0.04, which suggests essentially no effect (and therefore little concern about a Type II error).

Review: Considerations in Hypothesis Testing

Errors in Hypothesis Testing

Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.

Statistical Power

The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.

Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
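To make the sample-size lever concrete, here is a rough sketch (our own, with made-up numbers) of how power can be computed for an upper-tailed one-sample z-test:

```python
from math import sqrt
from scipy.stats import norm

def z_test_power(mu_0, mu_true, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test: the chance the sample
    mean lands in the rejection region when the true mean is mu_true."""
    se = sigma / sqrt(n)
    cutoff = mu_0 + norm.ppf(1 - alpha) * se  # reject H0 above this mean
    return norm.sf((cutoff - mu_true) / se)

# A half-standard-deviation effect (7.5 points when sigma = 15) reaches
# roughly 80% power at n = 25; doubling n pushes power well past 90%.
print(z_test_power(mu_0=100, mu_true=107.5, sigma=15, n=25))  # ~ 0.80
print(z_test_power(mu_0=100, mu_true=107.5, sigma=15, n=50))  # ~ 0.97
```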

Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.

It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.

Learning Objectives

Having read the chapter, students should be able to:

  • Conduct a hypothesis test using a z-statistic, locate the critical region, and make a statistical decision.
  • Explain the purpose of measuring effect size and power, and be able to compute Cohen’s d.

Exercises – Ch. 10

  • List the main steps for hypothesis testing with the z-statistic. When and why do you calculate an effect size?
  • Determine whether you would reject or fail to reject the null hypothesis in each of the following cases:
  • z = 1.99, two-tailed test at α = 0.05
  • z = 1.99, two-tailed test at α = 0.01
  • z = 1.99, one-tailed test at α = 0.05
  • You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with μ = 78 and σ = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following weeks’ worth of score data: 82, 74, 62, 68, 79, 94, 90, 81, 80.
  • A study examines self-esteem and depression in teenagers.  A sample of 25 teens with a low self-esteem are given the Beck Depression Inventory.  The average score for the group is 20.9.  For the general population, the average score is 18.3 with σ = 12.  Use a two-tail test with α = 0.05 to examine whether teenagers with low self-esteem show significant differences in depression.
  • You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 (μ = 42, σ = 12). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the α = 0.05 level of significance.

Answers to Odd-Numbered Exercises – Ch. 10

1. List the hypotheses. Determine the critical region. Calculate z. Compare z to the critical region. Draw a conclusion. We calculate an effect size when we find a statistically significant result, to see if our result is practically meaningful or important.

5. Step 1: H 0 : μ = 42 “My average tips do not differ from other servers’”, H A : μ ≠ 42 “My average tips do differ from others’”

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Z-Test Definition: Its Uses in Statistics Simply Explained With Example

James Chen, CMT is an expert trader, investment adviser, and global market strategist.


A z-test is a statistical test used to determine whether two population means are different when the variances are known and the sample size is large. It can also be used to compare one mean to a hypothesized value.

The data must approximately fit a normal distribution , otherwise the test doesn't work. Parameters such as variance and standard deviation should be calculated for a z-test to be performed.

Key Takeaways

  • A z-test is a statistical test to determine whether two population means are different or to compare one mean to a hypothesized value when the variances are known and the sample size is large.
  • A z-test is a hypothesis test for data that follows a normal distribution. 
  • A z-statistic, or z-score, is a number representing the result from the z-test.
  • Z-tests are closely related to t-tests, but t-tests are best performed when an experiment has a small sample size.
  • Z-tests assume the standard deviation is known, while t-tests assume it is unknown.

The z-test is also a hypothesis test in which the z-statistic follows a normal distribution. The z-test is best used for greater-than-30 samples because, under the central limit theorem , as the number of samples gets larger, the samples are considered to be approximately normally distributed.

When conducting a z-test, the null and alternative hypotheses and the alpha level should be stated. The z-score, also called a test statistic, should be calculated, and the results and conclusion stated. A z-statistic, or z-score, is a number representing how many standard deviations above or below the population mean a score derived from a z-test is.

Examples of tests that can be conducted as z-tests include a one-sample location test, a two-sample location test, a paired difference test, and a maximum likelihood estimate. Z-tests are closely related to t-tests, but t-tests are best performed when an experiment has a small sample size. Also, t-tests assume the standard deviation is unknown, while z-tests assume it is known. If the standard deviation of the population is unknown, the assumption of the sample variance equaling the population variance is made.

Assume an investor wishes to test whether the average daily return of a stock is greater than 3%. A simple random sample of 50 returns is calculated and has an average of 2%. Assume the standard deviation of the returns is 2.5%. Therefore, the null hypothesis is when the average, or mean, is equal to 3%.

Conversely, the alternative hypothesis is that the mean return is greater or less than 3%. Assume an alpha of 0.05 is selected with a two-tailed test. Consequently, 2.5% of the area lies in each tail, and the critical values are ±1.96. If the value of z is greater than 1.96 or less than -1.96, the null hypothesis is rejected.

The value for z is calculated by subtracting the value of the average daily return selected for the test, or 3% in this case, from the observed average of the samples. Next, divide the resulting value by the standard deviation divided by the square root of the number of observed values.

Therefore, the test statistic is:

(0.02 − 0.03) ÷ (0.025 ÷ √50) = −2.83

The investor rejects the null hypothesis since z is less than -1.96 and concludes that the average daily return is less than 3%.
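A quick sketch (ours, not Investopedia's) verifying the arithmetic:

```python
from math import sqrt

# Investor example: observed mean 2%, hypothesized mean 3%,
# sigma = 2.5%, n = 50 daily returns.
z = (0.02 - 0.03) / (0.025 / sqrt(50))
print(round(z, 2))  # -2.83, which is below -1.96, so H0 is rejected
```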

What's the Difference Between a T-Test and Z-Test?

Z-tests are closely related to t-tests, but t-tests are best performed when the data consists of a small sample size, i.e., less than 30. Also, t-tests assume the standard deviation is unknown, while z-tests assume it is known.

When Should You Use a Z-Test?

If the standard deviation of the population is known and the sample size is greater than or equal to 30, the z-test can be used. Regardless of the sample size, if the population standard deviation is unknown, a t-test should be used instead.

What Is a Z-Score?

A z-score, or z-statistic, is a number representing how many standard deviations above or below the population mean the score derived from a z-test is. Essentially, it is a numerical measurement that describes a value's relationship to the mean of a group of values. If a z-score is 0, it indicates that the data point's score is identical to the mean score. A z-score of 1.0 would indicate a value that is one standard deviation from the mean. Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.

What Is Central Limit Theorem (CLT)?

In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution (also known as a “bell curve”) as the sample size becomes larger, assuming that all samples are identical in size, and regardless of the population distribution shape. Sample sizes equal to or greater than 30 are considered sufficient for the CLT to predict the characteristics of a population accurately. The z-test's fidelity relies on the CLT holding.

A z-test is used in hypothesis testing to evaluate whether a finding or association is statistically significant or not. In particular, it tests whether two means are the same (the null hypothesis). A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.


A Z-test is a type of statistical hypothesis test used to test the mean of a normally distributed test statistic. It tests whether there is a significant difference between an observed sample mean and the population mean under the null hypothesis, H 0 .

A Z-test can only be used when the population variance is known (or can be estimated with a high degree of accuracy), or if the sample size of the experiment is large (typically n>30). Also, the test statistic must exhibit a normal distribution; if it exhibits a distribution that is clearly not normal, the Z-test is not applicable. In many cases, population parameters may not be known, or it may not be possible to estimate them accurately. In such cases, or in cases where the sample size is small, a Student's t-test is more appropriate.

How to conduct a Z-test

The procedure for conducting a Z-test is similar to that of other statistical hypothesis tests, and is generally as follows:

  • State the null (H 0 ) and alternative hypotheses (H a ).
  • Select a significance level, α.
  • Calculate the Z-score.
  • Determine the critical value(s) of Z or the p-value.
  • Compare the Z-score of the observed value to the critical value of Z (or compare the p-value to α) to determine if the null hypothesis should be rejected in favor of the alternative hypothesis, or if the null hypothesis should not be rejected.

H 0 and H a

The null hypothesis is typically a statement of no difference. For example, assume that the average score received on the SAT by high schoolers in a given state was a 1200 with a known standard deviation. If the average score of students in a given high school is a 1230, we may use a Z-test to determine whether this result is better, statistically, than the state average. The null hypothesis in this case would be that the average score of students in the high school is not better than the state average, or H 0 : μ ≤ μ 0 , or μ ≤ 1200.

The alternative hypothesis is a statement of difference from the null hypothesis. It can take one of three forms:

  • Given H 0 : μ ≤ μ 0 , H a : μ > μ 0
  • Given H 0 : μ ≥ μ 0 , H a : μ < μ 0
  • Given H 0 : μ = μ 0 , H a : μ ≠ μ 0

In this example, it is believed that a score of 1230 is statistically significant, and that students in this high school performed better than the state average. Therefore, the alternative hypothesis takes on the first form in the list, H a : μ > μ 0 , or μ > 1200.

Significance level

The significance level, α, is the probability of a study rejecting the null hypothesis when the null hypothesis is true. Commonly used significance levels include 0.01, 0.05, and 0.10. A significance level of 0.05, or 5%, means that there is a 5% chance of concluding that a difference exists (thus rejecting H 0 ) when there is no actual difference. The lower the significance level, the more evidence required before the null hypothesis can be rejected. The significance level is compared to the p-value: if a p-value is less than the significance level, the null hypothesis is rejected in favor of the alternative hypothesis.

Calculating a Z-score is a necessary part of conducting a Z-test. A Z-score indicates the number of standard deviations that an observed value is from the mean in a standard normal distribution. For example, an observed value with a Z-score of 1.2 indicates that the observed value is 1.2 standard deviations from the mean. If the population mean and standard deviation are known, the Z-score is calculated using the following formula:

z = (x − μ) / σ

where μ is the mean of the population, σ is the standard deviation of the population, and x is the observed value. In many cases the population mean and standard deviation are not known. In such cases, these population parameters can be estimated using a sample mean and sample standard deviation, and the Z-score can be computed as follows:

z = (x − x̄) / s

where x̄ is the sample mean, s is the sample standard deviation, and x is the observed value.

Critical value and p-value

Once a Z-score has been calculated, there are two methods for drawing conclusions about the test statistic: using the critical value(s), or using a p-value. To form a conclusion for a hypothesis test using a critical value, the Z-score of the observed value is compared to the critical value(s) of the selected significance level; to use a p-value, the p-value of the observed value is compared to the significance level.

Critical value

A critical value is a value that indicates the critical region(s) (or rejection region) of the standard normal distribution, where a critical region is the area of the distribution in which a value must lie in order to reject the null hypothesis.

The critical value is dependent on the significance level as well as whether a one-tailed or two-tailed test is being conducted. A one-tailed test is used when we want to know only whether a value is significantly larger or significantly smaller than the hypothesized value, not both. There is only one critical region in a one-tailed Z-test. It is either a left-tailed (or lower-tailed) or right-tailed (or upper-tailed) test, based on the position of the critical region, as shown in the figure below.

The critical regions are shown in pink. If a test statistic lies within the pink region, the null hypothesis is rejected in favor of the alternative hypothesis. Otherwise, the null hypothesis is not rejected.


If a test value lies in either of the critical regions shown in pink, the null hypothesis is rejected in favor of the alternative hypothesis; if it lies within the green region, the null hypothesis is not rejected.

After selecting the significance level and type of test, the critical Z value can be determined using a Z table by finding the Z value that corresponds to the selected significance level. For example, for a one-tailed test and a significance level of 0.05, find the probability closest to 0.05 and read the Z value that results in this probability; the Z value for α = 0.05 is -1.645 for a left-tailed Z-test and 1.645 for a right-tailed Z-test. For a two-tailed Z-test, divide α by 2, then determine the corresponding Z-value. For α = 0.05, each tail will comprise an area of 0.025 in the standard normal distribution, which corresponds to Z-values of -1.96 and 1.96. Thus, the critical regions are Z < -1.96 and Z > 1.96. The critical values for common significance levels are shown in the table below:

Significance level (α) | One-tailed critical value | Two-tailed critical values
0.10 | ±1.282 | ±1.645
0.05 | ±1.645 | ±1.96
0.01 | ±2.326 | ±2.576

The p-value indicates the probability of obtaining test results that are at least as extreme as the observed results, assuming that the null hypothesis is true. It tells us how likely it is for an outcome to occur solely based on chance. For example, a p-value of 0.05 means that there is a 5% chance that an outcome occurred solely by chance. The smaller the p-value, the less likely it is for an outcome to occur solely by chance, and the more evidence there is to reject the null hypothesis.

Like critical values, a p-value can be determined using a Z table. For a left-tailed Z-test, the p-value is the area under the standard normal distribution to the left of the Z-score of the observed value; for a right-tailed Z-test, it is the area to the right of the Z-score; for a two-tailed Z-test, it is the sum of the area to the left and right of the Z-score. If the p-value is less than or equal to the significance level, the null hypothesis is rejected in favor of the alternative hypothesis. Otherwise, the null hypothesis is not rejected.
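In code (a sketch of ours assuming scipy; a printed Z table gives the same numbers), the three cases look like this:

```python
from scipy.stats import norm

z = 1.5  # hypothetical Z-score of an observed value

p_left = norm.cdf(z)           # left-tailed: area below z
p_right = norm.sf(z)           # right-tailed: area above z
p_two = 2 * norm.sf(abs(z))    # two-tailed: area beyond |z| in both tails

print(p_left, p_right, p_two)  # ~ 0.933, 0.067, 0.134
```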

It is important to note that the p-value is not the probability that the null hypothesis is true. It is the probability that the data could deviate from the null hypothesis as much, or more than it did. The calculation of the p-value assumes that the null hypothesis is true, so it is not a measure of whether or not the null hypothesis is correct. Rather, it is a measure of how well the data fits the null hypothesis. Also, the p-value (or critical value) may provide evidence that the null hypothesis should be rejected in favor of the alternative hypothesis at the chosen level of significance . This does not mean that the alternative hypothesis is being accepted, because it is possible that the null hypothesis would not be rejected at a different significance level. Similarly, if the p-value is greater than the significance level, this does not mean that the null hypothesis is being accepted, just that the null hypothesis is not rejected.

Finally, p-values and critical values only indicate statistical significance, and may not necessarily indicate that the study's findings are significant within their context. For example, if a new medicine and a placebo are tested on different populations, and the medicine is found to have a statistically significant effect, it may not necessarily mean that there is clinical significance. It is possible for a finding to be both statistically and clinically significant, or only one or the other. For large sample sizes, it is possible for results to indicate statistical significance even when the effect is actually small and unimportant. Conversely, a small sample may not exhibit statistical significance even when the effect is large and potentially important. Thus, it is important to fully understand the scope of a study, as well as the statistical methods used, in order to effectively interpret the results and draw accurate, unbiased conclusions.

The average score on a national mathematics exam taken by high school seniors is 82 with a standard deviation of 8. A sample of 1000 seniors achieved an average score of 81.56. Perform a Z-test to determine whether there is a statistically significant difference between the national average and that of the sample of seniors at a significance level of 0.05.

We want to determine whether there is any difference, so the null hypothesis is that there is no difference, or

H 0 : μ = 82

and the alternative hypothesis is:

H a : μ ≠ 82

Thus, a two-tailed Z-test should be conducted since differences on either side of the distribution must be accounted for.

The selected significance level is:

α = 0.05

This value must be greater than the p-value in order to conclude that the difference in scores is statistically significant.

Since the population standard deviation and mean are known, the Z-score can be computed as:

Z = (x̄ − μ) / (σ/√n) = (81.56 − 82) / (8/√1000) = −0.44 / 0.253 ≈ −1.74

Based on the selected significance level and the use of a two-tailed Z-test, the critical values are Z = ±1.96. Since the Z-score of the observed value lies between the two critical values (rather than beyond one of them), we fail to reject the null hypothesis, as depicted in the figure below.


Thus, we conclude that the difference between the observed mean and the population mean is not statistically significant for a significance level of 0.05.

However, had we selected a significance level of 0.10, the critical values would be Z = ±1.645, and Z ≈ −1.74 would lie within the left tail of the distribution. In this case, we would reject the null hypothesis in favor of the alternative hypothesis, and conclude that the observed difference is statistically significant at a significance level of 0.10.

The above discussion involved hypothesis testing for one sample, where an observed value was compared to the expected population parameter. In certain cases, scientists may want to compare the means of two samples. In such cases, a two-sample Z-test is used instead.

Two-sample Z-test

A two-sample Z-test is conducted using the same procedures described above for a one-sample Z-test, with the exception that the Z-score is computed using the following formula:

Z = ((x̄ 1 − x̄ 2 ) − (μ 1 − μ 2 )) / √(σ 1 ²/n 1 + σ 2 ²/n 2 )

where μ 1 and μ 2 are the means of the two respective populations, x̄ 1 and x̄ 2 are the sample means, σ 1 and σ 2 are the population standard deviations (or large-sample estimates of them), and n 1 and n 2 are the sample sizes.
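Here is a minimal sketch of that computation (the function name and scipy usage are our own; the formula is the one above, with the null typically asserting μ 1 − μ 2 = 0). For the drug example below, plugging in each group's mean, standard deviation, and size would reproduce the reported Z-score of 2.604:

```python
from math import sqrt
from scipy.stats import norm

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, mu_diff=0.0):
    """Two-sample Z-score and two-tailed p-value for H0: mu1 - mu2 = mu_diff."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # pooled standard error
    z = ((x1 - x2) - mu_diff) / se
    p = 2 * norm.sf(abs(z))
    return z, p
```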

Researchers want to test whether a certain drug has any effect on the scores received by patients who are administered the drug prior to performing a physical stress test. The researchers place patients into 2 groups: 500 are placed into the experimental group and are administered the drug; 300 are placed into the control group and are administered a placebo. Both groups then perform the physical stress test, and summary statistics are computed for each group.

Determine whether or not there is a statistically significant difference between the two groups at a significance level of 0.05.

The null hypothesis is that there is no difference, so:

H 0 : μ 1 = μ 2

Also, since it is assumed that the null hypothesis is true, μ 1 - μ 2 = 0.

The alternative hypothesis is that there is a difference, so:

H a : μ 1 ≠ μ 2

The selected significance level is 0.05, and we conduct a two-tailed test since we are looking for any observable difference.

Plugging the group summary statistics into the formula above yields a Z-score of 2.604.

Using a Z table (or a p-value calculator), the p-value for a two-tailed Z-test for a Z-score of 2.604 is 0.009214. Since the p-value is less than the selected significance level, we reject the null hypothesis in favor of the alternative hypothesis, and conclude that the drug has a statistically significant effect on the performance of the patients. Since the Z-score lies in the right tail, we may conclude that patients who received the drug scored significantly better than those who received the placebo. If the Z-score were to lie in left tail, we would conclude the opposite: that patients who received the drug performed significantly worse.

We could also have used the critical values Z = ±1.96 for a significance level of 0.05 to reach the same conclusion, since 2.604 lies within the critical region denoted by the right tail of the distribution, as shown in the figure below.



10.1: The One-Sample z-test


  • Danielle Navarro
  • University of New South Wales

In this section, we'll discuss one of the most useless tests in all of statistics: the z-test . Seriously – this test is almost never used in real life. Its only real purpose is that, when teaching statistics, it’s a very convenient stepping stone along the way towards the t -test, which is probably the most (over)used tool in all statistics.

The Inference Problem that the Test Addresses

To introduce the idea behind the z-test, let's use a simple example. A friend of mine, Dr. Zeppo, grades his introductory statistics class on a curve. Let's suppose that the average grade in his class is 67.5, and the standard deviation is 9.5. Of his many hundreds of students, it turns out that 20 of them also take psychology classes. Out of curiosity, I find myself wondering: do the psychology students tend to get the same grades as everyone else (i.e., mean 67.5), or do they tend to score higher or lower? He emails me the zeppo.sav file, which contains the grades of those students, and I calculate the mean.


Hm. It might be that the psychology students are scoring a bit higher than normal: that sample mean of \(\bar{X}\) = 72.3 is a fair bit higher than the hypothesized population mean of μ =67.5, but on the other hand, a sample size of N=20 isn’t all that big. Maybe it’s pure chance.

To answer the question, it helps to be able to write down what it is that we think we know. Firstly, we know that the sample mean is \(\bar{X}\) =72.3. If we're willing to assume that the psychology students have the same standard deviation as the rest of the class then we can say that the population standard deviation is σ=9.5. We’ll also assume that since Dr. Zeppo is grading to a curve, the psychology student grades are normally distributed.

Next, it helps to be clear about what we want to learn from the data. In this case, the research hypothesis relates to the population mean μ for the psychology student grades, which is unknown. Specifically, I want to know if μ =67.5 or not. Given that this is what I know, can we devise a hypothesis test to solve our problem? The data, along with the hypothesized distribution from which they are thought to arise, are shown in Figure 10.1. Not entirely obvious what the right answer is, is it? For this, we are going to need some statistics.

Figure 10.1: The psychology student grades, shown against the population distribution hypothesized by the null hypothesis.

Constructing the Hypothesis Test

The first step in constructing a hypothesis test is to be clear about what the null and alternative hypotheses are. This isn’t too hard to do. Our null hypothesis, H 0 , is that the true population mean μ for psychology student grades is 67.5%; and our alternative hypothesis is that the population mean isn’t 67.5%. If we write this in mathematical notation, these hypotheses become,

H 0 : μ =67.5

H 1 : μ ≠67.5

though to be honest this notation doesn’t add much to our understanding of the problem, it’s just a compact way of writing down what we’re trying to learn from the data. The null hypothesis H 0 and the alternative hypothesis H 1 for our test are both illustrated in Figure 10.2. In addition to providing us with these hypotheses, the scenario outlined above provides us with a fair amount of background knowledge that might be useful. Specifically, there are two special pieces of information that we can add:

  • The psychology grades are normally distributed.
  • The true standard deviation of these scores σ is known to be 9.5.

For the moment, we’ll act as if these are absolutely trustworthy facts. In real life, this kind of absolutely trustworthy background knowledge doesn’t exist, and so if we want to rely on these facts we’ll just have to make the assumption that these things are true. However, since these assumptions may or may not be warranted, we might need to check them. For now, though, we’ll keep things simple.

Figure 10.2: Graphical illustration of the null and alternative hypotheses for the z-test.

The next step is to figure out what would be a good choice for a diagnostic test statistic; something that would help us discriminate between H 0 and H 1 . Given that the hypotheses all refer to the population mean μ , you’d feel pretty confident that the sample mean \(\bar{X}\) would be a pretty useful place to start. What we could do, is look at the difference between the sample mean \(\bar{X}\) and the value that the null hypothesis predicts for the population mean. In our example, that would mean we calculate \(\bar{X}\) - 67.5. More generally, if we let μ 0 refer to the value that the null hypothesis claims is our population mean, then we’d want to calculate

\(\bar{X}-\mu_{0}\)

If this quantity equals or is very close to 0, things are looking good for the null hypothesis. If this quantity is a long way away from 0, then it’s looking less likely that the null hypothesis is worth retaining. But how far away from zero should it be for us to reject H 0 ?

To figure that out, we need to be a bit more sneaky, and we'll need to rely on those two pieces of background knowledge that we wrote down previously, namely that the raw data are normally distributed and that we know the value of the population standard deviation σ. If the null hypothesis is actually true, and the true mean is μ 0 , then these facts together mean that we know the complete population distribution of the data: a normal distribution with mean μ 0 and standard deviation σ. Adopting notation introduced earlier in the book, a statistician might write this as:

X ∼ Normal(μ 0 , σ²)

Okay, if that's true, then what can we say about the distribution of \(\bar{X}\)? Well, as we discussed earlier, the sampling distribution of the mean \(\bar{X}\) is also normal and has mean μ. But the standard deviation of this sampling distribution, SE(\(\bar{X}\)), which is called the standard error of the mean , is

\(\operatorname{SE}(\bar{X})=\frac{\sigma}{\sqrt{N}}\)

In other words, if the null hypothesis is true then the sampling distribution of the mean can be written as follows:

\(\bar{X}\)∼Normal(μ 0 ,SE(\(\bar{X}\)))

Now comes the trick. What we can do is convert the sample mean \(\bar{X}\) into a standard score. This is conventionally written as z , but for now I'm going to refer to it as \(z_{\bar{X}}\). (The reason for using this expanded notation is to help you remember that we're calculating a standardized version of a sample mean, not a standardized version of a single observation, which is what a z -score usually refers to.) When we do so, the z -score for our sample mean is

\(\ z_{\bar{X}} = {{\bar{X}-\mu_{0}} \over SE(\bar{X})}\)

or, equivalently

\(\ z_{\bar{X}} = {{\bar{X}-\mu_{0}} \over \sigma/ \sqrt{N} }\)

This z -score is our test statistic. The nice thing about using this as our test statistic is that like all z -scores, it has a standard normal distribution:

\(\ z_{\bar{X}}\)∼Normal(0,1)

(again, see the earlier discussion of z-scores if you've forgotten why this is true). In other words, regardless of what scale the original data are on, the z -statistic itself always has the same interpretation: it's equal to the number of standard errors that separate the observed sample mean \(\bar{X}\) from the population mean μ 0 predicted by the null hypothesis. Better yet, regardless of what the population parameters for the raw scores actually are, the 5% critical regions for the z -test are always the same, as illustrated in Figures 10.4 and 10.3. And what this meant, way back in the days when people did all their statistics by hand, is that someone could publish a table like this:

desired α level | two-sided test | one-sided test
.05 | 1.96 | 1.645
.01 | 2.576 | 2.326
.001 | 3.291 | 3.090

which in turn meant that researchers could calculate their z-statistic by hand and then look up the critical value in a textbook. That was an incredibly handy thing to be able to do back then, but it's kind of unnecessary these days, since it's trivially easy to do it with software like R.

Figures 10.3 and 10.4: The 5% critical regions for the two-sided and one-sided z-tests.

A Worked Example Using SPSS

Now, as I mentioned earlier, the z -test is almost never used in practice. It's so rarely used in real life that the drop-down windows in SPSS don't have a built-in function for it. However, the test is so incredibly simple that it's really easy to do one manually. Let's go back to the data from Dr. Zeppo's class. Having loaded the zeppo.sav data, the first thing I need to do is calculate the sample mean, which again comes out to \(\bar{X}\) = 72.3.


Then, using syntax (yes, I know we haven't talked about that), we can create a set of variables corresponding to the sample size, the known population standard deviation ( σ = 9.5), and the value of the population mean that the null hypothesis specifies ( μ 0 = 67.5).

The next bit of syntax (courtesy of www.how2stats.net ) will compute the necessary stuff for a z -test and barf out the z -score, the p -value, and Cohen's d .

The result of all these mathematical shenanigans is a z-score of 2.26.

At this point, we would traditionally look up the value 2.26 in our table of critical values. Our original hypothesis was two-sided (we didn’t really have any theory about whether psych students would be better or worse at statistics than other students) so our hypothesis test is two-sided (or two-tailed) also. Looking at the little table that I showed earlier, we can see that 2.26 is bigger than the critical value of 1.96 required to be significant at α =.05, but smaller than the value of 2.58 that would be required to be significant at a level of α=.01. Therefore, we can conclude that we have a significant effect, which we might write up by saying something like this:

With a mean grade of 72.3 in the sample of psychology students, and assuming a true population standard deviation of 9.5, we can conclude that the psychology students have significantly different statistics scores from the class average (z = 2.26, N = 20, p < .05).

If you would like to report the exact p-value, the output has you covered: it includes the p-value of .02385. Oh, and for good measure we got Cohen's d (the effect size) of .50526.
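Since the SPSS syntax itself isn't reproduced here, a Python sketch (our own translation, assuming scipy) reproduces the same numbers from the summary statistics:

```python
from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n = 72.3, 67.5, 9.5, 20

se = sigma / sqrt(n)          # standard error of the mean
z = (x_bar - mu_0) / se       # ~ 2.26
p = 2 * norm.sf(abs(z))       # two-sided p ~ .0239
d = (x_bar - mu_0) / sigma    # Cohen's d ~ .505

print(z, p, d)
```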

Assumptions of the z -test

As stated before, all statistical tests make assumptions. Some tests make reasonable assumptions, while other tests do not. The test we’ve just discussed – the one-sample z -test – makes three basic assumptions. These are:

  • Normality . As usually described, the z -test assumes that the true population distribution is normal. This is often pretty reasonable, and not only that, it’s an assumption that we can check if we feel worried about it (see Section 10.9).
  • Independence . The second assumption of the test is that the observations in your data set are not correlated with each other, or related to each other in some funny way. This isn’t as easy to check statistically: it relies a bit on good experimental design. An obvious (and stupid) example of something that violates this assumption is a data set where you “copy” the same observation over and over again in your data file: so you end up with a massive “sample size”, consisting of only one genuine observation. More realistically, you have to ask yourself if it’s really plausible to imagine that each observation is a completely random sample from the population that you’re interested in. In practice, this assumption is never met; but we try our best to design studies that minimize the problems of correlated data.
  • Known standard deviation . The third assumption of the z -test is that the true standard deviation of the population is known to the researcher. This is just stupid. In no real-world data analysis problem do you know the standard deviation σ of some population, but are completely ignorant about the mean μ . In other words, this assumption is always wrong.

In view of the stupidity of assuming that σ is known, let’s see if we can live without it. This takes us out of the dreary domain of the z -test, and into the magical kingdom of the t -test, with unicorns and fairies and leprechauns, and um…

Hypothesis Testing for Means & Proportions

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This is the first of three modules that address the second area of statistical inference: hypothesis testing, in which a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The process of hypothesis testing involves setting up two competing hypotheses, the null hypothesis and the alternate hypothesis. One selects a random sample (or multiple samples when there are more comparison groups), computes summary statistics, and then assesses the likelihood that the sample data support the research or alternative hypothesis. Similar to estimation, the process of hypothesis testing is based on probability theory and the Central Limit Theorem.

This module will focus on hypothesis testing for means and proportions. The next two modules in this series will address analysis of variance and chi-squared tests. 

Learning Objectives

After completing this module, the student will be able to:

  • Define null and research hypothesis, test statistic, level of significance and decision rule
  • Distinguish between Type I and Type II errors and discuss the implications of each
  • Explain the difference between one and two sided tests of hypothesis
  • Estimate and interpret p-values
  • Explain the relationship between confidence interval estimates and p-values in drawing inferences
  • Differentiate hypothesis testing procedures based on type of outcome variable and number of samples

Introduction to Hypothesis Testing

Techniques for Hypothesis Testing

The techniques for hypothesis testing depend on

  • the type of outcome variable being analyzed (continuous, dichotomous, discrete)
  • the number of comparison groups in the investigation
  • whether the comparison groups are independent (i.e., physically separate such as men versus women) or dependent (i.e., matched or paired such as pre- and post-assessments on the same participants).

In estimation we focused explicitly on techniques for one and two samples and discussed estimation for a specific parameter (e.g., the mean or proportion of a population), for differences (e.g., difference in means, the risk difference) and ratios (e.g., the relative risk and odds ratio). Here we will focus on procedures for one and two samples when the outcome is either continuous (and we focus on means) or dichotomous (and we focus on proportions).

General Approach: A Simple Example

The Centers for Disease Control (CDC) reported on trends in weight, height and body mass index from the 1960's through 2002. 1 The general trend was that Americans were much heavier and slightly taller in 2002 as compared to 1960; both men and women gained approximately 24 pounds, on average, between 1960 and 2002.   In 2002, the mean weight for men was reported at 191 pounds. Suppose that an investigator hypothesizes that weights are even higher in 2006 (i.e., that the trend continued over the subsequent 4 years). The research hypothesis is that the mean weight in men in 2006 is more than 191 pounds. The null hypothesis is that there is no change in weight, and therefore the mean weight is still 191 pounds in 2006.  

In order to test the hypotheses, we select a random sample of American males in 2006 and measure their weights. Suppose we have resources available to recruit n=100 men into our sample. We weigh each participant and compute summary statistics on the sample data. Suppose in the sample we determine the following: n = 100, x̄ = 197.1 pounds, and s = 25.6 pounds.

Do the sample data support the null or research hypothesis? The sample mean of 197.1 is numerically higher than 191. However, is this difference more than would be expected by chance? In hypothesis testing, we assume that the null hypothesis holds until proven otherwise. We therefore need to determine the likelihood of observing a sample mean of 197.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true or under the null hypothesis). We can compute this probability using the Central Limit Theorem. Specifically,

P(X̄ ≥ 197.1) = P(Z ≥ (197.1 − 191)/(25.6/√100)) = P(Z ≥ 2.38) = 0.0087.

(Notice that we use the sample standard deviation in computing the Z score. This is generally an appropriate substitution as long as the sample size is large, n > 30.) Thus, there is less than a 1% probability of observing a sample mean as large as 197.1 when the true population mean is 191. Do you think that the null hypothesis is likely true? Based on how unlikely it is to observe a sample mean of 197.1 under the null hypothesis (i.e., <1% probability), we might infer, from our data, that the null hypothesis is probably not true.

Suppose that the sample data had turned out differently. Suppose that we instead observed the following in 2006: n = 100, x̄ = 192.1 pounds, and s = 25.6 pounds.

How likely is it to observe a sample mean of 192.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true)? We can again compute this probability using the Central Limit Theorem. Specifically,

P(X̄ ≥ 192.1) = P(Z ≥ (192.1 − 191)/(25.6/√100)) = P(Z ≥ 0.43) = 0.334.

There is a 33.4% probability of observing a sample mean as large as 192.1 when the true population mean is 191. Do you think that the null hypothesis is likely true?

Neither of the sample means that we obtained allows us to know with certainty whether the null hypothesis is true or not. However, our computations suggest that, if the null hypothesis were true, the probability of observing a sample mean >197.1 is less than 1%. In contrast, if the null hypothesis were true, the probability of observing a sample mean >192.1 is about 33%. We can't know whether the null hypothesis is true, but the sample that provided a mean value of 197.1 provides much stronger evidence in favor of rejecting the null hypothesis, than the sample that provided a mean value of 192.1. Note that this does not mean that a sample mean of 192.1 indicates that the null hypothesis is true; it just doesn't provide compelling evidence to reject it.

In essence, hypothesis testing is a procedure to compute a probability that reflects the strength of the evidence (based on a given sample) for rejecting the null hypothesis. In hypothesis testing, we determine a threshold or cut-off point (called the critical value) to decide when to believe the null hypothesis and when to believe the research hypothesis. It is important to note that it is possible to observe any sample mean when the null hypothesis is true (here, a true population mean of 191), but some sample means are very unlikely.

Based on the two samples above it would seem reasonable to believe the research hypothesis when x̄ = 197.1, but to believe the null hypothesis when x̄ = 192.1. What we need is a threshold value such that if x̄ is above that threshold we believe that H 1 is true, and if x̄ is below that threshold we believe that H 0 is true. The difficulty in determining a threshold for x̄ is that it depends on the scale of measurement. In this example, the threshold, sometimes called the critical value, might be 195 (i.e., if the sample mean is 195 or more then we believe that H 1 is true, and if the sample mean is less than 195 then we believe that H 0 is true). Suppose instead we were interested in assessing an increase in blood pressure over time; the critical value would be different because blood pressures are measured in millimeters of mercury (mmHg) as opposed to in pounds. In the following we will explain how the critical value is determined and how we handle the issue of scale.

First, to address the issue of scale in determining the critical value, we convert our sample data (in particular the sample mean) into a Z score. We know from the module on probability that the center of the Z distribution is zero and extreme values are those that exceed 2 or fall below -2. Z scores above 2 and below -2 represent approximately 5% of all Z values. If the observed sample mean is close to the mean specified in H 0 (here μ 0 = 191), then Z will be close to zero. If the observed sample mean is much larger than the mean specified in H 0 , then Z will be large.

In hypothesis testing, we select a critical value from the Z distribution. This is done by first determining what is called the level of significance, denoted α ("alpha"). What we are doing here is drawing a line at extreme values. The level of significance is the probability that we reject the null hypothesis (in favor of the alternative) when it is actually true and is also called the Type I error rate.

α = Level of significance = P(Type I error) = P(Reject H 0 | H 0 is true).

Because α is a probability, it ranges between 0 and 1. The most commonly used value in the medical literature for α is 0.05, or 5%. Thus, if an investigator selects α = 0.05, then they are allowing a 5% probability of incorrectly rejecting the null hypothesis in favor of the alternative when the null is in fact true. Depending on the circumstances, one might choose to use a level of significance of 1% or 10%. For example, if an investigator wanted to reject the null only if there were even stronger evidence than that ensured with α = 0.05, they could choose α = 0.01 as their level of significance. The typical values for α are 0.01, 0.05 and 0.10, with α = 0.05 the most commonly used value.

Suppose in our weight study we select α=0.05. We need to determine the value of Z that holds 5% of the values above it (see below).

[Figure: standard normal distribution curve showing the upper-tail rejection region above Z = 1.645 when α = 0.05]

The critical value of Z for α =0.05 is Z = 1.645 (i.e., 5% of the distribution is above Z=1.645). With this value we can set up what is called our decision rule for the test. The rule is to reject H 0 if the Z score is 1.645 or more.  

With the first sample we have

Z = (x̄ - μ 0 )/(s/√n) = (197.1 - 191)/(s/√n) = 2.38

Because 2.38 > 1.645, we reject the null hypothesis. (The same conclusion can be drawn by comparing the 0.0087 probability of observing a sample mean as extreme as 197.1 to the level of significance of 0.05: if the observed probability is smaller than the level of significance, we reject H 0 .) Because the Z score exceeds the critical value, we conclude that the mean weight for men in 2006 is more than 191 pounds, the value reported in 2002. If we observed the second sample (i.e., sample mean = 192.1), we would not be able to reject the null hypothesis because the Z score is 0.43, which is not in the rejection region (i.e., the region in the tail end of the curve above 1.645). With the second sample we do not have sufficient evidence (because we set our level of significance at 5%) to conclude that weights have increased. Again, the same conclusion can be reached by comparing probabilities: the probability of observing a sample mean as extreme as 192.1 is 33.4%, which is not below our 5% level of significance.
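These computations are easy to script. Here is a minimal sketch in Python using scipy; the sample standard deviation (25.6) and sample size (100) are illustrative assumptions, since only the resulting Z scores and probabilities are reported above:

```python
from scipy.stats import norm

mu0 = 191            # mean under H0 (2002 report)
s, n = 25.6, 100     # ASSUMED sample SD and size, for illustration only

for xbar in (197.1, 192.1):
    z = (xbar - mu0) / (s / n ** 0.5)   # one-sample Z statistic
    p = 1 - norm.cdf(z)                 # probability of a sample mean this large or larger
    print(f"xbar={xbar}: Z={z:.2f}, upper-tail probability={p:.4f}")

# xbar=197.1: Z=2.38, probability ~ 0.009 -> strong evidence against H0
# xbar=192.1: Z=0.43, probability ~ 0.334 -> weak evidence against H0
```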

Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests

The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data support the null or the research hypothesis. The procedure can be broken down into the following five steps.

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:

Z = (x̄ - μ 0 )/(s/√n)

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

[Figure: standard normal distribution with the upper-tail rejection region above Z = 1.645, α = 0.05]

Rejection Region for Upper-Tailed Z Test (H 1 : μ > μ 0 ) with α = 0.05

The decision rule is: Reject H 0 if Z > 1.645.

[Figure: standard normal distribution with the lower-tail rejection region below Z = -1.645, α = 0.05]

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α = 0.05

The decision rule is: Reject H 0 if Z < -1.645.

[Figure: standard normal distribution with two-tailed rejection regions below Z = -1.960 and above Z = 1.960, α = 0.05]

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (Step 1) should always be set up in advance of any analysis, and the significance criterion should also be determined (e.g., α = 0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following rule: if p < α, then reject H 0 .

Example: We now return to the weight example and run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191 H 1 : μ > 191                 α =0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30), the appropriate test statistic is

Z = (x̄ - μ 0 )/(s/√n)

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05.   Reject H 0 if Z > 1.645.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

Z = (197.1 - 191)/(s/√n) = 2.38, using the sample standard deviation and sample size reported earlier in the module.

  • Step 5. Conclusion.

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α = 0.05 to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z = 2.38 and for α = 0.05, the critical value was 1.645. Because 2.38 exceeded 1.645, we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper-tailed tests, we can approximate the p-value. If we select α = 0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α = 0.010, the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α = 0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value, which would be between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
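Statistical packages compute this exact p-value in one step; for example, a quick sketch with scipy:

```python
from scipy.stats import norm

p_value = 1 - norm.cdf(2.38)     # exact upper-tail area above the observed Z
print(round(p_value, 4))         # 0.0087, consistent with the table-based bound p < 0.010
```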

Type I and Type II Errors

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

                        Do Not Reject H 0        Reject H 0
H 0 is True             Correct Decision         Type I Error
H 0 is False            Type II Error            Correct Decision

In the first step of the hypothesis test, we select a level of significance, α, and α = P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α = 0.05 and our test tells us to reject H 0 , then we are using a procedure that rejects a true null hypothesis only 5% of the time. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.


 The most common reason for a Type II error is a small sample size.

Tests with One Sample, Continuous Outcome

Hypothesis testing applications with a continuous outcome variable in a single population are performed according to the five-step procedure outlined above. A key component is setting up the null and research hypotheses. The objective is to compare the mean in a single population to a known mean (μ 0 ). The known value is generally derived from another study or report, for example a study in a similar, but not identical, population or a study performed some years ago. The latter is called a historical control. It is important in setting up the hypotheses in a one-sample test that the mean specified in the null hypothesis is a fair and reasonable comparator. This will be discussed in the examples that follow.

Test Statistics for Testing H 0 : μ = μ 0

  • If n > 30: Z = (x̄ - μ 0 )/(s/√n)
  • If n < 30: t = (x̄ - μ 0 )/(s/√n), with df = n - 1

Note that statistical computing packages will use the t statistic exclusively and make the necessary adjustments for comparing the test statistic to appropriate values from probability tables to produce a p-value. 

The National Center for Health Statistics (NCHS) published a report in 2005 entitled Health, United States, containing extensive information on major trends in the health of Americans. Data are provided for the US population as a whole and for specific ages, sexes and races. The NCHS report indicated that in 2002 Americans paid an average of $3,302 per year on health care and prescription drugs. An investigator hypothesizes that expenditures have decreased in 2005, primarily due to the availability of generic drugs. To test the hypothesis, a sample of 100 Americans is selected and their expenditures on health care and prescription drugs in 2005 are measured. The sample data are summarized as follows: n = 100, x̄ = $3,190 and s = $890. Is there statistical evidence of a reduction in expenditures on health care and prescription drugs in 2005? Is the sample mean of $3,190 evidence of a true reduction in the mean, or is it within chance fluctuation? We will run the test using the five-step approach.

  • Step 1.  Set up hypotheses and determine level of significance

H 0 : μ = 3,302 H 1 : μ < 3,302           α =0.05

The research hypothesis is that expenditures have decreased, and therefore a lower-tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30), the appropriate test statistic is

Z = (x̄ - μ 0 )/(s/√n)

  • Step 3. Set up decision rule.

This is a lower-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

Z = (3,190 - 3,302)/(890/√100) = -112/89 = -1.26

  • Step 5. Conclusion.

We do not reject H 0 because -1.26 > -1.645. We do not have statistically significant evidence at α = 0.05 to show that the mean expenditures on health care and prescription drugs are lower in 2005 than the mean of $3,302 reported in 2002.

Recall that when we fail to reject H 0 in a test of hypothesis that either the null hypothesis is true (here the mean expenditures in 2005 are the same as those in 2002 and equal to $3,302) or we committed a Type II error (i.e., we failed to reject H 0 when in fact it is false). In summarizing this test, we conclude that we do not have sufficient evidence to reject H 0 . We do not conclude that H 0 is true, because there may be a moderate to high probability that we committed a Type II error. It is possible that the sample size is not large enough to detect a difference in mean expenditures.      
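The arithmetic for this example is easy to verify in code; a minimal sketch with scipy, using only the values given above:

```python
from scipy.stats import norm

mu0, xbar, s, n = 3302, 3190, 890, 100    # values from the example
z = (xbar - mu0) / (s / n ** 0.5)         # Z = -112/89
print(round(z, 2))                        # -1.26
print(round(norm.cdf(z), 3))              # lower-tail p ~ 0.104, above 0.05 -> do not reject H0
```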

The NCHS reported that the mean total cholesterol level in 2002 for all adults was 203. Total cholesterol levels in participants who attended the seventh examination of the Offspring in the Framingham Heart Study are summarized as follows: n=3,310, x̄ =200.3, and s=36.8. Is there statistical evidence of a difference in mean cholesterol levels in the Framingham Offspring?

Here we want to assess whether the sample mean of 200.3 in the Framingham sample is statistically significantly different from 203 (i.e., beyond what we would expect by chance). We will run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : μ = 203 H 1 : μ ≠ 203                       α = 0.05

The research hypothesis is that cholesterol levels are different in the Framingham Offspring, and therefore a two-tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30), the appropriate test statistic is

Z = (x̄ - μ 0 )/(s/√n)

  • Step 3. Set up decision rule.

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

Z = (200.3 - 203)/(36.8/√3,310) = -4.22

  • Step 5. Conclusion.

We reject H 0 because -4.22 < -1.960. We have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level in the Framingham Offspring is different from the national average of 203 reported in 2002. Because we reject H 0 , we also approximate a p-value. Using the two-sided significance levels, p < 0.0001.
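Again, the computation can be checked directly (a sketch using scipy and the summary statistics above):

```python
from scipy.stats import norm

mu0, xbar, s, n = 203, 200.3, 36.8, 3310
z = (xbar - mu0) / (s / n ** 0.5)
p = 2 * norm.cdf(-abs(z))           # two-sided p-value
print(round(z, 2), p)               # -4.22 and p ~ 2.4e-05, i.e., p < 0.0001
```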

Statistical Significance versus Clinical (Practical) Significance

This example raises an important concept of statistical versus clinical or practical significance. From a statistical standpoint, the total cholesterol levels in the Framingham sample are highly statistically significantly different from the national average, with p < 0.0001 (i.e., if the null hypothesis were true, there would be less than a 0.01% chance of observing a sample mean this far from 203). However, the sample mean in the Framingham Offspring study is 200.3, less than 3 units different from the national mean of 203. The reason that the data are so highly statistically significant is the very large sample size. It is always important to assess both the statistical and the clinical significance of data. This is particularly relevant when the sample size is large. Is a 3-unit difference in total cholesterol a meaningful difference?

Consider again the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. Suppose a new drug is proposed to lower total cholesterol. A study is designed to evaluate the efficacy of the drug in lowering cholesterol.   Fifteen patients are enrolled in the study and asked to take the new drug for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows:   n=15, x̄ =195.9 and s=28.7. Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new drug for 6 weeks? We will run the test using the five-step approach. 

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : μ = 203 H 1 : μ < 203                   α = 0.05

  •  Step 2. Select the appropriate test statistic.  

Because the sample size is small (n < 30), the appropriate test statistic is

t = (x̄ - μ 0 )/(s/√n)

  • Step 3. Set up decision rule.

This is a lower-tailed test, using a t statistic and a 5% level of significance. In order to determine the critical value of t, we need the degrees of freedom, df, defined as df = n - 1. In this example, df = 15 - 1 = 14. The critical value for a lower-tailed test with df = 14 and α = 0.05 is -1.761, and the decision rule is as follows: Reject H 0 if t < -1.761.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

t = (195.9 - 203)/(28.7/√15) = -0.96

  • Step 5. Conclusion.

We do not reject H 0 because -0.96 > -1.761. We do not have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level is lower than the national mean in patients taking the new drug for 6 weeks. Again, because we failed to reject the null hypothesis, we make a weaker concluding statement allowing for the possibility that we may have committed a Type II error (i.e., failed to reject H 0 when in fact the drug is efficacious).
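For small samples, the only change in the scripted check is using the t distribution with its degrees of freedom; a sketch with scipy:

```python
from scipy.stats import t

mu0, xbar, s, n = 203, 195.9, 28.7, 15
tstat = (xbar - mu0) / (s / n ** 0.5)
crit = t.ppf(0.05, df=n - 1)              # lower-tail critical value with df = 14
print(round(tstat, 2), round(crit, 3))    # -0.96 and -1.761 -> do not reject H0
```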


This example raises an important issue in terms of study design. In this example we assume in the null hypothesis that the mean cholesterol level is 203. This is taken to be the mean cholesterol level in patients without treatment. Is this an appropriate comparator? Alternative and potentially more efficient study designs to evaluate the effect of the new drug could involve two treatment groups, where one group receives the new drug and the other does not, or we could measure each patient's baseline or pre-treatment cholesterol level and then assess changes from baseline to 6 weeks post-treatment. These designs are also discussed here.


Tests with One Sample, Dichotomous Outcome

Hypothesis testing applications with a dichotomous outcome variable in a single population are also performed according to the five-step procedure. Similar to tests for means, a key component is setting up the null and research hypotheses. The objective is to compare the proportion of successes in a single population to a known proportion (p 0 ). That known proportion is generally derived from another study or report and is sometimes called a historical control. It is important in setting up the hypotheses in a one sample test that the proportion specified in the null hypothesis is a fair and reasonable comparator.    

In one-sample tests for a dichotomous outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the sample proportion, which is the ratio of the number of successes (x) to the sample size:

p̂ = x/n

We then determine the appropriate test statistic (Step 2) for the hypothesis test. The formula for the test statistic is given below.

Test Statistic for Testing H 0 : p = p 0

Z = (p̂ - p 0 )/√(p 0 (1 - p 0 )/n), if min(np 0 , n(1-p 0 )) > 5

The formula above is appropriate for large samples, defined as when the smaller of np 0 and n(1-p 0 ) is at least 5. This is similar, but not identical, to the condition required for appropriate use of the confidence interval formula for a population proportion, i.e., min(np̂, n(1-p̂)) > 5. Here we use the proportion specified in the null hypothesis as the true proportion of successes rather than the sample proportion. If we fail to satisfy the condition, then alternative procedures, called exact methods, must be used to test the hypothesis about the population proportion.

Example:  

The NCHS report indicated that in 2002 the prevalence of cigarette smoking among American adults was 21.1%.  Data on prevalent smoking in n=3,536 participants who attended the seventh examination of the Offspring in the Framingham Heart Study indicated that 482/3,536 = 13.6% of the respondents were currently smoking at the time of the exam. Suppose we want to assess whether the prevalence of smoking is lower in the Framingham Offspring sample given the focus on cardiovascular health in that community. Is there evidence of a statistically lower prevalence of smoking in the Framingham Offspring study as compared to the prevalence among all Americans?

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : p = 0.211 H 1 : p < 0.211                     α = 0.05

  • Step 2. Select the appropriate test statistic.

We must first check that the sample size is adequate. Specifically, we need to check min(np 0 , n(1-p 0 )) = min(3,536(0.211), 3,536(1-0.211)) = min(746, 2,790) = 746. The sample size is more than adequate, so the following formula can be used:

Z = (p̂ - p 0 )/√(p 0 (1 - p 0 )/n)

  • Step 3. Set up decision rule.

This is a lower-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample proportion is p̂ = 482/3,536 = 0.136.

Z = (0.136 - 0.211)/√(0.211(1 - 0.211)/3,536) = -10.93

  • Step 5. Conclusion.

We reject H 0 because -10.93 < -1.645. We have statistically significant evidence at α = 0.05 to show that the prevalence of smoking in the Framingham Offspring is lower than the prevalence nationally (21.1%). Here, p < 0.0001.
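A sketch of the same test in Python (scipy), using the counts reported above:

```python
from scipy.stats import norm

p0, n = 0.211, 3536
phat = round(482 / n, 3)                         # 0.136, as reported above
z = (phat - p0) / (p0 * (1 - p0) / n) ** 0.5
print(round(z, 2))                               # -10.93 -> reject H0
print(norm.cdf(z))                               # lower-tail p-value, far below 0.0001
```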

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston is surveyed, and 64 report seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

Calculate this on your own before checking the answer.


Tests with Two Independent Samples, Continuous Outcome

There are many applications where it is of interest to compare two independent groups with respect to their mean scores on a continuous outcome. Here we compare means between groups, but rather than generating an estimate of the difference, we will test whether the observed difference (increase, decrease or difference) is statistically significant or not. Remember that hypothesis testing gives an assessment of statistical significance, whereas estimation gives an estimate of effect; both are important.

Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean and standard deviation in each sample and we denote these summary statistics as follows:

for sample 1:

for sample 2:

The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2.  

In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ 1 - μ 2 . The null hypothesis is always that there is no difference between groups with respect to means, i.e.,

H 0 : μ 1 - μ 2 = 0

The null hypothesis can also be written as follows: H 0 : μ 1 = μ 2 . In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H 1 : μ 1 > μ 2 ), that the first mean is smaller than the second (H 1 : μ 1 < μ 2 ), or that the means are different (H 1 : μ 1 ≠ μ 2 ). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.

Test Statistics for Testing H 0 : μ 1 = μ 2

  • If n 1 > 30 and n 2 > 30: Z = (x̄ 1 - x̄ 2 )/(Sp√(1/n 1 + 1/n 2 ))
  • If n 1 < 30 or n 2 < 30: t = (x̄ 1 - x̄ 2 )/(Sp√(1/n 1 + 1/n 2 )), with df = n 1 + n 2 - 2

NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, or σ 1 ² = σ 2 ²). This means that the outcome is equally variable in each of the comparison populations. For the analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, s 1 ²/s 2 ², is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances.

The test statistics include Sp, the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar), computed as the weighted average of the standard deviations in the samples as follows:

Sp = √(((n 1 - 1)s 1 ² + (n 2 - 1)s 2 ²)/(n 1 + n 2 - 2))

Because we are assuming equal variances between groups, we pool the information on variability (sample variances) to generate an estimate of the variability in the population. Note: Because Sp is a weighted average of the standard deviations in the samples, Sp will always be between s 1 and s 2 .
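Sp is straightforward to compute; a small helper function (a sketch) that is handy for the examples that follow:

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate Sp of the common standard deviation (assumes similar variances)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Example with the cholesterol trial discussed later (n1 = n2 = 15, s1 = 28.7, s2 = 30.3):
print(round(pooled_sd(15, 28.7, 15, 30.3), 2))   # 29.51, which lies between 28.7 and 30.3
```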

Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.  

Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.  

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : μ 1 = μ 2

H 1 : μ 1 ≠ μ 2                       α=0.05

  • Step 2. Select the appropriate test statistic.

Because both samples are large (n 1 > 30 and n 2 > 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s 1 ²/s 2 ². Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5²/20.1² = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is

Z = (x̄ 1 - x̄ 2 )/(Sp√(1/n 1 + 1/n 2 ))

  • Step 3. Set up decision rule.

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we first compute Sp, the pooled estimate of the common standard deviation.

Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1), as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.

Now the test statistic:

Z = (x̄ 1 - x̄ 2 )/(Sp√(1/n 1 + 1/n 2 )) = 2.66

  • Step 5. Conclusion.

We reject H 0 because 2.66 > 1.960. We have statistically significant evidence at α = 0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.

Here again we find that there is a statistically significant difference in mean systolic blood pressures between men and women at p < 0.010. Notice that there is a very small difference in the sample means (128.2 - 126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is 1.7 ± 1.26, or (0.44, 2.96). The confidence interval provides an assessment of the magnitude of the difference between means, whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference.

Above we performed a study to evaluate a new drug designed to lower total cholesterol. The study involved one sample of patients, each patient took the new drug for 6 weeks and had their cholesterol measured. As a means of evaluating the efficacy of the new drug, the mean total cholesterol following 6 weeks of treatment was compared to the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. At the end of the example, we discussed the appropriateness of the fixed comparator as well as an alternative study design to evaluate the effect of the new drug involving two treatment groups, where one group receives the new drug and the other does not. Here, we revisit the example with a concurrent or parallel control group, which is very typical in randomized controlled trials or clinical trials (refer to the EP713 module on Clinical Trials).  

A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know to which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.

Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : μ 1 = μ 2 H 1 : μ 1 < μ 2                         α = 0.05

  • Step 2. Select the appropriate test statistic.

Because both samples are small (n 1 < 30 and n 2 < 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances is s 1 ²/s 2 ² = 28.7²/30.3² = 0.90, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is:

t = (x̄ 1 - x̄ 2 )/(Sp√(1/n 1 + 1/n 2 ))

  • Step 3. Set up decision rule.

This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t table (in More Resources to the right). In order to determine the critical value of t, we need the degrees of freedom, df, defined as df = n 1 + n 2 - 2 = 15 + 15 - 2 = 28. The critical value for a lower-tailed test with df = 28 and α = 0.05 is -1.701, and the decision rule is: Reject H 0 if t < -1.701.

  • Step 4. Compute the test statistic.

Now the test statistic:

t = (x̄ 1 - x̄ 2 )/(Sp√(1/n 1 + 1/n 2 )) = -2.92

  • Step 5. Conclusion.

We reject H 0 because -2.92 < -1.701. We have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005.
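The full computation can be sketched in code, though one input is missing from this summary: the drug-group mean. The value used below (185.9) is back-solved from the reported t = -2.92 and the placebo mean of 217.4 mentioned in the discussion that follows, so treat it as an assumption for illustration:

```python
import math
from scipy.stats import t

n1 = n2 = 15
s1, s2 = 28.7, 30.3        # sample SDs from the trial
x1 = 185.9                  # drug-group mean: ASSUMED (back-solved from t = -2.92)
x2 = 217.4                  # placebo mean, as reported in the discussion below

sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
tstat = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(round(sp, 2), round(tstat, 2))       # 29.51 and -2.92
print(round(t.ppf(0.05, df=28), 3))        # critical value -1.701 -> reject H0
```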

The clinical trial in this example finds a statistically significant reduction in total cholesterol, whereas in the previous example where we had a historical control (as opposed to a parallel control group) we did not demonstrate efficacy of the new drug. Notice that the mean total cholesterol level in patients taking placebo is 217.4 which is very different from the mean cholesterol reported among all Americans in 2002 of 203 and used as the comparator in the prior example. The historical control value may not have been the most appropriate comparator as cholesterol levels have been increasing over time. In the next section, we present another design that can be used to assess the efficacy of the new drug.


Tests with Matched Samples, Continuous Outcome

In the previous section we compared two groups with respect to their mean scores on a continuous outcome. An alternative study design is to compare matched or paired samples. The two comparison groups are said to be dependent, and the data can arise from a single sample of participants where each participant is measured twice (possibly before and after an intervention) or from two samples that are matched on specific characteristics (e.g., siblings). When the samples are dependent, we focus on difference scores in each participant or between members of a pair and the test of hypothesis is based on the mean difference, μ d . The null hypothesis again reflects "no difference" and is stated as H 0 : μ d =0 . Note that there are some instances where it is of interest to test whether there is a difference of a particular magnitude (e.g., μ d =5) but in most instances the null hypothesis reflects no difference (i.e., μ d =0).  

The appropriate formula for the test of hypothesis depends on the sample size. The formulas are shown below and are identical to those we presented for estimating the mean of a single sample presented (e.g., when comparing against an external or historical control), except here we focus on difference scores.

Test Statistics for Testing H 0 : μ d = 0

  • If n > 30: Z = x̄ d /(s d /√n)
  • If n < 30: t = x̄ d /(s d /√n), with df = n - 1

where x̄ d is the mean and s d is the standard deviation of the difference scores.

A new drug is proposed to lower total cholesterol, and a study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients agree to participate in the study and each is asked to take the new drug for 6 weeks. However, before starting the treatment, each patient's total cholesterol level is measured. The initial measurement is a pre-treatment or baseline value. After taking the drug for 6 weeks, each patient's total cholesterol level is measured again, and the data are shown below. The rightmost column contains difference scores for each patient, computed by subtracting the 6-week cholesterol level from the baseline level. The differences represent the reduction in total cholesterol over 6 weeks. (The differences could have been computed by subtracting the baseline total cholesterol level from the level measured at 6 weeks. The way in which the differences are computed does not affect the outcome of the analysis, only the interpretation.)

Because the differences are computed by subtracting the cholesterol levels measured at 6 weeks from the baseline values, positive differences indicate reductions and negative differences indicate increases (e.g., participant 12 increases by 2 units over 6 weeks). The goal here is to test whether there is a statistically significant reduction in cholesterol. Because of the way in which we computed the differences, we want to look for an increase in the mean difference (i.e., a positive reduction). In order to conduct the test, we need to summarize the differences: in this sample, we compute their mean, x̄ d , and standard deviation, s d . The calculations are shown below.

Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new medication for 6 weeks? We will run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : μ d = 0 H 1 : μ d > 0                 α = 0.05

NOTE: If we had computed differences by subtracting the baseline level from the level measured at 6 weeks then negative differences would have reflected reductions and the research hypothesis would have been H 1 : μ d < 0. 

  • Step 2 . Select the appropriate test statistic.

Because the sample size is small (n < 30), the appropriate test statistic is

t = x̄ d /(s d /√n)

  • Step 3. Set up decision rule.

This is an upper-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t table at the right, with df = 15 - 1 = 14. The critical value for an upper-tailed test with df = 14 and α = 0.05 is 1.761, and the decision rule is: Reject H 0 if t > 1.761.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2.

t = x̄ d /(s d /√n) = 4.61

  • Step 5. Conclusion.

We reject H 0 because 4.61 > 1.761. We have statistically significant evidence at α = 0.05 to show that there is a reduction in cholesterol levels over 6 weeks.
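The same analysis is a one-liner with scipy's paired t-test. The cholesterol table itself is not reproduced here, so the before/after values below are hypothetical, chosen only to illustrate the mechanics (including one participant whose level increases):

```python
import numpy as np
from scipy import stats

# Hypothetical baseline and 6-week cholesterol values for 15 patients.
baseline = np.array([215, 190, 230, 220, 200, 210, 225, 205, 195, 240, 218, 198, 222, 208, 212])
week6    = np.array([205, 186, 220, 212, 198, 204, 218, 200, 194, 228, 214, 200, 215, 203, 206])

res = stats.ttest_rel(baseline, week6)            # paired t-test on the difference scores
p_upper = res.pvalue / 2 if res.statistic > 0 else 1 - res.pvalue / 2   # one-sided H1: mu_d > 0
print(round(res.statistic, 2), round(p_upper, 4))
```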

Here we illustrate the use of a matched design to test the efficacy of a new drug to lower total cholesterol. We also considered a parallel design (randomized clinical trial) and a study using a historical comparator. It is extremely important to design studies that are best suited to detect a meaningful difference when one exists. There are often several alternatives, and investigators work with biostatisticians to determine the best design for each application. It is worth noting that the matched design used here can be problematic in that observed differences may only reflect a "placebo" effect. All participants took the assigned medication, but is the observed reduction attributable to the medication or a result of their participation in the study?


Tests with Two Independent Samples, Dichotomous Outcome

There are several approaches that can be used to test hypotheses concerning two independent proportions. Here we present one approach; the chi-square test of independence is an alternative, equivalent, and perhaps more popular approach to the same analysis. Hypothesis testing with the chi-square test is addressed in the third module in this series: BS704_HypothesisTesting-ChiSquare.

In tests of hypothesis comparing proportions between two independent groups, one test is performed and results can be interpreted to apply to a risk difference, relative risk or odds ratio. As a reminder, the risk difference is computed by taking the difference in proportions between comparison groups, the risk ratio is computed by taking the ratio of proportions, and the odds ratio is computed by taking the ratio of the odds of success in the comparison groups. Because the null values for the risk difference, the risk ratio and the odds ratio are different, the hypotheses in tests of hypothesis look slightly different depending on which measure is used. When performing tests of hypothesis for the risk difference, relative risk or odds ratio, the convention is to label the exposed or treated group 1 and the unexposed or control group 2.      

For example, suppose a study is designed to assess whether there is a significant difference in proportions in two independent comparison groups. The test of interest is as follows:

H 0 : p 1 = p 2 versus H 1 : p 1 ≠ p 2 .  

The following are the hypotheses for testing for a difference in proportions using the risk difference, the risk ratio and the odds ratio. First, the hypotheses above are equivalent to the following:

  • For the risk difference, H 0 : p 1 - p 2 = 0 versus H 1 : p 1 - p 2 ≠ 0 which are, by definition, equal to H 0 : RD = 0 versus H 1 : RD ≠ 0.
  • If an investigator wants to focus on the risk ratio, the equivalent hypotheses are H 0 : RR = 1 versus H 1 : RR ≠ 1.
  • If the investigator wants to focus on the odds ratio, the equivalent hypotheses are H 0 : OR = 1 versus H 1 : OR ≠ 1.  

Suppose a test is performed to test H 0 : RD = 0 versus H 1 : RD ≠ 0 and the test rejects H 0 at α = 0.05. Based on this test we can conclude that there is significant evidence, at α = 0.05, of a difference in proportions: significant evidence that the risk difference is not zero, and significant evidence that the risk ratio and odds ratio are not one. The risk difference is analogous to the difference in means when the outcome is continuous. Here the parameter of interest is the difference in proportions in the population, RD = p 1 - p 2 , and the null value for the risk difference is zero. In a test of hypothesis for the risk difference, the null hypothesis is always H 0 : RD = 0. This is equivalent to H 0 : RR = 1 and H 0 : OR = 1. In the research hypothesis, an investigator can hypothesize that the first proportion is larger than the second (H 1 : p 1 > p 2 , which is equivalent to H 1 : RD > 0, H 1 : RR > 1 and H 1 : OR > 1), that the first proportion is smaller than the second (H 1 : p 1 < p 2 , which is equivalent to H 1 : RD < 0, H 1 : RR < 1 and H 1 : OR < 1), or that the proportions are different (H 1 : p 1 ≠ p 2 , which is equivalent to H 1 : RD ≠ 0, H 1 : RR ≠ 1 and H 1 : OR ≠ 1). The three different alternatives represent upper-, lower- and two-tailed tests, respectively.

The formula for the test of hypothesis for the difference in proportions is given below.

Test Statistic for Testing H 0 : p 1 = p 2

Z = (p̂ 1 - p̂ 2 )/√(p̂(1 - p̂)(1/n 1 + 1/n 2 ))

where p̂ is the overall (pooled) proportion of successes: p̂ = (x 1 + x 2 )/(n 1 + n 2 ).

The formula above is appropriate for large samples, defined as at least 5 successes (np̂ > 5) and at least 5 failures (n(1 - p̂) > 5) in each of the two samples. If there are fewer than 5 successes or failures in either comparison group, then alternative procedures, called exact methods, must be used to estimate the difference in population proportions.

The following table summarizes data from n = 3,799 participants who attended the fifth examination of the Offspring in the Framingham Heart Study. The outcome of interest is prevalent CVD, and we want to test whether the prevalence of CVD is significantly higher in smokers as compared to non-smokers.

                          Free of CVD    History of CVD    Total
Non-Smoker                2,757          298               3,055
Current Smoker            663            81                744
Total                     3,420          379               3,799

The prevalence of CVD (or proportion of participants with prevalent CVD) among non-smokers is 298/3,055 = 0.0975 and the prevalence of CVD among current smokers is 81/744 = 0.1089. Here smoking status defines the comparison groups and we will call the current smokers group 1 (exposed) and the non-smokers (unexposed) group 2. The test of hypothesis is conducted below using the five step approach.

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : p 1 = p 2     H 1 : p 1 ≠ p 2                 α = 0.05

  • Step 2.  Select the appropriate test statistic.  

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group. In this example, we have more than enough successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group. The sample size is more than adequate, so the following formula can be used:

Z = (p̂ 1 - p̂ 2 )/√(p̂(1 - p̂)(1/n 1 + 1/n 2 ))

  • Step 3. Set up decision rule.

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes:

p̂ = (81 + 298)/(744 + 3,055) = 379/3,799 = 0.0998

We now substitute to compute the test statistic.

Z = (0.1089 - 0.0975)/√(0.0998(1 - 0.0998)(1/744 + 1/3,055)) = 0.93

  • Step 5. Conclusion.

We do not reject H 0 because -1.960 < 0.93 < 1.960. We do not have statistically significant evidence at α = 0.05 to show that there is a difference in prevalent CVD between smokers and non-smokers.

A 95% confidence interval for the difference in prevalent CVD (or risk difference) between smokers and non-smokers is 0.0114 ± 0.0247, or between -0.0133 and 0.0361. Because the 95% confidence interval for the risk difference includes zero, we again conclude that there is no statistically significant difference in prevalent CVD between smokers and non-smokers.
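Both the test and the confidence interval can be reproduced from the counts in the table; a sketch with scipy:

```python
from scipy.stats import norm

n1, n2 = 744, 3055                 # current smokers, non-smokers
p1, p2 = 0.1089, 0.0975            # prevalences of CVD, as rounded above
phat = (81 + 298) / (n1 + n2)      # pooled proportion = 379/3799 ~ 0.0998

z = (p1 - p2) / (phat * (1 - phat) * (1/n1 + 1/n2)) ** 0.5
print(round(z, 2))                 # 0.93 -> |Z| < 1.96, do not reject H0

# The 95% CI for the risk difference uses the unpooled standard error:
se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
print(round(p1 - p2 - 1.96 * se, 4), round(p1 - p2 + 1.96 * se, 4))   # (-0.0133, 0.0361)
```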

Smoking has been shown over and over to be a risk factor for cardiovascular disease. What might explain the fact that we did not observe a statistically significant difference using data from the Framingham Heart Study? HINT: Here we consider prevalent CVD, would the results have been different if we considered incident CVD?

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10, with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

                           n     Reduction of 3+ Points    Proportion
New Pain Reliever          50    23                        0.46
Standard Pain Reliever     50    11                        0.22

We now test whether there is a statistically significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using the five step approach.  

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : p 1 = p 2     H 1 : p 1 ≠ p 2              α = 0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

  • Step 2. Select the appropriate test statistic.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, i.e., min(n 1 p̂ 1 , n 1 (1-p̂ 1 ), n 2 p̂ 2 , n 2 (1-p̂ 2 )) > 5. In this example, we have min(50(0.46), 50(1-0.46), 50(0.22), 50(1-0.22)) = min(23, 27, 11, 39) = 11. The sample size is adequate, so the following formula can be used:

Z = (p̂ 1 - p̂ 2 )/√(p̂(1 - p̂)(1/n 1 + 1/n 2 ))

  • Step 3. Set up decision rule.

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes: p̂ = (23 + 11)/(50 + 50) = 34/100 = 0.34.

Z = (0.46 - 0.22)/√(0.34(1 - 0.34)(1/50 + 1/50)) = 2.53

  • Step 5. Conclusion.

We reject H 0 because 2.53 > 1.960. We have statistically significant evidence at α = 0.05 to show that there is a difference in the proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever.

A 95% confidence interval for the difference in proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever is 0.24 ± 0.18, or between 0.06 and 0.42. Because the 95% confidence interval does not include zero, we conclude that there is a statistically significant difference in proportions, which is consistent with the test of hypothesis result.
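A sketch reproducing both the test statistic and the confidence interval from the trial counts:

```python
from scipy.stats import norm

n1 = n2 = 50
p1, p2 = 23/50, 11/50                       # 0.46 and 0.22
phat = (23 + 11) / (n1 + n2)                # pooled proportion = 0.34

z = (p1 - p2) / (phat * (1 - phat) * (1/n1 + 1/n2)) ** 0.5
print(round(z, 2))                          # 2.53 -> reject H0 at the 5% level

se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5   # unpooled SE for the 95% CI
print(round(p1 - p2 - 1.96 * se, 2), round(p1 - p2 + 1.96 * se, 2))   # (0.06, 0.42)
```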

Again, the procedures discussed here apply to applications where there are two independent comparison groups and a dichotomous outcome. There are other applications in which it is of interest to compare a dichotomous outcome in matched or paired samples. For example, in a clinical trial we might wish to test the effectiveness of a new antibiotic eye drop for the treatment of bacterial conjunctivitis. Participants use the new antibiotic eye drop in one eye and a comparator (placebo or active control treatment) in the other. The success of the treatment (yes/no) is recorded for each participant for each eye. Because the two assessments (success or failure) are paired, we cannot use the procedures discussed here. The appropriate test is called McNemar's test (sometimes called McNemar's test for dependent proportions).  


Here we presented hypothesis testing techniques for means and proportions in one and two sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. There are many details to consider in hypothesis testing. The first is to determine the appropriate test. We discussed Z and t tests here for different applications. The appropriate test depends on the distribution of the outcome variable (continuous or dichotomous), the number of comparison groups (one, two) and whether the comparison groups are independent or dependent. The following table summarizes the different tests of hypothesis discussed here.

  • Continuous Outcome, One Sample: H0: μ = μ0
  • Continuous Outcome, Two Independent Samples: H0: μ1 = μ2
  • Continuous Outcome, Two Matched Samples: H0: μd = 0
  • Dichotomous Outcome, One Sample: H0: p = p 0
  • Dichotomous Outcome, Two Independent Samples: H0: p1 = p2, RD=0, RR=1, OR=1

Once the type of test is determined, the details of the test must be specified. Specifically, the null and alternative hypotheses must be clearly stated. The null hypothesis always reflects the "no change" or "no difference" situation. The alternative or research hypothesis reflects the investigator's belief. The investigator might hypothesize that a parameter (e.g., a mean, proportion, difference in means or proportions) will increase, will decrease or will be different under specific conditions (sometimes the conditions are different experimental conditions and other times the conditions are simply different groups of participants). Once the hypotheses are specified, data are collected and summarized. The appropriate test is then conducted according to the five step approach. If the test leads to rejection of the null hypothesis, an approximate p-value is computed to summarize the significance of the findings. When tests of hypothesis are conducted using statistical computing packages, exact p-values are computed. Because the statistical tables in this textbook are limited, we can only approximate p-values. If the test fails to reject the null hypothesis, then a weaker concluding statement is made for the following reason.

In hypothesis testing, there are two types of errors that can be committed. A Type I error occurs when a test incorrectly rejects the null hypothesis. This is referred to as a false positive result, and the probability that this occurs is equal to the level of significance, α. The investigator chooses the level of significance in Step 1, and purposely chooses a small value such as α = 0.05 to control the probability of committing a Type I error. A Type II error occurs when a test fails to reject the null hypothesis when in fact it is false. The probability that this occurs is equal to β. Unfortunately, the investigator cannot specify β at the outset because it depends on several factors including the sample size (smaller samples have higher β), the level of significance (β decreases as α increases), and the difference in the parameter under the null and alternative hypotheses.

We noted in several examples in this chapter, the relationship between confidence intervals and tests of hypothesis. The approaches are different, yet related. It is possible to draw a conclusion about statistical significance by examining a confidence interval. For example, if a 95% confidence interval does not contain the null value (e.g., zero when analyzing a mean difference or risk difference, one when analyzing relative risks or odds ratios), then one can conclude that a two-sided test of hypothesis would reject the null at α=0.05. It is important to note that the correspondence between a confidence interval and test of hypothesis relates to a two-sided test and that the confidence level corresponds to a specific level of significance (e.g., 95% to α=0.05, 90% to α=0.10 and so on). The exact significance of the test, the p-value, can only be determined using the hypothesis testing approach and the p-value provides an assessment of the strength of the evidence and not an estimate of the effect.

Answers to Selected Problems

Dental services problem (from "Tests with One Sample, Dichotomous Outcome" above).

  • Step 1: Set up hypotheses and determine the level of significance.

H 0 : p = 0.75 H 1 : p ≠ 0.75                 α = 0.05

  • Step 2: Select the appropriate test statistic.

First, determine whether the sample size is adequate: min(np 0 , n(1-p 0 )) = min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25, which is greater than 5.

Therefore the sample size is adequate, and we can use the following formula:

Z = (p̂ - p 0 )/√(p 0 (1 - p 0 )/n)

  • Step 3: Set up the decision rule.

Reject H0 if Z is less than or equal to -1.96 or if Z is greater than or equal to 1.96.

  • Step 4: Compute the test statistic.

p̂ = 64/125 = 0.512

Z = (0.512 - 0.75)/√(0.75(1 - 0.75)/125) = -6.15
  • Step 5: Conclusion.

We reject the null hypothesis because -6.15 < -1.96. Therefore there is a statistically significant difference in the proportion of children in Boston using dental services compared to the national proportion.
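A quick scripted check of this answer (a sketch using scipy):

```python
from scipy.stats import norm

p0, n, x = 0.75, 125, 64
phat = x / n                                     # 0.512
z = (phat - p0) / (p0 * (1 - p0) / n) ** 0.5
print(round(z, 2))                               # -6.15 -> |Z| > 1.96, reject H0
print(2 * norm.cdf(-abs(z)))                     # two-sided p-value, far below 0.0001
```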


What Is a Z-Test?

A Z-test is a statistical test that lets us approximate the distribution of the test statistic under the null hypothesis using the normal distribution.

The Z-test is commonly used in hypothesis testing when the sample size is large. To carry out a Z-test, population parameters such as the mean, variance, and standard deviation should be known.

This test is widely used to determine whether the means of two samples are different when the variance is known. We make use of the Z score and the Z table for running a Z-test.

Z-Test as a Hypothesis Test

A test statistic is a random variable that we calculate from the sample data to determine whether to reject the null hypothesis. This random variable is used to calculate the P-value, which indicates how strong the evidence is against the null hypothesis. The Z-test is such a test: we make use of the sample mean and the Z score to determine the P-value. A Z-test compares the means of two large samples taken from a population when the variance is known.

A Z-test is usually used to conduct a hypothesis test when the sample size is greater than 30. This is because of the central limit theorem: as the sample gets larger, the distribution of the sample mean starts resembling a bell curve and can be treated as normally distributed. Since the Z statistic follows the normal distribution under the null hypothesis, it is the most suitable test statistic for large-sample data.

Why do we use a large sample for conducting a hypothesis test?

In a hypothesis test, we are trying to reject a null hypothesis with evidence collected from sample data, which represents only a portion of the population. When the population is large and the sample data are few, we will not be able to draw an accurate conclusion from the test. Because the sample is our window into the entire population, it should be large enough for us to arrive at a significant inference. Hence, a sufficiently large sample should be used for a hypothesis test, especially if the population is huge.

How to Run a Z-Test

A Z-test can be used as the test statistic for a hypothesis test to calculate the P-value. However, there are certain conditions that should be satisfied by the sample in order to run a Z-test.

The conditions are as follows:

  • The sample size should be greater than 30.

This was already mentioned above. The size of the sample is an important factor in Z-testing, as the Z statistic follows a normal distribution and so should the data. If the sample size is less than 30, it is recommended to use a t-test instead.

  • All the data points should be independent and should not affect one another.

Each element in the sample, considered on its own, should be independent and should not have a relationship with another element.

  • The data must be distributed normally.

This is approximately ensured, via the central limit theorem, if the sample is large.

  • The sample should be selected randomly from a population.

Each data point in the population should have an equal chance of being selected as part of the sample.

  • The sizes of the selected samples should be equal if at all possible.

When considering multiple samples, keeping the size of each sample the same allows for a more accurate estimation of the population parameters.

  • The standard deviation of the population is known.

The population standard deviation must be given to run a Z-test, as we cannot perform the calculation without knowing it. If it is not directly given, it is assumed that the variance of the sample data is equal to the variance of the entire population.

If the conditions are satisfied, the Z-Test can be successfully implemented.

The following are the steps to run a Z-test:

  • State the null hypothesis

The null hypothesis is a statement of no effect, and it supports the claim already made about the population. It is generally represented as:

H 0 : µ = k

  • State the alternate hypothesis

The statement that we are trying to prove is the alternate hypothesis. It is represented as:

  • H 1 : µ ≠ k

This is the representation of a bidirectional (two-tailed) alternate hypothesis.

  • H 1 : µ > k

This is the representation of a one-directional alternate hypothesis whose rejection region lies in the right tail of the distribution.

  • H 1 : µ < k

This is the representation of a one-directional alternate hypothesis whose rejection region lies in the left tail of the distribution.


  • Choose an alpha level for the test.

The alpha level, or significance level, is the probability of rejecting the null hypothesis when it is actually true. It is represented by α. The alpha level must be chosen carefully so as to balance the Type I and Type II errors.

If we choose a large alpha value such as 10 percent, there is a 10 percent probability of rejecting the null hypothesis even when it is true. This error is known as a Type I error.

On the other hand, if we choose an alpha level as low as 1 percent, there is a greater chance of accepting the null hypothesis even if it is false; that is, we reject the alternate hypothesis in favor of the null hypothesis. This is a Type II error.

Hence, the alpha level should be chosen so that the chance of making a Type I or Type II error is minimal. For this reason, the alpha level is commonly set at 5 percent, a conventional compromise between the two kinds of error.

  • Determining the critical value of Z from the Z table.

The critical value is the point on the normal distribution graph that splits the graph into two regions: the acceptance region and the rejection region. It can also be described as the most extreme value for which the null hypothesis can still be accepted. This critical value of Z can be found from the Z table.

  • Calculate the test statistic.

The sample data that we choose to test is converted into a single value, known as the test statistic. This value is compared to the null value. If the test statistic differs significantly from the null value, the null hypothesis is rejected.

  • Compare the test statistic with the critical value.

Now, we have to determine whether the test statistic we have calculated falls in the acceptance region or the rejection region. For this, the test statistic is compared with the critical value to decide whether we should accept or reject the null hypothesis. A minimal Python sketch of the whole procedure follows.
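To make the steps concrete, here is a minimal sketch in Python. The sample mean, population mean, standard deviation and sample size are hypothetical placeholder values rather than numbers from this article, and scipy is assumed to be available.

from math import sqrt
from scipy.stats import norm

mu_0 = 100     # hypothesized population mean (null hypothesis), assumed value
sigma = 15     # known population standard deviation, assumed value
n = 50         # sample size (> 30)
x_bar = 104    # observed sample mean, assumed value
alpha = 0.05   # significance level

# Critical value for a two-tailed test (the Z-table lookup)
z_crit = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05

# Test statistic
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Compare the test statistic with the critical value
if abs(z) > z_crit:
    print(f"z = {z:.3f} > {z_crit:.2f}: reject the null hypothesis")
else:
    print(f"z = {z:.3f} <= {z_crit:.2f}: fail to reject the null hypothesis")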

Types of Z-Test

The Z-test can be used to run a hypothesis test for a single sample or to compare the means of two samples. There are two common types of Z-test.

One-Sample Z-Test

This is the most basic type of hypothesis test and is widely used. For running a one-sample Z-test, all we need to know is the mean and standard deviation of the population. We consider only a single sample for a one-sample Z-test. The one-sample Z-test is used to test whether the population parameter is different from the hypothesized value, i.e., whether the mean of the population is equal to, less than or greater than the hypothesized value.

The equation for finding the value of Z is:

Z = (x̄ – µ) / (σ / √n)

where x̄ is the sample mean, µ the hypothesized population mean, σ the population standard deviation and n the sample size.

The following are the assumptions that are generally taken for a one-sampled Z-Test:

  • The sample size is equal to or greater than 30.
  • One normally distributed sample is considered with the standard deviation known.
  • The null hypothesis is that the population mean calculated from the sample is equal to the hypothesized population mean.

Two-Sample Z-Test

A two-sample Z-test is used whenever there is a comparison between two independent samples. It is used to check whether the difference between the means is equal to zero or not. Suppose we want to know whether men or women prefer to drive more in a city: we use a two-sample Z-test, as it is a comparison of two independent samples of men and women. The Z-score is computed as:

Z = ((x̄1 – x̄2) – (µ1 – µ2)) / √(σ1²/n1 + σ2²/n2)

  • x̄1 and x̄2 represent the means of the two samples.
  • µ1 and µ2 are the hypothesized mean values.
  • σ1 and σ2 are the population standard deviations.
  • n1 and n2 are the sizes of the samples.

The following are the assumptions that are generally taken for a two-sample Z-test (a minimal code sketch follows the list):

  • Two independent, normally distributed samples are considered for the Z-test, with the standard deviations known.
  • Each sample size is equal to or greater than 30.
  • The null hypothesis is that the population means of the two samples do not differ.
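As a sketch only, the formula above translates directly into a small Python helper. The driving-time numbers below are hypothetical placeholders, and scipy is assumed.

from math import sqrt
from scipy.stats import norm

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2):
    # Z statistic and two-tailed p-value for two independent samples
    # with known population standard deviations (null: mu1 - mu2 = 0)
    z = (x1 - x2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    p = 2 * norm.sf(abs(z))   # two-tailed p-value
    return z, p

# Hypothetical numbers for the driving example: average minutes per day
z, p = two_sample_z(x1=42.0, x2=39.5, sigma1=8.0, sigma2=7.5, n1=100, n2=100)
print(f"z = {z:.3f}, p = {p:.4f}")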

Critical value

A critical value is a line that splits a normally distributed graph into two sections: the rejection region and the acceptance region. If your test value falls in the rejection region, the null hypothesis is rejected; if your test value falls in the acceptance region, the null hypothesis is accepted.


Critical Value vs. Significance Level

The significance level, alpha, is the probability of rejecting a null hypothesis when it is actually true, while the critical value is the extreme value up to which the null hypothesis holds. It is easy to confuse the two parameters.

The critical value is the boundary value of the rejection region: it is the point up to which the null hypothesis is accepted and beyond which it is rejected.

The critical value marks a point of extremity whose tail probability is given by the significance level. The significance level is pre-selected for a hypothesis test, and the critical value is calculated from this alpha value. The critical value is a point expressed as a Z-score, whereas the significance level is a probability. The short sketch below shows how one is derived from the other.
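A minimal sketch of that relationship, assuming scipy is available: the inverse normal CDF (norm.ppf) turns a pre-selected alpha into a critical Z-score.

from scipy.stats import norm

alpha = 0.05

# Two-tailed test: alpha is split between the two tails
z_two_tailed = norm.ppf(1 - alpha / 2)   # ~1.960

# One-tailed test: all of alpha sits in a single tail
z_one_tailed = norm.ppf(1 - alpha)       # ~1.645

print(z_two_tailed, z_one_tailed)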

Z-Test Vs T-Test

Z-tests are used when the sample size exceeds 30. As the Z-test follows the normal distribution, a large sample size can be used. The Z-score indicates the distance of a data point from the mean of the data set in terms of standard deviations. Also, this test can only be used if the standard deviation of the population is known.

The T-test is based on the T-distribution, in which the mean value is known and the variance is calculated from the sample. The T-test is preferred for comparing the statistical parameters of two samples, because the population standard deviations needed to run a Z-test are not usually given in a two-sample setting. Also, if the sample size is less than 30, the T-test is preferred.


Z-tests for Hypothesis testing: Formula & Examples


Z-tests are statistical hypothesis testing techniques used to determine whether the null hypothesis, relating to comparing sample means or proportions with those of the population at a given significance level, can be rejected or not based on the z-statistic or z-score. As a data scientist, you must get a good understanding of z-tests and their applications to test the hypotheses for your statistical models. In this blog post, we will discuss an overview of different types of z-tests and related concepts with the help of examples. You may want to check my post on hypothesis testing titled Hypothesis testing explained with examples.


What are Z-tests & Z-statistics?

Z-tests can be defined as statistical hypothesis testing techniques used to quantify hypothesis tests relating to claims made about population parameters such as the mean and proportion. A Z-test uses sample data to test a hypothesis about the population parameters (mean or proportion). There are different types of Z-tests which are used to estimate the population mean or proportion, or to perform hypothesis testing related to sample means or proportions.

Different types of Z-tests 

The following types of Z-tests are used to perform different types of hypothesis testing.


  • One-sample Z-test for means
  • Two-sample Z-test for means
  • One sample Z-test for proportion
  • Two sample Z-test for proportions

Four variables are involved in the Z-test for performing hypothesis testing in different scenarios. They are as follows:

  • An independent variable, called the "sample", which is assumed to be normally distributed
  • A dependent variable, known as the test statistic (Z), which is calculated from the sample data
  • The type of Z-test to be used for the hypothesis test
  • A significance level, or "alpha", usually set at 0.05 but which can take values such as 0.01, 0.05 or 0.1

When to use Z-test – Explained with examples

The following are different scenarios in which a Z-test can be used (a sketch of the two-proportion case follows the list):

  • Compare a sample or a single group with the population with respect to the parameter, mean. This is called a one-sample Z-test for means. For example, testing whether the students of a particular school score marks in Mathematics that are statistically significantly different from other schools. This can also be thought of as a hypothesis test to check whether the sample belongs to the population or not.
  • Compare two groups with respect to the population parameter, mean. This is called a two-sample Z-test for means. For example, you want to compare class X students from different schools and determine whether students of one school are better than others based on their Mathematics scores.
  • Compare the hypothesized proportion of a population to the theoretical population proportion. For example, whether the unemployment rate of a given state is different from the well-established rate for the country.
  • Compare the proportion of one population with the proportion of another. For example, whether the efficacy rates of vaccination in two different populations are statistically significantly different.
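As a sketch of the last scenario, here is a hand-rolled two-proportion z-test in Python. The vaccine counts are hypothetical placeholders; the pooled-proportion standard error is the standard textbook form, and scipy is assumed to be available.

from math import sqrt
from scipy.stats import norm

# Hypothetical vaccine efficacy comparison: successes and sample sizes
x1, n1 = 180, 200   # population 1
x2, n2 = 168, 200   # population 2

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))    # two-tailed
print(f"z = {z:.3f}, p = {p_value:.4f}")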

Z-test Interview Questions 

Here is a list of a few interview questions you may expect in your data scientist interviews:

  • What is Z-test?
  • What is Z-statistics or Z-score?
  • When to use Z-test vs other tests such as T-test or Chi-square test?
  • What is Z-distribution?
  • What is the difference between Z-distribution and T-distribution?
  • What is sampling distribution?
  • What are different types of Z-tests?
  • Explain different types of Z-tests with the help of real-world examples.
  • What's the difference between a two-sample Z-test for means and a two-sample Z-test for proportions? Explain with one example each.
  • As a data scientist, give some scenarios in which you would use a Z-test when building machine learning models.


Two Sample Z-Test: Definition, Formula, and Example

A  two sample z-test is used to test whether two population means are equal.

This test assumes that the standard deviation of each population is known.

This tutorial explains the following:

  • The formula to perform a two sample z-test.
  • The assumptions of a two sample z-test.
  • An example of how to perform a two sample z-test.

Let’s jump in!

Two Sample Z-Test: Formula

A two sample z-test uses the following null and alternative hypotheses:

  • H0: μ1 = μ2 (the two population means are equal)
  • HA: μ1 ≠ μ2 (the two population means are not equal)

We use the following formula to calculate the z test statistic:

z = (x̄1 – x̄2) / √(σ1²/n1 + σ2²/n2)

where:

  • x̄1, x̄2: sample means
  • σ1, σ2: population standard deviations
  • n1, n2: sample sizes

If the p-value that corresponds to the z test statistic is less than your chosen significance level (common choices are 0.10, 0.05, and 0.01) then you can reject the null hypothesis .

Two Sample Z-Test: Assumptions

For the results of a two sample z-test to be valid, the following assumptions should be met:

  • The data from each population are continuous (not discrete).
  • Each sample is a simple random sample from the population of interest.
  • The data in each population is approximately normally distributed .
  • The population standard deviations are known.

Two Sample Z-Test: Example

Suppose the IQ levels among individuals in two different cities are known to be normally distributed each with population standard deviations of 15.

A scientist wants to know if the mean IQ level differs between individuals in city A and city B, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.

To test this, she will perform a two sample z-test at significance level α = 0.05 using the following steps:

Step 1: Gather the sample data.

Suppose she collects two simple random samples with the following information:

  • x̄1 (sample 1 mean IQ) = 100.65
  • n1 (sample 1 size) = 20
  • x̄2 (sample 2 mean IQ) = 108.8
  • n2 (sample 2 size) = 20

Step 2: Define the hypotheses.

She will perform the two sample z-test with the following hypotheses:

  • H0: μ1 = μ2 (the mean IQ levels of the two cities are equal)
  • HA: μ1 ≠ μ2 (the mean IQ levels of the two cities are not equal)

Step 3: Calculate the z test statistic.

The z test statistic is calculated as:

  • z = (100.65 – 108.8) / √(15²/20 + 15²/20) = –1.718

Step 4: Calculate the p-value of the z test statistic.

According to the Z Score to P Value Calculator , the two-tailed p-value associated with z = -1.718 is 0.0858 .

Step 5: Draw a conclusion.

Since the p-value (0.0858) is not less than the significance level (.05), the scientist will fail to reject the null hypothesis.

There is not sufficient evidence to say that the mean IQ level is different between the two populations.

Note:  You can also perform this entire two sample z-test by using the Two Sample Z-Test Calculator .
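If you would rather script it than use a calculator, this quick check reproduces the example's numbers in Python (scipy assumed):

from math import sqrt
from scipy.stats import norm

x1, x2 = 100.65, 108.8   # sample means
sigma = 15.0             # known population standard deviation (both cities)
n1 = n2 = 20             # sample sizes

z = (x1 - x2) / sqrt(sigma**2 / n1 + sigma**2 / n2)
p = 2 * norm.sf(abs(z))  # two-tailed p-value
print(f"z = {z:.3f}, p = {p:.4f}")   # z = -1.718, p = 0.0858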

Additional Resources

The following tutorials explain how to perform a two sample z-test using different statistical software:

  • How to Perform Z-Tests in Excel
  • How to Perform Z-Tests in R
  • How to Perform Z-Tests in Python


Z-test for One Population Mean

Instructions: This calculator conducts a Z-test for one population mean (\(\mu\)), with known population standard deviation (\(\sigma\)). Please select the null and alternative hypotheses, type the hypothesized mean, the significance level, the sample mean, the population standard deviation, and the sample size, and the results of the z-test will be displayed for you:


How to Conduct a Z-Test for One Population Mean?

More about the z-test for one mean so you can better interpret the results obtained by this solver: A z-test for one mean is a hypothesis test that attempts to make a claim about the population mean (\(\mu\)).

The test has two non-overlapping hypotheses, the null and the alternative hypothesis. The null hypothesis is a statement about the population mean, under the assumption of no effect, and the alternative hypothesis is the complementary hypothesis to the null hypothesis. The main properties of a one sample z-test for one population mean are:

  • Depending on our knowledge about the "no effect" situation, the z-test can be two-tailed, left-tailed or right-tailed
  • The main principle of hypothesis testing is that the null hypothesis is rejected if the test statistic obtained is sufficiently unlikely under the assumption that the null hypothesis is true
  • The p-value is the probability of obtaining sample results as extreme or more extreme than the sample results obtained, under the assumption that the null hypothesis is true
  • In a hypothesis test there are two types of errors: a Type I error occurs when we reject a true null hypothesis, and a Type II error occurs when we fail to reject a false null hypothesis

Uses of this z-test calculator

What can you do with this z-test statistic calculator for hypothesis testing? The formula for the z-statistic is

\(Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\)

where \(\bar{X}\) is the sample mean, \(\mu_0\) the hypothesized mean, \(\sigma\) the population standard deviation and \(n\) the sample size.

The null hypothesis is rejected when the z-statistic lies on the rejection region, which is determined by the significance level (\(\alpha\)) and the type of tail (two-tailed, left-tailed or right-tailed).

What if the population standard deviation is not known?

It frequently happens that you don't actually know the population standard deviation, in which case you should use a t-test for one mean calculator instead. That test adjusts by using the sample standard deviation and a slightly different distribution (the t-distribution).

How to calculate p-value in the context of a z-test?

The answer depends on whether you are using a two-tailed, a left-tailed or a right-tailed test. Say you have the calculated z-statistic, \(Z_{obs}\); a short code sketch of all three cases follows the list below.

  • For a two-tailed test, the p-value is computed as: \(p = 2\Pr( Z > |Z_{obs}|) \)
  • For a left-tailed test, the p-value is computed as: \(p = \Pr( Z < Z_{obs}) \)
  • For a right-tailed test, the p-value is computed as: \(p = \Pr( Z > Z_{obs}) \)

where \(Z\) has a standard normal distribution.
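A minimal Python sketch of these three cases, assuming scipy is available; the helper name z_p_value is ours, not part of any library.

from scipy.stats import norm

def z_p_value(z_obs, tail="two"):
    # P-value for an observed z-statistic under the chosen tail
    if tail == "two":
        return 2 * norm.sf(abs(z_obs))   # 2 * Pr(Z > |z_obs|)
    if tail == "left":
        return norm.cdf(z_obs)           # Pr(Z < z_obs)
    if tail == "right":
        return norm.sf(z_obs)            # Pr(Z > z_obs)
    raise ValueError("tail must be 'two', 'left' or 'right'")

print(z_p_value(-1.739, "two"))   # ~0.082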

Other types of Z-calculators

In case you need to compare two population means, when you know the corresponding population standard deviations, you need to use this z-test for two means with known population standard deviations instead.

Outlier Detection

Don't forget to detect outliers before running a z-test for one mean. It is important that outliers are detected and handled before conducting the test; otherwise, the resulting test statistic may be skewed.


Example: Application of the Z-test calculator

Question: Assume that you want to test whether or not the population mean is 12.3. You collect a representative random sample of size n = 16, and you find that the sample mean is 11.3. Also, you know that the population standard deviation is 2.3. Do the sample data provide enough evidence to reject the claim that the population mean is 12.3? Use a two-tailed test, with a significance level of 0.01.

The following information has been provided:

  • Hypothesized population mean: \(\mu_0 = 12.3\)
  • Sample mean: \(\bar{X} = 11.3\)
  • Population standard deviation: \(\sigma = 2.3\)
  • Sample size: \(n = 16\)
  • Significance level: \(\alpha = 0.01\)

(1) Null and Alternative Hypotheses

The following null and alternative hypotheses need to be tested:

\(H_0: \mu = 12.3\) versus \(H_a: \mu \ne 12.3\)

This corresponds to a two-tailed test, for which a z-test for one mean, with known population standard deviation, will be used.

(2) Rejection Region

Based on the information provided, the significance level is \(\alpha = 0.01\), and the critical value for a two-tailed test is \(z_c = 2.576\).

The rejection region for this two-tailed test is \(R = \{z: |z| > 2.576\}\)

(3) Test Statistics

The z-statistic is computed as follows:

\(z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{11.3 - 12.3}{2.3/\sqrt{16}} = -1.739\)

(4) Decision about the null hypothesis

Since it is observed that \(|z| = 1.739 \le z_c = 2.576\), it is then concluded that the null hypothesis is not rejected.

Using the P-value approach: The p-value is \(p = 0.082\), and since \(p = 0.082 \ge 0.01\), it is concluded that the null hypothesis is not rejected.

(5) Conclusion

It is concluded that the null hypothesis Ho is not rejected. Therefore, there is not enough evidence to claim that the population mean \(\mu\) is different than 12.3, at the \(\alpha = 0.01\) significance level.

Confidence Interval

The 99% confidence interval is \(9.819 < \mu < 12.781\).
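The whole worked example can be reproduced in a few lines of Python (scipy assumed), including the 99% confidence interval:

from math import sqrt
from scipy.stats import norm

mu_0, x_bar, sigma, n, alpha = 12.3, 11.3, 2.3, 16, 0.01

se = sigma / sqrt(n)            # standard error: 0.575
z = (x_bar - mu_0) / se         # test statistic: -1.739
p = 2 * norm.sf(abs(z))         # two-tailed p-value: 0.082
z_c = norm.ppf(1 - alpha / 2)   # critical value: 2.576

print(f"z = {z:.3f}, p = {p:.3f}, z_c = {z_c:.3f}")

# 99% confidence interval for the mean
lo, hi = x_bar - z_c * se, x_bar + z_c * se
print(f"{lo:.3f} < mu < {hi:.3f}")   # 9.819 < mu < 12.781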


Different tests are used in statistics to compare distinct samples or groups and draw conclusions about populations. These tests, also referred to as statistical tests, focus on the probability of obtaining the observed data under particular premises or hypotheses. They offer a framework for evaluating the evidence for or against a given hypothesis.

A statistical test starts with the formulation of a null hypothesis (H0) and an alternative hypothesis (Ha). The alternative hypothesis proposes a particular link or effect, whereas the null hypothesis reflects the default assumption and often states no effect or no difference.

The p-value indicates the likelihood of observing the data or more extreme results assuming the null hypothesis is true. Researchers compare the calculated p-value to a predetermined significance level, often denoted as α, to make a decision regarding the null hypothesis. If the p-value is smaller than α, the results are considered statistically significant, leading to the rejection of the null hypothesis in favor of the alternative hypothesis.

The p-value is calculated using a variety of statistical tests, including the Z-test, T-test, Chi-squared test, ANOVA, and F-test, among others. In this article, we will focus on explaining the Z-test.

What is Z-Test?

Z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30).

Z-test can also be defined as a statistical method that is used to determine whether the distribution of the test statistics can be approximated using the normal distribution or not. It is the method to determine whether two sample means are approximately the same or different when their variance is known and the sample size is large (should be >= 30).

The Z-test compares the difference between the sample mean and the population means by considering the standard deviation of the sampling distribution. The resulting Z-score represents the number of standard deviations that the sample mean deviates from the population mean. This Z-Score is also known as Z-Statistics, and can be formulated as:

\text{Z-Score} = \frac{\bar{x}-\mu}{\sigma}

For example, suppose the average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.

Then the Z-score for Delhi will be:

\begin{aligned} \text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma} \\&=\frac{300-200}{5} \\&=20 \end{aligned}

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).

When to Use Z-test:

  • The sample size should be greater than 30. Otherwise, we should use the t-test.
  • Samples should be drawn at random from the population.
  • The standard deviation of the population should be known.
  • Samples that are drawn from the population should be independent of each other.
  • The data should be normally distributed; however, for a large sample size the data is assumed to be normally distributed because of the central limit theorem

Hypothesis Testing

A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis testing is a way to validate the claim of an experiment.

  • Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H 0 .
  • Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by H A .

Level of significance: the degree of significance at which we accept or reject the null hypothesis. Since 100 percent accuracy is not possible when accepting or rejecting a hypothesis, we select a level of significance. It is denoted by alpha (∝).

Steps to perform Z-test:

  • First, identify the null and alternate hypotheses.
  • Determine the level of significance (∝).
  • Find the critical value of z from the z-table, based on ∝ and the type of tail.
  • Calculate the z-test statistic:

Z  =  \frac{(\overline{X}- \mu)}{\left ( \sigma /\sqrt{n} \right )}

where \overline{X} is the sample mean, \mu the population mean, \sigma the population standard deviation and n the sample size.

  • Now compare the statistic with the critical value and decide whether to reject or not reject the null hypothesis

Types of Z-test

  • Left-tailed test: In this test, our region of rejection is located at the extreme left of the distribution. Here the null hypothesis is that the population mean is greater than or equal to the claimed value, and the alternate hypothesis is that it is less.

  • Right-tailed test: In this test, our region of rejection is located at the extreme right of the distribution. Here the null hypothesis is that the population mean is less than or equal to the claimed value, and the alternate hypothesis is that it is greater.

  • Two-tailed test: In this test, our region of rejection is located at both extremes of the distribution. Here the null hypothesis is that the population mean is equal to the claimed value.

Below is an example of performing the z-test:

Example One-Tailed Test:

A school claims that its students are more intelligent than the average school's. On calculating the IQ scores of 50 students, the average turns out to be 110. The mean population IQ is 100 and the standard deviation is 15. State whether the principal's claim is right or not at a 5% significance level.

H_0 : \mu  = 100

H_1 : \mu > 100

Z  = \frac{(\overline{X}- \mu)}{\left ( \sigma /\sqrt{n} \right )} = \frac{110-100}{15/\sqrt{50}} \approx 4.71

  • Now, we look at the z-table. For the value of ∝ = 0.05, the z-score for a right-tailed test is 1.645.
  • Here 4.71 > 1.645, so we reject the null hypothesis.
  • If the z-test statistic were less than the critical z-score, we would not reject the null hypothesis.

Code Implementations

One-sample Z-test:
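Here is a minimal sketch reproducing the one-tailed school example above in Python; it is one possible implementation (scipy assumed), not a definitive listing.

from math import sqrt
from scipy.stats import norm

mu_0, x_bar, sigma, n, alpha = 100, 110, 15, 50, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))   # test statistic: ~4.71
z_crit = norm.ppf(1 - alpha)             # right-tailed critical value: 1.645

if z > z_crit:
    print(f"z = {z:.2f} > {z_crit:.3f}: reject the null hypothesis")
else:
    print(f"z = {z:.2f} <= {z_crit:.3f}: fail to reject the null hypothesis")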

Two-sample Z-test:

In this test, we have two normally distributed and independent populations, and we draw samples at random from both. Here, we take µ1 and µ2 to be the population means, and X̄1 and X̄2 to be the observed sample means. Our null hypothesis could be:

H_{0} : \mu_{1} -\mu_{2} = 0

and alternative hypothesis

H_{1} :  \mu_{1} - \mu_{2} \ne 0

and the formula for calculating the z-test score:

Z = \frac{\left ( \overline{X_{1}} - \overline{X_{2}} \right ) - \left ( \mu_{1} - \mu_{2} \right )}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}}

There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, while Group B has studied online classes. After the examination, the score of each student comes. Now we want to determine whether the online or offline classes are better.

  • Group A: sample size = 50, sample mean = 75, sample standard deviation = 10
  • Group B: sample size = 60, sample mean = 80, sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

Step 1: Null & Alternate Hypothesis

H_0 : \mu_1 -\mu_2 = 0 \qquad H_1 : \mu_1 - \mu_2 \ne 0

Step 2: Significance Level

\alpha = 0.05

Step 3: Z-Score

\begin{aligned} \text{Z-score} &= \frac{(x_1-x_2)-(\mu_1 -\mu_2)} {\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}  \\ &= \frac{(75-80)-0} {\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}} \\ &= \frac{-5} {\sqrt{2+2.4}} \\ &= \frac{-5} {2.0976} \\&=-2.384 \end{aligned}

Step 4: Look up the critical Z-score in the Z-table for alpha/2 = 0.025

  •  Critical Z-Score = 1.96

Step 5: Compare with the absolute Z-score value (a code sketch of this calculation follows)

  • |Z-score| = 2.384 > critical Z-score = 1.96
  • Reject the null hypothesis. There is a significant difference between the online and offline classes.
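A minimal sketch of the same two-sample calculation in Python, with scipy assumed and the group statistics taken from the example above:

from math import sqrt
from scipy.stats import norm

# Group A (offline) and Group B (online); standard deviations treated as known
x1, s1, n1 = 75, 10, 50
x2, s2, n2 = 80, 12, 60
alpha = 0.05

z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)   # ~ -2.384
z_crit = norm.ppf(1 - alpha / 2)                # ~ 1.96

if abs(z) > z_crit:
    print(f"|z| = {abs(z):.3f} > {z_crit:.2f}: reject the null hypothesis")
else:
    print(f"|z| = {abs(z):.3f} <= {z_crit:.2f}: fail to reject the null hypothesis")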

Type I error and Type II error:

  • Type I error: a Type I error occurs when we reject the null hypothesis even though it is true. Its probability is denoted by alpha.
  • Type II error: a Type II error occurs when we fail to reject the null hypothesis even though it is false. Its probability is denoted by beta.


Interpret all statistics for Kruskal-Wallis Test


The sample size (N) is the total number of observations in each group.

Interpretation

The sample size affects the confidence interval and the power of the test.

Usually, a larger sample yields a narrower confidence interval. A larger sample size also gives the test more power to detect a difference. For more information, go to What is power? .

The median is the midpoint of the data set. This midpoint value is the point at which half of the observations are above the value and half of the observations are below the value. The median is determined by ranking the observations and finding the observation at the number [N + 1] / 2 in the ranked order. If your data contain an even number of observations, the median is the average value of the observations that are ranked at numbers N / 2 and [N / 2] + 1.

The sample median is an estimate of the population median of each group. The overall median is the median of all observations.

The mean rank is the average of the ranks for all observations within each sample. Minitab uses the mean rank to calculate the H-value, which is the test statistic for the Kruskal-Wallis test.

To calculate the mean rank, Minitab ranks the combined samples. Minitab assigns the smallest observation a rank of 1, the second smallest observation a rank of 2, and so on. If two or more observations are tied, Minitab assigns the average rank to each tied observation. Minitab calculates the mean rank for each sample.

When a group's mean rank is higher than the overall average rank, the observation values in that group tend to be higher than those of the other groups.

The z-value indicates how the average rank for each group compares to the average rank of all observations.

  • The higher the absolute value, the further a group's average rank is from the overall average rank.
  • A negative z-value indicates that a group's average rank is less than the overall average rank.
  • A positive z-value indicates that a group's average rank is greater than the overall average rank.

The degrees of freedom (DF) equals the number of groups in your data minus 1. Under the null hypothesis, the chi-square distribution approximates the distribution of the test statistic, with the specified degrees of freedom. Minitab uses the chi-square distribution to estimate the p-value for this test.

H is the test statistic for the Kruskal-Wallis test. Under the null hypothesis, the chi-square distribution approximates the distribution of H. The approximation is reasonably accurate when no group has fewer than five observations.

Minitab uses the test statistic to calculate the p-value, which you use to make a decision about the statistical significance of the terms and the model. The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.

A sufficiently high test statistic indicates that at least one difference between the medians is statistically significant.

You can use the test statistic to determine whether to reject the null hypothesis. However, using the p-value of the test to make the same determination is usually more practical and convenient.

The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.

Use the p-value to determine whether any of the differences between the medians are statistically significant.

