Normal Distribution Hypothesis Test
Hypothesis tests for the normal distribution can be conducted in a very similar way to those for the binomial distribution, except this time we switch our test statistic. These tests are useful as, again, they help us test claims about normally distributed quantities.

How do we carry out a hypothesis test for normal distribution?

When we carry out a hypothesis test for the mean of a normal distribution, we look at the mean of a sample taken from the population.

So for a random sample of size n taken from the random variable \(X \sim N(\mu, \sigma^2)\), the sample mean \(\bar{X}\) is normally distributed with \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\).

Let's look at an example.

The weight of crisps in each packet is normally distributed with a standard deviation of 2.5g.

The crisp company claims that the crisp packets have a mean weight of 28g. There were numerous complaints that each crisp packet weighs less than this. Therefore a trading inspector investigated this and found in a sample of 50 crisp packets, the mean weight was 27.2g.

Using a 5% significance level, and stating your hypotheses clearly, test whether or not the evidence upholds the complaints.
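The worked solution is not shown here, so here is a minimal sketch in Python using the standard library's `NormalDist` (the variable names are ours, not from the text):

```python
from math import sqrt
from statistics import NormalDist

# Crisp-packet example: H0: mu = 28, Ha: mu < 28, sigma = 2.5 known.
mu0, sigma, n, xbar = 28, 2.5, 50, 27.2

# Under H0 the sample mean is distributed N(mu0, sigma / sqrt(n)).
se = sigma / sqrt(n)
p_value = NormalDist(mu0, se).cdf(xbar)  # P(Xbar <= 27.2) under H0

reject = p_value < 0.05  # compare with the 5% significance level
```

The p-value comes out at roughly 0.012, below 5%, so the null hypothesis is rejected: the evidence upholds the complaints.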

This is an example of a one-tailed test. Let's look at an example of a two-tailed test.

A machine produces circular discs with a radius R, where R is normally distributed with a mean of 2cm and a standard deviation of 0.3cm.

The machine is serviced and after the service, a random sample of 40 discs is taken to see if the mean has changed from 2cm. The radius is still normally distributed with a standard deviation of 0.3 cm.

The mean is found to be 1.9cm.

Has the mean changed? Test this to a 5% significance level.
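This two-tailed test can be sketched the same way (again a sketch using the standard library's `NormalDist`; the variable names are ours):

```python
from math import sqrt
from statistics import NormalDist

# Disc example: H0: mu = 2, Ha: mu != 2, sigma = 0.3 known.
mu0, sigma, n, xbar = 2, 0.3, 40, 1.9

se = sigma / sqrt(n)
# Two-tailed: double the single-tail probability (xbar lies below mu0).
p_value = 2 * NormalDist(mu0, se).cdf(xbar)

reject = p_value < 0.05
```

The p-value is roughly 0.035, below 5%, so there is evidence that the mean has changed.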

Step 5 may be confusing – do we carry out the calculation with \(P(\bar{X} \leq \bar{x})\) or \(P(\bar{X} \geq \bar{x})\)? As a general rule of thumb, if the observed sample mean is below the hypothesised mean, we use \(P(\bar{X} \leq \bar{x})\). If it is above the hypothesised mean, we use \(P(\bar{X} \geq \bar{x})\).

How about finding critical values and critical regions?

This is the same idea as for the binomial distribution. However, for the normal distribution, a calculator can make our lives easier.

The distributions menu has an option called inverse normal.

Here, we enter the significance level (Area), the mean (\(\mu\) ) and the standard deviation (\(\sigma\) ).

The calculator will give us an answer. Let's have a look at an example below.

Wheels are made to measure for a bike. The diameter of the wheel is normally distributed with a mean of 40cm and a standard deviation of 5cm. Some people think that their wheels are too small. Find the critical value of this to a 5% significance level.

In our calculator, in the inverse normal function, we need to enter: Area = 0.05 (the significance level), \(\mu = 40\) and \(\sigma = 5\).

If we perform the inverse normal function we get 31.775732.

So that is our critical value and our critical region is \(X \leq 31.775732\) .
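If you do not have the calculator to hand, the same inverse-normal lookup can be reproduced in Python (a sketch using the standard library):

```python
from statistics import NormalDist

# Inverse normal: the x with P(X <= x) = 0.05 for X ~ N(40, 5).
critical = NormalDist(mu=40, sigma=5).inv_cdf(0.05)
```

This returns 31.775732 (to six decimal places), matching the calculator.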

Let's look at an example with two tails.

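A worked two-tailed example is not shown here, so as an illustrative sketch, reuse the disc machine from earlier (\(\mu = 2\) cm, \(\sigma = 0.3\) cm, n = 40) and split the 5% significance level into 2.5% in each tail:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 2, 0.3, 40, 0.05
sampling = NormalDist(mu0, sigma / sqrt(n))  # distribution of the sample mean

lower = sampling.inv_cdf(alpha / 2)      # 2.5% in the lower tail
upper = sampling.inv_cdf(1 - alpha / 2)  # 2.5% in the upper tail
```

The critical regions are roughly \(\bar{X} \leq 1.907\) and \(\bar{X} \geq 2.093\); the observed sample mean of 1.9 cm falls in the lower region, agreeing with the earlier test.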

Hypothesis Test for Normal Distribution - Key takeaways

  • When we hypothesis test for a normal distribution we are trying to see if the mean is different from the mean stated in the null hypothesis.
  • We use the sample mean which is \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\) .
  • In two-tailed tests we divide the significance level by two and test on both tails.
  • When finding critical values we use the calculator inverse normal function entering the area as the significance level.
  • For two-tailed tests we need to find two critical values on either end of the distribution.

Frequently Asked Questions about Normal Distribution Hypothesis Test

How do you test a hypothesis for a normal distribution?

We use the sample mean as the test statistic: under the null hypothesis, \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\), and we check whether the observed sample mean is unlikely at the chosen significance level.

Is hypothesis testing only for a normal distribution?

No, pretty much any distribution can be used when testing a hypothesis. The two distributions that you learn at A-Level are Normal and Binomial.

What statistical hypothesis can be tested for the mean of a normal distribution?

We test whether or not the data can support the value of a mean being too low or too high.

What is our test statistic with the normal distribution?

The sample mean, \(\bar{X}\).

What calculator tool do we use to work backwards with normal distribution?

The Inverse Normal

A coach thinks his athletes will achieve less than 12 seconds in their 100 metre race. His assistant thinks they won't be this fast. If this claim was tested is this a one or two-tailed test?

One-tailed test

How do we find the critical region of a normal distribution?

By using the calculator inverse normal setting.


How to Do Hypothesis Testing with Normal Distribution

Hypothesis tests compare a result against something you already believe is true. Let \(X_1, X_2, \ldots, X_n\) be \(n\) independent random variables with equal expected value \(\mu\) and standard deviation \(\sigma\). Let \(\bar{X}\) be the mean of these \(n\) random variables, so \[\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\]

The stochastic variable \(\bar{X}\) has an expected value \(\mu\) and a standard deviation \(\frac{\sigma}{\sqrt{n}}\). You want to perform a hypothesis test on this expected value. You have a null hypothesis \(H_0: \mu = \mu_0\) and three possible alternative hypotheses: \(H_a: \mu < \mu_0\), \(H_a: \mu > \mu_0\) or \(H_a: \mu \neq \mu_0\). The first two alternative hypotheses belong to what you call a one-sided test, while the latter is two-sided.

In hypothesis testing, you calculate using the alternative hypothesis in order to say something about the null hypothesis.


Note! For two-sided testing, multiply the \(p\)-value by 2 before checking against the critical region.

As the production manager at the new soft drink factory, you are worried that the machines don't fill the bottles to their proper capacity. Each bottle should be filled with 0.5 L of soda, but a random sample of 48 soda bottles has an average of 0.48 L, with an empirical standard deviation of 0.1. You are wondering if you need to recalibrate the machines so that they become more precise.

This is a classic case of hypothesis testing by normal distribution. You now follow the instructions above and select a 10% level of significance, since it is only a quantity of soda and not a case of life and death.

The alternative hypothesis in this case is that the bottles do not contain 0.5 L and that the machines are not precise enough. This thus becomes a two-sided hypothesis test, and you must therefore remember to multiply the \(p\)-value by 2 before deciding whether the \(p\)-value is in the critical region. This is because the normal distribution is symmetric, so \(P(Z \geq k) = P(Z \leq -k)\). Thus it is just as likely to observe an equally extreme high value as an equally extreme low one:
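The calculation itself can be sketched in Python (standard library only; since n = 48 is large, the empirical standard deviation is treated as \(\sigma\) in a z-test):

```python
from math import sqrt
from statistics import NormalDist

# Soda example: H0: mu = 0.5, Ha: mu != 0.5, 10% significance level.
mu0, s, n, xbar, alpha = 0.5, 0.1, 48, 0.48, 0.10

se = s / sqrt(n)
p_value = 2 * NormalDist(mu0, se).cdf(xbar)  # two-sided p-value

keep_h0 = p_value >= alpha
```

The two-sided p-value comes out around 0.17, well above the 10% level of significance,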

so \(H_0\) must be kept, and the machines are deemed to be fine as is.

Had the \(p\)-value been less than the level of significance, you would have rejected \(H_0\), concluding that the machines were not filling the bottles properly and that recalibration would be better for the business.


Hypothesis Testing with One Sample

Distribution Needed for Hypothesis Testing

OpenStaxCollege


Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's t -distribution. (Remember, use a Student's t -distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually the sample size n is large).

If you are testing a single population mean , the distribution for the test is for means :

\(\overline{X} \sim N\left({\mu }_{X},\frac{{\sigma }_{X}}{\sqrt{n}}\right)\) or \({t}_{df}\)

The population parameter is μ . The estimated value (point estimate) for μ is \(\overline{x}\), the sample mean.

If you are testing a single population proportion , the distribution for the test is for proportions or percentages:

\({P}^{\prime } \sim N\left(p,\sqrt{\frac{p\cdot q}{n}}\right)\)

The population parameter is p . The estimated value (point estimate) for p is p′ . p′ = \(\frac{x}{n}\) where x is the number of successes and n is the sample size.

Assumptions

When you perform a hypothesis test of a single population mean μ using a Student’s t -distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed . You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed).

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z -test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.

When you perform a hypothesis test of a single population proportion p , you take a simple random sample from the population. You must meet the conditions for a binomial distribution which are: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p . The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five ( np > 5 and nq > 5). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with μ = p and \(\sigma =\sqrt{\frac{pq}{n}}\).

Remember that q = 1 – p .
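As a quick sketch of checking these conditions (the values p = 0.42 and n = 100 are illustrative, matching the taste-test exercise later in this section):

```python
from math import sqrt

p, n = 0.42, 100
q = 1 - p

# Both np and nq must exceed 5 for the normal approximation to apply.
conditions_met = n * p > 5 and n * q > 5

# Parameters of the approximating normal distribution.
mu = p
sigma = sqrt(p * q / n)
```

Here np = 42 and nq = 58, so the sample proportion is approximately N(0.42, 0.0494).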

Chapter Review

In order for a hypothesis test’s results to be generalized to a population, certain requirements must be satisfied.

When testing for a single population mean:

  • A Student’s t -test should be used if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with an unknown standard deviation.
  • The normal test will work if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with a known standard deviation.

When testing a single population proportion, use a normal test for a single population proportion if the data come from a simple, random sample, fill the requirements for a binomial distribution, and the mean number of successes and the mean number of failures satisfy the conditions np > 5 and nq > 5, where n is the sample size, p is the probability of a success, and q is the probability of a failure.

Formula Review

If there is no given preconceived α , then use α = 0.05.

  • Single population mean, known population variance (or standard deviation): Normal test .
  • Single population mean, unknown population variance (or standard deviation): Student’s t -test .
  • Single population proportion: Normal test .
  • For a single population mean , we may use a normal distribution with the following mean and standard deviation. Means: \(\mu ={\mu }_{\overline{x}}\) and \({\sigma }_{\overline{x}}=\frac{{\sigma }_{x}}{\sqrt{n}}\)
  • For a single population proportion , we may use a normal distribution with the following mean and standard deviation. Proportions: µ = p and \(\sigma =\sqrt{\frac{pq}{n}}\).

Which two distributions can you use for hypothesis testing for this chapter?

A normal distribution or a Student’s t -distribution

Which distribution do you use when you are testing a population mean and the standard deviation is known? Assume sample size is large.

Use a normal distribution

Which distribution do you use when the standard deviation is not known and you are testing one population mean? Assume sample size is large.

Use a Student’s t -distribution

A population mean is 13. The sample mean is 12.8, and the sample standard deviation is two. The sample size is 20. What distribution should you use to perform a hypothesis test? Assume the underlying population is normal.

Use a Student's t -distribution

A population has a mean of 25 and a standard deviation of five. The sample mean is 24, and the sample size is 108. What distribution should you use to perform a hypothesis test?

a normal distribution for a single population mean

It is thought that 42% of respondents in a taste test would prefer Brand A . In a particular test of 100 people, 39% preferred Brand A . What distribution should you use to perform a hypothesis test?

Use a normal distribution for a single population proportion

You are performing a hypothesis test of a single population mean using a Student’s t -distribution. What must you assume about the distribution of the data?

It must be approximately normally distributed.

You are performing a hypothesis test of a single population mean using a Student’s t -distribution. The data are not from a simple random sample. Can you accurately perform the hypothesis test?

No. The test requires that the data come from a simple random sample.

You are performing a hypothesis test of a single population proportion. What must be true about the quantities of np and nq ?

They must both be greater than five.

You are performing a hypothesis test of a single population proportion. You find out that np is less than five. What must you do to be able to perform a valid hypothesis test?

You are performing a hypothesis test of a single population proportion. The data come from which distribution?

binomial distribution

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than seven hours of sleep per night, on average. A survey of 22 LTCC Intermediate Algebra students generated a mean of 7.24 hours with a standard deviation of 1.93 hours. At a level of significance of 5%, do LTCC Intermediate Algebra students get less than seven hours of sleep per night, on average? The distribution to be used for this test is \(\overline{X}\) ~ ________________

  • \(N\left(7.24,\frac{1.93}{\sqrt{22}}\right)\)
  • \(N\left(7.24,1.93\right)\)
The Student's t -distribution has the following properties:

  • It is continuous and assumes any real values.
  • The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.
  • It approaches the standard normal distribution as n gets larger.
  • There is a “family” of t distributions: every representative of the family is completely defined by the number of degrees of freedom, which is one less than the number of data items.
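For the LTCC sleep question above, the test statistic itself can be computed directly (a sketch; the text does not show this step):

```python
from math import sqrt

# H0: mu = 7, Ha: mu < 7; the sample sd is given, so a t-test with 21 df.
mu0, xbar, s, n = 7, 7.24, 1.93, 22

t = (xbar - mu0) / (s / sqrt(n))
# The sample mean lies above 7, so t is positive and the lower-tail
# p-value exceeds 0.5: H0 cannot be rejected at the 5% level.
```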

Distribution Needed for Hypothesis Testing Copyright © 2013 by OpenStaxCollege is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.

6.1 The Standard Normal Distribution

The standard normal distribution is a normal distribution of standardized values called z -scores . A z -score is measured in units of the standard deviation. For example, if the mean of a normal distribution is five and the standard deviation is two, the value 11 is three standard deviations above (or to the right of) the mean. The calculation is as follows:

x = μ + ( z )( σ ) = 5 + (3)(2) = 11

The z -score is three.

The mean for the standard normal distribution is zero, and the standard deviation is one. The transformation \(z = \frac{x - \mu}{\sigma}\) produces the distribution Z ~ N (0, 1). The value x in the given equation comes from a normal distribution with mean μ and standard deviation σ .

If X is a normally distributed random variable and X ~ N(μ, σ) , then the z -score is: \(z = \frac{x - \mu}{\sigma}\)

The z -score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, μ . Values of x that are larger than the mean have positive z -scores, and values of x that are smaller than the mean have negative z -scores. If x equals the mean, then x has a z -score of zero.

Example 6.1

Suppose X ~ N(5, 6) . This says that X is a normally distributed random variable with mean μ = 5 and standard deviation σ = 6. Suppose x = 17. Then: \(z = \frac{x - \mu}{\sigma} = \frac{17 - 5}{6} = 2\)

This means that x = 17 is two standard deviations (2 σ ) above or to the right of the mean μ = 5.

Notice that: 5 + (2)(6) = 17 (The pattern is μ + zσ = x )

Now suppose x = 1. Then: \(z = \frac{x - \mu}{\sigma} = \frac{1 - 5}{6} = -0.67\) (rounded to two decimal places)

This means that x = 1 is 0.67 standard deviations (–0.67 σ ) below or to the left of the mean μ = 5. Notice that: 5 + (–0.67)(6) is approximately equal to one (This has the pattern μ + (–0.67)σ = 1)

Summarizing, when z is positive, x is above or to the right of μ and when z is negative, x is to the left of or below μ . Or, when z is positive, x is greater than μ , and when z is negative x is less than μ .

What is the z -score of x , when x = 1 and X ~ N (12,3)?
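The z-score formula is easy to check numerically; here is a sketch (the helper function name is ours, and `NormalDist.zscore` needs Python 3.9+):

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# The try-it above: x = 1 with X ~ N(12, 3).
z = z_score(1, 12, 3)

# The standard library offers the same computation directly.
same = NormalDist(12, 3).zscore(1)
```

Both give z ≈ –3.67: x = 1 lies about 3.67 standard deviations below the mean.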

Example 6.2

Some doctors believe that a person can lose five pounds, on the average, in a month by reducing their fat intake and by exercising consistently. Suppose weight loss has a normal distribution. Let X = the amount of weight lost (in pounds) by a person in a month. Use a standard deviation of two pounds. X ~ N (5, 2). Fill in the blanks.

a. Suppose a person lost ten pounds in a month. The z -score when x = 10 pounds is z = 2.5 (verify). This z -score tells you that x = 10 is ________ standard deviations to the ________ (right or left) of the mean _____ (What is the mean?).

a. This z -score tells you that x = 10 is 2.5 standard deviations to the right of the mean five .

b. Suppose a person gained three pounds (a negative weight loss). Then z = __________. This z -score tells you that x = –3 is ________ standard deviations to the __________ (right or left) of the mean.

b. z = –4 . This z -score tells you that x = –3 is four standard deviations to the left of the mean.

c. Suppose the random variables X and Y have the following normal distributions: X ~ N (5, 6) and Y ~ N (2, 1). If x = 17, then z = 2. (This was previously shown.) If y = 4, what is z ?

c. \(z = \frac{y - \mu}{\sigma} = \frac{4 - 2}{1} = 2\) where µ = 2 and σ = 1.

The z -score for y = 4 is z = 2. This means that four is z = 2 standard deviations to the right of the mean. Therefore, x = 17 and y = 4 are both two (of their own ) standard deviations to the right of their respective means.

The z -score allows us to compare data that are scaled differently. To understand the concept, suppose X ~ N (5, 6) represents weight gains for one group of people who are trying to gain weight in a six week period and Y ~ N (2, 1) measures the same weight gain for a second group of people. A negative weight gain would be a weight loss. Since x = 17 and y = 4 are each two standard deviations to the right of their means, they represent the same, standardized weight gain relative to their means .

Fill in the blanks.

Jerome averages 16 points a game with a standard deviation of four points. X ~ N (16,4). Suppose Jerome scores ten points in a game. The z –score when x = 10 is –1.5. This score tells you that x = 10 is _____ standard deviations to the ______(right or left) of the mean______(What is the mean?).

The Empirical Rule If X is a random variable and has a normal distribution with mean µ and standard deviation σ , then the Empirical Rule states the following:

  • About 68% of the x values lie between –1 σ and +1 σ of the mean µ (within one standard deviation of the mean).
  • About 95% of the x values lie between –2 σ and +2 σ of the mean µ (within two standard deviations of the mean).
  • About 99.7% of the x values lie between –3 σ and +3 σ of the mean µ (within three standard deviations of the mean). Notice that almost all the x values lie within three standard deviations of the mean.
  • The z -scores for +1 σ and –1 σ are +1 and –1, respectively.
  • The z -scores for +2 σ and –2 σ are +2 and –2, respectively.
  • The z -scores for +3 σ and –3 σ are +3 and –3 respectively.

The empirical rule is also known as the 68-95-99.7 rule.
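These percentages can be verified against the standard normal CDF (a sketch using Python's standard library):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

def within(k):
    """P(mean - k*sigma < X < mean + k*sigma) for a normal X."""
    return Z.cdf(k) - Z.cdf(-k)

one, two, three = within(1), within(2), within(3)
```

This gives about 0.6827, 0.9545 and 0.9973, matching the 68-95-99.7 rule.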

Example 6.3

The mean height of 15 to 18-year-old males from Chile from 2009 to 2010 was 170 cm with a standard deviation of 6.28 cm. Male heights are known to follow a normal distribution. Let X = the height of a 15 to 18-year-old male from Chile in 2009 to 2010. Then X ~ N (170, 6.28).

a. Suppose a 15 to 18-year-old male from Chile was 168 cm tall from 2009 to 2010. The z -score when x = 168 cm is z = _______. This z -score tells you that x = 168 is ________ standard deviations to the ________ (right or left) of the mean _____ (What is the mean?).

b. Suppose that the height of a 15 to 18-year-old male from Chile from 2009 to 2010 has a z -score of z = 1.27. What is the male’s height? The z -score ( z = 1.27) tells you that the male’s height is ________ standard deviations to the __________ (right or left) of the mean.

a. z = –0.32. This z -score tells you that x = 168 is 0.32 standard deviations to the left of the mean 170.

b. 177.98 cm. The z -score ( z = 1.27) tells you that the male's height is 1.27 standard deviations to the right of the mean.

Use the information in Example 6.3 to answer the following questions.

  • Suppose a 15 to 18-year-old male from Chile was 176 cm tall from 2009 to 2010. The z -score when x = 176 cm is z = _______. This z -score tells you that x = 176 cm is ________ standard deviations to the ________ (right or left) of the mean _____ (What is the mean?).
  • Suppose that the height of a 15 to 18-year-old male from Chile from 2009 to 2010 has a z -score of z = –2. What is the male’s height? The z -score ( z = –2) tells you that the male’s height is ________ standard deviations to the __________ (right or left) of the mean.

Example 6.4

From 1984 to 1985, the mean height of 15 to 18-year-old males from Chile was 172.36 cm, and the standard deviation was 6.34 cm. Let Y = the height of 15 to 18-year-old males from 1984 to 1985. Then Y ~ N (172.36, 6.34).

Find the z -scores for x = 160.58 cm and y = 162.85 cm. Interpret each z -score. What can you say about x = 160.58 cm and y = 162.85 cm as they compare to their respective means and standard deviations?

The z -score for x = 160.58 is z = –1.5. The z -score for y = 162.85 is z = –1.5. Both x = 160.58 and y = 162.85 deviate the same number of standard deviations from their respective means and in the same direction.

In 2012, 1,664,479 students took the SAT exam. The distribution of scores in the verbal section of the SAT had a mean µ = 496 and a standard deviation σ = 114. Let X = a SAT exam verbal section score in 2012. Then X ~ N (496, 114).

Find the z -scores for x 1 = 325 and x 2 = 366.21. Interpret each z -score. What can you say about x 1 = 325 and x 2 = 366.21 as they compare to their respective means and standard deviations?

Example 6.5

Suppose x has a normal distribution with mean 50 and standard deviation 6.

  • About 68% of the x values lie within one standard deviation of the mean. Therefore, about 68% of the x values lie between –1 σ = (–1)(6) = –6 and 1 σ = (1)(6) = 6 of the mean 50. The values 50 – 6 = 44 and 50 + 6 = 56 are within one standard deviation from the mean 50. The z -scores are –1 and +1 for 44 and 56, respectively.
  • About 95% of the x values lie within two standard deviations of the mean. Therefore, about 95% of the x values lie between –2 σ = (–2)(6) = –12 and 2 σ = (2)(6) = 12. The values 50 – 12 = 38 and 50 + 12 = 62 are within two standard deviations from the mean 50. The z -scores are –2 and +2 for 38 and 62, respectively.
  • About 99.7% of the x values lie within three standard deviations of the mean. Therefore, about 99.7% of the x values lie between –3 σ = (–3)(6) = –18 and 3 σ = (3)(6) = 18 from the mean 50. The values 50 – 18 = 32 and 50 + 18 = 68 are within three standard deviations of the mean 50. The z -scores are –3 and +3 for 32 and 68, respectively.

Suppose X has a normal distribution with mean 25 and standard deviation five. Between what values of x do 68% of the values lie?

Example 6.6

From 1984 to 1985, the mean height of 15 to 18-year-old males from Chile was 172.36 cm, and the standard deviation was 6.34 cm. Let Y = the height of 15 to 18-year-old males in 1984 to 1985. Then Y ~ N (172.36, 6.34).

  • About 68% of the y values lie between what two values? These values are ________________. The z -scores are ________________, respectively.
  • About 95% of the y values lie between what two values? These values are ________________. The z -scores are ________________ respectively.
  • About 99.7% of the y values lie between what two values? These values are ________________. The z -scores are ________________, respectively.
  • About 68% of the values lie between 166.02 cm and 178.7 cm. The z -scores are –1 and 1.
  • About 95% of the values lie between 159.68 cm and 185.04 cm. The z -scores are –2 and 2.
  • About 99.7% of the values lie between 153.34 cm and 191.38 cm. The z -scores are –3 and 3.
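The interval endpoints in the answer above are just µ ± kσ, which is quick to verify (a sketch):

```python
mu, sigma = 172.36, 6.34

# Endpoints for one, two and three standard deviations from the mean.
intervals = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}
```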

The scores on a college entrance exam have an approximate normal distribution with mean, µ = 52 points and a standard deviation, σ = 11 points.

  • About 95% of the y values lie between what two values? These values are ________________. The z -scores are ________________, respectively.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-statistics-2e/pages/6-1-the-standard-normal-distribution

© Dec 6, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

  • 8. Hypothesis Testing

2. Tests in the Normal Model

Basic Theory

The Normal Model

The normal distribution is perhaps the most important distribution in the study of mathematical statistics, in part because of the central limit theorem . As a consequence of this theorem, a measured quantity that is subject to numerous small, random errors will have, at least approximately, a normal distribution. Such variables are ubiquitous in statistical experiments, in subjects varying from the physical and biological sciences to the social sciences.

So in this section, we assume that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a random sample from the normal distribution with mean \(\mu\) and standard deviation \(\sigma\). Our goal in this section is to construct hypothesis tests for \(\mu\) and \(\sigma\); these are among the most important special cases of hypothesis testing. This section parallels the section on estimation in the normal model in the chapter on set estimation, and in particular, the duality between interval estimation and hypothesis testing will play an important role. But first we need to review some basic facts that will be critical for our analysis.

Recall that the sample mean \( M \) and sample variance \( S^2 \) are \[ M = \frac{1}{n} \sum_{i=1}^n X_i, \quad S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M)^2\]

From our study of point estimation, recall that \( M \) is an unbiased and consistent estimator of \( \mu \) while \( S^2 \) is an unbiased and consistent estimator of \( \sigma^2 \). From these basic statistics we can construct the test statistics that will be used to construct our hypothesis tests. The following results are special properties of samples from the normal distribution.

Define \[ Z = \frac{M - \mu}{\sigma \big/ \sqrt{n}}, \quad T = \frac{M - \mu}{S \big/ \sqrt{n}}, \quad V = \frac{n - 1}{\sigma^2} S^2 \]

  • \( Z \) has the standard normal distribution .
  • \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom.
  • \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom.
  • \( Z \) and \( V \) are independent.
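
These statistics are straightforward to compute directly. A minimal sketch in Python; the sample data and the values of \(\mu\) and \(\sigma\) are made up for illustration:

```python
import math

x = [4.9, 5.1, 5.3, 4.8, 5.2, 5.0, 5.4, 4.7]   # hypothetical sample
n = len(x)
mu0, sigma = 5.0, 0.25                          # conjectured mean, known sd

m = sum(x) / n                                  # sample mean M
s2 = sum((xi - m) ** 2 for xi in x) / (n - 1)   # sample variance S^2
s = math.sqrt(s2)

z = (m - mu0) / (sigma / math.sqrt(n))          # pivot Z (sigma known)
t = (m - mu0) / (s / math.sqrt(n))              # pivot T (sigma unknown)
v = (n - 1) * s2 / sigma ** 2                   # pivot V for the variance
```
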

It follows that each of these random variables is a pivot variable for \( (\mu, \sigma) \) since the distributions do not depend on the parameters, but the variables themselves functionally depend on one or both parameters. The pivot variables will lead to natural test statistics that can then be used to perform the hypothesis tests of the parameters. To construct our tests, we will need quantiles of these standard distributions. The quantiles can be computed using the quantile app or from most mathematical and statistical software packages. Here is the notation we will use:

Let \( p \in (0, 1) \) and \( k \in \N_+ \).

  • \( z(p) \) denotes the quantile of order \( p \) for the standard normal distribution.
  • \(t_k(p)\) denotes the quantile of order \( p \) for the student \( t \) distribution with \( k \) degrees of freedom.
  • \( \chi^2_k(p) \) denotes the quantile of order \( p \) for the chi-square distribution with \( k \) degrees of freedom

Since the standard normal and student \( t \) distributions are symmetric about 0, it follows that \( z(1 - p) = -z(p) \) and \( t_k(1 - p) = -t_k(p) \) for \( p \in (0, 1) \) and \( k \in \N_+ \). On the other hand, the chi-square distribution is not symmetric.

Tests for the Mean with Known Standard Deviation

For our first discussion, we assume that the distribution mean \( \mu \) is unknown but the standard deviation \( \sigma \) is known. This is not always an artificial assumption. There are often situations where \( \sigma \) is stable over time, and hence is at least approximately known, while \( \mu \) changes because of different treatments . Examples are given in the computational exercises below.

For a conjectured \( \mu_0 \in \R \), define the test statistic \[ Z = \frac{M - \mu_0}{\sigma \big/ \sqrt{n}} \]

  • If \( \mu = \mu_0 \) then \( Z \) has the standard normal distribution.
  • If \( \mu \ne \mu_0 \) then \( Z \) has the normal distribution with mean \( \frac{\mu - \mu_0}{\sigma / \sqrt{n}} \) and variance 1.

So in case (b), \( \frac{\mu - \mu_0}{\sigma / \sqrt{n}} \) can be viewed as a non-centrality parameter . The graph of the probability density function of \( Z \) is like that of the standard normal probability density function, but shifted to the right or left by the non-centrality parameter, depending on whether \( \mu \gt \mu_0 \) or \( \mu \lt \mu_0 \).

For \( \alpha \in (0, 1) \), each of the following tests has significance level \( \alpha \):

  • Reject \( H_0: \mu = \mu_0 \) versus \( H_1: \mu \ne \mu_0 \) if and only if \( Z \lt -z(1 - \alpha /2) \) or \( Z \gt z(1 - \alpha / 2) \) if and only if \( M \lt \mu_0 - z(1 - \alpha / 2) \frac{\sigma}{\sqrt{n}} \) or \( M \gt \mu_0 + z(1 - \alpha / 2) \frac{\sigma}{\sqrt{n}} \).
  • Reject \( H_0: \mu \le \mu_0 \) versus \( H_1: \mu \gt \mu_0 \) if and only if \( Z \gt z(1 - \alpha) \) if and only if \( M \gt \mu_0 + z(1 - \alpha) \frac{\sigma}{\sqrt{n}} \).
  • Reject \( H_0: \mu \ge \mu_0 \) versus \( H_1: \mu \lt \mu_0 \) if and only if \( Z \lt -z(1 - \alpha) \) if and only if \( M \lt \mu_0 - z(1 - \alpha) \frac{\sigma}{\sqrt{n}} \).
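
These decision rules can be sketched in code. This is a hypothetical illustration, not a library routine: the quantile \( z(p) \) is obtained by bisection on \( \Phi \), which is computed from the standard library error function:

```python
import math

def phi(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_quantile(p):
    """Quantile z(p) of the standard normal, by bisection on Phi."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def z_test(m, mu0, sigma, n, alpha, tail="two"):
    """Return (Z, reject H0?) for the two-sided and one-sided tests."""
    z = (m - mu0) / (sigma / math.sqrt(n))
    if tail == "two":                        # H1: mu != mu0
        return z, abs(z) > z_quantile(1 - alpha / 2)
    if tail == "right":                      # H1: mu > mu0
        return z, z > z_quantile(1 - alpha)
    return z, z < -z_quantile(1 - alpha)     # H1: mu < mu0
```

For instance, for the machined-part exercise later in this section, `z_test(10.1, 10, 0.3, 100, 0.1)` gives \( Z \approx 3.33 \) and rejects \( H_0 \).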

In part (a), \( H_0 \) is a simple hypothesis, and under \( H_0 \), \( Z \) has the standard normal distribution. So \( \alpha \) is the probability of falsely rejecting \( H_0 \), by definition of the quantiles. In parts (b) and (c), \( Z \) has a non-central normal distribution under \( H_0 \), as discussed above. So if \( H_0 \) is true, the maximum type 1 error probability is \( \alpha \) and occurs when \( \mu = \mu_0 \). The decision rules in terms of \( M \) are equivalent to the corresponding ones in terms of \( Z \) by simple algebra.

Part (a) is the standard two-sided test, while (b) is the right-tailed test and (c) is the left-tailed test. Note that in each case, the hypothesis test is the dual of the corresponding interval estimate constructed in the section on estimation in the normal model .

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\mu_0\) is in the corresponding \(1 - \alpha\) confidence interval, that is

  • \( M - z(1 - \alpha / 2) \frac{\sigma}{\sqrt{n}} \le \mu_0 \le M + z(1 - \alpha / 2) \frac{\sigma}{\sqrt{n}} \)
  • \( \mu_0 \le M + z(1 - \alpha) \frac{\sigma}{\sqrt{n}}\)
  • \( \mu_0 \ge M - z(1 - \alpha) \frac{\sigma}{\sqrt{n}}\)

This follows from the previous result. In each case, we start with the inequality that corresponds to not rejecting \( H_0 \) and solve for \( \mu_0 \).

The two-sided test in (a) corresponds to \( \alpha / 2 \) in each tail of the distribution of the test statistic \( Z \), under \( H_0 \). This test is said to be unbiased. But of course we can construct other, biased tests by partitioning the significance level \( \alpha \) between the left and right tails in a non-symmetric way.

For every \(\alpha, \, p \in (0, 1)\), the following test has significance level \(\alpha\): Reject \(H_0: \mu = \mu_0\) versus \(H_1: \mu \ne \mu_0\) if and only if \(Z \lt z(\alpha - p \alpha)\) or \(Z \ge z(1 - p \alpha)\).

  • \( p = \frac{1}{2} \) gives the symmetric, unbiased test.
  • \( p \downarrow 0 \) gives the left-tailed test.
  • \( p \uparrow 1 \) gives the right-tailed test.

As before, \( H_0 \) is a simple hypothesis, and if \( H_0 \) is true, \( Z \) has the standard normal distribution. So the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. Parts (a)–(c) follow from properties of the standard normal quantile function.

The \(P\)-values of these tests can be computed in terms of the standard normal distribution function \(\Phi\).

The \(P\)-values of the standard tests above are, respectively,

  • \( 2 \left[1 - \Phi\left(\left|Z\right|\right)\right]\)
  • \( 1 - \Phi(Z) \)
  • \( \Phi(Z) \)
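
The three \(P\)-values can be sketched directly in terms of \(\Phi\), using the error function from the standard library:

```python
import math

def phi(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(z, tail="two"):
    """P-value of the z test: two-sided, right-tailed, or left-tailed."""
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z)))
    if tail == "right":
        return 1.0 - phi(z)
    return phi(z)
```

For example, \( Z = 3.33 \) gives a two-sided \(P\)-value of about 0.001.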

Recall that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. Our next series of results will explore the power functions of the tests above.

The power function of the general two-sided test above is given by \[ Q(\mu) = \Phi \left( z(\alpha - p \alpha) - \frac{\sqrt{n}}{\sigma} (\mu - \mu_0) \right) + \Phi \left( \frac{\sqrt{n}}{\sigma} (\mu - \mu_0) - z(1 - p \alpha) \right), \quad \mu \in \R \]

  • \(Q\) is decreasing on \((-\infty, m_0)\) and increasing on \((m_0, \infty)\) where \(m_0 = \mu_0 + \left[z(\alpha - p \alpha) + z(1 - p \alpha)\right] \frac{\sigma}{2 \sqrt{n}}\).
  • \(Q(\mu_0) = \alpha\).
  • \(Q(\mu) \to 1\) as \(\mu \uparrow \infty\) and \(Q(\mu) \to 1\) as \(\mu \downarrow -\infty\).
  • If \(p = \frac{1}{2}\) then \(Q\) is symmetric about \(\mu_0\) (and \( m_0 = \mu_0 \)).
  • As \(p\) increases, \(Q(\mu)\) increases if \(\mu \gt \mu_0\) and decreases if \(\mu \lt \mu_0\).
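
The power function \(Q\) can be sketched numerically. In this hypothetical illustration, \( z(p) \) is computed by bisection on \( \Phi \), and the parameter values in the test are illustrative:

```python
import math

def phi(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_quantile(p):
    """Quantile z(p) of the standard normal, by bisection on Phi."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def power_two_sided(mu, mu0, sigma, n, alpha, p=0.5):
    """Q(mu) for the general two-sided z test with tail split p."""
    d = math.sqrt(n) / sigma * (mu - mu0)     # non-centrality shift
    return (phi(z_quantile(alpha - p * alpha) - d)
            + phi(d - z_quantile(1 - p * alpha)))
```

Note that at \( \mu = \mu_0 \) the shift \(d\) vanishes and the function returns \( \alpha \), as the properties above require.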

So by varying \( p \), we can make the test more powerful for some values of \( \mu \), but only at the expense of making the test less powerful for other values of \( \mu \).

The power function of the right-tailed test above is given by \[ Q(\mu) = \Phi \left( z(\alpha) + \frac{\sqrt{n}}{\sigma}(\mu - \mu_0) \right), \quad \mu \in \R \]

  • \(Q\) is increasing on \(\R\).
  • \(Q(\mu) \to 1\) as \(\mu \uparrow \infty\) and \(Q(\mu) \to 0\) as \(\mu \downarrow -\infty\).

The power function of the left-tailed test above is given by \[ Q(\mu) = \Phi \left( z(\alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0) \right), \quad \mu \in \R \]

  • \(Q\) is decreasing on \(\R\).
  • \(Q(\mu) \to 0\) as \(\mu \uparrow \infty\) and \(Q(\mu) \to 1\) as \(\mu \downarrow -\infty\).

For any of the three tests above, increasing the sample size \(n\) or decreasing the standard deviation \(\sigma\) results in a uniformly more powerful test.

In the mean test experiment, select the normal test statistic and select the normal sampling distribution with standard deviation \(\sigma = 2\), sample size \(n = 20\), and \(\mu_0 = 0\). Run the experiment 1000 times for several values of the true distribution mean \(\mu\). For each value of \(\mu\), note the distribution of the \(P\)-value.

In the mean estimate experiment , select the normal pivot variable and select the normal distribution with \(\mu = 0\) and standard deviation \(\sigma = 2\), confidence level \(1 - \alpha = 0.90\), and sample size \(n = 10\). For each of the three types of confidence intervals, run the experiment 20 times. State the corresponding hypotheses and significance level, and for each run, give the set of \(\mu_0\) for which the null hypothesis would be rejected.

In many cases, the first step is to design the experiment so that the significance level is \(\alpha\) and so that the test has a given power \(\beta\) for a given alternative \(\mu_1\).

For either of the one-sided tests in , the sample size \(n\) needed for a test with significance level \(\alpha\) and power \(\beta\) for the alternative \(\mu_1\) is \[ n = \left( \frac{\sigma \left[z(\beta) - z(\alpha)\right]}{\mu_1 - \mu_0} \right)^2 \]

This follows from setting the power function equal to \(\beta\) and solving for \(n\).

For the unbiased, two-sided test, the sample size \(n\) needed for a test with significance level \(\alpha\) and power \(\beta\) for the alternative \(\mu_1\) is approximately \[ n = \left( \frac{\sigma \left[z(\beta) - z(\alpha / 2)\right]}{\mu_1 - \mu_0} \right)^2 \]

In the power function for the two-sided test given above, we can neglect the first term if \(\mu_1 \lt \mu_0\) and neglect the second term if \(\mu_1 \gt \mu_0\).
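
The two sample-size formulas can be sketched together. As before, this is a hypothetical illustration with the quantile computed by bisection on \( \Phi \):

```python
import math

def phi(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_quantile(p):
    """Quantile z(p) of the standard normal, by bisection on Phi."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def sample_size(mu0, mu1, sigma, alpha, beta, two_sided=False):
    """Smallest n giving significance alpha and power beta at mu1."""
    a = alpha / 2 if two_sided else alpha    # two-sided uses z(alpha/2)
    n = (sigma * (z_quantile(beta) - z_quantile(a)) / (mu1 - mu0)) ** 2
    return math.ceil(n)
```

For instance, with \(\sigma = 0.3\), \(\alpha = 0.1\), power 0.8, \(\mu_0 = 10\), and \(\mu_1 = 10.05\), the two-sided formula gives \(n = 223\), matching the machined-part exercise later in this section.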

Tests of the Mean with Unknown Standard Deviation

For our next discussion, we construct tests of \(\mu\) without requiring the assumption that \(\sigma\) is known. And in applications of course, \( \sigma \) is usually unknown.

For a conjectured \( \mu_0 \in \R \), define the test statistic \[ T = \frac{M - \mu_0}{S \big/ \sqrt{n}} \]

  • If \( \mu = \mu_0 \), the statistic \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom.
  • If \( \mu \ne \mu_0 \) then \( T \) has a non-central \( t \) distribution with \( n - 1 \) degrees of freedom and non-centrality parameter \( \frac{\mu - \mu_0}{\sigma / \sqrt{n}} \).

In case (b), the graph of the probability density function of \( T \) is much (but not exactly) the same as that of the ordinary \( t \) distribution with \( n - 1 \) degrees of freedom, but shifted to the right or left by the non-centrality parameter, depending on whether \( \mu \gt \mu_0 \) or \( \mu \lt \mu_0 \).

  • Reject \( H_0: \mu = \mu_0 \) versus \( H_1: \mu \ne \mu_0 \) if and only if \( T \lt -t_{n-1}(1 - \alpha /2) \) or \( T \gt t_{n-1}(1 - \alpha / 2) \) if and only if \( M \lt \mu_0 - t_{n-1}(1 - \alpha / 2) \frac{S}{\sqrt{n}} \) or \( M \gt \mu_0 + t_{n-1}(1 - \alpha / 2) \frac{S}{\sqrt{n}} \).
  • Reject \( H_0: \mu \le \mu_0 \) versus \( H_1: \mu \gt \mu_0 \) if and only if \( T \gt t_{n-1}(1 - \alpha) \) if and only if \( M \gt \mu_0 + t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}} \).
  • Reject \( H_0: \mu \ge \mu_0 \) versus \( H_1: \mu \lt \mu_0 \) if and only if \( T \lt -t_{n-1}(1 - \alpha) \) if and only if \( M \lt \mu_0 - t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}} \).
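
The \( t \) statistic itself is simple to compute; the quantile \( t_{n-1}(p) \) requires the incomplete beta function, so the sketch below takes the critical value from a \( t \) table. The numbers are those of the potato-chip exercise later in this section:

```python
import math

def t_statistic(m, mu0, s, n):
    """Test statistic T = (M - mu0) / (S / sqrt(n))."""
    return (m - mu0) / (s / math.sqrt(n))

# Potato-chip exercise: n = 75 bags, M = 248, S = 5, mu0 = 250,
# testing H0: mu >= 250 versus H1: mu < 250 (left-tailed).
t = t_statistic(248, 250, 5, 75)
crit = -1.665          # t_74(0.05), taken from a t table
reject = t < crit
```
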

In part (a), \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom under \( H_0 \). So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( T \) has a non-central \( t \) distribution with \( n - 1 \) degrees of freedom under \( H_0 \), as above. Hence if \( H_0 \) is true, the maximum type 1 error probability \( \alpha \) occurs when \( \mu = \mu_0 \). The decision rules in terms of \( M \) are equivalent to the corresponding ones in terms of \( T \) by simple algebra.

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\mu_0\) is in the corresponding \(1 - \alpha\) confidence interval.

  • \( M - t_{n-1}(1 - \alpha / 2) \frac{S}{\sqrt{n}} \le \mu_0 \le M + t_{n-1}(1 - \alpha / 2) \frac{S}{\sqrt{n}} \)
  • \( \mu_0 \le M + t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}}\)
  • \( \mu_0 \ge M - t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}}\)

This follows from the previous result. In each case, we start with the inequality that corresponds to not rejecting \( H_0 \) and then solve for \( \mu_0 \).

The two-sided test in (a) corresponds to \( \alpha / 2 \) in each tail of the distribution of the test statistic \( T \), under \( H_0 \). This test is said to be unbiased. But of course we can construct other, biased tests by partitioning the significance level \( \alpha \) between the left and right tails in a non-symmetric way.

For every \(\alpha, \, p \in (0, 1)\), the following test has significance level \(\alpha\): Reject \(H_0: \mu = \mu_0\) versus \(H_1: \mu \ne \mu_0\) if and only if \(T \lt t_{n-1}(\alpha - p \alpha)\) or \(T \ge t_{n-1}(1 - p \alpha)\) if and only if \( M \lt \mu_0 + t_{n-1}(\alpha - p \alpha) \frac{S}{\sqrt{n}} \) or \( M \gt \mu_0 + t_{n-1}(1 - p \alpha) \frac{S}{\sqrt{n}} \).

  • \( p = \frac{1}{2} \) gives the symmetric, unbiased test.
  • \( p \downarrow 0 \) gives the left-tailed test.
  • \( p \uparrow 1 \) gives the right-tailed test.

Once again, \( H_0 \) is a simple hypothesis, and under \( H_0 \) the test statistic \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom. So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. Parts (a)–(c) follow from properties of the quantile function.

The \(P\)-values of these tests can be computed in terms of the distribution function \(\Phi_{n-1}\) of the \(t\)-distribution with \(n - 1\) degrees of freedom. They are, respectively,

  • \( 2 \left[1 - \Phi_{n-1}\left(\left|T\right|\right)\right]\)
  • \( 1 - \Phi_{n-1}(T) \)
  • \( \Phi_{n-1}(T) \)

In the mean test experiment, select the student test statistic and select the normal sampling distribution with standard deviation \(\sigma = 2\), sample size \(n = 20\), and \(\mu_0 = 1\). Run the experiment 1000 times for several values of the true distribution mean \(\mu\). For each value of \(\mu\), note the empirical distribution of the \(P\)-value.

In the mean estimate experiment , select the student pivot variable and select the normal sampling distribution with mean 0 and standard deviation 2. Select confidence level 0.90 and sample size 10. For each of the three types of intervals, run the experiment 20 times. State the corresponding hypotheses and significance level, and for each run, give the set of \(\mu_0\) for which the null hypothesis would be rejected.

The power functions for the \( t \) tests above can be computed explicitly in terms of the non-central \(t\) distribution function. Qualitatively, the graphs of the power functions are similar to those for the case when \(\sigma\) is known, given above for the two-sided, right-tailed, and left-tailed tests.

If an upper bound \(\sigma_0\) on the standard deviation \(\sigma\) is known, then conservative estimates on the sample size needed for a given confidence level and a given margin of error can be obtained using the methods above for the normal pivot variable, in both the two-sided and one-sided cases.

Tests of the Standard Deviation

For our next discussion, we will construct hypothesis tests for the distribution standard deviation \( \sigma \). So our assumption is that \( \sigma \) is unknown, and of course almost always, \( \mu \) would be unknown as well.

For a conjectured value \( \sigma_0 \in (0, \infty)\), define the test statistic \[ V = \frac{n - 1}{\sigma_0^2} S^2 \]

  • If \( \sigma = \sigma_0 \), then \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom.
  • If \( \sigma \ne \sigma_0 \) then \( V \) has the gamma distribution with shape parameter \( (n - 1) / 2 \) and scale parameter \( 2 \sigma^2 \big/ \sigma_0^2 \).

Recall that the ordinary chi-square distribution with \( n - 1 \) degrees of freedom is the gamma distribution with shape parameter \( (n - 1) / 2 \) and scale parameter 2. So in case (b), the ordinary chi-square distribution is scaled by \( \sigma^2 \big/ \sigma_0^2 \). In particular, the scale factor is greater than 1 if \( \sigma \gt \sigma_0 \) and less than 1 if \( \sigma \lt \sigma_0 \).

For every \(\alpha \in (0, 1)\), the following test has significance level \(\alpha\):

  • Reject \(H_0: \sigma = \sigma_0\) versus \(H_1: \sigma \ne \sigma_0\) if and only if \(V \lt \chi_{n-1}^2(\alpha / 2)\) or \(V \gt \chi_{n-1}^2(1 - \alpha / 2)\) if and only if \( S^2 \lt \chi_{n-1}^2(\alpha / 2) \frac{\sigma_0^2}{n - 1} \) or \( S^2 \gt \chi_{n-1}^2(1 - \alpha / 2) \frac{\sigma_0^2}{n - 1} \)
  • Reject \(H_0: \sigma \ge \sigma_0\) versus \(H_1: \sigma \lt \sigma_0\) if and only if \(V \lt \chi_{n-1}^2(\alpha)\) if and only if \( S^2 \lt \chi_{n-1}^2(\alpha) \frac{\sigma_0^2}{n - 1} \)
  • Reject \(H_0: \sigma \le \sigma_0\) versus \(H_1: \sigma \gt \sigma_0\) if and only if \(V \gt \chi_{n-1}^2(1 - \alpha)\) if and only if \( S^2 \gt \chi_{n-1}^2(1 - \alpha) \frac{\sigma_0^2}{n - 1} \)
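
The statistic \( V \) is easy to compute; as with the \( t \) test, the sketch below takes the chi-square critical value from a table. The numbers are those of the telemarketing exercise later in this section:

```python
def v_statistic(s2, sigma0, n):
    """Test statistic V = (n - 1) S^2 / sigma0^2."""
    return (n - 1) * s2 / sigma0 ** 2

# Telemarketing exercise: n = 50 calls, S = 25, sigma_0 = 20,
# testing H0: sigma <= 20 versus H1: sigma > 20 (right-tailed).
v = v_statistic(25 ** 2, 20, 50)
crit = 62.038          # chi^2_49(0.90), taken from a chi-square table
reject = v > crit
```
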

The logic is largely the same as with our other hypothesis tests. In part (a), \( H_0 \) is a simple hypothesis, and under \( H_0 \), the test statistic \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom. So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( V \) has the more general gamma distribution under \( H_0 \), as discussed above. If \( H_0 \) is true, the maximum type 1 error probability is \( \alpha \) and occurs when \( \sigma = \sigma_0 \).

Part (a) is the unbiased, two-sided test that corresponds to \( \alpha / 2 \) in each tail of the chi-square distribution of the test statistic \( V \), under \( H_0 \). Part (b) is the left-tailed test and part (c) is the right-tailed test. Once again, we have a duality between the hypothesis tests and the interval estimates constructed in the section on estimation in the normal model .

For each of the tests above, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\sigma_0^2\) is in the corresponding \(1 - \alpha\) confidence interval. That is

  • \( \frac{n - 1}{\chi_{n-1}^2(1 - \alpha / 2)} S^2 \le \sigma_0^2 \le \frac{n - 1}{\chi_{n-1}^2(\alpha / 2)} S^2 \)
  • \( \sigma_0^2 \le \frac{n - 1}{\chi_{n-1}^2(\alpha)} S^2 \)
  • \( \sigma_0^2 \ge \frac{n - 1}{\chi_{n-1}^2(1 - \alpha)} S^2 \)

This follows from the previous result. In each case, we start with the inequality that corresponds to not rejecting \( H_0 \) and then solve for \( \sigma_0^2 \).

As before, we can construct more general two-sided tests by partitioning the significance level \( \alpha \) between the left and right tails of the chi-square distribution in an arbitrary way.

For every \(\alpha, \, p \in (0, 1)\), the following test has significance level \(\alpha\): Reject \(H_0: \sigma = \sigma_0\) versus \(H_1: \sigma \ne \sigma_0\) if and only if \(V \le \chi_{n-1}^2(\alpha - p \alpha)\) or \(V \ge \chi_{n-1}^2(1 - p \alpha)\) if and only if \( S^2 \lt \chi_{n-1}^2(\alpha - p \alpha) \frac{\sigma_0^2}{n - 1} \) or \( S^2 \gt \chi_{n-1}^2(1 - p \alpha) \frac{\sigma_0^2}{n - 1} \).

  • \( p = \frac{1}{2} \) gives the equal-tail test.
  • \( p \downarrow 0 \) gives the left-tail test.
  • \( p \uparrow 1 \) gives the right-tail test.

As before, \( H_0 \) is a simple hypothesis, and under \( H_0 \) the test statistic \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom. So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. Parts (a)–(c) follow from properties of the quantile function.

Recall again that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. The power functions of the tests for \( \sigma \) can be expressed in terms of the distribution function \( G_{n-1} \) of the chi-square distribution with \( n - 1 \) degrees of freedom.

The power function of the general two-sided test above is given by the following formula, and satisfies the given properties: \[ Q(\sigma) = 1 - G_{n-1} \left( \frac{\sigma_0^2}{\sigma^2} \chi_{n-1}^2(1 - p \, \alpha) \right) + G_{n-1} \left(\frac{\sigma_0^2}{\sigma^2} \chi_{n-1}^2(\alpha - p \, \alpha) \right)\]

  • \(Q\) is decreasing on \((0, \sigma_0)\) and increasing on \((\sigma_0, \infty)\).
  • \(Q(\sigma_0) = \alpha\).
  • \(Q(\sigma) \to 1\) as \(\sigma \uparrow \infty\) and \(Q(\sigma) \to 1\) as \(\sigma \downarrow 0\).

The power function of the right-tailed test above is given by the following formula, and satisfies the given properties: \[ Q(\sigma) = 1 - G_{n-1} \left( \frac{\sigma_0^2}{\sigma^2} \chi_{n-1}^2(1 - \alpha) \right) \]

  • \(Q\) is increasing on \((0, \infty)\).
  • \(Q(\sigma) \to 1\) as \(\sigma \uparrow \infty\) and \(Q(\sigma) \to 0\) as \(\sigma \downarrow 0\).

The power function of the left-tailed test above is given by the following formula, and satisfies the given properties: \[ Q(\sigma) = G_{n-1} \left( \frac{\sigma_0^2}{\sigma^2} \chi_{n-1}^2(\alpha) \right) \]

  • \(Q\) is decreasing on \((0, \infty)\).
  • \(Q(\sigma_0) =\alpha\).
  • \(Q(\sigma) \to 0\) as \(\sigma \uparrow \infty\) and \(Q(\sigma) \to 1\) as \(\sigma \downarrow 0\).

In the variance test experiment , select the normal distribution with mean 0, and select significance level 0.1, sample size 10, and test standard deviation 1.0. For various values of the true standard deviation, run the simulation 1000 times. Record the relative frequency of rejecting the null hypothesis and plot the empirical power curve.

  • Two-sided test
  • Left-tailed test
  • Right-tailed test

In the variance estimate experiment , select the normal distribution with mean 0 and standard deviation 2, and select confidence level 0.90 and sample size 10. Run the experiment 20 times. State the corresponding hypotheses and significance level, and for each run, give the set of test standard deviations for which the null hypothesis would be rejected.

  • Two-sided confidence interval
  • Confidence lower bound
  • Confidence upper bound

The primary assumption that we made is that the underlying sampling distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the sampling distribution, let alone whether or not it is normal. Suppose in fact that the underlying distribution is not normal. When the sample size \(n\) is relatively large, the distribution of the sample mean will still be approximately normal by the central limit theorem, and thus our tests of the mean \(\mu\) should still be approximately valid. On the other hand, tests of the variance \(\sigma^2\) are less robust to deviations from the assumption of normality. The following exercises explore these ideas.

In the mean test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various sample sizes and values of \(\mu_0\), run the experiment 1000 times. For each configuration, note the empirical distribution of the \(P\)-value.

In the mean test experiment, select the uniform distribution on \([0, 4]\). For the three different tests and for various sample sizes and values of \(\mu_0\), run the experiment 1000 times. For each configuration, note the empirical distribution of the \(P\)-value.

How large \(n\) needs to be for the testing procedure to work well depends, of course, on the underlying distribution; the more this distribution deviates from normality, the larger \(n\) must be. Fortunately, convergence to normality in the central limit theorem is rapid and hence, as you observed in the exercises, we can get away with relatively small sample sizes (30 or more) in most cases.

In the variance test experiment , select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of \(\sigma_0\), run the experiment 1000 times. For each configuration, note the relative frequency of rejecting \(H_0\). When \(H_0\) is true, compare the relative frequency with the significance level.

In the variance test experiment, select the uniform distribution on \([0, 4]\). For the three different tests and for various significance levels, sample sizes, and values of \(\sigma_0\), run the experiment 1000 times. For each configuration, note the relative frequency of rejecting \(H_0\). When \(H_0\) is true, compare the relative frequency with the significance level.

Computational Exercises

The length of a certain machined part is supposed to be 10 centimeters. In fact, due to imperfections in the manufacturing process, the actual length is a random variable. The standard deviation is due to inherent factors in the process, which remain fairly stable over time. From historical data, the standard deviation is known with a high degree of accuracy to be 0.3. The mean, on the other hand, may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. We are interested in testing \(H_0: \mu = 10\) versus \(H_1: \mu \ne 10\).

  • Suppose that a sample of 100 parts has mean 10.1. Perform the test at the 0.1 level of significance.
  • Compute the \(P\)-value for the data in (a).
  • Compute the power of the test in (a) at \(\mu = 10.05\).
  • Compute the approximate sample size needed for significance level 0.1 and power 0.8 when \(\mu = 10.05\).
  • Test statistic 3.33, critical values \(\pm 1.645\). Reject \(H_0\).
  • \(P = 0.0010\)
  • The power of the test at 10.05 is approximately 0.509.
  • Sample size 223

A bag of potato chips of a certain brand has an advertised weight of 250 grams. Actually, the weight (in grams) is a random variable. Suppose that a sample of 75 bags has mean 248 and standard deviation 5. At the 0.05 significance level, perform the following tests:

  • \(H_0: \mu \ge 250\) versus \(H_1: \mu \lt 250\)
  • \(H_0: \sigma \ge 7\) versus \(H_1: \sigma \lt 7\)
  • Test statistic \(-3.464\), critical value \(-1.665\). Reject \(H_0\).
  • \(P \lt 0.0001\) so reject \(H_0\).

At a telemarketing firm, the length of a telephone solicitation (in seconds) is a random variable. A sample of 50 calls has mean 310 and standard deviation 25. At the 0.1 level of significance, can we conclude that

  • \(\mu \gt 300\)?
  • \(\sigma \gt 20\)?
  • Test statistic 2.828, critical value 1.2988. Reject \(H_0\).
  • \(P = 0.0071\) so reject \(H_0\).

At a certain farm the weight of a peach (in ounces) at harvest time is a random variable. A sample of 100 peaches has mean 8.2 and standard deviation 1.0. At the 0.01 level of significance, can we conclude that

  • \(\mu \gt 8\)?
  • \(\sigma \lt 1.5\)?
  • Test statistic 2.0, critical value 2.363. Fail to reject \(H_0\).
  • Test statistic 44.0, critical value 69.23. Reject \(H_0\).

The hourly wage for a certain type of construction work is a random variable with standard deviation 1.25. For a sample of 25 workers, the mean wage was $6.75. At the 0.01 level of significance, can we conclude that \(\mu \lt 7.00\)?

Test statistic \(-1\), critical value \(-2.326\). Fail to reject \(H_0\).

Data Analysis Exercises

Using Michelson's data , test to see if the velocity of light is greater than 730 (+299000) km/sec, at the 0.005 significance level.

Test statistic 15.49, critical value 2.6270. Reject \(H_0\).

Using Cavendish's data , test to see if the density of the earth is less than 5.5 times the density of water, at the 0.05 significance level.

Test statistic \(-1.269\), critical value \(-1.7017\). Fail to reject \(H_0\).

Using Short's data , test to see if the parallax of the sun differs from 9 seconds of a degree, at the 0.1 significance level.

Test statistic \(-3.730\), critical value \(\pm 1.6749\). Reject \(H_0\).

Using Fisher's iris data , perform the following tests, at the 0.1 level:

  • The mean petal length of Setosa irises differs from 15 mm.
  • The mean petal length of Virginica irises is greater than 52 mm.
  • The mean petal length of Versicolor irises is less than 44 mm.
  • Test statistic \(-1.563\), critical values \(\pm 1.672\). Fail to reject \(H_0\).
  • Test statistic 4.556, critical value 1.2988. Reject \(H_0\).
  • Test statistic \(-1.028\), critical value \(-1.2988\). Fail to reject \(H_0\).

Module 9: Hypothesis Testing With One Sample

Distribution Needed for Hypothesis Testing

Learning Outcomes

  • Conduct and interpret hypothesis tests for a single population mean, population standard deviation known
  • Conduct and interpret hypothesis tests for a single population mean, population standard deviation unknown

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. We perform tests of a population mean using a normal distribution or a Student’s t-distribution. (Remember, use a Student’s t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually when the sample size n is large).

If you are testing a  single population mean , the distribution for the test is for means :

[latex]\displaystyle\overline{{X}}\text{~}{N}{\left(\mu_{{X}}\text{ , }\frac{{\sigma_{{X}}}}{\sqrt{{n}}}\right)}{\quad\text{or}\quad}{t}_{{{d}{f}}}[/latex]

The population parameter is [latex]\mu[/latex]. The estimated value (point estimate) for [latex]\mu[/latex] is [latex]\displaystyle\overline{{x}}[/latex], the sample mean.

If you are testing a  single population proportion , the distribution for the test is for proportions or percentages:

[latex]\displaystyle{P}^{\prime}\text{~}{N}{\left({p}\text{ , }\sqrt{{\frac{{{p}{q}}}{{n}}}}\right)}[/latex]

The population parameter is [latex]p[/latex]. The estimated value (point estimate) for [latex]p[/latex] is p′ . [latex]\displaystyle{p}\prime=\frac{{x}}{{n}}[/latex] where [latex]x[/latex] is the number of successes and [latex]n[/latex] is the sample size.

Assumptions

When you perform a  hypothesis test of a single population mean μ using a Student’s t -distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed . You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed).

When you perform a  hypothesis test of a single population mean μ using a normal distribution (often called a z -test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.

When you perform a  hypothesis test of a single population proportion p , you take a simple random sample from the population. You must meet the conditions for a binomial distribution which are as follows: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p . The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np  and nq must both be greater than five ( np > 5 and nq > 5). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with μ = p and [latex]\displaystyle\sigma=\sqrt{{\frac{{{p}{q}}}{{n}}}}[/latex] . Remember that q = 1 – p .
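The np > 5 and nq > 5 conditions are easy to check programmatically; here is a small sketch (the function name is ours):

```python
def normal_approx_ok(n, p):
    """True when np > 5 and nq > 5 (q = 1 - p), so the binomial
    distribution of a sample proportion can be approximated by
    a normal distribution."""
    q = 1 - p
    return n * p > 5 and n * q > 5

print(normal_approx_ok(100, 0.04))  # False: np = 4 fails the condition
print(normal_approx_ok(100, 0.30))  # True: np = 30 and nq = 70
```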

Concept Review

In order for a hypothesis test’s results to be generalized to a population, certain requirements must be satisfied.

When testing for a single population mean:

  • A Student’s t -test should be used if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with an unknown standard deviation.
  • The normal test will work if the data come from a simple, random sample and the population is approximately normally distributed, or the sample size is large, with a known standard deviation.

When testing a single population proportion, use a normal test if the data come from a simple, random sample, the requirements for a binomial distribution are met, and the mean numbers of successes and failures satisfy the conditions np > 5 and nq > 5, where n is the sample size, p is the probability of a success, and q is the probability of a failure.

Formula Review

If there is no given preconceived  α , then use α = 0.05.

Types of Hypothesis Tests

  • Single population mean, known population variance (or standard deviation): Normal test .
  • Single population mean, unknown population variance (or standard deviation): Student’s t -test .
  • Single population proportion: Normal test .
  • For a single population mean , we may use a normal distribution with the following mean and standard deviation. Means: [latex]\displaystyle\mu=\mu_{{\overline{{x}}}}{\quad\text{and}\quad}\sigma_{{\overline{{x}}}}=\frac{{\sigma_{{x}}}}{\sqrt{{n}}}[/latex]
  • For a single population proportion , we may use a normal distribution with the following mean and standard deviation. Proportions: [latex]\displaystyle\mu={p}{\quad\text{and}\quad}\sigma=\sqrt{{\frac{{{p}{q}}}{{n}}}}[/latex].
  • Distribution Needed for Hypothesis Testing. Provided by : OpenStax. Located at : . License : CC BY: Attribution
  • Introductory Statistics . Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]

Introduction to the Normal Distribution (Bell Curve)

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Learn about our Editorial Process

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

On This Page:


Properties of normal distribution

The normal distribution is a continuous probability distribution that is symmetrical on both sides of the mean, so the right side of the center is a mirror image of the left side.

The area under the normal distribution curve represents the probability and the total area under the curve sums to one.

Most of the continuous data values in a normal distribution tend to cluster around the mean, and the further a value is from the mean, the less likely it is to occur. The tails are asymptotic, which means that they approach but never quite meet the horizon (i.e., the x-axis).

For a perfectly normal distribution, the mean, median, and mode will be the same value, visually represented by the peak of the curve.

Features of a Normal Distribution (Bell Curve)

The normal distribution is often called the bell curve because the graph of its probability density looks like a bell. It is also known as the Gaussian distribution, after the German mathematician Carl Friedrich Gauss, who first described it.

Normal distribution Vs. Standard normal distribution?

A normal distribution is determined by two parameters: the mean and the variance. A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard normal distribution.


   Figure 1. A standard normal distribution (SND).

This is the distribution that is used to construct tables of the normal distribution .

Why is the normal distribution important?

The bell-shaped curve is a common feature of nature and psychology.

The normal distribution is the most important probability distribution in statistics because many continuous variables in nature and psychology display this bell-shaped curve when compiled and graphed.

For example, if we randomly sampled 100 individuals, we would expect to see a normal distribution frequency curve for many continuous variables, such as IQ, height, weight, and blood pressure.

Parametric significance tests require a normal distribution of the sample’s data points

The most powerful (parametric) statistical tests psychologists use require data to be normally distributed. If the data do not resemble a bell curve, researchers may use less powerful tests known as non-parametric statistics.

Converting the raw scores of a normal distribution to z-scores

We can standardize a normal distribution’s values (raw scores) by converting them into z-scores .

This procedure allows researchers to determine the proportion of the values that fall within a specified number of standard deviations from the mean (i.e., calculate the empirical rule).

What is the empirical rule formula?

The empirical rule in statistics allows researchers to determine the proportion of values that fall within certain distances from the mean. The empirical rule is often referred to as the three-sigma rule or the 68-95-99.7 rule.

The Empirical Rule (68-95-99.7)

If the data values in a normal distribution are converted to standard scores (z-scores) in a standard normal distribution, the empirical rule describes the percentage of the data that falls within specific numbers of standard deviations (σ) from the mean (μ) for bell-shaped curves.

The empirical rule allows researchers to calculate the probability of randomly obtaining a score from a normal distribution.

68% of data falls within the first standard deviation from the mean. This means there is a 68% probability of randomly selecting a score between -1 and +1 standard deviations from the mean.

The Empirical Rule 68%

95% of the values fall within two standard deviations from the mean. This means there is a 95% probability of randomly selecting a score between -2 and +2 standard deviations from the mean.

The Empirical Rule 95%

99.7% of data will fall within three standard deviations from the mean. This means there is a 99.7% probability of randomly selecting a score between -3 and +3 standard deviations from the mean.

The Empirical Rule 99.7%
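The three percentages can be recovered from the standard normal CDF, since P(−k < Z < +k) = erf(k/√2). A short sketch (the function name is ours):

```python
import math

def prob_within(k):
    """Probability that a normal value lies within k standard
    deviations of the mean: P(-k < Z < +k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 3))  # 0.683, 0.954, 0.997
```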

How to check data

Statistical software (such as SPSS) can be used to check if your dataset is normally distributed by calculating the three measures of central tendency. If the mean, median, and mode are very similar values, there is a good chance that the data follow a bell-shaped distribution.
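That mean-versus-median check can also be done outside SPSS. Here is a minimal sketch in Python with simulated (made-up) IQ-like scores:

```python
import random
import statistics

random.seed(0)
data = [random.gauss(100, 15) for _ in range(1000)]  # simulated scores

mean = statistics.mean(data)
median = statistics.median(data)
# For roughly bell-shaped data, the mean and median nearly coincide.
print(round(mean, 1), round(median, 1))
```

If the two values diverge noticeably, the distribution is likely skewed and the histogram is worth inspecting.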

It is also advisable to use a frequency graph too, so you can check the visual shape of your data (If your chart is a histogram, you can add a distribution curve using SPSS: From the menus, choose: Elements > Show Distribution Curve).

Example of a Normal Distribution Curve Overlaid on a Histogram

Normal distributions become more apparent (i.e., perfect) the finer the level of measurement and the larger the sample from a population.

You can also calculate coefficients (such as skewness and kurtosis) that describe the size of the distribution's tails in relation to the bump in the middle of the bell curve, or run formal normality tests: for example, the Kolmogorov-Smirnov and Shapiro-Wilk tests can be calculated using SPSS.

These tests compare your data to a normal distribution and provide a p-value, which, if significant (p < .05), indicates your data is different from a normal distribution (thus, on this occasion, we do not want a significant result and need a p -value higher than 0.05).

Test of Normality SPSS Output

Further Information

  • Deep Definition of the Normal Distribution (Khan Academy)
  • Standard Normal Distribution and the Empirical Rule (Khan Academy)
  • Statistics for Psychology Book Download

Is a normal distribution kurtosis 0 or 3?

A normal distribution has a kurtosis of 3. However, sometimes people use “excess kurtosis,” which subtracts 3 from the kurtosis of the distribution to compare it to a normal distribution.

In that case, the excess kurtosis of a normal distribution would be 3 − 3 = 0.

So, the normal distribution has kurtosis of 3, but its excess kurtosis is 0.


Z-Test for Statistical Hypothesis Testing Explained


The Z-test is a statistical hypothesis test used when the test statistic we are measuring, like the mean , follows a normal distribution .

There are multiple types of Z-tests; however, we'll focus on the simplest and best-known one, the one-sample mean test. This is used to determine whether the difference between the mean of a sample and the mean of a population is statistically significant.

What Is a Z-Test?

A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution.  

The name Z-test comes from the Z-score of the normal distribution: a measure of how many standard deviations a raw score or sample statistic is from the population's mean.

Z-tests are the most common statistical tests conducted in fields such as healthcare and data science . Therefore, it’s an essential concept to understand.

Requirements for a Z-Test

In order to conduct a Z-test, your statistics need to meet a few requirements, including:

  • A sample size that's greater than 30. This is because we want to ensure our sample mean comes from a distribution that is approximately normal. By the central limit theorem , the distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the underlying population; more than 30 data points is a common rule of thumb.
  • The standard deviation and mean of the population is known .
  • The sample data is collected/acquired randomly .

More on Data Science:   What Is Bootstrapping Statistics?

Z-Test Steps

There are four steps to complete a Z-test. Let’s examine each one.

4 Steps to a Z-Test

  • State the null hypothesis.
  • State the alternate hypothesis.
  • Choose your critical value.
  • Calculate your Z-test statistics. 

1. State the Null Hypothesis

The first step in a Z-test is to state the null hypothesis, H_0 . This is what you believe to be true of the population, which could be the mean of the population, μ_0 :

H_0: μ = μ_0

2. State the Alternate Hypothesis

Next, state the alternate hypothesis, H_1 . This is what you observe from your sample. If the sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

H_1: μ ≠ μ_0

3. Choose Your Critical Value

Then, choose your significance level, α , which determines whether you reject or fail to reject the null hypothesis. Typically for a Z-test we would use a statistical significance of 5 percent, which for a two-tailed test corresponds to critical values of z = ±1.96 standard deviations from the population's mean in the normal distribution:

Z-test critical value plot.

This critical value is based on confidence intervals.

4. Calculate Your Z-Test Statistic

Compute the Z-test statistic using the sample mean, μ_1 , the population mean, μ_0 , the number of data points in the sample, n , and the population's standard deviation, σ :

Z = (μ_1 − μ_0) / (σ / √n)

If the test statistic is greater (or lower, depending on the test we are conducting) than the critical value, we reject the null hypothesis: the sample's mean is significantly different from the population mean.

Another way to think about this is if the sample mean is so far away from the population mean, the alternate hypothesis has to be true or the sample is a complete anomaly.
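The four steps above can be collected into a single function. This is a sketch, not a library implementation; the hard-coded critical values are the usual one-tailed normal quantiles (5 percent → 1.645, 1 percent → 2.326), and the numbers in the usage line come from the school IQ example discussed below:

```python
import math

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05, tail="right"):
    """One-sample mean Z-test: returns the Z statistic and whether
    H0 is rejected at the given one-tailed significance level."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    crit = {0.05: 1.645, 0.01: 2.326}[alpha]  # one-tailed critical values
    rejected = z > crit if tail == "right" else z < -crit
    return z, rejected

# Sample mean 110 over n = 50 students; population mean 100, sigma 20.
z, rejected = one_sample_z_test(110, 100, 20, 50)
print(round(z, 2), rejected)  # 3.54 True
```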

More on Data Science: Basic Probability Theory and Statistics Terms to Know

Z-Test Example

Let’s go through an example to fully understand the one-sample mean Z-test.

A school says that its pupils are, on average, smarter than other schools. It takes a sample of 50 students whose average IQ measures to be 110. The population, or the rest of the schools, has an average IQ of 100 and standard deviation of 20. Is the school’s claim correct?

The null and alternate hypotheses are:

H_0: μ = 100    H_1: μ > 100

Where we are saying that our sample, the school, has a higher mean IQ than the population mean.

Now, this is what’s called a right-sided, one-tailed test as our sample mean is greater than the population’s mean. So, choosing a critical value of 5 percent, which equals a Z-score of 1.96 , we can only reject the null hypothesis if our Z-test statistic is greater than 1.96.

If the school claimed its students’ IQs were an average of 90, then we would use a left-tailed test, as shown in the figure above. We would then only reject the null hypothesis if our Z-test statistic is less than -1.96.

Computing our Z-test statistic, we see:

Z = (110 − 100) / (20 / √50) ≈ 3.54

Since 3.54 exceeds the critical value, we have sufficient evidence to reject the null hypothesis, and the data support the school's claim.

Hope you enjoyed this article on Z-tests. In this post, we only addressed the simplest case, the one-sample mean test. There are other types of Z-tests, but they all follow the same process, just with some small nuances.

Built In’s expert contributor network publishes thoughtful, solutions-oriented stories written by innovative tech professionals. It is the tech industry’s definitive destination for sharing compelling, first-person accounts of problem-solving on the road to innovation.

Great Companies Need Great People. That's Where We Come In.

  • Search Search Please fill out this field.

What Is a Normal Distribution?

Observations.

  • Uses in Finance

The Bottom Line

  • Fundamental Analysis

Normal Distribution: What It Is, Uses, and Formula

James Chen, CMT is an expert trader, investment adviser, and global market strategist.


Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. The normal distribution appears as a " bell curve " when graphed.

Key Takeaways

  • The normal distribution is the proper term for a probability bell curve.
  • In a standard normal distribution, the mean is zero and the standard deviation is 1. The normal distribution has zero skew and a kurtosis of 3.
  • Normal distributions are symmetrical, but not all symmetrical distributions are normal.


Properties of Normal Distribution

The normal distribution is the most common type of distribution assumed in technical stock market analysis. The normal distribution has two parameters : the mean and the standard deviation. In a normal distribution, mean (average), median (midpoint), and mode (most frequent observation) are equal. These values represent the peak or highest point. The distribution then falls symmetrically around the mean, the width of which is defined by the standard deviation .

The normal distribution model is key to the Central Limit Theorem (CLT) which states that averages calculated from independent, identically distributed random variables have approximately normal distributions, regardless of the type of distribution from which the variables are sampled.

The normal distribution is one type of symmetrical distribution . Symmetrical distributions occur when a dividing line produces two mirror images. Not all symmetrical distributions are normal since some data could appear as two humps or a series of hills in addition to the bell curve that indicates a normal distribution.

The Empirical Rule

For all normal distributions, 68.2% of the observations will appear within plus or minus one standard deviation of the mean; 95.4% will fall within +/- two standard deviations; and 99.7% within +/- three standard deviations.

This fact is sometimes called the "empirical rule," a heuristic that describes where most of the data in a normal distribution will appear. Data falling outside three standard deviations ("3-sigma") would signify rare occurrences.


Skewness measures the degree of symmetry of a distribution. The normal distribution is symmetric and has a skewness of zero. If the distribution of a data set instead has a skewness less than zero, or negative skewness (left-skewness), then the left tail of the distribution is longer than the right tail; positive skewness (right-skewness) implies that the right tail of the distribution is longer than the left.

Kurtosis measures the thickness of the tail ends of a distribution relative to the tails of the normal distribution. The normal distribution has a kurtosis equal to 3.0. Distributions with kurtosis greater than 3.0 exhibit tail data exceeding the tails of the normal distribution (e.g., five or more standard deviations from the mean).

This excess kurtosis is known in statistics as leptokurtic , but is more colloquially known as "fat tails." The occurrence of fat tails in financial markets describes what is known as tail risk . Distributions with kurtosis less than 3.0 ( platykurtic ) exhibit tails that are generally less extreme ("skinnier") than the tails of the normal distribution.

The normal distribution follows the formula f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)). Note that only the values of the mean (μ) and standard deviation (σ) are necessary:

  • x  = value of the variable or data being examined and f(x) the probability function
  • μ = the mean
  • σ = the standard deviation
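As a quick sketch, the density formula translates directly into code (the function name is ours):

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)**2 / (2*sigma**2))"""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Peak of the standard normal curve, 1/sqrt(2*pi):
print(round(normal_pdf(0, 0, 1), 4))  # 0.3989
```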

How Normal Distribution Is Used in Finance

The assumption of a normal distribution is applied to asset prices and price action . Traders may plot price points to fit recent price action into a normal distribution. The further price action moves from the mean, in this case, the greater the likelihood that an asset is being over or undervalued. Traders can use the standard deviations to suggest potential trades. This type of trading is generally done on very short time frames as larger timescales make it much harder to pick entry and exit points.

Similarly, many statistical theories attempt to model asset prices and assume they follow a normal distribution. In reality, price distributions tend to have fat tails and, therefore, have kurtosis greater than three. Such assets have had price movements greater than three standard deviations beyond the mean more often than expected under the assumption of a normal distribution. Even if an asset has gone through a long period where it fits a normal distribution, there is no guarantee that the past performance truly informs the future.

Example of a Normal Distribution

Many naturally occurring phenomena appear to be normally distributed. For example, the average height of a human is roughly 175 cm (5' 9"), counting both males and females.

As the chart below shows, most people conform to that average. Taller and shorter people exist with decreasing frequency in the population. According to the empirical rule, 99.7% of all people will fall with +/- three standard deviations of the mean, or between 154 cm (5' 0") and 196 cm (6' 5"). Those taller and shorter than this would be rare (just 0.15% of the population each).
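A quick arithmetic check of that range (σ = 7 cm is implied by the figures above, since 175 + 3σ = 196):

```python
mu, sigma = 175, 7  # cm; sigma = 7 is implied by the 154-196 cm range
low, high = mu - 3 * sigma, mu + 3 * sigma
print(low, high)  # 154 196
```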

What Is Meant By the Normal Distribution?

The normal distribution describes a symmetrical plot of data around its mean value, where the width of the curve is defined by the standard deviation. It is visually depicted as the "bell curve."

Why Is the Normal Distribution Called "Normal?"

The normal distribution is technically known as the Gaussian distribution, however, it took on the terminology "normal" following scientific publications in the 19th century showing that many natural phenomena appeared to "deviate normally" from the mean. This idea of "normal variability" was made popular as the "normal curve" by the naturalist Sir Francis Galton in his 1889 work, Natural Inheritance.

What Are the Limitations of the Normal Distribution in Finance?

Although normal distribution is a statistical concept, its applications in finance can be limited because financial phenomena—such as expected stock-market returns—do not fall neatly within a normal distribution. Prices tend to follow more of a log-normal distribution , right-skewed and with fatter tails. Therefore, relying too heavily on a bell curve when making predictions can lead to unreliable results. Although most analysts are well aware of this limitation, it is relatively difficult to overcome this shortcoming because it is often unclear which statistical distribution to use as an alternative.

Normal distribution, also known as the Gaussian distribution, is a probability distribution that appears as a "bell curve" when graphed. The normal distribution describes a symmetrical plot of data around its mean value, where the width of the curve is defined by the standard deviation.

Boston University. " The Central Limit Theorem ."

DePaul University. " NORMAL Distribution: Origin of the name ."


Normality test

One of the most common assumptions for statistical tests is that the data used are normally distributed. For example, if you want to run a t-test or an ANOVA , you must first test whether the data or variables are normally distributed.

The assumption of normal distribution is also important for linear regression analysis , but in this case it is important that the error made by the model is normally distributed, not the data itself.

Nonparametric tests

If the data are not normally distributed, the above procedures cannot be used and non-parametric tests must be used. Non-parametric tests do not assume that the data are normally distributed.

How is the normal distribution tested?

Normal distribution can be tested either analytically (statistical tests) or graphically. The most common analytical tests to check data for normal distribution are the:

  • Kolmogorov-Smirnov Test
  • Shapiro-Wilk Test
  • Anderson-Darling Test

For graphical verification, either a histogram or, better, the Q-Q plot is used. Q-Q stands for quantile-quantile plot, where the actually observed distribution is compared with the theoretically expected distribution.

Statistical tests for normal distribution

To test your data analytically for normal distribution, there are several test procedures, the best known being the Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the Anderson Darling test.

Analytically test data for normal distribution

In all of these tests, the null hypothesis is that the frequency distribution of your data is normally distributed. To reject or not reject the null hypothesis, all these tests give you a p-value . What matters is whether this p-value is less than or greater than 0.05.

Null hypothesis Test for normality

If the p-value is less than 0.05, this is interpreted as a significant deviation from the normal distribution and it can be assumed that the data are not normally distributed. If the p-value is greater than 0.05 and you want to be statistically clean, you cannot necessarily say that the frequency distribution is normal, you just cannot reject the null hypothesis.

In practice, a normal distribution is assumed for values greater than 0.05, although this is not entirely correct. Nevertheless, the graphical solution should always be considered.

Note: The Kolmogorov-Smirnov test and the Anderson-Darling test can also be used to test distributions other than the normal distribution.

Disadvantage of the analytical tests for normal distribution

Unfortunately, the analytical method has a major drawback, which is why more and more attention is being paid to graphical methods.

The problem is that the calculated p-value is affected by the size of the sample. If you have a very small sample, your p-value may be much larger than 0.05, but if you have a very large sample from the same population, your p-value may be smaller than 0.05.

Disadvantage of the analytical tests for normal distribution

If we assume that the distribution in the population deviates only slightly from the normal distribution, we will get a very large p-value with a very small sample and therefore assume that the data are normally distributed. However, if you take a larger sample, the p-value gets smaller and smaller, even though the samples are from the same population with the same distribution. With a very large sample, you can even get a p-value of less than 0.05, rejecting the null hypothesis of normal distribution.

To avoid this problem, graphical methods are increasingly being used.

Graphical test for normal distribution

If the normal distribution is tested graphically, one looks either at the histogram or even better the QQ plot.

If you want to check the normal distribution using a histogram, plot the normal distribution on the histogram of your data and check that the distribution curve of the data approximately matches the normal distribution curve.

Testing normality with histogram

A better way to do this is to use a quantile-quantile plot, or Q-Q plot for short. This compares the theoretical quantiles that the data should have if they were perfectly normal with the quantiles of the measured values.

Testing normality with QQ-Plot

If the data were perfectly normally distributed, all points would lie on the line. The further the data deviates from the line, the less normally distributed the data is.
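The Q-Q points themselves are simple to construct: sort the data and pair each observed value with the standard normal quantile at the matching plotting position. A sketch follows; the bisection-based inverse CDF and the (i + 0.5)/n plotting positions are our own choices, not something DATAtab prescribes:

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_ppf(p):
    """Inverse of the standard normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def qq_points(data):
    """(theoretical quantile, observed quantile) pairs; normally
    distributed data falls close to a straight line."""
    xs = sorted(data)
    n = len(xs)
    return [(norm_ppf((i + 0.5) / n), x) for i, x in enumerate(xs)]

random.seed(1)
sample = [random.gauss(0, 1) for _ in range(200)]
pts = qq_points(sample)  # scatter these pairs to draw the Q-Q plot
```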

In addition, DATAtab plots the 95% confidence interval. If all or almost all of the data fall within this interval, this is a very strong indication that the data are normally distributed. They are not normally distributed if, for example, they form an arc and are far from the line in some areas.

Test Normal distribution in DATAtab

When you test your data for normal distribution with DATAtab, you get the following evaluation, first the analytical test procedures clearly arranged in a table, then the graphical test procedures.

Test Normal distribution in DATAtab

If you want to test your data for normal distribution, simply copy your data into the table on DATAtab, click on descriptive statistics and then select the variable you want to test for normal distribution. Then, just click on Test Normal Distribution and you will get the results.

Furthermore, if you are calculating a hypothesis test with DATAtab, you can test the assumptions for each hypothesis test, if one of the assumptions is the normal distribution, then you will get the test for normal distribution in the same way.



Statistics LibreTexts

9.3: A Single Population Mean using the Normal Distribution


All hypothesis tests have the same basic steps:

  • The alternative hypothesis, \(H_{a}\), never has a symbol that contains an equal sign.
  • The alternative hypothesis, \(H_{a}\), tells you if the test is left, right, or two-tailed. It is the key to conducting the appropriate test.
  • In a hypothesis test problem, you may see words such as "the level of significance is 1%." The "1%" is the preconceived or preset \(\alpha\). The statistician setting up the hypothesis test selects the value of α to use before collecting the sample data. If no level of significance is given, a common standard to use is \(\alpha = 0.05\).
  • When you calculate the \(p\)-value and draw the picture, the \(p\)-value is the area in the left tail, the right tail, or split evenly between the two tails. For this reason, we call the hypothesis test left, right, or two tailed.
  • Never, ever, Accept the Null Hypothesis.
  • Thinking about the meaning of the \(p\)-value: A data analyst (and anyone else) should have more confidence that he made the correct decision to reject the null hypothesis with a smaller \(p\)-value (for example, 0.001 as opposed to 0.04) even if using the 0.05 level for alpha. Similarly, for a large p -value such as 0.4, as opposed to a \(p\)-value of 0.056 (\(\alpha = 0.05\) is less than either number), a data analyst should have more confidence that she made the correct decision in not rejecting the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.
  • Determine the conclusion : What does the decision mean in terms of the problem given?
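The decision rule in the notes above reduces to a single comparison. A minimal sketch (the function name is ours; the p-values come from the exercises in this section):

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value to alpha. Note: we never 'accept' H0,
    we only fail to reject it."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.001))   # reject H0
print(decide(0.0935))  # fail to reject H0
```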

Direction of Tail

Example \(\PageIndex{1}\)

\(H_{0}: \mu \geq 5, H_{a}: \mu < 5\)

Test of a single population mean. \(H_{a}\) tells you the test is left-tailed. The picture of the \(p\)-value is as follows:

Normal distribution curve of a single population mean with a value of 5 on the x-axis and the p-value points to the area on the left tail of the curve.

Exercise \(\PageIndex{1}\)

\(H_{0}: \mu \geq 10, H_{a}: \mu < 10\)

Assume the \(p\)-value is 0.0935. What type of test is this? Draw the picture of the \(p\)-value.

left-tailed test


Example \(\PageIndex{2}\)

\(H_{0}: \mu \leq 0.2, H_{a}: \mu > 0.2\)

This is a test of a single population mean. \(H_{a}\) tells you the test is right-tailed. The picture of the \(p\)-value is as follows:

Normal distribution curve of a single population mean with the value of 0.2 on the x-axis. The p-value points to the area on the right tail of the curve.

Exercise \(\PageIndex{2}\)

\(H_{0}: \mu \leq 1, H_{a}: \mu > 1\)

Assume the \(p\)-value is 0.1243. What type of test is this? Draw the picture of the \(p\)-value.

right-tailed test


Example \(\PageIndex{3}\)

\(H_{0}: \mu = 50, H_{a}: \mu \neq 50\)

This is a test of a single population mean. \(H_{a}\) tells you the test is two-tailed . The picture of the \(p\)-value is as follows.

Normal distribution curve of a single population mean with a value of 50 on the x-axis. The p-value formulas, 1/2(p-value), for a two-tailed test is shown for the areas on the left and right tails of the curve.

Exercise \(\PageIndex{3}\)

\(H_{0}: \mu = 0.5, H_{a}: \mu \neq 0.5\)

Assume the \(p\)-value is 0.2564. What type of test is this? Draw the picture of the \(p\)-value.

two-tailed test


Full Hypothesis Test Examples

Example \(\PageIndex{4}\)

Jeffrey, as an eight-year old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for 15 25-yard freestyle swims. For the 15 swims, Jeffrey's mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 seconds. Conduct a hypothesis test using a preset α = 0.05. Assume that the swim times for the 25-yard freestyle are normal.

\(P\)-value Solution

Determine the hypothesis :

Since the problem is about a mean, this is a test of a single population mean.

For Jeffrey to swim faster, his time will be less than 16.43 seconds. So the claim will be that he can swim it in less time than 16.43 seconds.

\(H_{0}: \mu \geq 16.43\)

\(H_{a}: \mu < 16.43\) (claim)

The "\(<\)" in the alternative hypothesis tells you this is left-tailed.

Calculate the evidence :

Use the Standard Normal Distribution since the population standard deviation is given.

Calculate the test statistic using the same formula as a \(z\)-score using the Central Limit Theorem.

\[z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\nonumber\]

\(\mu = 16.43\) comes from \(H_{0}\) and not the data. \(\sigma = 0.8\) and \(n = 15\). Which gives

\[z=\frac{16-16.43}{\frac{0.8}{\sqrt{15}}}=\frac{-0.43}{\frac{0.8}{3.87298}}=\frac{-0.43}{0.20656}=-2.0817\nonumber\]

Now calculate the p-value based on the test statistic found.

This is a left-tailed test, so use the Excel formula \(=\text{NORM.S.DIST}(z,\text{true})\).

In this case, we found \(z\), which is the test statistic, to be \(z=-2.0817\).

Use the Excel formula \(=\text{NORM.S.DIST}(-2.0817,\text{true})=0.0187\).

So the \(p\text{-value} = 0.0187\). This is the area to the left of the sample mean, which is given as 16.
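The text carries out this calculation in Excel; the same numbers can be reproduced with Python's standard library. A minimal sketch, using the values from the worked example above:

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu, sigma, n = 16, 16.43, 0.8, 15  # sample mean, H0 mean, population SD, sample size

z = (x_bar - mu) / (sigma / sqrt(n))      # test statistic, ≈ -2.0817
p = NormalDist().cdf(z)                   # left-tailed: area to the left of z, ≈ 0.0187

print(round(z, 4), round(p, 4))
```

`NormalDist().cdf` plays the same role here as Excel's `NORM.S.DIST(z, TRUE)`.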

Make a decision:

Interpretation of the \(p-\text{value}\): If \(H_{0}\) is true, there is a 0.0187 probability (1.87%) that Jeffrey's mean time to swim the 25-yard freestyle is 16 seconds or less. Because a 1.87% chance is small, the mean time of 16 seconds or less is unlikely to have happened randomly. It is a rare event.

Normal distribution curve for the average time to swim the 25-yard freestyle with values 16, as the sample mean, and 16.43 on the x-axis. A vertical upward line extends from 16 on the x-axis to the curve. An arrow points to the left tail of the curve.

\(\mu = 16.43\) comes from \(H_{0}\); it is the value assumed true while the test is carried out.

\(\alpha\) is the smallest tail area we are willing to treat as significant: it is the threshold against which the \(p\text{-value}\) is compared.

Compare \(\alpha\) and the \(p\text{-value}\)

  • If \(p\text{-value}\) is less than the \(\alpha\) then we will Reject \(H_0\).
  • If \(\alpha\) is less than the \(p\text{-value}\) then we will Fail to Reject \(H_0\).

\(\alpha = 0.05\) and \(p\text{-value} = 0.0187\), so \(p\text{-value}<\alpha\)

Since \(p\text{-value}<\alpha\), reject \(H_{0}\).

Conclusion:

This means that you reject \(\mu \geq 16.43\).

There is sufficient evidence to support the claim that Jeffrey's mean swim time for the 25-yard freestyle is less than 16.43 seconds.

Critical Value Solution

Determine the hypothesis (Same as the \(P\)-value solution) :

Calculate the critical value. Use the Standard Normal Distribution, Critical Value, Left-tail Excel formula: \(=\text{NORM.S.INV}(\alpha)\).

In this problem, the \(\alpha=0.05\), so use \(=\text{NORM.S.INV}(0.05)=-1.64485\)

Graph the critical value and the test statistic along the number line of the Standard Normal Distribution graph.

Distribution curve comparing the α to the p-value. Values of -2.0817 and -1.645 are on the x-axis. Vertical upward lines extend from both of these values to the curve. The p-value is equal to 0.0187 and points to the area to the left of -2.0817. α is equal to 0.05 and points to the area to the left of -1.645.

Since this is left-tailed, everything less than the critical value, \(\text{CV}=-1.64485\) will be the rejection region.

Since the test statistic, \(z=-2.0817\), is less than the critical value, \(\text{CV}=-1.64485\), the decision will be to Reject the Null Hypothesis.
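The critical-value comparison can also be checked without Excel. A sketch in Python, with the standard library's inverse normal CDF standing in for `NORM.S.INV`:

```python
from statistics import NormalDist

alpha = 0.05
cv = NormalDist().inv_cdf(alpha)  # left-tailed critical value, ≈ -1.6449
z = -2.0817                       # test statistic from the worked example

# Left-tailed test: reject H0 when the test statistic falls below the critical value.
print(z < cv)   # True -> reject the null hypothesis
```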

Conclusion (Same as the \(P\)-value solution):

The Type I and Type II errors for this problem are as follows :

The Type I error is to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually swims the 25-yard freestyle, on average, in 16.43 seconds. (Reject the null hypothesis when the null hypothesis is true.)

The Type II error is that there is not evidence to conclude that Jeffrey swims the 25-yard free-style, on average, in less than 16.43 seconds when, in fact, he actually does swim the 25-yard free-style, on average, in less than 16.43 seconds. (Do not reject the null hypothesis when the null hypothesis is false.)

Exercise \(\PageIndex{4}\)

The mean throwing distance of a football for a Marco, a high school freshman quarterback, is 40 yards, with a standard deviation of two yards. The team coach tells Marco to adjust his grip to get more distance. The coach records the distances for 20 throws. For the 20 throws, Marco’s mean distance was 45 yards. The coach thought the different grip helped Marco throw farther than 40 yards. Conduct a hypothesis test using a preset \(\alpha = 0.01\). Assume the throw distances for footballs are normal. Use the critical value method.

For Marco to throw farther, his distance will be greater than 40 yards. So the claim will be that he can throw farther than 40 yards.

\(H_{0}: \mu \leq 40\)

\(H_{a}: \mu > 40\) (claim)

The "\(>\)" in the alternative hypothesis tells you this is right-tailed.

Calculate the critical value. Use the Standard Normal Distribution, Critical Value, Right-tail Excel formula: \(=\text{NORM.S.INV}(1-\alpha)\).

In this problem, the \(\alpha=0.01\), so use \(=\text{NORM.S.INV}(1-0.01)=2.3263\)

\(\mu = 40\) comes from \(H_{0}\) and not the data. \(\sigma = 2\) and \(n = 20\). Which gives

\[z=\frac{45-40}{\frac{2}{\sqrt{20}}}=\frac{5}{\frac{2}{4.4721}}=\frac{5}{0.4472}=11.1803\nonumber\]

Since this is right-tailed, everything greater than the critical value, \(\text{CV}=2.3263\) will be the rejection region.

Since the test statistic, \(z=11.1803\) is greater than the critical value, \(\text{CV}=2.3263\), the decision will be to Reject the Null Hypothesis.

This means that you reject \(\mu \leq 40\).

There is sufficient evidence to support the claim that the change in Marco's grip improved his throwing distance, giving a mean throw distance greater than 40 yards.
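The critical-value method for this right-tailed exercise can likewise be reproduced with Python's standard library (an illustrative sketch, mirroring the Excel steps above):

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu, sigma, n, alpha = 45, 40, 2, 20, 0.01

z = (x_bar - mu) / (sigma / sqrt(n))   # test statistic, ≈ 11.1803
cv = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value, ≈ 2.3263

print(z > cv)   # True -> reject the null hypothesis
```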

Example \(\PageIndex{5}\)

A college football coach thought that his players could bench press a mean weight of 275 pounds. It is known that the standard deviation is 55 pounds. Three of his players thought that the mean weight was greater than that amount. They asked 30 of their teammates for their estimated maximum lift on the bench press exercise. The data ranged from 205 pounds to 385 pounds. The actual weights are given below

Conduct a \(p\)-value hypothesis test using a 2.5% level of significance to determine if the bench press mean is more than 275 pounds.

Since the problem is about a mean weight, this is a test of a single population mean.

\(H_{0}: \mu \leq 275\)

\(H_{a}: \mu > 275\) (claim)

The "\(>\)" in the alternative hypothesis tells you this is a right-tailed test.

Calculate the test statistic using the same formula as the \(z\)-score using the Central Limit Theorem.

\(\mu = 275\) comes from \(H_{0}\) and not the data. \(\sigma=55\) and \(n=30\). The problem does not give the sample mean, so that will need to be calculated using the data.

Enter the data into Excel, and use the Excel formula \(=\text{AVERAGE}()\) to find \(\bar{x}=286.2\).

\[z=\frac{286.2-275}{\frac{55}{\sqrt{30}}}=\frac{11.2}{\frac{55}{5.4772}}=\frac{11.2}{10.0416}=1.11536\nonumber\]

Now calculate the \(p\)-value based on the test statistic found.

This is a right-tailed test, so use the Excel formula \(=1-\text{NORM.S.DIST}(z,\text{true})\).

In this case, we found \(z\), which is the test statistic, to be \(z=1.11536\).

Use the Excel formula \(=1-\text{NORM.S.DIST}(1.11536,\text{true})=0.132348\).

So the \(p\text{-value} = 0.132348\).

Interpretation of the \(p\)-value: If \(H_{0}\) is true, then there is a 0.1323 probability (13.23%) that the football players can lift a mean weight of 286.2 pounds or more. Because a 13.23% chance is large enough, a mean weight lift of 286.2 pounds or more is not a rare event.

Normal distribution curve of the average weight lifted by football players with values of 275 and 286.2 on the x-axis. A vertical upward line extends from 286.2 to the curve. The p-value points to the area to the right of 286.2.

Make a decision :

\(\alpha = 0.025\) and \(p\)-value \(= 0.1323\)

Since \(\alpha < p\text{-value}\), do not reject \(H_{0}\).

Conclusion: At the 2.5% level of significance, from the sample data, there is not sufficient evidence to conclude that the true mean weight lifted is more than 275 pounds.
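The same right-tailed \(p\)-value test can be sketched in Python, starting from the sample mean already computed in the example (the raw data themselves are not reproduced here):

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu, sigma, n, alpha = 286.2, 275, 55, 30, 0.025  # x_bar from the sample data

z = (x_bar - mu) / (sigma / sqrt(n))   # test statistic, ≈ 1.1154
p = 1 - NormalDist().cdf(z)            # right-tailed p-value, ≈ 0.1323

print(p < alpha)   # False -> do not reject the null hypothesis
```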

The hypothesis test itself has an established process. This can be summarized as follows:

  • Determine \(H_{0}\) and \(H_{a}\). Remember, they are contradictory.
  • Find the evidence: Draw a graph, calculate the test statistic, and use the test statistic to calculate the \(p\text{-value}\). (A \(z\)-score and a \(t\)-score are examples of test statistics.)
  • Compare the preconceived \(\alpha\) with the \(p\text{-value}\), make a decision (reject or do not reject \(H_{0}\)).
  • Write a clear conclusion using English sentences.

Notice that in performing the hypothesis test, you use \(\alpha\) and not \(\beta\). \(\beta\) is needed to help determine the sample size of the data that is used in calculating the \(p\text{-value}\). Remember that the quantity \(1 - \beta\) is called the Power of the Test. A high power is desirable. If the power is too low, the null hypothesis might not be rejected when it should be, so statisticians typically increase the sample size while keeping \(\alpha\) the same.
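To make the power idea concrete, here is a sketch for the swim-time setup, assuming a hypothetical true mean of 16.2 seconds (this alternative value is purely illustrative, not from the data):

```python
from math import sqrt
from statistics import NormalDist

# Power of the left-tailed swim-time test under a hypothetical true mean.
mu0, sigma, n, alpha = 16.43, 0.8, 15, 0.05
mu_true = 16.2                    # illustrative alternative, not from the data

se = sigma / sqrt(n)
x_crit = mu0 + NormalDist().inv_cdf(alpha) * se  # reject H0 when x_bar falls below this
power = NormalDist(mu_true, se).cdf(x_crit)      # P(reject H0 | mu = mu_true), ≈ 0.30

print(round(power, 2))
```

A power of roughly 0.30 under this alternative is low; increasing \(n\) shrinks the standard error and raises the power while \(\alpha\) stays fixed.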

Exercise \(\PageIndex{5}\)

Assume \(H_{0}: \mu = 9\) and \(H_{a}: \mu < 9\). Is this a left-tailed, right-tailed, or two-tailed test?

This is a left-tailed test.

Exercise \(\PageIndex{6}\)

Assume \(H_{0}: \mu \leq 6\) and \(H_{a}: \mu > 6\). Is this a left-tailed, right-tailed, or two-tailed test?

Exercise \(\PageIndex{7}\)

Assume \(H_{0}: p = 0.25\) and \(H_{a}: p \neq 0.25\). Is this a left-tailed, right-tailed, or two-tailed test?

This is a two-tailed test.

Exercise \(\PageIndex{8}\)

Draw the general graph of a left-tailed test.

Exercise \(\PageIndex{9}\)

Draw the graph of a two-tailed test.

alt

Exercise \(\PageIndex{10}\)

A bottle of water is labeled as containing 16 fluid ounces of water. You believe it is less than that. What type of test would you use?

Exercise \(\PageIndex{11}\)

Your friend claims that his mean golf score is 63. You want to show that it is higher than that. What type of test would you use?

a right-tailed test

Exercise \(\PageIndex{12}\)

A bathroom scale claims to be able to identify correctly any weight within a pound. You think that it cannot be that accurate. What type of test would you use?

Exercise \(\PageIndex{13}\)

You flip a coin and record whether it shows heads or tails. You know the probability of getting heads is 50%, but you think it is less for this particular coin. What type of test would you use?

a left-tailed test

Exercise \(\PageIndex{14}\)

If the alternative hypothesis has a not equals ( \(\neq\) ) symbol, you know to use which type of test?

Exercise \(\PageIndex{15}\)

Assume the null hypothesis states that the mean is at least 18. Is this a left-tailed, right-tailed, or two-tailed test?

Exercise \(\PageIndex{16}\)

Assume the null hypothesis states that the mean is at most 12. Is this a left-tailed, right-tailed, or two-tailed test?

Exercise \(\PageIndex{17}\)

Assume the null hypothesis states that the mean is equal to 88. The alternative hypothesis states that the mean is not equal to 88. Is this a left-tailed, right-tailed, or two-tailed test?


Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .

The Standard Normal Distribution | Calculator, Examples & Uses

Published on November 5, 2020 by Pritha Bhandari . Revised on June 21, 2023.

The standard normal distribution , also called the z -distribution , is a special normal distribution where the mean is 0 and the standard deviation is 1.

Any normal distribution can be standardized by converting its values into z scores. Z scores tell you how many standard deviations from the mean each value lies.

The standard normal distribution has a mean of 0 and a standard deviation of 1.

Converting a normal distribution into a z -distribution allows you to calculate the probability of certain values occurring and to compare different data sets.



All normal distributions , like the standard normal distribution, are unimodal and symmetrically distributed with a bell-shaped curve. However, a normal distribution can take on any value as its mean and standard deviation. In the standard normal distribution, the mean and standard deviation are always fixed.

Every normal distribution is a version of the standard normal distribution that’s been stretched or squeezed and moved horizontally right or left.

The mean determines where the curve is centered. Increasing the mean moves the curve right, while decreasing it moves the curve left.

The standard deviation stretches or squeezes the curve. A small standard deviation results in a narrow curve, while a large standard deviation leads to a wide curve.

The standard normal distribution compared with other normal distributions on a graph

When you standardize a normal distribution, the mean becomes 0 and the standard deviation becomes 1. This allows you to easily calculate the probability of certain values occurring in your distribution, or to compare data sets with different means and standard deviations.

While data points are referred to as x in a normal distribution, they are called z or z scores in the z distribution. A z score is a standard score that tells you how many standard deviations away from the mean an individual value ( x ) lies:

  • A positive z score means that your x value is greater than the mean.
  • A negative z score means that your x value is less than the mean.
  • A z score of zero means that your x value is equal to the mean.

Converting a normal distribution into the standard normal distribution allows you to:

  • Compare scores on different distributions with different means and standard deviations.
  • Normalize scores for statistical decision-making (e.g., grading on a curve).
  • Find the probability of observations in a distribution falling above or below a given value.
  • Find the probability that a sample mean significantly differs from a known population mean.

How to calculate a z score

To standardize a value from a normal distribution, convert the individual value into a z -score:

  • Subtract the mean from your individual value.
  • Divide the difference by the standard deviation.
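The two steps above amount to one line of arithmetic; a minimal sketch (the example values are illustrative):

```python
def z_score(x, mean, sd):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5: the value 85 lies 1.5 SDs above a mean of 70
```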

To standardize your data, you first find the z score for 1380. The z score tells you how many standard deviations away 1380 is from the mean.

The  z score for a value of 1380 is 1.53 . That means 1380 is 1.53 standard deviations from the mean of your distribution.

The standard normal distribution is a probability distribution , so the area under the curve between two points tells you the probability of variables taking on a range of values. The total area under the curve is 1 or 100%.

Every z score has an associated p value that tells you the probability of all values below or above that z score occurring. This is the area under the curve left or right of that z score.

The area under the curve in a standard normal distribution tells you the probability of values occurring.

Z tests and p values

The z score is the test statistic used in a z test . The z test is used to compare the means of two groups, or to compare the mean of a group to a set value. Its null hypothesis typically assumes no difference between groups.

The area under the curve to the right of a z score is the p value, and it’s the likelihood of your observation occurring if the null hypothesis is true.

Usually, a p value of 0.05 or less means that your results are unlikely to have arisen by chance; it indicates a statistically significant effect.

By converting a value in a normal distribution into a z score, you can easily find the p value for a z test.

How to use a z table

Once you have a z score, you can look up the corresponding probability in a z table .

In a z table, the area under the curve is reported for every z value between -4 and 4 at intervals of 0.01.

There are a few different formats for the z table. Here, we use a portion of the cumulative table. This table tells you the total area under the curve up to a given z score—this area is equal to the probability of values below that z score occurring.

The first column of a z table contains the z score up to the first decimal place. The top row of the table gives the second decimal place.

To find the corresponding area under the curve (probability) for a z score:

  • Go down to the row with the first two digits of your z score.
  • Go across to the column with the same third digit as your z  score.
  • Find the value at the intersection of the row and column from the previous steps.

Portion of the z-table

To find the shaded area to the right, you subtract 0.937, the cumulative area up to a z score of 1.53, from 1, the total area under the curve.

Probability of x > 1380 = 1 − 0.937 = 0.063
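The cumulative areas a z table reports can also be computed directly. A sketch with Python's standard library, reproducing the 0.937 and 0.063 figures above:

```python
from statistics import NormalDist

z = 1.53
below = NormalDist().cdf(z)   # cumulative area up to z, ≈ 0.937
above = 1 - below             # right-tail area, ≈ 0.063

print(round(below, 3), round(above, 3))
```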



Let’s walk through an invented research example to better understand how the standard normal distribution works.

As a sleep researcher, you’re curious about how sleep habits changed during COVID-19 lockdowns. You collect sleep duration data from a sample during a full lockdown.

Before the lockdown, the population mean was 6.5 hours of sleep. The lockdown sample mean is 7.62.

To assess whether your sample mean significantly differs from the pre-lockdown population mean, you perform a z test :

  • First, you calculate a z score for the sample mean value.
  • Then, you find the p value for your z score using a z table.

Step 1: Calculate a z -score

To compare sleep duration during and before the lockdown, you convert your lockdown sample mean into a z score using the pre-lockdown population mean and standard deviation.

A z score of 2.24 means that your sample mean is 2.24 standard deviations greater than the population mean.

Step 2: Find the  p value

To find the probability of your sample mean z score of 2.24 or less occurring, you use the  z table to find the value at the intersection of row 2.2 and column +0.04.

Finding the p-value using a z-table

The table tells you that the area under the curve up to or below your z score is 0.9874. This means that your sample’s mean sleep duration is higher than about 98.74% of the population’s mean sleep duration pre-lockdown.

Example of comparing population and sample means using a z-distribution.

To find the p value to assess whether the sample differs from the population, you calculate the area under the curve above or to the right of your z score. Since the total area under the curve is 1, you subtract the area under the curve below your z score from 1.

A p value of less than 0.05 or 5% means that the sample significantly differs from the population.

Probability of z > 2.24 = 1 − 0.9874 = 0.0126 or 1.26%

With a p value of less than 0.05, you can conclude that average sleep duration in the COVID-19 lockdown was significantly higher than the pre-lockdown average.
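The p-value step of this example can be sketched directly from the z score reported above (the underlying sleep data are not reproduced here):

```python
from statistics import NormalDist

z = 2.24                      # z score of the lockdown sample mean
p = 1 - NormalDist().cdf(z)   # right-tail area, ≈ 0.0126

print(p < 0.05)   # True -> the sample mean differs significantly
```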

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Student’s t table
  • Student’s t distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

In a normal distribution , data are symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center.

The measures of central tendency (mean, mode, and median) are exactly the same in a normal distribution.

Normal distribution

The standard normal distribution , also called the z -distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1.

Any normal distribution can be converted into the standard normal distribution by turning the individual values into z -scores. In a z -distribution, z -scores tell you how many standard deviations away from the mean each value lies.

The empirical rule, or the 68-95-99.7 rule, tells you where most of the values lie in a normal distribution :

  • Around 68% of values are within 1 standard deviation of the mean.
  • Around 95% of values are within 2 standard deviations of the mean.
  • Around 99.7% of values are within 3 standard deviations of the mean.
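The three percentages above can be checked directly against the standard normal CDF; a short sketch:

```python
from statistics import NormalDist

# Fraction of a normal distribution within k standard deviations of the mean.
within = lambda k: NormalDist().cdf(k) - NormalDist().cdf(-k)

print(round(within(1), 3))   # ≈ 0.683
print(round(within(2), 3))   # ≈ 0.954
print(round(within(3), 3))   # ≈ 0.997
```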

The empirical rule is a quick way to get an overview of your data and check for any outliers or extreme values that don’t follow this pattern.

The t -distribution gives more probability to observations in the tails of the distribution than the standard normal distribution (a.k.a. the z -distribution).

In this way, the t -distribution is more conservative than the standard normal distribution: to reach the same level of confidence or statistical significance , you will need to include a wider range of the data.



COMMENTS

  1. 5.3.2 Normal Hypothesis Testing

    How is the critical value found in a hypothesis test for the mean of a normal distribution? The critical value(s) will be the boundary of the critical region. The probability of the observed value being within the critical region, given a true null hypothesis will be the same as the significance level; For an % significance level: . In a one-tailed test the critical region will consist of % in ...

  2. 9.4: Distribution Needed for Hypothesis Testing

    Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's \(t\)-distribution. (Remember, use a Student's \(t\)-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.)

  3. Normal Distribution in Statistics

    The normal distribution is a continuous probability distribution that is symmetrical around its mean, most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.

  4. Normal Distribution

    What is a normal distribution and how to use it in statistics? Learn the definition, formulas, examples, and applications of this common data pattern. Find out how to calculate the mean, standard deviation, and z-scores of a normal distribution, and how to compare it with other distributions. Scribbr offers clear and concise explanations, diagrams, and calculators to help you master this topic.

  5. Normal distributions review (article)

    The mode of a normal distribution is the value at which the curve reaches its peak, which coincides with the mean and median in a normal distribution. While the probability of a specific point in a continuous distribution being exactly equal to a particular value is indeed 0, the mode is still a meaningful concept because it represents the most ...

  6. Normal Distribution Hypothesis Test: Explanation & Example

    When we hypothesis test for a normal distribution we are trying to see if the mean is different from the mean stated in the null hypothesis. We use the sample mean which is \(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\). In two-tailed tests we divide the significance level by two and test on both tails.

  7. 9.3 Probability Distribution Needed for Hypothesis Testing

    Assumptions. When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed, or your sample size is sufficiently large.You know the value of the population standard deviation, which, in reality, is rarely known.

  8. 8.1.3: Distribution Needed for Hypothesis Testing

    The estimated value (point estimate) for μ is ˉx, the sample mean. If you are testing a single population proportion, the distribution for the test is for proportions or percentages: P ′ − N(p, √p − q n) The population parameter is p. The estimated value (point estimate) for p is p′. p ′ = x n where x is the number of successes ...

  9. How to Do Hypothesis Testing with Normal Distribution

    The alternative hypothesis in this case is that the bottles do not contain 0. 5 L and that the machines are not precise enough. This thus becomes a two-sided hypothesis test and you must therefore remember to multiply the p-value by 2 before deciding whether the p-value is in the critical region.This is because the normal distribution is symmetric, so P (X ≥ k) = P (X ≤ − k).

  10. Distribution Needed for Hypothesis Testing

    normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed, or your sample size is sufficiently large. You know the value of the population standard deviation, which, in reality, is rarely known. When you perform a hypothesis test of a single population ...

  11. 9.3 Distribution Needed for Hypothesis Testing

    Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.)

  12. 5.6: The Normal Distribution

    The standard normal distribution is a continuous distribution on \(\mathbb{R}\) with probability density function \(\phi\) given by \(\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\), \(z \in \mathbb{R}\). Proof that \(\phi\) is a probability density function. The standard normal probability density function has the famous bell shape that is known to just about everyone.
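
    The density above, \(\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\), is easy to code directly and compare against the standard library's implementation:

```python
import math
from statistics import NormalDist

def phi(z: float) -> float:
    """Standard normal density: exp(-z**2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# The hand-written density matches statistics.NormalDist().pdf
# to floating-point accuracy at a few sample points.
for z in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(phi(z) - NormalDist().pdf(z)) < 1e-12
```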

  13. 6.1 The Standard Normal Distribution

    The Empirical Rule. If X is a random variable and has a normal distribution with mean µ and standard deviation σ, then the Empirical Rule states the following: about 68% of the x values lie within one standard deviation of the mean µ (between µ − σ and µ + σ); about 95% of the x values lie within two standard deviations of the mean µ (between µ − 2σ and µ + 2σ).
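
    A quick numerical check of the Empirical Rule, using the standard normal CDF from Python's standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def within(k: float) -> float:
    """Probability mass within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

one_sd = within(1)    # about 0.68
two_sd = within(2)    # about 0.95
three_sd = within(3)  # about 0.997
```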

  14. Tests in the Normal Model

    Basic Theory The Normal Model. The normal distribution is perhaps the most important distribution in the study of mathematical statistics, in part because of the central limit theorem. As a consequence of this theorem, a measured quantity that is subject to numerous small, random errors will have, at least approximately, a normal distribution.

  15. Distribution Needed for Hypothesis Testing

    Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.)

  16. Bell Shaped Curve: Normal Distribution In Statistics

    The normal distribution is a continuous probability distribution that is symmetrical on both sides of the mean, so the right side of the center is a mirror image of the left side. The area under the normal distribution curve represents the probability and the total area under the curve sums to one.

  17. Normal distribution

    In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\). The parameter \(\mu\) is the mean or expectation of the distribution (and also its median and mode), while the parameter \(\sigma\) is its standard deviation.

  18. 9.2: Tests in the Normal Model

    In the mean test experiment, select the normal test statistic and the normal sampling distribution with standard deviation σ = 2, significance level α = 0.1, sample size n = 20, and μ0 = 0. Run the experiment 1000 times for several values of the true distribution mean μ.

  19. Z-Test for Statistical Hypothesis Testing Explained

    The Z-test is a statistical hypothesis test used to determine whether the distribution of the test statistic we are measuring, like the mean, is part of the normal distribution. There are multiple types of Z-tests; however, we'll focus on the easiest and most well-known one, the one-sample mean test. This is used to determine if the difference ...

  20. Normal Distribution: What It Is, Uses, and Formula

    Normal Distribution: The normal distribution, also known as the Gaussian distribution, is the probability distribution that plots all of its values in a symmetrical fashion, and ...

  21. Test of Normality • Simply explained

    With a very large sample, even small deviations from normality can yield a p-value of less than 0.05, rejecting the null hypothesis of normal distribution. To avoid this problem, graphical methods are increasingly being used. Graphical test for normal distribution. If the normal distribution is tested graphically, one looks either at the histogram or, even better, the QQ plot.
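
    A QQ plot pairs the sorted sample with theoretical normal quantiles; the coordinates can be computed with the standard library. The data below are made up, and (i + 0.5)/n is just one common choice of plotting position.

```python
from statistics import NormalDist

# Made-up sample for illustration.
data = sorted([4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.95])
n = len(data)

# Theoretical standard-normal quantiles at plotting positions (i + 0.5) / n.
theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# For the QQ plot, scatter `theory` (x) against `data` (y); points lying
# close to a straight line suggest the sample is approximately normal.
qq_points = list(zip(theory, data))
```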

  22. 9.3: A Single Population Mean using the Normal Distribution

    The "\(>\)" in the alternative hypothesis tells you this is right-tailed. Calculate the evidence: Use the Standard Normal Distribution since the population standard deviation is given. Calculate the critical value. Use the Standard Normal Distribution, Critical Value, Right-tail Excel formula: \(=\text{NORM.S.INV}(1-\alpha)\).
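
    The Excel formula above has a direct standard-library analogue in Python: `NormalDist().inv_cdf(1 - alpha)` plays the role of `NORM.S.INV(1 - α)` for a right-tailed critical value.

```python
from statistics import NormalDist

alpha = 0.05
# Right-tailed critical value: the point with area alpha to its right,
# i.e. the analogue of Excel's =NORM.S.INV(1 - alpha).
z_crit = NormalDist().inv_cdf(1 - alpha)

# In a right-tailed z-test, reject H0 when the test statistic exceeds z_crit.
```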

  23. The Standard Normal Distribution

    The z score tells you how many standard deviations away 1380 is from the mean. Step 1: Subtract the mean from the x value. x = 1380. M = 1150. x - M = 1380 − 1150 = 230. Step 2: Divide the difference by the standard deviation. SD = 150. z = 230 ÷ 150 = 1.53. The z score for a value of 1380 is 1.53.
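
    The two steps above amount to a single formula, z = (x − M) / SD, shown here with the snippet's own numbers:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Values from the worked example: x = 1380, M = 1150, SD = 150.
z = z_score(1380, 1150, 150)   # 230 / 150, about 1.53
```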

  24. Research on the Impact of Non-Uniform and Frequency-Dependent Normal

    The analysis method for the normal contact stiffness of mechanical interface proposed in this paper considers both the distribution and frequency-dependent properties of the normal contact stiffness. The application of this method in simulations has successfully achieved a good match between the modal vibration shapes, frequency response curves ...