Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H 0) and an alternate hypothesis (H a or H 1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H 0) and alternate (H a) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0: Men are, on average, not taller than women.
  • H a: Men are, on average, taller than women.
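The same pair of hypotheses can be written in symbols. This is only a sketch: the symbols \mu_M and \mu_W for the population mean heights of men and women are notation assumed here, not taken from the article.

```latex
H_0 : \mu_M \le \mu_W
\qquad \text{versus} \qquad
H_a : \mu_M > \mu_W
```

Writing the hypotheses this way makes it explicit that the test is one-sided: only evidence that men are taller counts against the null hypothesis.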

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but many of the most common ones are based on the comparison of within-group variance (how spread out the data are within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means that a difference this large between the groups would be unlikely to arise by chance alone if the null hypothesis were true.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means that a difference like the one you measured could easily arise by chance even if the null hypothesis were true.
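To make the comparison of between-group and within-group variance concrete, here is a minimal Python sketch of the ratio used in a one-way analysis of variance (the F ratio). The function name `f_ratio` and both data sets are hypothetical, made up purely for illustration; turning the ratio into a p-value would additionally require the F distribution, which is not shown.

```python
from statistics import mean

def f_ratio(groups):
    """Return (between-group variance, within-group variance, F ratio)."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variance: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)
    # Within-group variance: spread of the values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical measurements (illustration only).
well_separated = [[10, 11, 12], [20, 21, 22]]   # little overlap between groups
overlapping = [[10, 15, 20], [12, 16, 21]]      # heavy overlap between groups

print(f_ratio(well_separated))
print(f_ratio(overlapping))
```

A large ratio corresponds to the low p-value case described above, and a ratio near or below 1 corresponds to the high p-value case.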

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For the height example, your statistical test will give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see a difference at least this large if the null hypothesis of no difference is true.
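As a rough illustration of those two outputs, the sketch below runs a one-sided, large-sample z-test on simulated heights. Everything specific here is an assumption for illustration: the function name `one_sided_z_test`, the simulated population means (175 cm and 162 cm), the standard deviation (7 cm), and the sample sizes. With small samples a t-test would normally be used instead; the z approximation is chosen only because it needs nothing beyond the Python standard library.

```python
import random
from statistics import NormalDist, mean, variance

def one_sided_z_test(sample_a, sample_b):
    """Test H0: mean(a) <= mean(b) against Ha: mean(a) > mean(b).

    Returns the estimated difference in means and a one-sided p-value
    based on the normal approximation (reasonable for large samples).
    """
    diff = mean(sample_a) - mean(sample_b)
    # Standard error of the difference between two independent sample means.
    se = (variance(sample_a) / len(sample_a)
          + variance(sample_b) / len(sample_b)) ** 0.5
    z = diff / se
    p_value = 1 - NormalDist().cdf(z)
    return diff, p_value

# Simulated (not real) height data in centimetres.
rng = random.Random(0)
men = [rng.gauss(175, 7) for _ in range(100)]
women = [rng.gauss(162, 7) for _ in range(100)]

diff, p = one_sided_z_test(men, women)
print(f"estimated difference: {diff:.1f} cm, p-value: {p:.4f}")
```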

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This reduces the risk of incorrectly rejecting the null hypothesis (Type I error).
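The decision rule itself is short enough to write down directly. The following is a minimal sketch; the function name `decide` and the returned strings are illustrative choices, not part of any library.

```python
def decide(p_value, alpha=0.05):
    """Return the formal decision for a test at significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    # Reject only when the p-value falls below the predetermined threshold.
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.03))               # default 5% level
print(decide(0.03, alpha=0.01))   # more conservative 1% level
```

The same p-value can lead to different decisions under different significance levels, which is why the level should be fixed before looking at the data.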

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved March 25, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Type I & Type II Errors | Differences, Examples, Visualizations

Published on 18 January 2021 by Pritha Bhandari. Revised on 2 February 2023.

In statistics, a Type I error is a false positive conclusion, while a Type II error is a false negative conclusion.

Making a statistical decision always involves uncertainties, so the risks of making these errors are unavoidable in hypothesis testing.

The probability of making a Type I error is the significance level, or alpha (α), while the probability of making a Type II error is beta (β). These risks can be minimized through careful planning in your study design.

  • Type I error (false positive): the test result says you have coronavirus, but you actually don’t.
  • Type II error (false negative): the test result says you don’t have coronavirus, but you actually do.

Table of contents

  • Error in statistical decision-making
  • Type I error
  • Type II error
  • Trade-off between Type I and Type II errors
  • Is a Type I or Type II error worse?
  • Frequently asked questions about Type I and II errors

Using hypothesis testing, you can make decisions about whether your data support or refute your research predictions with null and alternative hypotheses.

Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population: this is the null hypothesis. It’s always paired with an alternative hypothesis, which is your research prediction of an actual difference between groups or a true relationship between variables.

For example, suppose you are testing whether a new drug relieves the symptoms of a disease. In this case:

  • The null hypothesis (H 0 ) is that the new drug has no effect on symptoms of the disease.
  • The alternative hypothesis (H 1 ) is that the drug is effective for alleviating symptoms of the disease.

Then, you decide whether the null hypothesis can be rejected based on your data and the results of a statistical test. Since these decisions are based on probabilities, there is always a risk of making the wrong conclusion.

  • If your results show statistical significance, that means they are very unlikely to occur if the null hypothesis is true. In this case, you would reject your null hypothesis. But sometimes, this may actually be a Type I error.
  • If your findings do not show statistical significance, they have a high chance of occurring if the null hypothesis is true. Therefore, you fail to reject your null hypothesis. But sometimes, this may be a Type II error.

[Figure: Type I and Type II error in statistics]

A Type I error means rejecting the null hypothesis when it’s actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors.

The risk of committing this error is the significance level (alpha or α) you choose. That’s a threshold that you set at the beginning of your study and against which you compare the probability of obtaining your results (the p value).

The significance level is usually set at 0.05 or 5%. This means that your results only have a 5% chance of occurring, or less, if the null hypothesis is actually true.

If the p value of your test is lower than the significance level, it means your results are statistically significant and consistent with the alternative hypothesis. If your p value is higher than the significance level, then your results are considered statistically non-significant.

To reduce the Type I error probability, you can simply set a lower significance level.
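The link between the significance level and the Type I error rate can be checked with a small simulation. The sketch below repeats a study many times in a world where the null hypothesis is true and counts how often a one-sided z-test rejects it. The function name, the sample size, the number of repetitions, the seed, and the assumption of a known standard deviation of 1 are all illustrative choices.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, n_studies=4000, seed=1):
    """Fraction of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    rejections = 0
    for _ in range(n_studies):
        # H0 is true here: every observation comes from a mean-0 population.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5       # z statistic, known sigma = 1
        if z > z_crit:
            rejections += 1                    # a false positive
    return rejections / n_studies

print(type_i_error_rate(alpha=0.05))
print(type_i_error_rate(alpha=0.01))
```

The observed rejection rates should land close to the chosen alpha in each case, which is what it means for alpha to be the Type I error probability.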

Type I error rate

The null hypothesis distribution curve below shows the probabilities of obtaining all possible results if the study were repeated with new samples and the null hypothesis were true in the population.

At the tail end, the shaded area represents alpha. It’s also called a critical region in statistics.

If your results fall in the critical region of this curve, they are considered statistically significant and the null hypothesis is rejected. However, this is a false positive conclusion, because the null hypothesis is actually true in this case!

[Figure: Type I error rate]

A Type II error means not rejecting the null hypothesis when it’s actually false. This is not quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.

Instead, a Type II error means failing to conclude there was an effect when there actually was. In reality, your study may not have had enough statistical power to detect an effect of a certain size.

Power is the extent to which a test can correctly detect a real effect when there is one. A power level of 80% or higher is usually considered acceptable.

The risk of a Type II error is inversely related to the statistical power of a study. The higher the statistical power, the lower the probability of making a Type II error.

Statistical power is determined by:

  • Size of the effect : Larger effects are more easily detected.
  • Measurement error : Systematic and random errors in recorded data reduce power.
  • Sample size : Larger samples reduce sampling error and increase power.
  • Significance level : Increasing the significance level increases power.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level.
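The factors listed above can be explored with a short power calculation. This sketch assumes a one-sided z-test of a zero mean against a positive mean with a known standard deviation; the function name `power` and the example numbers are hypothetical. Measurement error enters only through a larger `sigma`.

```python
from statistics import NormalDist

def power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against H1: mu = effect > 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection cutoff on the z scale
    shift = effect * n ** 0.5 / sigma          # how far H1 moves the z statistic
    return 1 - NormalDist().cdf(z_crit - shift)

baseline = power(effect=0.5, sigma=1.0, n=20)
print(f"baseline power:       {baseline:.3f}")
print(f"larger effect:        {power(0.8, 1.0, 20):.3f}")
print(f"noisier measurements: {power(0.5, 2.0, 20):.3f}")
print(f"larger sample:        {power(0.5, 1.0, 50):.3f}")
print(f"stricter alpha:       {power(0.5, 1.0, 20, alpha=0.01):.3f}")
```

Each line changes one factor relative to the baseline, so the direction of each effect on power can be read off separately. The Type II error probability in each case is one minus the printed power.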

Type II error rate

The alternative hypothesis distribution curve below shows the probabilities of obtaining all possible results if the study were repeated with new samples and the alternative hypothesis were true in the population.

The Type II error rate is beta (β), represented by the shaded area on the left side. The remaining area under the curve represents statistical power, which is 1 – β.

Increasing the statistical power of your test directly decreases the risk of making a Type II error.

[Figure: Type II error rate]

The Type I and Type II error rates influence each other. That’s because the significance level (the Type I error rate) affects statistical power, which is inversely related to the Type II error rate.

This means there’s an important tradeoff between Type I and Type II errors:

  • Setting a lower significance level decreases the Type I error risk, but increases the Type II error risk.
  • Increasing the power of a test by raising the significance level decreases the Type II error risk, but increases the Type I error risk.

This trade-off is visualized in the graph below. It shows two curves:

  • The null hypothesis distribution shows all possible results you’d obtain if the null hypothesis is true. The correct conclusion for any point on this distribution means not rejecting the null hypothesis.
  • The alternative hypothesis distribution shows all possible results you’d obtain if the alternative hypothesis is true. The correct conclusion for any point on this distribution means rejecting the null hypothesis.

Type I and Type II errors occur where these two distributions overlap. The blue shaded area represents alpha, the Type I error rate, and the green shaded area represents beta, the Type II error rate.

By setting the Type I error rate, you indirectly influence the size of the Type II error rate as well.
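The two overlapping curves can be reproduced numerically. In this sketch the null and alternative distributions of the sample mean are both normal, and the test rejects when the sample mean exceeds a cutoff. The specific means (0 and 0.5), standard deviation, sample size, and cutoffs are assumptions chosen for illustration.

```python
from statistics import NormalDist

def error_rates(cutoff, mu0=0.0, mu1=0.5, sigma=1.0, n=25):
    """Return (alpha, beta) for the rule 'reject H0 if sample mean > cutoff'."""
    se = sigma / n ** 0.5                 # standard error of the sample mean
    null_dist = NormalDist(mu0, se)       # sampling distribution if H0 is true
    alt_dist = NormalDist(mu1, se)        # sampling distribution if H1 is true
    alpha = 1 - null_dist.cdf(cutoff)     # area of the null curve beyond the cutoff
    beta = alt_dist.cdf(cutoff)           # area of the alternative curve below it
    return alpha, beta

for cutoff in (0.2, 0.3, 0.4):
    alpha, beta = error_rates(cutoff)
    print(f"cutoff={cutoff:.1f}  alpha={alpha:.3f}  beta={beta:.3f}")
```

Moving the cutoff to the right shrinks alpha and grows beta, and moving it to the left does the opposite, as long as the sample size and the two distributions stay fixed.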

[Figure: Type I and Type II error]

It’s important to strike a balance between the risks of making Type I and Type II errors. For a given sample size and effect size, reducing alpha comes at the cost of increasing beta, and vice versa.

For statisticians, a Type I error is usually worse. In practical terms, however, either type of error could be worse depending on your research context.

A Type I error means mistakenly going against the main statistical assumption of a null hypothesis. This may lead to new policies, practices or treatments that are inadequate or a waste of resources.

In contrast, a Type II error means failing to reject a null hypothesis. It may only result in missed opportunities to innovate, but these can also have important practical consequences.

In statistics, a Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s actually false.

The risk of making a Type I error is the significance level (or alpha) that you choose. That’s a threshold that you set at the beginning of your study and against which you compare the probability of obtaining your results (the p value).

To reduce the Type I error probability, you can set a lower significance level.

The risk of making a Type II error is inversely related to the statistical power of a test. Power is the extent to which a test can correctly detect a real effect when there is one.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level to increase statistical power.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data at least as extreme as those observed would occur less than 5% of the time under the null hypothesis.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

In statistics, power refers to the likelihood of a hypothesis test detecting a true effect if there is one. A statistically powerful test is less likely to produce a false negative (a Type II error).

If you don’t ensure enough power in your study, you may not be able to detect a statistically significant result even when it has practical significance. Your study might not have the ability to answer your research question.

Bhandari, P. (2023, February 02). Type I & Type II Errors | Differences, Examples, Visualizations. Scribbr. Retrieved 25 March 2024, from https://www.scribbr.co.uk/stats/type-i-and-type-ii-error/

9.2 Outcomes and the Type I and Type II Errors

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth, or falseness, of the null hypothesis H 0 and the decision to reject or not. The outcomes are summarized in the following table:

The four possible outcomes in the table are as follows:

  • The decision is not to reject H 0 when H 0 is true (correct decision).
  • The decision is to reject H 0 when, in fact, H 0 is true (incorrect decision known as a Type I error).
  • The decision is not to reject H 0 when, in fact, H 0 is false (incorrect decision known as a Type II error).
  • The decision is to reject H 0 when H 0 is false (correct decision whose probability is called the Power of the Test).
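The four outcomes amount to a lookup on two yes/no facts: whether the null hypothesis is actually true, and whether the test rejected it. A minimal sketch follows; the function name and return strings are illustrative.

```python
def classify_outcome(h0_is_true, reject_h0):
    """Name the outcome of a test given the truth of H0 and the decision made."""
    if h0_is_true and reject_h0:
        return "Type I error"        # rejected a true null hypothesis
    if not h0_is_true and not reject_h0:
        return "Type II error"       # failed to reject a false null hypothesis
    return "correct decision"

for truth in (True, False):
    for decision in (True, False):
        print(f"H0 true={truth}, reject={decision}: "
              f"{classify_outcome(truth, decision)}")
```

In practice the first argument is unknown, which is why the error probabilities are reasoned about before the data are collected.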

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

α and β should be as small as possible because they are probabilities of errors. They are rarely zero.

The Power of the Test is 1 – β. Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test.

The following are examples of Type I and Type II errors.

Example 9.5

Suppose the null hypothesis, H 0 , is: Frank's rock climbing equipment is safe.

Type I error: Frank does not go rock climbing because he considers that the equipment is not safe, when in fact, the equipment is really safe. Frank is making the mistake of rejecting the null hypothesis, when the equipment is actually safe!

Type II error: Frank goes climbing, thinking that his equipment is safe, but this is a mistake, and he painfully realizes that his equipment is not as safe as it should have been. Frank assumed that the null hypothesis was true, when it was not.

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Suppose the null hypothesis, H 0 , is: the blood cultures contain no traces of pathogen X . State the Type I and Type II errors.

Example 9.6

Suppose the null hypothesis, H 0 , is: a tomato plant is alive when a class visits the school garden.

Type I error: The null hypothesis claims that the tomato plant is alive, and it is true, but the students make the mistake of thinking that the plant is already dead.

Type II error: The tomato plant is already dead (the null hypothesis is false), but the students do not notice it, and believe that the tomato plant is alive.

α = probability that the class thinks the tomato plant is dead when, in fact, it is alive = P (Type I error). β = probability that the class thinks the tomato plant is alive when, in fact, it is dead = P (Type II error).

The error with the greater consequence is the Type I error. (If the class thinks the plant is dead, they will not water it.)

Suppose the null hypothesis, H 0 , is: a patient is not sick. Which type of error has the greater consequence, Type I or Type II?

Example 9.7

It’s a Boy Genetic Labs, a genetics company, claims to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, H 0 , is: It’s a Boy Genetic Labs has no effect on gender outcome.

Type I error: This error results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, α.

Type II error: This error results when we fail to reject a false null hypothesis. In context, we would state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, β.

The error with the greater consequence would be the Type I error since couples would use the It’s a Boy Genetic Labs product in hopes of increasing the chances of having a boy.

Red tide is a bloom of poison-producing algae, a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries monitors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kilogram of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which error has the greater consequence.

Example 9.8

A certain experimental drug claims a cure rate of at least 75 percent for males with a disease. Describe both the Type I and Type II errors in context. Which error is the more serious?

Type I: A patient believes the cure rate for the drug is less than 75 percent when it actually is at least 75 percent.

Type II: A patient believes the experimental drug has at least a 75 percent cure rate when it has a cure rate that is less than 75 percent.

In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75 percent of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

Determine both Type I and Type II errors for the following scenario:

Assume a null hypothesis, H 0 , that states the percentage of adults with jobs is at least 88 percent.

Identify the Type I and Type II errors from these four possible choices.

  • Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when that percentage is actually less than 88 percent
  • Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when the percentage is actually at least 88 percent
  • Reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when the percentage is actually at least 88 percent
  • Reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when that percentage is actually less than 88 percent

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/9-2-outcomes-and-the-type-i-and-type-ii-errors

© Jan 23, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Statistics LibreTexts

9.1: Introduction to Hypothesis Testing

  • Kyle Siegrist
  • University of Alabama in Huntsville via Random Services

Basic Theory

Preliminaries.

As usual, our starting point is a random experiment with an underlying sample space and a probability measure \(\P\). In the basic statistical model, we have an observable random variable \(\bs{X}\) taking values in a set \(S\). In general, \(\bs{X}\) can have quite a complicated structure. For example, if the experiment is to sample \(n\) objects from a population and record various measurements of interest, then \[ \bs{X} = (X_1, X_2, \ldots, X_n) \] where \(X_i\) is the vector of measurements for the \(i\)th object. The most important special case occurs when \((X_1, X_2, \ldots, X_n)\) are independent and identically distributed. In this case, we have a random sample of size \(n\) from the common distribution.

The purpose of this section is to define and discuss the basic concepts of statistical hypothesis testing. Collectively, these concepts are sometimes referred to as the Neyman-Pearson framework, in honor of Jerzy Neyman and Egon Pearson, who first formalized them.

A statistical hypothesis is a statement about the distribution of \(\bs{X}\). Equivalently, a statistical hypothesis specifies a set of possible distributions of \(\bs{X}\): the set of distributions for which the statement is true. A hypothesis that specifies a single distribution for \(\bs{X}\) is called simple; a hypothesis that specifies more than one distribution for \(\bs{X}\) is called composite.

In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis. The null hypothesis is usually denoted \(H_0\) while the alternative hypothesis is usually denoted \(H_1\).

A hypothesis test is a statistical decision; the conclusion will either be to reject the null hypothesis in favor of the alternative, or to fail to reject the null hypothesis. The decision that we make must, of course, be based on the observed value \(\bs{x}\) of the data vector \(\bs{X}\). Thus, we will find an appropriate subset \(R\) of the sample space \(S\) and reject \(H_0\) if and only if \(\bs{x} \in R\). The set \(R\) is known as the rejection region or the critical region. Note the asymmetry between the null and alternative hypotheses. This asymmetry is due to the fact that we assume the null hypothesis, in a sense, and then see if there is sufficient evidence in \(\bs{x}\) to overturn this assumption in favor of the alternative.

A hypothesis test is a statistical analogy to proof by contradiction, in a sense. Suppose for a moment that \(H_1\) is a statement in a mathematical theory and that \(H_0\) is its negation. One way that we can prove \(H_1\) is to assume \(H_0\) and work our way logically to a contradiction. In a hypothesis test, we don't prove anything of course, but there are similarities. We assume \(H_0\) and then see if the data \(\bs{x}\) are sufficiently at odds with that assumption that we feel justified in rejecting \(H_0\) in favor of \(H_1\).

Often, the critical region is defined in terms of a statistic \(w(\bs{X})\), known as a test statistic, where \(w\) is a function from \(S\) into another set \(T\). We find an appropriate rejection region \(R_T \subseteq T\) and reject \(H_0\) when the observed value \(w(\bs{x}) \in R_T\). Thus, the rejection region in \(S\) is then \(R = w^{-1}(R_T) = \left\{\bs{x} \in S: w(\bs{x}) \in R_T\right\}\). As usual, the use of a statistic often allows significant data reduction when the dimension of the test statistic is much smaller than the dimension of the data vector.
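As a concrete sketch of this construction, take the test statistic to be the sample mean and the rejection region in \(T\) to be a half-line. The function names and the cutoff value 0.5 are hypothetical choices for illustration, not part of the text.

```python
from statistics import mean

def w(x):
    """Test statistic: maps the data vector x in S to a single number in T."""
    return mean(x)

def in_rejection_region_T(t, cutoff=0.5):
    """Membership in R_T = {t in T : t > cutoff}; cutoff is an arbitrary example."""
    return t > cutoff

def reject_h0(x, cutoff=0.5):
    """x lies in R = w^{-1}(R_T) exactly when w(x) lies in R_T."""
    return in_rejection_region_T(w(x), cutoff)

print(reject_h0([1.0, 0.8, 0.9]))    # sample mean well above the cutoff
print(reject_h0([0.1, -0.2, 0.3]))   # sample mean below the cutoff
```

The data reduction is visible in the signatures: the decision depends on the whole vector only through the single number returned by `w`.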

The ultimate decision may be correct or may be in error. There are two types of errors, depending on which of the hypotheses is actually true.

Types of errors:

  • A type 1 error is rejecting the null hypothesis \(H_0\) when \(H_0\) is true.
  • A type 2 error is failing to reject the null hypothesis \(H_0\) when the alternative hypothesis \(H_1\) is true.

Similarly, there are two ways to make a correct decision: we could reject \(H_0\) when \(H_1\) is true or we could fail to reject \(H_0\) when \(H_0\) is true. The possibilities are summarized in the following table:

Of course, when we observe \(\bs{X} = \bs{x}\) and make our decision, either we will have made the correct decision or we will have committed an error, and usually we will never know which of these events has occurred. Prior to gathering the data, however, we can consider the probabilities of the various errors.

If \(H_0\) is true (that is, the distribution of \(\bs{X}\) is specified by \(H_0\)), then \(\P(\bs{X} \in R)\) is the probability of a type 1 error for this distribution. If \(H_0\) is composite, then \(H_0\) specifies a variety of different distributions for \(\bs{X}\) and thus there is a set of type 1 error probabilities.

The maximum probability of a type 1 error, over the set of distributions specified by \( H_0 \), is the significance level of the test or the size of the critical region.

The significance level is often denoted by \(\alpha\). Usually, the rejection region is constructed so that the significance level is a prescribed, small value (typically 0.1, 0.05, 0.01).

If \(H_1\) is true (that is, the distribution of \(\bs{X}\) is specified by \(H_1\)), then \(\P(\bs{X} \notin R)\) is the probability of a type 2 error for this distribution. Again, if \(H_1\) is composite then \(H_1\) specifies a variety of different distributions for \(\bs{X}\), and thus there will be a set of type 2 error probabilities. Generally, there is a tradeoff between the type 1 and type 2 error probabilities. If we reduce the probability of a type 1 error, by making the rejection region \(R\) smaller, we necessarily increase the probability of a type 2 error because the complementary region \(S \setminus R\) is larger.
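The tradeoff can be illustrated with a small sketch. It assumes simple hypotheses for a single normal observation, \(H_0: \mu = 0\) versus \(H_1: \mu = 1\) with unit variance, and rejection regions of the form \(R = (c, \infty)\). These choices and the function name are assumptions made for illustration only.

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0.0, sigma=1.0)  # distribution of X under H0 (assumed)
alt_dist = NormalDist(mu=1.0, sigma=1.0)   # distribution of X under H1 (assumed)


def error_probabilities(cutoff):
    """Return (type 1, type 2) error probabilities for R = (cutoff, infinity)."""
    type1 = 1.0 - null_dist.cdf(cutoff)  # P(X in R) when H0 is true
    type2 = alt_dist.cdf(cutoff)         # P(X not in R) when H1 is true
    return type1, type2


# Raising the cutoff shrinks R: type 1 error falls while type 2 error rises.
for cutoff in (0.5, 1.0, 1.5, 2.0):
    type1, type2 = error_probabilities(cutoff)
    print(f"cutoff={cutoff:.1f}  type 1={type1:.3f}  type 2={type2:.3f}")
```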

The extreme cases can give us some insight. First consider the decision rule in which we never reject \(H_0\), regardless of the evidence \(\bs{x}\). This corresponds to the rejection region \(R = \emptyset\). A type 1 error is impossible, so the significance level is 0. On the other hand, the probability of a type 2 error is 1 for any distribution defined by \(H_1\). At the other extreme, consider the decision rule in which we always reject \(H_0\) regardless of the evidence \(\bs{x}\). This corresponds to the rejection region \(R = S\). A type 2 error is impossible, but now the probability of a type 1 error is 1 for any distribution defined by \(H_0\). In between these two worthless tests are meaningful tests that take the evidence \(\bs{x}\) into account.

If \(H_1\) is true, so that the distribution of \(\bs{X}\) is specified by \(H_1\), then \(\P(\bs{X} \in R)\), the probability of rejecting \(H_0\), is the power of the test for that distribution.

Thus the power of the test for a distribution specified by \( H_1 \) is the probability of making the correct decision.

Suppose that we have two tests, corresponding to rejection regions \(R_1\) and \(R_2\), respectively, each having significance level \(\alpha\). The test with region \(R_1\) is uniformly more powerful than the test with region \(R_2\) if \[ \P(\bs{X} \in R_1) \ge \P(\bs{X} \in R_2) \text{ for every distribution of } \bs{X} \text{ specified by } H_1 \]

Naturally, in this case, we would prefer the first test. Often, however, two tests will not be uniformly ordered; one test will be more powerful for some distributions specified by \(H_1\) while the other test will be more powerful for other distributions specified by \(H_1\).

If a test has significance level \(\alpha\) and is uniformly more powerful than any other test with significance level \(\alpha\), then the test is said to be a uniformly most powerful test at level \(\alpha\).

Clearly a uniformly most powerful test is the best we can do.

\(P\)-value

In most cases, we have a general procedure that allows us to construct a test (that is, a rejection region \(R_\alpha\)) for any given significance level \(\alpha \in (0, 1)\). Typically, \(R_\alpha\) decreases (in the subset sense) as \(\alpha\) decreases.

The \(P\)-value of the observed value \(\bs{x}\) of \(\bs{X}\), denoted \(P(\bs{x})\), is defined to be the smallest \(\alpha\) for which \(\bs{x} \in R_\alpha\); that is, the smallest significance level for which \(H_0\) is rejected, given \(\bs{X} = \bs{x}\).

Knowing \(P(\bs{x})\) allows us to test \(H_0\) at any significance level for the given data \(\bs{x}\): If \(P(\bs{x}) \le \alpha\) then we would reject \(H_0\) at significance level \(\alpha\); if \(P(\bs{x}) \gt \alpha\) then we fail to reject \(H_0\) at significance level \(\alpha\). Note that \(P(\bs{X})\) is a statistic. Informally, \(P(\bs{x})\) can often be thought of as the probability of an outcome as or more extreme than the observed value \(\bs{x}\), where extreme is interpreted relative to the null hypothesis \(H_0\).
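As a sketch of how one \(P\)-value settles the decision at every significance level, consider a right-tailed test for a normal mean with known standard deviation, where "more extreme" means a larger standardized sample mean. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_p_value(sample_mean, mu0, sigma, n):
    """P-value of a right-tailed z-test: P(Z >= observed z) under H0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical data summary: n = 36, sample mean 10.4, known sigma = 1.2.
p = right_tailed_p_value(sample_mean=10.4, mu0=10.0, sigma=1.2, n=36)
print(f"P-value = {p:.4f}")

# The same P-value is compared with each significance level in turn.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"alpha={alpha:.2f}: {decision}")
```

Here the observed statistic is about 2, so \(H_0\) is rejected at levels 0.10 and 0.05 but not at 0.01.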

Analogy with Justice Systems

There is a helpful analogy between statistical hypothesis testing and the criminal justice system in the US and various other countries. Consider a person charged with a crime. The presumed null hypothesis is that the person is innocent of the crime; the conjectured alternative hypothesis is that the person is guilty of the crime. The test of the hypotheses is a trial with evidence presented by both sides playing the role of the data. After considering the evidence, the jury delivers the decision as either not guilty or guilty. Note that innocent is not a possible verdict of the jury, because it is not the point of the trial to prove the person innocent. Rather, the point of the trial is to see whether there is sufficient evidence to overturn the null hypothesis that the person is innocent in favor of the alternative hypothesis that the person is guilty. A type 1 error is convicting a person who is innocent; a type 2 error is acquitting a person who is guilty. Generally, a type 1 error is considered the more serious of the two possible errors, so in an attempt to hold the chance of a type 1 error to a very low level, the standard for conviction in serious criminal cases is beyond a reasonable doubt.

Tests of an Unknown Parameter

Hypothesis testing is a very general concept, but an important special class occurs when the distribution of the data variable \(\bs{X}\) depends on a parameter \(\theta\) taking values in a parameter space \(\Theta\). The parameter may be vector-valued, so that \(\bs{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)\) and \(\Theta \subseteq \R^k\) for some \(k \in \N_+\). The hypotheses generally take the form \[ H_0: \theta \in \Theta_0 \text{ versus } H_1: \theta \notin \Theta_0 \] where \(\Theta_0\) is a prescribed subset of the parameter space \(\Theta\). In this setting, the probabilities of making an error or a correct decision depend on the true value of \(\theta\). If \(R\) is the rejection region, then the power function \( Q \) is given by \[ Q(\theta) = \P_\theta(\bs{X} \in R), \quad \theta \in \Theta \] The power function gives a lot of information about the test.

The power function satisfies the following properties:

  • \(Q(\theta)\) is the probability of a type 1 error when \(\theta \in \Theta_0\).
  • \(\max\left\{Q(\theta): \theta \in \Theta_0\right\}\) is the significance level of the test.
  • \(1 - Q(\theta)\) is the probability of a type 2 error when \(\theta \notin \Theta_0\).
  • \(Q(\theta)\) is the power of the test when \(\theta \notin \Theta_0\).
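These properties can be checked numerically in a simple case. The sketch below assumes a right-tailed z-test of \(H_0: \mu \le \mu_0\) for a normal mean with known standard deviation, for which \(Q(\mu) = 1 - \Phi\left(z_{1-\alpha} - \sqrt{n}(\mu - \mu_0)/\sigma\right)\). The function name and default values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_function(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """Q(mu) = P_mu(reject H0) for the right-tailed z-test of H0: mu <= mu0."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    shift = sqrt(n) * (mu - mu0) / sigma
    return 1.0 - STD_NORMAL.cdf(critical - shift)


# For mu <= mu0 the value is a type 1 error probability, largest at mu = mu0,
# where it equals alpha. For mu > mu0 the value is the power of the test.
for mu in (-0.2, 0.0, 0.2, 0.5):
    print(f"mu={mu:+.1f}  Q(mu)={power_function(mu):.4f}")
```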

If we have two tests, we can compare them by means of their power functions.

Suppose that we have two tests, corresponding to rejection regions \(R_1\) and \(R_2\), respectively, each having significance level \(\alpha\). The test with rejection region \(R_1\) is uniformly more powerful than the test with rejection region \(R_2\) if \( Q_1(\theta) \ge Q_2(\theta)\) for all \( \theta \notin \Theta_0 \).

Most hypothesis tests of an unknown real parameter \(\theta\) fall into three special cases:

Suppose that \( \theta \) is a real parameter and \( \theta_0 \in \Theta \) a specified value. The tests below are respectively the two-sided test, the left-tailed test, and the right-tailed test.

  • \(H_0: \theta = \theta_0\) versus \(H_1: \theta \ne \theta_0\)
  • \(H_0: \theta \ge \theta_0\) versus \(H_1: \theta \lt \theta_0\)
  • \(H_0: \theta \le \theta_0\) versus \(H_1: \theta \gt \theta_0\)

Thus the tests are named after the conjectured alternative. Of course, there may be other unknown parameters besides \(\theta\) (known as nuisance parameters).

Equivalence Between Hypothesis Test and Confidence Sets

There is an equivalence between hypothesis tests and confidence sets for a parameter \(\theta\).

Suppose that \(C(\bs{x})\) is a \(1 - \alpha\) level confidence set for \(\theta\). The following test has significance level \(\alpha\) for the hypothesis \( H_0: \theta = \theta_0 \) versus \( H_1: \theta \ne \theta_0 \): Reject \(H_0\) if and only if \(\theta_0 \notin C(\bs{x})\)

By definition, \(\P[\theta \in C(\bs{X})] = 1 - \alpha\). Hence if \(H_0\) is true so that \(\theta = \theta_0\), then the probability of a type 1 error is \(\P[\theta_0 \notin C(\bs{X})] = \alpha\).

Equivalently, we fail to reject \(H_0\) at significance level \(\alpha\) if and only if \(\theta_0\) is in the corresponding \(1 - \alpha\) level confidence set. In particular, this equivalence applies to interval estimates of a real parameter \(\theta\) and the common tests for \(\theta\) given above.

In each case below, the confidence interval has confidence level \(1 - \alpha\) and the test has significance level \(\alpha\).

  • Suppose that \(\left[L(\bs{X}), U(\bs{X})\right]\) is a two-sided confidence interval for \(\theta\). Reject \(H_0: \theta = \theta_0\) versus \(H_1: \theta \ne \theta_0\) if and only if \(\theta_0 \lt L(\bs{X})\) or \(\theta_0 \gt U(\bs{X})\).
  • Suppose that \(L(\bs{X})\) is a confidence lower bound for \(\theta\). Reject \(H_0: \theta \le \theta_0\) versus \(H_1: \theta \gt \theta_0\) if and only if \(\theta_0 \lt L(\bs{X})\).
  • Suppose that \(U(\bs{X})\) is a confidence upper bound for \(\theta\). Reject \(H_0: \theta \ge \theta_0\) versus \(H_1: \theta \lt \theta_0\) if and only if \(\theta_0 \gt U(\bs{X})\).
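The two-sided case can be sketched for a normal mean with known standard deviation, where the interval is the sample mean plus or minus \(z_{1-\alpha/2}\,\sigma/\sqrt{n}\). The function names and the data summary are hypothetical. The point of the sketch is that the interval-based rule and the statistic-based rule give the same decision for every \(\theta_0\) tried.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Two-sided 1 - alpha confidence interval for the mean (sigma known)."""
    margin = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def reject_by_interval(theta0, interval):
    """Reject H0: theta = theta0 when theta0 lies outside the interval."""
    lower, upper = interval
    return theta0 < lower or theta0 > upper


def reject_by_statistic(theta0, sample_mean, sigma, n, alpha=0.05):
    """Reject H0: theta = theta0 when |z| exceeds the two-sided critical value."""
    z = (sample_mean - theta0) / (sigma / sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical data summary: n = 36, sample mean 10.4, known sigma = 1.2.
interval = z_confidence_interval(sample_mean=10.4, sigma=1.2, n=36)
for theta0 in (9.9, 10.1, 10.5, 10.9):
    print(theta0,
          reject_by_interval(theta0, interval),
          reject_by_statistic(theta0, 10.4, 1.2, 36))
```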

Pivot Variables and Test Statistics

Recall that confidence sets of an unknown parameter \(\theta\) are often constructed through a pivot variable, that is, a random variable \(W(\bs{X}, \theta)\) that depends on the data vector \(\bs{X}\) and the parameter \(\theta\), but whose distribution does not depend on \(\theta\) and is known. In this case, a natural test statistic for the basic tests given above is \(W(\bs{X}, \theta_0)\).

Hypothesis Testing and Types of Errors


Suppose we want to study the income of a population. We study a sample from the population and draw conclusions from it. For the study to be reliable, the sample should represent the population.

The null hypothesis \((H_0)\) is that the sample represents the population. Hypothesis testing provides a framework for deciding whether we have sufficient evidence to reject the null hypothesis or fail to reject it.

Population characteristics are either assumed, drawn from third-party sources, or based on the judgement of subject matter experts. Population data and sample data are characterised by the moments of their distributions (mean, variance, skewness and kurtosis). Where a population characteristic is available, we test the null hypothesis that the corresponding moments are equal and conclude whether the sample represents the population.

For example, given only the mean income of the population, we check whether the mean income of the sample is close to the population mean to conclude whether the sample represents the population.

Population and sample parameters. Source: Rolke 2018.

Population mean and population variance are denoted by the Greek letters \(\mu\) and \(\sigma^2\) respectively, while sample mean and sample variance are denoted by the Latin letters \(\bar x\) and \(s^2\) respectively.

Graphical representations of sampling error. Source: Nurse Key 2017, fig. 15-2.

  • If the difference is not significant, we conclude that the difference is due to sampling. This is called sampling error and it happens by chance.
  • If the difference is significant, we conclude that the sample does not represent the population. Chance alone is not enough to explain the difference.

Hypothesis testing helps us to conclude if the difference is due to sampling error or due to reasons beyond sampling error.

A common assumption is that the observations are independent and come from a random sample. In addition, either the population distribution must be Normal or the sample size must be large enough. If the sample size is large enough, we can invoke the Central Limit Theorem (CLT) regardless of the underlying population distribution. By the CLT, the sampling distribution of the sample statistic (such as the sample mean) will be approximately Normal.

A rule of thumb is 30 observations, but in some cases even 10 observations may be sufficient to invoke the CLT. Other cases require at least 50 observations.
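A small simulation can make the CLT claim concrete. The sketch below assumes a skewed Exponential(1) population (mean 1, standard deviation 1) and samples of 30 observations; the function name, the seed, and the number of replications are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def simulate_sample_means(sample_size, replications, seed=0):
    """Draw repeated samples from an Exponential(1) population and
    return the mean of each sample."""
    rng = random.Random(seed)
    return [
        fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replications)
    ]


means = simulate_sample_means(sample_size=30, replications=5000)

# The CLT predicts sample means centred near 1 with a standard deviation
# near 1 / sqrt(30), even though the population itself is far from Normal.
print(f"mean of sample means: {fmean(means):.3f}")
print(f"sd of sample means:   {stdev(means):.3f} (CLT: {1 / sqrt(30):.3f})")
```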

When acceptance of \(H_0\) involves boundaries on both sides, we invoke the two-tailed test. For example, if we define \(H_0\) as the sample being drawn from a population with ages in the range of 25 to 35, then testing \(H_0\) involves limits on both sides.

Suppose we define the population as people older than 50. We are interested in rejecting a sample if the age is less than or equal to 50; we are not concerned about any upper limit. Here we invoke the one-tailed test. A one-tailed test could be left-tailed or right-tailed.

  • \(H_0\): \(\mu = \$ 2.62\)
  • \(H_1\) right-tailed: \(\mu > \$ 2.62\)
  • \(H_1\) two-tailed: \(\mu \neq \$ 2.62\)

Matrix showing types of errors in hypothesis testing. Source: howMed 2013.

  • Not accepting that the sample represents the population when in reality it does. This is called type-I or \(\alpha\) error.
  • Accepting that the sample represents the population when in reality it does not. This is called type-II or \(\beta\) error.

For instance, if the null hypothesis is that an applicant is not creditworthy, granting a loan to an applicant with a low credit score is an \(\alpha\) error, and not granting a loan to an applicant with a high credit score is a \(\beta\) error.

The symbols \(\alpha\) and \(\beta\) are used to represent the probability of type-I and type-II errors respectively.

Illustration of p-value in the population distribution. Source: Heard 2015.

The p-value can be interpreted as the probability of getting a result that's the same or more extreme when the null hypothesis is true.

The observed sample mean \(\bar x\) is overlaid on the population distribution of values with mean \(\mu\) and variance \(\sigma^2\). The proportion of values beyond \(\bar x\) and away from \(\mu\) (in the left tail, in the right tail, or in both tails) is the p-value. If p-value <= \(\alpha\), we reject the null hypothesis. The results are then said to be statistically significant, that is, unlikely to be due to chance alone.

Assuming \(\alpha\)=0.05, if p-value > 5%, we lack sufficient evidence that the sample was not drawn from a population with mean \(\mu\) and variance \(\sigma^2\), and we fail to reject \(H_0\). Otherwise, we reject \(H_0\).

We preselect \(\alpha\) based on how much type-I error we're willing to tolerate. \(\alpha\) is called the level of significance. The standard level of significance is 0.05, but in some studies it may be 0.01 or 0.1. In the case of two-tailed tests, \(\alpha\) is split as \(\alpha/2\) on either side.
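To see what splitting \(\alpha\) between the tails means in practice, the sketch below computes critical values of the standard Normal distribution for one-tailed and two-tailed tests. It assumes a z-test; the function name is hypothetical.

```python
from statistics import NormalDist


def critical_values(alpha, two_tailed):
    """Critical z-values: (-z, +z) with alpha/2 in each tail for a two-tailed
    test, or a single upper cut-off for a right-tailed test."""
    std_normal = NormalDist()
    if two_tailed:
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    return (std_normal.inv_cdf(1 - alpha),)


for alpha in (0.10, 0.05, 0.01):
    one = critical_values(alpha, two_tailed=False)
    two = critical_values(alpha, two_tailed=True)
    print(f"alpha={alpha:.2f}  right-tailed: {one[0]:.3f}  "
          f"two-tailed: {two[0]:.3f}, {two[1]:.3f}")
```

At \(\alpha\) = 0.05 the right-tailed cut-off is about 1.645, while the two-tailed cut-offs are about ±1.960 because each tail holds only 2.5%.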

As sample size increases the margin of error falls. Source: Wikipedia 2018.

The Law of Large Numbers suggests that the larger the sample size, the more accurate the estimate. Accuracy here means that the variance of the estimate tends towards zero as the sample size increases. The sample size can be chosen to suit the accepted level of tolerance for deviation.

The confidence interval of the sample mean is the sample mean offset on either side by a margin of error, which is a multiple of the standard error of the mean. If the population variance is known, then we conduct a z-test based on the Normal distribution. Otherwise, the variance has to be estimated and we use a t-test based on the t-distribution.

The formulae for determining sample size and confidence interval depend on what we want to estimate (mean, variance or others), the sampling distribution of the estimate, and the standard deviation of the estimate's sampling distribution.
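For the simplest case, estimating a mean when the population standard deviation is known, one commonly used formula is \(n = (z_{1-\alpha/2}\,\sigma / E)^2\), where \(E\) is the tolerated margin of error. The sketch below assumes that setting; the function name and the numbers are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n for which the z-based margin of error is at most the target."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin_of_error) ** 2)


# Halving the tolerated margin of error roughly quadruples the sample size.
print(sample_size_for_mean(sigma=15, margin_of_error=3))
print(sample_size_for_mean(sigma=15, margin_of_error=1.5))
```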

Illustrating type-II or beta error. Source: Gordon 2011.

When we overlay the distribution of the sample mean on the population distribution, the proportion of overlap between the two is the \(\beta\) error.

The larger the overlap, the larger the chance that the sample does belong to the population with mean \(\mu\) and variance \(\sigma^2\). Incidentally, despite the overlap, the p-value may be less than 5%. This happens when the sample mean is far from the population mean, but the variance of the sample mean is such that the overlap is significant.

Understanding alpha and beta errors together. Source: McNeese 2015.

Errors \(\alpha\) and \(\beta\) are dependent on each other. For a given sample size, increasing one decreases the other. Choosing suitable values for these depends on the cost of making these errors. Perhaps it's worse to convict an innocent person (type-I error) than to acquit a guilty person (type-II error), in which case we choose a lower \(\alpha\). But it's possible to decrease both errors by collecting more data.

  • Probability of rejecting the null hypothesis when, in fact, it is false.
  • Probability that a test of significance will pick up on an effect that is present.
  • Probability of avoiding a Type II error.

A low p-value and high power help us conclude decisively that the sample doesn't belong to the population. When we cannot conclude decisively, it's advisable to use larger samples and multiple samples.

In fact, power increases with larger sample sizes, larger effect sizes and higher significance levels. Variance also affects power.
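These dependencies can be sketched for a right-tailed z-test, where power equals \(1 - \Phi\left(z_{1-\alpha} - \delta\sqrt{n}/\sigma\right)\) for a true effect \(\delta\). The z-test setting, the function name, and the baseline values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha):
    """Power of a right-tailed z-test when the true mean exceeds the
    hypothesized mean by `effect`."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)
    return 1.0 - std_normal.cdf(critical - effect * sqrt(n) / sigma)


baseline = z_test_power(effect=0.5, sigma=1.0, n=10, alpha=0.05)
print(f"baseline:          {baseline:.3f}")
print(f"larger sample:     {z_test_power(0.5, 1.0, 30, 0.05):.3f}")
print(f"larger effect:     {z_test_power(0.8, 1.0, 10, 0.05):.3f}")
print(f"higher alpha:      {z_test_power(0.5, 1.0, 10, 0.10):.3f}")
print(f"larger variance:   {z_test_power(0.5, 2.0, 10, 0.05):.3f}")
```

Only the last change lowers power: a noisier population makes the same effect harder to detect.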

Flowchart showing the classification of 1000 theoretical null hypotheses. Source: Biau et al. 2010, fig. 2.

A common misconception is to consider "p value as the probability that the null hypothesis is true". In fact, the p-value is computed under the assumption that the null hypothesis is true. The p-value is the probability of observing the values, or more extreme values, if the null hypothesis is true.

Another misconception, sometimes called base rate fallacy, is that under controlled \(\alpha\) and adequate power, statistically significant results correspond to true differences. This is not the case, as shown in the figure. Even with \(\alpha\)=5% and power=80%, 36% of statistically significant p-values will not report the true difference. This is because only 10% of the 1000 null hypotheses are false (base rate), so 80% power on these 100 gives only 80 true positives, while \(\alpha\)=5% on the 900 true null hypotheses gives 45 false positives. Hence 45 of the 125 significant results, or 36%, are false.
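The arithmetic behind the 36% figure can be checked with a short sketch. It uses the numbers quoted above (1000 hypotheses, 10% base rate, \(\alpha\) = 5%, power = 80%); the function name is hypothetical.

```python
def false_discovery_share(n_hypotheses, base_rate, alpha, power):
    """Share of statistically significant results that are false positives."""
    false_nulls = n_hypotheses * base_rate       # real effects exist here
    true_nulls = n_hypotheses - false_nulls      # no real effect here
    true_positives = power * false_nulls         # real effects detected
    false_positives = alpha * true_nulls         # type-I errors
    return false_positives / (true_positives + false_positives)


share = false_discovery_share(n_hypotheses=1000, base_rate=0.10,
                              alpha=0.05, power=0.80)
print(f"{share:.0%} of significant results are false positives")
```

Lowering the base rate in this sketch raises the share further, which is why the base rate matters as much as \(\alpha\) and power.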

The p-value doesn't measure the size of the effect, for which a confidence interval is a better approach. A drug that gives 25% improvement may not mean much if the symptoms are innocuous, compared to another drug that gives a small improvement in a disease that leads to certain death. Context is therefore important.

Early work on statistical testing. Source: Huberty 1993, table 1.

The field of statistical testing probably starts with John Arbuthnot who applies it to test sex ratios at birth. Subsequently, others in the 18th and 19th centuries use it in other fields. However, modern terminology (null hypothesis, p-value, type-I or type-II errors) is formed only in the 20th century.

Table of P values published by Pearson. Source: Pearson 1900, p. 175.

Pearson introduces the concept of p-value with the chi-squared test. He gives equations for calculating P and states that it's "the measure of the probability of a complex system of n errors occurring with a frequency as great or greater than that of the observed system."

Ronald A. Fisher develops the concept of p-value and shows how to calculate it in a wide variety of situations. He also notes that a value of 0.05 may be considered as conventional cut-off.

Illustrating both type-I and type-II errors with shaded regions. Source: Neyman and Pearson 1933, fig. 5.

Neyman and Pearson publish On the problem of the most efficient tests of statistical hypotheses. They introduce the notion of alternative hypotheses. They also describe both type-I and type-II errors (although they don't use these terms). They state, "Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong."

Johnson's textbook titled Statistical methods in research is perhaps the first to introduce students to Neyman-Pearson hypothesis testing at a time when most textbooks follow Fisher's significance testing. Johnson uses the terms "error of the first kind" and "error of the second kind". In time, Fisher's approach is called the P-value approach and the Neyman-Pearson approach is called the fixed-α approach.

Carver makes the following suggestions: use of the term "statistically significant"; interpret results with respect to the data first and statistical significance second; and pay attention to the size of the effect.

  • Biau, David Jean, Brigitte M. Jolles, and Raphaël Porcher. 2010. "P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers." Clin Orthop Relat Res, vol. 468, no. 3, pp. 885–892, March. Accessed 2021-05-28.
  • Carver, Ronald P. 1993. "The Case against Statistical Significance Testing, Revisited." The Journal of Experimental Education, vol. 61, no. 4, Statistical Significance Testing in Contemporary Practice, pp. 287-292. Accessed 2021-05-28.
  • Frankfort-Nachmias, Chava, Anna Leon-Guerrero, and Georgiann Davis. 2020. "Chapter 8: Testing Hypotheses." In: Social Statistics for a Diverse Society, SAGE Publications. Accessed 2021-05-28.
  • Gordon, Max. 2011. "How to best display graphically type II (beta) error, power and sample size?" August 11. Accessed 2018-05-18.
  • Heard, Stephen B. 2015. "In defence of the P-value" Types of Errors. February 9. Updated 2015-12-04. Accessed 2018-05-18.
  • Huberty, Carl J. 1993. "Historical Origins of Statistical Testing Practices: The Treatment of Fisher versus Neyman-Pearson Views in Textbooks." The Journal of Experimental Education, vol. 61, no. 4, Statistical Significance Testing in Contemporary Practice, pp. 317-333. Accessed 2021-05-28.
  • Kensler, Jennifer. 2013. "The Logic of Statistical Hypothesis Testing: Best Practice." Report, STAT T&E Center of Excellence. Accessed 2021-05-28.
  • Klappa, Peter. 2014. "Sampling error and hypothesis testing." On YouTube, December 10. Accessed 2021-05-28.
  • Lane, David M. 2021. "Section 10.8: Confidence Interval on the Mean." In: Introduction to Statistics, Rice University. Accessed 2021-05-28.
  • McNeese, Bill. 2015. "How Many Samples Do I Need?" SPC For Excel, BPI Consulting, June. Accessed 2018-05-18.
  • McNeese, Bill. 2017. "Interpretation of Alpha and p-Value." SPC for Excel, BPI Consulting, April 6. Updated 2020-04-25. Accessed 2021-05-28.
  • Neyman, J., and E. S. Pearson. 1933. "On the problem of the most efficient tests of statistical hypotheses." Philos Trans R Soc Lond A., vol. 231, issue 694-706, pp. 289–337. doi: 10.1098/rsta.1933.0009. Accessed 2021-05-28.
  • Nurse Key. 2017. "Chapter 15: Sampling." February 17. Accessed 2018-05-18.
  • Pearson, Karl. 1900. "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling." Philosophical Magazine, Series 5 (1876-1900), pp. 157–175. Accessed 2021-05-28.
  • Reinhart, Alex. 2015. "Statistics Done Wrong: The Woefully Complete Guide." No Starch Press.
  • Rolke, Wolfgang A. 2018. "Quantitative Variables." Department of Mathematical Sciences, University of Puerto Rico - Mayaguez. Accessed 2018-05-18.
  • Six-Sigma-Material.com. 2016. "Population & Samples." Six-Sigma-Material.com. Accessed 2018-05-18.
  • Walmsley, Angela and Michael C. Brown. 2017. "What is Power?" Statistics Teacher, American Statistical Association, September 15. Accessed 2021-05-28.
  • Wang, Jing. 2014. "Chapter 4.II: Hypothesis Testing." Applied Statistical Methods II, Univ. of Illinois Chicago. Accessed 2021-05-28.
  • Weigle, David C. 1994. "Historical Origins of Contemporary Statistical Testing Practices: How in the World Did Significance Testing Assume Its Current Place in Contemporary Analytic Practice?" Paper presented at the Annual Meeting of the Southwest Educational Research Association, SanAntonio, TX, January 27. Accessed 2021-05-28.
  • Wikipedia. 2018. "Margin of Error." May 1. Accessed 2018-05-18.
  • Wikipedia. 2021. "Law of large numbers." Wikipedia, March 26. Accessed 2021-05-28.
  • howMed. 2013. "Significance Testing and p value." August 4. Updated 2013-08-08. Accessed 2018-05-18.

Further Reading

  • Foley, Hugh. 2018. "Introduction to Hypothesis Testing." Skidmore College. Accessed 2018-05-18.
  • Buskirk, Trent. 2015. "Sampling Error in Surveys." Accessed 2018-05-18.
  • Zaiontz, Charles. 2014. "Assumptions for Statistical Tests." Real Statistics Using Excel. Accessed 2018-05-18.
  • DeCook, Rhonda. 2018. "Section 9.2: Types of Errors in Hypothesis testing." Stat1010 Notes, Department of Statistics and Actuarial Science, University of Iowa. Accessed 2018-05-18.



S.3 Hypothesis Testing

In reviewing hypothesis tests, we start first with the general idea. Then, we keep returning to the basic procedures of hypothesis testing, each time adding a little more detail.

The general idea of hypothesis testing involves:

  • Making an initial assumption.
  • Collecting evidence (data).
  • Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

Every hypothesis test — regardless of the population parameter involved — requires the above three steps.

Example S.3.1

Is Normal Body Temperature Really 98.6 Degrees F?

Consider the population of many, many adults. A researcher hypothesizes that the average adult body temperature is lower than the often-advertised 98.6 degrees F. That is, the researcher wants an answer to the question: "Is the average adult body temperature 98.6 degrees? Or is it lower?" To answer his research question, the researcher starts by assuming that the average adult body temperature is 98.6 degrees F.

Then, the researcher goes out and tries to find evidence that refutes his initial assumption. In doing so, he selects a random sample of 130 adults. The average body temperature of the 130 sampled adults is 98.25 degrees.

Then, the researcher uses the data he collected to make a decision about his initial assumption. It is either likely or unlikely that the researcher would collect the evidence he did given his initial assumption that the average adult body temperature is 98.6 degrees:

  • If it is likely, then the researcher does not reject his initial assumption that the average adult body temperature is 98.6 degrees. There is not enough evidence to do otherwise.
  • If it is unlikely, then:
      • either the researcher's initial assumption is correct and he experienced a very unusual event;
      • or the researcher's initial assumption is incorrect.

In statistics, we generally don't make claims that require us to believe that a very unusual event happened. That is, in the practice of statistics, if the evidence (data) we collected is unlikely in light of the initial assumption, then we reject our initial assumption.

Example S.3.2

Criminal Trial Analogy

One place where you can consistently see the general idea of hypothesis testing in action is in criminal trials held in the United States. Our criminal justice system assumes "the defendant is innocent until proven guilty." That is, our initial assumption is that the defendant is innocent.

In the practice of statistics, we make our initial assumption when we state our two competing hypotheses -- the null hypothesis ( H 0 ) and the alternative hypothesis ( H A ). Here, our hypotheses are:

  • H 0 : Defendant is not guilty (innocent)
  • H A : Defendant is guilty

In statistics, we always assume the null hypothesis is true. That is, the null hypothesis is always our initial assumption.

The prosecution team then collects evidence — such as finger prints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, and handwriting samples — with the hopes of finding "sufficient evidence" to make the assumption of innocence refutable.

In statistics, the data are the evidence.

The jury then makes a decision based on the available evidence:

  • If the jury finds sufficient evidence, beyond a reasonable doubt, to make the assumption of innocence refutable, the jury rejects the null hypothesis and deems the defendant guilty. We behave as if the defendant is guilty.
  • If there is insufficient evidence, then the jury does not reject the null hypothesis. We behave as if the defendant is innocent.

In statistics, we always make one of two decisions. We either "reject the null hypothesis" or we "fail to reject the null hypothesis."

Errors in Hypothesis Testing

Did you notice the use of the phrase "behave as if" in the previous discussion? We "behave as if" the defendant is guilty; we do not "prove" that the defendant is guilty. And, we "behave as if" the defendant is innocent; we do not "prove" that the defendant is innocent.

This is a very important distinction! We make our decision based on evidence not on 100% guaranteed proof. Again:

  • If we reject the null hypothesis, we do not prove that the alternative hypothesis is true.
  • If we do not reject the null hypothesis, we do not prove that the null hypothesis is true.

We merely state that there is enough evidence to behave one way or the other. This is always true in statistics! Because of this, whatever the decision, there is always a chance that we made an error.

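Let's review the two types of errors that can be made in criminal trials:

  • The jury convicts a defendant who is actually innocent.
  • The jury acquits a defendant who is actually guilty.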

Table S.3.2 shows how this corresponds to the two types of errors in hypothesis testing.

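Note that, in statistics, we call the two types of errors by two different names: one is called a "Type I error," and the other is called a "Type II error." Here are the formal definitions of the two types of errors:

  • Type I error: The null hypothesis is rejected when it is true.
  • Type II error: The null hypothesis is not rejected when it is false.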

There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!

Making the Decision

Recall that it is either likely or unlikely that we would observe the evidence we did given our initial assumption. If it is likely, we do not reject the null hypothesis. If it is unlikely, then we reject the null hypothesis in favor of the alternative hypothesis. Effectively, then, making the decision reduces to determining "likely" or "unlikely."

In statistics, there are two ways to determine whether the evidence is likely or unlikely given the initial assumption:

  • We could take the "critical value approach" (favored in many of the older textbooks).
  • Or, we could take the "P-value approach" (what is used most often in research, journal articles, and statistical software).

In the next two sections, we review the procedures behind each of these two approaches. To make our review concrete, let's imagine that \(\mu\) is the average grade point average of all American students who major in mathematics. We first review the critical value approach for conducting each of the following three hypothesis tests about the population mean \(\mu\):

  • Right-tailed: H 0 : μ = 3 versus H A : μ > 3
  • Left-tailed: H 0 : μ = 3 versus H A : μ < 3
  • Two-tailed: H 0 : μ = 3 versus H A : μ ≠ 3

In Practice

  • We would want to conduct the first hypothesis test if we were interested in concluding that the average grade point average of the group is more than 3.
  • We would want to conduct the second hypothesis test if we were interested in concluding that the average grade point average of the group is less than 3.
  • And, we would want to conduct the third hypothesis test if we were only interested in concluding that the average grade point average of the group differs from 3 (without caring whether it is more or less than 3).

Upon completing the review of the critical value approach, we review the P-value approach for conducting each of the above three hypothesis tests about the population mean \(\mu\). The procedures that we review here for both approaches easily extend to hypothesis tests about any other population parameter.
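As a rough illustration of how the two approaches lead to the same decision, the sketch below runs the right-tailed test (H 0 : μ = 3 versus H A : μ > 3) on a made-up sample of ten grade point averages. The data, the variable names, the use of a one-sample t-test, and α = 0.05 are all assumptions chosen for illustration; they are not part of the original example.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of GPAs (invented for illustration only).
gpas = np.array([3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3, 2.8, 3.7])
mu_0 = 3.0     # value of the population mean under H0
alpha = 0.05   # significance level (assumed)

n = gpas.size
# One-sample t statistic: (sample mean - mu_0) / (s / sqrt(n))
t_stat = (gpas.mean() - mu_0) / (gpas.std(ddof=1) / np.sqrt(n))

# Critical value approach (right-tailed): reject H0 if t_stat > t_crit.
t_crit = stats.t.ppf(1 - alpha, df=n - 1)
reject_by_critical_value = t_stat > t_crit

# P-value approach (right-tailed): reject H0 if p_value <= alpha.
p_value = stats.t.sf(t_stat, df=n - 1)
reject_by_p_value = p_value <= alpha

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.4f}")
print("Reject H0:", bool(reject_by_critical_value), bool(reject_by_p_value))
```

Both rules are two views of the same comparison: the test statistic exceeds the critical value exactly when the tail probability beyond it is smaller than α.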


An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors

Priya Ranganathan

1 Department of Anesthesiology, Critical Care and Pain, Tata Memorial Hospital, Mumbai, Maharashtra, India

2 Department of Surgical Oncology, Tata Memorial Centre, Mumbai, Maharashtra, India

The second article in this series on biostatistics covers the concepts of sample, population, research hypotheses and statistical errors.

How to cite this article

Ranganathan P, Pramesh CS. An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors. Indian J Crit Care Med 2019;23(Suppl 3):S230–S231.

Two papers quoted in this issue of the Indian Journal of Critical Care Medicine report the results of studies that aimed to prove that a new intervention is better than (superior to) an existing treatment. In the ABLE study, the investigators wanted to show that transfusion of fresh red blood cells would be superior to standard-issue red cells in reducing 90-day mortality in ICU patients (1). The PROPPR study was designed to prove that transfusion of a lower ratio of plasma and platelets to red cells would be superior to a higher ratio in decreasing 24-hour and 30-day mortality in critically ill patients (2). These studies are known as superiority studies (as opposed to noninferiority or equivalence studies, which will be discussed in a subsequent article).

SAMPLE VERSUS POPULATION

A sample represents a group of participants selected from the entire population. Since studies cannot be carried out on entire populations, researchers choose samples, which are representative of the population. This is similar to walking into a grocery store and examining a few grains of rice or wheat before purchasing an entire bag; we assume that the few grains that we select (the sample) are representative of the entire sack of grains (the population).

The results of the study are then extrapolated to generate inferences about the population. We do this using a process known as hypothesis testing. This means that the results of the study may not always be identical to the results we would expect to find in the population; i.e., there is the possibility that the study results may be erroneous.

HYPOTHESIS TESTING

A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the “alternate” hypothesis, and the opposite is called the “null” hypothesis; every study has a null hypothesis and an alternate hypothesis. For superiority studies, the alternate hypothesis states that one treatment (usually the new or experimental treatment) is superior to the other; the null hypothesis states that there is no difference between the treatments (the treatments are equal). For example, in the ABLE study, we start by stating the null hypothesis: there is no difference in mortality between groups receiving fresh RBCs and standard-issue RBCs. We then state the alternate hypothesis: there is a difference in mortality between groups receiving fresh RBCs and standard-issue RBCs. It is important to note that we have stated that the groups are different, without specifying which group will be better than the other. This is known as a two-tailed hypothesis, and it allows us to test for superiority on either side (using a two-sided test). This is because, when we start a study, we are not 100% certain that the new treatment can only be better than the standard treatment; it could be worse, and if so, the study should pick that up as well. One-tailed hypotheses and one-sided statistical testing are used for non-inferiority studies, which will be discussed in a subsequent paper in this series.

STATISTICAL ERRORS

There are two possibilities to consider when interpreting the results of a superiority study. The first possibility is that there is truly no difference between the treatments but the study finds that they are different. This is called a Type 1 error or false-positive error or alpha error. This means falsely rejecting the null hypothesis.

The second possibility is that there is a difference between the treatments and the study does not pick up this difference. This is called a Type 2 error or false-negative error or beta error. This means failing to reject a null hypothesis that is false (often described as falsely accepting the null hypothesis).

The power of the study is the ability to detect a difference between groups and is the converse of the beta error; i.e., power = 1-beta error. Alpha and beta errors are finalized when the protocol is written and form the basis for sample size calculation for the study. In an ideal world, we would not like any error in the results of our study; however, we would need to do the study in the entire population (infinite sample size) to be able to get a 0% alpha and beta error. These two errors enable us to do studies with realistic sample sizes, with the compromise that there is a small possibility that the results may not always reflect the truth. The basis for this will be discussed in a subsequent paper in this series dealing with sample size calculation.

Conventionally, type 1 or alpha error is set at 5%. This means that, if there is truly no difference between groups, we accept a 5% probability that the study will nevertheless find a difference by chance (false positive). Type 2 or beta error is usually set between 10% and 20%; therefore, the power of the study is 90% or 80%. This means that if there is a true difference between groups, we want an 80% (or 90%) probability that the study will detect it. For example, in the ABLE study, sample size was calculated with a type 1 error of 5% (two-sided) and power of 90% (type 2 error of 10%) (1).
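To make the link between alpha, beta and sample size more concrete, here is a minimal sketch using the common normal-approximation formula for comparing two group means, n per group = 2 (z for 1 - α/2 + z for 1 - β)² / d², where d is the standardized difference between groups. The function name `n_per_group` and the effect size d = 0.5 are assumptions for illustration. This is not the calculation used in the ABLE study, whose outcome (mortality) is a proportion and needs a different formula.

```python
from math import ceil
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.90):
    """Approximate sample size per group for a two-sided comparison of two
    means (normal approximation). `effect_size` is the standardized
    difference (difference in means divided by the common SD)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided type 1 error
    z_beta = norm.ppf(power)            # power = 1 - type 2 error
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Hypothetical standardized effect size of 0.5 (an assumption).
print(n_per_group(0.5, alpha=0.05, power=0.90))
print(n_per_group(0.5, alpha=0.05, power=0.80))
```

Asking for higher power (a smaller beta error) at the same alpha and effect size requires more participants per group.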

Table 1 gives a summary of the two types of statistical errors with an example

Statistical errors

In the next article in this series, we will look at the meaning and interpretation of ‘ p ’ value and confidence intervals for hypothesis testing.

Source of support: Nil

Conflict of interest: None


What are type I and type II errors?

The probability of rejecting the null hypothesis when it is false is equal to 1–β. This value is the power of the test.

Example of type I and type II error

To understand the interrelationship between type I and type II error, and to determine which error has more severe consequences for your situation, consider the following example.

Null hypothesis (H 0 ): μ 1 = μ 2

The two medications are equally effective.

Alternative hypothesis (H 1 ): μ 1 ≠ μ 2

The two medications are not equally effective.

A type I error occurs if the researcher rejects the null hypothesis and concludes that the two medications are different when, in fact, they are not. If the medications have the same effectiveness, the researcher may not consider this error too severe because the patients still benefit from the same level of effectiveness regardless of which medicine they take. However, if a type II error occurs, the researcher fails to reject the null hypothesis when it should be rejected. That is, the researcher concludes that the medications are the same when, in fact, they are different. This error is potentially life-threatening if the less-effective medication is sold to the public instead of the more effective one.

As you conduct your hypothesis tests, consider the risks of making type I and type II errors. If the consequences of making one type of error are more severe or costly than making the other type of error, then choose a level of significance and a power for the test that will reflect the relative severity of those consequences.


Types of Errors in Hypothesis Testing: A Practical Guide

In the world of hypothesis testing, the quest to draw meaningful conclusions about populations based on sample data is accompanied by the inevitability of errors. Understanding the types of errors that can occur during hypothesis testing is crucial for researchers, statisticians, and decision-makers. This comprehensive guide explores the intricacies of Type I and Type II errors, shedding light on their definitions, causes, consequences, and practical implications in the context of hypothesis testing.


The Basics of Hypothesis Testing

Before delving into the nuances of errors, let's briefly revisit the fundamental concepts of hypothesis testing:

1. Null Hypothesis (Ho):

A statement that there is no significant difference or effect.

2. Alternative Hypothesis (H1 or Ha):

A statement that contradicts the null hypothesis, suggesting a significant difference or effect.

3. Significance Level (α):

The probability of rejecting the null hypothesis when it is true. Commonly set at 0.05 or 5%.

4. Test Statistic:

A numerical summary of the sample data used to make a decision about the null hypothesis.

5. P-Value:

The probability of obtaining results as extreme or more extreme than the observed data, assuming the null hypothesis is true.

Type I Error (False Positive)

Definition:

A Type I error occurs when the null hypothesis is rejected even though it is actually true. In other words, it is the mistake of claiming evidence for an effect or difference that doesn't exist.

Causes:

High Significance Level (α):

Setting a high (lenient) significance level increases the probability of committing a Type I error, because α is by definition the probability of rejecting a true null hypothesis.

Sample Size:

In small samples, the variability of the data makes extreme values more likely, and when the assumptions of the test are not met this can lead to incorrect rejection of the null hypothesis more often than α suggests.

Random Variation:

Natural variation in data, especially when dealing with inherently variable phenomena, can contribute to the occurrence of Type I errors.

Consequences:

Incorrect Conclusions:

Concluding that there is a significant effect or difference when there isn't one.

Wasted Resources:

Resources may be wasted on pursuing non-existent effects, leading to misdirected efforts.
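The meaning of α as a Type I error rate can be checked by simulation. In the sketch below, both groups are drawn from the same normal distribution, so the null hypothesis of equal means is true and every rejection is a false positive. The seed, the number of trials, and the group size are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
alpha = 0.05
n_trials = 5000                  # number of simulated studies (assumed)
n_per_group = 30                 # observations per group (assumed)

# Both groups come from the same distribution, so H0 (equal means) is true
# and every rejection counted below is a Type I error.
group_a = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_per_group))
group_b = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_per_group))
p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue

type_1_rate = np.mean(p_values <= alpha)
print(f"Observed Type I error rate: {type_1_rate:.3f}")
```

When the assumptions of the test hold, as they do here, the observed rate should land close to the chosen α regardless of the sample size.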

Type II Error (False Negative)

Definition:

A Type II error occurs when the null hypothesis is not rejected even though it is actually false. In other words, it is the mistake of failing to detect a real effect or difference.

Causes:

Low Significance Level (α):

Setting a very low (strict) significance level increases the risk of overlooking a real effect.

Sample Size:

Inadequate sample sizes may lack the power to detect real effects, especially when they are subtle.

Variability:

High variability in data can make it challenging to distinguish between the null and alternative hypotheses.

Consequences:

Missed Opportunities:

Failing to identify a real effect or difference that could have practical or theoretical significance.

Incomplete Understanding:

Incomplete knowledge about the phenomenon under study, leading to potential misunderstandings.


Balancing Type I and Type II Errors

The Power of a Test:

The power of a statistical test is its ability to correctly reject a false null hypothesis, minimizing the risk of Type II errors. It is influenced by factors such as the significance level (α), sample size, effect size, and variability in the data.
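How these factors interact can be sketched for the simplest case, a right-tailed one-sample z-test with known σ, where the power has a closed form. The function name `z_test_power` and the example numbers are assumptions for illustration; other tests have different power formulas.

```python
from math import sqrt
from scipy.stats import norm

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a right-tailed one-sample z-test when the true mean
    exceeds the null value by `delta` and sigma is known."""
    z_crit = norm.ppf(1 - alpha)   # rejection threshold under H0
    # Probability that the test statistic exceeds z_crit under the alternative.
    return norm.sf(z_crit - delta * sqrt(n) / sigma)

# Hypothetical scenario: true shift of 0.5 with sigma = 1.
for n in (10, 25, 50):
    print(n, round(z_test_power(delta=0.5, sigma=1.0, n=n), 3))
```

Power rises with a larger sample, a larger effect, or a more lenient α, and falls as variability grows. With no true effect (delta = 0), the rejection probability is α itself, which is the Type I error rate.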

Practical Implications:

Adjusting Significance Level:

Researchers must carefully choose the significance level based on the consequences of Type I and Type II errors. A balance is needed to control both error rates effectively.

Increasing Sample Size:

Larger sample sizes enhance the power of a test, reducing the likelihood of Type II errors.

Consideration of Consequences:

The severity of consequences associated with each type of error should guide the decision on significance levels and sample sizes.

Real-World Applications

1. Medical Diagnostics:

In medical testing, a Type I error could lead to an incorrect diagnosis of a disease that is not present (false positive), while a Type II error could result in a failure to detect a disease that is actually present (false negative).

2. Quality Control in Manufacturing:

Type I errors may lead to the rejection of high-quality products (false positives), while Type II errors may result in accepting defective products (false negatives).

3. Criminal Justice:

In criminal trials, a Type I error corresponds to convicting an innocent person (false positive), while a Type II error involves acquitting a guilty person (false negative).

4. Market Research:

In market research, Type I errors may lead to the adoption of ineffective strategies based on false positive results, while Type II errors may result in missing out on potentially successful strategies.

5. Environmental Impact Studies:

In studies assessing environmental impacts, a Type I error may lead to unnecessary regulations based on false positive findings, while a Type II error could result in failure to detect and address real environmental threats.

Minimizing Errors: Practical Strategies

1. Adjust Significance Levels:

Choose significance levels based on the consequences of each type of error, considering the relative importance of false positives and false negatives.

2. Increase Sample Size:

Larger sample sizes improve the power of a test, reducing the risk of Type II errors.

3. Use Prior Knowledge:

Incorporate prior knowledge and expertise into the decision-making process, guiding the choice of significance levels and sample sizes.

4. Replication of Studies:

Replicating studies can help validate findings and reduce the risk of Type I errors due to random variation.

5. Continuous Monitoring:

Continuously monitor and evaluate the outcomes of decisions based on hypothesis tests, allowing for adjustments based on new information.

In the intricate landscape of hypothesis testing, the potential for errors is ever-present, and understanding their nature is essential for informed decision-making. Type I and Type II errors carry distinct consequences, and balancing the risks associated with each is crucial for designing robust experiments, formulating effective policies, and drawing reliable conclusions from data.

Researchers and decision-makers must navigate the delicate trade-off between the desire to detect real effects and the need to avoid false positives. By embracing practical strategies, considering the context of the study, and continuously refining methodologies, the impact of errors in hypothesis testing can be minimized, paving the way for more accurate and meaningful scientific advancements and decisions.


Understanding Hypothesis Testing


Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

Hypothesis testing is a statistical method used to make a statistical decision using experimental data. A hypothesis is basically an assumption that we make about a population parameter. Hypothesis testing evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data.

Example: You say that the average height in the class is 30, or that boys are taller than girls. Each of these is an assumption, and we need a statistical way to check whether the data support it.

Defining Hypotheses

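  • Null hypothesis (H 0 ): The default statement that there is no effect, difference, or relationship. For a population mean μ, it typically takes the form H 0 : μ = μ 0 , where μ 0 is the hypothesized value.
  • Alternative hypothesis (H 1 ): The statement that contradicts the null hypothesis, for example H 1 : μ ≠ μ 0 .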

Key Terms of Hypothesis Testing

  • Level of significance (α): The probability of rejecting the null hypothesis when it is true. It is commonly set at 0.05 (5%).
  • P-value: The P-value, or calculated probability, is the probability of obtaining results at least as extreme as those observed when the null hypothesis (H 0 ) is true. If the P-value is less than or equal to the chosen significance level, you reject the null hypothesis, i.e. you conclude that the sample supports the alternative hypothesis.
  • Test Statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is compared to a critical value, or converted to a p-value, to decide whether the observed results are statistically significant.
  • Critical value: The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: Degrees of freedom are associated with the variability or freedom one has in estimating a parameter. They are related to the sample size and determine the shape of the reference distribution (for example, the t-distribution).

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive population statements to determine which statement is most supported by sample data. When we say that findings are statistically significant, it is hypothesis testing that justifies the claim.

One-Tailed and Two-Tailed Test

A one-tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the test statistic falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.

One-Tailed Test

There are two types of one-tailed test:

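  • Left-tailed (left-sided) test: The alternative hypothesis states that the true parameter value is less than the value in the null hypothesis. Example: H 0 : μ ≥ 50 and H 1 : μ < 50.
  • Right-tailed (right-sided) test: The alternative hypothesis states that the true parameter value is greater than the value in the null hypothesis. Example: H 0 : μ ≤ 50 and H 1 : μ > 50.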

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: H 0 : μ = 50 and H 1 : μ ≠ 50.

What are Type 1 and Type 2 errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

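  • Type I error: Rejecting the null hypothesis when it is actually true. The probability of a Type I error is denoted by alpha (α).
  • Type II error: Failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted by beta (β).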

How does Hypothesis Testing work?

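Step 1 – Define Null and Alternative Hypotheses

State the null hypothesis (H 0 ), representing no effect, and the alternative hypothesis (H 1 ), suggesting an effect or difference.

We first identify the problem about which we want to make an assumption, keeping in mind that the two hypotheses must contradict one another. The examples below assume normally distributed data.

Step 2 – Choose significance level

Select a significance level (α), typically 0.05, as the threshold for rejecting the null hypothesis.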

Step 3 – Collect and Analyze data.

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4 – Calculate Test Statistic

In this step, the data are evaluated and a score is computed based on the characteristics of the data. The choice of the test statistic depends on the type of hypothesis test being conducted.

There are various hypothesis tests, each appropriate for a different goal. The test could be a Z-test, Chi-square test, T-test, and so on.

  • Z-test: If the population standard deviation is known (and the sample is large or the data are normally distributed), the Z-statistic is commonly used.
  • t-test: If the population standard deviation is unknown and the sample size is small, the t-statistic is more appropriate.
  • Chi-square test: The Chi-square test is used for categorical data or for testing independence in contingency tables.
  • F-test: The F-test is often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

We have a small dataset, so a T-test is more appropriate for testing our hypothesis.

T-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.

Step 5 – Comparing Test Statistic:

In this stage, we decide whether to reject or fail to reject the null hypothesis. There are two ways to make this decision.

Method A: Using Critical values

Comparing the test statistic with the tabulated critical value, we have (shown for a right-tailed test; for a two-tailed test, compare the absolute value of the test statistic, and for a left-tailed test, reject when the test statistic is below the negative critical value):

  • If Test Statistic > Critical Value: Reject the null hypothesis.
  • If Test Statistic ≤ Critical Value: Fail to reject the null hypothesis.

Note: Critical values are predetermined threshold values that are used to make a decision in hypothesis testing. To determine critical values for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables.
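Instead of a printed table, critical values can also be obtained from the inverse cumulative distribution function. The sketch below uses `scipy.stats` for this; α = 0.05 and 9 degrees of freedom are example values chosen for illustration.

```python
from scipy import stats

alpha = 0.05
df = 9   # degrees of freedom, e.g. n - 1 for a one-sample t-test with n = 10

# Two-tailed test: split alpha across both tails and compare with |t|.
t_crit_two_tailed = stats.t.ppf(1 - alpha / 2, df)
# Right-tailed test: all of alpha sits in the upper tail.
t_crit_right_tailed = stats.t.ppf(1 - alpha, df)
# Standard normal critical value for a two-tailed z-test.
z_crit_two_tailed = stats.norm.ppf(1 - alpha / 2)

print(f"t (two-tailed):   {t_crit_two_tailed:.3f}")
print(f"t (right-tailed): {t_crit_right_tailed:.3f}")
print(f"z (two-tailed):   {z_crit_two_tailed:.3f}")
```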

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If p ≤ α: Reject the null hypothesis.
  • If p > α: Fail to reject the null hypothesis.

Note: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value for hypothesis testing, we typically use statistical software or refer to a statistical distribution table, such as the normal distribution or t-distribution tables.
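The definition of the p-value translates directly into tail probabilities of the reference distribution. Below is a minimal sketch for a t-distributed test statistic; the helper name `t_test_p_value` and its `tail` argument are hypothetical names introduced for illustration.

```python
from scipy import stats

def t_test_p_value(t_stat, df, tail="two"):
    """P-value for a t statistic with `df` degrees of freedom.
    `tail` is "two", "right", or "left"."""
    if tail == "two":
        # Probability of a value at least as extreme in either direction.
        return 2 * stats.t.sf(abs(t_stat), df)
    if tail == "right":
        return stats.t.sf(t_stat, df)    # upper-tail probability
    if tail == "left":
        return stats.t.cdf(t_stat, df)   # lower-tail probability
    raise ValueError("tail must be 'two', 'right', or 'left'")

alpha = 0.05
p = t_test_p_value(2.5, df=9, tail="two")   # example statistic (assumed)
print(f"p = {p:.4f}:", "reject H0" if p <= alpha else "fail to reject H0")
```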

Step 6 – Interpret the Results

At last, we can conclude our experiment using method A or B.

Calculating test statistic

To validate our hypothesis about a population parameter, we use statistical functions. For normally distributed data, we use the z-score, the p-value, and the level of significance (α) to build evidence for or against our hypothesis.

1. Z-statistics:

Used when the population mean and standard deviation are known.

z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}

  • x̄ is the sample mean,
  • μ represents the population mean,
  • σ is the population standard deviation,
  • and n is the size of the sample.
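As a quick numerical check of the formula, here is a small sketch that computes the z-statistic and its two-sided p-value. The function name `z_statistic` and the numbers (sample mean 52, population mean 50, σ = 4, n = 16) are made up for illustration.

```python
from math import sqrt
from scipy.stats import norm

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))"""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

# Hypothetical values, chosen so the arithmetic is easy to follow.
z = z_statistic(sample_mean=52, pop_mean=50, pop_sd=4, n=16)
p_two_sided = 2 * norm.sf(abs(z))   # two-tailed p-value

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

Here the standard error is 4 / √16 = 1, so the sample mean lies two standard errors above the hypothesized mean.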

2. T-Statistics

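The t-test is used when the population standard deviation is unknown, typically with a small sample (n < 30).

The t-statistic calculation is given by:

t = \frac{\bar{x} - \mu}{s / \sqrt{n}}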

  • t = t-score,
  • x̄ = sample mean
  • μ = population mean,
  • s = standard deviation of the sample,
  • n = sample size

3. Chi-Square Test

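The Chi-Square Test for Independence is used for categorical data (not normally distributed):

\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

  • O_{ij} is the observed frequency in cell (i, j),
  • E_{ij} is the expected frequency in cell (i, j), computed as (row i total × column j total) / total number of observations,
  • i, j are the row and column indices respectively.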
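The chi-square formula can be sketched on a small contingency table. The 2 × 2 counts below are invented for illustration. The manual calculation is compared with `scipy.stats.chi2_contingency`, called with `correction=False` so that SciPy does not apply the Yates continuity correction it would otherwise use for 2 × 2 tables.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table of observed counts (rows: groups, columns: outcomes).
observed = np.array([[30, 10],
                     [20, 40]])

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Same test via SciPy, without the continuity correction.
chi2, p_value, dof, expected_scipy = chi2_contingency(observed, correction=False)

print(f"manual chi2 = {chi2_manual:.3f}, scipy chi2 = {chi2:.3f}")
print(f"dof = {dof}, p = {p_value:.6f}")
```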

Real life Hypothesis Testing example

Let’s examine hypothesis testing using two real life situations,

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1 : Define the Hypothesis

  • Null Hypothesis : (H 0 )The new drug has no effect on blood pressure.
  • Alternate Hypothesis : (H 1 )The new drug has an effect on blood pressure.

Step 2: Define the Significance level

Let’s set the significance level at 0.05: we reject the null hypothesis if the evidence suggests less than a 5% chance of observing results this extreme due to random variation alone.

Step 3 : Compute the test statistic

Using a paired T-test, analyze the data to obtain a test statistic and a p-value.

The test statistic (e.g., T-statistic) is calculated based on the differences between blood pressure measurements before and after treatment.

t = m/(s/√n)

  • m = mean of the differences d i ,
  • s = standard deviation of the differences d i , where d i = X after, i − X before, i ,
  • n = sample size.

Then m = -3.9, s ≈ 1.37 and n = 10.

Using the formula for the paired t-test, we calculate the T-statistic: t = -3.9 / (1.37 / √10) ≈ -9.

Step 4: Find the p-value

With a calculated t-statistic of -9 and degrees of freedom df = n − 1 = 9, you can find the two-tailed p-value using statistical software or a t-distribution table.

Thus, p-value = 8.538051223166285e-06 (about 8.5 × 10^-6).

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Hypothesis Testing

Let’s implement hypothesis testing in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired T-test from the scipy.stats library.

SciPy is an open-source Python library for scientific and numerical computing.

We will implement our first real-life problem (Case A) in Python:
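The article's original listing is not reproduced here, so the following is a minimal sketch of how the Case A data could be analyzed with `scipy.stats.ttest_rel`. The variable names and the manual cross-check are choices made for this illustration, not necessarily the author's code.

```python
import math
import statistics
from scipy import stats

# Blood pressure measurements for the same 10 patients (Case A data)
before = [120, 122, 118, 130, 125, 128, 115, 121, 123, 119]
after = [115, 120, 112, 128, 122, 125, 110, 117, 119, 114]

# Paired t-test on the after-minus-before differences (two-sided by default)
t_stat, p_value = stats.ttest_rel(after, before)

# Manual cross-check of t = m / (s / sqrt(n)) using the standard library
diffs = [a - b for a, b in zip(after, before)]
m = statistics.mean(diffs)
s = statistics.stdev(diffs)  # sample standard deviation of the differences
t_manual = m / (s / math.sqrt(len(diffs)))

alpha = 0.05
print(f"T-statistic: {t_stat:.4f} (manual: {t_manual:.4f})")
print(f"P-value: {p_value:.3e}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Passing `after` first makes the differences after − before, so a drop in blood pressure shows up as a negative T-statistic.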

In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05. 

  • The results suggest that the new drug has a statistically significant effect on blood pressure.
  • The negative T-statistic indicates that the mean blood pressure after treatment is lower than the mean blood pressure of the same patients before treatment.

Case B : Cholesterol level in a population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Population Mean (μ): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL (given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H0): The average cholesterol level in a population is 200 mg/dL.
  • Alternate Hypothesis (H1): The average cholesterol level in a population is different from 200 mg/dL.

Step 2: Define the Significance Level

As the direction of deviation is not given, we assume a two-tailed test. Based on the standard normal (z) table, the critical values for a significance level of 0.05 (two-tailed) are approximately -1.96 and 1.96.

Step 3: Compute the test statistic

The sample mean of the 25 measurements is 202.04 mg/dL, so the z-statistic is:

z = (202.04 - 200) / (5 \div \sqrt{25}) = 2.04

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.
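The Case B calculation can be reproduced with a short sketch that uses only Python's standard library (`statistics.NormalDist`, available from Python 3.8). It computes the sample mean directly from the listed values. The variable names are illustrative, and the two-tailed p-value is an addition that the article does not report.

```python
import math
from statistics import NormalDist, mean

# Cholesterol levels (mg/dL) for the 25 sampled individuals
cholesterol = [205, 198, 210, 190, 215, 205, 200, 192, 198, 205,
               198, 202, 208, 200, 205, 198, 205, 210, 192, 205,
               198, 205, 210, 192, 205]

mu_0 = 200    # hypothesized population mean
sigma = 5     # population standard deviation (given)
alpha = 0.05  # significance level

n = len(cholesterol)
x_bar = mean(cholesterol)
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

std_normal = NormalDist()                        # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha / 2)       # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-tailed p-value

print(f"sample mean = {x_bar:.2f}, z = {z:.2f}, critical value = ±{z_crit:.2f}")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")
```

Comparing |z| with the critical value and comparing the p-value with alpha lead to the same decision. Both are shown only to connect the two views of the test.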

Limitations of Hypothesis Testing

  • Although a useful technique, hypothesis testing does not offer a comprehensive grasp of the topic being studied. Without fully reflecting the intricacy or whole context of the phenomena, it concentrates on certain hypotheses and statistical significance.
  • The accuracy of hypothesis testing results is contingent on the quality of available data and the appropriateness of statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.
  • Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

Frequently Asked Questions (FAQs)

1. What are the 3 types of hypothesis test?

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. Right-tailed tests assess if a parameter is greater, left-tailed if lesser. Two-tailed tests check for non-directional differences, greater or lesser.

2. What are the 4 components of hypothesis testing?

Null Hypothesis (H0): No effect or difference exists. Alternative Hypothesis (H1): An effect or difference exists. Significance Level (α): Risk of rejecting the null hypothesis when it’s true (Type I error). Test Statistic: Numerical value representing the observed evidence against the null hypothesis.

3. What is hypothesis testing in ML?

Hypothesis testing in ML is a statistical method for evaluating the performance and validity of machine learning models. It tests specific hypotheses about model behavior, such as whether features influence predictions or whether a model generalizes well to unseen data.

4. What is the difference between Pytest and Hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing framework for Python that generates test cases from specified properties of the code.

COMMENTS

  1. Types I & Type II Errors in Hypothesis Testing

    Ideally, a hypothesis test fails to reject the null hypothesis when the effect is not present in the population, and it rejects the null hypothesis when the effect exists. Statisticians define two types of errors in hypothesis testing. Creatively, they call these errors Type I and Type II errors.

  2. Type I & Type II Errors

    Using hypothesis testing, you can make decisions about whether your data support or refute your research predictions with null and alternative hypotheses. Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population—this is the null hypothesis.

  3. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  4. 6.1

    6.1 - Type I and Type II Errors. When conducting a hypothesis test there are two possible decisions: reject the null hypothesis or fail to reject the null hypothesis. You should remember though, hypothesis testing uses data from a sample to make an inference about a population. When conducting a hypothesis test we do not know the population ...

  5. 9.3: Outcomes and the Type I and Type II Errors

    The following are examples of Type I and Type II errors. Example 9.3.1: Type I vs. Type II errors. Suppose the null hypothesis, H0, is: Frank's rock climbing equipment is safe. Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.

  6. Type I & Type II Errors

    Using hypothesis testing, you can make decisions about whether your data support or refute your research predictions with null and alternative hypotheses. Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population—this is the null hypothesis.

  7. 8.2: Type I and II Errors

    The critical value is a cutoff point on the horizontal axis of the sampling distribution that you can compare your test statistic to see if you should reject the null hypothesis. For a left-tailed test the critical value will always be on the left side of the sampling distribution, the right-tailed test will always be on the right side, and a ...

  8. 9.2 Outcomes and the Type I and Type II Errors

    Introduction; 9.1 Null and Alternative Hypotheses; 9.2 Outcomes and the Type I and Type II Errors; 9.3 Distribution Needed for Hypothesis Testing; 9.4 Rare Events, the Sample, and the Decision and Conclusion; 9.5 Additional Information and Full Hypothesis Test Examples; 9.6 Hypothesis Testing of a Single Mean and Single Proportion; Key Terms; Chapter Review; Formula Review

  9. 9.1: Introduction to Hypothesis Testing

    In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis. The null hypothesis is usually denoted H0 while the alternative hypothesis is usually denoted H1. A hypothesis test is a statistical decision; the conclusion will ...

  10. Hypothesis testing, type I and type II errors

    The alternative hypothesis cannot be tested directly; it is accepted by exclusion if the test of statistical significance rejects the null hypothesis. One- and two-tailed alternative hypotheses A one-tailed (or one-sided) hypothesis specifies the direction of the association between the predictor and outcome variables.

  11. Type I, II, and III statistical errors: A brief overview

    When conducting hypothesis testing, one must guard against the possibility of Type I and II errors, since both have the potential to adversely affect healthcare decisions and policies, particularly if treatments and interventions are either promoted inappropriately or withheld due to inability to detect their true impact.

  12. Statistical hypothesis test

    A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. ... Hypothesis testing (and Type I/II errors) was devised by Neyman and Pearson as a more objective alternative to Fisher's p-value, ...

  13. Hypothesis Testing and Types of Errors

    Hypothesis Testing and Types of Errors. Illustrating a sample drawn from a population. Source: Six-Sigma-Material.com. Suppose we want to study income of a population. We study a sample from the population and draw conclusions. The sample should represent the population for our study to be a reliable one. Null hypothesis (H0) is that ...

  14. S.3 Hypothesis Testing

    hypothesis testing. S.3 Hypothesis Testing. In reviewing hypothesis tests, we start first with the general idea. Then, we keep returning to the basic procedures of hypothesis testing, each time adding a little more detail. The general idea of hypothesis testing involves: Making an initial assumption. Collecting evidence (data).

  15. An Introduction to Statistics: Understanding Hypothesis Testing and

    An Introduction to Statistics: Understanding Hypothesis Testing and Statistical Errors. Indian J Crit Care Med 2019;23 (Suppl 3):S230-S231. Two papers quoted in this issue of the Indian Journal of Critical Care Medicine report. The results of studies aim to prove that a new intervention is better than (superior to) an existing treatment.

  16. Introduction to Type I and Type II errors (video)

    And the null hypothesis tends to be kind of what was always assumed or the status quo while the alternative hypothesis, hey, there's news here, there's something alternative here. And to test it, and we're really testing the null hypothesis. We're gonna decide whether we want to reject or fail to reject the null hypothesis, we take a sample.

  17. What are type I and type II errors?

    No hypothesis test is 100% certain. Because the test is based on probabilities, there is always a chance of making an incorrect conclusion. When you do a hypothesis test, two types of errors are possible: type I and type II. The risks of these two errors are inversely related and determined by the level of significance and the power for the test.

  18. Hypothesis Testing along with Type I & Type II Errors explained simply

    Type I and Type II Errors. This type of statistical analysis is prone to errors. In the above example, it might be the case that the 20 students chosen are already very engaged and we wrongly decided the high mean engagement ratio is because of the new feature. ... I hope the term Hypothesis Testing will no longer be a foreign concept. Although ...

  19. Clarifying Type I and Type II Errors in Hypothesis Testing

    This case is a type I error, which is more generally referred to as a false positive. In hypothesis testing, you need to decide what degree of confidence, or trust, for which you can dismiss the null hypothesis. If a scientist were to set alpha (𝛼) =.05, this means that there is a 5 percent probability that they would reject the null ...

  20. Type I & II Errors and Sample Size Calculation in Hypothesis Testing

    In the world of statistics and data analysis, hypothesis testing is a fundamental concept that plays a vital role in making informed decisions. In this blog, we will delve deeper into hypothesis testing, specifically focusing on how to reduce type I and type II errors. We will discuss the factors that influence these errors, such as significance level, sample ...

  21. Understanding Type-I and Type-II Errors in Hypothesis Testing

    The below post is written to provide an intuitive yet detailed explanation of Type-I and Type-II errors that happen during statistical hypothesis testing. Hypothesis Testing. Hypothesis Testing is the domain wherein data scientists test their assumptions (or hypothesis) around population parameters by observing the sample data.

  22. Types of Errors in Hypothesis Testing: A Practical Guide

    Understanding the types of errors that can occur during hypothesis testing is crucial for researchers, statisticians, and decision-makers. This comprehensive guide explores the intricacies of Type I and Type II errors, shedding light on their definitions, causes, consequences, and practical implications in the context of hypothesis testing.

  23. Understanding Hypothesis Testing

    In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.