25.3 Calculating Sample Size

Before we learn how to calculate the sample size that is necessary to achieve a hypothesis test with a certain power, it might behoove us to understand the effect that sample size has on power. Let's investigate by returning to our IQ example.

Example 25-3

Let \(X\) denote the IQ of a randomly selected adult American. Assume, a bit unrealistically again, that \(X\) is normally distributed with unknown mean \(\mu\) and (a strangely known) standard deviation of 16. This time, instead of taking a random sample of \(n=16\) adults, let's increase the sample size to \(n=64\). And, while setting the probability of committing a Type I error to \(\alpha=0.05\), test the null hypothesis \(H_0:\mu=100\) against the alternative hypothesis that \(H_A:\mu>100\).

What is the power of the hypothesis test when \(\mu=108\), \(\mu=112\), and \(\mu=116\)?

Setting \(\alpha\), the probability of committing a Type I error, to 0.05, implies that we should reject the null hypothesis when the test statistic \(Z\ge 1.645\), or equivalently, when the observed sample mean is 103.29 or greater:

\( \bar{x} = \mu + z \left(\dfrac{\sigma}{\sqrt{n}} \right) = 100 +1.645\left(\dfrac{16}{\sqrt{64}} \right) = 103.29\)

Therefore, the power function \(K(\mu)\), when \(\mu>100\) is the true value, is:

\( K(\mu) = P(\bar{X} \ge 103.29 | \mu) = P \left(Z \ge \dfrac{103.29 - \mu}{16 / \sqrt{64}} \right) = 1 - \Phi \left(\dfrac{103.29 - \mu}{2} \right)\)

Therefore, the probability of rejecting the null hypothesis at the \(\alpha=0.05\) level when \(\mu=108\) is 0.9907, as calculated here:

\(K(108) = 1 - \Phi \left( \dfrac{103.29-108}{2} \right) = 1- \Phi(-2.355) = 0.9907 \)

And, the probability of rejecting the null hypothesis at the \(\alpha=0.05\) level when \(\mu=112\) is greater than 0.9999, as calculated here:

\( K(112) = 1 - \Phi \left( \dfrac{103.29-112}{2} \right) = 1- \Phi(-4.355) = 0.9999\ldots \)

And, the probability of rejecting the null hypothesis at the \(\alpha=0.05\) level when \(\mu=116\) is greater than 0.999999, as calculated here:

\( K(116) = 1 - \Phi \left( \dfrac{103.29-116}{2} \right) = 1- \Phi(-6.355) = 0.999999\ldots \)

In summary, in the various examples throughout this lesson, we have calculated the power of testing \(H_0:\mu=100\) against \(H_A:\mu>100\) for two sample sizes ( \(n=16\) and \(n=64\)) and for three possible values of the mean ( \(\mu=108\), \(\mu=112\), and \(\mu=116\)). Here's a summary of our power calculations:
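
All six power values follow directly from the power function above. Here is a minimal R sketch that tabulates them (the helper name power_K is ours, not the lesson's):

```r
# Power of the one-sided Z-test of H0: mu = 100 vs. HA: mu > 100,
# with sigma = 16 and alpha = 0.05, as a function of mu and n
power_K <- function(mu, n, mu0 = 100, sigma = 16, alpha = 0.05) {
  cutoff <- mu0 + qnorm(1 - alpha) * sigma / sqrt(n)  # reject H0 when x-bar >= cutoff
  1 - pnorm((cutoff - mu) / (sigma / sqrt(n)))        # K(mu) = P(X-bar >= cutoff | mu)
}

sapply(c(n16 = 16, n64 = 64), function(n) power_K(c(108, 112, 116), n))
# mu = 108:  0.639 (n = 16)   0.991 (n = 64)
# mu = 112:  0.912 (n = 16)   0.99999 (n = 64)
# mu = 116:  0.991 (n = 16)   ~1 (n = 64)
```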

As you can see, our work suggests that for a given value of the mean \(\mu\) under the alternative hypothesis, the larger the sample size \(n\), the greater the power \(K(\mu)\). Perhaps there is no better way to see this than graphically, by plotting the two power functions simultaneously, one for \(n=16\) and the other for \(n=64\):
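
The plot can be regenerated in base R by reusing the power_K helper from the sketch above:

```r
mu <- seq(100, 120, by = 0.1)
plot(mu, power_K(mu, n = 16), type = "l", lty = 2, ylim = c(0, 1),
     xlab = expression(mu), ylab = expression(K(mu)))
lines(mu, power_K(mu, n = 64), lty = 1)
legend("bottomright", legend = c("n = 16", "n = 64"), lty = c(2, 1))
```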

As this plot suggests, if we are interested in increasing our chance of rejecting the null hypothesis when the alternative hypothesis is true, we can do so by increasing our sample size \(n\). This benefit is greatest for values of the mean that are close to the value assumed under the null hypothesis. Let's take a look at two examples that illustrate the kind of sample size calculation we can make to ensure our hypothesis test has sufficient power.

Example 25-4


Let \(X\) denote the crop yield of corn measured in the number of bushels per acre. Assume (unrealistically) that \(X\) is normally distributed with unknown mean \(\mu\) and standard deviation \(\sigma=6\). An agricultural researcher is working to increase the current average yield from 40 bushels per acre. Therefore, he is interested in testing, at the \(\alpha=0.05\) level, the null hypothesis \(H_0:\mu=40\) against the alternative hypothesis that \(H_A:\mu>40\). Find the sample size \(n\) that is necessary to achieve 0.90 power at the alternative \(\mu=45\).

As is always the case, we need to start by finding a threshold value \(c\), such that if the sample mean is larger than \(c\), we'll reject the null hypothesis:

That is, in order for our hypothesis test to be conducted at the \(\alpha=0.05\) level, the following statement must hold (using our typical \(Z\) transformation):

\(c = 40 + 1.645 \left( \dfrac{6}{\sqrt{n}} \right) \) (*)

But, that's not the only condition that \(c\) must meet, because \(c\) also needs to be defined to ensure that our power is 0.90 or, alternatively, that the probability of a Type II error is 0.10. That would happen if there was a 10% chance that our test statistic fell short of \(c\) when \(\mu=45\), as the following drawing illustrates in blue:

This illustration suggests that in order for our hypothesis test to have 0.90 power, the following statement must hold (using our usual \(Z\) transformation):

\(c = 45 - 1.28 \left( \dfrac{6}{\sqrt{n}} \right) \) (**)

Aha! We have two asterisked equations, (*) and (**), and two unknowns! All we need to do is equate the equations and solve for \(n\). Doing so, we get:

\(40+1.645\left(\dfrac{6}{\sqrt{n}}\right)=45-1.28\left(\dfrac{6}{\sqrt{n}}\right) \qquad \Rightarrow \qquad 5=(1.645+1.28)\left(\dfrac{6}{\sqrt{n}}\right) \qquad \Rightarrow \qquad 5=\dfrac{17.55}{\sqrt{n}}\)

so that \(\sqrt{n}=17.55/5=3.51\) and \(n=(3.51)^2=12.3201\approx 13\).

Now that we know we will set \(n=13\), we can solve for our threshold value \(c\):

\( c = 40 + 1.645 \left( \dfrac{6}{\sqrt{13}} \right)=42.737 \)
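
The same two-equation solution is easy to script. A hedged R sketch of the arithmetic above (the variable names are ours):

```r
mu0 <- 40; mu1 <- 45; sigma <- 6
z_alpha <- qnorm(0.95)  # 1.645, for alpha = 0.05
z_beta  <- qnorm(0.90)  # 1.28, for power = 0.90
n <- ceiling(((z_alpha + z_beta) * sigma / (mu1 - mu0))^2)  # 12.32 -> 13
cutoff <- mu0 + z_alpha * sigma / sqrt(n)                   # 42.737
c(n = n, cutoff = round(cutoff, 3))
```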

So, in summary, if the agricultural researcher collects data on \(n=13\) corn plots, and rejects his null hypothesis \(H_0:\mu=40\) if the average crop yield of the 13 plots is greater than 42.737 bushels per acre, he will have a 5% chance of committing a Type I error and a 10% chance of committing a Type II error if the population mean \(\mu\) were actually 45 bushels per acre.

Example 25-5


Consider \(p\), the true proportion of voters who favor a particular political candidate. A pollster is interested in testing, at the \(\alpha=0.01\) level, the null hypothesis \(H_0:p=0.5\) against the alternative hypothesis that \(H_A:p>0.5\). Find the sample size \(n\) that is necessary to achieve 0.80 power at the alternative \(p=0.55\).

In this case, because we are interested in performing a hypothesis test about a population proportion \(p\), we use the \(Z\)-statistic:

\(Z = \dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)

Again, we start by finding a threshold value \(c\), such that if the observed sample proportion is larger than \(c\), we'll reject the null hypothesis:

That is, in order for our hypothesis test to be conducted at the \(\alpha=0.01\) level, the following statement must hold:

\(c = 0.5 + 2.326 \sqrt{ \dfrac{(0.5)(0.5)}{n}} \) (*)

But, again, that's not the only condition that \(c\) must meet, because \(c\) also needs to be defined to ensure that our power is 0.80 or, alternatively, that the probability of a Type II error is 0.20. That would happen if there was a 20% chance that our test statistic fell short of \(c\) when \(p=0.55\), as the following drawing illustrates in blue:

This illustration suggests that in order for our hypothesis test to have 0.80 power, the following statement must hold:

\(c = 0.55 - 0.842 \sqrt{ \dfrac{(0.55)(0.45)}{n}} \) (**)

Again, we have two asterisked equations, (*) and (**), and two unknowns! All we need to do is equate the equations and solve for \(n\). Doing so, we get:

\(0.5+2.326\sqrt{\dfrac{0.5(0.5)}{n}}=0.55-0.842\sqrt{\dfrac{0.55(0.45)}{n}} \\ 2.326\dfrac{\sqrt{0.25}}{\sqrt{n}}+0.842\dfrac{\sqrt{0.2475}}{\sqrt{n}}=0.55-0.5 \\ \dfrac{1}{\sqrt{n}}(1.5818897)=0.05 \qquad \Rightarrow n\approx \left(\dfrac{1.5818897}{0.05}\right)^2 = 1000.95 \approx 1001 \)

Now that we know we will set \(n=1001\), we can solve for our threshold value \(c\):

\(c = 0.5 + 2.326 \sqrt{\dfrac{(0.5)(0.5)}{1001}}= 0.5367 \)
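
Here is the analogous R sketch for the proportion case, including checks of \(\alpha\) and \(\beta\) that anticipate the verification below (again, the variable names are ours):

```r
p0 <- 0.5; p1 <- 0.55
z_alpha <- qnorm(0.99)  # 2.326, for alpha = 0.01
z_beta  <- qnorm(0.80)  # 0.842, for power = 0.80
numer <- z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
n <- ceiling((numer / (p1 - p0))^2)                 # 1001
cutoff <- p0 + z_alpha * sqrt(p0 * (1 - p0) / n)    # 0.5367
1 - pnorm((cutoff - p0) / sqrt(p0 * (1 - p0) / n))  # alpha: 0.01
pnorm((cutoff - p1) / sqrt(p1 * (1 - p1) / n))      # beta: ~0.199
```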

So, in summary, if the pollster collects data on \(n=1001\) voters, and rejects his null hypothesis \(H_0:p=0.5\) if the proportion of sampled voters who favor the political candidate is greater than 0.5367, he will have a 1% chance of committing a Type I error and a 20% chance of committing a Type II error if the population proportion \(p\) were actually 0.55.

Incidentally, we can always check our work! Conducting the survey and subsequent hypothesis test as described above, the probability of committing a Type I error is:

\(\alpha= P(\hat{p} >0.5367 \text { if } p = 0.50) = P(Z > 2.3257) = 0.01 \)

and the probability of committing a Type II error is:

\(\beta = P(\hat{p} <0.5367 \text { if } p = 0.55) = P(Z < -0.846) = 0.199 \)

just as the pollster had desired.

We've illustrated several sample size calculations. Now, let's summarize the information that goes into a sample size calculation. In order to determine a sample size for a given hypothesis test, you need to specify:

  • The desired \(\alpha\) level, that is, your willingness to commit a Type I error.
  • The desired power or, equivalently, the desired \(\beta\) level, that is, your willingness to commit a Type II error.
  • A meaningful difference from the value of the parameter that is specified in the null hypothesis.
  • The standard deviation of the sample statistic or, at least, an estimate of the standard deviation (the "standard error") of the sample statistic.


Power and Sample Size Determination


In the module on hypothesis testing for means and proportions, we introduced techniques for means, proportions, differences in means, and differences in proportions. While each test involved details that were specific to the outcome of interest (e.g., continuous or dichotomous) and to the number of comparison groups (one, two, more than two), there were common elements to each test. For example, in each test of hypothesis there are two errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H0 when in fact it is true. In the first step of any test of hypothesis, we select a level of significance, α = P(Type I error) = P(Reject H0 | H0 is true). Because we purposely select a small value for α, we control the probability of committing a Type I error. The second type of error is called a Type II error; it is the probability that we do not reject H0 when it is false: β = P(Type II error) = P(Do not reject H0 | H0 is false). In hypothesis testing, we usually focus on power, defined as the probability that a test correctly rejects a false null hypothesis: power = 1 − β = P(Reject H0 | H0 is false). A good test is one with a low probability of committing a Type I error (i.e., small α) and high power (i.e., small β).

Here we present formulas to determine the sample size required to ensure that a test has high power. The sample size computations depend on the level of significance, α, the desired power of the test (equivalent to 1 − β), the variability of the outcome, and the effect size. The effect size is the difference in the parameter of interest that represents a clinically meaningful difference. Similar to the margin of error in confidence interval applications, the effect size is determined based on clinical or practical criteria, not statistical criteria.

The concept of statistical power can be difficult to grasp. Before presenting the formulas to determine the sample sizes required to ensure high power in a test, we will first discuss power from a conceptual point of view.  

Suppose we want to test the following hypotheses at α=0.05: H0: μ = 90 versus H1: μ ≠ 90. To test the hypotheses, suppose we select a sample of size n=100. For this example, assume that the standard deviation of the outcome is σ=20. We compute the sample mean and then must decide whether the sample mean provides evidence to support the alternative hypothesis or not. This is done by computing a test statistic and comparing the test statistic to an appropriate critical value. If the null hypothesis is true (μ=90), then we are likely to select a sample whose mean is close in value to 90. However, it is also possible to select a sample whose mean is much larger or much smaller than 90. Recall from the Central Limit Theorem (see page 11 in the module on Probability) that for large n (here n=100 is sufficiently large), the distribution of the sample means is approximately normal, with a mean of μ = 90 and a standard deviation of σ/√n = 20/√100 = 2.

If the null hypothesis is true, it is possible to observe any sample mean shown in the figure below; all are possible under H0: μ = 90.

Normal distribution of the sample mean when the true mean is 90: a bell-shaped curve centered at 90.

Rejection Region for Test H0: μ = 90 versus H1: μ ≠ 90 at α = 0.05

Standard normal distribution of the sample mean centered at 90. The rejection regions are in the two tails at the extremes above and below the mean; with α = 0.05, each tail accounts for an area of 0.025.

The areas in the two tails of the curve represent the probability of a Type I Error, α= 0.05. This concept was discussed in the module on Hypothesis Testing .  

Now, suppose that the alternative hypothesis, H1, is true (i.e., μ ≠ 90) and that the true mean is actually 94. The figure below shows the distributions of the sample mean under the null and alternative hypotheses. The values of the sample mean are shown along the horizontal axis.

Two overlapping normal distributions, one depicting the null hypothesis with a mean of 90 and the other showing the alternative hypothesis with a mean of 94. A more complete explanation of the figure is provided in the text below the figure.

If the true mean is 94, then the alternative hypothesis is true. In our test, we selected α = 0.05 and reject H0 if the observed sample mean exceeds 93.92 (focusing on the upper tail of the rejection region for now). The critical value (93.92) is indicated by the vertical line. The probability of a Type II error is denoted β, and β = P(Do not reject H0 | H0 is false), i.e., the probability of not rejecting the null hypothesis when it is in fact false. β is shown in the figure above as the area under the rightmost curve (H1) to the left of the vertical line (where we do not reject H0). Power is defined as 1 − β = P(Reject H0 | H0 is false) and is shown in the figure as the area under the rightmost curve (H1) to the right of the vertical line (where we reject H0).

Note that β and power are related to α, the variability of the outcome, and the effect size. From the figure above we can see what happens to β and power if we increase α. Suppose, for example, we increase α to α = 0.10. The upper critical value would then be 93.29 instead of 93.92 (90 + 1.645 × 2 rather than 90 + 1.96 × 2). The vertical line would shift to the left, increasing α, decreasing β and increasing power. While a better test is one with higher power, it is not advisable to increase α as a means to increase power. Nonetheless, there is a direct relationship between α and power (as α increases, so does power).

β and power are also related to the variability of the outcome and to the effect size. The effect size is the difference in the parameter of interest (e.g., μ) that represents a clinically meaningful difference. The figure above graphically displays α, β, and power when the difference in the mean under the null as compared to the alternative hypothesis is 4 units (i.e., 90 versus 94). The figure below shows the same components for the situation where the mean under the alternative hypothesis is 98.

Overlapping bell-shaped distributions - one with a mean of 90 and the other with a mean of 98

Notice that there is much higher power when there is a larger difference between the mean under H0 as compared to H1 (i.e., 90 versus 98). A statistical test is much more likely to reject the null hypothesis in favor of the alternative if the true mean is 98 than if the true mean is 94. Notice also in this case that there is little overlap in the distributions under the null and alternative hypotheses. If a sample mean of 97 or higher is observed, it is very unlikely that it came from a distribution whose mean is 90. In the previous figure, for H0: μ = 90 and H1: μ = 94, if we observed a sample mean of 93, for example, it would not be as clear whether it came from a distribution whose mean is 90 or one whose mean is 94.
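
These comparisons are easy to verify numerically. A short R sketch under the stated setup (n = 100, σ = 20, two-sided α = 0.05), counting only the upper rejection region as the text does:

```r
mu0 <- 90; sigma <- 20; n <- 100; alpha <- 0.05
se <- sigma / sqrt(n)                     # 2
upper <- mu0 + qnorm(1 - alpha / 2) * se  # 93.92
power_upper <- function(mu) 1 - pnorm((upper - mu) / se)
power_upper(94)  # ~0.52
power_upper(98)  # ~0.98
```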

In designing studies most people consider power of 80% or 90% (just as we generally use 95% as the confidence level for confidence interval estimates). The inputs for the sample size formulas include the desired power, the level of significance and the effect size. The effect size is selected to represent a clinically meaningful or practically important difference in the parameter of interest, as we will illustrate.  

The formulas we present below produce the minimum sample size to ensure that the test of hypothesis will have a specified probability of rejecting the null hypothesis when it is false (i.e., a specified power). In planning studies, investigators again must account for attrition or loss to follow-up. The formulas shown below produce the number of participants needed with complete data, and we will illustrate how attrition is addressed in planning studies.


Sample size calculation: Basic principles

Sabyasachi Das

Department of Anaesthesiology and Critical Care, Medical College, Kolkata, West Bengal, India

Mohanchandra Mandal

Department of Anaesthesiology and Critical Care, North Bengal Medical College, Sushrutanagar, Darjeeling, West Bengal, India

Addressing the sample size is a practical issue that has to be solved during the planning and designing stage of the study. The aim of any clinical research is to detect the actual difference between two groups (power) and to provide an estimate of the difference with a reasonable accuracy (precision). Hence, researchers should make an a priori estimate of the sample size well ahead, before conducting the study. Post hoc sample size computation is not conventionally encouraged. An adequate sample size minimizes random error or, in other words, lessens the likelihood of results arising by chance. Too small a sample may fail to answer the research question and can be of questionable validity or provide an imprecise answer, while too large a sample may answer the question but is resource-intensive and may also be unethical. More transparency in the calculation of sample size is required so that it can be justified and replicated while reporting.

INTRODUCTION

Besides scientific justification and validity, the calculation of sample size (‘just large enough’) helps a medical researcher to assess the cost, time and feasibility of his project.[1] Although frequently reported in anaesthesia journals, the details or elements of the sample size calculation are not consistently provided by authors, and in many studies the reported sample size cannot be replicated from the information given.[2] Most trials with negative results do not have a large enough sample size; hence, the reporting of statistical power and sample size needs to be improved.[3,4] There is a belief that studies with a small sample size are unethical if they do not ensure adequate power. However, the truth is that for a study to be ethical in its design, its predicted value must outweigh the projected risks to its participants. In studies where the risks and inconvenience borne by the participants outweigh the benefits they receive, the study imposes a net burden on its participants; such a study may still be valid if the projected benefit to society outweighs that burden. If there is no burden, then any sample size may be acceptable.[5] Many different approaches to sample size design exist, depending on the study design and research question. Moreover, each study design can have multiple sub-designs, resulting in different sample size calculations.[6] Addressing the sample size is a practical issue that has to be solved during the planning and designing stage of the study. It may be an important issue in the approval or rejection of clinical trial results, irrespective of efficacy.[7]

By the end of this article, the reader will be able to enumerate the prerequisites for sample size estimation, describe the common lapses in sample size calculation, explain the importance of a priori sample size estimation, and define the common terminologies related to sample size calculation.

IMPORTANCE OF PILOT STUDY IN SAMPLE SIZE ESTIMATION

In published literature, relevant data for calculating the sample size can be gleaned from prevalence estimates or event rates, the standard deviation (SD) of the continuous outcome, and the sample sizes of similar studies with similar outcomes. An idea of the approximate ‘effect’ estimate can be obtained by reviewing meta-analyses and the clinically meaningful effect. A small pilot study, personal experience, expert opinion, educated guesses, hospital registers and unpublished reports support the researcher when there is insufficient information in the existing/available literature. A pilot study not only helps in the estimation of sample size; its primary purpose is to check the feasibility of the study.

The pilot study is a small-scale trial run as a pre-test and try-out for the proposed major trial. It allows preliminary testing of the hypotheses and may suggest some changes, dropping some part or developing new hypotheses so that they can be tested more precisely.[8] It may address many logistic issues, such as checking that instructions are comprehensible and that the investigators are adequately skilled for the trial. The pilot study almost always provides enough data for the researcher to decide whether to go ahead with the main study or to abandon it. Many research ideas that seem to show great promise are unproductive when actually carried out. From the findings of a pilot study, the researcher may abandon a main study that would involve large logistic resources, and thus save a lot of time and money.[8]

METHODS FOR SAMPLE SIZE CALCULATION

Sample size can be calculated using either the confidence interval method or the hypothesis testing method. In the former, the main objective is to obtain narrow intervals with high reliability. In the latter, the hypothesis is concerned with testing whether the sample estimate is equal to some specific value.

Null hypothesis

This hypothesis states that there is no difference between the control and the study group in a randomized controlled trial (RCT). Rejecting or disproving the null hypothesis – and thus concluding that there are grounds for believing that there is a difference between the two groups – is a central task in the modern practice of science, and gives a precise criterion for rejecting a hypothesis.[9,10]

Alternative hypothesis

This hypothesis is contradictory to null hypothesis, i.e., it assumes that there is a difference among the groups, or there is some association between the predictor and the outcome [ Figure 1 ].[ 9 , 10 ] Sometimes, it is accepted by exclusion if the test of significance rejects the null hypothesis. It may be one-sided (specifies the difference in one direction only) or two-sided (specifies the difference in both directions).

Figure 1: Result possibilities during hypothesis testing. H0 – null hypothesis; H1 – alternative hypothesis

A Type I error (α error) occurs if the null hypothesis is rejected when it is true. It represents the chance that the researcher detects a difference between two groups when in reality no difference exists. In other words, it is the chance of a false-positive conclusion. A value of 0.05 is most commonly used.

Type II error (β error) is the chance of a false-negative result. The researcher does not detect the difference between the two groups when in reality the difference exists. Conventionally, it is set at a level of 0.20, which translates into <20% chance of a false-negative conclusion. Power is the complement of beta, i.e., (1-beta). In other words, power is 0.80 or 80% when beta is set at 0.20. The power represents the chance of avoiding a false-negative conclusion, or the chance of detecting an effect if it really exists.[ 11 ]

TYPES OF TRIALS

Parallel arm RCTs are most commonly used, meaning that all participants are randomized to two or more arms of different interventions treated concurrently. Various types of parallel RCTs are used in accordance with the need:

  • Superiority trials verify whether a new approach is more effective than the conventional one, from a statistical or clinical point of view. Here, the null hypothesis is that the new approach is not more effective than the conventional approach.
  • Equivalence trials are designed to ascertain that the new approach and the standard approach are equally effective. The corresponding null hypothesis states that the difference between the two approaches is clinically relevant.
  • Non-inferiority trials are designed to ascertain that the new approach is equal, if not superior, to the conventional approach. The corresponding null hypothesis is that the new approach is inferior to the conventional one.

PREREQUISITES FOR SAMPLE SIZE ESTIMATION

At the outset, the primary objectives (descriptive/analytical) and primary outcome measure (mean/proportion/rates) should be defined. Often there is a primary research question that the researcher wants to investigate. It is important to choose a primary outcome and lock it in for the study. The minimum difference that the investigator wants to detect between the groups makes up the effect size for the sample size calculation.[7] Hence, if the researcher changes the planned outcome after the start of the study, the reported P value and inference become invalid.[11] The acceptable levels of Type I error (α) and Type II error (β) should also be determined. The Type I error rate (alpha) is customarily set lower than the Type II error rate (beta). The philosophy behind this is that the impact of a false-positive (Type I) error is more detrimental than that of a false-negative (Type II) error, so it is protected against more rigidly.

Besides, the researcher needs to know the control arm mean/event rates/proportion, and the smallest clinically important effect that one is trying to detect.

THE RELATION BETWEEN PRIMARY OBJECTIVE AND THE SAMPLE SIZE

The type of primary outcome measure, with its clear definition, helps in computing the correct sample size, as there are definite ways to reach a sample size for each outcome measure. It needs special attention, as it principally influences how impressively the research question is answered. The type of primary outcome measure is also the basis for the mode of estimating the population variance. For a continuous variable (e.g., mean arterial pressure [MAP]), the population SD is incorporated in the formula, whereas for binomial variables (e.g., hypotension – yes/no) the SD needs to be worked out from the proportion of outcomes. In the literature, there can be several outcomes for each study design; it is the responsibility of the researcher to find out the primary outcome of the study. Mostly, sample size is estimated based on the primary outcome. It is possible to estimate sample size taking into consideration all outcome measures, both primary and secondary, at the cost of a much larger sample size.

ESSENTIAL COMPONENTS OF SAMPLE SIZE ESTIMATION

The sample size for any study depends on certain factors, such as the acceptable level of significance (P value), the power (1 − β) of the study, the expected ‘clinically relevant’ effect size, the underlying event rate in the population, etc.[7] Primarily, three factors – the P value (depends on α), power (related to β) and the effect size (clinically relevant assumption) – govern an appropriate sample size.[12,13,14] The ‘effect size’ means the magnitude of the clinically relevant effect under the alternative hypothesis. It quantifies the difference in the outcomes between the study and control groups, and refers to the smallest difference that would be of clinical importance. Ideally, the selection of the effect size should be based on clinical judgement. It varies with different clinical trials, and the researcher has to determine it with scientific knowledge and wisdom; available previous publications on related topics might be helpful in this regard. The ‘minimal clinically important difference’ is the smallest difference that would be worth testing. Sample size varies inversely with effect size.

The ideal study for a researcher is one where the power of the study is high, or in other words, where the study has a high chance of reaching a conclusion with reasonable confidence, be it accepting or rejecting the null hypothesis.[9] A sample size matrix presents the sample sizes obtained using varying dimensions of alpha, power (1 − β) and effect size. It is often useful for the research team to choose from it the sample size that best fits the needs of the study [Table 1].

The matrix showing changes of sample size with varying dimensions of alpha, power (1-β), and effect size


FORMULAE AND SOFTWARE

Once these three factors are fixed, there are many ways (formulae, nomograms, tables and software) to estimate the optimum sample size. At present, a good number of software packages are available on the internet. It is prudent to be familiar with the instructions of any software, to get the sample size of one arm of the study. Perhaps the most important step is to check for the most appropriate formula to get a correct sample size. Websites of some of the commonly used software packages are provided in Table 2.[2,6]

Websites for some useful statistical software


The number of formulae for calculating the sample size and power, to answer precisely different study designs and research questions, is no less than 100. It is wise to check for the appropriate formula even while using software. Although there are more than 100 formulae, for RCTs the number of formulae is limited. The choice essentially depends on the primary outcome measure, such as mean ± SD, rate or proportion.[6] Interested readers may access all relevant sample size estimation formulae using the relevant links.

Calculating the sample size by comparing two means

A study to see the effect of phenylephrine on MAP as a continuous variable after spinal anaesthesia to counteract hypotension.

MAP as a continuous variable:

n = 2 × (a + b)^2 × σ^2 / (μ1 − μ2)^2

where:

  • n = Sample size in each of the groups
  • μ1 = Population mean in treatment Group 1
  • μ2 = Population mean in treatment Group 2
  • μ1 − μ2 = The difference the investigator wishes to detect
  • σ = Population standard deviation
  • a = Conventional multiplier for alpha = 0.05
  • b = Conventional multiplier for power = 0.80

The values a = 1.96 and b = 0.842 are taken from Table 3. If a difference of 15 mmHg in MAP between the phenylephrine and the placebo group is considered clinically significant (μ1 − μ2 = 15), the population SD is taken as 20 mmHg, and the difference is to be detected with 80% power at a significance level alpha of 0.05,[7] then n = 2 × ([1.96 + 0.842]^2 × 20^2)/15^2 = 27.9. That means 28 subjects per group is the sample size.
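
The same arithmetic in R, as a sketch of the formula above (base R's power.t.test returns a slightly larger n because it uses the t distribution rather than the normal approximation):

```r
a <- qnorm(0.975)  # 1.96, two-sided alpha = 0.05
b <- qnorm(0.80)   # 0.842, power = 80%
sigma <- 20; delta <- 15
n <- 2 * (a + b)^2 * sigma^2 / delta^2  # 27.9
ceiling(n)                              # 28 per group
# Cross-check: power.t.test(delta = 15, sd = 20, sig.level = 0.05, power = 0.80)
```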

The constant Z values for conventional α and β values


Calculating the sample size by comparing two proportions

A study to see the effect of phenylephrine on MAP as a binary variable after spinal anaesthesia to counteract hypotension.

MAP as a binary outcome, below or above 60 mmHg (hypotension – yes/no):

  • n = The sample size in each of the groups
  • p 1 = Proportion of subjects with hypotension in treatment Group 1
  • q 1 = Proportion of subjects without hypotension in treatment Group 1 (1 − p1)
  • p 2 = Proportion of subjects with hypotension in treatment Group 2
  • q 2 = Proportion of subjects without hypotension in treatment Group 2 (1 − p2)
  • x = The difference the investigator wishes to detect
  • a = Conventional multiplier for alpha = 0.05
  • b = Conventional multiplier for power = 0.8.

n = (a + b)^2 × (p1q1 + p2q2) / x^2

Considering a difference of 10% as clinically relevant, and taking from a recent publication that the proportion of subjects with hypotension in the treated group will be 20% (p1 = 0.2) and in the control group 30% (p2 = 0.3), q1 and q2 are 0.80 and 0.70, respectively.[7] Assuming a power of 80% and an alpha of 0.05, i.e., 1.96 for a and 0.842 for b [Table 3], we get:

n = ([1.96 + 0.842]^2 × [0.20 × 0.80 + 0.30 × 0.70])/0.10^2 = 290.5. Thus, 291 subjects per group is the sample size.
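
And the corresponding R sketch for the two-proportion formula (base R's power.prop.test implements a similar calculation):

```r
a <- qnorm(0.975); b <- qnorm(0.80)  # 1.96 and 0.842
p1 <- 0.20; p2 <- 0.30
n <- (a + b)^2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1)^2
ceiling(n)  # 291 per group
# Cross-check: power.prop.test(p1 = 0.20, p2 = 0.30, sig.level = 0.05, power = 0.80)
```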

The researcher may take some measures to keep the required sample size manageable: using a continuous variable as the primary outcome, measuring the outcome precisely, or choosing outcomes that can be measured reliably. Using a more common outcome or making a one-sided hypothesis may also help in achieving this target. Published literature and pilot studies are the basis of sample size calculation. At times, expert opinion, personal experience with event rates and educated guesses become helpful. Variance, effect size or event rates may be underestimated when calculating the sample size at the design phase. If the investigator realizes that this underestimation has led to ‘too small a sample size’, recalculation can be attempted based on interim data.[15]

Sample size calculation can be guided by previous literature, pilot studies and past clinical experience; the collaborative effort of the researcher and the statistician is required at this stage. The estimated sample size is not an absolute truth, but our best guess. Issues such as anticipated loss to follow-up, large subgroup analyses and complicated study designs demand a larger sample size to ensure adequate power throughout the trial. The required sample size is proportional to the variance (square of the SD) and inversely proportional to the square of the difference to be detected.

Conflicts of interest

There are no conflicts of interest.


How to Perform a t-test with Unequal Sample Sizes

One question students often have in statistics is:

Is it possible to perform a t-test when the sample sizes of each group are not equal?

The short answer:

Yes, you can perform a t-test when the sample sizes are not equal. Equal sample sizes are not one of the assumptions made in a t-test.

The real issues arise when the two samples do not have equal variances, which is one of the assumptions made in a t-test.

When this occurs, it’s recommended that you use Welch’s t-test instead, which does not make the assumption of equal variances.

The following examples demonstrate how to perform t-tests with unequal sample sizes when the variances are equal and when they’re not equal.

Example 1: Unequal Sample Sizes and Equal Variances

Suppose we administer two programs designed to help students score higher on some exam.

The results are as follows:

Program 1:

  • n (sample size): 500
  • x (sample mean): 80
  • s (sample standard deviation): 5

Program 2:

  • n (sample size): 20
  • x (sample mean): 85

The following code shows how to create a boxplot in R to visualize the distribution of exam scores for each program:

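The original code block was not preserved; the following reconstruction simulates data matching the reported summary statistics (Program 2's standard deviation is an assumption, set to 5 to match the "roughly equal variances" description):

```r
set.seed(0)
program1 <- rnorm(500, mean = 80, sd = 5)
program2 <- rnorm(20, mean = 85, sd = 5)  # sd = 5 is assumed, not reported
boxplot(program1, program2,
        names = c("Program 1", "Program 2"), ylab = "Exam score")
```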

The mean exam score for Program 2 appears to be higher, but the variance of exam scores between the two programs is roughly equal. 

The following code shows how to perform an independent samples t-test along with a Welch’s t-test:
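
The original code block was likewise not preserved; with the simulated vectors above, the two tests would be run as follows (p-values on simulated data will differ somewhat from the ones reported below):

```r
t.test(program1, program2, var.equal = TRUE)  # independent samples t-test
t.test(program1, program2)                    # Welch's t-test (R's default)
```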

The independent samples t-test returns a p-value of .0009 and Welch’s t-test returns a p-value of .0029 .

Since the p-value of each test is less than .05, we would reject the null hypothesis in each test and conclude that there is a statistically significant difference in mean exam scores between the two programs.

Even though the sample sizes are unequal, the independent samples t-test and Welch’s t-test both return similar results since the two samples had equal variances.

Example 2: Unequal Sample Sizes and Unequal Variances

Now suppose we compare the two programs again, with the same setup as before except that the exam scores in Program 1 are far more spread out:

  • s (sample standard deviation): 25

The mean exam score for Program 2 appears to be higher, but the variance of exam scores for Program 1 is much higher than Program 2.

The independent samples t-test returns a p-value of .5496  and Welch’s t-test returns a p-value of .0361 .

The independent samples t-test is not able to detect a difference in mean exam scores, but Welch's t-test is able to detect a statistically significant difference.

Since the two samples had unequal variances, only Welch’s t-test was able to detect the statistically significant difference in mean exam scores since this test does not make the assumption of equal variances between samples.

Additional Resources

The following tutorials provide additional information about t-tests:

  • Introduction to the One Sample t-test
  • Introduction to the Two Sample t-test
  • Introduction to the Paired Samples t-test



Hypothesis Test: Difference Between Proportions

This lesson explains how to conduct a hypothesis test to determine whether the difference between two proportions is significant. The test procedure, called the two-proportion z-test , is appropriate when the following conditions are met:

  • The sampling method for each population is simple random sampling .
  • The samples are independent .
  • Each sample includes at least 10 successes and 10 failures.
  • Each population is at least 20 times as big as its sample.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The table below shows three sets of hypotheses. Each makes a statement about the difference d between two population proportions, P 1 and P 2 . (In the table, the symbol ≠ means " not equal to ".)

The first set of hypotheses (Set 1) is an example of a two-tailed test , since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests , since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

When the null hypothesis states that there is no difference between the two population proportions (i.e., d = P 1 - P 2 = 0), the null and alternative hypothesis for a two-tailed test are often stated in the following form.

H0: P1 = P2
Ha: P1 ≠ P2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the two-proportion z-test (described in the next section) to determine whether the hypothesized difference between population proportions differs significantly from the observed sample difference.

Analyze Sample Data

Using sample data, complete the following computations to find the test statistic and its associated P-Value.

  • Pooled sample proportion: p = (p1 * n1 + p2 * n2) / (n1 + n2)

  • Standard error: SE = sqrt[ p * ( 1 - p ) * ( (1/n1) + (1/n2) ) ]

  • Test statistic: z = (p1 - p2) / SE

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator to assess the probability associated with the z-score. (See sample problems at the end of this lesson for examples of how this is done.)

The analysis described above is a two-proportion z-test.

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test for the difference between two proportions. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The company states that the drug is equally effective for men and women. To test this claim, they choose a simple random sample of 100 women and 200 men from a population of 100,000 volunteers.

At the end of the study, 38% of the women caught a cold; and 51% of the men caught a cold. Based on these findings, can we reject the company's claim that the drug is equally effective for men and women? Use a 0.05 level of significance.

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

Null hypothesis: P 1 = P 2

Alternative hypothesis: P 1 ≠ P 2

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. The test method is a two-proportion z-test.

p = [(0.38 * 100) + (0.51 * 200)] / (100 + 200)

p = 140/300 = 0.467

SE = sqrt [ 0.467 * 0.533 * ( 1/100 + 1/200 ) ]

SE = sqrt [0.003733] = 0.061

z = (p 1 - p 2 ) / SE = (0.38 - 0.51)/0.061 = -2.13

where p1 is the sample proportion in sample 1, p2 is the sample proportion in sample 2, n1 is the size of sample 1, and n2 is the size of sample 2.

Since we have a two-tailed test , the P-value is the probability that the z-score is less than -2.13 or greater than 2.13.
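
For reference, the entire computation can also be scripted in R instead of using a normal-distribution calculator (a sketch; the variable names are ours):

```r
p1 <- 0.38; n1 <- 100
p2 <- 0.51; n2 <- 200
p  <- (p1 * n1 + p2 * n2) / (n1 + n2)        # pooled proportion, 0.467
se <- sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # 0.061
z  <- (p1 - p2) / se                         # -2.13
2 * pnorm(-abs(z))                           # two-tailed P-value, ~0.034
```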

  • Interpret results . Since the P-value (0.034) is less than the significance level (0.05), we reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the samples were independent, each population was at least 20 times larger than its sample, and each sample included at least 10 successes and 10 failures.

Problem 2: One-Tailed Test

Suppose the previous example is stated a little bit differently. Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The company states that the drug is more effective for women than for men. To test this claim, they choose a simple random sample of 100 women and 200 men from a population of 100,000 volunteers.

At the end of the study, 38% of the women caught a cold; and 51% of the men caught a cold. Based on these findings, can we conclude that the drug is more effective for women than for men? Use a 0.01 level of significance.

Null hypothesis: P 1 >= P 2

Alternative hypothesis: P 1 < P 2

  • Formulate an analysis plan . For this analysis, the significance level is 0.01. The test method is a two-proportion z-test.
  • Interpret results . Since the P-value (0.017) is greater than the significance level (0.01), we cannot reject the null hypothesis.

Hypothesis Test Sample Size

by Fred Schenkelberg

Hypothesis testing permits us to compare two groups of items and determine if there is a significant difference or not. There are many types of hypothesis tests depending on the specific question, the type of data, and what is or is not known when designing the test.

One of the most often asked questions concerning nearly any testing is the sample size. There are times, although rare, when we have sufficient samples to run the test properly – statistically speaking. Whether we have ample samples or not, we should calculate the sample size related to the desired confidence.

The α error that can be tolerated should be set by local policy (if not, select a reasonable value at or below 0.4; a common choice is 0.1, which is related to a 90% Type I confidence). The next bit of information you need is how much of a difference the test should actually detect. For example, if we are considering heights of people (for some reason), we may want to know if a specific group's average height is at least 6 inches less than that of the overall population. Thus, if the two groups of data are at least 6 inches apart, the test is designed to be able to detect that difference. The last bit of information we need is the variation of the data (a standard deviation, either known or estimated).

Given that we’ve selected a particular hypothesis test to run, and we’ve collected the three bits of information we need, we can calculate the minimum number of samples needed for the hypothesis test. Let’s assume we’re using a Z-test for a hypothesis about the means.

$$ \large\displaystyle Z=\frac{\bar{X}-\mu }{\sigma /\sqrt{n}}$$

Z, as you recall, is the test statistic, which is compared to a critical value to make the decision at hand.

The X-bar minus μ term is the difference between the data average and the population mean. In our planning stage, this is the amount of difference we would like to be able to detect with the test. For the height example above, this would be 6 inches. Let's change the notation by denoting the difference as E.

$$ \large\displaystyle E=\bar{X}-\mu $$

E is the minimum difference of interest. Next, let’s isolate the n term with some simple algebra.

$$ \large\displaystyle \sqrt{n}=\frac{Z\sigma }{E}$$

and, square both sides to finish.

$$ \large\displaystyle n=\frac{{{Z}^{2}}{{\sigma }^{2}}}{{{E}^{2}}}$$

Continuing our example of heights, let's use E = 6, σ = 6, and ask for at least 95% confidence, making α = 0.05 and Z = 1.96. Insert the values, do a little math, and we find the sample size.

$$ \large\displaystyle n=\frac{{{1.96}^{2}}{{6}^{2}}}{{{6}^{2}}}=3.84\approx 4$$
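
Wrapped up as a small R helper (a sketch, not code from the original article; the two-sided critical value is used, matching Z = 1.96 above):

```r
sample_size_z <- function(E, sigma, alpha = 0.05) {
  Z <- qnorm(1 - alpha / 2)  # two-sided critical value, 1.96 at alpha = 0.05
  ceiling(Z^2 * sigma^2 / E^2)
}
sample_size_z(E = 6, sigma = 6)  # 4
```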

In this simple example, the standard deviation and the difference of interest are the same and cancel out, making the calculation very easy. We need a sample of four from the group to determine whether that group's average height is 6 inches lower than the population's or not.

If we want higher confidence, notice that Z increases rapidly. Also, if we want to detect less of a difference, the sample size again increases rapidly, approaching infinity as the difference of interest approaches zero. And if the population has higher variation than in this example, the sample size again goes up.

Ideally, we would always work with populations with a very small amount of variation, relatively low confidence (although many argue for at least 80 to 95% confidence as acceptable), and the desire to detect only very large differences. Of course, this is often not the case. This sample size formula permits you to work with your team to balance the test cost, sample size, confidence, and sensitivity (E) of the planned test.

One can start with any hypothesis test and, with some ‘rearrangement’, determine the sample size formula for that test. Those preparing for the CRE exam should work through a few of these and keep the formulas in their notes.

  • Hypothesis Testing (article)
  • Paired-Comparison Hypothesis Tests (article)
  • Sample Size – success testing (article)



8.4: Small Sample Tests for a Population Mean


Learning Objectives

  • To learn how to apply the five-step test procedure for test of hypotheses concerning a population mean when the sample size is small.

In the previous section, hypothesis testing for population means was described in the case of large samples. The statistical validity of the tests was ensured by the Central Limit Theorem, with essentially no assumptions on the distribution of the population. When sample sizes are small, as is often the case in practice, the Central Limit Theorem does not apply. One must then impose stricter assumptions on the population to give statistical validity to the test procedure. One common assumption is that the population from which the sample is taken has a normal probability distribution to begin with. Under such circumstances, if the population standard deviation is known, then the test statistic

\[\frac{(\bar{x}-\mu _0)}{\sigma /\sqrt{n}} \nonumber \]

still has the standard normal distribution, as in the previous two sections. If \(\sigma\) is unknown and is approximated by the sample standard deviation \(s\), then the resulting test statistic

\[\dfrac{(\bar{x}-\mu _0)}{s/\sqrt{n}} \nonumber \]

follows Student’s \(t\)-distribution with \(n-1\) degrees of freedom.

Standardized Test Statistics for Small Sample Hypothesis Tests Concerning a Single Population Mean

 If \(\sigma\) is known: \[Z=\frac{\bar{x}-\mu _0}{\sigma /\sqrt{n}} \nonumber \]

If \(\sigma\) is unknown: \[T=\frac{\bar{x}-\mu _0}{s /\sqrt{n}} \nonumber \]

  • The first test statistic (\(\sigma\) known) has the standard normal distribution.
  • The second test statistic (\(\sigma\) unknown) has Student’s \(t\)-distribution with \(n-1\) degrees of freedom.
  • The population must be normally distributed.

The distribution of the second standardized test statistic (the one containing \(s\)) and the corresponding rejection region for each form of the alternative hypothesis (left-tailed, right-tailed, or two-tailed) is shown in Figure \(\PageIndex{1}\). This is just like Figure 8.2.1 except that now the critical values are from the \(t\)-distribution. Figure 8.2.1 still applies to the first standardized test statistic (the one containing \(\sigma\)) since it follows the standard normal distribution.


The \(p\)-value of a test of hypotheses for which the test statistic has Student’s \(t\)-distribution can be computed using statistical software, but it is impractical to do so using tables, since that would require \(30\) tables analogous to Figure 7.1.5, one for each degree of freedom from \(1\) to \(30\). Figure 7.1.6 can be used to approximate the \(p\)-value of such a test, and this is typically adequate for making a decision using the \(p\)-value approach to hypothesis testing, although not always. For this reason the tests in the two examples in this section will be made following the critical value approach to hypothesis testing summarized at the end of Section 8.1, but after each one we will show how the \(p\)-value approach could have been used.

Example \(\PageIndex{1}\)

The price of a popular tennis racket at a national chain store is \(\$179\). Portia bought five of the same racket at an online auction site for the following prices:

\[155\; 179\; 175\; 175\; 161 \nonumber \]

Assuming that the auction prices of rackets are normally distributed, determine whether there is sufficient evidence in the sample, at the \(5\%\) level of significance, to conclude that the average price of the racket is less than \(\$179\) if purchased at an online auction.

  • Step 1 . The assertion for which evidence must be provided is that the average online price \(\mu\) is less than the average price in retail stores, so the hypothesis test is \[H_0: \mu =179\\ \text{vs}\\ H_a: \mu <179\; @\; \alpha =0.05 \nonumber \]
  • Step 2 . The sample is small and the population standard deviation is unknown. Thus the test statistic is \[T=\frac{\bar{x}-\mu _0}{s /\sqrt{n}} \nonumber \] and has the Student \(t\)-distribution with \(n-1=5-1=4\) degrees of freedom.
  • Step 3 . From the data we compute \(\bar{x}=169\) and \(s=10.39\). Inserting these values into the formula for the test statistic gives \[T=\frac{\bar{x}-\mu _0}{s /\sqrt{n}}=\frac{169-179}{10.39/\sqrt{5}}=-2.152 \nonumber \]
  • Step 4 . Since the symbol in \(H_a\) is “\(<\)” this is a left-tailed test, so there is a single critical value, \(-t_\alpha =-t_{0.05}[df=4]\). Reading from the row labeled \(df=4\) in Figure 7.1.6 its value is \(-2.132\). The rejection region is \((-\infty ,-2.132]\).
  • Step 5 . As shown in Figure \(\PageIndex{2}\) the test statistic falls in the rejection region. The decision is to reject \(H_0\). In the context of the problem our conclusion is:

The data provide sufficient evidence, at the \(5\%\) level of significance, to conclude that the average price of such rackets purchased at online auctions is less than \(\$179\).

Rejection Region and Test Statistic

To perform the test in Example \(\PageIndex{1}\) using the \(p\)-value approach, look in the row in Figure 7.1.6 with the heading \(df=4\) and search for the two \(t\)-values that bracket the unsigned value \(2.152\) of the test statistic. They are \(2.132\) and \(2.776\), in the columns with headings \(t_{0.050}\) and \(t_{0.025}\). They cut off right tails of area \(0.050\) and \(0.025\), so because \(2.152\) is between them it must cut off a tail of area between \(0.050\) and \(0.025\). By symmetry \(-2.152\) cuts off a left tail of area between \(0.050\) and \(0.025\), hence the \(p\)-value corresponding to \(t=-2.152\) is between \(0.025\) and \(0.05\). Although its precise value is unknown, it must be less than \(\alpha =0.05\), so the decision is to reject \(H_0\).
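
For readers working in software rather than from tables, the same test can be run in R with base R's t.test (the data vector is from the example):

```r
x <- c(155, 179, 175, 175, 161)
t.test(x, mu = 179, alternative = "less")
# t = -2.15, df = 4, p-value ~ 0.049: between 0.025 and 0.05, so reject H0
```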

Example \(\PageIndex{2}\)

A small component in an electronic device has two small holes where another tiny part is fitted. In the manufacturing process the average distance between the two holes must be tightly controlled at \(0.02\) mm, else many units would be defective and wasted. Many times throughout the day quality control engineers take a small sample of the components from the production line, measure the distance between the two holes, and make adjustments if needed. Suppose at one time four units are taken and the distances are measured as

\[0.021\; 0.019\; 0.023\; 0.020 \nonumber \]

Determine, at the \(1\%\) level of significance, if there is sufficient evidence in the sample to conclude that an adjustment is needed. Assume the distances of interest are normally distributed.

  • Step 1 . The assumption is that the process is under control unless there is strong evidence to the contrary. Since a deviation of the average distance to either side is undesirable, the relevant test is \[H_0: \mu =0.02\\ \text{vs}\\ H_a: \mu \neq 0.02\; @\; \alpha =0.01 \nonumber \] where \(\mu\) denotes the mean distance between the holes.
  • Step 2 . The sample is small and the population standard deviation is unknown. Thus the test statistic is \[T=\frac{\bar{x}-\mu _0}{s /\sqrt{n}} \nonumber \] and has the Student \(t\)-distribution with \(n-1=4-1=3\) degrees of freedom.
  • Step 3 . From the data we compute \(\bar{x}=0.02075\) and \(s=0.00171\). Inserting these values into the formula for the test statistic gives \[T=\frac{\bar{x}-\mu _0}{s /\sqrt{n}}=\frac{0.02075-0.02}{0.00171/\sqrt{4}}=0.877 \nonumber \]
  • Step 4 . Since the symbol in \(H_a\) is “\(\neq\)” this is a two-tailed test, so there are two critical values, \(\pm t_{\alpha/2} =\pm t_{0.005}[df=3]\). Reading from the row in Figure 7.1.6 labeled \(df=3\), their values are \(\pm 5.841\). The rejection region is \((-\infty ,-5.841]\cup [5.841,\infty )\).
  • Step 5 . As shown in Figure \(\PageIndex{3}\) the test statistic does not fall in the rejection region. The decision is not to reject \(H_0\). In the context of the problem our conclusion is:

The data do not provide sufficient evidence, at the \(1\%\) level of significance, to conclude that the mean distance between the holes in the component differs from \(0.02\) mm.

Figure \(\PageIndex{3}\): Rejection Region and Test Statistic

To perform the test in Example \(\PageIndex{2}\) using the \(p\)-value approach, look in the row in Figure 7.1.6 with the heading \(df=3\) and search for the two \(t\)-values that bracket the value \(0.877\) of the test statistic. In fact \(0.877\) is smaller than the smallest number in the row, \(0.978\), which appears in the column with heading \(t_{0.200}\). The value \(0.978\) cuts off a right tail of area \(0.200\), so because \(0.877\) is to its left it must cut off a tail of area greater than \(0.200\). Thus the \(p\)-value, which is double the area cut off (since the test is two-tailed), is greater than \(0.400\). Although its precise value is unknown, it must be greater than \(\alpha =0.01\), so the decision is not to reject \(H_0\).
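
As with Example \(\PageIndex{1}\), this computation can be checked by machine. The sketch below (again assuming SciPy is available; the library call is the assumption here, not the method) runs the two-tailed test directly on the four measurements listed above.

```python
from scipy import stats

# The four measured distances (mm) between the holes
distances = [0.021, 0.019, 0.023, 0.020]
mu0, alpha = 0.02, 0.01

# ttest_1samp returns the t statistic and the two-sided p-value
t_stat, p_value = stats.ttest_1samp(distances, mu0)

print(f"T = {t_stat:.3f}, two-sided p = {p_value:.3f}")
# T is about 0.877 and p is well above 0.400, so at alpha = 0.01
# the decision is not to reject H0, as in the table-based work.
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```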

Key Takeaway

  • There are two formulas for the test statistic in testing hypotheses about a population mean with small samples. One test statistic follows the standard normal distribution, the other Student’s \(t\)-distribution.
  • The population standard deviation is used if it is known; otherwise the sample standard deviation is used (see the code sketch after this list).
  • Either five-step procedure, critical value or \(p\)-value approach, is used with either test statistic.
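
To make the first two points concrete, here is a small illustrative sketch of how the choice between the two statistics might be coded. The helper function small_sample_mean_test and its interface are hypothetical, not from the text.

```python
import math
from scipy import stats

def small_sample_mean_test(xbar, mu0, n, sigma=None, s=None):
    """Hypothetical helper: return (statistic, reference distribution).

    Uses Z = (xbar - mu0) / (sigma / sqrt(n)) when the population
    standard deviation sigma is known, and T = (xbar - mu0) / (s / sqrt(n))
    with n - 1 degrees of freedom when only the sample value s is known.
    """
    if sigma is not None:
        return (xbar - mu0) / (sigma / math.sqrt(n)), stats.norm
    return (xbar - mu0) / (s / math.sqrt(n)), stats.t(df=n - 1)

# Example 2 revisited: sigma is unknown, so Student's t with 3 df is used
stat, dist = small_sample_mean_test(0.02075, 0.02, 4, s=0.00171)
p_two_sided = 2 * dist.sf(abs(stat))   # about 0.44, above the 0.400 bound
print(f"statistic = {stat:.3f}, p-value = {p_two_sided:.3f}")
```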
