5.1 - Introduction to Hypothesis Testing

Previously we used confidence intervals to estimate unknown population parameters. We compared confidence intervals to specified parameter values and when the specific value was contained in the interval, we concluded that there was not sufficient evidence of a difference between the population parameter and the specified value. In other words, any values within the confidence intervals were reasonable estimates of the population parameter and any values outside of the confidence intervals were not reasonable estimates. Here, we are going to look at a more formal method for testing whether a given value is a reasonable value of a population parameter. To do this we need to have a hypothesized value of the population parameter. 

In this lesson we will compare data from a sample to a hypothesized parameter. In each case, we will compute the probability that a population with the specified parameter would produce a sample statistic as extreme as, or more extreme than, the one we observed in our sample. This probability is known as the p-value, and it is used to evaluate statistical significance.
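As a rough illustration of the p-value definition, the sketch below computes a two-sided p-value for a sample mean. It assumes a normally distributed statistic with a known population standard deviation (a z test), which the lesson does not specify; the function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, hypothesized_mean, sigma, n):
    """Probability of a sample mean at least as far from the hypothesized
    mean as the observed one, assuming a known sigma (z test)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Both tails count as "as extreme or more extreme".
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: n = 100, sample mean 103, hypothesized mean 100, sigma 15.
p = two_sided_p_value(sample_mean=103.0, hypothesized_mean=100.0, sigma=15.0, n=100)
print(round(p, 4))  # prints 0.0455
```

Here the observed mean is two standard errors above the hypothesized value, so the p-value is the area in both tails beyond z = 2.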

A test is considered to be statistically significant when the p-value is less than or equal to the level of significance, also known as the alpha (\(\alpha\)) level. For this class, unless otherwise specified, \(\alpha=0.05\); this is the most frequently used alpha level in many fields.
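The decision rule can be written as a single comparison. This is a minimal sketch; the function name is hypothetical, and the default of 0.05 mirrors the class convention stated above.

```python
def is_statistically_significant(p_value, alpha=0.05):
    """Return True when p_value <= alpha (the boundary counts as significant)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return p_value <= alpha

print(is_statistically_significant(0.0455))              # True at alpha = 0.05
print(is_statistically_significant(0.05))                # True: equality counts
print(is_statistically_significant(0.0455, alpha=0.01))  # False at alpha = 0.01
```

The same p-value can be significant at one alpha level and not at another, which is why the level should be chosen before looking at the data.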

Sample statistics vary randomly around the population parameter. When results are statistically significant, we conclude that the difference observed between our sample statistic and the hypothesized parameter is unlikely to be due to random sampling variation alone.
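To see random sampling variation at work, the simulation sketch below repeatedly draws samples from a population whose mean really equals the hypothesized value and counts how often a z test is nonetheless significant. The population values, sample size, seed, and function name are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(mu0=100.0, sigma=15.0, n=30, trials=2000, alpha=0.05, seed=0):
    """Fraction of samples declared significant when the hypothesized
    mean mu0 is actually true (normal population, known sigma)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            rejections += 1
    return rejections / trials

rate = rejection_rate()
print(rate)  # should land close to alpha
```

Because the hypothesized value is true in every trial, each rejection here is produced by sampling variation alone, and the long-run rejection rate should be close to alpha.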

Statistics (STAT)

Descriptive statistics, hypothesis testing, power, estimation, confidence intervals, regression, one- and 2-way ANOVA, Chi-square tests, diagnostics.

Prerequisite: one undergraduate course in statistics

Analysis of research data through simple and multiple regression and correlation; polynomial models; indicator variables; step-wise, piece-wise, and logistic regression.

Prerequisite: STAT 500 or equivalent; matrix algebra

Analysis of variance and design concepts; factorial, nested, and unbalanced data; ANCOVA; blocked, Latin square, split-plot, repeated measures designs.

Prerequisite: STAT 462 or STAT 501

Design principles; optimality; confounding in split-plot, repeated measures, fractional factorial, response surface, and balanced/partially balanced incomplete block designs.

Prerequisite: STAT 462 or STAT 501; STAT 502

Models for frequency arrays; goodness-of-fit tests; two-, three-, and higher- way tables; latent and logistic models.

Prerequisite: STAT 460 or STAT 502 or STAT 516; matrix algebra

Analysis of multivariate data; T²-tests; partial correlation; discrimination; MANOVA; cluster analysis; regression; growth curves; factor analysis; principal components; canonical correlations.

Prerequisite: MATH 441, STAT 501, STAT 502

Theory and application of sampling from finite populations.

Prerequisite: calculus; 3 credits in statistics

Research and quantitative methods for analysis of epidemiologic observational studies. Non-randomized intervention studies for human health and disease treatment. STAT 507 Epidemiologic Research Methods (3) This 3-credit course develops research and quantitative methods related to the design and analysis of epidemiological (mostly observational) studies. Such studies assess the health and disease status of one or more human populations or identify factors associated with health and disease status. To a lesser degree, the course also covers non-randomized intervention (experimental) studies that may be designed and analyzed with epidemiological methods. This course is a second-level course and complements Biostat Methods, STAT 509, which is focused on clinical (experimental) trials. Together, these two courses provide students with a complete review of research methods for the design and analysis of common studies related to human health, disease, and treatment. The prerequisite is Intro Biostats (STAT 250 or equivalent).

Prerequisite: STAT 250 or equivalent

With rapid advances in information technology, the field of Applied Statistics and Data Science has witnessed an explosive growth in the capabilities to generate and collect data. In the business world, very large databases on commercial transactions are generated by retailers. Huge amounts of scientific data are generated in various fields as well using a wide assortment of high throughput technologies. The internet provides another example of billions of web pages consisting of textual and multimedia information that is used by millions of people. Analyzing large complex bodies of data systematically and efficiently remains a challenging problem. This course addresses this problem by covering techniques and new software that automate the analysis and exploration of large complex data sets. Data Mining methods are introduced by using examples to demonstrate the power of the statistical methods for exploring structure in data sets, discovering patterns in data, making predictions, and reducing the dimensionality by Principal Component Analysis (PCA) and other tools for visualization of high dimensional data. Exploratory data analysis, classification methods, clustering methods, and other statistical and algorithmic tools are presented and applied to actual data. In particular, the course investigates classification methods (supervised learning), and clustering methods (unsupervised learning), and other statistical and algorithmic tools as they are applied to actual data. In addition, data mining and learning techniques developed in fields other than statistics, e.g., machine learning and signal processing, will also be reviewed. The Statistics graduate program also offers more in-depth courses on data mining, STAT 557 and STAT 558 . This course focuses on how to use software to investigate and analyze large data sets, whereas STAT 557 and STAT 558 focus more on writing data mining algorithms and the computational aspects of algorithm implementation.

Prerequisite: (STAT 501; STAT 462)

An introduction to the design and statistical analysis of randomized and observational studies in biomedical research. STAT 509 Design and Analysis of Clinical Trials (3) The objective of the course is to introduce students to the various design and statistical analysis issues in biomedical research. This is intended as a survey course covering a wide variety of topics in clinical trials, bioequivalence trials, toxicological experiments, and epidemiological studies. Many of these topics do not appear in other statistics courses, although a few topics are covered in greater depth in more advanced statistics courses. Computations are performed via the SAS statistical software package. Evaluation methods include four to five homework assignments, an in-class mid-semester examination and an in-class final examination.

Prerequisite: STAT 500

Identification of models for empirical data collected over time. Use of models in forecasting.

Prerequisite: STAT 462 or STAT 501 or STAT 511

Multiple regression methodology using matrix notation; linear, polynomial, and nonlinear models; indicator variables; AOV models; piece-wise regression, autocorrelation; residual analyses.

Prerequisite: STAT 500 or equivalent; matrix algebra; calculus

AOV, unbalanced, nested factors; CRD, RCBD, Latin squares, split-plot, and repeated measures; incomplete block, fractional factorial, response surface designs; confounding.

Prerequisite: STAT 511

Probability models, random variables, expectation, generating functions, distribution theory, limit theorems, parametric families, exponential families, sampling distributions.

Prerequisite: MATH 230

Sufficiency, completeness, likelihood, estimation, testing, decision theory, Bayesian inference, sequential procedures, multivariate distributions and inference, nonparametric inference.

Prerequisite: STAT 513

Conditional probability and expectation, Markov chains, Poisson processes, continuous-time Markov chains, Monte Carlo methods, Markov chain Monte Carlo. STAT 515 Stochastic Processes and Monte Carlo Methods (3) This course provides an introduction to stochastic processes and Monte Carlo methods. The course covers topics usually covered in a standard introductory course on stochastic processes, including Markov chains of various kinds. It also covers modern Monte Carlo and Markov chain Monte Carlo methods. Simulation and computing are emphasized throughout the course. The course is divided into two parts: the first part (roughly 8 weeks) provides an introduction to stochastic processes, while the latter (roughly 7 weeks) focuses on Monte Carlo methods, including Markov chain Monte Carlo. The first part of the course begins with a review of elementary conditional probability and expectation before covering basic discrete-time Markov chain theory and Poisson processes. The course then provides students with an overview of continuous-time Markov chains and birth-death processes. The second part of the course covers Monte Carlo methods. Starting with basic random variate generation, the course covers classical Monte Carlo methods such as accept-reject and importance sampling before discussing Markov chain Monte Carlo (MCMC) methods, which include the Metropolis-Hastings and Gibbs sampling algorithms, and Markov chain theory for discrete-time continuous-space Markov chains.

Prerequisite: MATH 414, STAT 414, or STAT 513

Measure theoretic foundation of probability, distribution functions and laws, types of convergence, central limit problem, conditional probability, special topics.

Prerequisite: MATH 403

Cross-listed with: MATH 517

Prerequisite: STAT 517

Cross-listed with: MATH 518

Selected topics in stochastic processes, including Markov and Wiener processes; stochastic integrals, optimization, and control; optimal filtering.

Prerequisite: STAT 516, STAT 517

Cross-listed with: MATH 519

Location estimation, 2- and K- sample problems, matched pairs, tests for association and covariance analysis when the data are censored.

Prerequisite: STAT 512, STAT 514

Computational foundations of statistics; algorithms for linear and nonlinear models, discrete algorithms in statistics, graphics, missing data, Monte Carlo techniques.

Prerequisite: STAT 501 or STAT 511; STAT 415; matrix algebra

Two-way tables; generalized linear models; logistic and conditional logistic models; loglinear models; fitting strategies; model selection; residual analysis.

A coordinate-free treatment of the theory of univariate linear models, including multiple regression and analysis of variance models.

Prerequisite: MATH 415 or STAT 415 or STAT 514; STAT 512; MATH 436 or MATH 441

Treatment of other normal models, including generalized linear, repeated measures, random effects, mixed, correlation, and some multivariate models.

Prerequisite: STAT 551

A rigorous but non-measure-theoretic introduction to statistical large-sample theory for Ph.D. students. STAT 553 Asymptotic Tools (3) STAT 553 covers most standard statistical asymptotics theory but does not require any knowledge of measure theory (it does not define convergence with probability one, for example). It covers convergence of random variables in both the univariate and multivariate settings, Slutsky's theorem(s) and the delta method, the Lindeberg-Feller central limit theorem, power and sample size, likelihood-based estimation and testing, and U-statistics. Although there is no measure theory in the course, it is a mathematically rigorous course and major results are proved. Many common applications of the theory in mathematical statistics are discussed, and most assignments require the use of a computer.

Prerequisite: STAT 513 and STAT 514

Statistical Analysis of High Throughput Biology Experiments.

Cross-listed with: BIOL 555, MCIBS 555

This course introduces data mining and statistical/machine learning, and their applications in information retrieval, database management, and image analysis. STAT 557 Data Mining I With rapid advances in information technology, we have witnessed an explosive growth in our capabilities to generate and collect data in the last decade. In the business world, very large databases on commercial transactions have been generated by retailers. Huge amounts of scientific data have been generated in various fields as well. For instance, the human genome database project has collected gigabytes of data on the human genetic code. The World Wide Web provides another example with billions of web pages consisting of textual and multimedia information that are used by millions of people. How to analyze huge bodies of data so that they can be understood and used efficiently remains a challenging problem. Data mining addresses this problem by providing techniques and software to automate the analysis and exploration of large complex data sets. Research on data mining has been pursued by researchers in a wide variety of fields, including statistics, machine learning, database management and data visualization. This course on data mining will cover methodology, major software tools and applications in this field. By introducing principal ideas in statistical learning, the course will help students to understand the conceptual underpinnings of methods in data mining. Considerable effort will also be devoted to computational aspects of algorithm implementation. To make an algorithm efficient for handling very large scale data sets, issues such as algorithm scalability need to be carefully analyzed. Data mining and learning techniques developed in fields other than statistics, e.g., machine learning and signal processing, will also be introduced.
Example topics include linear classification/regression, logistic regression, model regularization, dimension reduction, prototype methods, decision trees, mixture models, and hidden Markov models. Students will be required to work on projects to practice applying existing software and to a certain extent, developing their own algorithms. Classes will be provided in three forms: lecture, project discussion, and special topic survey/research applications. Project discussion will enable students to share and compare ideas with each other and to receive specific guidance from the instructors. Efforts will be made to help students formulate real-world problems into mathematical models so that suitable algorithms can be applied with consideration of computational constraints. By surveying special topics, students will be exposed to massive literature and become more aware of recent research. Students are strongly encouraged to survey or present their own applications of data mining and statistical learning in graduate research and carry out discussions on data collection and problem formulation.

Prerequisite: STAT 318 or STAT 416 and basic programming skills

Advanced data mining techniques: temporal pattern mining, network mining, boosting, discriminative models, generative models, data warehouse, and choosing mining algorithms. IST (STAT) 558 Data Mining II (3) This course is the second course in a two-course sequence on data mining. It emphasizes advanced concepts and techniques for data mining and their application to large-scale data warehouses. Building on the statistical foundations and underpinnings of data mining introduced in Data Mining I, this course covers advanced topics in data mining: mining association rules from large-scale data warehouses, hierarchical clustering, mining patterns from temporal data, semi-supervised learning, active learning and boosting. In addition to computational aspects of algorithm implementation, the course will also cover architecture and implementation of data warehouses, data preprocessing (including data cleansing), and the choice of mining algorithms for applications. In addition to discriminative models such as CRF and SVM models, the course will also introduce generative models such as Bayesian Net and LDA. A term project will be developed by each student to apply an advanced data mining algorithm to a multi-dimensional data set. Classes will include lectures, paper discussions, and project presentations. Paper discussions will allow students to discuss state-of-the-art literature related to data mining. Project presentations will enable students to share and compare project ideas with each other and to receive feedback from the instructor.

Prerequisite: STAT 557 or IST 557

Cross-listed with: IST 558

Classical optimal hypothesis test and confidence regions, Bayesian inference, Bayesian computation, large sample relationship between Bayesian and classical procedures.

Prerequisite: STAT 514; Concurrent: STAT 517

Basic limit theorems; asymptotically efficient estimators and tests; local asymptotic analysis; estimating equations and generalized linear models.

Prerequisite: STAT 561

Theoretical treatment of methods for analyzing multivariate data, including Hotelling's T2, MANOVA, discrimination, principal components, and canonical analysis.

Prerequisite: STAT 505, STAT 551

General principles of statistical consulting and statistical consulting experience. Preparation of reports, presentations, and communication aspects of consulting are discussed. Students will work on client-provided short on-call and long-term projects.

Prerequisites: STAT 502, STAT 505; STAT 508; STAT 557, STAT 503; STAT 504; STAT 506; STAT 510

Statistical consulting experience including client meetings, development of recommendation reports, and discussion of consulting solutions. STAT 581 Statistical Consulting Practicum II (1 per semester/maximum of 2) This course serves as a continuation of STAT 580, which provides actual practical experience as a statistical consultant. In STAT 581, each student will hold a consulting session biweekly (by appointment) with a researcher to discuss the statistical design, analysis and computation aspects required for the client's project. Written reports are required for each project and reviewed for appropriateness and accuracy by a supervising faculty member. In addition, a weekly seminar is utilized to discuss selected projects and non-standard applications of statistical methodology. This course will be offered in the spring and summer, with an anticipated enrollment of 15-20 students per semester.

Prerequisite: STAT 580

Computational methods for modern machine learning models, including applications to big data and non-differentiable objective functions.

Cross-listed with: CSE 584

Continuing seminars which consist of a series of individual lectures by faculty, students, or outside speakers.

This course is designed to help students become better teachers and communicators of statistics. INTAF 592 Teaching Statistics (1) This course is designed to help students become better teachers and communicators of statistics, and specifically to prepare students to supervise undergraduate statistics students in labs or small group settings, or even to lead their own undergraduate courses. Students learn about and discuss pedagogy in statistics, gain experience with practice teaching, and improve via individual feedback.

Creative projects, including nonthesis research, which are supervised on an individual basis and which fall outside the scope of formal courses.

Formal courses given on a topical or special interest subject which may be offered infrequently; several different topics may be taught in one year or term.

No description.

Investigates methods for assessing data collected from experimental and/or observational studies in various research settings. STAT 800 Applied Research Methods (3) This course provides students with a broad exploration of the tools and methods in Applied Statistics. In particular, it investigates basic probability distributions and methods for assessing data collected from experimental and/or observational studies in social science and other research settings. Students learn methods of point and interval estimation, including sample size determinations required to achieve a prescribed margin of error. Additionally, students examine hypothesis testing and the determination of sample sizes to achieve a prescribed power of a given test. The distinction between observational studies and randomized experiments is clarified and the limitations of the conclusions are emphasized. Research articles that are relevant to students' fields of study are used to determine how these statistical methods are being applied. Students then identify and critique appropriate research methods. Students work with various data sets to establish fundamental practices that properly analyze data and interpret results via either Minitab or SPSS statistical software as they formulate and communicate conclusions based on a given research context.

This course is designed to build upon a student's undergraduate quantitative backgrounds by giving an overview of multivariate statistical techniques. Many applied fields often require the use of large, multivariate data sets and students need to be aware of the wide range of statistical tools available to them. Major objectives of this course are to gain a working knowledge of probability theory, univariate and multivariate statistics, the use of copulas, Monte Carlo techniques, and multiple linear regression. Throughout the course, students will have the opportunity to apply these concepts to real world data sets using modern statistical software packages.

This course is designed to build upon a student's background by giving an overview of the techniques of time series analysis often used in applied settings. Many areas of research and application often utilize long time series of data in an effort to model changes and volatility in data measured consistently over time. Major objectives in this course include an overview of linear time series; AR, MA, and ARIMA models; ARCH and GARCH models; nonlinear time series models; multivariate time series models; and models of high-frequency data. Throughout the course, students will have the opportunity to apply these concepts to real world data sets using modern statistical software packages.

Prerequisite: (MFE 801, STAT 805; STAT 505)

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  1. State your research hypothesis as a null hypothesis (H₀) and an alternate hypothesis (Hₐ or H₁).
  2. Collect data in a way designed to test the hypothesis.
  3. Perform an appropriate statistical test.
  4. Decide whether to reject or fail to reject your null hypothesis.
  5. Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H₀) and alternate (Hₐ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H₀: Men are, on average, not taller than women.
  • Hₐ: Men are, on average, taller than women.
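
In symbols, this pair of hypotheses could be written as follows. The notation is introduced here for illustration and is not part of the original article: \(\mu_{\text{men}}\) and \(\mu_{\text{women}}\) are assumed to denote the population mean heights of men and women.

```latex
\[
H_0:\ \mu_{\text{men}} \le \mu_{\text{women}}
\qquad \text{versus} \qquad
H_a:\ \mu_{\text{men}} > \mu_{\text{women}}
\]
```

Because the alternate hypothesis points in one direction only, this formulation corresponds to a one-sided (one-tailed) test.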

Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

Step 3: Perform a statistical test

There are a variety of statistical tests available, but many common ones (such as t tests and ANOVA) are based on comparing within-group variance (how spread out the data are within a category) with between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means that differences this large would be unlikely to arise by chance alone.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means that the difference you measure between groups could plausibly be due to chance.
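
One common way to put the between-group versus within-group comparison into a single number is the F statistic from a one-way analysis of variance. The sketch below is an illustration under that assumption, not the article's own method; the function name and the toy data are hypothetical.

```python
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: how spread out the values are around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

overlapping = [[1, 2, 3], [2, 3, 4]]      # groups overlap heavily
separated = [[1, 2, 3], [11, 12, 13]]     # same spread, far apart
print(f_statistic(overlapping))  # prints 1.5
print(f_statistic(separated))    # prints 150.0
```

Both data sets have the same within-group spread, so the much larger F for the separated groups comes entirely from the larger between-group differences. A larger F corresponds to a lower p-value.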

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For the height example, your statistical test would give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see a difference at least this large if the null hypothesis of no difference is true.
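
The two outputs listed above can be sketched with an exact permutation test, used here in place of whatever test the article had in mind because it needs only the standard library. The heights are made-up values in centimetres and the function name is hypothetical.

```python
from itertools import combinations
from statistics import fmean

def permutation_test_greater(group_a, group_b):
    """Return (difference in means, one-sided exact permutation p-value)
    for the alternative that group_a has the larger mean."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = fmean(group_a) - fmean(group_b)
    indices = range(len(pooled))
    count = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if fmean(a) - fmean(b) >= observed - 1e-12:
            count += 1
        total += 1
    return observed, count / total

men = [178, 182, 175, 180, 185]     # hypothetical heights (cm)
women = [165, 170, 162, 168, 172]   # hypothetical heights (cm)
estimate, p_value = permutation_test_greater(men, women)
print(round(estimate, 1))  # prints 12.6
print(round(p_value, 4))   # prints 0.004
```

With these values every man is taller than every woman, so only 1 of the 252 possible relabellings produces a difference as large as the observed one, giving a p-value of 1/252.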

Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05: that is, you reject when there is a less than 5% chance that you would see results at least this extreme if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This reduces the risk of incorrectly rejecting the null hypothesis (Type I error).

Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

Frequently asked questions about hypothesis testing

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved April 15, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Runze Li's Homepage

Runze Li, Eberly Family Chair Professor

Office : 422A Thomas Building; Tel: 814-865-1555; Fax: 814-863-7114

Email : runzeli@psu….

Mailing Address : Department of Statistics, Penn State University, University Park, PA 16802-2111

Academic Positions & Education

  • Eberly Family Chair in Statistics, Penn State University, 2018 –
  • Professor of Public Health Sciences, Penn State University, 2008 –
  • Verne M. Willaman Professor of Statistics, Penn State University, 2014 – 2018
  • Distinguished Professor, Penn State University, 2012 – 2014
  • Full Professor, Penn State University, 2008 – 2012
  • Associate Professor, Penn State University, 2005 – 2008
  • Assistant Professor, Penn State University, 2000 – 2005
  • Ph.D. in Statistics from Department of Statistics, University of North Carolina at Chapel Hill in 2000

Honors and Awards

  • NSF Career Award, 2004
  • Fellow, Institute of Mathematical Statistics
  • Fellow, American Statistical Association
  • Fellow, American Association for the Advancement of Science
  • The United Nations’ World Meteorological Organization Gerbier-Mumm International Award for 2012
  • Editor of The Annals of Statistics (2013 – 2015)
  • Highly Cited Researcher in Mathematics (2014 – 2020)
  • Highly Cited Researcher in Cross-field (2022 – )
  • ICSA Distinguished Achievement Award, 2017

Editorial Service

  • Editor of Annals of Statistics (2013 – 2015)
  • Associate Editor of Journal of American Statistical Association (2006 – )
  • Associate Editor of Annals of Mathematical Sciences and Applications (2017 – )
  • Editorial board member of Science China: Mathematics (2018 – )
  • Associate Editor of Journal of Multivariate Analysis (2019 – )
  • Associate Editor of Electronic Journal of Statistics (2022 – )
  • Associate Editor of Annals of Statistics (2007 – 2012)
  • Associate Editor of Statistica Sinica (2005 – 2012)

Conference Organization

  • Chair of Scientific Program Committee of International Conference on Big Data and Statistical Interdisciplinary Sciences, July 4 – 6, 2023, East China Normal University, Shanghai, P.R. China.
  • Chair of Scientific Program Committee of Statistical Foundations of Data Science and Their Application. May 8 – 10, 2023, Princeton University, USA.
  • Co-Chair of scientific program committee for the 4th IMS China, July 2013, Chengdu, China.
  • Co-chair of scientific program committee for the 2nd IMS Asia Pacific Rim meeting, July 2012, Japan.
  • Co-chair of scientific program committee for the 1st IMS Asia Pacific Rim meeting, July 2009, Seoul, Korea.
  • ASA Biometrics Section program chair for JSM 2007, August 2007, Salt Lake City, Utah
  • IMS program chair for ENAR 2005, March 2005, Austin, Texas.

Scientific program committee member for the following conferences

  • Scientific Program Committee of the Inaugural China Joint Statistical and Data Science Meetings (CJSM) July 10-13, 2023, Beijing, P.R. China.
  • Bernoulli Society’s World Probability and Statistics Congress, August 2020, Seoul, Korea.
  • The 2019 International Conference on Data Science, December 13 – 15, 2019, Fudan University, Shanghai, P. R. China.
  • The 2018 International Chinese Statistical Association (ICSA-China) conference, July 2 – 5, 2018, Qingdao, P. R. China
  • The 2015 Nankai Alumni Statistics Forum, July 2015, Tianjin, China
  • The 10th Frontier Statistics, June 2015, Beijing, China
  • The 1st International Conference on Big Data & Applied Statistics, December 2014, Beijing, China.
  • The 2nd Taihu Lake International Statistics Forum, July 2013, Suzhou, China.
  • The ICSA 2011 Applied Statistics Symposium. New York City, NY

Research Interests

  • Variable selection for high-dimensional data
  • Feature screening for ultrahigh-dimensional data
  • Longitudinal and intensive longitudinal data analysis
  • Nonparametric regression modeling and local polynomial regression
  • Semiparametric regression modeling
  • Statistical genetics and bioinformatics
  • Statistical applications to engineering, meteorological research, neural science research & social behavioral science research

Selected Publications

  • Selected Publications in Statistical Journals
  • Selected Interdisciplinary Research Works
  • Acknowledgement

Fang, K.-T., Li, R. and Sudjianto, A. (2006).  Design and Modeling for Computer Experiments.  Chapman and Hall/CRC. Boca Raton, FL.

Fan, J. Li, R., Zhang, C.-H. and Zou, H. (2020).  Statistical Foundations of Data Science.  Chapman and Hall/CRC. Boca Raton, FL.

B. Selected Publications in Statistical Journals:

Tong, Z, Cai, Z., Yang, S. and Li, R. (2022). Model-free conditional feature screening with FDR control.  Journal of American Statistical Association  In press. https://doi.org/10.1080/01621459.2022.2063130

Yu, X., Li, D., Xue, L. and Li, R. (2022). Power-enhanced simultaneous test of high-dimensional mean vectors and covariance matrices with application to gene-set testing.  Journal of American Statistical Association . In press. https://doi.org/10.1080/01621459.2022.2061354

Guo, X, Li, R, Liu, J. and Zeng, M. (2022). Statistical Inference for Linear Mediation Models with High-dimensional Mediators and Application to Studying Stock Reaction to COVID-19 Pandemic.  Journal of Econometrics . In press https://doi.org/10.1016/j.jeconom.2022.03.001

Li, R., Xu, K., Zhou, Y. and Zhu, L. (2022). Test the effects of high-dimensional covariates via aggregating cumulative covariances.  Journal of American Statistical Association . In press. https://doi.org/10.1080/01621459.2022.2044334

Sheng, B., Li, C., Bao, L. and Li, R. (2022). Probabilistic HIV recency classification – a logistic regression without labelled individual level training data.  Annals of Applied Statistics . Accepted.

Guo, X., Ren, H., Zou, C. and Li, R. (2021). Threshold selection for feature screening via error rate control.  Journal of American Statistical Association . In press. https://doi.org/10.1080/01621459.2021.2011735

Li, C. and Li, R. (2021). Linear hypothesis testing in linear models with high dimensional responses.  Journal of American Statistical Association . In press. https://doi.org/10.1080/01621459.2021.1884561

Zou, T, Lan, W, Li, R. and Tsai, C.-L. (2021). Inference on Covariance-Mean Regression.  Journal of Econometrics . In press. https://doi.org/10.1016/j.jeconom.2021.05.004

Nandy, D., Chiaromonte, F. and Li, R. (2022). Covariate information number for feature screening in ultrahigh-dimensional supervised problems.  Journal of American Statistical Association, 117 , 1516-1529. https://doi.org/10.1080/01621459.2020.1864380

Guo, X., Li, R., Liu, J. and Zeng, M. (2022). High-dimensional mediation analysis for selecting DNA methylation Loci mediating childhood trauma and cortisol stress reactivity.  Journal of American Statistical Association, 117 . 1110-1121. https://doi.org/10.1080/01621459.2022.2053136

Ren, H., Zou, C., Chen, N. and Li, R. (2022). Large-scale datastreams surveillance via pattern-oriented-sampling.  Journal of American Statistical Association, 117 , 794 – 808. https://doi.org/10.1080/01621459.2020.1819295

Liu, W. Ke, Y., Liu, J. and Li, R. (2022). Model-free feature screening and FDR control with knockoff features.  Journal of American Statistical Association, 117 , 428 – 443. https://doi.org/10.1080/01621459.2020.1783274.

Liu, W., Yu, X. and Li, R. (2022). Multiple-splitting project test for high dimensional mean vectors.  Journal of Machine Learning Research, 23(71) , 1-27. https://www.jmlr.org/papers/v23/20-1103.html

Cai, Z., Li, R. and Zhang, Y. (2022). A distribution free conditional independence test with applications to causal discovery.  Journal of Machine Learning Research, 23(85) , 1-41. https://jmlr.org/papers/v23/20-682.html

Huang, Y., Li, C., Li, R. and Yang, S. (2022). An overview of tests on high-dimensional means.  Journal of Multivariate Analysis, 188 , 104813. https://doi.org/10.1016/j.jmva.2021.104813

Li, Z., Wang, Q. and Li, R. (2021). Central limit theorem for linear spectral statistics of large dimensional Kendall’s rank correlation matrices and its applications.  Annals of Statistics, 49 , 1569 – 1593. https://doi.org/10.1214/20-AOS2013

Shi, C., Song, R., Lu, W. and Li, R. (2021). Statistical inference for high-dimensional models via recursive online-score estimation.  Journal of American Statistical Association, 116 . 1307 – 1318. https://doi.org/10.1080/01621459.2019.1710154

Xiao, D., Ke, Y. and Li, R. (2021). Homogeneity structure learning in large-scale panel data with heavy-tailed errors.  Journal of Machine Learning Research, 22(13) :1-42. https://jmlr.org/papers/v22/19-1018.html

Huang, D., Zhu, X., Li, R. and Wang, H. (2021). Feature screening for network data.  Statistica Sinica, 31 , 1239 – 1259. https://doi.org/10.5705/ss.202018-0400

Wang, J., Cai, X. and Li, R. (2021). Variable selection for partially linear models via Bayesian subset modeling with diffusing prior.  Journal of Multivariate Analysis, 183 , 104733. https://doi.org/10.1016/j.jmva.2021.104733.

Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020). A new tuning-free approach to high-dimensional regression (with discussions).  Journal of American Statistical Association, 115 , 1700 – 1729.  [pdf]  [Supplement]  [Comment1]  [Comment2]  [Comment3]  [Rejoinder]

Fang, X. E., Ning, Y. and Li, R. (2020). Test of significance for high-dimensional longitudinal data.  Annals of Statistics, 48 , 2622 – 2645.  [pdf]  [Supplement]

Zhou, T., Zhu, L., Xu, C. and Li, R. (2020). Model-free forward regression via cumulative divergence.  Journal of American Statistical Association, 115 . 1393 – 1405.  [pdf]  [Supplement]

Zou, C., Wang, G. and Li, R. (2020). Consistent selection of the number of change-points via sample-splitting.  Annals of Statistics, 48 , 413 -439.  [pdf]  [Supplement]

Cui, X., Li, R., Yang, G. and Zhou, W. (2020). Empirical likelihood test for large dimensional mean vector.  Biometrika, 107 , 591 – 607.  [pdf]  [Supplement]

Wang, L., Chen, Z., Wang, C. D. and Li, R. (2020). Ultrahigh dimensional precision matrix estimation via refitted cross validation.  Journal of Econometrics, 215 , 118 – 130.  [pdf]

Li, X., Li, R., Xia, Z. and Xu, C. (2020). Distributed feature screening via componentwise debiasing.  Journal of Machine Learning Research, 21(24) . 1 – 32.  [pdf]

Chu, W., Li, R., Liu, J. and Reimherr, M. (2020). Feature screening for generalized varying coefficient mixed effect models with application to obesity GWAS.  Annals of Applied Statistics, 14 , 276 – 298.  [pdf]  [Supplement]

Cai, Z., Li, R. and Zhu, L. (2020). Online Sufficient Dimension Reduction Through Sliced Inverse Regression.  Journal of Machine Learning Research, 21(10) . 1 – 25.  [pdf]

Yang, G., Yang, S. and Li, R. (2020). Feature screening in ultrahigh dimensional generalized varying-coefficient models.  Statistica Sinica, 30 , 1049 – 1067.  [pdf]  [Supplement]

Zheng, S., Chen, Z., Cui, H. and Li, R. (2019). Hypothesis testing on linear structures of high dimensional covariance matrix.  Annals of Statistics, 47 , 3300 – 3334.  [pdf]  [Supplement]

Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear hypothesis testing for high dimensional generalized linear models.  Annals of Statistics, 47 , 2671 – 2703.  [pdf]  [Supplement]

Zhong, P.-S., Li, R. and Santo, S. (2019). Homogeneity test of covariance matrices and change-points identification with high-Dimensional longitudinal data.  Biometrika, 106 , 619 – 634.  [pdf]  [Supplement]

Zhu, X., Chang, X., Wang, H. and Li, R. (2019). Portal nodes screening for large scale social networks.  Journal of Econometrics, 209 , 145- 157.  [pdf]  [Supplement]

Liu, H., Wang, X., Yao, T., Li, R. and Ye, Y. (2019). Sample average approximation with sparsity-inducing penalty for high-dimensional stochastic programming.  Mathematical Programming, 78 , 69-108.

Chen, Z., Fan, J. and Li, R. (2018). Error variance estimation in ultrahigh dimensional additive models.  Journal of American Statistical Association, 113 , 315 – 327.  [pdf]

Li, R., Ren, J.J., Yang, G. and Ye, Y. (2018). Asymptotic behavior of Cox’s partial likelihood and its application to variable selection.  Statistica Sinica, 28 , 2713 – 2731.  [pdf]

Liu, J., Lou, L. and Li, R. (2018). Variable Selection for Partially Linear Models via Partial Correlation.  Journal of Multivariate Analysis, 67 , 418 – 434.

Ma, S., Li, R. and Tsai, C.-L. (2017). Variable screening via partial quantile correlation.  Journal of American Statistical Association, 112 , 650 – 663.  [pdf]  [Supplement]

Zhu, L., Xu, K., Li, R. and Zhong, W. (2017). Project correlation between two random vectors.  Biometrika, 104 , 829 – 843.  [pdf]

Liu, H., Yao, T, Li, R. and Ye, Y. (2017). Folded concave penalized sparse linear regression: complexity, sparsity, statistical performance, and algorithm theory for local solutions.  Mathematical Programming SERIES A, 166 , 207 – 240.  [pdf]

Li, R., Liu, J. and Lou, L. (2017). Variable selection via partial correlation.  Statistica Sinica, 27 , 983 -996.  [pdf]  [Supplement]

Liu, H., Yao, T. and Li, R. (2016). Global solutions to folded concave penalized nonconvex learning.  Annals of Statistics, 44 , 629 – 659.  [pdf]  [Supplement]

Pan, R., Wang, H. and Li, R. (2016). Ultrahigh dimensional multiclass Linear discriminant analysis by pairwise sure independence screening.  Journal of American Statistical Association, 111 , 169 -179.  [pdf]

Zhang, X., Wu, Y., Wang, L. and Li, R. (2016). Variable selection for support vector machine in moderately high dimensions.  Journal of Royal Statistical Society, Series B. 78 , 53 – 76.  [pdf]

Chu, W., Li, R. and Reimherr, M. (2016). Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data.  Annals of Applied Statistics, 10 , 596 – 617.  [pdf]

Li, D. and Li, R. (2016). Local composite quantile regression smoothing for Harris recurrent Markov processes.  Journal of Econometrics, 194 , 44 – 56.  [pdf]

Lan, W., Zhong, P., Li, R., Tsai, C.-L. and Wang, H. (2016). Single coefficient test in high dimensional linear models.  Journal of Econometrics, 195 , 154 – 168.  [pdf]

Zhang, X., Wu, Y., Wang, L. and Li, R. (2016). A consistent information criterion for support vector machines in diverging model spaces.  Journal of Machine Learning Research, 17 , 1 -26.  [pdf]

Zhong, W., Zhu, L., Li, R. and Cui, H. (2016). Regularized quantile regression and robust feature screening for single index models.  Statistica Sinica. 26  , 69 – 95.  [pdf]

Xu, C., Lin, S., Fang, J. and Li, R. (2016). Prediction-based termination rule for greedy learning with massive data.  Statistica Sinica, 26 , 841 – 860.  [pdf]  [supplement]

Yang, G., Yu, Y., Li, R. and Buu, A. (2016). Feature screening in ultrahigh dimensional Cox’s model.  Statistica Sinica, 26 , 881 – 902.  [pdf]

Kurum, E., Li, R., Shiffman, S. and Yao, W. (2016). Time-varying coefficient models for joint modeling binary and continuous outcomes in longitudinal data.  Statistica Sinica, 26 , 979 – 1000.  [pdf]

Liu, X, Cui, Y. and Li, R. (2016). Partial linear varying multi-index coefficient model for integrative gene-environment interactions.  Statistica Sinica, 26 , 1037 – 1060.  [pdf]

Xu, C., Zhang, Y., Li, R. and Wu, X. (2016). On the Feasibility of Distributed Kernel Regression for Big Data.  IEEE Transactions on Knowledge and Data Engineering, 28 , 3041 – 3052.  https://doi.org/10.1109/TKDE.2016.2594060

Cui, H., Li, R. and Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis.  Journal of American Statistical Association. 110 , 630 – 641.  [pdf]

Wang, L., Peng, B. and Li, R. (2015). A high-dimensional nonparametric multivariate test for mean vector.  Journal of American Statistical Association. 110 , 1658 – 1669.  [pdf]

Chen, Z., Li, R. and Li, Y. (2015). Varying-coefficient models for data with auto-correlated error process.  Statistica Sinica. 25 , 709 – 724.  [pdf]  and supplement  [pdf]

Li, J., Wang, Z., Li, R. and Wu, R. (2015). Bayesian group LASSO for nonparametric varying-coefficient models with application to functional genome-wide association studies.  Annals of Applied Statistics. 9 , 640 – 664.  [pdf]

Liu, J, Zhong, W. and Li, R. (2015). A selective overview of feature screening for ultrahigh dimensional data.  Science China: Mathematics. 58 , 2033 – 2054.  [pdf]

Li, J., Zhong, W., Li, R. and Wu, R. (2014). A fast algorithm for detecting gene-gene interactions in genome-wide association studies.  The Annals of Applied Statistics. 8 , 2292 – 2318.  [pdf]

Liu, J., Li, R. and Wu, R. (2014). Feature Selection for varying coefficient models with ultrahigh dimensional covariates.  Journal of American Statistical Association. 109 , 266 – 274.  [pdf]

Chen, H., Wang, Y., Li, R. and Shear, K. (2014). A note on nonparametric regression test through penalized splines.  Statistica Sinica. 24 , 1143-1160.  [pdf]

Huang, D., Li, R. and Wang, H. (2014). Feature screening for ultrahigh dimensional categorical data with applications.  Journal of Business and Economic Statistics. 32 , 237-244.  [pdf]

Huang, M, Li, R., Wang, H. and Yao, W. (2014). Estimating mixture of Gaussian processes by kernel smoothing.  Journal of Business and Economic Statistics. 32 , 259-270.  [pdf]

Wang, L., Kim, Y. and Li, R. (2013). Calibrating nonconvex penalized regression in ultrahigh dimension.  Annals of Statistics. 41 , 2505 – 2536.  [pdf]

Huang, M., Li, R. and Wang, S. (2013). Nonparametric mixture of regression models.  Journal of American Statistical Association. 108 , 929 – 941.  [pdf]

Yao, W. and Li, R. (2013). New local estimation procedure for nonparametric regression function of longitudinal data.  Journal of Royal Statistical Society, Series B. 75 , 123-138.  [pdf]

Zhu, L., Dong, Y. and Li, R. (2013). Semiparametric estimation of conditional heteroscedasticity through single index modeling.  Statistica Sinica. 24 , 1235 – 1256.  [pdf]

Zhu, H., Li, R. and Kong, L. (2012). Multivariate varying coefficient models for functional responses.  Annals of Statistics. 40 , 2634 – 2666.  [pdf]

Fan, Y. and Li, R. (2012). Variable selection in linear mixed effects models.  Annals of Statistics. 40 , 2043 – 2068.  [pdf]

Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning.  Journal of American Statistical Association. 107 , 1129 – 1139.  [pdf]

Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultrahigh dimension.  Journal of American Statistical Association. 107 , 214 – 222.  [pdf]

Zhu, L, Li, L., Li, R. and Zhu, L.-X. (2011). Model-free feature screening for ultrahigh dimensional data.  Journal of American Statistical Association. 106 , 1464 – 1475.  [pdf]

Kai, B., Li, R. and Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models.  Annals of Statistics. 39 , 305-332.  [pdf]

Wang, Y., Chen, H., Li, R., Duan, N. and Lewis-Fernandez, R. (2011). Prediction-based structured variable selection through receiver operating curve.  Biometrics. 67 , 896 – 905.  [pdf]

Liang, H, Liu, X., Li, R. and Tsai, C.-L. (2010). Estimation and testing for partially linear single-index models.  Annals of Statistics. 38 , 3811-3836.  [pdf]

Zhang, Y., Li, R. and Tsai, C.-L. (2010). Regularization parameter selections via generalized information criterion.  Journal of American Statistical Association. 105 , 312-323.  [pdf]

Kai, B., Li, R. and Zou, H. (2010). Local CQR smoothing: an efficient and safe alternative to local polynomial regression.  Journal of Royal Statistical Society, Series B. 72 , 49-69.  [pdf]

Ma, Y. and Li, R. (2010). Variable selection in measurement error models.  Bernoulli, 16 , 274-300.  [pdf]

Yin, J., Geng, Z., Li, R. and Wang, H. (2010). Nonparametric covariance model.  Statistica Sinica, 20 , 469-479  [pdf]  and supplement  [pdf]

Wang, L., Kai, B. and Li, R. (2009). Local rank inference for varying coefficient models.  Journal of American Statistical Association, 104 , 1631-1645.  [pdf]

Wang, L. and Li, R. (2009). Weighted Wilcoxon-type smoothly clipped absolute deviation method.  Biometrics. 65 , 564-571.  [pdf]  and Web Document  [pdf]

Liang, H. and Li, R. (2009). Variable selection for partially linear models with measurement Errors.  Journal of American Statistical Association. 104 , 234-248.  [pdf]

Li, R. and Nie, L. (2008). Efficient statistical inference procedures for partially nonlinear models and their applications.  Biometrics, 64 , 904-911.  [pdf]  Web Document  [pdf]

Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion).  Annals of Statistics, 36 , 1509-1566.  [pdf]  [Rejoinder]

Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling.  Annals of Statistics. 36 , 261-286.  [pdf]

Wang, H., Li, R. and Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method.  Biometrika. 94 , 553-568.  [pdf]

Li, R. and Nie, L. (2007). A new estimation procedure for a partially nonlinear model via a mixed-effects approach.  The Canadian Journal of Statistics, 35 , 399-411.

Fan, J., Huang, T. and Li, R. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function.  Journal of American Statistical Association. 102 , 632-641.  [pdf]

Fan, J. and Li, R. (2006). Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery.  Proceedings of the International Congress of Mathematicians  (M. Sanz-Sole, J. Soria, J.L. Varona, J. Verdera, eds.), Vol. III, European Mathematical Society, Zurich, 595-622.  [pdf]

Qu, A. and Li, R. (2006). Nonparametric modeling and inference function for longitudinal data.  Biometrics. 62 , 379-391  [pdf]

Zhang, A., Fang, K.-T., Li, R. and Sudjianto, A. (2005). Majorization framework for fractional factorial designs.  Annals of Statistics. 33 , 2837-2853.  [pdf]

Hunter, D. and Li, R. (2005). Variable selection using MM algorithms.  Annals of Statistics. 33 , 1617-1642.  [pdf]

Cai, J. Fan, J., Li, R. and Zhou, H. (2005). Variable selection for multivariate failure time data.  Biometrika. 92 , 303-316.  [pdf]

Li, R. and Sudjianto, A. (2005). Analysis of computer experiments using penalized likelihood in Gaussian kriging Models.  Technometrics. 47 , 111-120.  [pdf]

Li, R. and Chow, M. (2005). Evaluation of reproducibility for paired functional data.  Journal of Multivariate Analysis. 93 , 81-101.  [pdf]

Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis.  Journal of American Statistical Association, 99 , 710-723.  [pdf]

Fan, J. and Li, R. (2002). Variable Selection for Cox’s Proportional Hazards Model and Frailty Model.  Annals of Statistics. 30 , 74-99.  [pdf]

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties.  Journal of American Statistical Association. 96 , 1348-1360.  [pdf]

Liang, J., Fang, K.T., Hickernell, F. and Li, R. (2001). Testing multivariate uniformity and its applications.  Mathematics of Computation. 70 , 337-355.  [pdf]

C. Selected Interdisciplinary Research Works:

  • Social Science Research
  • Statistical genetics and Bioinformatics
  • Environmental and Meteorological Research
  • Neural Science, Chemometrics and Computer Experiments

C1. Social Science Research

Rincon, S. J., Dou, N., Murray-Kolb, L. E., Hudy, K. A., Mitchell, D. C., Li, R. and Na, M. (2022). Daily food insecurity is associated with diet quality, but not energy intake, in winter and during COVID-19, among low-income adults.  Nutrition Journal, 21:19  https://doi.org/10.1186/s12937-022-00768-y

Na, M., Dou, N., Liao, Y, Rincon, S. J., Francis, L. A., Graham-Engeland, J. E., Murray-Kolb, L. E. and Li, R (2022). Daily food insecurity predicts lower positive and higher negative affect: An ecological momentary assessment study.  Frontiers in Nutrition . https://doi.org/10.3389/fnut.2022.790519

Parikh, R. Liu, M., Li, E., Li, R. and Chen, J. (2021). Trajectories of mortality risk among patients with cancer and associated end-of-life utilization.  Nature Partner Journals (npj) Digital Medicine. 4 (104) . https://doi.org/10.1038/s41746-021-00477-6.

Buu, A., Cai, Z., Li, R., Wong, S.W., Lin, H.C., Su, W.C., Jorenby, D.E., and Piper, M.E. (2021). Validating e-cigarette dependence scales based on dynamic patterns of vaping behaviors.  Nicotine & Tobacco Research, 23 , 1484 – 1489. https://doi.org/10.1093/ntr/ntab050

Buu, A., Cai, Z., Li, R., Wong, S., Lin, H., Su, W., Jorenby, D.E., and Piper, M.E. (2021). The association between short-term emotion dynamics and cigarette dependence: a comprehensive examination of dynamic measures.  Drug and Alcohol Dependence, 218 , 108341. https://doi.org/10.1016/j.drugalcdep.2020.108341

Coffman, D., Cai, X., and Li, R. (2020). Challenges and opportunities in collecting and modeling ambulatory electrodermal activity data.  JMIR Biomedical Engineering,  , e17106. https://doi.org/10.2196/17106

Buu, A., Yang, S., Li, R., Zimmerman, M.A., Cunningham, R.M., and Walton, M.A. (2020). Examining measurement reactivity in daily diary data on substance use: results from a randomized experiment.  Addictive Behaviors, 102 , 106198. https://doi.org/10.1016/j.addbeh.2019.106198.

Trucco, E. M., Yang, S., Yang, J. J., Zucker, R. A., Li, R. and Buu, A. (2020). Time-varying Effects of GABRG1 and Maladaptive Peer Behavior on Externalizing Behavior from Childhood to Adulthood: Testing Gene x Environment x Development Effects.  Journal of Youth and Adolescence, 49 , 1351 – 1364.

Liu, W., Li, R., Zimmerman, M.A., Walton, M.A., Cunningham, R.M., and Buu, A. (2019). Statistical methods for evaluating the correlation between timeline follow-back data and daily process data with applications to research on alcohol and marijuana use.  Addictive Behaviors: Special Issue on Improving the Implementation of Quantitative methods in Addiction Research, 94 , 147 – 155

Dziak, J.J., Coffman, D. L., Reimherr, M., Petrovich, J., Li, R. and Shiffman, S. (2019). Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientists.  Statistical Survey, 13 , 150 -180.

Dziak, J., Coffman, D. L., Lanza, S. T., Li, R. and Jermiin, L. S. (2019). Sensitivity and specificity of information criteria.  Briefings in Bioinformatics . https://doi.org/10.1093/bib/bbz016

Wang, L., Ma, J., Dholakia, R., Howells, C., Lu, Y., Chen, C., Li, R., Murray, M. and Leslie, D. (2019). Changes in healthcare expenditures after the autism insurance mandate.  Research in Autism Spectrum Disorders, 57 , 97 -104.

Dierker, L., Selya, A., Lanza, S., Li, R. and Rose, J. (2018). Depression and marijuana use disorder symptoms among current marijuana users.  Addictive Behaviors, 76 , 161 – 168.

Yang, S., Cranford, J. A., Li, R., Zucker, R. A. and Buu, A. (2017). Time-varying effect model for studying gender differences in health behavior.  Statistical Methods in Medical Research. 26 , 2812 – 2820

Yang, S., Cranford, J. A., Jester, J. M., Li, R., Zucker, R. A. and Buu, A. (2017). A time-varying effect model for examining group differences in trajectories of zero-inflated count outcomes with applications in substance abuse research.  Statistics in Medicine, 36 , 827 – 837.

Yang, H., Li, R., Zucker, R and Buu, A. (2016). Two-stage model for time-varying effects of zero-inflated count longitudinal covariates with applications in health behavior research.  Journal of Royal Statistical Society, Series C, 65 , 431 – 444.

Dziak, J., Li, R., Tan, X., Shiffman, S. and Shiyko, M. (2015). Modeling intensive longitudinal data on smoking cessation with mixtures of nonparametric trajectories and time-varying effects.  Psychological Methods. 20,  444 – 469.

Selya, A. S., Updegrove, N., Rose, J., Dierker, L., Tan, X., Hedeker, D., Li, R. and Mermelstein, R. J. (2015). Nicotine Dependence-Varying Effects of Smoking Events on Momentary Mood Changes among Adolescents.  Addictive Behaviors. 41 , 65-71.

Yang, H., Cranford, J., Li, R. and Buu, A. (2015). Two-stage model for time-varying effects of discrete longitudinal covariates with applications in analysis of daily process data.  Statistics in Medicine. 34 , 571 – 581.

Shiyko, M.P., Burkhalter, J., Li, R., and Park, B. J. (2014). Modeling nonlinear time-dependent treatment effects: An application of the time-varying effects model (TVEM).  Journal of Consulting and Clinical Psychology. 82 , 760 – 772.

Dziak, J., Li, R., Zimmerman, M. and Buu, A. (2014). Time-varying effect model for ordinal responses with applications in substance abuse research.  Statistics in Medicine. 33 , 5126 – 5137.

Buu, A., Li, R., Walton, M., Yang, H., Zimmerman, M. A., Cunningham, R. M. (2014). Changes in substance use-related health risk behaviors on the timeline follow-back interview as a function of length of recall period.  Substance Use and Misuse. 49 , 1259 – 1269.

Trail, J. B., Collins, L. M., Rivera, D. F., Li, R, Piper, M. E., Baker, T. B. (2014). Functional Data Analysis for Dynamical System Identification of Behavioral Processes.  Psychological Methods. 19 , 175 – 187.

Shiyko, M., Naab, P., Shiffman, S. and Li, R. (2014). Modeling complexity of EMA data: time-varying lagged effects of negative affect on smoking urges for subgroups of nicotine addiction.  Nicotine & Tobacco Research. 16S2 , S144 – S150.

Vasilenko, S., Piper, M., Lanza, S.T. Liu, X., Yang, J., Li, R. (2014). Time-varying processes involved in smoking lapse in a randomized trial of smoking cessation therapies.  Nicotine & Tobacco Research. 16S2 , S135 – S143.

Lanza, S.T., Vasilenko, S., Liu, X., Piper, M. and Li, R. (2014). Advancing Understanding of the Dynamics of Smoking Cessation Using the Time-Varying Effect Model.  Nicotine & Tobacco Research. 16S2 , S127 – S134.

Liu, X., Li, R., Lanza, S.T., Vasilenko, S. and Piper, M. (2013). Understanding the role of cessation fatigue in the smoking cessation process.  Drug and Alcohol Dependence. 133 , 548 – 555.

Selya, A.S., Dierker, L. C., Rose, J. S., Hedeker, D., Tan, X., Li, R., Mermelstein, R.J. (2013). Time-varying effects of smoking quantity and nicotine dependence on adolescent smoking regularity.  Drug and Alcohol Dependence. 128 , 230-237.

Buu, A., Li, R., Tan, X. and Zucker, R. A. (2012). Statistical models for longitudinal zero-inflated count data with applications to the substance abuse field.  Statistics in Medicine. 31 , 4074 – 4086.

Tan, X., Shiyko, M., Li, R., Li, Y. and Dierker, L. (2012). Intensive longitudinal data and model with varying effects.  Psychological Methods. 17 , 61 – 77.

Shiyko, M. P., Lanza, S. T., Tan, X., Li, R. and Shiffman, S. (2012). Using the time-varying effects model (TVEM) to examine dynamic associations between negative affect and self confidence on smoking urges: differences between successful quitters and relapsers.  Prevention Science. 13  , 288 – 299.

Cole, P. M., Tan, P. Z., Hall, S. E., Zhang, Y., Crnic, K. A., Blair, C. B., and Li, R. (2011). Developmental changes in anger expression and attention focus during a delay: Learning to wait.  Developmental Psychology, 47 , 1078 – 1089. DOI: 10.1037/a0023813

Buu, A. Johnson, N.J., Li, R. and Tan, X. (2011). New variable selection methods for zero-inflated count data with applications to the substance abuse field.  Statistics in Medicine. 30  , 2326 – 2340.

Tan, X., Dierker, L., Li, R., Rose, J., and The Tobacco Etiology Research Network(TERN). (2011). How spacing of data collection may impact estimates of substance use trajectories?  Substance Use and Misuse. 46  , 758 – 768

Dierker, L., Rose, J., Tan, X., Li, R. and The Tobacco Etiology Research Network(TERN) (2010). Uncovering multiple pathways to substance use: A comparison of methods for identifying population subgroups.  The Journal of Primary Prevention. 31 , 333-348.

Collins, L. M., Dziak, J. J. and Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs.  Psychological Methods, 14 , 202-224.

C2. Statistical genetics and Bioinformatics

Yang, S., Wen, J., Eckert, S. T., Wang, Y., Liu, D., Wu, R., Li, R. and Zhan, X. (2020). Prioritizing genetic variants in GWAS using permutation-assisted lasso tuning.  Bioinformatics, 36 , 3811- 3817.

Dziak, J., Coffman, D. L., Lanza, S. T., Li, R. and Jermiin, L. S. (2020). Sensitivity and specificity of information criteria.  Briefings in Bioinformatics, 21 , 553-565.

Miao, J. Chen, Z. Sebastian, A. Wang, Z., Shrestha, S., Li, X., Praul, C., Albert, I., Li, R. and Cui, L. (2017). Sex-specific biology of the human malaria parasite revealed from transcriptomes and proteomes of male and female gametocytes.  Molecular and Cellular Proteomics, 16 , 537 – 551.

Wang, N.T., Gocik, K., Li, R., Lindsay, B. and Wu, R. (2016). A block mixture model to map eQTLs for gene clustering and networking.  Sci Rep, 6 : 21193.

Percival, C. J., Huang, Y., Jabs, E. W., Li, R. and Richtsmeier, J. T. (2014), Embryonic craniofacial bone volume and bone mineral density in Fgfr2+P253R and nonmutant mice.  Developmental Dynamics. 243 , 541 – 551.

Das, K., Li, R., Sengupta, S. and Wu, R. (2013). A Bayesian semiparametric model for bivariate sparse longitudinal data.  Statistics in Medicine. 32 , 3899 – 3910.

Das, K., Li, J., Fu, G., Wang, Z., Li, R. and Wu, R. (2013). Dynamic semiparametric Bayesian models for genetic mapping of complex trait with irregular longitudinal data.  Statistics in Medicine. 32 , 509 – 523.

Wang, Y., Huang, C., Fang, Y., Yang, Q. and Li, R. (2012). Flexible semiparametric analysis of longitudinal genetic studies by reduced rank smoothing.  Journal of Royal Statistical Society, Series C. 61 , 1 – 24.

Das, K., Li, J., Wang, Z., Gu, G., Tong, C. Li, Y., Xu, M., Ahn, K., Mauger, D.T. Li, R., and Wu, R. (2011). A dynamic model for genome-wide association studies.  Human Genetics. 129 , 629-639.

Li, J., Das, K., Fu, G., Li, R. and Wu, R. (2011). The Bayesian LASSO for genome-wide association studies.  Bioinformatics. 27 , 516 – 523.

Wang, Y., Xu, M., Wang, Z., Tao, M., Zhu, J., Li, R., Wang, L., Berceli, S.A. and Wu, R. (2011). How to cluster gene expression dynamics in response to environmental signals.  Briefings in Bioinformatics. doi:10.1093/bib/bbr032

Fu, G., Wang, Z., Li, J. Das, K., Li, R. and Wu, L. (2011). Integrating ordinary differential equations into functional mapping of biological rhythms.  Journal of Biological Dynamics, 5 , 84-101.

Fu, G., Berg, A. Das, K., Li, J., Li, R. and Wu, R. (2010). A statistical model for mapping morphological shape.  Theoretical Biology and Medical Modelling, 7:28 , doi:10.1186/1742-4682-7-28

C3. Environmental and Meteorological Research

Kurum, E., Li, R., Wang, Y. and Senturk, D. (2014). Nonlinear varying-coefficient models with application to a photosynthesis study.  Journal of Agricultural, Biological, and Environmental Statistics, 19 , 57 – 81.

Yi, C., Ricciuto, D., Li, R., et al. (2010). Climate control to terrestrial carbon exchange across biomes and continents.  Environmental Research Letters. 5:034007 , doi: 10.1088/1748-9326/5/3/034007 ( This paper won the United Nations’ World Meteorological Organization (WMO) 2012 Gerbier-Mumm International Award .)

Yi, C., Li, R., Bakwin, P. S., Desai, A., Ricciuto, D. M., Burns, S., Turnipseed, A. A., Munger, J.W., Wofsy, S. C., Wilson, K., Meyers, T. P., Anderson, D. E., and Monson, R. K. (2004). A nonparametric method for separating photosynthesis and respiration components in CO_2 flux measurements.  Geophysical Research Letters. 31 , L17107, doi:10.1029/2004GL020490

C4. Neural Science, Chemometrics and Computer Experiments

Brown, G., Du, G., Farace, E., Lewis, M. M., Eslinger, P. J., McInerney, J., Kong, L., Li, R., Huang, X., and De Jesus, S., (2022). Subcortical iron accumulation pattern may predict neuropsychological outcomes after STN 3 DBS: a pilot study.  Journal of Parkinson’s Disease.  https://doi.org/10.3233/JPD-212833

Li, C., Wang, X., Du, G, Chen, H, Brown, G., Lewis, M.M., Yao, T., Li, R., Huang, X. (2021). Folded concave penalized learning in identifying high-dimensional MRI markers for Parkinson’s disease: a benchmark of whole brain MRI markers.  Journal of Neuroscience Methods, 357 , 109157. https://doi.org/10.1016/j.jneumeth.2021.109157

Du, G., Lewis, M. M., Kanekar, S., Sterling, N. W., He, L., Kong, L, Li, R. and Huang, X. (2017). Combined diffusion tensor imaging and R2* differentiate Parkinson’s disease and atypical Parkinsonism.  American Journal of Neuroradiology, 38 , 966-972.

Zhang, L., Wang, X., Wang, M., Sterling, N. W., Du, G. Lewis, M. M., Yao, T., Mailman, R. B., Li, R. Huang, X. (2017). Circulating cholesterol levels may link to the factors influencing Parkinson’s risk.  Frontiers in Neurology, 8 , 501.

Liu, H., Du, G. Zhang, L. , Lewis, M., Wang, X., Yao, T., Li, R. and Huang, X. (2016). Folded concave penalized learning in identifying multimodal MRI marker for Parkinson’s disease.  Journal of Neuroscience Methods, 268 , 1 – 6.

Zhu, H., Kong, L., Li, R., Styner, M., Gerig, G., Lin, W. and Gilmore, J. H. (2011). FADTTS: Functional Analysis of Diffusion Tensor Tract Statistics.  Neuroimage. 56 , 1412 – 1425

Yin, H., Fang, K.-T., Li, R. and Liang, Y.-Z. (2007). Empirical Kriging models and their applications to QSAR.  Journal of Chemometrics. 21 , 43-52.

Peng, X.-L., Yin, H., Li, R. and Fang, K.-T. (2006). The application of kriging and empirical kriging based on the variables selected by SCAD.  Analytica Chimica Acta, 578 , 178-185.

Fang, K.-T., Li, R. and Sudjianto, A. (2006).  Design and Modeling for Computer Experiments . Chapman and Hall/CRC. Boca Raton, FL.

Li, R. and Sudjianto, A. (2005). Analysis of computer experiments using penalized likelihood in Gaussian kriging Models.  Technometrics. 47 , 111-120.

D. Acknowledgement

My research has been supported by the National Science Foundation and the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies. The PDF files posted on this website are provided merely for convenience. If you or your institution do not have the right to access the papers, you should not download them from this homepage.


Department of Statistics

We offer two distinct programs of study for our graduate students. We also offer two additional dual degrees that can be obtained in conjunction with a degree in Statistics.

Statistics faculty

The statistics program provides students with a strong foundation in statistics and the broad skills to prepare them for advanced study in statistics or employment in industry and government.

Undergraduate students

Choose the online Applied Statistics Graduate Program that fulfills your goals. Penn State World Campus offers both an online master's degree and a graduate certificate in applied statistics.

student on laptop

Featured Faculty

Nicole Lazar

Matthew Beckman

Runze Li

Statistics Up to Date

Penn State doctoral student Alina Kuvelkar recently served as a teaching assistant for the 18th Summer School in Statistics for Astronomers.

Penn State statistician Xiang Zhu studies the genetics of heart disease in diverse populations to improve heart health for everyone.

Lecture Series

Distinguished Lectures

Weekly Talks and Upcoming Events

Department Links

The Statistical Consulting Center

The SCC provides statistical advice and support for Penn State researchers and members of industry and government in the areas of: Research Planning, Design of Experiments and Survey Sampling, Statistical Modeling and Analysis, Analysis Results Interpretation, Advice

CTSI Consulting Center

CTSI Biostatistics, Epidemiology and Research Design Center offers one-stop service for researchers who need study design, biostatistical and data management expertise. We provide Penn State researchers with a full range of biostatistical expertise and service.

Center for Astrostatistics

The Center serves as a crossroads where researchers at the interfaces between statistics, data analysis, astronomy, space and observational physics collaborate, develop and share methodologies, and together prepare the next generation of researchers.

Jogesh Babu

  • Fast Algorithms for Estimating Covariance Matrices of Stochastic Gradient Descent Solutions: April 18, 3:30 pm - 4:30 pm, 201 Thomas Building
  • Tell a Clear Technical Story: April 25, 9:30 am - 12:00 pm, Foster Auditorium, Penn State Libraries
  • Johnson Lecture in Science Communication: April 25, 6:30 pm - 7:30 pm, 100 Thomas Building
  • Transforming Slide Design: April 26, 9:30 am - 12:00 pm, Foster Auditorium, Penn State Libraries

Statology

Statistics Made Easy

Introduction to Hypothesis Testing

A statistical hypothesis is an assumption about a population parameter.

For example, we may assume that the mean height of a male in the U.S. is 70 inches.

The assumption about the height is the statistical hypothesis, and the true mean height of a male in the U.S. is the population parameter.

A hypothesis test is a formal statistical test we use to reject or fail to reject a statistical hypothesis.

The Two Types of Statistical Hypotheses

To test whether a statistical hypothesis about a population parameter is true, we obtain a random sample from the population and perform a hypothesis test on the sample data.

There are two types of statistical hypotheses:

The null hypothesis, denoted as H0, is the hypothesis that the sample data occurs purely from chance.

The alternative hypothesis, denoted as H1 or Ha, is the hypothesis that the sample data is influenced by some non-random cause.

Hypothesis Tests

A hypothesis test consists of five steps:

1. State the hypotheses. 

State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false.

2. Determine a significance level to use for the hypothesis test.

Decide on a significance level. Common choices are .01, .05, and .1. 

3. Find the test statistic.

Find the test statistic and the corresponding p-value. Often we are analyzing a population mean or proportion and the general formula to find the test statistic is: (sample statistic – population parameter) / (standard deviation of statistic)

4. Reject or fail to reject the null hypothesis.

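Using the test statistic or the p-value, determine whether to reject or fail to reject the null hypothesis based on the significance level.

The p-value measures the strength of the evidence against the null hypothesis: it is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one in the sample. The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value is less than the significance level, we reject the null hypothesis.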

5. Interpret the results. 

Interpret the results of the hypothesis test in the context of the question being asked. 
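The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses a two-tailed one-sample z-test, assumes the population standard deviation is known, and the function name `one_sample_z_test` and all numbers (sample mean 71, hypothesized mean 70, standard deviation 3, sample size 36) are hypothetical rather than taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test, assuming the population sd `sigma` is known."""
    # Step 3: (sample statistic - population parameter) / (standard deviation of statistic)
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-tailed p-value: probability of a |z| at least this large when H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 when the p-value is below the significance level.
    reject_h0 = p_value < alpha
    return z, p_value, reject_h0


# Steps 1 and 2 (hypothetical numbers): H0: mu = 70, Ha: mu != 70, alpha = 0.05.
z, p_value, reject_h0 = one_sample_z_test(sample_mean=71.0, mu0=70.0, sigma=3.0, n=36)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up inputs the test statistic is 2.0 and the p-value is about 0.0455, so the null hypothesis is rejected at the .05 level. Step 5 would then be to state, in context, that the sample gives evidence that the mean height differs from 70 inches.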

The Two Types of Decision Errors

There are two types of decision errors that one can make when doing a hypothesis test:

Type I error: You reject the null hypothesis when it is actually true. The probability of committing a Type I error is equal to the significance level, often called  alpha , and denoted as α.

Type II error: You fail to reject the null hypothesis when it is actually false. The probability of committing a Type II error is called beta, denoted as β. The power of the test, which is the probability of correctly rejecting a false null hypothesis, is 1 − β.
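To make the two error probabilities concrete, the following sketch computes β and the power (1 − β) for a right-tailed z-test. It is an illustration under assumptions that are not in the article: the population standard deviation is known, the true mean is taken to be 71 while the null value is 70, and the function name `type_ii_error_and_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_and_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta and power for a right-tailed z-test of H0: mu = mu0 vs Ha: mu > mu0."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    # Reject H0 when the z statistic exceeds this critical value (Type I error rate = alpha).
    z_critical = std_normal.inv_cdf(1 - alpha)
    # Shift of the z statistic's distribution when the true mean is mu_true.
    shift = (mu_true - mu0) / standard_error
    # Beta: probability of failing to reject H0 even though mu_true is the real mean.
    beta = std_normal.cdf(z_critical - shift)
    return beta, 1 - beta


# Hypothetical scenario: null mean 70, true mean 71, sd 3, sample size 36.
beta, power = type_ii_error_and_power(mu0=70.0, mu_true=71.0, sigma=3.0, n=36)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Increasing the sample size or the distance between the true and hypothesized means lowers β and raises the power, while α stays fixed at the chosen significance level.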

One-Tailed and Two-Tailed Tests

A statistical hypothesis can be one-tailed or two-tailed.

A one-tailed hypothesis involves making a “greater than” or “less than” statement.

For example, suppose we assume the mean height of a male in the U.S. is greater than or equal to 70 inches. The null hypothesis would be H0: µ ≥ 70 inches and the alternative hypothesis would be Ha: µ < 70 inches.

A two-tailed hypothesis involves making an “equal to” or “not equal to” statement.

For example, suppose we assume the mean height of a male in the U.S. is equal to 70 inches. The null hypothesis would be H0: µ = 70 inches and the alternative hypothesis would be Ha: µ ≠ 70 inches.

Note: The “equal” sign is always included in the null hypothesis, whether it is =, ≥, or ≤.
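The choice between a one-tailed and a two-tailed test changes how the p-value is computed from the same test statistic. The sketch below assumes a z statistic that follows the standard normal distribution under the null hypothesis; the helper `z_p_value` and the value z = -1.8 are hypothetical and used only for illustration.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value for a z statistic; `tail` is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":   # Ha: mu < mu0
        return std_normal.cdf(z)
    if tail == "right":  # Ha: mu > mu0
        return 1 - std_normal.cdf(z)
    if tail == "two":    # Ha: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


z = -1.8  # hypothetical test statistic
left_p = z_p_value(z, "left")
two_p = z_p_value(z, "two")
print(f"left-tailed p = {left_p:.3f}, two-tailed p = {two_p:.3f}")
```

For this z value the left-tailed p-value is about 0.036 and the two-tailed p-value is about 0.072, so the same data would lead to rejecting the null hypothesis at the .05 level in the left-tailed test but not in the two-tailed test. This is why the hypotheses should be stated before looking at the data.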

Related:   What is a Directional Hypothesis?

Types of Hypothesis Tests

There are many different types of hypothesis tests you can perform depending on the type of data you’re working with and the goal of your analysis.

The following tutorials provide an explanation of the most common types of hypothesis tests:

  • Introduction to the One Sample t-test
  • Introduction to the Two Sample t-test
  • Introduction to the Paired Samples t-test
  • Introduction to the One Proportion Z-Test
  • Introduction to the Two Proportion Z-Test

Published by Zach

Statistics LibreTexts

1: Overview of ANOVA

  • Penn State's Department of Statistics
  • The Pennsylvania State University

Upon completion of this lesson, you should be able to:

  • Become familiar with the standard ANOVA basics.
  • Apply the Exploratory Data Analysis (EDA) basics for ANOVA appropriate data.

In previous statistics courses analysis of variance (ANOVA) has been applied in very simple settings, mainly involving one group or factor as the explanatory variable. In this course, ANOVA models are extended to more complex situations involving several explanatory variables. The experimental design aspects are discussed as well. Even though the ANOVA methodology developed in the course is for data obtained from designed experimental settings, the same methods may be used to analyze data from observational studies as well. However, let us keep in mind that the conclusions made may not be as sound because observational studies do not satisfy the rigorous conditions that the designed experiments are subjected to.

If you aren't familiar with the difference between observational and experimental studies, you should review the introductory statistical concepts, which are essential for success in this course!

"Classic" analysis of variance (ANOVA) is a method to compare average (mean) responses to experimental manipulations in controlled environments. For example, if people who want to lose weight are randomly selected to participate in a weight-loss study, each person might be randomly assigned to a dieting group, an exercise group, or a "control" group (for which there is no intervention). The mean weight loss for each group is compared to that of every other group.

Recall that a fundamental tenet of the scientific method is that results should be reproducible. A designed experiment provides this through replication and generates data that requires the calculation of mean (average) responses.
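As a rough illustration of comparing mean responses across the three groups in the weight-loss example, the sketch below computes the one-way ANOVA F statistic by hand. The data values and the function name `one_way_anova_f` are hypothetical, and the F statistic is only the overall test that the group means are not all equal; pairwise comparisons between specific groups would need a follow-up procedure that is not shown here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of group name -> observations."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)            # number of treatment groups
    n_total = len(all_values)  # total number of observations
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: replicate-to-replicate variation inside each group.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical weight loss (in pounds) for three randomly assigned groups.
weight_loss = {
    "diet": [3.0, 4.0, 5.0],
    "exercise": [5.0, 6.0, 7.0],
    "control": [1.0, 2.0, 3.0],
}
f_stat = one_way_anova_f(weight_loss)
print(f"F = {f_stat:.2f}")
```

A large F value indicates that the variation between group means is large relative to the variation among replicates within each group, which is why replication is needed to estimate the within-group variation at all.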

  • 1.1: The Working Hypothesis The working hypothesis, boxplots, and means plots.
  • 1.2: The 7-Step Process of Statistical Hypothesis Testing Explanation of the 7 steps for statistical hypothesis testing.
  • 1.3: Chapter 1 Summary

IMAGES

  1. hypothesis test formula statistics

  2. Hypothesis Testing- Meaning, Types & Steps

  3. Your Guide to Master Hypothesis Testing in Statistics

  4. Hypothesis Testing Statistics Formula Sheet

  5. Hypothesis Testing : Infographics

  6. What is Hypothesis Testing? Types and Methods

VIDEO

  1. Hypothesis Testing for Mean: p-value is more than the level of significance (Hat Size Example)

  2. Hypothesis Testing

  3. Statistics with Crayons: Hypothesis Testing with Hans & Hera

  4. Why I think c = P = NP (and why c from E=mc^2 is the speed of everything)

  5. Statistics for Hypothesis Testing

  6. Hypothesis Testing: p Value for a Left Tail Test With Standardized Test Statistics z=-2.23 invnorm

COMMENTS

  1. 5.1

    A test is considered to be statistically significant when the p-value is less than or equal to the level of significance, also known as the alpha ( α) level. For this class, unless otherwise specified, α = 0.05; this is the most frequently used alpha level in many fields. Sample statistics vary from the population parameter randomly.

  2. 9.5: Additional Information and Full Hypothesis Test Examples

    The hypothesis test itself has an established process. This can be summarized as follows: Determine H0 and Ha. ... Penn State University, Greater Allegheny STAT 200: Introductory Statistics (OpenStax) GAYDOS 9: Hypothesis Testing with One Sample ... The following screen shots display the summary statistics from the hypothesis test.

  3. 10: Hypothesis Testing with Two Samples

    A statistics Worksheet: The student will select the appropriate distributions to use in each case. The student will conduct hypothesis tests and interpret the results. 10.E: Hypothesis Testing with Two Samples (Exercises) These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

  4. 9.1: Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints. \(H_0\): The null hypothesis: It is a statement of no difference between the variables; that is, they are not related. This can often be considered the status quo and as a result if you cannot accept the null it requires some action.

  5. Statistics (STAT) & Penn State

    Penn State; People; Departments; Search. New Bulletin Edition: ... Statistics is the art and science of decision making in the presence of uncertainty. The purpose of Statistics 100 is to help students improve their ability to assess statistical information in both everyday life and other University courses. ... Review of hypothesis testing ...

  6. Statistics (STAT) & Penn State

    Descriptive statistics, hypothesis testing, power, estimation, confidence intervals, regression, one- and 2-way ANOVA, Chi-square tests, diagnostics. Prerequisite: one undergraduate course in statistics STAT 501: Regression Methods. ... Download Penn State Law Bulletin PDF.

  7. Hypothesis Testing

    Present the findings in your results and discussion section. Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps. Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test.

  8. Runze Li's Homepage

    Mailing Address: Department of Statistics, Penn State University, University Park, PA 16802-2111. Academic Positions & Education . Eberly Family Chair in Statistics, Penn State University, 2018 - ... Hypothesis testing on linear structures of high dimensional covariance matrix.

  9. 9.2: Outcomes and the Type I and Type II Errors

    Incorrect calculations or misunderstood summary statistics can yield errors that affect the results. A … In every hypothesis test, the outcomes are dependent on a correct interpretation of the data. ... Penn State University, Greater Allegheny STAT 200: Introductory Statistics (OpenStax) GAYDOS 9: Hypothesis Testing with One Sample ...

  10. Department of Statistics

    The Statistical Consulting Center. The SCC provides statistical advice and support for Penn State researchers and members of industry and government in the areas of: Research Planning, Design of Experiments and Survey Sampling, Statistical Modeling and Analysis, Analysis Results Interpretation, Advice. CTSI Consulting Center.

  11. PDF Hypothesis Testing, Page 1 Hypothesis Testing

    Hypothesis Testing . Author: John M. Cimbala, Penn State University Latest revision: 04 May 2022 . Introduction • An important part of statistics is hypothesis testing - making a decision about some hypothesis (reject or accept), based on statistical methods. • The four basic steps in any kind of hypothesis testing are: o Determine the ...

  12. 1.2: The 7-Step Process of Statistical Hypothesis Testing

    Step 2: State the Alternative Hypothesis. HA: treatment level means not all equal (1.2.2) The reason we state the alternative hypothesis this way is that if the null is rejected, there are many possibilities. For example, μ1 ≠ μ2 = … = μT is one possibility, as is μ1 = μ2 ≠ μ3 = … = μT. Many people make the mistake of stating the ...

  13. Introduction to Hypothesis Testing

    A hypothesis test consists of five steps: 1. State the hypotheses. State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false. 2. Determine a significance level to use for the hypothesis. Decide on a significance level.

  14. How to Find P Value from a Test Statistic

    Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that's on trial, in essence, is called the null hypothesis (H0). The alternative hypothesis (Ha) is the one you would believe if the null hypothesis is concluded to be untrue. Learning how to find the p-value in statistics is a fundamental skill in testing, helping you weigh the evidence ...

  15. 1: Overview of ANOVA

    Explanation of the 7 steps for statistical hypothesis testing. 1.3: Chapter 1 Summary This page titled 1: Overview of ANOVA is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Penn State's Department of Statistics via source content that was edited to the style and standards of the LibreTexts platform; a detailed ...