• Book information

Active Statistics

Page updated: 2024-03-02.


Web page for the book Active Statistics by Andrew Gelman and Aki Vehtari.

Published by Cambridge University Press in March 2024. Copyright © Andrew Gelman and Aki Vehtari 2024.

This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that allow instructors and students to explore in a fun way the real-world complexity of the subject. The book fosters an engaging ‘flipped classroom’ environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels, and practice exam questions to help guide learning. Designed to accompany the authors’ previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or be used by learners as a hands-on workbook.

Buy Active Statistics from Cambridge University Press.

Regression and Other Stories web page.

If you notice an error, submit an issue or send an email.

  • Active learning 1.1. Flipped classroom and collaborative learning 1.2. What happens during the semester? 1.3. Active learning in class 1.4. Scheduling 1.5. Assessment and feedback 1.6. Some general issues in teaching and communication
  • Setting up a course of study 2.1 What to learn and how to learn it 2.2 Computing 2.3 Course material 2.4 Real data and simulated data 2.5 Two kinds of computer demonstrations 2.6 Challenges in learning particular topics 2.7 Adapting to your goals and learning style 2.8 Using these materials in introductory or more advanced courses 2.9 Balance between challenges and solutions
  • Week by week: the first semester 3.1 Introduction to quantitative social science 3.2 Prediction as a unifying theme in statistics and causal inference 3.3 Data collection and visualization 3.4 Review of mathematics and probability 3.5 Statistical inference 3.6 Simulation 3.7 Background on regression modeling 3.8 Linear regression with a single predictor 3.9 Least squares and fitting regression models 3.10 Prediction and Bayesian inference 3.11 Linear regression with multiple predictors 3.12 Assumptions, diagnostics, and model evaluation 3.13 Regression with linear and log transformations
  • Week by week: the second semester 4.14 Review of basic statistics and regression modeling 4.15 Logistic regression 4.16 Working with logistic regression 4.17 Other generalized linear models 4.18 Design and sample size decisions 4.19 Poststratification and missing-data imputation 4.20 How can flipping a coin help you estimate causal effects? 4.21 Causal inference using regression on the treatment variable 4.22 Causal inference as prediction 4.23 Imbalance and lack of complete overlap 4.24 Additional topics in causal inference 4.25 Advanced regression and multilevel models 4.26 Review of the course
  • Pre-test questions A.1 First semester A.2 Second semester
  • Final exam questions B.1 Multiple-choice questions for the first semester B.2 Multiple-choice questions for the second semester B.3 Take-home exam
  • Outlines of classroom activities C.1 First semester C.2 Second semester

List of classroom activities

First semester, second semester.

  • ‘This book is an extraordinarily rich and generous resource for teaching statistics. Full of stories about challenging statistical problems, the examples reflect all the messiness of real life, and encourage class discussion of what went wrong and how to do things better. The extensive collection of lesson plans and exercises provides a fine inspiration to adopt a different, more active, style of teaching.’ - David Spiegelhalter, University of Cambridge
  • ‘This is a wonderful read for any statistics teacher. The focus on real-world applications and statistical thinking ensures that everyone will gain new insights and perspectives no matter how long you have been teaching.’ - Beth Chance, California Polytechnic State University
  • ‘I have to say reading this book came as a pleasant surprise for me. I thought I was going to be reviewing another statistics book and instead, it was an insightful read on how to think about teaching statistics. I found it engaging and helpful in rethinking how I approach teaching statistics.’ - Pamela Davis-Kean, University of Michigan


Statistical Inference For Everyone

(3 reviews)


Brian Blais, Bryant University

Copyright Year: 2017

Publisher: Brian Blais

Language: English

Conditions of use: Attribution-ShareAlike



Reviewed by Kenese Io, PhD candidate, Colorado State University on 11/30/20


Comprehensiveness rating: 4

The book illustrates a very pragmatic approach with little theoretical application. I would recommend this text to anyone who is teaching applied stats at an early level.

Content Accuracy rating: 5

The book is accurate, with a number of very helpful examples for new researchers. The examples include code for students to use and draw from as they execute their own analyses. They also use commonly used datasets, which is very helpful for students who may be working on a final project as an undergraduate or on homework assignments as a first-year graduate student.

Relevance/Longevity rating: 5

The book is problem or problem set oriented which will allow the book to maintain its longevity. The examples offer analysis of old data but this is very helpful as instructors can assign similar problem sets with new datasets while the students have an excellent tool to rely on.

Clarity rating: 4

The book is generally clear, but given that it is problem oriented, some of the theoretical background is scarce and leaves a bit to be desired. Nevertheless, the examples really allow for an immersive experience.

Consistency rating: 5

The book does a great job of following a clear formula of historical background / brief theoretical walkthrough / long examples that force you to engage critically with the assignment.

Modularity rating: 5

The book is very easy to assign as the text quickly jumps to examples of MATLAB code that will draw students to engage with it. I can imagine students constantly flipping between their own code and the text to help simplify analysis or execute their code.

Organization/Structure/Flow rating: 4

The book is organized relatively well. I would have liked to see a few of the later chapters earlier, like the common tests for statistical significance, but it generally goes from broader to more narrow perspectives.

Interface rating: 5

The graphs and code examples are laid out well and the text works great in Acrobat Reader.

Grammatical Errors rating: 5

Cultural Relevance rating: 4

The text does not offer any critical analysis here but this is due to maintaining general examples. I think an instructor could easily assign more critical assignments that rely on the intuition laid out in the book.

Reviewed by Jimmy Chen, Assistant Professor, Bucknell University on 1/26/19


Comprehensiveness rating: 5

As far as Statistical Inference goes, the author has done a great job covering the essential topics. The breadth and the depth of the content are well balanced. I believe this book can be great supplemental material for any statistics or probability course. Students would have no problems studying this book themselves because the author has explained concepts clearly and provided ample examples.

I think the content is fine. Examples, illustration, and computer codes are all very helpful for the readers to understand the content.

The relevance of the book is great. Most supporting examples would be easily relatable to most students. Most statistics or probability concepts discussed in the book are timeless. Detailed computer codes make it easy for verification.

Clarity rating: 5

The author has explained concepts very well. The flow of the text and examples is great and thoughtful, making it very easy to follow.

The consistency of the text is great.

The modularity of the text is great. I could easily adopt the entire book or use only certain sections of the book for my teaching.

Organization/Structure/Flow rating: 5

The topics in the text are presented in a logical, clear fashion.

The layout of the text is clear and easily readable. Images, charts, and tables are clear and concise. Very easy to follow.

The text contains no grammatical errors.

Cultural Relevance rating: 5

The text is not culturally insensitive or offensive in any way.

Reviewed by Adam Molnar, Assistant Professor, Oklahoma State University on 5/21/18


Comprehensiveness rating: 2

This book is not a comprehensive introduction to elementary statistics, or even statistical inference, as the author Brian Blais deliberately chose not to cover all topics of statistical inference. For example, the term matched pairs never appears; neither do Type I or Type II error. The Student's t distribution gets much less attention than in almost every other book; the author offers a rarely used standard-deviation change (page 153) as a way to keep things Gaussian. The author justifies the reduced topic set by calling typical "traditional" approaches flawed in the first pages of text, the Proposal. Instead, Blais tries to develop statistical inference from logic, in a way that might be called Bayesian inference. Other books have taken this approach, more than just Donald Berry's book mentioned on page 32. [For more references, see the ICOTS6 paper by James Albert at https://iase-web.org/documents/papers/icots6/3f1_albe.pdf ] None of those books are open-resource, though; an accurate, comprehensive textbook would have potential. This PDF does not contain that desired textbook, however. As mentioned below under accuracy, clarity, and structure, there are too many missing elements, including the lack of an index. As I read, this PDF felt more like an augmented set of lecture notes than a textbook which stands without instructor support. It's not good enough. (For more on this decision, see the other comments at the end.)

Content Accuracy rating: 2

The only non-troubling number of errors in a textbook is zero, but this book has many more than that. In the version I read from the Minnesota-hosted website, my error list includes not defining quartiles from the left (page 129), using ICR instead of IQR (page 133), misstating the 68-95-99 rule as 65-95-99 (page 134), flipping numbers in the combination of the binomial formula (page 232), repeating Figure C-2 as Figure C-1 (page 230), and titling section 2.6 "Monte Hall" instead of "Monty Hall". Infuriatingly, several of these items appear correctly elsewhere in the book - Monty Hall in section 5.4, the binomial formula in the main text, and 68-95-99 on page 142. I'm also annoyed that some datasets have poor source citations, such as not indicating Fisher's iris data on page 165 and calling something "student measurements during a physics lab" on page 173.

Relevance/Longevity rating: 4

Because there are so many gaps, including full support for computer presentation, it would be easy to update completed sections as needed, such as when Python becomes less popular.

Clarity rating: 2

Quality of the prose is fine, but many jargon terms are not well defined. Students learning a subject need clear definitions, but they don't appear. In my notes, I see exclusive (page 36), conditioning (page 40), complement (used on page 40 but never defined in the text), posterior (page 54), correlation (page 55), uniform distribution (page 122), and Greek letters, for which the reference to a help table appears on page 140 even though Greek letters have appeared earlier. Additionally, several important terms receive insufficient or unusual definitions, including labeling summary description of data as inference (page 34), mutually exclusive (page 36) versus independence (page 43), and plus/minus (page 146, as this definition of +/- applies in lab bench science but not social sciences). I appreciate that the author is trying to avoid calculus with "area under the curve" on page 127, but there's not enough written for a non-calculus student to understand how these probabilities are calculated. To really understand posterior computation, a magical computer and a few graphs aren't good enough.

Internal consistency to Bayesian inference is quite strong; many of the examples repeat the steps of Bayes' Recipe. This is not a concern.

Modularity rating: 3

The book needs to be read in linear order, like most statistics books, but that's not necessarily a negative thing. Dr. Blais is trying to take the reader through a structured development of Bayesian inference, which has a single path. There are a few digressions, such as fallacies about probability reasoning, but the book generally maintains a single path from chapters 1 to at least 7. Most sections are less than 10 pages and don't involve lots of self-references. Although I rated reorganization possibility as low, due to the near-impossibility of realigning the argument, I consider it harsh to penalize the book for this.

Organization/Structure/Flow rating: 2

There isn't enough structure for a textbook; this feels more like a set of augmented lecture notes than a book for guided study. I mentioned poor definitions under "Clarity", so let me add other topics here. The most frustrating structural problem for me is the presentation of the fundamental idea of Bayesian inference, posterior proportional to prior * likelihood. The word prior first appears on page 48, but receives no clear definition until a side-note on page 97. The word posterior first appears on page 53. Despite this, the fundamental equation is never written with all three words in the correct places until page 154. That's way, way too late. The three key terms should have been defined around page 50 and drilled throughout all the sections. The computer exercises also have terrible structure. The first section with computer exercises, section 2.9 on page 72, begins with code. The reader has no idea about the language, package, or purpose of these weird words in boxes. The explanation about Python appears as Appendix A, after all the exercises. It would not have taken much to explain Python and the purpose of the computer exercises in Chapter 1 or 2, but it didn't happen. A classroom instructor could explain this in class, but the Open Resource Project doesn't provide an instructor with every book. Like the other things mentioned, the structure around computing is insufficient.

I had no problems navigating through the chapters. Images look fine as well.

Grammar and spelling are good. I only spotted one typographical error, "posterier" on page 131, and very few awkward sentences.

This is a US-centered book, since it refers to the "standard deck" of playing cards on page 36 as the US deck; other places like Germany have different suits. The book also uses "heads" and "tails" for coins, while other countries such as Mexico use different terms. I wouldn't call this a major problem, however; the pictures and diagrams make the coins and cards pretty clear. There aren't many examples involving people, so there's little scope for ethnicities and backgrounds.

On Brian Blais's webpage for the book, referenced only in Appendix A for some reason, he claims that this book is targeted to the typical Statistics 101 college student. It is NOT. Typical college students need much more support than what this book offers - better structure, better scaffolding, more worked examples, support for computing. What percentage of all college students would pick up Python given the contents presented here? My prior estimate would be 5%. Maybe students at Bryant University, where Pre-Calculus is the lowest math course offered, have a higher Python rate, but the bottom 20% of my students at Oklahoma State struggle with order of operations and using the combinations formula. They would need massive support, and Oklahoma State enrolls above-average college students. This book does not have massive support - or much at all. This makes me sad, because I've argued that we should teach hypothesis testing through credible intervals, since I think students will understand the logic better than the frequentist philosophical approach. In 2014, I wrote a guest blog post [http://www.culturalcognition.net/blog/2014/9/5/teaching-how-to-teach-bayess-theorem-covariance-recognition.html] on teaching Bayes' Rule. I would value a thorough book that might work for truly typical students, but for the students in my "everyone," this won't work.

Table of Contents

  • 1 Introduction to Probability
  • 2 Applications of Probability
  • 3 Random Sequences and Visualization
  • 4 Introduction to Model Comparison
  • 5 Applications of Model Comparison
  • 6 Introduction to Parameter Estimation
  • 7 Priors, Likelihoods, and Posteriors
  • 8 Common Statistical Significance Tests
  • 9 Applications of Parameter Estimation and Inference
  • 10 Multi-parameter Models
  • 11 Introduction to MCMC
  • 12 Concluding Thoughts

  • Bibliography
  • Appendix A: Computational Analysis
  • Appendix B: Notation and Standards
  • Appendix C: Common Distributions and Their Properties
  • Appendix D: Tables

Ancillary Material

About the book.

This is a new approach to an introductory statistical inference textbook, motivated by probability theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics typically covered in the first semester of such a course. It is freely available under the Creative Commons License, and includes a software library in Python for making some of the calculations and visualizations easier.

About the Contributors

Brian Blais is a professor of Science and Technology at Bryant University and a research professor at the Institute for Brain and Neural Systems, Brown University.



Statistics LibreTexts

Unit 4A: Introduction to Statistical Inference

  • Page ID 31265

CO-1: Describe the roles biostatistics serves in the discipline of public health.

CO-6: Apply basic concepts of probability, random variation, and commonly used statistical probability distributions.

Review: We are about to move into the inference component of the course and it is a good time to be sure you understand the basic ideas presented regarding exploratory data analysis.

  • What is Data?
  • Types of Variables
  • One Categorical Variable
  • Histograms & Stemplots
  • Describing Distributions
  • Measures of Center
  • Measures of Spread
  • Measures of Position
  • The “Normal” Shape

Video: Unit 4A: Introduction to Statistical Inference (15:45)

Recall again the Big Picture, the four-step process that encompasses statistics: data production, exploratory data analysis, probability and inference.

We are about to start the fourth and final unit of this course, where we draw on principles learned in the other units (Exploratory Data Analysis, Producing Data, and Probability) in order to accomplish what has been our ultimate goal all along: use a sample to infer (or draw conclusions) about the population from which it was drawn.

As you will see in the introduction, the specific form of inference called for depends on the type of variables involved — either a single categorical or quantitative variable, or a combination of two variables whose relationship is of interest.

Introduction

Learning objectives.

LO 6.23: Explain how the concepts covered in Units 1 – 3 provide the basis for statistical inference.

We are about to start the fourth and final part of this course — statistical inference, where we draw conclusions about a population based on the data obtained from a sample chosen from it.

The purpose of this introduction is to review how we got here and how the previous units fit together to allow us to make reliable inferences. Also, we will introduce the various forms of statistical inference that will be discussed in this unit, and give a general outline of how this unit is organized.

In the Exploratory Data Analysis unit, we learned to display and summarize data that were obtained from a sample. Regardless of whether we had one variable and we examined its distribution, or whether we had two variables and we examined the relationship between them, it was always understood that these summaries applied only to the data at hand; we did not attempt to make claims about the larger population from which the data were obtained.

Such generalizations were, however, a long-term goal from the very beginning of the course. For this reason, in the unit on Producing Data , we took care to establish principles of sampling and study design that would be essential in order for us to claim that, to some extent, what is true for the sample should be also true for the larger population from which the sample originated.

These principles should be kept in mind throughout this unit on statistical inference, since the results that we will obtain will not hold if there was bias in the sampling process, or flaws in the study design under which variables’ values were measured.

Perhaps the most important principle stressed in the Producing Data unit was that of randomization. Randomization is essential, not only because it prevents bias, but also because it permits us to rely on the laws of probability, which is the scientific study of random behavior.

In the Probability unit, we established basic laws for the behavior of random variables. We ultimately focused on two random variables of particular relevance: the sample mean (x-bar) and the sample proportion (p-hat), and the last section of the Probability unit was devoted to exploring their sampling distributions.

We learned what probability theory tells us to expect from the values of the sample mean and the sample proportion, given that the corresponding population parameters — the population mean (mu, μ ) and the population proportion ( p ) — are known.

As we mentioned in that section, the value of such results is more theoretical than practical, since in real-life situations we seldom know what is true for the entire population. All we know is what we see in the sample, and we want to use this information to say something concrete about the larger population.

Probability theory has set the stage to accomplish this: learning what to expect from the value of the sample mean, given that population mean takes a certain value, teaches us (as we’ll soon learn) what to expect from the value of the unknown population mean, given that a particular value of the sample mean has been observed.

Similarly, since we have established how the sample proportion behaves relative to population proportion, we will now be able to turn this around and say something about the value of the population proportion, based on an observed sample proportion. This process — inferring something about the population based on what is measured in the sample — is (as you know) called statistical inference .

Types of Inference

LO 1.9: Distinguish between situations using a point estimate, an interval estimate, or a hypothesis test.

We will introduce three forms of statistical inference in this unit, each one representing a different way of using the information obtained in the sample to draw conclusions about the population. These forms are:

  • Point Estimation
  • Interval Estimation
  • Hypothesis Testing

Obviously, each one of these forms of inference will be discussed at length in this section, but it would be useful to get at least an intuitive sense of the nature of each of these inference forms, and the difference between them in terms of the types of conclusions they draw about the population based on the sample results.

In point estimation , we estimate an unknown parameter using a single number that is calculated from the sample data.

Based on sample results, we estimate that p, the proportion of all U.S. adults who are in favor of stricter gun control, is 0.6.

In interval estimation , we estimate an unknown parameter using an interval of values that is likely to contain the true value of that parameter (and state how confident we are that this interval indeed captures the true value of the parameter).

Based on sample results, we are 95% confident that p, the proportion of all U.S. adults who are in favor of stricter gun control, is between 0.57 and 0.63.
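As a quick arithmetic check of that interval, here is a sketch using the usual normal-approximation interval, p-hat plus or minus 1.96 standard errors. The sample size n = 1,200 is an assumption borrowed from the poll described in the hypothesis-testing example that follows; the text does not state it for this interval.

```python
import math

# Hypothetical poll: n = 1,200 adults, of whom 60% favor stricter gun control.
n, p_hat = 1200, 0.6

# Standard error of the sample proportion: sqrt(p_hat * (1 - p_hat) / n).
se = math.sqrt(p_hat * (1 - p_hat) / n)

# 95% confidence interval: p_hat +/- 1.96 standard errors.
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI for p: ({lo:.2f}, {hi:.2f})")  # -> 95% CI for p: (0.57, 0.63)
```

With that assumed sample size, the interval works out to (0.57, 0.63), matching the statement above.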

In hypothesis testing , we begin with a claim about the population (which we will call the null hypothesis), and we check whether or not the data obtained from the sample provide evidence AGAINST this claim.

It was claimed that among all U.S. adults, about half are in favor of stricter gun control and about half are against it. In a recent poll of a random sample of 1,200 U.S. adults, 60% were in favor of stricter gun control. These data, therefore, provide some evidence against the claim.

Soon we will determine the probability that we could have seen such a result (60% in favor) or more extreme IF in fact the true proportion of all U.S. adults who favor stricter gun control is actually 0.5 (the value in the claim the data attempts to refute).
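That probability can be approximated with a small simulation, sketched below. The random seed and the number of simulated polls are arbitrary choices for illustration, not part of the original example.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility
n, sims = 1200, 2000  # poll size from the example; number of simulated polls

# Simulate polls under the claim (the null hypothesis) that the true
# proportion of adults favoring stricter gun control is 0.5, and count how
# often a result as extreme as the observed one (60% or more) appears.
extreme = sum(
    sum(random.random() < 0.5 for _ in range(n)) / n >= 0.60
    for _ in range(sims)
)
print(f"simulated P(p-hat >= 0.60 | p = 0.5): {extreme / sims}")
```

In 2,000 simulated polls a result this extreme essentially never occurs, which is why the observed 60% counts as strong evidence against the claim.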

It is claimed that among drivers 18-23 years of age (our population) there is no relationship between drunk driving and gender.

A roadside survey collected data from a random sample of 5,000 drivers and recorded their gender and whether they were drunk.

The collected data showed roughly the same percent of drunk drivers among males and among females. These data, therefore, do not give us any reason to reject the claim that there is no relationship between drunk driving and gender.

Did I Get This?: Types of Inference

In terms of organization, the Inference unit consists of two main parts: Inference for One Variable and Inference for Relationships between Two Variables. The organization of each of these parts will be discussed further as we proceed through the unit.

Inference for One Variable

The next two topics in the inference unit will deal with inference for one variable. Recall that in the Exploratory Data Analysis (EDA) unit, when we learned about summarizing and examining the distribution of data obtained from one variable, we distinguished between two cases: categorical data and quantitative data.

We will make a similar distinction here in the inference unit. In the EDA unit, the type of variable determined the displays and numerical measures we used to summarize the data. In Inference, the type of variable of interest (categorical or quantitative) will determine what population parameter is of interest.

  • When the variable of interest is categorical , the population parameter that we will infer about is the population proportion (p) associated with that variable. For example, if we are interested in studying opinions about the death penalty among U.S. adults, and thus our variable of interest is “death penalty (in favor/against),” we’ll choose a sample of U.S. adults and use the collected data to make an inference about p, the proportion of U.S. adults who support the death penalty.
  • When the variable of interest is quantitative , the population parameter that we infer about is the population mean (mu, µ) associated with that variable. For example, if we are interested in studying the annual salaries in the population of teachers in a certain state, we’ll choose a sample from that population and use the collected salary data to make an inference about µ, the mean annual salary of all teachers in that state.

The following outlines describe some of the important points about the process of inferential statistics as well as compare and contrast how researchers and statisticians approach this process.

Outline of Process of Inference

Here is another restatement of the big picture of statistical inference as it pertains to the two simple examples we will discuss first.

  • A simple random sample is taken from a population of interest.
  • In order to estimate a population parameter , a statistic is calculated from the sample . For example:

Sample mean (x-bar)

Sample proportion (p-hat)

  • We then learn about the DISTRIBUTION of this statistic in repeated sampling (theoretically) . We now know these are called sampling distributions !
  • Using THIS sampling distribution we can make inferences about our population parameter based upon our sample statistic .

It is this last step of statistical inference that we are interested in discussing now.
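The steps above can be sketched in a few lines of simulation. The numbers here (a population proportion of 0.6 and samples of size 100) are hypothetical, chosen to match the standard-error example later in this unit.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, for reproducibility
p, n, sims = 0.6, 100, 5000  # population proportion, sample size, repetitions

# Take repeated simple random samples and compute the statistic
# (here the sample proportion, p-hat) for each one.
p_hats = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(sims)
]

# The collection of p-hat values approximates the sampling distribution:
# its mean is near p and its spread is near sqrt(p * (1 - p) / n) = 0.049.
print(f"mean of p-hats: {statistics.mean(p_hats):.3f}")
print(f"sd of p-hats:   {statistics.stdev(p_hats):.3f}")
```

The simulated mean and standard deviation land close to the theoretical values, which is exactly the "distribution of this statistic in repeated sampling" that the outline describes.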

Applied Steps (What do researchers do?)

One issue for students is that the theoretical process of statistical inference is only a small part of the applied steps in a research project. Previously, in our discussion of the role of biostatistics , we defined these steps to be:

  • Planning/design of study
  • Data collection
  • Data analysis
  • Presentation
  • Interpretation

You can see that:

  • Both exploratory data analysis and inferential methods will fall into the category of “Data Analysis” in our previous list.
  • Probability is hiding in the applied steps in the form of probability sampling plans, estimation of desired probabilities, and sampling distributions.

Among researchers, the following represent some of the important questions to address when conducting a study.

  • What is the population of interest?
  • What is the question or statistical problem?
  • How to sample to best address the question given the available resources?
  • How to analyze the data?
  • How to report the results?

AFTER you know what you are going to do, then you can begin collecting data!

Theoretical Steps (What do statisticians do?)

Statisticians, on the other hand, need to ask questions like these:

  • What assumptions can be reasonably made about the population ?
  • What parameter(s) in the population do we need to estimate in order to address the research question?
  • What statistic(s) from our sample data can be used to estimate the unknown parameter(s) ?
  • Is it unbiased ?
  • How variable will it be for the planned sample size?
  • What is the distribution of this statistic? ( Sampling Distribution )

Then, we will see that we can use the sampling distribution of a statistic to:

  • Provide confidence interval estimates for the corresponding parameter .
  • Conduct hypothesis tests about the corresponding parameter .

Standard Error of a Statistic

LO: 1.10: Define the standard error of a statistic precisely and relate it to the concept of the sampling distribution of a statistic.

In our discussion of sampling distributions , we discussed the variability of sample statistics ; here is a quick review of this general concept and a formal definition of the standard error of a statistic .

  • All statistics calculated from samples are random variables.
  • The distribution of a statistic (from a sample of a given sample size) is called the sampling distribution of the statistic.
  • The standard deviation of the sampling distribution of a particular statistic is called the standard error of the statistic and measures variability of the statistic for a particular sample size.

The standard error of a statistic is the standard deviation of the sampling distribution of that statistic , where the sampling distribution is defined as the distribution of a particular statistic in repeated sampling.

  • The standard error is an extremely common measure of the variability of a sample statistic.

In our discussion of sampling distributions, we looked at a situation involving a random sample of 100 students taken from the population of all part-time students in the United States, for which the overall proportion of females is 0.6. Here we have a categorical variable of interest, gender.

We determined that the distribution of all possible values of p-hat (that we could obtain for repeated simple random samples of this size from this population) has mean p = 0.6 and standard deviation

\(\sigma_{\hat{p}}=\sqrt{\dfrac{p(1-p)}{n}}=\sqrt{\dfrac{0.6(1-0.6)}{100}}=0.05\)

which we have now learned is more formally called the standard error of p-hat. In this case, the true standard error of p-hat will be 0.05 .

We also showed how we can use this information along with information about the center (mean or expected value) to calculate probabilities associated with particular values of p-hat. For example, what is the probability that sample proportion p-hat is less than or equal to 0.56? After verifying the sample size requirements are reasonable, we can use a normal distribution to approximate

\(P(\hat{p} \leq 0.56)=P\left(Z \leq \dfrac{0.56-0.6}{0.05}\right)=P(Z \leq-0.80)=0.2119\)

Similarly, for a quantitative variable, we looked at an example of household size in the United States which has a mean of 2.6 people and standard deviation of 1.4 people.

If we consider taking a simple random sample of 100 households, we found that the distribution of sample means (x-bar) is approximately normal for a large sample size such as n = 100.

The sampling distribution of x-bar has a mean which is the same as the population mean, 2.6, and its standard deviation is the population standard deviation divided by the square root of the sample size:

\(\dfrac{\sigma}{\sqrt{n}}=\dfrac{1.4}{\sqrt{100}}=0.14\)

Again, this standard deviation of the sampling distribution of x-bar is more commonly called the standard error of x-bar , in this case 0.14. And we can use this information (the center and spread of the sampling distribution) to find probabilities involving particular values of x-bar.

\(P(\bar{x}>3)=P\left(Z>\dfrac{3-2.6}{\dfrac{1.4}{\sqrt{100}}}\right)=P(Z>2.86)=0.0021\)
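The claim that x-bar has standard error 0.14 can also be checked by simulation. In the sketch below, a gamma distribution with matching mean and standard deviation is used purely as a stand-in population (household sizes are not actually gamma-distributed); the spread of the simulated sample means should be close to 0.14:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.6, 1.4, 100, 50_000

# Stand-in population: a gamma distribution with mean 2.6 and SD 1.4
shape, scale = (mu / sigma) ** 2, sigma ** 2 / mu
xbars = rng.gamma(shape, scale, size=(reps, n)).mean(axis=1)

print(xbars.mean())        # close to mu = 2.6
print(xbars.std())         # close to sigma / sqrt(n) = 0.14
print((xbars > 3).mean())  # in the ballpark of the normal-approximation answer, 0.0021
```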


Statistics 200: Introduction to Statistical Inference

Zhou Fan, Stanford University, Autumn 2016


Description.

Statistical concepts and methods developed in a mathematical framework: Hypothesis testing, point estimation, confidence intervals. Neyman-Pearson theory, maximum likelihood estimation, likelihood ratio tests, Bayesian analysis. Asymptotic theory and simulation-based methods.

Prerequisites: Probability theory (STATS 116), multivariable calculus (MATH 52), and basic computer programming (or willingness to learn as you go!)

Teaching staff

Requirements and policies.

Homework is assigned approximately weekly and is due Wednesdays at 5 PM. Late homework will NOT be accepted (except with advance written permission from the teaching staff). Your lowest homework grade will be dropped.

Homework assignments will include simple computing exercises asking you to perform small simulations, create histograms and plots, and analyze data. You may use any language (e.g. R, Python, Matlab) and will be graded only on your results, not on the quality of your code.

Collaboration

You are encouraged to discuss homework problems with your classmates, but you must submit your own individual homework write-up, in your own words and using your own code for the programming exercises . Please indicate at the top of your write-up the names of the students with whom you worked.




AP®︎/College Statistics


  • Significance test for a proportion free response example
  • Significance test for a proportion free response (part 2 with correction)
  • Free response example: Significance test for a mean

Choosing an appropriate inference procedure

  • (Choice A) A t-test for a mean
  • (Choice B) A z-test for a proportion
  • (Choice C) A t-interval for a mean
  • (Choice D) A z-interval for a proportion
  • (Choice E) A paired t-test for the mean difference

Improving Your Statistical Inferences

Daniël Lakens

statistical inferences assignment active

“No book can ever be finished. While working on it we learn just enough to find it immature the moment we turn away from it.” Karl Popper, The Open Society and its Enemies

This open educational resource integrates information from my blog , my MOOCs Improving Your Statistical Inferences and Improving Your Statistical Questions , and my scientific work . The goal is to make the information more accessible, and easier to update in the future.

I have re-used and adapted (parts of) my own open access articles, without adding quotation marks. Immense gratitude to my collaborators Casper Albers, Farid Anvari, Aaron Caldwell, Harlan Cambell, Nicholas Coles, Lisa DeBruine, Marie Delacre, Zoltan Dienes, Noah van Dongen, Alexander Etz, Ellen Evers, Jaroslav Gottfriend, Seth Green, Christopher Harms, Arianne Herrera-Bennett, Joe Hilgard, Peder Isager, Maximilian Maier, Neil McLatchie, Brian Nosek, Friedrich Pahlke, Pepijn Obels, Amy Orben, Anne Scheel, Janneke Staaks, Leo Tiokhin, Mehmet Tunç, Duygu Uygun Tunç, and Gernot Wassmer, who have contributed substantially to the ideas in this open educational resource. I would also like to thank Zeki Akyol, Emrah Er, Max Ditroilo, Lewis Halsey, Kyle Hamilton, David Lane, Jeremiah Lewis, Mike Smith, and Leong Utek who gave comments on GitHub or Twitter to improve this textbook. The first version of this textbook was created during a sabbatical at Padova University, with thanks to the Advanced Data Analysis for Psychological Science students, and Gianmarco Altoè and Ughetta Moscardino for their hospitality.

Thanks to Dale Barr and Lisa DeBruine for the webexercises package that is used to create the interactive questions at the end of each chapter. Thanks to Nick Brown for his editing service.

If you find any mistakes, or have suggestions for improvement, you can submit an issue on the GitHub page of this open educational resource. You can also download a pdf or epub version (click the download button in the menu on the top left). This work is shared under a CC-BY-NC-SA License . You can cite this resource as:

Lakens, D. (2022). Improving Your Statistical Inferences. Retrieved from https://lakens.github.io/statistical_inferences/. https://doi.org/10.5281/zenodo.6409077

You can check the Change Log at the end of this book to track updates over time, or to find the version number if you prefer to cite a specific version of this regularly updated textbook.

This work is dedicated to Kyra, the love of my life.


Eindhoven University of Technology

Statistical Inference Assignment

My code repository for the Coursera Data Science Specialization by Johns Hopkins University.

The project consists of two parts:

  • A simulation exercise.
  • Basic inferential data analysis.

You will create a report to answer the questions. Given the nature of the series, ideally you’ll use knitr to create the reports and convert to a pdf. (I will post a very simple introduction to knitr). However, feel free to use whatever software that you would like to create your pdf.

Each pdf report should be no more than 3 pages with 3 pages of supporting appendix material if needed (code, figures, etcetera).

Review criteria

  • Did you show where the distribution is centered and compare it to the theoretical center of the distribution?
  • Did you show how variable it is and compare it to the theoretical variance of the distribution?
  • Did you perform an exploratory data analysis of at least a single plot or table highlighting basic features of the data?
  • Did you perform some relevant confidence intervals and/or tests?
  • Were the results of the tests and/or intervals interpreted correctly in the context of the problem?
  • Did you describe the assumptions needed for your conclusions?

Part 1: Simulation Exercise

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should

  • Show the sample mean and compare it to the theoretical mean of the distribution.
  • Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  • Show that the distribution is approximately normal. Focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

This exercise is asking you to use your knowledge of the theory given in class to relate the two distributions.
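The assignment itself expects R (via rexp), but the simulation can be sketched in Python for illustration; all the constants below (lambda = 0.2, 40 exponentials, 1000 simulations) come from the instructions above:

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n, sims = 0.2, 40, 1000

# 1000 simulated averages of 40 exponentials with rate lambda = 0.2
means = rng.exponential(scale=1 / lam, size=(sims, n)).mean(axis=1)

print(means.mean())   # sample mean of the averages; theoretical mean is 1/lam = 5
print(means.var())    # their variance; theoretical value is (1/lam)**2 / n = 0.625
# A histogram of `means` is approximately normal, unlike a histogram of the
# raw exponentials, which is strongly right-skewed.
```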

  • View my work on RPubs
  • View it as a PDF

Part 2: Basic Inferential Data Analysis Instructions

Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

  • Load the ToothGrowth data and perform some basic exploratory data analyses
  • Provide a basic summary of the data.
  • Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there’s other approaches worth considering)

State your conclusions and the assumptions needed for your conclusions.
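The ToothGrowth data ships with R, so a Python sketch cannot load it directly; the block below simulates a stand-in (the group means and SDs are illustrative values only, not the real dataset) and applies the kind of two-sample comparison the assignment asks for, here Welch's t-test:

```python
import numpy as np
from scipy.stats import ttest_ind

# Stand-in for the len ~ supp comparison: two groups of 30 simulated
# tooth-length measurements whose means differ (illustrative numbers only).
rng = np.random.default_rng(1)
oj = rng.normal(loc=20.7, scale=6.6, size=30)
vc = rng.normal(loc=17.0, scale=8.3, size=30)

# Welch's two-sample t-test: does not assume equal variances in the groups
result = ttest_ind(oj, vc, equal_var=False)
print(result.statistic, result.pvalue)
```

The same call, applied to the real OJ and VC groups, is one way to address the supp comparison; the dose comparison works analogously on pairs of dose levels.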


Inferential Statistics | An Easy Introduction & Examples

Published on September 4, 2020 by Pritha Bhandari . Revised on June 22, 2023.

While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.

When you have collected data from a sample , you can use inferential statistics to understand the larger population from which the sample is taken.

Inferential statistics have two main uses:

  • making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
  • testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).

Table of contents

  • Descriptive versus inferential statistics
  • Estimating population parameters from sample statistics
  • Hypothesis testing
  • Other interesting articles
  • Frequently asked questions about inferential statistics

Descriptive statistics allow you to describe a data set, while inferential statistics allow you to make inferences based on a data set.

  • Descriptive statistics

Using descriptive statistics, you can report characteristics of your data:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability concerns how spread out the values are.

In descriptive statistics, there is no uncertainty – the statistics precisely describe the data that you collected. If you collect data from an entire population, you can directly compare these descriptive statistics to those from other populations.

Inferential statistics

Most of the time, you can only acquire data from samples, because it is too difficult or expensive to collect data from the whole population that you’re interested in.

While descriptive statistics can only summarize a sample’s characteristics, inferential statistics use your sample to make reasonable guesses about the larger population.

With inferential statistics, it’s important to use random and unbiased sampling methods . If your sample isn’t representative of your population, then you can’t make valid statistical inferences or generalize .

Sampling error in inferential statistics

Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error , which is the difference between the true population values (called parameters) and the measured sample values (called statistics).

Sampling error arises any time you use a sample, even if your sample is random and unbiased. For this reason, there is always some uncertainty in inferential statistics. However, using probability sampling methods reduces this uncertainty.


The characteristics of samples and populations are described by numbers called statistics and parameters :

  • A statistic is a measure that describes the sample (e.g., sample mean ).
  • A parameter is a measure that describes the whole population (e.g., population mean).

Sampling error is the difference between a parameter and a corresponding statistic. Since in most cases you don’t know the real population parameter, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account.

There are two important types of estimates you can make about the population: point estimates and interval estimates .

  • A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
  • An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

Confidence intervals

A confidence interval uses the variability around a statistic to come up with an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take sampling error into account.

While a point estimate gives you a precise value for the parameter you are interested in, a confidence interval tells you the uncertainty of the point estimate. They are best used in combination with each other.

Each confidence interval is associated with a confidence level. A confidence level tells you the percentage of times you can expect intervals constructed this way to contain the true population parameter if you repeat the study many times.

A 95% confidence interval means that if you repeat your study with a new sample in exactly the same way 100 times, you can expect about 95 of the resulting intervals to contain the true population parameter.

Although you can say that intervals constructed this way will capture the parameter a certain percentage of the time, you cannot say for sure whether any particular interval does. That's because you can't know the true value of the population parameter without collecting data from the full population.

However, with random sampling and a suitable sample size, you can reasonably expect your confidence interval to contain the parameter a certain percentage of the time.

For example, if a sample of employees reports an average of 19 paid vacation days, your point estimate of the population mean paid vacation days is the sample mean: 19 paid vacation days.
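A confidence interval around that point estimate could be computed as follows. Note that the sample SD (5 days) and sample size (100) below are hypothetical values invented for illustration, since the text gives only the sample mean:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical numbers: only the sample mean (19 days) comes from the text;
# the SD and the sample size are made up for this sketch.
xbar, s, n = 19, 5, 100
se = s / sqrt(n)                      # standard error of the mean: 0.5
z = norm.ppf(0.975)                   # about 1.96 for 95% confidence
ci = (xbar - z * se, xbar + z * se)
print(ci)                             # roughly (18.0, 20.0)
```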

Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.

Hypotheses , or predictions, are tested using statistical tests . Statistical tests also estimate sampling errors so that valid inferences can be made.

Statistical tests can be parametric or non-parametric. Parametric tests are considered more statistically powerful because they are more likely to detect an effect if one exists.

Parametric tests make assumptions that include the following:

  • the population that the sample comes from follows a normal distribution of scores
  • the sample size is large enough to represent the population
  • the variances , a measure of variability , of each group being compared are similar

When your data violates any of these assumptions, non-parametric tests are more suitable. Non-parametric tests are called “distribution-free tests” because they don’t assume anything about the distribution of the population data.
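To illustrate the distinction, the sketch below runs both a parametric test (Welch's t-test) and its non-parametric counterpart (the Mann-Whitney U test) on two skewed, lognormal groups; all data here is simulated for illustration:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

# Two skewed (lognormal) groups: data like this violates the normality
# assumption, so the non-parametric test is the safer choice here.
rng = np.random.default_rng(7)
a = rng.lognormal(mean=0.0, sigma=1.0, size=50)
b = rng.lognormal(mean=0.5, sigma=1.0, size=50)

p_t = ttest_ind(a, b, equal_var=False).pvalue   # parametric (Welch's t-test)
p_u = mannwhitneyu(a, b).pvalue                 # non-parametric counterpart
print(p_t, p_u)
```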

Statistical tests come in three forms: tests of comparison, correlation or regression.

Comparison tests

Comparison tests assess whether there are differences in means, medians or rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.

Means can only be found for interval or ratio data , while medians and rankings are more appropriate measures for ordinal data .

Correlation tests

Correlation tests determine the extent to which two variables are associated.

Although Pearson’s r is the most statistically powerful test, Spearman’s r is appropriate for ordinal data, and for interval and ratio variables when the data doesn’t follow a normal distribution.

The chi square test of independence is the only test that can be used with nominal variables.
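A minimal sketch of these three association tests on simulated data (the contingency-table counts are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, chi2_contingency

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)        # y is linearly related to x

r_p, _ = pearsonr(x, y)                 # Pearson's r (near 0.9 for this data)
r_s, _ = spearmanr(x, y)                # Spearman's rank correlation
print(r_p, r_s)

# Chi-square test of independence on a 2x2 table of counts (made-up data)
table = np.array([[30, 10], [20, 40]])
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)
```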

Regression tests

Regression tests examine whether changes in predictor variables are associated with changes in an outcome variable (on their own they do not establish causation). You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.

Most of the commonly used regression tests are parametric. If your data is not normally distributed, you can perform data transformations.

Data transformations help you make your data normally distributed using mathematical operations, like taking the square root of each value.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Confidence interval
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean


Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

A statistic refers to measures about the sample , while a parameter refers to measures about the population .

A sampling error is the difference between a population parameter and a sample statistic .

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Cite this Scribbr article


Bhandari, P. (2023, June 22). Inferential Statistics | An Easy Introduction & Examples. Scribbr. Retrieved March 25, 2024, from https://www.scribbr.com/statistics/inferential-statistics/


Fiveable


3.7 Inference and Experiments

5 min read • December 29, 2022

Kanya Shah

Jed Quiaoit

Drawing conclusions from the data and analyzing possible areas of error helps create a valid inference about the population from which the sample was chosen in the context of well-designed experiments. 

Statistical inference is a method of using data to make conclusions about a larger population. In statistical inference , we attribute our conclusions based on the data to the distribution from which the data were collected. This means that we assume that the sample we have collected is representative of the larger population and that the conclusions we draw from the sample can be generalized to the population.

For example, if we collect data on the height of a sample of 100 people and calculate the mean height , we can use statistical inference to draw conclusions about the mean height of the entire population. We do this by assuming that the sample of 100 people is representative of the larger population, so that the sample mean is a good estimate of the population mean height.

Statistical inference allows us to make conclusions about a population based on a sample, even if we do not have access to the entire population. This is an important tool in research, as it allows us to study small samples of people or other entities and draw conclusions about the larger population. 🤔

Inferences for Studies/Samples

Sampling variability refers to the fact that different random samples of the same size from the same population can produce different estimates of a population parameter, such as the mean or standard deviation . This variability is a natural occurrence in statistical sampling and is due to the fact that each sample is a unique subset of the population. 😀

Larger samples tend to produce more accurate estimates that are closer to the true population value than smaller random samples. This is because larger samples are more representative of the population and are less likely to be affected by sampling error . Sampling error is the difference between the estimate obtained from a sample and the true population value.

The larger the sample size, the smaller the sampling error is likely to be.
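This relationship between sample size and sampling variability is easy to demonstrate by simulation; here is a sketch with an artificial normal population (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pop = rng.normal(loc=50, scale=10, size=1_000_000)   # an artificial population

# Spread of the sample mean over 2000 repeated samples, at two sample sizes:
# estimates from the larger samples cluster much more tightly around 50.
spread = {}
for n in (25, 400):
    means = rng.choice(pop, size=(2000, n)).mean(axis=1)
    spread[n] = means.std()

print(spread)   # roughly {25: 2.0, 400: 0.5}, i.e. close to 10 / sqrt(n)
```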

Inferences for Experiments

Random assignment of treatments to experimental units is a key aspect of experimental research design. It involves randomly assigning subjects or other experimental units to different treatment conditions in order to control for extraneous variables. By randomly assigning subjects to different conditions, the researcher can be confident that any observed differences between the groups are due to the treatment rather than other factors. 🎰

Random assignment allows researchers to conclude that some observed changes are so large as to be unlikely to have occurred by chance. Such changes are said to be statistically significant , which means that they are likely to be real rather than due to random variation.

If the experimental units used in an experiment are representative of some larger group of units, the results of the experiment can be generalized to the larger group. Random selection of experimental units gives a better chance that the units will be representative of the larger group, which increases the validity of the study and makes it more likely that the data will be representative of the designated population. 👨‍👩‍👧‍👦

We'll learn more about how to determine if differences are enough to be considered statistically significant in Unit 6 and Unit 7 .

Inference about a population can be made only if the individuals from a population taking part in the study were randomly selected. 

A well designed experiment that randomly assigns experimental units to treatments allows inferences about cause and effect .


Practice Problem

A researcher is interested in studying the effectiveness of a new study technique on college students' grades. The researcher plans to recruit 100 students from a large university and randomly assign them to either the control group or the experimental group. The control group will receive the traditional study technique, while the experimental group will receive the new study technique.

At the end of the study, the researcher collects data on the students' grades and finds that the mean grade of the experimental group is significantly higher than the mean grade of the control group. The researcher concludes that the new study technique is more effective than the traditional technique.

Based on the experimental design described above, can the researcher generalize the results of the study to the larger population of college students? Explain your answer.

It's possible that the researcher could generalize the results of the study to the larger population of college students if the experimental design was well-controlled and the sample of 100 students was representative of the larger population.

One key factor to consider when determining whether the results of a study can be generalized to a larger population is the sampling method used. If the researcher used a random sampling method to recruit the students for the study, it is more likely that the sample of 100 students is representative of the larger population of college students. This would increase the validity of the study and allow the researcher to make more reliable conclusions about the effectiveness of the new study technique.

However, there are other factors that could affect the generalizability of the study's results. For example, if the experimental group and control group were not well-matched on important characteristics such as age, gender, or ability level, it could affect the results of the study. Additionally, if the study was conducted over a short period of time or in a limited location, it could limit the generalizability of the results.

Key Terms to Review (9)

Cause and Effect

Experimental Units

Mean Height

Random Assignment

Random Selection

Sampling Error

Standard Deviation

Statistical Inference

Statistically Significant


© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.


Statistics > Machine Learning

Title: Active Statistical Inference

Abstract: Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.



Lesson 1: Measures of Central Tendency, Dispersion and Association

Overview

A partial description of the joint distribution of the data is provided here. Three aspects of the data are of importance, the first two of which you should already be familiar with from univariate statistics. These are:

  • Central Tendency: What is a typical value for each variable?
  • Dispersion: How far apart are the individual observations from a central value for a given variable?
  • Association: This might (or might not!) be a new measure for you. When more than one variable is studied together, how does each variable relate to the remaining variables? How are the variables simultaneously related to one another? Are they positively or negatively related?

Statistics, as a subject matter, is the science and art of using sample information to make generalizations about populations.

Here is the notation that will be used:

\(X_{ij}\) = Observation for variable j in subject i .

\(p\) = Number of variables

\(n\) = Number of subjects

In the example to come, we'll have data on 737 people (subjects) and 5 nutritional outcomes (variables). So,

\(p\) = 5 variables

\(n\) = 737 subjects

In multivariate statistics we will always be working with vectors of observations. So in this case we are going to arrange the data for the p variables on each subject into a vector. In the expression below, \(\textbf{X}_i\) is the vector of observations for the \(i^{th}\) subject, \(i\) = 1 to \(n\) (737). Therefore, the data for the \(j^{th}\) variable will be located in the \(j^{th}\) element of this subject's vector, \(j\) = 1 to \(p\) (5).

\[\mathbf{X}_i = \left(\begin{array}{l}X_{i1}\\X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)\]
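This arrangement can be sketched in code (assuming Python with NumPy; the tiny matrix below is invented for illustration, not the 737-subject dataset): store the data as an \(n \times p\) array whose \(i^{th}\) row is the subject vector \(\mathbf{X}_i\).

```python
import numpy as np

# Hypothetical data matrix: n = 4 subjects (rows), p = 3 variables (columns).
# In the lesson's example n = 737 and p = 5; a small matrix shows the idea.
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
])

n, p = X.shape  # number of subjects, number of variables

# The vector of observations for subject i (1-indexed, as in the lesson's
# notation) is row i of the matrix; X_ij is its j-th element.
i = 2
X_i = X[i - 1]          # vector (X_i1, X_i2, ..., X_ip)
X_i2 = X[i - 1, 2 - 1]  # scalar observation X_{i2}
```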

Upon successful completion of this lesson, you should be able to:

  • interpret measures of central tendency, dispersion, and association;
  • calculate sample means, variances, covariances, and correlations using a hand calculator;
  • use software like SAS or Minitab to compute sample means, variances, covariances, and correlations.
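As a minimal sketch of that last objective (assuming Python with NumPy rather than SAS or Minitab, and an invented two-variable data matrix), the sample summaries can be computed in a few lines:

```python
import numpy as np

# Made-up data matrix: rows are subjects, columns are variables.
X = np.array([
    [2.0, 4.0],
    [3.0, 6.0],
    [5.0, 7.0],
    [7.0, 11.0],
])

mean_vec = X.mean(axis=0)                # sample mean of each variable
var_vec = X.var(axis=0, ddof=1)          # sample variances (divide by n - 1)
cov_mat = np.cov(X, rowvar=False)        # p x p sample covariance matrix
corr_mat = np.corrcoef(X, rowvar=False)  # p x p sample correlation matrix
```

Note that the diagonal of the covariance matrix reproduces the sample variances, and the correlation matrix is symmetric with ones on its diagonal.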

Active Statistical Inference

5 Mar 2024 · Tijana Zrnic, Emmanuel J. Candès

Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.
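The core intuition (spend the label budget where the model is least confident, and trust the model's predictions elsewhere) can be sketched in toy form. This is an illustration of the intuition only, not the paper's confidence-interval construction, and all names and data here are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

n, budget = 100, 20
truth = rng.integers(0, 2, size=n)    # unknown binary labels (toy oracle)
prob = rng.uniform(0.0, 1.0, size=n)  # model's predicted P(label = 1)
pred = (prob >= 0.5).astype(int)      # model's point predictions

# Uncertainty is highest when prob is near 0.5, lowest near 0 or 1.
uncertainty = 1.0 - np.abs(prob - 0.5) * 2.0

# Spend the label budget on the most uncertain points.
query = np.argsort(uncertainty)[-budget:]

# Combine collected labels with model predictions for the rest.
combined = pred.copy()
combined[query] = truth[query]        # hypothetical oracle lookup

estimate = combined.mean()            # plug-in estimate of the label mean
```

The paper's actual estimator additionally corrects for the adaptive sampling so that the resulting intervals and tests remain valid; the sketch above only shows where the budget goes.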



COMMENTS

  1. Statistical Inferences Assignment and Quiz 100% Flashcards

Study with Quizlet and memorize flashcards containing terms like: "Which of the following criteria are necessary conditions for making a statistical inference? (a) sample size greater than 30, or an approximately normal data set; (b) sample size greater than 100; (c) convenience sample; (d) simple random sample; (e) systematic sample", "In a random sample of 520 adults, the mean was 150 and the standard deviation was 4.5 ...

  2. MATH 1281

    Statistical Inference (MATH 1281) 2 days ago. Identify two variables from your field of interest. Find the data associated with those two variables or make up some data. 1. Briefly explain the variables you have selected and the reason of your selection. 2. Explain the significance of using a scatter plot for your data.

  3. Active Statistics

    3.2 Prediction as a unifying theme in statistics and causal inference 3.3 Data collection and visualization 3.4 Review of mathematics and probability 3.5 Statistical inference 3.6 Simulation 3.7 Background on regression modeling 3.8 Linear regression with a single predictor 3.9 Least squares and fitting regression models

  4. PDF Active Statistics

    3.5 Statistical inference 76 3.6 Simulation 87 ... and causal inference, and for each week we have homework assignments, stories, activities, computer demonstrations, drills, and discussion questions. You can read these on your own as the topics come ... 978-1-009-43621-2 — Active Statistics Andrew Gelman, Aki Vehtari Frontmatter

  5. Statistical Inference For Everyone

    This is a new approach to an introductory statistical inference textbook, motivated by probability theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics typically covered in the first semester of such a course. It is freely available under the Creative Commons License, and includes a software library in Python for making some of the calculations ...

  6. Unit 4A: Introduction to Statistical Inference

    Video. Video: Unit 4A: Introduction to Statistical Inference (15:45) Recall again the Big Picture, the four-step process that encompasses statistics: data production, exploratory data analysis, probability and inference. We are about to start the fourth and final unit of this course, where we draw on principles learned in the other units ...

  7. Statistics 200: Introduction to Statistical Inference

    Homework assignments will include simple computing exercises asking you to perform small simulations, create histograms and plots, and analyze data. You may use any language (e.g. R, Python, Matlab) and will be graded only on your results, not on the quality of your code. ... Larry Wasserman, All of Statistics: A concise course in statistical ...

  8. Statistical Inference Course (Johns Hopkins)

    Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentists, Bayesian, likelihood ...

  9. Improving your statistical inferences

    There are 8 modules in this course. This course aims to help you to draw better statistical inferences from empirical research. First, we will discuss how to correctly interpret p-values, effect sizes, confidence intervals, Bayes Factors, and likelihood ratios, and how these statistics answer different questions you might be interested in.

  10. Choosing an appropriate inference procedure

    Lesson 1: Prepare for the exam. Significance test for a proportion free response example. Significance test for a proportion free response (part 2 with correction) Free response example: Significance test for a mean. Choosing an appropriate inference procedure. AP®︎/College Statistics.

  11. Improving Your Statistical Inferences

    This open educational resource integrates information from my blog, my MOOCs Improving Your Statistical Inferences and Improving Your Statistical Questions, and my scientific work. The goal is to make the information more accessible, and easier to update in the future. I have re-used and adapted (parts of) my own open access articles, without ...

  12. PDF STAT:5100 (22S:193) Statistical Inference I Homework Assignments

    Statistics STAT:5100 (22S:193), Fall 2015 Tierney Assignment 1 Due on Monday, August 31, 2015. 1. For each of the following experiments, describe a reasonable sample space: (a) Toss a coin four times. (b) Count the number of insect-damaged leaves on a plant. (c) Measure the lifetime (in hours) of a particular brand of light bulb.

  13. Statistical Inference Assignment

    Statistical Inference Assignment. The project consists of two parts: A simulation exercise. Basic inferential data analysis. You will create a report to answer the questions. Given the nature of the series, ideally you'll use knitr to create the reports and convert to a pdf. (I will post a very simple introduction to knitr).

  14. Improving Your Statistical Questions

    There are 6 modules in this course. This course aims to help you to ask better statistical questions when performing empirical research. We will discuss how to design informative studies, both when your predictions are correct, as when your predictions are wrong. We will question norms, and reflect on how we can improve research practices to ...

  15. Inferential Statistics

Inferential Statistics | An Easy Introduction & Examples. Published on September 4, 2020 by Pritha Bhandari. Revised on June 22, 2023. While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data. When you have collected data from a sample, you can use inferential statistics to understand ...

  16. Statistical Inference and Estimation

    A statistical model is a representation of a complex phenomena that generated the data. It has mathematical formulations that describe relationships between random variables and parameters. It makes assumptions about the random variables, and sometimes parameters. Residuals are a representation of a lack-of-fit, that is of the portion of the ...

  17. AP Statistics 2024

    Inferences for Experiments. Random assignment of treatments to experimental units is a key aspect of experimental research design. ... Statistical inference allows us to make conclusions about a population based on a sample, even if we do not have access to the entire population. This is an important tool in research, as it allows us to study ...

  18. PDF Introduction to Statistical Inference

    Three Modes of Statistical Inference. Descriptive Inference: summarizing and exploring data. Inferring "ideal points" from rollcall votes Inferring "topics" from texts and speeches Inferring "social networks" from surveys. Predictive Inference: forecasting out-of-sample data points. Inferring future state failures from past failures ...

  19. Inferential Statistics

    There are 5 modules in this course. This course covers commonly used statistical inference methods for numerical and categorical data. You will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of your analysis in a way that is interpretable for clients or the public.

  20. Intro to Statistical Inference Course | Stanford Online

    Statistical inference is the process of using data analysis to draw conclusions about a population or process beyond the existing data. Inferential statistical analysis infers properties of a population by testing hypotheses and deriving estimates. For example, you might survey a representation of people in a region and, using statistical ...