

How to Use summary() Function in R (With Examples)
The summary() function in R can be used to quickly summarize the values in a vector, data frame, regression model, or ANOVA model.
This function uses the basic syntax summary(data), where data is the object to summarize.
The following examples show how to use this function in practice.
Example 1: Using summary() with Vector
The following code shows how to use the summary() function to summarize the values in a vector:
The summary() function automatically calculates the following summary statistics for the vector:
- Min: The minimum value
- 1st Qu: The value of the 1st quartile (25th percentile)
- Median: The median value
- Mean: The mean value
- 3rd Qu: The value of the 3rd quartile (75th percentile)
- Max: The maximum value
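The original code for this example did not survive extraction, so the following is a minimal sketch with a made-up vector (the name x and its values are assumptions chosen for illustration):

```r
# Hypothetical numeric vector (values are illustrative only)
x <- c(2, 4, 4, 5, 7, 9, 12)

# summary() returns the six statistics listed above as a named object
s <- summary(x)
print(s)

# Individual statistics can be pulled out by name
s[["Median"]]
s[["Mean"]]
```

Indexing with [[ ]] is handy when only one of the statistics is needed in later calculations.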
Note that if there are any missing values (NA) in the vector, the summary() function will automatically exclude them when calculating the summary statistics and will report how many were excluded in an additional NA's field:
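A small sketch of the missing-value behavior, again with a made-up vector (x_na is an assumed name):

```r
# Hypothetical vector containing two missing values
x_na <- c(2, 4, NA, 5, 7, NA, 12)

# The statistics are computed on the non-missing values only;
# the count of missing values appears under "NA's"
s_na <- summary(x_na)
print(s_na)

# Number of missing values that were excluded
s_na[["NA's"]]
```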
Example 2: Using summary() with Data Frame
The following code shows how to use the summary() function to summarize every column in a data frame:
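The original data frame is not shown in the text, so this sketch uses a small invented one (the column names team, points, and assists are assumptions):

```r
# Hypothetical data frame with one character and two numeric columns
df <- data.frame(team = c("A", "B", "C", "D", "E"),
                 points = c(99, 90, 86, 88, 95),
                 assists = c(33, 28, 31, 39, 34),
                 stringsAsFactors = FALSE)

# summary() is applied to every column; numeric columns get the six
# statistics, character columns get length, class, and mode
s_df <- summary(df)
print(s_df)
```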
Example 3: Using summary() with Specific Data Frame Columns
The following code shows how to use the summary() function to summarize specific columns in a data frame:
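One way to do this is to subset the data frame before calling summary(). The sketch below reuses the same invented data frame as an assumption:

```r
# Hypothetical data frame (same illustrative values as before)
df <- data.frame(team = c("A", "B", "C", "D", "E"),
                 points = c(99, 90, 86, 88, 95),
                 assists = c(33, 28, 31, 39, 34),
                 stringsAsFactors = FALSE)

# Summarize only the points and assists columns
s_cols <- summary(df[c("points", "assists")])
print(s_cols)

# Summarize a single column as a vector
s_pts <- summary(df$points)
print(s_pts)
```

Subsetting with df[c(...)] keeps a data frame, so the result is a table with one column per variable, whereas df$points yields the vector-style summary.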
Example 4: Using summary() with Regression Model
The following code shows how to use the summary() function to summarize the results of a linear regression model:
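As a sketch, the built-in mtcars dataset can stand in for the original data, which is not shown (the choice of mpg and wt as variables is an assumption):

```r
# Fit a simple linear regression on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# summary() dispatches to summary.lm for model objects
fit_summary <- summary(fit)
print(fit_summary)

# Components of the summary can be extracted for further use
coef(fit_summary)        # coefficient table: estimate, std. error, t value, p-value
fit_summary$r.squared    # R-squared of the model
```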
Example 5: Using summary() with ANOVA Model
The following code shows how to use the summary() function to summarize the results of an ANOVA model in R:
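A minimal sketch using the built-in PlantGrowth dataset in place of the original data (an assumption, since the original example is not shown):

```r
# One-way ANOVA of plant weight by treatment group
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# summary() dispatches to summary.aov and prints the ANOVA table
aov_summary <- summary(fit_aov)
print(aov_summary)

# The table itself is stored as the first list element
aov_summary[[1]]$Df
```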
Published by Zach
summary: Object Summaries
Description.
summary is a generic function used to produce result summaries of the results of various model fitting functions. The function invokes particular methods which depend on the class of the first argument.
# S3 method for default
summary(object, ..., digits, quantile.type = 7)
# S3 method for data.frame
summary(object, maxsum = 7, digits = max(3, getOption("digits")-3), ...)
# S3 method for factor
summary(object, maxsum = 100, ...)
# S3 method for matrix
summary(object, ...)
# S3 method for summaryDefault
format(x, digits = max(3L, getOption("digits") - 3L), ...)
# S3 method for summaryDefault
print(x, digits = max(3L, getOption("digits") - 3L), ...)
object: an object for which a summary is desired.
x: a result of the default method of summary().
maxsum: integer, indicating how many levels should be shown for factors.
digits: integer, used for number formatting with signif() (for summary.default) or format() (for summary.data.frame). In summary.default, if not specified (i.e., missing(.)), signif() will not be called anymore (since R >= 3.4.0, where the default has been changed to only round in the print and format methods).
quantile.type: integer code used in quantile(*, type=quantile.type) for the default method.
...: additional arguments affecting the summary produced.
The form of the value returned by summary depends on the class of its argument. See the documentation of the particular methods for details of what is produced by that method.
The default method returns an object of class c("summaryDefault", "table") which has specialized format and print methods. The factor method returns an integer vector.
The matrix and data frame methods return a matrix of class "table", obtained by applying summary to each column and collating the results.
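The return types described above can be checked directly. This is a small sketch with made-up inputs; the class of the default method's result assumes R 3.4.0 or later:

```r
# Default method: numeric vector
s_num <- summary(c(1, 2, 3, 4))
class(s_num)

# Factor method: integer vector of level counts
s_fac <- summary(factor(c("a", "b", "a")))
is.integer(s_fac)

# Data frame method: character matrix of class "table"
s_tab <- summary(data.frame(u = 1:3, v = c(2.5, 3.5, 4.5)))
class(s_tab)
is.matrix(s_tab)
```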
For factors, the frequency of the first maxsum - 1 most frequent levels is shown, and the less frequent levels are summarized in "(Others)" (resulting in at most maxsum frequencies).
The functions summary.lm and summary.glm are examples of particular methods which summarize the results produced by lm and glm.
See also: anova, summary.glm, summary.lm.
Get Summary of Results produced by Functions in R Programming – summary() Function
Last Updated: 23 Jun, 2020
The summary() function in R is a generic function used to produce summaries of the results of various model-fitting functions.
Syntax: summary(object, maxsum)
Parameters:
object: R object
maxsum: integer value which indicates how many levels should be shown for factors
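To illustrate the maxsum parameter, here is a minimal sketch with an invented factor (the name f and its levels are assumptions):

```r
# Hypothetical factor with four levels of differing frequency
f <- factor(c("a", "a", "a", "b", "b", "c", "d"))

# Without maxsum, every level is counted separately
s_all <- summary(f)
print(s_all)

# With maxsum = 3, the two most frequent levels are kept and the
# remaining levels are pooled into a single catch-all entry
s_top <- summary(f, maxsum = 3)
print(s_top)
```

The pooled entry holds the combined count of the less frequent levels, so the counts still add up to the length of the factor.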
What is the summarize() method in R?
The summarize() function from the dplyr package reduces a data frame to a single value or vector per group. Observations are typically grouped first by a categorical variable, using the group_by() function.
summarize() computes its result from the action applied to either grouped or ungrouped data.
Summarize grouped data
The operations that can be performed on grouped data include count, mean (average), and others.
For example, grouping the PlantGrowth dataset by its group column and calling summarize() with mean(weight) gives the mean plant weight for each treatment group.
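The grouped example described here can be sketched as follows, assuming the dplyr package is installed (the result column name mean_weight is an assumption):

```r
library(dplyr)

# Mean plant weight for each treatment group in PlantGrowth
mean_weights <- PlantGrowth %>%
  group_by(group) %>%
  summarize(mean_weight = mean(weight))

print(mean_weights)
```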
Summarize ungrouped data
We can also summarize ungrouped data. This can be done by using three functions.
- summarize_all()
- summarize_at()
- summarize_if()
1. summarize_all()
This function summarizes all the columns of data based on the action which is to be performed.
- action: The function to apply to the data frame columns. It can be a function name or a lambda; the funs() helper used in older code is deprecated in current versions of dplyr.
In the code snippet below, we load the mtcars (Motor Trend US magazine) dataset into the data variable. In the variable sample, we store the top six observations to process. The call sample %>% summarize_all(mean) shows the mean of each column over those six observations.
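The snippet itself is missing from the text; a sketch reconstructed from the description might look like this (sample_rows is used instead of sample to avoid masking base R's sample() function, which is a deviation from the description):

```r
library(dplyr)

# Load the dataset and keep the top six observations
data <- mtcars
sample_rows <- head(data)

# Apply mean() to every column
all_means <- sample_rows %>% summarize_all(mean)
print(all_means)
```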
2. summarize_at()
It performs the action on the specific column and generates the summary based on that action.
- vector_of_columns : The list of column names or character vector of column names.
In the code snippet below, we load the mtcars (Motor Trend US magazine) dataset into the data variable. In the variable sample, we store the top six observations to process. The call sample %>% group_by(hp) %>% summarize_at(c('cyl','mpg'),mean) shows the mean of the 'cyl' and 'mpg' columns, grouped by hp (a column of the dataset).
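A sketch of the call described above, under the same assumptions as before (dplyr installed, sample_rows standing in for sample):

```r
library(dplyr)

data <- mtcars
sample_rows <- head(data)

# Mean of cyl and mpg within each distinct hp value
at_means <- sample_rows %>%
  group_by(hp) %>%
  summarize_at(c("cyl", "mpg"), mean)

print(at_means)
```

The result has one row per distinct hp value in the six rows, plus the two summarized columns.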
3. summarize_if()
In this function, we specify a condition and the summary will be generated if the condition is satisfied.
- predicate : A predicate function to apply to logical values or data frame columns.
A predicate function in R returns only TRUE or FALSE.
In the code snippet below, we use the predicate function is.numeric and mean as an action.
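The dataset for this snippet is not named in the text, so the sketch below assumes the built-in iris data, whose Species column is non-numeric and therefore makes the effect of the predicate visible:

```r
library(dplyr)

# Top six rows of iris: four numeric columns and one factor column
iris_rows <- head(iris)

# mean() is applied only to columns for which is.numeric() is TRUE;
# the non-numeric Species column is dropped from the result
if_means <- iris_rows %>% summarize_if(is.numeric, mean)

print(if_means)
```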

How to Compute Summary Statistics by Group in R (3 Examples)
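This page shows how to calculate descriptive statistics by group in R.
The article covers the construction of example data, followed by three examples that use the tapply function, the dplyr package, and the purrr package.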
If you want to know more about these topics, keep reading!
Construction of Example Data
First, we’ll need to create some exemplifying data:
set.seed(549298)                      # Create example data
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])
head(data)                            # Print head of example data
#             x group
# 1  0.38324291     A
# 2 -0.06604541     B
# 3 -1.98454741     C
# 4  3.44815045     D
# 5  4.11107771     E
# 6  4.07278357     A
Have a look at the previous output of the RStudio console. It shows that our exemplifying data has two columns. The variable x contains randomly distributed numeric values and the variable group contains five different grouping labels.
We could return descriptive statistics of our numeric data column x using the summary function as shown below:
summary(data$x)                       # Summary of entire data
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  -7.765  -1.045   1.115   1.117   3.151  10.216
However, this would only return the summary statistics of the whole data. In the following examples I’ll therefore show different ways to get summary statistics for each group of our data.
Keep on reading!
Example 1: Descriptive Summary Statistics by Group Using tapply Function
In this example, I’ll show how to use the basic installation of the R programming language to return descriptive summary statistics by group. More precisely, I’m using the tapply function:
tapply(data$x, data$group, summary)   # Summary by group using tapply
# $A
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  -7.236  -1.161   1.530   1.339   3.834   8.747
#
# $B
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  -7.148  -1.002   0.944   1.037   3.004  10.216
#
# $C
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  -6.636  -1.282   1.340   1.030   2.956   8.667
#
# $D
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# -7.7652 -1.2207  0.7849  0.7280  2.3334  8.3459
#
# $E
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# -5.4817 -0.3648  1.5931  1.4498  3.3325  7.6403
The output of the previous R syntax is a list containing one list element for each group. Each of these list elements contains basic summary statistics for the corresponding group.
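If a single table is easier to work with than a list, the per-group results can be stacked row-wise. This is an optional sketch that goes beyond the original example; the name by_group is an assumption:

```r
set.seed(549298)                      # Same example data as above
data <- data.frame(x = rnorm(500, 1, 3),
                   group = LETTERS[1:5])

# List with one summary per group
res <- tapply(data$x, data$group, summary)

# Bind the five summaries into one matrix: one row per group,
# one column per statistic
by_group <- do.call(rbind, res)
print(by_group)

# Look up a single statistic for a single group
by_group["A", "Mean"]
```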
Example 2: Descriptive Summary Statistics by Group Using dplyr Package
Another alternative for the computation of descriptive summary statistics is provided by the dplyr package.
First, we have to install and load the dplyr package:
install.packages("dplyr")             # Install dplyr package
library("dplyr")                      # Load dplyr package
Now, we can apply the group_by and summarize functions to calculate summary statistics by group:
data %>%                              # Summary by group using dplyr
  group_by(group) %>%
  summarize(min = min(x),
            q1 = quantile(x, 0.25),
            median = median(x),
            mean = mean(x),
            q3 = quantile(x, 0.75),
            max = max(x))
# # A tibble: 5 x 7
#   group   min     q1 median  mean    q3   max
#   <fct> <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
# 1 A     -7.24 -1.16   1.53  1.34   3.83  8.75
# 2 B     -7.15 -1.00   0.944 1.04   3.00 10.2
# 3 C     -6.64 -1.28   1.34  1.03   2.96  8.67
# 4 D     -7.77 -1.22   0.785 0.728  2.33  8.35
# 5 E     -5.48 -0.365  1.59  1.45   3.33  7.64
The output of the previous R code is a tibble that contains basically the same values as the list created in Example 1. Whether you prefer to use the basic installation or the dplyr package is a matter of taste.
Example 3: Descriptive Summary Statistics by Group Using purrr Package
In Example 3, I’ll illustrate another alternative for the calculation of summary statistics by group in R.
This example relies on the functions of the purrr package (another add-on package provided by the tidyverse).
We first have to install and load the purrr package:
install.packages("purrr")             # Install & load purrr
library("purrr")
Now, we can use the following R code to produce another kind of output showing descriptive stats by group:
data %>%                              # Summary by group using purrr
  split(.$group) %>%
  map(summary)
# $A
#        x           group
#  Min.   :-7.236   A:100
#  1st Qu.:-1.161   B:  0
#  Median : 1.530   C:  0
#  Mean   : 1.339   D:  0
#  3rd Qu.: 3.834   E:  0
#  Max.   : 8.747
#
# $B
#        x           group
#  Min.   :-7.148   A:  0
#  1st Qu.:-1.002   B:100
#  Median : 0.944   C:  0
#  Mean   : 1.037   D:  0
#  3rd Qu.: 3.004   E:  0
#  Max.   :10.216
#
# $C
#        x           group
#  Min.   :-6.636   A:  0
#  1st Qu.:-1.282   B:  0
#  Median : 1.340   C:100
#  Mean   : 1.030   D:  0
#  3rd Qu.: 2.956   E:  0
#  Max.   : 8.667
#
# $D
#        x            group
#  Min.   :-7.7652   A:  0
#  1st Qu.:-1.2207   B:  0
#  Median : 0.7849   C:  0
#  Mean   : 0.7280   D:100
#  3rd Qu.: 2.3334   E:  0
#  Max.   : 8.3459
#
# $E
#        x            group
#  Min.   :-5.4817   A:  0
#  1st Qu.:-0.3648   B:  0
#  Median : 1.5931   C:  0
#  Mean   : 1.4498   D:  0
#  3rd Qu.: 3.3325   E:100
#  Max.   : 7.6403
Again, the values are basically the same.
Video, Further Resources & Summary
In this article, I showed how to get a summary statistics table for each group of a data frame in the R programming language. Don’t hesitate to let me know in the comments section, if you have further questions and/or comments.
4 Comments
Thanks for the tutorial! Just a small note: in the summary by group using dplyr, the function should be ‘summarise’ (with S) instead of ‘summarize’ (with Z).
Hey Giuliana,
Thank you for the kind comment! summarise and summarize are treated the same, though. Have a look here for more details.
Regards, Joachim
thanks again
You are very welcome Andre! 🙂
I’m Joachim Schork. On this website, I provide statistics tutorials as well as code in Python and R programming.
What is summary() Function in R
The summary() function is a built-in generic R function for summarizing data and model objects. It should not be confused with summarize() from the dplyr package, which reduces a data frame to one row per group after grouping with group_by().
For a numeric vector, the summary() function returns the following statistics.
- Minimum value
- The first quartile (25th percentile)
- Median (50th percentile)
- Mean
- Third quartile (75th percentile)
- Maximum value
object: It is an object for which a summary is desired.
maxsum: An integer indicating how many levels should be shown for factors.
digits: An integer used for number formatting with signif().
Return Value
The summary() function returns a value whose form depends on the class of its argument.
Example 1: Simple use of summary() function
Let’s apply the summary() function to a vector that will act like the R object.
The summary() of a numeric vector returns descriptive statistics: the minimum, the 1st quartile, the median, the mean, the 3rd quartile, and the maximum value of the input data.
Example 2: How to get the summary() of list in R
To get the summary of a list in R, you can use the summary() function. To define a list, use the list() function and pass the elements as arguments.
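For a list, summary() does not compute quartiles; it reports the length, class, and mode of each element instead. A minimal sketch with an invented list (element names are assumptions):

```r
# Hypothetical list with elements of different types
lst <- list(num = c(1.5, 2.5, 3.5), chr = "a", lgl = TRUE)

# One row per list element, with columns Length, Class, and Mode
s_list <- summary(lst)
print(s_list)
```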
Example 3: How to get a summary of an array in R
To get the summary of an array in R, use the summary() function. To create an array in R, use the array() function. The array() function takes a vector as an argument and uses the dim parameter to create an array.
Example 4: How to get a summary() of the matrix in R
To get the summary of a matrix in R, use the summary() function. To create a matrix in R, use the matrix() function, and pass the vector, nrow, and ncol parameters.
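The array and matrix cases behave differently, which the following sketch with made-up values tries to show: a three-dimensional array is summarized as one pool of numbers, while a matrix is summarized column by column.

```r
# Hypothetical 2 x 3 x 4 array: all 24 values are summarized together
arr <- array(1:24, dim = c(2, 3, 4))
s_arr <- summary(arr)
print(s_arr)

# Hypothetical 2 x 3 matrix: each column gets its own summary
m <- matrix(1:6, nrow = 2, ncol = 3)
s_mat <- summary(m)
print(s_mat)
```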
Example 5: How to get a summary of a data frame in R
To get the summary of a data frame in R, you can use the summary() function. To create a data frame in R, use the data.frame() function.
Example 6: Applying a summary() function on Linear Regression Model
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered an explanatory variable, and the other is a dependent variable.
A widespread application of the summary functions is the computation of summary statistics of statistical models. For example, let’s see the following code.
Our example data consists of two randomly distributed numeric vectors. As a result, we can estimate a linear regression model.
The data object mod contains the output of our linear regression. We applied the summary() function to this model object to print summary statistics for this model.
That’s it for the summary() function in R.

Krunal Lathiya is a Software Engineer with over eight years of experience. He has developed a strong foundation in computer science principles and a passion for problem-solving. In addition, Krunal has excellent knowledge of Data Science and Machine Learning, and he is an expert in R Language.
R tutorial series: summary and descriptive statistics.
Posted on November 1, 2009 by John M. Quick in R bloggers | 0 Comments
[This article was first published on R Tutorial Series, and kindly contributed to R-bloggers.]
Summary (or descriptive) statistics are the first figures used to represent nearly every dataset. They also form the foundation for much more complicated computations and analyses. Thus, in spite of being composed of simple methods, they are essential to the analysis process. This tutorial will explore the ways in which R can be used to calculate summary statistics, including the mean, standard deviation, range, and percentiles. Also introduced is the summary function, which is one of the most useful tools in the R set of commands.
Tutorial Files
Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains hypothetical age and income data for 20 subjects. Note that all code samples in this tutorial assume that this data has already been read into an R variable and has been attached.
In R, a mean can be calculated on an isolated variable via the mean(VAR) command, where VAR is the name of the variable whose mean you wish to compute. Alternatively, when this tutorial was written, a mean could be calculated for each of the variables in a dataset by using the mean(DATAVAR) command, where DATAVAR is the name of the variable containing the data. In current versions of R, mean() on a data frame returns NA with a warning, so colMeans(DATAVAR) or sapply(DATAVAR, mean) should be used instead. The code sample below demonstrates both uses of the mean function as originally published.
> #calculate the mean of a variable with mean(VAR)
> #what is the mean Age in the sample?
> mean(Age)
[1] 32.3
> #calculate the mean of all variables in a dataset with mean(DATAVAR)
> #what is the mean of each variable in the dataset?
> mean(dataset)
    Age  Income
   32.3 34000.0
Standard Deviation
Within R, standard deviations are calculated in the same way as means. The standard deviation of a single variable can be computed with the sd(VAR) command, where VAR is the name of the variable whose standard deviation you wish to retrieve. Similarly, when this tutorial was written, a standard deviation could be calculated for each of the variables in a dataset by using the sd(DATAVAR) command, where DATAVAR is the name of the variable containing the data. In current versions of R, sd() no longer accepts a data frame, so sapply(DATAVAR, sd) should be used instead. The code sample below demonstrates both uses of the standard deviation function as originally published.
> #calculate the standard deviation of a variable with sd(VAR)
> #what is the standard deviation of Age in the sample?
> sd(Age)
[1] 19.45602
> #calculate the standard deviation of all variables in a dataset with sd(DATAVAR)
> #what is the standard deviation of each variable in the dataset?
> sd(dataset)
        Age      Income
   19.45602 32306.10175
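For readers on a current version of R, per-variable means and standard deviations can be obtained with sapply(). The sketch below uses a small invented data frame, not the tutorial's sample file, so the values are assumptions:

```r
# Hypothetical age and income data (not the tutorial's dataset)
dataset <- data.frame(Age = c(5, 18, 25, 30, 42, 70),
                      Income = c(0, 12000, 28000, 35000, 52000, 61000))

# Mean and standard deviation of each variable
col_means <- sapply(dataset, mean)
col_sds <- sapply(dataset, sd)
print(col_means)
print(col_sds)

# colMeans() is an equivalent shortcut for the means
colMeans(dataset)
```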
Minimum and Maximum
Keeping with the pattern, a minimum can be computed on a single variable using the min(VAR) command. The maximum, via max(VAR), operates identically. However, in contrast to the mean and standard deviation functions, min(DATAVAR) or max(DATAVAR) will retrieve the minimum or maximum value from the entire dataset, not from each individual variable. Therefore, it is recommended that minimums and maximums be calculated on individual variables, rather than entire datasets, in order to produce more useful information. The sample code below demonstrates the use of the min and max functions.
> #calculate the min of a variable with min(VAR)
> #what is the minimum age found in the sample?
> min(Age)
[1] 5
> #calculate the max of a variable with max(VAR)
> #what is the maximum age found in the sample?
> max(Age)
[1] 70
The range of a particular variable, that is, its maximum and minimum, can be retrieved using the range(VAR) command. As with the min and max functions, using range(DATAVAR) is not very useful, since it considers the entire dataset, rather than each individual variable. Consequently, it is recommended that ranges also be computed on individual variables. This operation is demonstrated in the following code sample.
> #calculate the range of a variable with range(VAR)
> #what range of age values are found in the sample?
> range(Age)
[1]  5 70
Percentiles
Values from Percentiles (Quantiles)
Given a dataset and a desired percentile, a corresponding value can be found using the quantile(VAR, c(PROB1, PROB2,…)) command. Here, VAR refers to the variable name and PROB1, PROB2, etc., relate to probability values. The probabilities must be between 0 and 1, therefore making them equivalent to decimal versions of the desired percentiles (i.e. 50% = 0.5). The following example shows how this function can be used to find the data value that corresponds to a desired percentile.
> #calculate desired percentile values using quantile(VAR, c(PROB1, PROB2,…))
> #what are the 25th and 75th percentiles for age in the sample?
> quantile(Age, c(0.25, 0.75))
  25%   75%
17.75 44.25
Note that quantile(VAR) command can also be used. When probabilities are not specified, the function will default to computing the 0, 25, 50, 75, and 100 percentile values, as shown in the following example.
> #calculate the default percentile values using quantile(VAR)
> #what are the 0, 25, 50, 75, and 100 percentiles for age in the sample?
> quantile(Age)
   0%   25%   50%   75%  100%
 5.00 17.75 30.00 44.25 70.00
Percentiles from Values (Percentile Rank)
In the opposite situation, where a percentile rank corresponding to a given value is needed, one has to devise a custom method. To begin, consider the steps involved in calculating a percentile rank.
- count the number of data points that are at or below the given value
- divide by the total number of data points
- multiply by 100
From the preceding steps, the formula for calculating a percentile rank can be derived: percentile rank = length(VAR[VAR <= VAL]) / length(VAR) * 100, where VAR is the variable and VAL is the given value. The formula uses the length function in two ways. The first, length(VAR[VAR <= VAL]), counts the number of data points that are at or below the given value. The <= operator can be replaced with other combinations of the <, >, and = operators, supposing that the function were to be applied to different scenarios. The second, length(VAR), counts the total number of data points in the variable. Together, they accomplish steps one and two of the percentile rank computation process. The final step is to multiply the result of the division by 100 to transform the decimal value into a percentage. A sample percentile rank calculation is demonstrated below.
> #calculate the percentile rank for a given value using the custom formula: length(VAR[VAR <= VAL]) / length(VAR) * 100
> #in the sample, an age of 45 is at what percentile rank?
> length(Age[Age <= 45]) / length(Age) * 100
[1] 75
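The three steps can be wrapped in a small helper. This is a sketch, not part of the original tutorial: the function name percentile_rank and the ages vector are assumptions, and the helper assumes the input contains no missing values.

```r
# Percentile rank of `value` within `x`:
# share of data points at or below `value`, expressed as a percentage.
# Assumes x has no NA values.
percentile_rank <- function(x, value) {
  length(x[x <= value]) / length(x) * 100
}

# Hypothetical ages (not the tutorial's dataset)
ages <- c(5, 12, 18, 25, 30, 38, 45, 70)
rank_45 <- percentile_rank(ages, 45)
print(rank_45)

# Base R's empirical CDF gives the same proportion
ecdf(ages)(45) * 100
```

Using ecdf() avoids writing the formula by hand when percentile ranks are needed for many values at once, since the returned function is vectorized.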
A very useful multipurpose function in R is summary(X), where X can be one of any number of objects, including datasets, variables, and linear models, just to name a few. When used, the command provides summary data related to the individual object that was fed into it. Thus, the summary function has different outputs depending on what kind of object it takes as an argument. Besides being widely applicable, this method is valuable because it often provides exactly what is needed in terms of summary statistics. A couple examples of how summary(X) can be used are displayed in the following code sample. I encourage you to use the summary command often when exploring ways to analyze your data in R. This function will be revisited throughout the R Tutorial Series.
> #summarize a variable with summary(VAR)
> summary(Age)
[Figure: output of summary(Age)]
> #summarize a dataset with summary(DATAVAR)
> summary(dataset)
[Figure: output of summary(dataset)]

Complete Summary Statistics Analysis
To see a complete example of how summary statistics can be used to analyze data in R, please download the summary statistics analysis example (.txt) file.
Up Next: Zero-Order Correlations
Thank you for participating in the Summary and Descriptive Statistics tutorial. I hope that it has been useful to your work with R and statistics. Please let me know of any feedback, questions, or requests that you have in the comments section of this article. Our next guide will be on the topic of Zero-Order Correlations.
