2.3 Descriptive and Inferential Statistics

Learning objectives.

  • Describe descriptive statistics and know how to produce them.
  • Describe inferential statistics and why they are used.

Descriptive statistics

In the previous section, we looked at some of the research designs psychologists use. In this section, we will provide an overview of some of the statistical approaches researchers take to understanding the results that are obtained in research. Descriptive statistics are the first step in understanding how to interpret the data you have collected. They are called descriptive because they organize and summarize some important properties of the data set. Keep in mind that researchers are often collecting data from hundreds of participants; descriptive statistics allow them to make some basic interpretations about the results without having to eyeball each result individually.

Let’s work through a hypothetical example to show how descriptive statistics help researchers to understand their data. Let’s assume that we have asked 40 people to report how many hours of moderate-to-vigorous physical activity they get each week. Let’s begin by constructing a frequency distribution of our hypothetical data that will show quickly and graphically what scores we have obtained.

We can now construct a histogram that will show the same thing on a graph (see Figure 2.5). Note how easy it is to see the shape of the frequency distribution of scores.
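As a rough sketch of how a frequency distribution and a simple histogram could be produced, the Python snippet below tallies a set of scores and prints one mark per person. The 40 values are invented for illustration (they are not the data behind Figure 2.5), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical weekly activity hours for 40 people (invented for illustration).
hours = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3,
         4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
         5, 6, 6, 6, 6, 6, 7, 7, 7, 7,
         8, 8, 8, 9, 9, 10, 10, 11, 12, 14]

# Frequency distribution: how many people reported each value.
frequency = Counter(hours)

# Text histogram: one '#' per person at each reported value.
for value in sorted(frequency):
    print(f"{value:>2} h | {'#' * frequency[value]} ({frequency[value]})")
```

Reading down the printed rows gives the same information as a histogram turned on its side: the longest rows mark the most common scores.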

Many variables that psychologists are interested in have distributions where most of the scores are located near the centre of the distribution, the distribution is symmetrical, and it is bell-shaped (see Figure 2.6). A data distribution that is shaped like a bell is known as a normal distribution. Normal distributions are common in human traits because of the tendency for variability; traits like intelligence, height, and shoe size are distributed such that relatively few people are either extremely high or low scorers, and most people fall somewhere near the middle.

A distribution can be described in terms of its central tendency (that is, the point in the distribution around which the data are centred) and its dispersion or spread. The arithmetic average, or arithmetic mean, symbolized by the letter M, is the most commonly used measure of central tendency. It is computed by calculating the sum of all the scores of the variable and dividing this sum by the number of participants in the distribution, denoted by the letter N. In the data presented in Figure 2.6, the mean height of the students is 67.12 inches (170.48 cm).

In some cases, however, the data distribution is not symmetrical. This occurs when there are one or more extreme scores, known as outliers, at one end of the distribution. Consider, for instance, the variable of family income (see Figure 2.7), which includes an outlier at a value of $3,800,000. In this case, the mean is not a good measure of central tendency. Although it appears from Figure 2.7 that the central tendency of the family income variable should be around $70,000, the mean family income is actually $223,960. The single very extreme income has a disproportionate impact on the mean, resulting in a value that does not represent the central tendency well.

The median is used as an alternative measure of central tendency when distributions are not symmetrical. The median is the score in the centre of the distribution, meaning that 50% of the scores are greater than the median and 50% of the scores are less than the median. In our case, the median household income of $73,000 is a much better indication of central tendency than is the mean household income of $223,960.

A final measure of central tendency, known as the mode, represents the value that occurs most frequently in the distribution. You can see from Figure 2.7 that the mode for the family income variable is $93,000; it occurs four times.
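The effect of an outlier on the three measures of central tendency can be sketched with Python's standard `statistics` module. The incomes below are invented for illustration and are not the Figure 2.7 data; only the size of the outlier is borrowed from the text.

```python
import statistics

# Hypothetical family incomes in dollars (invented; not the Figure 2.7 data).
incomes = [48_000, 57_000, 62_000, 62_000, 70_000, 75_000, 88_000, 95_000]
with_outlier = incomes + [3_800_000]

for label, data in [("without outlier", incomes), ("with outlier", with_outlier)]:
    print(label,
          "| mean =", round(statistics.mean(data)),
          "| median =", statistics.median(data),
          "| mode =", statistics.mode(data))
```

In this made-up sample, adding the single extreme income moves the mean from roughly $69,600 to roughly $484,000, while the median only shifts from $66,000 to $70,000 and the mode does not change at all.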

In addition to summarizing the central tendency of a distribution, descriptive statistics convey information about how the scores of the variable are spread around the central tendency. Dispersion refers to the extent to which the scores are tightly clustered around the central tendency or spread out away from it. In some distributions, the scores are tightly clustered (see Figure 2.8). Here, there are many scores close to the middle of the distribution.

In other instances, the scores may be more spread out (see Figure 2.9). Here, the scores are further away from the middle of the distribution.

One simple measure of dispersion is to find the largest (i.e., the maximum) and the smallest (i.e., the minimum) observed values of the variable and to compute the range of the variable as the maximum observed score minus the minimum observed score. You can check that the range of the height variable shown in Figure 2.6 above is 72 – 62 = 10.

The standard deviation, symbolized as s, is the most commonly used measure of variability around the mean. Distributions with a larger standard deviation have more spread. Those with small standard deviations have scores that do not stray very far from the average score. Thus, standard deviation is a good measure of the average deviation from the mean in a set of scores. In the examples above, the standard deviation of height is s = 2.74 inches, and the standard deviation of family income is s = $745,337. These standard deviations would be more informative if we had others to compare them to. For example, suppose we obtained a different sample of adult heights and compared it to those shown in Figure 2.6 above. If the standard deviation was very different, that would tell us something important about the variability in the second sample as compared to the first. A more relatable example might be student grades: a professor could keep track of student grades over many semesters. If the standard deviations were relatively similar from semester to semester, this would indicate that the amount of variability in student performance is fairly constant. If the standard deviation suddenly went up, that would indicate that there are more students with very low scores, very high scores, or both. It’s useful to see how standard deviation is calculated: a good demonstration can be found at Khan Academy.
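The student-grades comparison can be sketched in a few lines of Python. The two sets of scores are invented for illustration; they were chosen to share the same mean so that only the spread differs.

```python
import statistics

# Hypothetical exam scores for two semesters (invented for illustration).
semester_a = [68, 70, 72, 74, 76]
semester_b = [52, 61, 72, 83, 92]

for name, scores in [("A", semester_a), ("B", semester_b)]:
    score_range = max(scores) - min(scores)   # maximum minus minimum
    s = statistics.stdev(scores)              # sample standard deviation (N - 1)
    print(f"Semester {name}: mean = {statistics.mean(scores)}, "
          f"range = {score_range}, s = {s:.2f}")
```

Both semesters average 72, but the second has a much larger range and standard deviation, which is the pattern a professor would see if more students earned very low scores, very high scores, or both.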

The standard deviation in the normal distribution has some interesting properties (see Figure 2.10). Approximately 68% of the data fall within 1 standard deviation above or below the mean score: 34% fall above the mean, and 34% fall below. In other words, about 2/3 of the population are within 1 standard deviation of the mean. Therefore, if some variable is normally distributed (e.g., height, IQ, etc.), you can quickly work out where approximately 2/3 of the population fall by knowing the mean and standard deviation.
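The 68% figure can be checked with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are assumed here as an IQ-style scale purely for illustration; any normal distribution gives the same proportion.

```python
from statistics import NormalDist

# Assumed IQ-style scale: mean 100, standard deviation 15 (illustrative values).
iq = NormalDist(mu=100, sigma=15)

lower, upper = iq.mean - iq.stdev, iq.mean + iq.stdev

# Proportion of the distribution between 1 SD below and 1 SD above the mean.
share_within_1_sd = iq.cdf(upper) - iq.cdf(lower)

print(f"Scores between {lower:.0f} and {upper:.0f}: {share_within_1_sd:.1%}")
```

Under these assumptions, knowing only the mean and standard deviation is enough to say that roughly two-thirds of scores fall between 85 and 115.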

Inferential statistics

We have seen that descriptive statistics are useful in providing an initial way to describe, summarize, and interpret a set of data. They are limited in usefulness because they tell us nothing about how meaningful the data are. The second step in analyzing data requires inferential statistics. Inferential statistics provide researchers with the tools to make inferences about the meaning of the results. Specifically, they allow researchers to generalize from the sample they used in their research to the greater population, which the sample represents. Keep in mind that psychologists, like other scientists, rely on relatively small samples to try to understand populations.

This is not a textbook about statistics, so we will limit the discussion of inferential statistics. However, all students of psychology should become familiar with one very important inferential statistic: the significance test. In the simplest, non-mathematical terms, the significance test estimates how likely it is that results at least as strong as those observed would occur by chance alone if there were no real effect in the population. Significance testing is not the same thing as estimating how meaningful or large the results are. For example, with a large enough sample, you might find a very small difference between two experimental conditions that is statistically significant.

Typically, most researchers use the convention that if significance testing shows that results like those observed would occur less than 5% of the time by chance alone, the result is considered statistically significant and is taken as evidence of a real effect in the population. If that probability is greater than 5%, the result is considered non-significant: it is consistent with chance and, therefore, should not be generalized to the population without further evidence. Significance tests are reported as p values; for example, p < .05 means that results at least this extreme would be expected less than 5% of the time if chance alone were operating. P values are reported by all statistical programs, so students no longer need to calculate them by hand. Most often, p values are used to determine whether or not effects detected in the research are present. So, if p < .05, researchers conventionally conclude that an effect is present and that the difference between the two groups is unlikely to be due to chance alone.

Thus, p values provide information about the presence of an effect. However, for information about how meaningful or large an effect is, significance tests are of little value. For that, we need some measure of effect size. Effect size is a measure of magnitude; for example, if there is a difference between two experimental groups, how large is the difference? There are a few different statistics for calculating effect sizes.
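To make the distinction between a p value and an effect size concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the scores are invented, the permutation test is only one of many possible significance tests, and Cohen's d is only one of several effect size statistics. The text does not say which tests or statistics researchers should use.

```python
import random
import statistics

# Hypothetical scores for two experimental conditions (invented for illustration).
control = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
treatment = [15, 17, 14, 18, 16, 19, 15, 16, 18, 17]

observed_diff = statistics.mean(treatment) - statistics.mean(control)

# Permutation test: if the group labels did not matter, how often would
# shuffling them give a mean difference at least as large as the observed one?
rng = random.Random(42)  # fixed seed so the sketch is repeatable
pooled = control + treatment
n_shuffles = 10_000
extreme = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)
    fake_control = pooled[:len(control)]
    fake_treatment = pooled[len(control):]
    diff = statistics.mean(fake_treatment) - statistics.mean(fake_control)
    if abs(diff) >= abs(observed_diff):
        extreme += 1
p_value = extreme / n_shuffles

# Cohen's d: the mean difference expressed in standard deviation units.
# Equal group sizes are assumed, so the pooled variance is a simple average.
pooled_sd = ((statistics.variance(control) + statistics.variance(treatment)) / 2) ** 0.5
cohens_d = observed_diff / pooled_sd

print(f"difference = {observed_diff}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```

The p value addresses whether a difference this large would be surprising under chance alone, whereas d describes how large the difference is relative to the spread of the scores. The two numbers answer different questions and are reported separately.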

In summary, statistics are an important tool in helping researchers understand the data that they have collected. Once the statistics have been calculated, the researchers interpret their results. Thus, while statistics are heavily used in the analysis of data, the interpretation of the results requires a researcher’s knowledge, analysis, and expertise.

Key Takeaways

  • Descriptive statistics organize and summarize some important properties of the data set. Frequency distributions and histograms are effective tools for visualizing the data set. Measures of central tendency and dispersion are descriptive statistics.
  • Many human characteristics are normally distributed.
  • Measures of central tendency describe the central point around which the scores are distributed. There are three different measures of central tendency.
  • The range and standard deviation show the dispersion of scores. The standard deviation of the normal distribution has some special properties.
  • Inferential statistics provide researchers with the tools to make inferences about the meaning of the results, specifically about generalizing from the sample they used in their research to the greater population, which the sample represents.
  • Significance tests are commonly used to assess the probability that observed results were due to chance. Effect sizes are commonly used to estimate how large an effect has been obtained.

Exercises and Critical Thinking

  • Keep track of something you do over a week, such as your daily amount of exercise, sleep, cups of coffee, or social media time. Record your scores for each day. At the end of the week, construct a frequency distribution of your results, and draw a histogram that represents them. Calculate all three measures of central tendency, and decide which one best represents your data and why. Invite a friend or family member to participate, and do the same for their data. Compare your data sets. Whose shows the greatest dispersion around the mean, and how do you know?
  • The data for one person cannot generalize to the population. Consider why people might have different scores than yours.

Image Attribution

Figure 2.5. Used under a CC BY-NC-SA 4.0 license.

Figure 2.6. Used under a CC BY-NC-SA 4.0 license.

Figure 2.7. Used under a CC BY-NC-SA 4.0 license.

Figure 2.8. Used under a CC BY-NC-SA 4.0 license.

Figure 2.9. Used under a CC BY-NC-SA 4.0 license.

Figure 2.10. Empirical Rule by Dan Kernler is used under a CC BY-SA 4.0 license.

Long Descriptions

Figure 2.7. Of the 25 families, 24 families have an income between $44,000 and $111,000, and only one family has an income of $3,800,000. The mean income is $223,960, while the median income is $73,000.

Psychology - 1st Canadian Edition Copyright © 2020 by Sally Walters is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Descriptive Statistics | Definitions, Types, Examples

Published on July 9, 2020 by Pritha Bhandari. Revised on June 21, 2023.

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics, which help you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

Types of descriptive statistics

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park

A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.

  • Simple frequency distribution table
  • Grouped frequency distribution table

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean, or M, is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation (s or SD) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  1. List each score and find their mean.
  2. Subtract the mean from each score to get the deviation from the mean.
  3. Square each of these deviations.
  4. Add up all of the squared deviations.
  5. Divide the sum of the squared deviations by N – 1.
  6. Find the square root of the number you found.

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18
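The six steps can be traced one line at a time in Python. The six scores below are an assumption: they were chosen so that the sum of squared deviations comes to 421.5, which lets the last two steps reproduce the figures shown in Step 5 and Step 6.

```python
import math

# Six assumed scores, chosen so the arithmetic matches Steps 5 and 6 above.
scores = [15, 3, 12, 0, 24, 3]

mean = sum(scores) / len(scores)                 # Step 1: find the mean
deviations = [x - mean for x in scores]          # Step 2: deviation from the mean
squared = [d ** 2 for d in deviations]           # Step 3: square each deviation
sum_of_squares = sum(squared)                    # Step 4: add them up
variance = sum_of_squares / (len(scores) - 1)    # Step 5: divide by N - 1
sd = math.sqrt(variance)                         # Step 6: take the square root

print(f"sum of squares = {sum_of_squares}, variance = {variance}, s = {sd:.2f}")
```

Writing each step as its own variable makes it easy to inspect the intermediate values, which is usually where hand calculations go wrong.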

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².

Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

If you were to only consider the mean as a measure of central tendency, your impression of the “middle” of the data set can be skewed by outliers, unlike the median or mode.

Likewise, while the range is sensitive to outliers, you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

From this table, it is more clear that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.
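Converting a contingency table from raw counts to row percentages can be sketched as follows. The counts are hypothetical, invented so that the percentages show a pattern like the one described above; the group and category labels are assumptions as well.

```python
# Hypothetical counts of library visits per year by age group (invented numbers).
counts = {
    "Children": {"0-4": 32, "5-8": 68, "9-12": 37, "13-16": 23, "17+": 22},
    "Adults":   {"0-4": 36, "5-8": 48, "9-12": 43, "13-16": 83, "17+": 25},
}

percent_table = {}
for group, row in counts.items():
    n = sum(row.values())  # row total, reported as N at the end of the row
    # Express every cell as a percentage of its own row total.
    percent_table[group] = {k: round(100 * v / n) for k, v in row.items()}
    cells = "  ".join(f"{k}: {p:>2}%" for k, p in percent_table[group].items())
    print(f"{group:<9} {cells}  N = {n}")
```

Because each row is scaled to its own total, groups of different sizes become directly comparable, and keeping N at the end of the row preserves the information about how many people each percentage is based on.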

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.
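One common follow-up to a scatter plot is the Pearson correlation coefficient, sketched below in plain Python. The paired values are invented for illustration, and `pearson_r` is a hypothetical helper written for this example rather than a function from any particular statistics package.

```python
import math

# Hypothetical yearly counts for eight people (invented for illustration).
movies = [2, 4, 5, 7, 9, 10, 12, 15]
library = [20, 18, 15, 14, 10, 9, 6, 2]

def pearson_r(x, y):
    """Pearson correlation: how x and y vary together, scaled by their spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

r = pearson_r(movies, library)
print(f"r = {r:.2f}")
```

A value of r close to -1, as in this made-up sample, corresponds to the downward-sloping pattern described above: more movie visits go with fewer library visits.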

Descriptive statistics: Scatter plot

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average, or typical, response value.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarize only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.

Bhandari, P. (2023, June 21). Descriptive Statistics | Definitions, Types, Examples. Scribbr. Retrieved March 13, 2024, from https://www.scribbr.com/statistics/descriptive-statistics/

Unit 3. Descriptive Statistics for Psychological Research

J Toby Mordkoff and Leyre Castro

Summary. This unit briefly reviews the distinction between descriptive and inferential statistics and then discusses the ways in which both numerical and categorical data are usually summarized for psychological research.  Different measures of center and spread, and when to use them, are explained.  The shape of the data is also discussed.

Prerequisite Units

Unit 1. Introduction to Statistics for Psychological Science

Unit 2. Managing Data

Introduction

Assume that you are interested in some attribute or characteristic of a very large number of people, such as the average hours of sleep per night for all undergraduates at all universities.  Clearly, you are not going to do this by measuring the hours of sleep for every student, as that would be difficult to impossible.  So, instead, you will probably take a relatively small sample of students (e.g., 100 people), ask each of them how many hours of sleep they usually get, and then use these data to estimate the average for all undergraduates.

The process outlined above can be thought of as having three phases or steps: (1) collect a sample, (2) summarize the data in the sample, and (3) use the summarized data to make the estimate of the entire population.  The issues related to collecting the sample, such as how one ensures that the sample is representative of the entire population will not be discussed here.  Likewise, the way that one uses the summary of a sample to calculate an estimate of the population will not be explained here.  This unit will focus on the second step: the way in which psychologists summarize data.

The general label for procedures that summarize data is descriptive statistics. This can be contrasted with procedures that make estimates of population values, which are known as inferential statistics. Thus, descriptive and inferential statistics each give different insights into the nature of the data gathered. Descriptive statistics describe the data so that the big picture can be seen. How? By organizing and summarizing properties of a data set. Calculating descriptive statistics takes unordered observations and logically organizes them in some way. This allows us to describe the data obtained, but it does not support conclusions beyond the sample. This is important, because part of conducting (good) research is being able to communicate your findings to other people, and descriptive statistics will allow you to do this quickly, clearly, and precisely.

To prepare you for what follows, please note two things in advance.  First, there are several different ways that we can summarize a large set of data.  Most of all: we can use numbers or we can use graphical representations.  Furthermore, when the data are numerical, we will have options for several of the summary values that we need to calculate.  This may seem confusing at first; hopefully, it soon will make sense.  Second, but related to the first, the available options for summarizing data often depend on the type of data that we have collected.  For example, numerical data, such as hours of sleep per night, are summarized differently from categorical data, such as favorite flavors of ice-cream.

The key to preventing this from becoming confusing is to keep the function of descriptive statistics in mind: we are trying to summarize a large amount of data in a way that can be communicated quickly, clearly, and precisely.  In some cases, a few numbers will do the trick; in other cases, you will need to create a plot of the data.

This unit will only discuss the ways in which a single set of values is summarized. When you collect more than one piece of information from every participant in the sample (e.g., you not only ask them how many hours of sleep they usually get, but also ask them for their favorite flavor of ice-cream), then you can do three things using descriptive statistics: summarize the first set of values (on their own), summarize the second set of values (on their own), and summarize the relationship between the two sets of values. This unit only covers the first two of these three. Different ways to summarize the relationship between two sets of values will be covered in Units 7 and 8.

Summarizing Numerical Data

The most popular way to summarize a set of numerical data (e.g., hours of sleep per night) is in terms of two or three aspects. One always includes values for the center of the data and the spread of the data; in some cases, the shape of the data is also described. A measure of center is a single value that attempts to describe an entire set of data by identifying the central position within that set of data. The full, formal label for this descriptive statistic is measure of central tendency, but most people simply say “center.” Another label for this is the “average.”

A measure of spread is also a single number, but this one indicates how widely the data are distributed around their center.  Another way of saying this is to talk about the “variability” of the data.  If all of the individual pieces of data are located close to the center, then the value of spread will be low; if the data are widely distributed, then the value of spread will be high.

What makes this a little bit complicated is that there are multiple ways to mathematically define the center and spread of a set of data. For example, both the mean and the median (discussed in detail below) are valid measures of central tendency. Similarly, both the variance (or standard deviation) and the inter-quartile range (also discussed below) are valid measures of spread. This might suggest that there are at least four combinations of center and spread (i.e., two versions of center crossed with two versions of spread), but that isn’t true. The standard measures of center and spread actually come in pairs, such that your choice with regard to one forces you to use a particular option for the other. If you define the center as the mean, for example, then you have to use variance (or standard deviation) for spread; if you define the center as the median, then you have to use the inter-quartile range for spread. Because of this dependency, in what follows we shall discuss the standard measures of center and spread in pairs. When this is finished, we shall mention some of the less popular alternatives and then, finally, turn to the issue of shape.

Measures of Center and Spread Based on Moments

The mean and variance of a set of numerical values are (technically) the first and second moments of the set of data.  Although it is not used very often in psychology, the term “moment” is quite popular in physics, where the first moment is the center of mass and the second moment is rotational inertia  (these are very useful concepts when describing how hard it is to throw or spin something).  The fact that the mean and variance of a set of numbers are the first and second moments isn’t all that important; the key is that they are based on the same approach to the data, which is why they are one of the standard pairs of measures for describing a set of numerical data.

The mean is the most popular and well known measure of central tendency. It is what most people intend when they use the word “average.” The mean can be calculated for any set of numerical data, discrete or continuous, regardless of units or details. The mean is equal to the sum of all values divided by the number of values. So, if we have n values in a data set and they have values X₁, X₂, …, Xₙ, the mean is calculated using the following formula:

 \begin{equation*} \bar {X} = \frac {\sum {X_i}} {n} \end{equation*}

Before moving forward, note two things about using the mean as the measure of center.  First, the mean is rarely one of the actual values from the original set of data.  As an extreme example: when the data are discrete (e.g., whole numbers, like the number of siblings), the mean will almost never match any of the specific values, because the mean will almost never be a whole number as well.

Second, an important property of the mean is that it includes and depends on every value in your set of data.  If any value in the data set is changed, then the mean will change.  In other words, the mean is “sensitive” to all of the data.

Variance and Standard Deviation

When the center is defined as the mean, the measure of spread to use is the variance (or the square-root of this value, which is the standard deviation ). Variance is defined as the average of the squared deviations from the mean.  The formula for variance is:

\begin{equation*} \text{Variance of } X = \frac{\sum (X_i - \bar{X})^2}{n - 1} \end{equation*}

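For example, for the values 2, 4, and 7, which have a mean of 4.33, first sum the squared deviations:

\begin{equation*} (2 - 4.33)^2 + (4 - 4.33)^2 + (7 - 4.33)^2 = 12.6667 \end{equation*}

and then divide by (3 – 1), which gives 6.33.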

Note that, because each sub-step of the summation involves a value that has been squared, the value of variance cannot be a negative number.  Note, also, that when all of the individual pieces of data are the same, they will all be equal to the mean, so you will be adding up numbers that are all zero, so variance will also be zero.  These both make sense, because here we are calculating a measure of how spread out the data are, which will be zero when all of the data are the same and cannot be less than this.

As mentioned above, some people prefer to express this measure of spread in terms of the square-root of the variance, which is the standard deviation. The main reason for doing this is that the units of variance are the square of the units of the original data, whereas the units of standard deviation are the same as the units of the original data. Thus, for example, if you have response times of 2, 4, and 7 seconds, which have a mean of 4.33 seconds, then the variance is 6.33 seconds² (which is difficult to conceptualize), but the standard deviation is 2.52 seconds (which is easy to think about).
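The response-time example can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n – 1, as in the formula above. The variable names are assumptions made for this sketch.

```python
import statistics

response_times = [2, 4, 7]  # seconds, the values used in the text

mean = statistics.mean(response_times)
variance = statistics.variance(response_times)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(response_times)           # square root of the variance

print(f"mean = {mean:.2f} s, variance = {variance:.2f} s^2, sd = {sd:.2f} s")
```

The printed values carry their units: the variance is in squared seconds, while the standard deviation is back in seconds, which is why it is the easier of the two to interpret.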

Conceptually, you can think of the standard deviation as the typical distance of any score from the mean.  In other words, the standard deviation represents the standard amount by which individual scores deviate from the mean.  The standard deviation uses the mean of the data as a baseline or reference point, and measures variability by considering the distance between each score and the mean.

Note that similar to the mean, both the variance and the standard deviation are sensitive to every value in the set of data; if any one piece of data is changed, then not only will the mean change, but the variance and standard deviation will also be changed.


Table 3.1. Number of study hours before an exam (X, Hours), and the grade obtained in that exam (Y, Grade) for 15 participants. The two rightmost columns show the deviation scores for each X and Y score.

Once we have the deviation scores for each participant, we square each of the deviation scores, and sum them.

(-5.66)² + (-2.66)² + (2.34)² + (0.34)² + (-1.66)² + (1.34)² + (4.34)² + (6.34)² + (-3.66)² + (-4.66)² + (2.34)² + (3.34)² + (-0.66)² + (-1.66)² + (0.34)² = 163.334

We then divide that sum by one less than the number of scores, 15 – 1 in this case:

163.334 / 14 = 11.67

So, 11.67 is the variance for the number of hours in our sample of participants.

In order to obtain the standard deviation, we calculate the square root of the variance:

\sqrt{11.67} = 3.42

We follow the same steps to calculate the standard deviation of our participants’ grades.  First, we square each of the deviation scores (rightmost column in Table 3.1), and sum them:

(-8.46)² + (-6.46)² + (2.54)² + (-1.46)² + (-2.46)² + (-0.46)² + (8.54)² + (9.54)² + (-3.46)² + (-5.46)² + (6.54)² + (5.54)² + (-2.46)² + (-3.46)² + (1.54)² = 427.734

Next, we divide that sum by one less than the number of scores, 14:

427.734 / 14 = 30.55

So, 30.55 is the variance for the grade in our sample of participants.  Taking the square root of the variance gives the standard deviation:

\sqrt{30.55} = 5.53

Thus, you can summarize the data in our sample by saying that the mean study time is 13.66 hours, with a standard deviation of 3.42, whereas the mean grade is 86.46, with a standard deviation of 5.53.
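The worked example above can be re-run in a few lines of Python. The raw scores below are not listed in this section; they are reconstructed (an assumption) by adding the reported means to the deviation scores shown above and rounding to whole numbers. Treat this as a sketch for checking the arithmetic, not as the original data table.

```python
import statistics

# ASSUMPTION: raw scores reconstructed from the listed deviation scores
# (deviation + reported mean, rounded to the nearest whole number).
hours = [8, 11, 16, 14, 12, 15, 18, 20, 10, 9, 16, 17, 13, 12, 14]
grades = [78, 80, 89, 85, 84, 86, 95, 96, 83, 81, 93, 92, 84, 83, 88]

for label, scores in (("Hours", hours), ("Grade", grades)):
    mean = statistics.mean(scores)
    variance = statistics.variance(scores)  # sum of squared deviations / (n - 1)
    std_dev = statistics.stdev(scores)      # square root of the variance
    print(f"{label}: mean={mean:.2f} variance={variance:.2f} sd={std_dev:.2f}")
```

With these reconstructed scores, the standard deviations come out to 3.42 and 5.53, matching the summary above. The means print as 13.67 and 86.47 because Python rounds, whereas the text reports the same means cut off after two decimals.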

Measures of Center and Spread Based on Percentiles

The second pair of measures for center and spread is based on percentile ranks and percentile values, instead of moments.  In general, the percentile rank for a given value is the percent of the data that is smaller (i.e., lower in value).  As a simple example, if the data are 2, 4, and 7, then the percentile rank for 5 is 67%, because two of the three values are smaller than 5.  Percentile ranks are usually easy to calculate.  In contrast, a percentile value (which is kind of the “opposite” of a percentile rank) is much more complicated.  For example, the percentile value for 67% when the data are 2, 4, and 7 is something between 4 and 7, because any value between 4 and 7 would be larger than two-thirds of the data.  (FYI: the percentile value in this case is 5.02.)  Fortunately, we won’t need to worry about the details when calculating the standard measures of center and spread using the percentile-based method.
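The definition of a percentile rank given above translates directly into code. The function name `percentile_rank` is a hypothetical helper introduced here for illustration; it simply counts how many values are strictly smaller than the value of interest.

```python
def percentile_rank(data, value):
    """Percent of the data that is strictly smaller than `value`."""
    smaller = sum(1 for x in data if x < value)
    return 100 * smaller / len(data)


data = [2, 4, 7]
print(round(percentile_rank(data, 5)))  # 67, because two of three values are below 5
```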

The median, which is how the percentile-based method defines center, is best thought of as the middle score when the data have been arranged in order of magnitude.  To see how this can be done by hand, assume that we start with the data below:

We first re-arrange these data from smallest to largest:

The median is the middle of this new set of scores; in this case, the value (in blue) is 56.  This is the middle value because there are 5 scores lower than it and 5 scores higher than it.  Finding the median is very easy when you have an odd number of scores.

What happens when you have an even number of scores?  What if you had only 10 scores, instead of 11?  In this case, you take the middle two scores, and calculate the mean of them.  So, if we start with the following data (which are the same as above, with the last one omitted):

We again re-arrange that data from smallest to largest:

And then calculate the mean of the 5th and 6th values (tied for the middle, in blue) to get a median of 55.50.

In general, the median is the value that splits the entire set of data into two equal halves.  Because of this, the other name for the median is the 50th percentile: 50% of the data are below this value and 50% of the data are above this value.  This makes the median a reasonable alternative definition of center.
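The by-hand rule for the median (sort, then take the middle score, or the mean of the two middle scores) can be sketched as follows. The data listings for this example are not reproduced in this section, so the list below is partly an assumption: 57 is inferred from the averages quoted later in the text, and 52 and 82 are hypothetical placeholders. The remaining values are the ones quoted in the worked examples.

```python
def median(values):
    """Middle score of the sorted data; mean of the two middle scores if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair


# ASSUMPTION: 52 and 82 are hypothetical placeholders; 57 is inferred from the text.
scores = [56, 14, 77, 35, 82, 55, 45, 65, 52, 57, 92]

print(median(scores))       # 56   (11 scores, odd)
print(median(scores[:-1]))  # 55.5 (10 scores, last one omitted)
```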

Inter-Quartile Range

The inter-quartile range (typically named using its initials, IQR) is the measure of spread that is paired with the median as the measure of center.  As the name suggests, the IQR divides the data into four sub-sets, instead of just two: the bottom quarter, the next higher quarter, the next higher quarter, and the top quarter (the same as for the median, you must start by re-arranging the data from smallest to largest).  As described above, the median is the dividing line between the middle two quarters.  The IQR is the distance between the dividing line between the bottom two quarters and the dividing line between the top two quarters.

Technically, the IQR is the distance between the 25th percentile and the 75th percentile.  You calculate the value for which 25% of the data is below this point, then you calculate the value for which 25% of the data is above this point, and then you subtract the first from the second.  Because the 75th percentile cannot be lower than the 25th percentile (and is almost always much higher), the value for IQR cannot be a negative number.

Returning to our example set of 11 values, for which the median was 56, the way that you can calculate the IQR by hand is as follows.  First, focus only on those values that are to the left of (i.e., lower than) the middle value:

Then calculate the “median” of these values.  In this case, the answer is 45, because the third box is the middle of these five boxes.  Therefore, the 25th percentile is 45.

Next, focus on the values that are to the right of (i.e., higher than) the original median:

The middle of these values, which is 77, is the 75th percentile.  Therefore, the IQR for these data is 32, because 77 – 45 = 32.  Note how, when the original set of data has an odd number of values (which made it easy to find the median), the middle value in the data set was ignored when finding the 25th and 75th percentiles.  In the above example, the number of values to be examined in each subsequent step was also odd (i.e., 5 each), so we selected the middle value of each subset to get the 25th and 75th percentiles.

If the number of values to be examined in each subsequent step had been even (e.g., if we had started with 9 values, so that 4 values would be used to get the 25th percentile), then the same averaging rule as we use for median would be used: use the average of the two values that tie for being in the middle.  For example, if these are the data (which are the first nine values from the original example after being sorted):

The median (in blue) is 55, the 25th percentile (the average of the two values in green) is 40, and the 75th percentile (the average of the two values in red) is 61.  Therefore, the IQR for these data is 61 – 40 = 21.

A similar procedure is used when you start with an even number of values, but with a few extra complications (these complications are caused by the particular method of calculating percentiles that is typically used in psychology).  The first change to the procedure for calculating IQR is that now every value is included in one of the two sub-steps for getting the 25th and 75th percentiles; none are omitted.  For example, if we use the same set of 10 values from above (i.e., the original 11 values with the highest omitted), for which the median was 55.50, then here is what we would use in the first sub-step:

In this case, the 25th percentile will be calculated from an odd number of values (5).  We start in the same way as before, with the middle of these values (in green), which is 45.  Then we adjust it by moving the score 25% of the distance towards the next lower value, which is 35.  One quarter of the distance between these two values is 2.50, i.e., (45 – 35) x .25 = 2.50, so the final value for the 25th percentile is 42.50.

The same thing is done for the 75th percentile.  This time we would start with:

The starting value (in red) of 65 would then be moved 25% of the distance towards the next higher value, which is 77, producing a 75th percentile of 68, i.e., 65 + ((77 – 65) x .25) = 68.  Note how we moved the value away from the median in both cases.  If we don’t do this (if we used the same simple method as we used when the original set of data had an odd number of values), then we would slightly under-estimate the value of IQR.

Finally, if we start with an even number of pieces of data and also have an even number for each of the sub-steps (e.g., we started with 8 values), then we again have to apply the correction.  Whether you have to shift the 25th and 75th percentiles depends on the original number of pieces of data, not the number that are used for the subsequent sub-steps.  To demonstrate this, here are the first eight values from the original set of data:

The first step to calculating the 25th percentile is to average the two values (in green) that tied for being in the middle of the lower half of the data; the answer is 40.  Then, as above, move this value 25% of the distance away from the median –i.e., move it down by 2.50, because (45 – 35) x .25 = 2.50.  The final value is 37.50.

Then do the same for the upper half of the data:

Start with the average of the two values (in red) that tied for being in the middle and then shift this value 25% of their difference away from the center.  The mean of the two values is 56.50 and, after shifting, the 75th percentile is 56.75.  Thus, the IQR for these eight pieces of data is 56.75 – 37.50 = 19.25.
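For the four worked examples above, the by-hand procedure gives the same quartiles as the interpolation rule that places the p-th percentile at position (n + 1) × p in the sorted data. Python's `statistics.quantiles` uses that rule when called with `method="exclusive"`, so it can serve as a check. This is a sketch under assumptions: the data listings are not reproduced in this section, so 57 is inferred from the averages quoted in the text, and 52 and 82 are hypothetical placeholders (they do not affect any of the quartiles below).

```python
import statistics

# ASSUMPTION: 52 and 82 are hypothetical placeholders; 57 is inferred from the text.
full = [14, 35, 45, 52, 55, 56, 57, 65, 77, 82, 92]  # already sorted


def iqr(values):
    """Return (25th percentile, 75th percentile, IQR) using the (n + 1) * p rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q3, q3 - q1


# Use the first 11, 10, 9, and 8 sorted values, as in the worked examples
for n in (11, 10, 9, 8):
    q1, q3, spread = iqr(full[:n])
    print(n, q1, q3, spread)
```

The quartiles printed for 11, 10, 9, and 8 values are 45 and 77, 42.5 and 68, 40 and 61, and 37.5 and 56.75, which are the values derived by hand above. Other software may use a different interpolation rule and give slightly different quartiles.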

Note the following about the median and IQR: because these are both based on percentiles, they are not always sensitive to every value in the set of data.  Look again at the original set of 11 values used in the examples.  Now imagine that the first (lowest) value was 4, instead of 14.  Would either the median or the IQR change?  The answer is No, neither would change.  Now imagine that the last (highest) value was 420, instead of 92.  Would either the median or IQR change?  Again, the answer is No.

Some of the other values can also change without altering the median and/or IQR, but not all of them.  If you changed the 56 in the original set to 50, for example, then the median would drop from 56 to 55, but the IQR would remain 32.  In contrast, if you changed only the 45 to 50, then the IQR would drop from 32 to 27, but the median would remain 56.

The one thing that is highly consistent is how you can decrease the lowest value and/or increase the highest value without changing either the median or IQR (as long as you start with at least 5 pieces of data).  This is an important property of percentiles-based methods: they are relatively insensitive to the most extreme values.  This is quite different from moments-based methods; the mean and variance of a set of data are both sensitive to every value.
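The contrast between the two families of measures can be demonstrated with a short sketch. It changes the lowest value from 14 to 4 and the highest from 92 to 420, as in the thought experiment above, and compares the summaries before and after. As before, the full data listing is not reproduced in this section: 57 is inferred from the text, 52 and 82 are hypothetical placeholders, and the quartiles use the (n + 1) × p rule (`method="exclusive"`), which is an assumption about the intended percentile method.

```python
import statistics


def summary(values):
    """Return (mean, median, IQR) for a list of scores."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return statistics.mean(values), med, q3 - q1


# ASSUMPTION: 52 and 82 are hypothetical placeholders; 57 is inferred from the text.
original = [14, 35, 45, 52, 55, 56, 57, 65, 77, 82, 92]
altered = [4] + original[1:-1] + [420]  # lowest 14 -> 4, highest 92 -> 420

print("original:", summary(original))
print("altered: ", summary(altered))
```

The median (56) and IQR (32) are identical for both lists, while the mean shifts noticeably, because only the mean uses the actual size of the extreme values.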

Other Measures of Center and Spread

Although a vast majority of psychologists use either the mean and variance (as a pair) or the median and IQR (as a pair) as their measures of center and spread, occasionally you might come across a few other options.

The mode is a (rarely-used) way of defining the center of a set of data.  The mode is simply the value that appears the most often in a set of data.  For example, if your data are 2, 3, 3, 4, 5, and 9, then the mode is 3 because there are two 3s in the data and no other value appears more than once.  When you think about other sets of example data, you will probably see why the mode is not very popular.  First, many sets of data do not have a meaningful mode.  For the set of 2, 4, and 7, all three different values appear once each, so no value is more frequent than any other value.  When the data are continuous and measured precisely (e.g., response time in milliseconds), then this problem will happen quite often.  Now consider the set of 2, 3, 3, 4, 5, 5, 7, and 9; these data have two modes: 3 and 5.  This also happens quite often, especially when the data are discrete, such as when they must all be whole numbers.
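The three situations described above (one mode, no meaningful mode, and two modes) can be reproduced with `statistics.multimode` from the Python standard library, which returns every value that is tied for most frequent.

```python
from statistics import multimode

one_mode = multimode([2, 3, 3, 4, 5, 9])
no_clear_mode = multimode([2, 4, 7])
two_modes = multimode([2, 3, 3, 4, 5, 5, 7, 9])

print(one_mode)       # [3]
print(no_clear_mode)  # [2, 4, 7]  every value is tied, so there is no useful mode
print(two_modes)      # [3, 5]
```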

But the greatest problem with using the mode as the measure of center is that it is often at one of the extremes, instead of being anywhere near the middle.  Here is a favorite example (even if it is not from psychology): the amount of federal income tax paid.  The most-frequent value for this –i.e., the mode of federal income tax paid– is zero.  This also happens to be the same as the lowest value.  In contrast, in 2021, for example, the mean amount of federal income tax paid was a little bit over $10,000.

Another descriptive statistic that you might come across is the range of the data.  Sometimes this is given as the lowest and highest values –e.g., “the participant ages ranged from 18 to 24 years”– which provides some information about center and spread simultaneously.  Other times the range is more specifically intended as only a measure of spread, so the difference between the highest and lowest values is given –e.g., “the average age was 21 years with a range of 6 years.”  There is nothing inherently wrong with providing the range, but it is probably best used as a supplement to one of the pairs of measures for center and spread.  This is true because range (in either format) often fails to provide sufficient detail.  For example, the set of 18, 18, 18, 18, and 24 and the set of 18, 24, 24, 24, and 24 both range from 18 to 24 (or have a range of 6), even though the data sets are clearly quite different.

Choosing the Measures of Center and Spread

When it comes to deciding which measures to use for center and spread when describing a set of numerical data –which is almost always a choice between mean and variance (or standard deviation) or median and IQR– the first thing to keep in mind is that this is not a question of “which is better?”; it is a question of which is more appropriate for the situation.  That is, the mean and the median are not just alternative ways of calculating a value for the center of a set of data; they use different definitions of the meaning of center.

So how should you make this decision?  One factor that you should consider focuses on a key difference between moments and percentiles that was mentioned above: how the mean and variance of a set of data both depend on every value, whereas the median and IQR are often unaffected by the specific values at the upper and lower extremes.  Therefore, if you believe that every value in the set of data is equally important and equally representative of whatever is being studied, then you should probably use the mean and variance for your descriptive statistics. In contrast, if you believe that some extreme values might be outliers (e.g., the participant wasn’t taking the study very seriously or was making random fast guesses), then you might want to use the median and IQR instead.

Another related factor to consider is the shape of the distribution of values in the set of data.  If the values are spread around the center in a roughly symmetrical manner, then the mean and the median will be very similar, but if there are more extreme values in one tail of the distribution (e.g., there are more extreme values above the middle than below), this will pull the mean away from the median, and the latter might better match what you think of as the center.

Finally, if you are calculating descriptive statistics as part of a process that will later involve making inferences about the population from which the sample was taken, you might want to consider the type of statistics that you will be using later.  Many inferential statistics (including t-tests, ANOVA, and the standard form of the correlation coefficient) are based on moments so, if you plan to use these later, it would probably be more appropriate to summarize the data in terms of mean and variance (or standard deviation).  Other statistics (including sign tests and alternative forms of the correlation coefficient) are based on percentiles, so if you plan to use these instead, then the median and IQR might be more appropriate for the descriptive statistics.

Hybrid Methods

Although relatively rare, there is one alternative to making a firm decision between moments (i.e., mean and variance) and percentiles (i.e., median and IQR), namely hybrid methods.  One example of this is as follows.  First, sort the data from smallest to largest (in the same manner as when using percentiles).  Then remove a certain number of values from the beginning and end of the list.  The most popular version of this is to remove the lowest 2.5% and the highest 2.5% of the data; for example, if you started with 200 pieces of data, remove the first 5 and the last 5, keeping the middle 190.  Then switch methods and calculate the mean and variance of the retained data.  This method is trying to have the best of both worlds: it is avoiding outliers by removing the extreme values, but it is remaining sensitive to all the data that are being retained.  When this method is used, the correct labels for the final two values are the “trimmed mean” and “trimmed variance.”
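The trimming procedure described above can be sketched as follows. The function name `trimmed_stats` and the data are hypothetical, and rounding the number of dropped scores to a whole number is a design choice made here, since the text does not say how to handle fractions.

```python
import statistics


def trimmed_stats(values, proportion=0.025):
    """Drop `proportion` of the scores from each end, then compute moments."""
    ordered = sorted(values)
    k = round(len(ordered) * proportion)  # number of scores to drop at each end
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept), statistics.variance(kept), len(kept)


# Hypothetical data: 200 scores (the whole numbers 1 to 200)
data = list(range(1, 201))
trimmed_mean, trimmed_variance, n_kept = trimmed_stats(data)
print(n_kept, trimmed_mean, round(trimmed_variance, 2))
```

With 200 scores and the default 2.5% trim, the first 5 and last 5 scores are dropped and the middle 190 are used for the trimmed mean and trimmed variance.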

Measures of Shape for Numerical Data

As the name suggests, the shape of a set of data is best thought about in terms of how the data would look if you made some sort of figure or plot of the values.  The most popular way to make a plot of a single set of numerical values starts by putting all of the data into something that is called a frequency table.  In brief, a frequency table is a list of all possible values, along with how many times each value occurs in the set of data.  This is easy to create when there are not very many different values (e.g., number of siblings); it becomes more complicated when almost every value in the set of data is unique (e.g., response time in milliseconds).

The key to resolving the problem of having too many unique values is to “bin” the data.  To bin a set of data, you choose a set of equally-spaced cut-offs, which will determine the borders of adjacent bins.  For example, if you are working with response times which happen to range from about 300 to 600 milliseconds (with every specific value being unique), you might decide to use bins that are 50 milliseconds wide, such that all values from 301 to 350 go in the first bin, all values from 351 to 400 go in the second bin, etc.  Most spreadsheet-based software packages (e.g., Excel) have built-in procedures to do this for you.
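Binning can also be done in a few lines of code. The sketch below follows the example above (bins 50 milliseconds wide, with 301 to 350 as the first bin); the response times themselves and the helper name `bin_label` are hypothetical, and the helper assumes whole-number values of at least 301.

```python
from collections import Counter


def bin_label(value, start=301, width=50):
    """Return the 'low-high' label of the bin containing a whole-number value."""
    index = (value - start) // width  # 0 for the first bin, 1 for the second, ...
    low = start + index * width
    return f"{low}-{low + width - 1}"


# Hypothetical response times in milliseconds (not from the text)
response_times = [312, 347, 350, 351, 388, 402, 415, 449, 450, 451, 503, 598]

# Frequency table: how many response times fall into each bin
frequency_table = Counter(bin_label(rt) for rt in response_times)
for label in sorted(frequency_table):
    print(label, frequency_table[label])
```

Note that the boundary values behave as described in the text: 350 falls in the first bin and 351 in the second.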

As an illustration of this process, let’s go back to the set of 11 values we have used in previous examples:

Based on the total number of values and their range, we decide to use bins that are 20 units wide.  Here are the same data in a frequency table:

Once you have a list of values or bins and the number of pieces of data in each, you can make a frequency histogram of the data, as shown in Figure 3.1:

Figure 3.1. Histogram with 5 bars.

Based on this histogram, we can start to make descriptive statements about the shape of the data.  In general, these will concern two aspects, known as skewness and kurtosis, as we shall see next.

Skewness refers to the lack of symmetry.  If the left and right sides of the plot are mirror images of each other, then the distribution has no skew, because it is symmetrical; this is the case of the normal distribution (see Figure 3.2).  This clearly is not true for the example in Figure 3.1.  If the distribution has a longer tail on the left side, as is true here, then the data are said to have negative skew.  If the distribution has a longer “tail” on the right, then the distribution is said to have positive skew.  Note that you need to focus on the skinny part of each end of the plot.  The example in Figure 3.1 might appear to be heavier on the right, but skew is determined by the length of the skinny tails, which is clearly much longer on the left.  As a reference, Figure 3.2 shows a normal distribution, which is perfectly symmetrical, so its skewness is zero; to the left and to the right, you can see two skewed distributions, positive and negative.  Most of the data points in the distribution with positive skew have low values, and it has a long tail on its right side.  The opposite is true for the distribution with negative skew: most of its data points have high values, and it has a long tail on its left side.

Figure 3.2. Distributions with different skewness.

The other aspect of shape, kurtosis, is a bit more complicated.  In general, kurtosis refers to how sharply the data are peaked, and is established in reference to a baseline or standard shape, the normal distribution, that has kurtosis zero.  When we have a nearly flat distribution, for example when every value occurs equally often, the kurtosis is negative.  When the distribution is very pointy, the kurtosis is positive.

If the shape of your data looks like a bell curve, then it’s said to be mesokurtic (“meso” means middle or intermediate in Greek).  If the shape of your data is flatter than this, then it’s said to be platykurtic (“platy” means flat in Greek).  If your shape is more pointed than this, then your data are leptokurtic (“lepto” means thin, narrow, or pointed in Greek).  Examples of these shapes can be seen in Figure 3.3.

Figure 3.3. Distributions with different levels of kurtosis.

Both skew and kurtosis can vary a lot, but these two attributes of shape are not completely independent.  That is, it is impossible for a perfectly flat distribution to have any skew; it is also impossible for a highly-skewed distribution to have zero kurtosis.  A large proportion of the data that is collected by psychologists is approximately normal, but with a long right tail.  In this situation, a good verbal label for the overall shape could be positively-skewed normal, even if that seems a bit contradictory, because the true normal distribution is actually symmetrical (see Figures 3.2 and 3.3).  The goal is to summarize the shape in a way that is easy to understand while being as accurate as possible.  You can always show a picture of your distribution to your audience.  A simple summary of the shape of the histogram in Figure 3.1 could be: roughly normal, but with a lot of negative skew; this tells your audience that the data have a decent-sized peak in the middle, but the lower tail is a lot longer than the upper tail.

Numerical Values for Skew and Kurtosis

In some rare situations, you might want to be even more precise about the shape of a set of data.  Assuming that you used the mean and variance as your measures of center and spread, in these cases, you can use some (complicated) formulae to calculate specific numerical values for skew and kurtosis.  These are the third and fourth moments of the distribution (which is why they can only be used with the mean and variance, because those are the first and second moments of the data).  The details of these measures are beyond this course, but to give you an idea, as indicated above, values that depart from zero tell you that the shape is different from the normal distribution.  A value of skew that is less than –1 or greater than +1 implies that the shape is notably skewed, whereas a value of kurtosis that is more than 1 unit away from zero implies that the data are not mesokurtic.
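For readers who want to see what such a calculation looks like, here is a minimal sketch. The text does not give the formulae, so this uses one common definition as an assumption: skewness as the third central moment divided by the second central moment raised to the power 1.5, and kurtosis as the fourth central moment divided by the squared second central moment, minus 3 (so that a normal distribution scores zero). Statistical packages often apply small-sample corrections, so their values may differ slightly. The function name and data are hypothetical.

```python
def shape_moments(values):
    """Return (skewness, excess kurtosis) from simple central-moment ratios."""
    n = len(values)
    mean = sum(values) / n
    # Central moments: average of the deviations raised to the 2nd, 3rd, 4th power
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # subtract 3 so the normal curve scores 0
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetrical, flat
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, long tail on the right

print(shape_moments(symmetric))
print(shape_moments(right_skewed))
```

The symmetrical set has a skewness of zero and a negative kurtosis (it is flat), whereas the second set has a skewness greater than +1, which by the rule of thumb above counts as notably skewed.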

Summarizing Categorical Data

By definition, you cannot summarize a set of categorical data (e.g., favorite colors) in terms of a numerical mean and/or a numerical spread.  It also does not make much sense to talk about shape, because this would depend on the order in which you placed the options on the X-axis of the plot.  Therefore, in this situation, we usually make a frequency table (with the options in any order that we wish).  You can also make a frequency histogram, but be careful not to read anything important into the apparent shape, because changing the order of the options would completely alter the shape.

An issue worth mentioning here is something that is similar to the process of binning.  Assume, for example, that you have taken a sample of 100 undergraduates, asking each for their favorite genre of music.  Assume that a majority of the respondents chose either pop (24), hip-hop (27), rock (25), or classical (16), but a few chose techno (3), trance (2), or country (3).  In this situation, you might want to combine all of the rare responses into one category with the label Other.  The reason for doing this is that it is difficult to come to any clear conclusions when something is rare.  As a general rule, if a category contains fewer than 5% of the observations, then it should probably be combined with one or more other options.  An example frequency table for such data is this:

Finally, to be technically accurate, it should be mentioned that there are some ways to quantify whether each of the options is being selected the same percent of the time, including the Chi-square (pronounced “kai-squared”) test and relative entropy (which comes from physics), but these are not very common.  In general, most researchers just make a table and/or maybe a histogram to show the distribution of the categorical values.
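Returning to the music-genre example above, the 5% rule for combining rare categories can be sketched in code. The counts are the ones given in the text; the function name `combine_rare` and its parameters are hypothetical.

```python
def combine_rare(counts, threshold=0.05, other_label="Other"):
    """Merge every category holding fewer than `threshold` of all observations."""
    total = sum(counts.values())
    combined = {}
    for category, count in counts.items():
        if count / total < threshold:
            # Rare category: add its count to the combined 'Other' row
            combined[other_label] = combined.get(other_label, 0) + count
        else:
            combined[category] = count
    return combined


genre_counts = {"pop": 24, "hip-hop": 27, "rock": 25, "classical": 16,
                "techno": 3, "trance": 2, "country": 3}

frequency_table = combine_rare(genre_counts)
for genre, count in frequency_table.items():
    print(f"{genre:10s} {count}")
```

Techno, trance, and country each fall below 5% of the 100 responses, so they are merged into a single Other row with 8 responses.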

Sample: A set of individuals selected from a population, typically intended to represent the population in a research study.

Discrete variable: A variable that consists of separate, indivisible categories. No values can exist between two neighboring categories.

Population: The entire set of individuals of interest for a given research question.

Outlier: An individual value in a dataset that is substantially different (larger or smaller) than the other values in the dataset.

Tails: The end sections of a data distribution where the scores taper off.

Inferential statistics: Statistical analyses and techniques that are used to make inferences beyond what is observed in a given sample, and make decisions about what the data mean.

Data Analysis in the Psychological Sciences: A Practical, Applied, Multimedia Approach Copyright © 2023 by J Toby Mordkoff and Leyre Castro is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Descriptive Statistics for Summarising Data

Ray W. Cooksey

UNE Business School, University of New England, Armidale, NSW Australia

This chapter discusses and illustrates descriptive statistics. The purpose of the procedures and fundamental concepts reviewed in this chapter is quite straightforward: to facilitate the description and summarisation of data. By ‘describe’ we generally mean either the use of some pictorial or graphical representation of the data (e.g. a histogram, box plot, radar plot, stem-and-leaf display, icon plot or line graph) or the computation of an index or number designed to summarise a specific characteristic of a variable or measurement (e.g., frequency counts, measures of central tendency, variability, standard scores). Along the way, we explore the fundamental concepts of probability and the normal distribution. We seldom interpret individual data points or observations primarily because it is too difficult for the human brain to extract or identify the essential nature, patterns, or trends evident in the data, particularly if the sample is large. Rather we utilise procedures and measures which provide a general depiction of how the data are behaving. These statistical procedures are designed to identify or display specific patterns or trends in the data. What remains after their application is simply for us to interpret and tell the story.

The first broad category of statistics we discuss concerns descriptive statistics. The purpose of the procedures and fundamental concepts in this category is quite straightforward: to facilitate the description and summarisation of data. By ‘describe’ we generally mean either the use of some pictorial or graphical representation of the data or the computation of an index or number designed to summarise a specific characteristic of a variable or measurement.

We seldom interpret individual data points or observations primarily because it is too difficult for the human brain to extract or identify the essential nature, patterns, or trends evident in the data, particularly if the sample is large. Rather we utilise procedures and measures which provide a general depiction of how the data are behaving. These statistical procedures are designed to identify or display specific patterns or trends in the data. What remains after their application is simply for us to interpret and tell the story.

Reflect on the QCI research scenario and the associated data set discussed in Chap. 4 (doi: 10.1007/978-981-15-2537-7_4). Consider the following questions that Maree might wish to address with respect to decision accuracy and speed scores:

  • What was the typical level of accuracy and decision speed for inspectors in the sample? [see Procedure 5.4 – Assessing central tendency.]
  • What was the most common accuracy and speed score amongst the inspectors? [see Procedure 5.4 – Assessing central tendency.]
  • What was the range of accuracy and speed scores; the lowest and the highest scores? [see Procedure 5.5 – Assessing variability.]
  • How frequently were different levels of inspection accuracy and speed observed? What was the shape of the distribution of inspection accuracy and speed scores? [see Procedure 5.1 – Frequency tabulation, distributions & crosstabulation.]
  • What percentage of inspectors would have ‘failed’ to ‘make the cut’ assuming the industry standard for acceptable inspection accuracy and speed combined was set at 95%? [see Procedure 5.7 – Standard ( z ) scores.]
  • How variable were the inspectors in their accuracy and speed scores? Were all the accuracy and speed levels relatively close to each other in magnitude or were the scores widely spread out over the range of possible test outcomes? [see Procedure 5.5 – Assessing variability.]
  • What patterns might be visually detected when looking at various QCI variables singly and together as a set? [see Procedure 5.2 – Graphical methods for displaying data, Procedure 5.3 – Multivariate graphs & displays, and Procedure 5.6 – Exploratory data analysis.]

This chapter includes discussions and illustrations of a number of procedures available for answering questions about data like those posed above. In addition, you will find discussions of two fundamental concepts, namely probability and the normal distribution; concepts that provide building blocks for Chaps. 6 and 7 (doi: 10.1007/978-981-15-2537-7_6 and 10.1007/978-981-15-2537-7_7).

Procedure 5.1: Frequency Tabulation, Distributions & Crosstabulation

Frequency tabulation and distributions.

Frequency tabulation serves to provide a convenient counting summary for a set of data that facilitates interpretation of various aspects of those data. Basically, frequency tabulation occurs in two stages:

  • First, the scores in a set of data are rank ordered from the lowest value to the highest value.
  • Second, the number of times each specific score occurs in the sample is counted. This count records the frequency of occurrence for that specific data value.

Consider the overall job satisfaction variable, jobsat, from the QCI data scenario. Performing frequency tabulation across the 112 Quality Control Inspectors on this variable using the SPSS Frequencies procedure (Allen et al. 2019, ch. 3; George and Mallery 2019, ch. 6) produces the frequency tabulation shown in Table 5.1. Note that three of the inspectors in the sample did not provide a rating for jobsat, thereby producing three missing values (= 2.7% of the sample of 112) and leaving 109 inspectors with valid data for the analysis.

Table 5.1 Frequency tabulation of overall job satisfaction scores

The display of frequency tabulation is often referred to as the frequency distribution for the sample of scores. For each value of a variable, the frequency of its occurrence in the sample of data is reported. It is possible to compute various percentages and percentile values from a frequency distribution.

Table 5.1 shows the ‘Percent’ or relative frequency of each score (the percentage of the 112 inspectors obtaining each score, including those inspectors who were missing scores, which SPSS labels as ‘System’ missing). Table 5.1 also shows the ‘Valid Percent’ which is computed only for those inspectors in the sample who gave a valid or non-missing response.

Finally, it is possible to add up the ‘Valid Percent’ values, starting at the low score end of the distribution, to form the cumulative distribution or ‘Cumulative Percent’. A cumulative distribution is useful for finding percentiles, which reflect what percentage of the sample scored at a specific value or below.

We can see in Table 5.1 that 4 of the 109 valid inspectors (a ‘Valid Percent’ of 3.7%) indicated the lowest possible level of job satisfaction, a value of 1 (Very Low), whereas 18 of the 109 valid inspectors (a ‘Valid Percent’ of 16.5%) indicated the highest possible level of job satisfaction, a value of 7 (Very High). The ‘Cumulative Percent’ number of 18.3 in the row for the job satisfaction score of 3 can be interpreted as “roughly 18% of the sample of inspectors reported a job satisfaction score of 3 or less”; that is, nearly a fifth of the sample expressed some degree of negative satisfaction with their job as a quality control inspector in their particular company.
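The two tabulation stages and the three percentage columns described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS Frequencies procedure itself: the function name `frequency_table` and the ratings are hypothetical, and missing ratings are assumed to be coded as `None`.

```python
from collections import Counter

def frequency_table(scores):
    """Tabulate counts, Percent, Valid Percent and Cumulative Percent.

    `scores` is a list in which None marks a missing rating (an assumption
    made for this sketch).
    """
    n_total = len(scores)
    valid = [s for s in scores if s is not None]
    n_valid = len(valid)
    counts = Counter(valid)

    rows = []
    cumulative = 0.0
    for value in sorted(counts):                  # stage 1: rank order the values
        freq = counts[value]                      # stage 2: count each value
        percent = 100 * freq / n_total            # based on everyone, missing included
        valid_percent = 100 * freq / n_valid      # based on non-missing cases only
        cumulative += valid_percent               # running total of Valid Percent
        rows.append((value, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows, n_total - n_valid

# Hypothetical ratings on a 1 to 7 scale; None = missing (not the QCI data)
ratings = [1, 2, 2, 3, 4, 4, 4, 5, 7, None]
rows, n_missing = frequency_table(ratings)
for row in rows:
    print(row)   # (value, frequency, percent, valid percent, cumulative percent)
print("missing:", n_missing)
```

Because ‘Percent’ divides by all cases and ‘Valid Percent’ divides only by non-missing cases, the two columns diverge whenever any ratings are missing.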

If you have a large data set having many different scores for a particular variable, it may be more useful to tabulate frequencies on the basis of intervals of scores.

For the accuracy scores in the QCI database, you could count scores occurring in intervals such as ‘less than 75% accuracy’, ‘between 75% but less than 85% accuracy’, ‘between 85% but less than 95% accuracy’, and ‘95% accuracy or greater’, rather than counting the individual scores themselves. This would yield what is termed a ‘grouped’ frequency distribution since the data have been grouped into intervals or score classes. Producing such an analysis using SPSS would involve extra steps to create the new category or ‘grouping’ system for scores prior to conducting the frequency tabulation.
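The extra grouping step can be illustrated with a short Python sketch. The cut points follow the intervals named above; the function name `grouped_frequencies` and the accuracy values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Interval boundaries named in the text: <75, 75 to <85, 85 to <95, >=95
CUT_POINTS = [75, 85, 95]
LABELS = [
    "less than 75%",
    "75% but less than 85%",
    "85% but less than 95%",
    "95% or greater",
]

def grouped_frequencies(scores):
    """Count how many scores fall into each accuracy interval."""
    # bisect_right puts a score equal to a cut point into the upper interval,
    # so 75 counts as '75% but less than 85%'.
    counts = Counter(LABELS[bisect_right(CUT_POINTS, s)] for s in scores)
    return [(label, counts[label]) for label in LABELS]

# Hypothetical accuracy percentages (not the QCI data)
accuracy = [62, 75, 81, 84.9, 85, 90, 94, 95, 99, 100]
for label, count in grouped_frequencies(accuracy):
    print(f"{label:>24}: {count}")
```

Moving a cut point (say, from 85 to 80) changes the counts in the adjacent intervals, which is why the choice of interval boundaries matters for the shape of the grouped distribution.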

Crosstabulation

In a frequency crosstabulation , we count frequencies on the basis of two variables simultaneously rather than one; thus we have a bivariate situation.

For example, Maree might be interested in the number of male and female inspectors in the sample of 112 who obtained each jobsat score. Here there are two variables to consider: inspector’s gender and inspector’s jobsat score. Table 5.2 shows such a crosstabulation as compiled by the SPSS Crosstabs procedure (George and Mallery 2019, ch. 8). Note that inspectors who did not report a score for jobsat and/or gender have been omitted as missing values, leaving 106 valid inspectors for the analysis.

Table 5.2 Frequency crosstabulation of jobsat scores by gender category for the QCI data

The crosstabulation shown in Table 5.2 gives a composite picture of the distribution of satisfaction levels for male inspectors and for female inspectors. If frequencies or ‘Counts’ are added across the gender categories, we obtain the numbers in the ‘Total’ column (the percentages or relative frequencies are also shown immediately below each count) for each discrete value of jobsat (note this column of statistics differs from that in Table 5.1 because the gender variable was missing for certain inspectors). By adding down each gender column, we obtain, in the bottom row labelled ‘Total’, the number of males and the number of females that comprised the sample of 106 valid inspectors.

The totals, either across the rows or down the columns of the crosstabulation, are termed the marginal distributions of the table. These marginal distributions are equivalent to frequency tabulations for each of the variables jobsat and gender . As with frequency tabulation, various percentage measures can be computed in a crosstabulation, including the percentage of the sample associated with a specific count within either a row (‘% within jobsat ’) or a column (‘% within gender ’). You can see in Table 5.2 that 18 inspectors indicated a job satisfaction level of 7 (Very High); of these 18 inspectors reported in the ‘Total’ column, 8 (44.4%) were male and 10 (55.6%) were female. The marginal distribution for gender in the ‘Total’ row shows that 57 inspectors (53.8% of the 106 valid inspectors) were male and 49 inspectors (46.2%) were female. Of the 57 male inspectors in the sample, 8 (14.0%) indicated a job satisfaction level of 7 (Very High). Furthermore, we could generate some additional interpretive information of value by adding the ‘% within gender’ values for job satisfaction levels of 5, 6 and 7 (i.e. differing degrees of positive job satisfaction). Here we would find that 68.4% (= 24.6% + 29.8% + 14.0%) of male inspectors indicated some degree of positive job satisfaction compared to 61.2% (= 10.2% + 30.6% + 20.4%) of female inspectors.

This helps to build a picture of the possible relationship between an inspector’s gender and their level of job satisfaction (a relationship that, as we will see later, can be quantified and tested using Procedure 10.1007/978-981-15-2537-7_6#Sec14 and Procedure 10.1007/978-981-15-2537-7_7#Sec17).

It should be noted that a crosstabulation table such as that shown in Table 5.2 is often referred to as a contingency table about which more will be said later (see Procedure 10.1007/978-981-15-2537-7_7#Sec17 and Procedure 10.1007/978-981-15-2537-7_7#Sec115).
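The cell counts, marginal distributions and row and column percentages discussed for Table 5.2 can be sketched as follows. This is an illustrative sketch only: the helper names `crosstab` and `pct` and the small data set are hypothetical, and missing values are assumed to be coded as `None`.

```python
from collections import Counter

def crosstab(pairs):
    """Return cell counts plus row and column marginal totals.

    `pairs` is an iterable of (row_value, column_value) tuples. Pairs with a
    missing value (None) on either variable are dropped, mirroring the way
    missing cases were omitted from Table 5.2.
    """
    valid = [(r, c) for r, c in pairs if r is not None and c is not None]
    cells = Counter(valid)
    row_totals = Counter(r for r, _ in valid)   # marginal distribution of rows
    col_totals = Counter(c for _, c in valid)   # marginal distribution of columns
    return cells, row_totals, col_totals, len(valid)

def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

# Hypothetical (jobsat, gender) observations, not the QCI data
data = [(7, "M"), (7, "F"), (7, "F"), (5, "M"), (5, "M"), (3, "F"),
        (None, "M"), (6, None)]
cells, rows, cols, n = crosstab(data)

print(pct(cells[(7, "M")], rows[7]))     # % within jobsat = 7 who are male
print(pct(cells[(7, "M")], cols["M"]))   # % within gender = M who rated 7
print(pct(cols["M"], n))                 # marginal % of males in the valid sample
```

The same `pct` helper reproduces the figures quoted from Table 5.2: `pct(8, 18)` gives 44.4 (males among those rating 7), `pct(8, 57)` gives 14.0 (males who rated 7), and `pct(57, 106)` gives 53.8 (males in the valid sample). The denominator is what distinguishes a row percentage from a column percentage.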

Advantages

Frequency tabulation is useful for providing convenient data summaries which can aid in interpreting trends in a sample, particularly where the number of discrete values for a variable is relatively small. A cumulative percent distribution provides additional interpretive information about the relative positioning of specific scores within the overall distribution for the sample.

Crosstabulation permits the simultaneous examination of the distributions of values for two variables obtained from the same sample of observations. This examination can yield some useful information about the possible relationship between the two variables. More complex crosstabulations can also be done where the values of three or more variables are tracked in a single systematic summary. The use of frequency tabulation or crosstabulation in conjunction with various other statistical measures, such as measures of central tendency (see Procedure 5.4) and measures of variability (see Procedure 5.5), can provide a relatively complete descriptive summary of any data set.

Disadvantages

Frequency tabulations can get messy if interval or ratio-level measures are tabulated simply because of the large number of possible data values. Grouped frequency distributions really should be used in such cases. However, certain choices, such as the size of the score interval (group size), must be made, often arbitrarily, and such choices can affect the nature of the final frequency distribution.

Additionally, percentage measures have certain problems associated with them, most notably, the potential for their misinterpretation in small samples. One should be sure to know the sample size on which percentage measures are based in order to obtain an interpretive reference point for the actual percentage values.

For example

In a sample of 10 individuals, 20% represents only two individuals whereas in a sample of 300 individuals, 20% represents 60 individuals. If all that is reported is the 20%, then the mental inference drawn by readers is likely to be that a sizeable number of individuals had a score or scores of a particular value—but what is ‘sizeable’ depends upon the total number of observations on which the percentage is based.

Where Is This Procedure Useful?

Frequency tabulation and crosstabulation are very commonly applied procedures used to summarise information from questionnaires, both in terms of tabulating various demographic characteristics (e.g. gender, age, education level, occupation) and in terms of actual responses to questions (e.g. numbers responding ‘yes’ or ‘no’ to a particular question). They can be particularly useful in helping to build up the data screening and demographic stories discussed in Chap. 4. Categorical data from observational studies can also be analysed with this technique (e.g. the number of times Suzy talks to Frank, to Billy, and to John in a study of children’s social interactions).

Certain types of experimental research designs may also be amenable to analysis by crosstabulation with a view to drawing inferences about distribution differences across the sets of categories for the two variables being tracked.

You could employ crosstabulation in conjunction with the tests described in Procedure 10.1007/978-981-15-2537-7_7#Sec17 to see if two different styles of advertising campaign differentially affect the product purchasing patterns of male and female consumers.

In the QCI database, Maree could employ crosstabulation to help her answer the question “do different types of electronic manufacturing firms ( company ) differ in terms of their tendency to employ male versus female quality control inspectors ( gender )?”

Procedure 5.2: Graphical Methods for Displaying Data

Graphical methods for displaying data include bar and pie charts, histograms and frequency polygons, line graphs and scatterplots. It is important to note that what is presented here is a small but representative sampling of the types of simple graphs one can produce to summarise and display trends in data. Generally speaking, SPSS offers the easiest facility for producing and editing graphs, but with a rather limited range of styles and types. SYSTAT, STATGRAPHICS and NCSS offer a much wider range of graphs (including graphs unique to each package), but with the drawback that it takes somewhat more effort to get the graphs in exactly the form you want.

Bar and Pie Charts

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are categorical (nominal or ordinal level of measurement).

  • A bar chart uses vertical and horizontal axes to summarise the data. The vertical axis is used to represent frequency (number) of occurrence or the relative frequency (percentage) of occurrence; the horizontal axis is used to indicate the data categories of interest.
  • A pie chart gives a simpler visual representation of category frequencies by cutting a circular plot into wedges or slices whose sizes are proportional to the relative frequency (percentage) of occurrence of specific data categories. Some pie charts can have one or more slices emphasised by ‘exploding’ them out from the rest of the pie.

Consider the company variable from the QCI database. This variable depicts the types of manufacturing firms that the quality control inspectors worked for. Figure 5.1 illustrates a bar chart summarising the percentage of female inspectors in the sample coming from each type of firm. Figure 5.2 shows a pie chart representation of the same data, with an ‘exploded slice’ highlighting the percentage of female inspectors in the sample who worked for large business computer manufacturers – the lowest percentage of the five types of companies. Both graphs were produced using SPSS.

Fig. 5.1 Bar chart: Percentage of female inspectors

Fig. 5.2 Pie chart: Percentage of female inspectors

The pie chart was modified with an option to show the actual percentage along with the label for each category. The bar chart shows that computer manufacturing firms have relatively fewer female inspectors compared to the automotive and electrical appliance (large and small) firms. This trend is less clear from the pie chart which suggests that pie charts may be less visually interpretable when the data categories occur with rather similar frequencies. However, the ‘exploded slice’ option can help interpretation in some circumstances.

Certain software programs, such as SPSS, STATGRAPHICS, NCSS and Microsoft Excel, offer the option of generating 3-dimensional bar charts and pie charts and incorporating other ‘bells and whistles’ that can potentially add visual richness to the graphic representation of the data. However, you should generally be careful with these fancier options as they can produce distortions and create ambiguities in interpretation (e.g. see discussions in Jacoby 1997 ; Smithson 2000 ; Wilkinson 2009 ). Such distortions and ambiguities could ultimately end up providing misinformation to researchers as well as to those who read their research.

Histograms and Frequency Polygons

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are essentially continuous (interval or ratio level of measurement) in nature. Both histograms and frequency polygons use vertical and horizontal axes to summarise the data. The vertical axis is used to represent the frequency (number) of occurrence or the relative frequency (percentage) of occurrences; the horizontal axis is used for the data values or ranges of values of interest. The histogram uses bars of varying heights to depict frequency; the frequency polygon uses lines and points.

There is a visual difference between a histogram and a bar chart: the bar chart uses bars that do not physically touch, signifying the discrete and categorical nature of the data, whereas the bars in a histogram physically touch to signal the potentially continuous nature of the data.

Suppose Maree wanted to graphically summarise the distribution of speed scores for the 112 inspectors in the QCI database. Figure 5.3 (produced using NCSS) illustrates a histogram representation of this variable. Figure 5.3 also illustrates another representational device called the ‘density plot’ (the solid tracing line overlaying the histogram) which gives a smoothed impression of the overall shape of the distribution of speed scores. Figure 5.4 (produced using STATGRAPHICS) illustrates the frequency polygon representation for the same data.

Fig. 5.3 Histogram of the speed variable (with density plot overlaid)

Fig. 5.4 Frequency polygon plot of the speed variable

These graphs employ a grouped format where speed scores which fall within specific intervals are counted as being essentially the same score. The shape of the data distribution is reflected in these plots. Each graph tells us that the inspection speed scores are positively skewed with only a few inspectors taking very long times to make their inspection judgments and the majority of inspectors taking rather shorter amounts of time to make their decisions.

Both representations tell a similar story; the choice between them is largely a matter of personal preference. However, if the number of bars to be plotted in a histogram is potentially very large (and this is usually directly controllable in most statistical software packages), then a frequency polygon would be the preferred representation simply because the amount of visual clutter in the graph will be much reduced.

It is somewhat of an art to choose an appropriate definition for the width of the score grouping intervals (or ‘bins’ as they are often termed) to be used in the plot: choose too many and the plot may look too lumpy and the overall distributional trend may not be obvious; choose too few and the plot will be too coarse to give a useful depiction. Programs like SPSS, SYSTAT, STATGRAPHICS and NCSS are designed to choose an ‘appropriate’ number of bins to be used, but the analyst’s eye is often a better judge than any statistical rule that a software package would use.
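The effect of the bin choice can be seen by counting the same scores with different numbers of equal-width bins. The sketch below is a simplified illustration, not the binning rule used by any of the packages named above; the function name `bin_counts` and the speed values are hypothetical, and the data are assumed to contain at least two distinct values.

```python
def bin_counts(values, n_bins):
    """Count values in `n_bins` equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins   # assumes hi > lo
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical, positively skewed inspection times in seconds (not the QCI data)
speed = [1.0, 1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.8, 3.0, 3.5, 4.2, 5.0, 7.5, 9.0]

for n_bins in (2, 4, 8):
    counts = bin_counts(speed, n_bins)
    print(n_bins, counts)
    for c in counts:
        print("   " + "#" * c)   # crude text histogram, one row per bin
```

With two bins the long right tail is nearly invisible; with eight bins on only 14 scores, gaps start to appear. Neither extreme conveys the skew as clearly as an intermediate choice.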

There are several interesting variations of the histogram which can highlight key data features or facilitate interpretation of certain trends in the data. One such variation is a graph called a dual histogram (available in SYSTAT; a variation called a ‘comparative histogram’ can be created in NCSS), which facilitates visual comparison of the frequency distributions for a specific variable for participants from two distinct groups.

Suppose Maree wanted to graphically compare the distributions of speed scores for inspectors in the two categories of education level ( educlev ) in the QCI database. Figure 5.5 shows a dual histogram (produced using SYSTAT) that accomplishes this goal. This graph still employs the grouped format where speed scores falling within particular intervals are counted as being essentially the same score. The shape of the data distribution within each group is also clearly reflected in this plot. However, the story conveyed by the dual histogram is that, while the inspection speed scores are positively skewed for inspectors in both categories of educlev, the comparison suggests that inspectors with a high school level of education (= 1) tend to take slightly longer to make their inspection decisions than do their colleagues who have a tertiary qualification (= 2).

Fig. 5.5 Dual histogram of speed for the two categories of educlev

Line Graphs

The line graph is similar in style to the frequency polygon but is much more general in its potential for summarising data. In a line graph, we seldom deal with percentage or frequency data. Instead we can summarise other types of information about data such as averages or means (see Procedure 5.4 for a discussion of this measure), often for different groups of participants. Thus, one important use of the line graph is to break down scores on a specific variable according to membership in the categories of a second variable.

In the context of the QCI database, Maree might wish to summarise the average inspection accuracy scores for the inspectors from different types of manufacturing companies. Figure 5.6 was produced using SPSS and shows such a line graph.

Fig. 5.6 Line graph comparison of companies in terms of average inspection accuracy

Note how the trend in performance across the different companies becomes clearer with such a visual representation. It appears that the inspectors from the Large Business Computer and PC manufacturing companies have better average inspection accuracy compared to the inspectors from the remaining three industries.
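The values plotted in such a line graph are simply the means of one variable computed within each category of another. A minimal sketch, assuming records arrive as (category, score) pairs; the function name `group_means` and all scores are hypothetical and are not taken from the QCI database.

```python
from collections import defaultdict
from statistics import mean

def group_means(records):
    """Average the scores within each category of a grouping variable."""
    groups = defaultdict(list)
    for category, score in records:
        groups[category].append(score)
    return {category: round(mean(scores), 1)
            for category, scores in groups.items()}

# Hypothetical (company type, accuracy) records
records = [
    ("PC", 88), ("PC", 90), ("PC", 92),
    ("Automobile", 78), ("Automobile", 80), ("Automobile", 85),
]
for company, avg in group_means(records).items():
    print(f"{company:>12}: {avg}")   # one plotted point per category
```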

With many software packages, it is possible to further elaborate a line graph by including error bars or confidence interval bars (see Procedure 10.1007/978-981-15-2537-7_8#Sec18). These give some indication of the precision with which the average level for each category in the population has been estimated (narrow bars signal a more precise estimate; wide bars signal a less precise estimate).

Figure 5.7 shows such an elaborated line graph, using 95% confidence interval bars, which can be used to help make more defensible judgments (compared to Fig. 5.6 ) about whether the companies are substantively different from each other in average inspection performance. Companies whose confidence interval bars do not overlap each other can be inferred to be substantively different in performance characteristics.

Fig. 5.7 Line graph using confidence interval bars to compare accuracy across companies

The accuracy confidence interval bars for participants from the Large Business Computer manufacturing firms do not overlap those from the Large or Small Electrical Appliance manufacturers or the Automobile manufacturers.

We might conclude that quality control inspection accuracy is substantially better in the Large Business Computer manufacturing companies than in these other industries but is not substantially better than the PC manufacturing companies. We might also conclude that inspection accuracy in PC manufacturing companies is not substantially different from Small Electrical Appliance manufacturers.
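The overlap comparison used in this example can be sketched numerically. This is a rough illustration under a stated assumption: the interval uses the large-sample normal multiplier of 1.96, whereas statistical packages typically use a t-based multiplier that gives wider bars for small groups. The function names and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(scores):
    """Approximate 95% confidence interval for a mean.

    ASSUMPTION: uses the normal multiplier 1.96; a t-based multiplier would
    give a somewhat wider interval for groups this small.
    """
    m = mean(scores)
    half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return (m - half_width, m + half_width)

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical accuracy scores for two company types (not the QCI data)
group_a = [88, 90, 91, 89, 92, 90, 91, 89]
group_b = [78, 80, 82, 79, 81, 80, 77, 83]

ci_a, ci_b = ci95(group_a), ci95(group_b)
print(ci_a, ci_b)
print("overlap:", intervals_overlap(ci_a, ci_b))
```

Non-overlapping bars are a conservative visual cue. Overlapping bars do not by themselves show that two groups are the same, so the formal tests referred to elsewhere in the book remain the basis for firm conclusions.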

Scatterplots

Scatterplots are useful in displaying the relationship between two interval- or ratio-scaled variables or measures of interest obtained on the same individuals, particularly in correlational research (see Fundamental Concept 10.1007/978-981-15-2537-7_6#Sec1 and Procedure 10.1007/978-981-15-2537-7_6#Sec4).

In a scatterplot, one variable is chosen to be represented on the horizontal axis; the second variable is represented on the vertical axis. In this type of plot, all data point pairs in the sample are graphed. The shape and tilt of the cloud of points in a scatterplot provide visual information about the strength and direction of the relationship between the two variables. A very compact elliptical cloud of points signals a strong relationship; a very loose or nearly circular cloud signals a weak or non-existent relationship. A cloud of points generally tilted upward toward the right side of the graph signals a positive relationship (higher scores on one variable associated with higher scores on the other and vice-versa). A cloud of points generally tilted downward toward the right side of the graph signals a negative relationship (higher scores on one variable associated with lower scores on the other and vice-versa).
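The tilt and compactness of the cloud have a numerical counterpart in the correlation coefficient, which is treated properly in the later correlational procedures. It is used here only as a hedged illustration of ‘direction’ and ‘strength’. The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: the sign gives the tilt, the size the compactness."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores (not the QCI data)
speed = [1, 2, 3, 4, 5]
accuracy_up = [70, 74, 79, 85, 90]      # cloud tilted upward to the right
accuracy_down = [90, 85, 79, 74, 70]    # cloud tilted downward to the right

print(round(pearson_r(speed, accuracy_up), 3))    # positive value
print(round(pearson_r(speed, accuracy_down), 3))  # negative value
```

A value near +1 or -1 corresponds to a compact, elongated cloud; a value near 0 corresponds to a loose or roughly circular one. A single coefficient cannot describe a fan-shaped pattern, which is one reason the plot itself remains worth inspecting.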

Maree might be interested in displaying the relationship between inspection accuracy and inspection speed in the QCI database. Figure 5.8, produced using SPSS, shows what such a scatterplot might look like. Several characteristics of the data for these two variables can be noted in Fig. 5.8. The shape of the distribution of data points is evident. The plot has a fan-shaped characteristic to it which indicates that accuracy scores are highly variable (exhibit a very wide range of possible scores) at very fast inspection speeds but get much less variable and tend to be somewhat higher as inspection speed scores increase (where inspectors take longer to make their quality control decisions). Thus, there does appear to be some relationship between inspection accuracy and inspection speed: a weak positive relationship, since the cloud of points tends to be very loose but tilted generally upward toward the right side of the graph, with slower speeds tending to be slightly associated with higher accuracy.

Fig. 5.8 Scatterplot relating inspection accuracy to inspection speed

However, it is not the case that the inspection decisions which take longest to make are necessarily the most accurate (see the labelled points for inspectors 7 and 62 in Fig. 5.8 ). Thus, Fig. 5.8 does not show a simple relationship that can be unambiguously summarised by a statement like “the longer an inspector takes to make a quality control decision, the more accurate that decision is likely to be”. The story is more complicated.

Some software packages, such as SPSS, STATGRAPHICS and SYSTAT, offer the option of using different plotting symbols or markers to represent the members of different groups so that the relationship between the two focal variables (the ones anchoring the X and Y axes) can be clarified with reference to a third categorical measure.

Maree might want to see if the relationship depicted in Fig. 5.8 changes depending upon whether the inspector was tertiary-qualified or not (this information is represented in the educlev variable of the QCI database).

Figure 5.9 shows what such a modified scatterplot might look like; the legend in the upper corner of the figure defines the marker symbols for each category of the educlev variable. Note that for both High School only-educated inspectors and Tertiary-qualified inspectors, the general fan-shaped relationship between accuracy and speed is the same. However, it appears that the distribution of points for the High School only-educated inspectors is shifted somewhat upward and toward the right of the plot suggesting that these inspectors tend to be somewhat more accurate as well as slower in their decision processes.

Fig. 5.9 Scatterplot displaying accuracy vs speed conditional on educlev group

There are many other styles of graphs available, often dependent upon the specific statistical package you are using. Interestingly, NCSS and, particularly, SYSTAT and STATGRAPHICS, appear to offer the most variety in terms of types of graphs available for visually representing data. A reading of the user’s manuals for these programs (see the Useful additional readings) would expose you to the great diversity of plotting techniques available to researchers. Many of these techniques go by rather interesting names such as: Chernoff’s faces, radar plots, sunflower plots, violin plots, star plots, Fourier blobs, and dot plots.

These graphical methods provide summary techniques for visually presenting certain characteristics of a set of data. Visual representations are generally easier to understand than a tabular representation and when these plots are combined with available numerical statistics, they can give a very complete picture of a sample of data. Newer methods have become available which permit more complex representations to be depicted, opening possibilities for creatively visually representing more aspects and features of the data (leading to a style of visual data storytelling called infographics ; see, for example, McCandless 2014 ; Toseland and Toseland 2012 ). Many of these newer methods can display data patterns from multiple variables in the same graph (several of these newer graphical methods are illustrated and discussed in Procedure 5.3 ).

Graphs tend to be cumbersome and space consuming if a great many variables need to be summarised. In such cases, using numerical summary statistics (such as means or correlations) in tabular form alone will provide a more economical and efficient summary. Also, it can be very easy to give a misleading picture of data trends using graphical methods by simply choosing the ‘correct’ scaling for maximum effect or choosing a display option (such as a 3-D effect) that ‘looks’ presentable but which actually obscures a clear interpretation (see Smithson 2000 ; Wilkinson 2009 ).

Thus, you must be careful in creating and interpreting visual representations so that aesthetic choices made for the sake of appearance do not become more important than obtaining a faithful and valid representation of the data. This is a very real danger with many of today’s statistical packages where ‘default’ drawing options have been pre-programmed in. No single plot can completely summarise all possible characteristics of a sample of data. Thus, choosing a specific method of graphical display may, of necessity, force a behavioural researcher to represent certain data characteristics (such as frequency) at the expense of others (such as averages).

Virtually any research design which produces quantitative data and statistics (even to the extent of just counting the number of occurrences of several events) provides opportunities for graphical data display which may help to clarify or illustrate important data characteristics or relationships. Remember, graphical displays are communication tools just like numbers—which tool to choose depends upon the message to be conveyed. Visual representations of data are generally more useful in communicating to lay persons who are unfamiliar with statistics. Care must be taken though as these same lay people are precisely the people most likely to misinterpret a graph if it has been incorrectly drawn or scaled.

Procedure 5.3: Multivariate Graphs & Displays

Graphical methods for displaying multivariate data (i.e. many variables at once) include scatterplot matrices, radar (or spider) plots, multiplots, parallel coordinate displays, and icon plots. Multivariate graphs are useful for visualising broad trends and patterns across many variables (Cleveland 1995 ; Jacoby 1998 ). Such graphs typically sacrifice precision in representation in favour of a snapshot pictorial summary that can help you form general impressions of data patterns.

It is important to note that what is presented here is a small but reasonably representative sampling of the types of graphs one can produce to summarise and display trends in multivariate data. Generally speaking, SYSTAT offers the best facilities for producing multivariate graphs, followed by STATGRAPHICS, but with the drawback that it is somewhat tricky to get the graphs in exactly the form you want. SYSTAT also has excellent facilities for creating new forms and combinations of graphs – essentially allowing graphs to be tailor-made for a specific communication purpose. Both SPSS and NCSS offer a more limited range of multivariate graphs, generally restricted to scatterplot matrices and variations of multiplots. Microsoft Excel or STATGRAPHICS are the packages to use if radar or spider plots are desired.

Scatterplot Matrices

A scatterplot matrix is a useful multivariate graph designed to show relationships between pairs of many variables in the same display.

Figure 5.10 illustrates a scatterplot matrix, produced using SYSTAT, for the mentabil, accuracy, speed, jobsat and workcond variables in the QCI database. It is easy to see that all the scatterplot matrix does is stack all pairs of scatterplots into a format where it is easy to pick out the graph for any ‘row’ variable that intersects a ‘column’ variable.

Fig. 5.10 Scatterplot matrix relating mentabil, accuracy, speed, jobsat & workcond

In those plots where a ‘row’ variable intersects itself in a column of the matrix (along the so-called ‘diagonal’), SYSTAT permits a range of univariate displays to be shown. Figure 5.10 shows univariate histograms for each variable (recall Procedure 5.2). One obvious drawback of the scatterplot matrix is that, if many variables are to be displayed (say ten or more), the graph gets very crowded and becomes very hard to visually appreciate.
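A quick count shows why crowding sets in so fast. The sketch below assumes a full square layout, with every variable crossed with every other and univariate displays on the diagonal; the function name `matrix_layout` is hypothetical.

```python
from itertools import combinations

def matrix_layout(variables):
    """Count the panels in a full square scatterplot matrix."""
    k = len(variables)
    pairs = list(combinations(variables, 2))   # each distinct pair of variables
    return {
        "panels": k * k,               # every row variable crossed with every column variable
        "diagonal": k,                 # univariate displays (e.g. histograms)
        "distinct_pairs": len(pairs),  # each pair appears twice, mirrored across the diagonal
    }

qci_vars = ["mentabil", "accuracy", "speed", "jobsat", "workcond"]
print(matrix_layout(qci_vars))                       # the five variables in Fig. 5.10
print(matrix_layout([f"v{i}" for i in range(10)]))   # ten hypothetical variables
```

Five variables give 25 panels; ten variables give 100 panels squeezed into the same display area, so each individual scatterplot becomes very small.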

Looking at the first column of graphs in Fig. 5.10 , we can see the scatterplot relationships between mentabil and each of the other variables. We can get a visual impression that mentabil seems to be slightly negatively related to accuracy (the cloud of scatter points tends to angle downward to the right, suggesting, very slightly, that higher mentabil scores are associated with lower levels of accuracy ).

Conversely, the visual impression of the relationship between mentabil and speed is that the relationship is slightly positive (higher mentabil scores tend to be associated with higher speed scores = longer inspection times). Similar types of visual impressions can be formed for other parts of Fig. 5.10 . Notice that the histogram plots along the diagonal give a clear impression of the shape of the distribution for each variable.
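To see why a scatterplot matrix becomes crowded so quickly, the following Python sketch counts the panels in the grid. The function name scatterplot_matrix_layout is hypothetical and used only for illustration; it assumes the square layout described above, with a univariate display wherever a variable intersects itself on the diagonal.

```python
from itertools import product

def scatterplot_matrix_layout(variables):
    """Return (row, column, kind) for every panel in a square scatterplot matrix."""
    panels = []
    for row_var, col_var in product(variables, repeat=2):
        # A variable crossed with itself lies on the diagonal and gets a univariate display.
        kind = "univariate" if row_var == col_var else "scatterplot"
        panels.append((row_var, col_var, kind))
    return panels

qci_vars = ["mentabil", "accuracy", "speed", "jobsat", "workcond"]
panels = scatterplot_matrix_layout(qci_vars)
n_scatter = sum(1 for _, _, kind in panels if kind == "scatterplot")
print(len(panels), n_scatter)  # 25 panels in total, 20 of them scatterplots
```

With ten variables the same layout would need 100 panels, 90 of them scatterplots, which is consistent with the crowding problem noted above.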

Radar Plots

The radar plot (also known as a spider graph for obvious reasons) is a simple and effective device for displaying scores on many variables. Microsoft Excel offers a range of options and capabilities for producing radar plots, such as the plot shown in Fig. 5.11 . Radar plots are generally easy to interpret and provide a good visual basis for comparing plots from different individuals or groups, even if a fairly large number of variables (say, up to about 25) are being displayed. Like a clock face, variables are evenly spaced around the centre of the plot in clockwise order starting at the 12 o’clock position. Visual interpretation of a radar plot primarily relies on shape comparisons, i.e. the rise and fall of peaks and valleys along the spokes around the plot. Valleys near the centre display low scores on specific variables, peaks near the outside of the plot display high scores on specific variables. [Note that, technically, radar plots employ polar coordinates.] SYSTAT can draw graphs using polar coordinates but not as easily as Excel can, from the user’s perspective. Radar plots work best if all the variables represented are measured on the same scale (e.g. a 1 to 7 Likert-type scale or 0% to 100% scale). Individuals who are missing any scores on the variables being plotted are typically omitted.

Radar plot comparing attitude ratings for inspectors 66 and 104

The radar plot in Fig. 5.11 , produced using Excel, compares two specific inspectors, 66 and 104, on the nine attitude rating scales. Inspector 66 gave the highest rating (= 7) on the cultqual variable and inspector 104 gave the lowest rating (= 1). The plot shows that inspector 104 tended to provide very low ratings on all nine attitude variables, whereas inspector 66 tended to give very high ratings on all variables except acctrain and trainapp , where the scores were similar to those for inspector 104. Thus, in general, inspector 66 tended to show much more positive attitudes toward their workplace compared to inspector 104.

While Fig. 5.11 was generated to compare the scores for two individuals in the QCI database, it would be just as easy to produce a radar plot that compared the five types of companies in terms of their average ratings on the nine variables, as shown in Fig. 5.12 .

Radar plot comparing average attitude ratings for five types of company

Here we can form the visual impression that the five types of companies differ most in their average ratings of mgmtcomm and least in the average ratings of polsatis . Overall, the average ratings from inspectors from PC manufacturers (black diamonds with solid lines) seem to be generally the most positive as their scores lie on or near the outer ring of scores and those from Automobile manufacturers tend to be least positive on many variables (except the training-related variables).

Extrapolating from Fig. 5.12 , you may rightly conclude that including too many groups and/or too many variables in a radar plot comparison can lead to so much clutter that any visual comparison would be severely degraded. You may have to experiment with using colour-coded lines to represent different groups versus line and marker shape variations (as used in Fig. 5.12 ), because choice of coding method for groups can influence the interpretability of a radar plot.
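The clock-face layout of a radar plot can be sketched in a few lines of Python. This is only an illustration of the polar-coordinate idea, not how Excel or SYSTAT draw their plots; the function name radar_vertices and the example ratings are hypothetical, and the sketch assumes all variables share one scale whose maximum is max_score.

```python
import math

def radar_vertices(scores, max_score):
    """Return (x, y) points for one profile on a radar plot.

    Spokes are evenly spaced, starting at 12 o'clock and moving clockwise.
    Distance from the centre is the score as a fraction of max_score.
    """
    k = len(scores)
    points = []
    for i, score in enumerate(scores):
        angle = 2 * math.pi * i / k   # clockwise angle measured from 12 o'clock
        radius = score / max_score    # low scores sit near the centre
        points.append((radius * math.sin(angle), radius * math.cos(angle)))
    return points

# Hypothetical ratings on four 7-point scales (not QCI data).
pts = radar_vertices([7, 7, 1, 4], max_score=7)
for x, y in pts:
    print(round(x, 3), round(y, 3))
```

Joining the returned points in order, and closing the shape back to the first point, gives the peaks-and-valleys outline that is compared visually between individuals or groups.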

Multiplots

A multiplot is simply a hybrid style of graph that can display group comparisons across a number of variables. There is a wide variety of possible multiplots one could potentially design (SYSTAT offers great capabilities with respect to multiplots). Figure 5.13 shows a multiplot comprising a side-by-side series of profile-based line graphs – one graph for each type of company in the QCI database.

Multiplot comparing profiles of average attitude ratings for five company types

The multiplot in Fig. 5.13 , produced using SYSTAT, graphs the profile of average attitude ratings for all inspectors within a specific type of company. This multiplot shows the same story as the radar plot in Fig. 5.12 , but in a different graphical format. It is still fairly clear that the average ratings from inspectors from PC manufacturers tend to be higher than for the other types of companies and the profile for inspectors from automobile manufacturers tends to be lower than for the other types of companies.

The profile for inspectors from large electrical appliance manufacturers is the flattest, meaning that their average attitude ratings were less variable than for other types of companies. Comparing the ease with which you can glean the visual impressions from Figs. 5.12 and 5.13 may lead you to prefer one style of graph over another. If you have such preferences, chances are others will also, which may mean you need to carefully consider your options when deciding how best to display data for effect.

Frequently, choice of graph is less a matter of which style is right or wrong, but more a matter of which style will suit specific purposes or convey a specific story, i.e. the choice is often strategic.

Parallel Coordinate Displays

A parallel coordinate display is useful for displaying individual scores on a range of variables, all measured using the same scale. Furthermore, such graphs can be combined side-by-side to facilitate very broad visual comparisons among groups, while retaining individual profile variability in scores. Each line in a parallel coordinate display represents one individual, e.g. an inspector.

The interpretation of a parallel coordinate display, such as the two shown in Fig. 5.14 , depends on visual impressions of the peaks and valleys (highs and lows) in the profiles as well as on the density of similar profile lines. The graph is called ‘parallel coordinate’ simply because it assumes that all variables are measured on the same scale and that scores for each variable can therefore be located along vertical axes that are parallel to each other (imagine vertical lines on Fig. 5.14 running from bottom to top for each variable on the X-axis). The main drawback of this method of data display is that only those individuals in the sample who provided legitimate scores on all of the variables being plotted (i.e. who have no missing scores) can be displayed.

Parallel coordinate displays comparing individual attitude rating profiles for inspectors from two company types

The parallel coordinate display in Fig. 5.14, produced using SYSTAT, graphs the individual attitude rating profiles for all inspectors within two specific types of company: the left graph for inspectors from PC manufacturers and the right graph for automobile manufacturers.

There are fewer lines in each display than the number of inspectors from each type of company simply because several inspectors from each type of company were missing a rating on at least one of the nine attitude variables. The graphs show great variability in scores amongst inspectors within a company type, but there are some overall patterns evident.
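The reason for the missing lines can be illustrated with a short Python sketch of complete-case filtering. The records below are hypothetical (they are not QCI data), None is assumed to mark a missing rating, and the helper name complete_cases is introduced only for this example.

```python
def complete_cases(records, variables):
    """Keep only the records with a non-missing score on every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]

# Hypothetical ratings; None marks a missing score.
inspectors = [
    {"id": 1, "mgmtcomm": 2, "acctrain": 5, "trainapp": 6},
    {"id": 2, "mgmtcomm": None, "acctrain": 4, "trainapp": 5},
    {"id": 3, "mgmtcomm": 3, "acctrain": 6, "trainapp": None},
    {"id": 4, "mgmtcomm": 1, "acctrain": 5, "trainapp": 5},
]

plotted = complete_cases(inspectors, ["mgmtcomm", "acctrain", "trainapp"])
print([r["id"] for r in plotted])  # [1, 4]
```

Only two of the four hypothetical inspectors would contribute a line to the display, even though each of the excluded inspectors is missing just one rating.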

For example, inspectors from automobile companies clearly and fairly uniformly rated mgmtcomm toward the low end of the scale, whereas the reverse was generally true for that variable for inspectors from PC manufacturers. Conversely, inspectors from automobile companies tend to rate acctrain and trainapp more toward the middle to high end of the scale, whereas the reverse is generally true for those variables for inspectors from PC manufacturers.

Icon Plots

Perhaps the most creative types of multivariate displays are the so-called icon plots. SYSTAT and STATGRAPHICS offer an impressive array of different types of icon plots, including, amongst others, Chernoff’s faces, profile plots, histogram plots, star glyphs and sunray plots (Jacoby 1998 provides a detailed discussion of icon plots).

Icon plots generally use a specific visual construction to represent the variable scores obtained by each individual within a sample or group. All icon plots are thus methods for displaying the response patterns for individual members of a sample, as long as those individuals are not missing any scores on the variables to be displayed (note that this is the same limitation as for radar plots and parallel coordinate displays). To illustrate icon plots, without generating too many icons to focus on, Figs. 5.15, 5.16, 5.17 and 5.18 present four different icon plots for QCI inspectors classified, using a new variable called BEST_WORST, as either the worst performers (= 1 where their accuracy scores were less than 70%) or the best performers (= 2 where their accuracy scores were 90% or greater).
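The BEST_WORST classification rule can be written out as a small Python function. This is a sketch of the rule as stated in the text, not the procedure actually used to create the variable; the function name best_worst and the use of None for unclassified or missing cases are assumptions made for illustration.

```python
def best_worst(accuracy):
    """Classify an accuracy percentage: 1 = worst performer, 2 = best performer.

    Returns None for inspectors who fall in neither group or have no score.
    """
    if accuracy is None:
        return None
    if accuracy < 70:
        return 1  # worst performers: accuracy below 70%
    if accuracy >= 90:
        return 2  # best performers: accuracy of 90% or greater
    return None   # middle group, not included in the icon plots

print([best_worst(a) for a in (57, 70, 89, 90, 100)])  # [1, None, None, 2, 2]
```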

Chernoff’s faces icon plot comparing individual attitude ratings for best and worst performing inspectors

Profile plot comparing individual attitude ratings for best and worst performing inspectors

Histogram plot comparing individual attitude ratings for best and worst performing inspectors

Sunray plot comparing individual attitude ratings for best and worst performing inspectors

The Chernoff’s faces plot gets its name from the visual icon used to represent variable scores – a cartoon-type face. This icon tries to capitalise on our natural human ability to recognise and differentiate faces. Each feature of the face is controlled by the scores on a single variable. In SYSTAT, up to 20 facial features are controllable; the first five being curvature of mouth, angle of brow, width of nose, length of nose and length of mouth (SYSTAT Software Inc., 2009 , p. 259). The theory behind Chernoff’s faces is that similar patterns of variable scores will produce similar looking faces, thereby making similarities and differences between individuals more apparent.

The profile plot and histogram plot are actually two variants of the same type of icon plot. A profile plot represents individuals’ scores for a set of variables using simplified line graphs, one per individual. The profile is scaled so that the vertical height of the peaks and valleys corresponds to actual values for the variables, where the variables anchor the X-axis in a fashion similar to the parallel coordinate display. So, as you examine a profile from left to right across the X-axis of each graph, you are looking across the set of variables. A histogram plot represents the same information in the same way as the profile plot, but using histogram bars instead.

Figure 5.15 , produced using SYSTAT, shows a Chernoff’s faces plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine general attitude statements.

Each face is labelled with the inspector number it represents. The gaps indicate where an inspector had missing data on at least one of the variables, meaning a face could not be generated for them. The worst performers are drawn using red lines; the best using blue lines. The first variable is jobsat and this variable controls mouth curvature; the second variable is workcond and this controls angle of brow, and so on. It seems clear that there are differences in the faces between the best and worst performers with, for example, best performers tending to be more satisfied (smiling) and with higher ratings for working conditions (brow angle).

Beyond a broad visual impression, there is little in terms of precise inferences you can draw from a Chernoff’s faces plot. It really provides a visual sketch, nothing more. The fact that there is no obvious link between facial features, variables and score levels means that the Chernoff’s faces icon plot is difficult to interpret at the level of individual variables – a holistic impression of similarity and difference is what this type of plot facilitates.

Figure 5.16, produced using SYSTAT, shows a profile plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine attitude variables.

Like the Chernoff’s faces plot (Fig. 5.15), as you read across the rows of the plot from left to right, each plot corresponds respectively to an inspector in the sample who was either in the worst performer (red) or best performer (blue) category. The first attitude variable is jobsat and anchors the left end of each line graph; the last variable is polsatis and anchors the right end of the line graph. The remaining variables are represented in order from left to right across the X-axis of each graph. Figure 5.16 shows that these inspectors are rather different in their attitude profiles, with best performers tending to show taller profiles on the first two variables, for example.

Figure 5.17, produced using SYSTAT, shows a histogram plot for the best and worst performing inspectors based on their ratings of job satisfaction, working conditions and the nine attitude variables. This plot tells the same story as the profile plot, only using histogram bars. Some people would prefer the histogram icon plot to the profile plot because each histogram bar corresponds to one variable, making the visual linking of a specific bar to a specific variable much easier than visually linking a specific position along the profile line to a specific variable.

The sunray plot is actually a simplified adaptation of the radar plot (called a “star glyph”) used to represent scores on a set of variables for each individual within a sample or group. Remember that a radar plot basically arranges the variables around a central point like a clock face; the first variable is represented at the 12 o’clock position and the remaining variables follow around the plot in a clockwise direction.

Unlike a radar plot, while the spokes (the actual ‘star’ of the glyph’s name) of the plot are visible, no interpretive scale is evident. A variable’s score is visually represented by its distance from the central point. Thus, the star glyphs in a sunray plot are designed, like Chernoff’s faces, to provide a general visual impression, based on icon shape. A wide diameter well-rounded plot indicates an individual with high scores on all variables and a small diameter well-rounded plot vice-versa. Jagged plots represent individuals with highly variable scores across the variables. ‘Stars’ of similar size, shape and orientation represent similar individuals.

Figure 5.18 , produced using STATGRAPHICS, shows a sunray plot for the best and worst performing inspectors. An interpretation glyph is also shown in the lower right corner of Fig. 5.18 , where variables are aligned with the spokes of a star (e.g. jobsat is at the 12 o’clock position). This sunray plot could lead you to form the visual impression that the worst performing inspectors (group 1) have rather less rounded rating profiles than do the best performing inspectors (group 2) and that the jobsat and workcond spokes are generally lower for the worst performing inspectors.

Comparatively speaking, the sunray plot makes identifying similar individuals a bit easier (perhaps even easier than Chernoff’s faces) and, when ordered as STATGRAPHICS showed in Fig. 5.18 , permits easier visual comparisons between groups of individuals, but at the expense of precise knowledge about variable scores. Remember, a holistic impression is the goal pursued using a sunray plot.

Multivariate graphical methods provide summary techniques for visually presenting certain characteristics of a complex array of data on variables. Such visual representations are generally better than any sort of tabular representation or numerical index at helping us to form holistic impressions of multivariate data. They also allow us to compress many numerical measures into a finite representation that is generally easy to understand. Multivariate graphical displays can add interest to an otherwise dry statistical reporting of numerical data. They are designed to appeal to our pattern recognition skills, focusing our attention on features of the data such as shape, level, variability and orientation. Some multivariate graphs (e.g. radar plots, sunray plots and multiplots) are useful not only for representing score patterns for individuals but also for providing summaries of score patterns across groups of individuals.

Multivariate graphs tend to get very busy-looking and are hard to interpret if a great many variables or a large number of individuals need to be displayed (imagine any of the icon plots, for a sample of 200 questionnaire participants, displayed on an A4 page – each icon would be so small that its features could not be easily distinguished, thereby defeating the purpose of the display). In such cases, using numerical summary statistics (such as averages or correlations) in tabular form alone will provide a more economical and efficient summary. Also, some multivariate displays will work better for conveying certain types of information than others.

Information about variable relationships may be better displayed using a scatterplot matrix. Information about individual similarities and differences on a set of variables may be better conveyed using a histogram or sunray plot. Multiplots may be better suited to displaying information about group differences across a set of variables. Information about the overall similarity of individual entities in a sample might best be displayed using Chernoff’s faces.

Because people differ greatly in their visual capacities and preferences, certain types of multivariate displays will work for some people and not others. Sometimes, people will not see what you see in the plots. Some plots, such as Chernoff’s faces, may not strike a reader as a serious statistical procedure and this could adversely influence how convinced they will be by the story the plot conveys. None of the multivariate displays described here provide sufficiently precise information for solid inferences or interpretations; all are designed to simply facilitate the formation of holistic visual impressions. In fact, you may have noticed that some displays (scatterplot matrices and the icon plots, for example) provide no numerical scaling information that would help make precise interpretations. If precision in summary information is desired, the types of multivariate displays discussed here would not be the best strategic choices.

Virtually any research design which produces quantitative data/statistics for multiple variables provides opportunities for multivariate graphical data display which may help to clarify or illustrate important data characteristics or relationships. Thus, for survey research involving many identically-scaled attitudinal questions, a multivariate display may be just the device needed to communicate something about patterns in the data. Multivariate graphical displays are simply specialised communication tools designed to compress a lot of information into a meaningful and efficient format for interpretation—which tool to choose depends upon the message to be conveyed.

Generally speaking, visual representations of multivariate data could prove more useful in communicating to lay persons who are unfamiliar with statistics or who prefer visual as opposed to numerical information. However, these displays would probably require some interpretive discussion so that the reader clearly understands their intent.

Procedure 5.4: Assessing Central Tendency

The three most commonly reported measures of central tendency are the mean, median and mode. Each measure reflects a specific way of defining central tendency in a distribution of scores on a variable and each has its own advantages and disadvantages.

The mean is the most widely used measure of central tendency (also called the arithmetic average). Very simply, a mean is the sum of all the scores for a specific variable in a sample divided by the number of scores used in obtaining the sum. The resulting number reflects the average score for the sample of individuals on which the scores were obtained. If one were asked to predict the score that any single individual in the sample would obtain, the best prediction, in the absence of any other relevant information, would be the sample mean. Many parametric statistical methods (such as Procedures 10.1007/978-981-15-2537-7_7#Sec22 , 10.1007/978-981-15-2537-7_7#Sec32 , 10.1007/978-981-15-2537-7_7#Sec42 and 10.1007/978-981-15-2537-7_7#Sec68) deal with sample means in one way or another. For any sample of data, there is one and only one possible value for the mean in a specific distribution. For most purposes, the mean is the preferred measure of central tendency because it utilises all the available information in a sample.
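As a minimal worked example of the definition above, the following Python sketch sums the valid scores and divides by the number of scores used. The ratings are hypothetical (not QCI data), the helper name sample_mean is introduced only for illustration, and None is assumed to mark a missing score that is excluded from both the sum and the count.

```python
from statistics import mean

def sample_mean(scores):
    """Sum of the valid scores divided by the number of valid scores."""
    valid = [s for s in scores if s is not None]  # missing scores are left out
    return sum(valid) / len(valid)

# Hypothetical ratings on a 7-point scale; one inspector gave no rating.
ratings = [5, 6, None, 4, 7, 3]
print(sample_mean(ratings))  # 5.0, i.e. (5 + 6 + 4 + 7 + 3) / 5

# The standard library gives the same answer once missing scores are removed.
print(mean(s for s in ratings if s is not None))
```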

In the context of the QCI database, Maree could quite reasonably ask what inspectors scored on the average in terms of mental ability ( mentabil ), inspection accuracy ( accuracy ), inspection speed ( speed ), overall job satisfaction ( jobsat ), and perceived quality of their working conditions ( workcond ). Table 5.3 shows the mean scores for the sample of 112 quality control inspectors on each of these variables. The statistics shown in Table 5.3 were computed using the SPSS Frequencies ... procedure. Notice that the table indicates how many of the 112 inspectors had a valid score for each variable and how many were missing a score (e.g. 109 inspectors provided a valid rating for jobsat; 3 inspectors did not).

Measures of central tendency for specific QCI variables

Each mean needs to be interpreted in terms of the original units of measurement for each variable. Thus, the inspectors in the sample showed an average mental ability score of 109.84 (higher than the general population mean of 100 for the test), an average inspection accuracy of 82.14%, and an average speed for making quality control decisions of 4.48 s. Furthermore, in terms of their work context, inspectors reported an average overall job satisfaction of 4.96 (on the 7-point scale), a level of satisfaction nearly one full scale point above the Neutral point of 4, indicating a generally positive but not strong level of job satisfaction. They also reported an average perceived quality of work conditions of 4.21 (on the 7-point scale), which is just about at the level of Stressful but Tolerable.

The mean is sensitive to the presence of extreme values, which can distort its value, giving a biased indication of central tendency. As we will see below, the median is an alternative statistic to use in such circumstances. However, it is also possible to compute what is called a trimmed mean where the mean is calculated after a certain percentage (say, 5% or 10%) of the lowest and highest scores in a distribution have been ignored (a process called ‘trimming’; see, for example, the discussion in Field 2018 , pp. 262–264). This yields a statistic less influenced by extreme scores. The drawbacks are that the decision as to what percentage to trim can be somewhat subjective and trimming necessarily sacrifices information (i.e. the extreme scores) in order to achieve a less biased measure. Some software packages, such as SPSS, SYSTAT or NCSS, can report a specific percentage trimmed mean, if that option is selected for descriptive statistics or exploratory data analysis (see Procedure 5.6 ) procedures. Comparing the original mean with a trimmed mean can provide an indication of the degree to which the original mean has been biased by extreme values.
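A trimmed mean can be sketched as follows. This is one simple convention, not necessarily the exact algorithm used by SPSS, SYSTAT or NCSS: the sketch assumes the stated proportion is removed from each tail and that the number of scores trimmed is rounded down. The function name trimmed_mean and the decision times are hypothetical.

```python
def trimmed_mean(scores, proportion):
    """Mean after dropping `proportion` of the scores from each end of the distribution."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)   # scores trimmed from each tail (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical decision times in seconds, with one very slow decision.
times = [1.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 17.0]
print(trimmed_mean(times, 0.0))   # ordinary mean, about 4.45
print(trimmed_mean(times, 0.10))  # 10% trimmed mean, 3.3125
```

The gap between the two values gives a rough indication of how far the single extreme time has pulled the ordinary mean upward.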

Very simply, the median is the centre or middle score of a set of scores. ‘Centre’ or ‘middle’ means that, when the entire distribution of scores is rank ordered from the lowest to the highest value, 50% of the data values are smaller than or equal to the median and 50% of the data values are larger. Thus, we can say that the median is that score in the sample which occurs at the 50th percentile. [Note that a ‘percentile’ is attached to a specific score that a specific percentage of the sample scored at or below. Thus, a score at the 25th percentile means that 25% of the sample achieved this score or a lower score.] Table 5.3 shows the 25th, 50th and 75th percentile scores for each variable – note how the 50th percentile score is exactly equal to the median in each case.
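The link between the median and the 50th percentile can be checked with Python's standard statistics module (quantiles requires Python 3.8 or later). The scores below are hypothetical. Note that statistical packages use slightly different interpolation rules for percentiles, so the 25th and 75th percentile values from this sketch may not match SPSS output exactly for the same data.

```python
from statistics import median, quantiles

# Hypothetical scores, not QCI data.
scores = [2, 3, 3, 4, 5, 5, 6, 7, 9]

# n=4 splits the ordered scores into quarters and returns the three cut points:
# the 25th, 50th and 75th percentiles.
p25, p50, p75 = quantiles(scores, n=4)

print(p25, p50, p75)   # 3.0 5.0 6.5
print(median(scores))  # 5, the same value as the 50th percentile
```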

The median is reported somewhat less frequently than the mean but does have some advantages over the mean in certain circumstances. One such circumstance is when the sample of data has a few extreme values in one direction (either very large or very small relative to all other scores). In this case, the mean would be influenced (biased) to a much greater degree than would the median since all of the data are used to calculate the mean (including the extreme scores) whereas only the single centre score is needed for the median. For this reason, many nonparametric statistical procedures (such as Procedures 10.1007/978-981-15-2537-7_7#Sec27 , 10.1007/978-981-15-2537-7_7#Sec37 and 10.1007/978-981-15-2537-7_7#Sec63) focus on the median as the comparison statistic rather than on the mean.

A discrepancy between the values for the mean and median of a variable provides some insight into the degree to which the mean is being influenced by the presence of extreme data values. In a distribution where there are no extreme values on either side of the distribution (or where extreme values balance each other out on either side of the distribution, as happens in a normal distribution – see Fundamental Concept II), the mean and the median will coincide at the same value and the mean will not be biased.

For highly skewed distributions, however, the value of the mean will be pulled toward the long tail of the distribution because that is where the extreme values lie. However, in such skewed distributions, the median will be insensitive (statisticians call this property ‘robustness’) to extreme values in the long tail. For this reason, the direction of the discrepancy between the mean and median can give a very rough indication of the direction of skew in a distribution (‘mean larger than median’ signals possible positive skewness; ‘mean smaller than median’ signals possible negative skewness). Like the mean, there is one and only one possible value for the median in a specific distribution.
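The mean-versus-median rule of thumb can be expressed as a short Python sketch. It gives only the very rough indication described above and is not a formal test of skewness; the function name skew_hint, the tolerance parameter and the example data are all hypothetical.

```python
from statistics import mean, median

def skew_hint(scores, tolerance=0.0):
    """Very rough skew direction from the sign of (mean - median)."""
    diff = mean(scores) - median(scores)
    if diff > tolerance:
        return "possible positive skew"   # mean pulled toward a long right tail
    if diff < -tolerance:
        return "possible negative skew"   # mean pulled toward a long left tail
    return "no clear skew"

# Hypothetical decision times with one very slow decision in the right tail.
times = [1.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 4.5, 17.0]
print(mean(times), median(times))  # the mean (about 4.45) exceeds the median (3.25)
print(skew_hint(times))            # possible positive skew
```

Setting tolerance above zero is one way to avoid reading meaning into a trivially small difference, such as a mean of 4.96 against a median of 5.00.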

In Fig. 5.19 , the left graph shows the distribution of speed scores and the right-hand graph shows the distribution of accuracy scores. The speed distribution clearly shows the mean being pulled toward the right tail of the distribution whereas the accuracy distribution shows the mean being just slightly pulled toward the left tail. The effect on the mean is stronger in the speed distribution indicating a greater biasing effect due to some very long inspection decision times.

Effects of skewness in a distribution on the values for the mean and median

If we refer to Table 5.3 , we can see that the median score for each of the five variables has also been computed. Like the mean, the median must be interpreted in the original units of measurement for the variable. We can see that for mentabil , accuracy , and workcond , the value of the median is very close to the value of the mean, suggesting that these distributions are not strongly influenced by extreme data values in either the high or low direction. However, note that the median speed was 3.89 s compared to the mean of 4.48 s, suggesting that the distribution of speed scores is positively skewed (the mean is larger than the median—refer to Fig. 5.19 ). Conversely, the median jobsat score was 5.00 whereas the mean score was 4.96 suggesting very little substantive skewness in the distribution (mean and median are nearly equal).

The mode is the simplest measure of central tendency. It is defined as the most frequently occurring score in a distribution. Put another way, it is the score that more individuals in the sample obtain than any other score. An interesting problem associated with the mode is that there may be more than one in a specific distribution. In the case where multiple modes exist, the issue becomes which value do you report? The answer is that you must report all of them. In a ‘normal’ bell-shaped distribution, there is only one mode and it is indeed at the centre of the distribution, coinciding with both the mean and the median.

Table 5.3 also shows the mode for each of the five variables. For example, inspectors achieved a mentabil score of 111 more often than any other score and reported a jobsat rating of 6 more often than any other rating. SPSS only ever reports one mode even if several are present, so one must be careful and look at a histogram plot for each variable to make a final determination of the mode(s) for that variable.
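When more than one mode may be present, it helps to list every score that ties for the highest frequency. The sketch below uses multimode from Python's standard statistics module (available from Python 3.8); the ratings are hypothetical, not QCI data.

```python
from collections import Counter
from statistics import multimode

# Hypothetical ratings on a 7-point scale in which two values tie for most frequent.
ratings = [6, 4, 6, 5, 4, 7, 3]

modes = sorted(multimode(ratings))  # every score that ties for the highest frequency
print(modes)                        # [4, 6]

# The frequency count behind the result: 6 and 4 each occur twice.
print(Counter(ratings).most_common())
```

Reporting both values here follows the advice above that all modes must be reported when several exist.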

All three measures of central tendency yield information about what is going on in the centre of a distribution of scores. The mean and median provide a single number which can summarise the central tendency in the entire distribution. The mode can yield one or multiple indices. With many measurements on individuals in a sample, it is advantageous to have single number indices which can describe the distributions in summary fashion. In a normal or near-normal distribution of sample data, the mean, the median, and the mode will all generally coincide at the one point. In this instance, all three statistics will provide approximately the same indication of central tendency. Note however that it is seldom the case that all three statistics would yield exactly the same number for any particular distribution. The mean is the most useful statistic, unless the data distribution is skewed by extreme scores, in which case the median should be reported.

While measures of central tendency are useful descriptors of distributions, summarising data using a single numerical index necessarily reduces the amount of information available about the sample. Not only do we need to know what is going on in the centre of a distribution, we also need to know what is going on around the centre of the distribution. For this reason, most social and behavioural researchers report not only measures of central tendency, but also measures of variability (see Procedure 5.5 ). The mode is the least informative of the three statistics because of its potential for producing multiple values.

Measures of central tendency are useful in almost any type of experimental design, survey or interview study, and in any observational studies where quantitative data are available and must be summarised. The decision as to whether the mean or median should be reported depends upon the nature of the data which should ideally be ascertained by visual inspection of the data distribution. Some researchers opt to report both measures routinely. Computation of means is a prelude to many parametric statistical methods (see, for example, Procedure 10.1007/978-981-15-2537-7_7#Sec22 , 10.1007/978-981-15-2537-7_7#Sec32 , 10.1007/978-981-15-2537-7_7#Sec42 , 10.1007/978-981-15-2537-7_7#Sec52 , 10.1007/978-981-15-2537-7_7#Sec68 , 10.1007/978-981-15-2537-7_7#Sec76 and 10.1007/978-981-15-2537-7_7#Sec105); comparison of medians is associated with many nonparametric statistical methods (see, for example, Procedure 10.1007/978-981-15-2537-7_7#Sec27 , 10.1007/978-981-15-2537-7_7#Sec37 , 10.1007/978-981-15-2537-7_7#Sec63 and 10.1007/978-981-15-2537-7_7#Sec81).

Procedure 5.5: Assessing Variability

There are a variety of measures of variability to choose from including the range, interquartile range, variance and standard deviation. Each measure reflects a specific way of defining variability in a distribution of scores on a variable and each has its own advantages and disadvantages. Most measures of variability are associated with a specific measure of central tendency so that researchers are now commonly expected to report both a measure of central tendency and its associated measure of variability whenever they display numerical descriptive statistics on continuous or ranked-ordered variables.

The range is the simplest measure of variability for a sample of data scores. It is merely the largest score in the sample minus the smallest score in the sample. The range is the one measure of variability not explicitly associated with any measure of central tendency. It gives a very rough indication as to the extent of spread in the scores. However, since the range uses only two of the total available scores in the sample, the rest of the scores are ignored, which means that a lot of potentially useful information is being sacrificed. There are also problems if either the highest or lowest (or both) scores are atypical or too extreme in their value (as in highly skewed distributions). When this happens, the range gives a very inflated picture of the typical variability in the scores. Thus, the range tends not to be a frequently reported measure of variability.

Table 5.4 shows a set of descriptive statistics, produced by the SPSS Frequencies procedure, for the mentabil, accuracy, speed, jobsat and workcond measures in the QCI database. In the table, you will find three rows labelled ‘Range’, ‘Minimum’ and ‘Maximum’.

Measures of central tendency and variability for specific QCI variables

Using the data from these three rows, we can draw the following descriptive picture. Mentabil scores spanned a range of 50 (from a minimum score of 85 to a maximum score of 135). Speed scores had a range of 16.05 s (from 1.05 s – the fastest quality decision to 17.10 s – the slowest quality decision). Accuracy scores had a range of 43 (from 57% – the least accurate inspector to 100% – the most accurate inspector). Both work context measures ( jobsat and workcond ) exhibited a range of 6 – the largest possible range given the 1 to 7 scale of measurement for these two variables.
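As a minimal sketch of the range calculation, the following Python snippet recomputes the ranges quoted above from the minimum and maximum values given in the text. The names `min_max` and `score_range` are illustrative only and are not part of the QCI materials or of SPSS.

```python
# Minimum and maximum values as quoted in the text for three QCI variables.
min_max = {
    "mentabil": (85, 135),
    "speed": (1.05, 17.10),   # seconds
    "accuracy": (57, 100),    # percent
}


def score_range(minimum, maximum):
    """Range = largest score minus smallest score."""
    return maximum - minimum


# Round to two decimals to hide floating-point noise in the subtraction.
ranges = {name: round(score_range(lo, hi), 2) for name, (lo, hi) in min_max.items()}

for name, value in ranges.items():
    print(f"{name}: range = {value}")
```

Because only two scores enter the calculation, changing any score other than the minimum or maximum leaves the result untouched, which is the information loss described earlier.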

Interquartile Range

The Interquartile Range ( IQR ) is a measure of variability that is specifically designed to be used in conjunction with the median. The IQR also takes care of the extreme data problem which typically plagues the range measure. The IQR is defined as the range that is covered by the middle 50% of scores in a distribution once the scores have been ranked in order from lowest value to highest value. It is found by locating the value in the distribution at or below which 25% of the sample scored and subtracting this number from the value in the distribution at or below which 75% of the sample scored. The IQR can also be thought of as the range one would compute after the bottom 25% of scores and the top 25% of scores in the distribution have been ‘chopped off’ (or ‘trimmed’ as statisticians call it).

The IQR gives a much more stable picture of the variability of scores and, like the median, is relatively insensitive to the biasing effects of extreme data values. Some behavioural researchers prefer to divide the IQR in half which gives a measure called the Semi-Interquartile Range ( S-IQR ) . The S-IQR can be interpreted as the distance one must travel away from the median, in either direction, to reach the value which separates the top (or bottom) 25% of scores in the distribution from the remaining 75%.

The IQR or S-IQR is typically not produced by descriptive statistics procedures by default in many computer software packages; however, it can usually be requested as an optional statistic to report or it can easily be computed by hand using percentile scores. Both the median and the IQR figure prominently in Exploratory Data Analysis, particularly in the production of boxplots (see Procedure 5.6 ).

Figure 5.20 illustrates the conceptual nature of the IQR and S-IQR compared to that of the range. Assume that 100% of data values are covered by the distribution curve in the figure. It is clear that these three measures would provide very different values for a measure of variability. Your choice would depend on your purpose. If you simply want to signal the overall span of scores between the minimum and maximum, the range is the measure of choice. But if you want to signal the variability around the median, the IQR or S-IQR would be the measure of choice.

How the range, IQR and S-IQR measures of variability conceptually differ

Note: Some behavioural researchers refer to the IQR as the hinge-spread (or H-spread ) because of its use in the production of boxplots:

  • the 25th percentile data value is referred to as the ‘lower hinge’;
  • the 75th percentile data value is referred to as the ‘upper hinge’; and
  • their difference gives the H-spread.

Midspread is another term you may see used as a synonym for interquartile range.

Referring back to Table 5.4 , we can find statistics reported for the median and for the ‘quartiles’ (25th, 50th and 75th percentile scores) for each of the five variables of interest. The ‘quartile’ values are useful for finding the IQR or S-IQR because SPSS does not report these measures directly. The median clearly equals the 50th percentile data value in the table.

If we focus, for example, on the speed variable, we could find its IQR by subtracting the 25th percentile score of 2.19 s from the 75th percentile score of 5.71 s to give a value for the IQR of 3.52 s (the S-IQR would simply be 3.52 divided by 2 or 1.76 s). Thus, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores spanning a range of 3.52 s. Alternatively, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores which ranged 1.76 s either side of the median value.
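The arithmetic in this example can be sketched in a few lines of Python. The first part uses the quartile values quoted from Table 5.4; the second part uses a small set of hypothetical scores (not QCI data) to show quartiles being computed from raw data with the standard library. The function name `iqr_from_quartiles` is an illustrative assumption. Note that statistical packages differ in the interpolation rule they use for percentiles, so quartiles computed this way may not match SPSS output exactly.

```python
from statistics import quantiles


def iqr_from_quartiles(q1, q3):
    """Return (IQR, S-IQR) given the 25th and 75th percentile scores."""
    iqr = q3 - q1
    return iqr, iqr / 2


# Quartiles for speed (in seconds) as quoted from Table 5.4 in the text.
speed_iqr, speed_siqr = iqr_from_quartiles(2.19, 5.71)
print(f"speed IQR = {speed_iqr:.2f} s, S-IQR = {speed_siqr:.2f} s")

# Hypothetical raw scores, used only to show quartiles computed from data.
hypothetical_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, median, q3 = quantiles(hypothetical_scores, n=4, method="inclusive")
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {q3 - q1}")
```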

Note: We could compare the ‘Minimum’ or ‘Maximum’ scores to the 25th percentile score and 75th percentile score respectively to get a feeling for whether the minimum or maximum might be considered extreme or uncharacteristic data values.

The variance uses information from every individual in the sample to assess the variability of scores relative to the sample mean. Variance assesses the average squared deviation of each score from the mean of the sample. Deviation refers to the difference between an observed score value and the mean of the sample—they are squared simply because adding them up in their naturally occurring unsquared form (where some differences are positive and others are negative) always gives a total of zero, which is useless for an index purporting to measure something.

If many scores are quite different from the mean, we would expect the variance to be large. If all the scores lie fairly close to the sample mean, we would expect a small variance. If all scores exactly equal the mean (i.e. all the scores in the sample have the same value), then we would expect the variance to be zero.
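A short sketch may help make these ideas concrete. It uses a handful of hypothetical scores with a mean of 100 (not QCI data) and assumes the n - 1 divisor that packages such as SPSS apply when computing a sample variance. It shows that the raw deviations sum to zero, that the squared deviations do not, and that a sample of identical scores has zero variance.

```python
from statistics import mean, variance

scores = [98, 100, 102, 96, 104]   # hypothetical scores with a mean of 100

m = mean(scores)
deviations = [x - m for x in scores]

# Unsquared deviations cancel out, so their total is always zero.
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the total reflects spread.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample variance, assuming the n - 1 divisor.
sample_variance = sum_of_squares / (len(scores) - 1)

print("sum of deviations:", sum_of_deviations)
print("variance by hand:", sample_variance)
print("variance via statistics.variance:", variance(scores))

# If every score equals the mean, the variance is zero.
no_spread = [100, 100, 100, 100, 100]
print("variance with identical scores:", variance(no_spread))
```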

Figure 5.21 illustrates some possibilities regarding variance of a distribution of scores having a mean of 100. The very tall curve illustrates a distribution with small variance. The distribution of medium height illustrates a distribution with medium variance and the flattest distribution is a distribution with large variance.

The concept of variance

If we had a distribution with no variance, the curve would simply be a vertical line at a score of 100 (meaning that all scores were equal to the mean). You can see that as variance increases, the tails of the distribution extend further outward and the concentration of scores around the mean decreases. You may have noticed that variance and range (as well as the IQR) will be related, since the range focuses on the difference between the ends of the two tails in the distribution and larger variances extend the tails. So, a larger variance will generally be associated with a larger range and IQR compared to a smaller variance.

It is generally difficult to descriptively interpret the variance measure in a meaningful fashion since it involves squared deviations around the sample mean. [Note: If you look back at Table 5.4 , you will see the variance listed for each of the variables (e.g. the variance of accuracy scores is 84.118), but the numbers themselves make little sense and do not relate to the original measurement scale for the variables (which, for the accuracy variable, went from 0% to 100% accuracy).] Instead, we use the variance as a steppingstone for obtaining a measure of variability that we can clearly interpret, namely the standard deviation . However, you should know that variance is an important concept in its own right simply because it provides the statistical foundation for many of the correlational procedures and statistical inference procedures described in Chaps. 10.1007/978-981-15-2537-7_6 , 10.1007/978-981-15-2537-7_7 and 10.1007/978-981-15-2537-7_8.

When considering either correlations or tests of statistical hypotheses, we frequently speak of one variable explaining or sharing variance with another (see Procedure 10.1007/978-981-15-2537-7_6#Sec27 and 10.1007/978-981-15-2537-7_7#Sec47 ). In doing so, we are invoking the concept of variance as set out here—what we are saying is that variability in the behaviour of scores on one particular variable may be associated with or predictive of variability in scores on another variable of interest (e.g. it could explain why those scores have a non-zero variance).

Standard Deviation

The standard deviation (often abbreviated as SD, sd or Std. Dev.) is the most commonly reported measure of variability because it has a meaningful interpretation and is used in conjunction with reports of sample means. Variance and standard deviation are closely related measures in that the standard deviation is found by taking the square root of the variance. The standard deviation, very simply, is a summary number that reflects the ‘average distance of each score from the mean of the sample’. In many parametric statistical methods, both the sample mean and sample standard deviation are employed in some form. Thus, the standard deviation is a very important measure, not only for data description, but also for hypothesis testing and the establishment of relationships as well.

Referring again back to Table 5.4 , we’ll focus on the results for the speed variable for discussion purposes. Table 5.4 shows that the mean inspection speed for the QCI sample was 4.48 s. We can also see that the standard deviation (in the row labelled ‘Std Deviation’) for speed was 2.89 s.

This standard deviation has a straightforward interpretation: we would say that ‘on the average, an inspector’s quality inspection decision speed differed from the mean of the sample by about 2.89 s in either direction’. In a normal distribution of scores (see Fundamental Concept II ), we would expect to see about 68% of all inspectors having decision speeds between 1.59 s (the mean minus one amount of the standard deviation) and 7.37 s (the mean plus one amount of the standard deviation).
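The figures in this interpretation can be checked with a brief sketch. It uses the mean and standard deviation for speed quoted from Table 5.4, plus the accuracy variance mentioned earlier (84.118), and relies on Python's `statistics.NormalDist` to represent an idealised normal distribution. The 68% figure is a property of that idealised curve; it is only a benchmark for real data, particularly for a skewed variable.

```python
from math import sqrt
from statistics import NormalDist

mean_speed = 4.48   # seconds, as quoted from Table 5.4
sd_speed = 2.89     # seconds, as quoted from Table 5.4

# One standard deviation either side of the mean.
lower = mean_speed - sd_speed
upper = mean_speed + sd_speed
print(f"mean - 1 SD = {lower:.2f} s, mean + 1 SD = {upper:.2f} s")

# Share of an idealised normal distribution lying within one SD of its mean.
ideal = NormalDist(mu=mean_speed, sigma=sd_speed)
share_within_one_sd = ideal.cdf(upper) - ideal.cdf(lower)
print(f"share within one SD = {share_within_one_sd:.1%}")

# The standard deviation is the square root of the variance.
sd_accuracy = sqrt(84.118)   # accuracy variance as quoted in the text
print(f"SD of accuracy = {sd_accuracy:.2f}")
```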

We noted earlier that the range of the speed scores was 16.05 s. However, the fact that the maximum speed score was 17.1 s compared to the 75th percentile score of just 5.71 s seems to suggest that this maximum speed might be rather atypically large compared to the bulk of speed scores. This means that the range is likely to be giving us a false impression of the overall variability of the inspectors’ decision speeds.

Furthermore, given that the mean speed score was higher than the median speed score, suggesting that speed scores were positively skewed (this was confirmed by the histogram for speed shown in Fig. 5.19 in Procedure 5.4 ), we might consider emphasising the median and its associated IQR or S-IQR rather than the mean and standard deviation. Of course, similar diagnostic and interpretive work could be done for each of the other four variables in Table 5.4 .

Measures of variability (particularly the standard deviation) provide a summary measure that gives an indication of how variable (spread out) a particular sample of scores is. When used in conjunction with a relevant measure of central tendency (particularly the mean), a reasonable yet economical description of a set of data emerges. When there are extreme data values or severe skewness is present in the data, the IQR (or S-IQR) becomes the preferred measure of variability to be reported in conjunction with the sample median (or 50th percentile value). These latter measures are much more resistant (‘robust’) to influence by data anomalies than are the mean and standard deviation.

As mentioned above, the range is a very cursory index of variability; thus, it is not as useful as the variance or standard deviation. Variance has little meaningful interpretation as a descriptive index; hence, standard deviation is most often reported. However, the standard deviation (or IQR) has little meaning if the sample mean (or median) is not reported along with it.

Knowing that the standard deviation for accuracy is 9.17 tells you little unless you know the mean accuracy (82.14) that it is the standard deviation from.

Like the sample mean, the standard deviation can be strongly biased by the presence of extreme data values or severe skewness in a distribution in which case the median and IQR (or S-IQR) become the preferred measures. The biasing effect will be most noticeable in samples which are small in size (say, less than 30 individuals) and far less noticeable in large samples (say, in excess of 200 or 300 individuals). [Note that, in a manner similar to a trimmed mean, it is possible to compute a trimmed standard deviation to reduce the biasing effect of extreme data values, see Field 2018 , p. 263.]
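The contrast between resistant and non-resistant measures can be illustrated with a small sketch. The scores below are hypothetical (not QCI data), and `summarise` is an illustrative helper name. Adding a single extreme value to a small sample shifts the mean and inflates the standard deviation, while the median stays put and the IQR changes only slightly.

```python
from statistics import mean, median, quantiles, stdev


def summarise(scores):
    """Return the mean, SD, median and IQR for a list of scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "median": median(scores),
        "iqr": q3 - q1,
    }


typical = [2, 3, 3, 4, 4, 4, 5, 5, 6]   # hypothetical small sample
with_extreme = typical + [17]           # same sample plus one extreme score

before = summarise(typical)
after = summarise(with_extreme)

for label, stats in (("without extreme", before), ("with extreme", after)):
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

With only ten scores, one extreme value carries a great deal of weight; in a sample of several hundred, the same value would move the mean and standard deviation far less.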

It is important to realise that the resistance of the median and IQR (or S-IQR) to extreme values is only gained by deliberately sacrificing a good deal of the information available in the sample (nothing is obtained without a cost in statistics). What is sacrificed is information from all other members of the sample other than those members who scored at the median and 25th and 75th percentile points on a variable of interest; information from all members of the sample would automatically be incorporated in mean and standard deviation for that variable.

Any investigation where you might report on or read about measures of central tendency on certain variables should also report measures of variability. This is particularly true for data from experiments, quasi-experiments, observational studies and questionnaires. It is important to consider measures of central tendency and measures of variability to be inextricably linked—one should never report one without the other if an adequate descriptive summary of a variable is to be communicated.

Other descriptive measures, such as those for skewness and kurtosis, may also be of interest if a more complete description of any variable is desired. Most good statistical packages can be instructed to report these additional descriptive measures as well.

Of all the statistics you are likely to encounter in the business, behavioural and social science research literature, means and standard deviations will dominate as measures for describing data. Additionally, these statistics will usually be reported when any parametric tests of statistical hypotheses are presented as the mean and standard deviation provide an appropriate basis for summarising and evaluating group differences.

Fundamental Concept I: Basic Concepts in Probability

The concept of simple probability.

In Procedures 5.1 and 5.2 , you encountered the idea of the frequency of occurrence of specific events such as particular scores within a sample distribution. Furthermore, it is a simple operation to convert the frequency of occurrence of a specific event into a number representing the relative frequency of that event. The relative frequency of an observed event is merely the number of times the event is observed divided by the total number of times one makes an observation. The resulting number ranges between 0 and 1 but we typically re-express this number as a percentage by multiplying it by 100%.

In the QCI database, Maree Lakota observed data from 112 quality control inspectors of which 58 were male and 51 were female (gender indications were missing for three inspectors). The statistics 58 and 51 are thus the frequencies of occurrence for two specific types of research participant, a male inspector or a female inspector.

If she divided each frequency by the total number of observations (i.e. 112), she would obtain .52 for males and .46 for females (leaving about .03 of observations with unknown gender). These statistics are relative frequencies which indicate the proportion of times that Maree obtained data from a male or female inspector. Multiplying each relative frequency by 100% would yield 52% and 46% which she could interpret as indicating that 52% of her sample was male and 46% was female (leaving about 3% of the sample with unknown gender).
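The conversion from frequencies to relative frequencies and percentages can be sketched as follows, using the counts reported for the QCI sample. The variable names are illustrative assumptions.

```python
# Frequencies of occurrence reported for the QCI sample of 112 inspectors.
frequencies = {"male": 58, "female": 51, "unknown": 3}
total_observations = sum(frequencies.values())

# Relative frequency = number of times observed / total number of observations.
relative_frequencies = {
    category: count / total_observations for category, count in frequencies.items()
}

for category, rel_freq in relative_frequencies.items():
    # 3 / 112 is roughly .027, so the 'unknown' share is small but not zero.
    print(f"{category}: {rel_freq:.3f} ({rel_freq:.1%})")
```

The relative frequencies across all categories always sum to 1, although rounded values may not add up exactly.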

It does not take much of a leap in logic to move from the concept of ‘relative frequency’ to the concept of ‘probability’. In our discussion above, we focused on relative frequency as indicating the proportion or percentage of times a specific category of participant was obtained in a sample. The emphasis here is on data from a sample.

Imagine now that Maree had infinite resources and research time and was able to obtain ever larger samples of quality control inspectors for her study. She could still compute the relative frequencies for obtaining data from males and females in her sample but as her sample size grew larger and larger, she would notice these relative frequencies converging toward some fixed values.

If, by some miracle, Maree could observe all of the quality control inspectors on the planet today, she would have measured the entire population and her computations of relative frequency for males and females would yield two precise numbers, each indicating the proportion of the population of inspectors that was male and the proportion that was female.

If Maree were then to list all of these inspectors and randomly choose one from the list, the chances that she would choose a male inspector would be equal to the proportion of the population of inspectors that was male and this logic extends to choosing a female inspector. The number used to quantify this notion of ‘chances’ is called a probability. Maree would therefore have established the probability of randomly observing a male or a female inspector in the population on any specific occasion.

Probability is expressed on a 0.0 (the observation or event will certainly not be seen) to 1.0 (the observation or event will certainly be seen) scale where values close to 0.0 indicate observations that are less certain to be seen and values close to 1.0 indicate observations that are more certain to be seen (a value of .5 indicates an even chance that an observation or event will or will not be seen – a state of maximum uncertainty). Statisticians often interpret a probability as the likelihood of observing an event or type of individual in the population.

In the QCI database, we noted that the relative frequency of observing males was .52 and for females was .46. If we take these relative frequencies as estimates of the proportions of each gender in the population of inspectors, then .52 and .46 represent the probability of observing a male or female inspector, respectively.

Statisticians would state this as “the probability of observing a male quality control inspector is .52” or in a more commonly used shorthand code, the likelihood of observing a male quality control inspector is p = .52 (p for probability). For some, probabilities make more sense if they are converted to percentages (by multiplying by 100%). Thus, p = .52 can also be understood as a 52% chance of observing a male quality control inspector.

We have seen that relative frequency is a sample statistic that can be used to estimate the population probability. Our estimate will get more precise as we use larger and larger samples (technically, as the size of our samples more closely approximates the size of our population). In most behavioural research, we never have access to entire populations so we must always estimate our probabilities.
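A small simulation can illustrate how relative frequencies settle down as samples grow. This is a sketch under stated assumptions: the population proportion of .52 is taken as a hypothetical 'true' value purely for the demonstration, the inspectors are simulated rather than observed, and the function name `relative_frequency` is illustrative.

```python
import random


def relative_frequency(sample_size, true_proportion, rng):
    """Share of simulated inspectors who are male in one random sample."""
    hits = sum(rng.random() < true_proportion for _ in range(sample_size))
    return hits / sample_size


rng = random.Random(2024)   # fixed seed so the run is repeatable
TRUE_PROPORTION = 0.52      # hypothetical population value, assumed for the demo

estimates = {
    n: relative_frequency(n, TRUE_PROPORTION, rng)
    for n in (10, 100, 1_000, 10_000, 100_000)
}

for n, estimate in estimates.items():
    print(f"n = {n:>7}: relative frequency = {estimate:.3f}")
```

Small samples can land some distance from the assumed population value, whereas the largest samples should sit very close to it.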

In some very special populations, having a known number of fixed possible outcomes, such as results of coin tosses or rolls of a die, we can analytically establish event probabilities without doing an infinite number of observations; all we must do is assume that we have a fair coin or die. Thus, with a fair coin, the probability of observing a H or a T on any single coin toss is ½ or .5 or 50%; the probability of observing a 6 on any single throw of a die is 1/6 or .16667 or 16.667%. With behavioural data, though, we can never measure all possible behavioural outcomes, which thereby forces researchers to depend on samples of observations in order to make estimates of population values.

The concept of probability is central to much of what is done in the statistical analysis of behavioural data. Whenever a behavioural scientist wishes to establish whether a particular relationship exists between variables or whether two groups, treated differently, actually show different behaviours, he/she is playing a probability game. Given a sample of observations, the behavioural scientist must decide whether what he/she has observed is providing sufficient information to conclude something about the population from which the sample was drawn.

This decision always has a non-zero probability of being in error simply because in samples that are much smaller than the population, there is always the chance or probability that we are observing something rare and atypical instead of something which is indicative of a consistent population trend. Thus, the concept of probability forms the cornerstone for statistical inference about which we will have more to say later (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec6). Probability also plays an important role in helping us to understand theoretical statistical distributions (e.g. the normal distribution) and what they can tell us about our observations. We will explore this idea further in Fundamental Concept II .

The Concept of Conditional Probability

It is important to understand that the concept of probability as described above focuses upon the likelihood or chances of observing a specific event or type of observation for a specific variable relative to a population or sample of observations. However, many important behavioural research issues may focus on the question of the probability of observing a specific event given that the researcher has knowledge that some other event has occurred or been observed (this latter event is usually measured by a second variable). Here, the focus is on the potential relationship or link between two variables or two events.

With respect to the QCI database, Maree could ask the quite reasonable question “what is the probability (estimated in the QCI sample by a relative frequency) of observing an inspector being female given that she knows that an inspector works for a Large Business Computer manufacturer?”

To address this question, all she needs to know is:

  • how many inspectors from Large Business Computer manufacturers are in the sample ( 22 ); and
  • how many of those inspectors were female ( 7 ) (inspectors who were missing a score for either company or gender have been ignored here).

If she divides 7 by 22, she would obtain the probability that an inspector is female given that they work for a Large Business Computer manufacturer – that is, p = .32 .

This type of question points to the important concept of conditional probability (‘conditional’ because we are asking “what is the probability of observing one event conditional upon our knowledge of some other event”).

Continuing with the previous example, Maree would say that the conditional probability of observing a female inspector working for a Large Business Computer manufacturer is .32 or, equivalently, a 32% chance. Compare this conditional probability of p  = .32 to the overall probability of observing a female inspector in the entire sample ( p  = .46 as shown above).

This means that there is evidence for a connection or relationship between gender and the type of company an inspector works for. That is, the chances are lower for observing a female inspector from a Large Business Computer manufacturer than they are for simply observing a female inspector at all.

Maree therefore has evidence suggesting that females may be relatively under-represented in Large Business Computer manufacturing companies compared to the overall population. Knowing something about the company an inspector works for therefore can help us make a better prediction about their likely gender.

Suppose, however, that Maree’s conditional probability had been exactly equal to p  = .46. This would mean that there was exactly the same chance of observing a female inspector working for a Large Business Computer manufacturer as there was of observing a female inspector in the general population. Here, knowing something about the company an inspector works for doesn’t help Maree make any better prediction about their likely gender. This would mean that the two variables are statistically independent of each other.

A classic case of events that are statistically independent is two successive throws of a fair die: rolling a six on the first throw gives us no information for predicting how likely it will be that we would roll a six on the second throw. The conditional probability of observing a six on the second throw given that I have observed a six on the first throw is 0.16667 (= 1 divided by 6) which is the same as the simple probability of observing a six on any specific throw. This statistical independence also means that if we wanted to know the probability of throwing two sixes on two successive throws of a fair die, we would just multiply the probabilities for each independent event (i.e., throw) together; that is, .16667 × .16667 = .02778 (this is known as the multiplication rule of probability, see, for example, Smithson 2000 , p. 114).
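The die example can be verified by listing every possible outcome of two throws. The sketch below assumes a fair six-sided die, so that all 36 ordered pairs are equally likely, and uses exact fractions to avoid rounding. The helper name `probability` is illustrative. The product of the two simple probabilities works out to 1/36, or roughly .0278.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# All 36 equally likely (first throw, second throw) pairs for a fair die.
outcomes = list(product(faces, repeat=2))


def probability(event):
    """Exact probability of an event over the 36 equally likely outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_first_six = probability(lambda o: o[0] == 6)
p_second_six = probability(lambda o: o[1] == 6)
p_both_six = probability(lambda o: o == (6, 6))

# Conditional probability of a six on the second throw given a six on the first.
p_second_given_first = p_both_six / p_first_six

print("P(six on second | six on first) =", p_second_given_first)
print("P(six on second) =", p_second_six)
print("P(two sixes) =", p_both_six, "=", round(float(p_both_six), 5))
```

Independence shows up in two equivalent ways here: the conditional probability equals the simple probability, and the joint probability equals the product of the two simple probabilities.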

Finally, you should know that conditional probabilities are often asymmetric. This means that for many types of behavioural variables, reversing the conditional arrangement will change the story about the relationship. Bayesian statistics (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec73) relies heavily upon this asymmetric relationship between conditional probabilities.

Maree has already learned that the conditional probability that an inspector is female given that they worked for a Large Business Computer manufacturer is p = .32. She could easily turn the conditional relationship around and ask what is the conditional probability that an inspector works for a Large Business Computer manufacturer given that the inspector is female?

From the QCI database, she can find that 51 inspectors in her total sample were female and of those 51, 7 worked for a Large Business Computer manufacturer. If she divided 7 by 51, she would get p = .14 (did you notice that all that changed was the number she divided by?). Thus, there is only a 14% chance of observing an inspector working for a Large Business Computer manufacturer given that the inspector is female – a rather different probability from p = .32, which tells a different story.
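The two conditional probabilities, and the overall probability they are compared against, can be reproduced with a brief sketch using the counts quoted from the QCI database. The function name `conditional_probability` and the variable names are illustrative assumptions.

```python
def conditional_probability(joint_count, condition_count):
    """Estimate P(A given B) as count(A and B) / count(B)."""
    return joint_count / condition_count


female_and_lbc = 7    # female inspectors at Large Business Computer manufacturers
lbc_total = 22        # all inspectors at Large Business Computer manufacturers
female_total = 51     # all female inspectors in the sample
sample_total = 112    # all inspectors in the sample

p_female_given_lbc = conditional_probability(female_and_lbc, lbc_total)
p_lbc_given_female = conditional_probability(female_and_lbc, female_total)
p_female = female_total / sample_total

print(f"P(female | LBC)  = {p_female_given_lbc:.2f}")
print(f"P(LBC | female)  = {p_lbc_given_female:.2f}")
print(f"P(female)        = {p_female:.2f}")
```

The numerator is the same in both conditional probabilities; only the denominator (the group being conditioned on) changes, which is why reversing the condition gives a different answer.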

As you will see in Procedures 10.1007/978-981-15-2537-7_6#Sec14 and 10.1007/978-981-15-2537-7_7#Sec17, conditional relationships between categorical variables are precisely what crosstabulation contingency tables are designed to reveal.

Procedure 5.6: Exploratory Data Analysis

There are a variety of visual display methods for EDA, including stem & leaf displays, boxplots and violin plots. Each method reflects a specific way of displaying features of a distribution of scores or measurements and, of course, each has its own advantages and disadvantages. In addition, EDA displays are surprisingly flexible and can combine features in various ways to enhance the story conveyed by the plot.

Stem & Leaf Displays

The stem & leaf display is a simple data summary technique which not only rank orders the data points in a sample but presents them visually so that the shape of the data distribution is reflected. Stem & leaf displays are formed from data scores by splitting each score into two parts: the first part of each score serving as the ‘stem’, the second part as the ‘leaf’ (e.g. for 2-digit data values, the ‘stem’ is the number in the tens position; the ‘leaf’ is the number in the ones position). Each stem is then listed vertically, in ascending order, followed horizontally by all the leaves in ascending order associated with it. The resulting display thus shows all of the scores in the sample, but reorganised so that a rough idea of the shape of the distribution emerges. As well, extreme scores can be easily identified in a stem & leaf display.
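The construction steps just described can be sketched in Python. This is a simplified illustration, not the algorithm used by R Commander or SYSTAT: it assumes non-negative whole-number scores, uses the ones digit as the leaf and everything above it as the stem, and does not split stems into half-stems. The scores are hypothetical and the function name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return display lines: stem (tens and above) | leaves (ones digits)."""
    leaves_by_stem = defaultdict(list)
    for score in sorted(scores):          # sorting puts leaves in ascending order
        stem, leaf = divmod(score, 10)
        leaves_by_stem[stem].append(leaf)

    lines = []
    # List every stem from lowest to highest, including empty ones,
    # so that gaps in the distribution remain visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Hypothetical accuracy-like scores, not the QCI data.
hypothetical_scores = [57, 73, 75, 78, 80, 81, 81, 83, 84, 85,
                       86, 88, 89, 90, 92, 94, 98, 100]

lines = stem_and_leaf(hypothetical_scores)
print("\n".join(lines))
```

Every original score can be read back from the display by joining each stem with each of its leaves, and an isolated low score stands out as a stem separated from the rest by an empty row.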

Consider the accuracy and speed scores for the 112 quality control inspectors in the QCI sample. Figure 5.22 (produced by the R Commander Stem-and-leaf display … procedure) shows the stem & leaf displays for inspection accuracy (left display) and speed (right display) data.

Stem & leaf displays produced by R Commander

[The first six lines reflect information from R Commander about each display: lines 1 and 2 show the actual R command used to produce the plot (the variable name has been highlighted in bold); line 3 gives a warning indicating that inspectors with missing values (= NA in R ) on the variable have been omitted from the display; line 4 shows how the stems and leaves have been defined; line 5 indicates what a leaf unit represents in value; and line 6 indicates the total number (n) of inspectors included in the display.] In Fig. 5.22 , for the accuracy display on the left-hand side, the ‘stems’ have been split into ‘half-stems’: one (which is starred) associated with the ‘leaves’ 0 through 4 and the other associated with the ‘leaves’ 5 through 9. This strategy gives the display better balance and visual appeal.

Notice how the left stem & leaf display conveys a fairly clear (yet sideways) picture of the shape of the distribution of accuracy scores. It has a rather symmetrical bell-shape to it with only a slight suggestion of negative skewness (toward the extreme score at the top). The right stem & leaf display clearly depicts the highly positively skewed nature of the distribution of speed scores. Importantly, we could reconstruct the entire sample of scores for each variable using its display, which means that unlike most other graphical procedures, we didn’t have to sacrifice any information to produce the visual summary.

Some programs, such as SYSTAT, embellish their stem & leaf displays by indicating in which stem or half-stem the ‘median’ (50th percentile), the ‘upper hinge score’ (75th percentile), and ‘lower hinge score’ (25th percentile) occur in the distribution (recall the discussion of interquartile range in Procedure 5.5 ). This is shown in Fig. 5.23 , produced by SYSTAT, where M and H indicate the stem locations for the median and hinge points, respectively. This stem & leaf display labels a single extreme accuracy score as an ‘outside value’ and clearly shows that this actual score was 57.

Fig. 5.23 Stem & leaf display, produced by SYSTAT, of the accuracy QCI variable

Another important EDA technique is the boxplot or, as it is sometimes known, the box-and-whisker plot . This plot provides a symbolic representation that preserves less of the original nature of the data (compared to a stem & leaf display) but typically gives a better picture of the distributional characteristics. The basic boxplot, shown in Fig. 5.24 , utilises information about the median (50th percentile score) and the upper (75th percentile score) and lower (25th percentile score) hinge points in the construction of the ‘box’ portion of the graph (the ‘median’ defines the centre line in the box; the ‘upper’ and ‘lower hinge values’ define the end boundaries of the box—thus the box encompasses the middle 50% of data values).

Fig. 5.24 Boxplots for the accuracy and speed QCI variables

Additionally, the boxplot utilises the IQR (recall Procedure 5.5) as a way of defining what are called ‘fences’, which indicate score boundaries beyond which we would consider a score in a distribution to be an ‘outlier’ (or an extreme or unusual value). In SPSS, the inner fences are typically located 1.5 times the IQR beyond each hinge (i.e. beyond each end of the box), and a ‘far’ outlier or extreme case is typically defined as a score lying more than 3 times the IQR beyond a hinge (Field 2018, p. 193). The ‘whiskers’ in a boxplot extend out to the data values which are closest to, but still inside, the upper and lower inner fences (in most cases, the vast majority of data values will be contained within the fences). Outliers beyond these ‘whiskers’ are then individually listed. ‘Near’ outliers are those lying just beyond the inner fences and ‘far’ outliers lie well beyond the inner fences.
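The fence logic can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of any package's algorithm: the function name `boxplot_summary` and the data are hypothetical, the fences are measured from the ends of the box, and the hinges are approximated with `statistics.quantiles` (inclusive method), which is an assumption since packages compute hinges in slightly different ways.

```python
import statistics

def boxplot_summary(scores, near=1.5, far=3.0):
    """Median, hinges, whisker ends and outliers under a 1.5 / 3 x IQR rule."""
    data = sorted(scores)
    # Assumption: quartiles stand in for the hinges; software may differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - near * iqr, q3 + near * iqr
    outer_low, outer_high = q1 - far * iqr, q3 + far * iqr
    # Whiskers stop at the most extreme scores still inside the inner fences.
    inside = [x for x in data if inner_low <= x <= inner_high]
    far_outliers = [x for x in data if x < outer_low or x > outer_high]
    near_outliers = [x for x in data
                     if (x < inner_low or x > inner_high)
                     and outer_low <= x <= outer_high]
    return {
        "median": median,
        "hinges": (q1, q3),
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "near_outliers": near_outliers,
        "far_outliers": far_outliers,
    }

# Hypothetical positively skewed decision times in seconds (not the QCI data).
speeds = [1.2, 1.9, 2.4, 3.0, 3.5, 4.1, 4.6, 5.2, 6.0, 7.5, 11.9, 17.1]
summary = boxplot_summary(speeds)
print(summary)
```

With these made-up scores, one value falls between the inner and outer fences (a ‘near’ outlier) and one falls beyond the outer fence (a ‘far’ outlier), both in the slow tail.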

Figure 5.24 shows two simple boxplots (produced using SPSS), one for the accuracy QCI variable and one for the speed QCI variable. The accuracy plot shows a median value of about 83, roughly 50% of the data fall between about 77 and 89 and there is one outlier, inspector 83, in the lower ‘tail’ of the distribution. The accuracy boxplot illustrates data that are relatively symmetrically distributed without substantial skewness. Such data will tend to have their median in the middle of the box, whiskers of roughly equal length extending out from the box and few or no outliers.

The speed plot shows a median value of about 4 s, roughly 50% of the data fall between 2 s and 6 s and there are four outliers, inspectors 7, 62, 65 and 75 (although inspectors 65 and 75 fall at the same place and are rather difficult to read), all falling in the slow speed ‘tail’ of the distribution. Inspectors 65, 75 and 7 are shown as ‘near’ outliers (open circles) whereas inspector 62 is shown as a ‘far’ outlier (asterisk). The speed boxplot illustrates data which are asymmetrically distributed because of skewness in one direction. Such data may have their median offset from the middle of the box and/or whiskers of unequal length extending out from the box and outliers in the direction of the longer whisker. In the speed boxplot, the data are clearly positively skewed (the longer whisker and extreme values are in the slow speed ‘tail’).

Boxplots are very versatile representations in that side-by-side displays for sub-groups of data within a sample can permit easy visual comparisons of groups with respect to central tendency and variability. Boxplots can also be modified to incorporate information about error bands associated with the median producing what is called a ‘notched boxplot’. This helps in the visual detection of meaningful subgroup differences, where boxplot ‘notches’ don’t overlap.

Figure 5.25 (produced using NCSS) compares the distributions of accuracy and speed scores for QCI inspectors from the five types of companies, plotted side-by-side.

Fig. 5.25 Comparisons of the accuracy (regular boxplots) and speed (notched boxplots) QCI variables for different types of companies

Focus first on the left graph in Fig. 5.25 which plots the distribution of accuracy scores broken down by company using regular boxplots. This plot clearly shows the differing degree of skewness in each type of company (indicated by one or more outliers in one ‘tail’, whiskers which are not the same length and/or the median line being offset from the centre of a box), the differing variability of scores within each type of company (indicated by the overall length of each plot—box and whiskers), and the differing central tendency in each type of company (the median lines do not all fall at the same level of accuracy score). From the left graph in Fig. 5.25 , we could conclude that: inspection accuracy scores are most variable in PC and Large Electrical Appliance manufacturing companies and least variable in the Large Business Computer manufacturing companies; Large Business Computer and PC manufacturing companies have the highest median level of inspection accuracy; and inspection accuracy scores tend to be negatively skewed (many inspectors toward higher levels, relatively fewer who are poorer in inspection performance) in the Automotive manufacturing companies. One inspector, working for an Automotive manufacturing company, shows extremely poor inspection accuracy performance.

The right display compares types of companies in terms of their inspection speed scores, using ‘notched’ boxplots. The notches define upper and lower error limits around each median. Aside from the very obvious positive skewness for speed scores (with a number of slow speed outliers) in every type of company (least so for Large Electrical Appliance manufacturing companies), the story conveyed by this comparison is that inspectors from Large Electrical Appliance and Automotive manufacturing companies have substantially faster median decision speeds compared to inspectors from Large Business Computer and PC manufacturing companies (i.e. their ‘notches’ do not overlap, in terms of speed scores, on the display).

Boxplots can also add interpretive value to other graphical display methods through the creation of hybrid displays. Such displays might combine a standard histogram with a boxplot along the X-axis to provide an enhanced picture of the data distribution as illustrated for the mentabil variable in Fig. 5.26 (produced using NCSS). This hybrid plot also employs a data ‘smoothing’ method called a density trace to outline an approximate overall shape for the data distribution. Any one graphical method would tell some of the story, but combined in the hybrid display, the story of a relatively symmetrical set of mentabil scores becomes quite visually compelling.

Fig. 5.26 A hybrid histogram-density-boxplot of the mentabil QCI variable

Violin Plots

Violin plots are a more recent and interesting EDA innovation, implemented in the NCSS software package (Hintze 2012 ). The violin plot gets its name from the rough shape that the plots tend to take on. Violin plots are another type of hybrid plot, this time combining density traces (mirror-imaged right and left so that the plots have a sense of symmetry and visual balance) with boxplot-type information (median, IQR and upper and lower inner ‘fences’, but not outliers). The goal of the violin plot is to provide a quick visual impression of the shape, central tendency and variability of a distribution (the length of the violin conveys a sense of the overall variability whereas the width of the violin conveys a sense of the frequency of scores occurring in a specific region).

Figure 5.27 (produced using NCSS) compares the distributions of speed scores for QCI inspectors across the five types of companies, plotted side-by-side. The violin plot conveys a similar story to the boxplot comparison for speed in the right graph of Fig. 5.25. However, notice that with the violin plot, unlike with a boxplot, you also get a sense of distributions that have ‘clumps’ of scores in specific areas. Some violin plots, like that for Automotive manufacturing companies in Fig. 5.27, have a shape suggesting a multi-modal distribution (recall Procedure 5.4 and the discussion of the fact that a distribution may have multiple modes). The violin plot in Fig. 5.27 has also been produced to show where the median (solid line) and mean (dashed line) would fall within each violin. This facilitates two interpretations: (1) a relative comparison of central tendency across the five types of companies and (2) the relative degree of skewness in the distribution for each type of company (indicated by the separation of the two lines within a violin; skewness is particularly pronounced for the Large Business Computer manufacturing companies).

Fig. 5.27 Violin plot comparisons of the speed QCI variable for different types of companies

EDA methods (of which we have illustrated only a small subset; we have not reviewed dot density diagrams, for example) provide summary techniques for visually displaying certain characteristics of a set of data. The advantage of the EDA methods over more traditional graphing techniques such as those described in Procedure 5.2 is that as much of the original integrity of the data is maintained as possible while maximising the amount of summary information available about distributional characteristics.

Stem & leaf displays maintain the data in as close to their original form as possible whereas boxplots and violin plots provide more symbolic and flexible representations. EDA methods are best thought of as communication devices designed to facilitate quick visual impressions and they can add interest to any statistical story being conveyed about a sample of data. NCSS, SYSTAT, STATGRAPHICS and R Commander generally offer more options and flexibility in the generation of EDA displays than SPSS.

EDA methods tend to get cumbersome if a great many variables or groups need to be summarised. In such cases, using numerical summary statistics (such as means and standard deviations) will provide a more economical and efficient summary. Boxplots or violin plots are generally more space efficient summary techniques than stem & leaf displays.

Often, EDA techniques are used as data screening devices, which are typically not reported in actual write-ups of research (we will discuss data screening in more detail in Procedure 10.1007/978-981-15-2537-7_8#Sec11). This is a perfectly legitimate use for the methods although there is an argument for researchers to put these techniques to greater use in published literature.

Software packages may use different rules for constructing EDA plots which means that you might get rather different looking plots and different information from different programs (you saw some evidence of this in Figs. 5.22 and 5.23 ). It is important to understand what the programs are using as decision rules for locating fences and outliers so that you are clear on how best to interpret the resulting plot—such information is generally contained in the user’s guides or manuals for NCSS (Hintze 2012 ), SYSTAT (SYSTAT Inc. 2009a , b ), STATGRAPHICS (StatPoint Technologies Inc. 2010 ) and SPSS (Norušis 2012 ).

Virtually any research design which produces numerical measures (even to the extent of just counting the number of occurrences of several events) provides opportunities for employing EDA displays which may help to clarify data characteristics or relationships. One extremely important use of EDA methods is as data screening devices for detecting outliers and other data anomalies, such as non-normality and skewness, before proceeding to parametric statistical analyses. In some cases, EDA methods can help the researcher to decide whether parametric or nonparametric statistical tests would be best to apply to his or her data because critical data characteristics such as distributional shape and spread are directly reflected.

Procedure 5.7: Standard ( z ) Scores

In certain practical situations in behavioural research, it may be desirable to know where a specific individual’s score lies relative to all other scores in a distribution. A convenient measure is to observe how many standard deviations (see Procedure 5.5 ) above or below the sample mean a specific score lies. This measure is called a standard score or z -score . Very simply, any raw score can be converted to a z -score by subtracting the sample mean from the raw score and dividing that result by the sample’s standard deviation. z -scores can be positive or negative and their sign simply indicates whether the score lies above (+) or below (−) the mean in value. A z -score has a very simple interpretation: it measures the number of standard deviations above or below the sample mean a specific raw score lies.

In the QCI database, we have a sample mean for speed scores of 4.48 s and a standard deviation for speed scores of 2.89 s (recall Table 5.4 in Procedure 5.5). If we are interested in the z-score for Inspector 65’s raw speed score of 11.94 s, we would obtain a z-score of +2.58 using the method described above (subtract 4.48 from 11.94 and divide the result by 2.89). The interpretation of this number is that a raw decision speed score of 11.94 s lies about 2.6 standard deviations above the mean decision speed for the sample.
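The arithmetic for Inspector 65 can be checked with a short Python sketch. The mean, standard deviation and raw score are the values quoted above; the function name `z_score` is simply an illustrative choice.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

# Inspector 65's decision speed, using the sample statistics quoted in the text.
z_speed = z_score(11.94, mean=4.48, sd=2.89)
print(round(z_speed, 2))  # 2.58
```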

z -scores have some interesting properties. First, if one converts (statisticians would say ‘transforms’) every available raw score in a sample to z -scores, the mean of these z -scores will always be zero and the standard deviation of these z -scores will always be 1.0. These two facts about z -scores (mean = 0; standard deviation = 1) will be true no matter what sample you are dealing with and no matter what the original units of measurement are (e.g. seconds, percentages, number of widgets assembled, amount of preference for a product, attitude rating, amount of money spent). This is because transforming raw scores to z -scores automatically changes the measurement units from whatever they originally were to a new system of measurements expressed in standard deviation units.
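A small Python sketch can make this property concrete. The two data sets are hypothetical (one in seconds, one in percentages) and the helper name `to_z_scores` is an illustrative choice; the sample standard deviation (n − 1 denominator) is assumed throughout.

```python
import statistics

def to_z_scores(raw_scores):
    """Transform every raw score in a sample to a z-score."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.stdev(raw_scores)  # sample SD (n - 1 in the denominator)
    return [(x - mean) / sd for x in raw_scores]

# Hypothetical samples measured in very different units.
seconds = [1.2, 1.9, 2.4, 3.0, 3.5, 4.1, 4.6, 5.2, 6.0, 7.5, 11.9, 17.1]
percent = [57, 73, 75, 78, 80, 81, 83, 83, 85, 88, 90, 94, 100]

for label, raw in (("seconds", seconds), ("percent", percent)):
    z = to_z_scores(raw)
    # Up to floating-point rounding, the mean is 0 and the SD is 1 in both cases.
    print(label, round(statistics.fmean(z), 6), round(statistics.stdev(z), 6))
```

Whatever the original units, the transformed scores end up on the same scale, which is what makes later comparisons across variables possible.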

Suppose Maree was interested in the performance statistics for the top 25% most accurate quality control inspectors in the sample. Given a sample size of 112, this would mean finding the top 28 inspectors in terms of their accuracy scores. Since Maree is interested in performance statistics, speed scores would also be of interest. Table 5.5 (generated using the SPSS Descriptives … procedure, listed using the Case Summaries … procedure and formatted for presentation using Excel) shows accuracy and speed scores for the top 28 inspectors in descending order of accuracy scores. The z -score transformation for each of these scores is also shown (last two columns) as are the type of company, education level and gender for each inspector.

Listing of the 28 (top 25%) most accurate QCI inspectors’ accuracy and speed scores as well as standard ( z ) score transformations for each score

There are three inspectors (8, 9 and 14) who scored maximum accuracy of 100%. Such accuracy converts to a z -score of +1.95. Thus 100% accuracy is 1.95 standard deviations above the sample’s mean accuracy level. Interestingly, all three inspectors worked for PC manufacturers and all three had only high school-level education. The least accurate inspector in the top 25% had a z -score for accuracy that was .75 standard deviations above the sample mean.

Interestingly, the top three inspectors in terms of accuracy had decision speeds that fell below the sample’s mean speed; inspector 8 was the fastest inspector of the three with a speed just over 1 standard deviation ( z  = −1.03) below the sample mean. The slowest inspector in the top 25% was inspector 75 (case #28 in the list) with a speed z -score of +2.62; i.e., he was over two and a half standard deviations slower in making inspection decisions relative to the sample’s mean speed.

The fact that z -scores always have a common measurement scale having a mean of 0 and a standard deviation of 1.0 leads to an interesting application of standard scores. Suppose we focus on inspector number 65 (case #8 in the list) in Table 5.5 . It might be of interest to compare this inspector’s quality control performance in terms of both his decision accuracy and decision speed. Such a comparison is impossible using raw scores since the inspector’s accuracy score and speed scores are different measures which have differing means and standard deviations expressed in fundamentally different units of measurement (percentages and seconds). However, if we are willing to assume that the score distributions for both variables are approximately the same shape and that both accuracy and speed are measured with about the same level of reliability or consistency (see Procedure 10.1007/978-981-15-2537-7_8#Sec1), we can compare the inspector’s two scores by first converting them to z -scores within their own respective distributions as shown in Table 5.5 .

Inspector 65 looks rather anomalous in that he demonstrated a relatively high level of accuracy (raw score = 94%; z  = +1.29) but took a very long time to make those accurate decisions (raw score = 11.94 s; z  = +2.58). Contrast this with inspector 106 (case #17 in the list) who demonstrated a similar level of accuracy (raw score = 92%; z  = +1.08) but took a much shorter time to make those accurate decisions (raw score = 1.70 s; z  = −.96). In terms of evaluating performance, from a company perspective, we might conclude that inspector 106 is performing at an overall higher level than inspector 65 because he can achieve a very high level of accuracy but much more quickly; accurate and fast is more cost effective and efficient than accurate and slow.

Note: We should be cautious here since we know from our previous explorations of the accuracy and speed variables in Procedure 5.6 that accuracy scores look fairly symmetrical and speed scores are positively skewed, so assuming that the two variables have the same distribution shape, as z-score comparisons require, would be problematic.

You might have noticed that as you scanned down the two columns of z -scores in Table 5.5 , there was a suggestion of a pattern between the signs attached to the respective z -scores for each person. There seems to be a very slight preponderance of pairs of z -scores where the signs are reversed (12 out of 22 pairs). This observation provides some very preliminary evidence to suggest that there may be a relationship between inspection accuracy and decision speed, namely that a more accurate decision tends to be associated with a faster decision speed. Of course, this pattern would be better verified using the entire sample rather than the top 25% of inspectors. However, you may find it interesting to learn that it is precisely this sort of suggestive evidence (about agreement or disagreement between z -score signs for pairs of variable scores throughout a sample) that is captured and summarised by a single statistical indicator called a ‘correlation coefficient’ (see Fundamental Concept 10.1007/978-981-15-2537-7_6#Sec1 and Procedure 10.1007/978-981-15-2537-7_6#Sec4).

z -scores are not the only type of standard score that is commonly used. Three other types of standard scores are: stanines (standard nines), IQ scores and T-scores (not to be confused with the t -test described in Procedure 10.1007/978-981-15-2537-7_7#Sec22). These other types of scores have the advantage of producing only positive integer scores rather than positive and negative decimal scores. This makes interpretation somewhat easier for certain applications. However, you should know that almost all other types of standard scores come from a specific transformation of z -scores. This is because once you have converted raw scores into z -scores, they can then be quite readily transformed into any other system of measurement by simply multiplying a person’s z -score by the new desired standard deviation for the measure and adding to that product the new desired mean for the measure.

T-scores are simply z-scores transformed to have a mean of 50.0 and a standard deviation of 10.0; IQ scores are simply z-scores transformed to have a mean of 100 and a standard deviation of 15 (or 16 in some systems). For more information, see Fundamental Concept II .
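The ‘multiply by the new standard deviation, then add the new mean’ rule can be sketched in Python as follows. The function name `rescale` is hypothetical; the target means and standard deviations are those given above for T-scores and IQ scores.

```python
def rescale(z, new_mean, new_sd):
    """Convert a z-score to another standard-score system."""
    return z * new_sd + new_mean

# T-scores: mean 50, SD 10. IQ scores: mean 100, SD 15.
print(rescale(3.0, new_mean=50, new_sd=10))             # 80.0
print(rescale(1.0, new_mean=100, new_sd=15))            # 115.0
print(round(rescale(-0.96, new_mean=50, new_sd=10), 1)) # 40.4
```

Note that the rescaled values are integers only if they are subsequently rounded; the transformation itself simply shifts and stretches the z-scale.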

Standard scores are useful for representing the position of each raw score within a sample distribution relative to the mean of that distribution. The unit of measurement becomes the number of standard deviations a specific score is away from the sample mean. As such, z -scores can permit cautious comparisons across samples or across different variables having vastly differing means and standard deviations within the constraints of the comparison samples having similarly shaped distributions and roughly equivalent levels of measurement reliability. z -scores also form the basis for establishing the degree of correlation between two variables. Transforming raw scores into z -scores does not change the shape of a distribution or rank ordering of individuals within that distribution. For this reason, a z -score is referred to as a linear transformation of a raw score. Interestingly, z -scores provide an important foundational element for more complex analytical procedures such as factor analysis ( Procedure 10.1007/978-981-15-2537-7_6#Sec36), cluster analysis ( Procedure 10.1007/978-981-15-2537-7_6#Sec41) and multiple regression analysis (see, for example, Procedure 10.1007/978-981-15-2537-7_6#Sec27 and 10.1007/978-981-15-2537-7_7#Sec86).

While standard scores are useful indices, they are subject to restrictions if used to compare scores across samples or across different variables. The samples must have similar distribution shapes for the comparisons to be meaningful and the measures must have similar levels of reliability in each sample. The groups used to generate the z -scores should also be similar in composition (with respect to age, gender distribution, and so on). Because z -scores are not an intuitively meaningful way of presenting scores to lay-persons, many other types of standard score schemes have been devised to improve interpretability. However, most of these schemes produce scores that run a greater risk of facilitating lay-person misinterpretations simply because their connection with z -scores is hidden or because the resulting numbers ‘look’ like a more familiar type of score which people do intuitively understand.

It is extremely rare for a T-score to exceed 100 or go below 0 because this would mean that the raw score was in excess of 5 standard deviations away from the sample mean. This unfortunately means that T-scores are often misinterpreted as percentages because they typically range between 0 and 100 and therefore ‘look’ like percentages. However, T-scores are definitely not percentages.

Finally, a common misunderstanding of z -scores is that transforming raw scores into z -scores makes them follow a normal distribution (see Fundamental Concept II ). This is not the case. The distribution of z -scores will have exactly the same shape as that for the raw scores; if the raw scores are positively skewed, then the corresponding z -scores will also be positively skewed.

z -scores are particularly useful in evaluative studies where relative performance indices are of interest. Whenever you compute a correlation coefficient ( Procedure 10.1007/978-981-15-2537-7_6#Sec4), you are implicitly transforming the two variables involved into z -scores (which equates the variables in terms of mean and standard deviation), so that only the patterning in the relationship between the variables is represented. z -scores are also useful as a preliminary step to more advanced parametric statistical methods when variables differing in scale, range and/or measurement units must be equated for means and standard deviations prior to analysis.

Fundamental Concept II: The Normal Distribution

Arguably the most fundamental distribution used in the statistical analysis of quantitative data in the behavioural and social sciences is the normal distribution (also known as the Gaussian or bell-shaped distribution ). Many behavioural phenomena, if measured on a large enough sample of people, tend to produce ‘normally distributed’ variable scores. This includes most measures of ability, performance and productivity, personality characteristics and attitudes. The normal distribution is important because it is the one form of distribution that you must assume describes the scores of a variable in the population when parametric tests of statistical inference are undertaken. The standard normal distribution is defined as having a population mean of 0.0 and a population standard deviation of 1.0. The normal distribution is also important as a means of interpreting various types of scoring systems.

Figure 5.28 displays the standard normal distribution (mean = 0; standard deviation = 1.0) and shows that there is a clear link between z -scores and the normal distribution. Statisticians have analytically calculated the probability (also expressed as percentages or percentiles) that observations will fall above or below any specific z -score in the theoretical standard normal distribution. Thus, a z -score of +1.0 in the standard normal distribution will have 84.13% (equals a probability of .8413) of observations in the population falling at or below one standard deviation above the mean and 15.87% falling above that point. A z -score of −2.0 will have 2.28% of observations falling at that point or below and 97.72% of observations falling above that point. It is clear then that, in a standard normal distribution, z -scores have a direct relationship with percentiles .
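These percentages can be reproduced with the `NormalDist` class from Python's standard `statistics` module (available from Python 3.8). The sketch below is only a check of the two z-scores mentioned above.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

for z in (1.0, -2.0):
    below = std_normal.cdf(z)  # proportion of the population at or below z
    print(f"z = {z:+.1f}: {below:.2%} at or below, {1 - below:.2%} above")
```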

Fig. 5.28 The normal (bell-shaped or Gaussian) distribution

Figure 5.28 also shows how T-scores relate to the standard normal distribution and to z -scores. The mean T-score falls at 50 and each increment or decrement of 10 T-score units means a movement of another standard deviation away from this mean of 50. Thus, a T-score of 80 corresponds to a z -score of +3.0—a score 3 standard deviations higher than the mean of 50.

Of special interest to behavioural researchers are the values for z -scores in a standard normal distribution that encompass 90% of observations ( z  = ±1.645—isolating 5% of the distribution in each tail), 95% of observations ( z  = ±1.96—isolating 2.5% of the distribution in each tail), and 99% of observations ( z  = ±2.58—isolating 0.5% of the distribution in each tail).
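Going in the other direction, from a desired coverage to the z-score that bounds it, uses the inverse of the cumulative distribution. A minimal Python sketch, with the hypothetical helper name `central_band`:

```python
from statistics import NormalDist

def central_band(coverage):
    """Return z such that the band -z..+z holds `coverage` of a standard normal."""
    tail = (1 - coverage) / 2  # proportion isolated in each tail
    return NormalDist().inv_cdf(1 - tail)

for coverage in (0.90, 0.95, 0.99):
    print(f"{coverage:.0%} of observations: z = ±{central_band(coverage):.3f}")
```

The 99% value comes out at about 2.576, which is conventionally rounded to the 2.58 quoted above.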

Depending upon the degree of certainty required by the researcher, these bands describe regions outside of which one might define an observation as being atypical or as perhaps not belonging to a distribution being centred at a mean of 0.0. Most often, what is taken as atypical or rare in the standard normal distribution is a score at least two standard deviations away from the mean, in either direction. Why choose two standard deviations? Since in the standard normal distribution, only about 5% of observations will fall outside a band defined by z -scores of ±1.96 (rounded to 2 for simplicity), this equates to data values that are 2 standard deviations away from their mean. This can give us a defensible way to identify outliers or extreme values in a distribution.

Thinking ahead to what you will encounter in Chap. 10.1007/978-981-15-2537-7_7, this ‘banding’ logic can be extended into the world of statistics (like means and percentages) as opposed to just the world of observations. You will frequently hear researchers speak of some statistic estimating a specific value (a parameter ) in a population, plus or minus some other value.

A survey organisation might report political polling results in terms of a percentage and an error band, e.g. 59% of Australians indicated that they would vote Labor at the next federal election, plus or minus 2%.

Most commonly, this error band (±2%) is defined by possible values for the population parameter that are about two standard deviations (or two standard errors—a concept discussed further in Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec14) away from the reported or estimated statistical value. In effect, the researcher is saying that on 95% of the occasions he/she would theoretically conduct his/her study, the population value estimated by the statistic being reported would fall between the limits imposed by the endpoints of the error band (the official name for this error band is a confidence interval ; see Procedure 10.1007/978-981-15-2537-7_8#Sec18). The well-understood mathematical properties of the standard normal distribution are what make such precise statements about levels of error in statistical estimates possible.
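To show where an error band of roughly ±2% might come from, here is a Python sketch using the common normal-approximation interval for a proportion. Two things are assumptions rather than facts from the polling example: the sample size of 2,400 is hypothetical (the example does not state one), and the standard error formula sqrt(p(1 − p)/n) is one conventional choice that a polling organisation may or may not use.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# 59% support in a hypothetical sample of 2,400 respondents.
low, high = proportion_interval(0.59, n=2400)
print(f"{low:.3f} to {high:.3f} (margin of about {(high - low) / 2:.3f})")
```

Under these assumptions the margin is close to 0.02, i.e. about two standard errors either side of the reported 59%.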

Checking for Normality

It is important to understand that transforming the raw scores for a variable to z-scores (recall Procedure 5.7) does not produce z-scores which follow a normal distribution; rather they will have the same distributional shape as the original scores. However, if you are willing to assume that the normal distribution is the correct reference distribution in the population, then you are justified in interpreting z-scores in light of the known characteristics of the normal distribution.

In order to justify this assumption, not only to enhance the interpretability of z -scores but more generally to enhance the integrity of parametric statistical analyses, it is helpful to actually look at the sample frequency distributions for variables (using a histogram (illustrated in Procedure 5.2 ) or a boxplot (illustrated in Procedure 5.6 ), for example), since non-normality can often be visually detected. It is important to note that in the social and behavioural sciences as well as in economics and finance, certain variables tend to be non-normal by their very nature. This includes variables that measure time taken to complete a task, achieve a goal or make decisions and variables that measure, for example, income, occurrence of rare or extreme events or organisational size. Such variables tend to be positively skewed in the population, a pattern that can often be confirmed by graphing the distribution.

If you cannot justify an assumption of ‘normality’, you may be able to force the data to be normally distributed by using what is called a ‘normalising transformation’. Such transformations will usually involve a nonlinear mathematical conversion (such as computing the logarithm, square root or reciprocal) of the raw scores. Such transformations will force the data to take on a more normal appearance so that the assumption of ‘normality’ can be reasonably justified, but at the cost of creating a new variable whose units of measurement and interpretation are more complicated. [For some non-normal variables, such as the occurrence of rare, extreme or catastrophic events (e.g. a 100-year flood or forest fire, coronavirus pandemic, the Global Financial Crisis or other type of financial crisis, man-made or natural disaster), the distributions cannot be ‘normalised’. In such cases, the researcher needs to model the distribution as it stands. For such events, extreme value theory (e.g. see Diebold et al. 2000 ) has proven very useful in recent years. This theory uses a variation of the Pareto or Weibull distribution as a reference, rather than the normal distribution, when making predictions.]

Figure 5.29 displays before and after pictures of the effects of a logarithmic transformation on the positively skewed speed variable from the QCI database. Each graph, produced using NCSS, is of the hybrid histogram-density trace-boxplot type first illustrated in Procedure 5.6. The left graph clearly shows the strong positive skew in the speed scores and the right graph shows the result of taking the log₁₀ of each raw score.

Combined histogram-density trace-boxplot graphs displaying the before and after effects of a ‘normalising’ log 10 transformation of the speed variable

Notice how the long tail toward slow speed scores is pulled in toward the mean and the very short tail toward fast speed scores is extended away from the mean. The result is a more ‘normal’ appearing distribution. The assumption would then be that we could assume normality of speed scores, but only in a log 10 format (i.e. it is the log of speed scores that we assume is normally distributed in the population). In general, taking the logarithm of raw scores provides a satisfactory remedy for positively skewed distributions (but not for negatively skewed ones). Furthermore, anything we do with the transformed speed scores now has to be interpreted in units of log 10 (seconds) which is a more complex interpretation to make.
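The effect of a log 10 transformation on a positively skewed variable can be sketched numerically. The example below is only an illustration: the `times` values are hypothetical (they are not the QCI speed scores), and `skewness` is a simple moment-based helper written for this sketch rather than the exact statistic SPSS or NCSS reports.

```python
import math

def skewness(xs):
    """Moment-based skewness: the mean of the cubed z-scores (divides by N)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

# Hypothetical completion times in seconds (assumed data, long right tail)
times = [1.2, 1.5, 1.7, 2.0, 2.2, 2.6, 3.1, 4.0, 6.5, 12.0]

# The 'normalising' transformation: log base 10 of each raw score
log_times = [math.log10(t) for t in times]

raw_skew = skewness(times)
log_skew = skewness(log_times)
print(f"skewness before: {raw_skew:.2f}, after log10: {log_skew:.2f}")
```

With data like these, the skewness of the transformed scores should be noticeably closer to zero than that of the raw scores, although a log transform does not guarantee a perfectly symmetric result.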

Another visual method for detecting non-normality is to graph what is called a normal Q-Q plot (the Q-Q stands for Quantile-Quantile). This plots the percentiles for the observed data against the percentiles for the standard normal distribution (see Cleveland 1995 for more detailed discussion; also see Lane 2007, http://onlinestatbook.com/2/advanced_graphs/q-q_plots.html). If the pattern for the observed data follows a normal distribution, then all the points on the graph will fall approximately along a diagonal line.

Figure 5.30 shows the normal Q-Q plots for the original speed variable and the transformed log-speed variable, produced using the SPSS Explore... procedure. The diagnostic diagonal line is shown on each graph. In the left-hand plot, for speed , the plot points clearly deviate from the diagonal in a way that signals positive skewness. The right-hand plot, for log_speed, shows the plot points generally falling along the diagonal line thereby conforming much more closely to what is expected in a normal distribution.

Normal Q-Q plots for the original speed variable and the new log_speed variable
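The points of a normal Q-Q plot can be computed without any plotting software. The sketch below is a minimal illustration under stated assumptions: the data are hypothetical, the plotting position (i − 0.5)/n is one common convention among several, and the `straightness` helper (a plain Pearson correlation of the plotted points) is an informal summary added for this example, not a formal normality test.

```python
import math
from statistics import NormalDist, mean, pstdev

def normal_qq_points(xs):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(xs)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # (i - 0.5) / n is an assumed, commonly used plotting position
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(std_normal.inv_cdf(p), x) for p, x in zip(probs, sorted(xs))]

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 suggest a straight line."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    mq, mx = mean(qs), mean(xs)
    cov = sum((q - mq) * (x - mx) for q, x in points) / len(points)
    return cov / (pstdev(qs) * pstdev(xs))

# Hypothetical positively skewed scores and their log10 transform
times = [1.2, 1.5, 1.7, 2.0, 2.2, 2.6, 3.1, 4.0, 6.5, 12.0]
log_times = [math.log10(t) for t in times]

r_raw = straightness(normal_qq_points(times))
r_log = straightness(normal_qq_points(log_times))
print(f"raw: r = {r_raw:.3f}; log10: r = {r_log:.3f}")
```

Plotting each pair with the theoretical quantile on one axis and the observed value on the other gives the Q-Q plot itself; points that bend away from a straight line at one end are the visual signature of skewness.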

In addition to visual ways of detecting non-normality, there are also numerical ways. As highlighted in Chap. 10.1007/978-981-15-2537-7_1, there are two additional characteristics of any distribution, namely skewness (asymmetric distribution tails) and kurtosis (peakedness of the distribution). Both have an associated statistic that provides a measure of that characteristic, similar to the mean and standard deviation statistics. In a normal distribution, the values for the skewness and kurtosis statistics are both zero (skewness = 0 means a symmetric distribution; kurtosis = 0 means a mesokurtic distribution). The further away each statistic is from zero, the more the distribution deviates from a normal shape. Both the skewness statistic and the kurtosis statistic have standard errors (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec14) associated with them (which work very much like the standard deviation, only for a statistic rather than for observations); these can be routinely computed by almost any statistical package when you request a descriptive analysis. Without going into the logic right now (this will come in Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec1), a rough rule of thumb you can use to check for normality using the skewness and kurtosis statistics is to do the following:

  • Prepare : Take the standard error for the statistic and multiply it by 2 (or 3 if you want to be more conservative).
  • Interval : Add the result from the Prepare step to the value of the statistic and subtract the result from the value of the statistic. You will end up with two numbers, one low - one high, that define the ends of an interval (what you have just created approximates what is called a ‘confidence interval’, see Procedure 10.1007/978-981-15-2537-7_8#Sec18).
  • Check : If zero falls inside of this interval (i.e. between the low and high endpoints from the Interval step), then there is likely to be no significant issue with that characteristic of the distribution. If zero falls outside of the interval (i.e. lower than the low value endpoint or higher than the high value endpoint), then you likely have an issue with non-normality with respect to that characteristic.

Visually, we saw in the left graph in Fig. 5.29 that the speed variable was highly positively skewed. What if Maree wanted to check some numbers to support this judgment? She could ask SPSS to produce the skewness and kurtosis statistics for both the original speed variable and the new log_speed variable using the Frequencies... or the Explore... procedure. Table 5.6 shows what SPSS would produce if the Frequencies ... procedure were used.

Skewness and kurtosis statistics and their standard errors for both the original speed variable and the new log_speed variable

Using the 3-step check rule described above, Maree could roughly evaluate the normality of the two variables as follows:

  • speed skewness : [Prepare] 2 × .229 = .458 ➔ [Interval] 1.487 − .458 = 1.029 and 1.487 + .458 = 1.945 ➔ [Check] zero does not fall inside the interval bounded by 1.029 and 1.945, so there appears to be a significant problem with skewness. Since the value for the skewness statistic (1.487) is positive, this means the problem is positive skewness, confirming what the left graph in Fig. 5.29 showed.
  • speed kurtosis : [Prepare] 2 × .455 = .91 ➔ [Interval] 3.071 − .91 = 2.161 and 3.071 + .91 = 3.981 ➔ [Check] zero does not fall inside the interval bounded by 2.161 and 3.981, so there appears to be a significant problem with kurtosis. Since the value for the kurtosis statistic (3.071) is positive, this means the problem is leptokurtosis: the peakedness of the distribution is too tall relative to what is expected in a normal distribution.
  • log_speed skewness : [Prepare] 2 × .229 = .458 ➔ [Interval] −.050 − .458 = −.508 and −.050 + .458 = .408 ➔ [Check] zero falls inside the interval bounded by −.508 and .408, so there appears to be no problem with skewness. The log transform appears to have corrected the problem, confirming what the right graph in Fig. 5.29 showed.
  • log_speed kurtosis : [Prepare] 2 × .455 = .91 ➔ [Interval] −.672 − .91 = −1.582 and −.672 + .91 = .238 ➔ [Check] zero falls inside the interval bounded by −1.582 and .238, so there appears to be no problem with kurtosis. The log transform appears to have corrected this problem as well, rendering the distribution more approximately mesokurtic (i.e. normal) in shape.
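The three-step check can be written as a small function. This is a rough sketch of the rule of thumb only; the function name `rough_normality_check` and its `multiplier` parameter are made up for this illustration, and the input values are the statistics and standard errors quoted in the worked example above.

```python
def rough_normality_check(statistic, std_error, multiplier=2):
    """Apply the Prepare / Interval / Check rule of thumb to one statistic."""
    margin = multiplier * std_error                      # Prepare
    low, high = statistic - margin, statistic + margin   # Interval
    zero_inside = low <= 0 <= high                       # Check
    return low, high, zero_inside

# (statistic, standard error) pairs taken from the worked example
results = {
    "speed skewness": rough_normality_check(1.487, 0.229),
    "speed kurtosis": rough_normality_check(3.071, 0.455),
    "log_speed skewness": rough_normality_check(-0.050, 0.229),
    "log_speed kurtosis": rough_normality_check(-0.672, 0.455),
}

for name, (low, high, ok) in results.items():
    verdict = "no apparent problem" if ok else "possible non-normality"
    print(f"{name}: [{low:.3f}, {high:.3f}] -> {verdict}")
```

Passing `multiplier=3` gives the more conservative version of the rule mentioned in the Prepare step.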

There are also more formal tests of significance (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec1) that one can use to numerically evaluate normality, such as the Kolmogorov-Smirnov test and the Shapiro-Wilk’s test . Each of these tests, for example, can be produced by SPSS on request, via the Explore... procedure.

1 For more information, see Chap. 10.1007/978-981-15-2537-7_1 – The language of statistics .

References for Procedure 5.1

  • Allen P, Bennett K, Heritage B. SPSS statistics: A practical guide. 4. South Melbourne, VIC: Cengage Learning Australia Pty; 2019. [ Google Scholar ]
  • George D, Mallery P. IBM SPSS statistics 25 step by step: A simple guide and reference. 15. New York: Routledge; 2019. [ Google Scholar ]

Useful Additional Readings for Procedure 5.1

  • Agresti A. Statistical methods for the social sciences. 5. Boston: Pearson; 2018. [ Google Scholar ]
  • Argyrous G. Statistics for research: With a guide to SPSS. 3. London: Sage; 2011. [ Google Scholar ]
  • De Vaus D. Analyzing social science data: 50 key problems in data analysis. London: Sage; 2002. [ Google Scholar ]
  • Glass GV, Hopkins KD. Statistical methods in education and psychology. 3. Upper Saddle River, NJ: Pearson; 1996. [ Google Scholar ]
  • Gravetter FJ, Wallnau LB. Statistics for the behavioural sciences. 10. Belmont, CA: Wadsworth Cengage; 2017. [ Google Scholar ]
  • Steinberg WJ. Statistics alive. 2. Los Angeles: Sage; 2011. [ Google Scholar ]

References for Procedure 5.2

  • Chang W. R graphics cookbook: Practical recipes for visualizing data. 2. Sebastopol, CA: O’Reilly Media; 2019. [ Google Scholar ]
  • Jacoby WG. Statistical graphics for univariate and bivariate data. Thousand Oaks, CA: Sage; 1997. [ Google Scholar ]
  • McCandless D. Knowledge is beautiful. London: William Collins; 2014. [ Google Scholar ]
  • Smithson MJ. Statistics with confidence. London: Sage; 2000. [ Google Scholar ]
  • Toseland M, Toseland S. Infographica: The world as you have never seen it before. London: Quercus Books; 2012. [ Google Scholar ]
  • Wilkinson L. Cognitive science and graphic design. In: SYSTAT Software Inc, editor. SYSTAT 13: Graphics. Chicago, IL: SYSTAT Software Inc; 2009. pp. 1–21. [ Google Scholar ]

Useful Additional Readings for Procedure 5.2

  • Field A. Discovering statistics using SPSS for windows. 5. Los Angeles: Sage; 2018. [ Google Scholar ]
  • George D, Mallery P. IBM SPSS statistics 25 step by step: A simple guide and reference. 15. Boston, MA: Pearson Education; 2019. [ Google Scholar ]
  • Hintze JL. NCSS 8 help system: Graphics. Kaysville, UT: Number Cruncher Statistical Systems; 2012. [ Google Scholar ]
  • StatPoint Technologies, Inc . STATGRAPHICS Centurion XVI user manual. Warrenton, VA: StatPoint Technologies Inc.; 2010. [ Google Scholar ]
  • SYSTAT Software Inc . SYSTAT 13: Graphics. Chicago, IL: SYSTAT Software Inc; 2009. [ Google Scholar ]

References for Procedure 5.3

  • Cleveland WR. Visualizing data. Summit, NJ: Hobart Press; 1995. [ Google Scholar ]
  • Jacoby WJ. Statistical graphics for visualizing multivariate data. Thousand Oaks, CA: Sage; 1998. [ Google Scholar ]

Useful Additional Readings for Procedure 5.3

  • Kirk A. Data visualisation: A handbook for data driven design. Los Angeles: Sage; 2016. [ Google Scholar ]
  • Knaflic CN. Storytelling with data: A data visualization guide for business professionals. Hoboken, NJ: Wiley; 2015. [ Google Scholar ]
  • Tufte E. The visual display of quantitative information. 2. Cheshire, CN: Graphics Press; 2001. [ Google Scholar ]

Reference for Procedure 5.4

Useful additional readings for procedure 5.4.

  • Rosenthal R, Rosnow RL. Essentials of behavioral research: Methods and data analysis. 2. New York: McGraw-Hill Inc; 1991. [ Google Scholar ]

References for Procedure 5.5

Useful additional readings for procedure 5.5.

  • Gravetter FJ, Wallnau LB. Statistics for the behavioural sciences. 9. Belmont, CA: Wadsworth Cengage; 2012. [ Google Scholar ]

References for Fundamental Concept I

Useful additional readings for fundamental concept i.

  • Howell DC. Statistical methods for psychology. 8. Belmont, CA: Cengage Wadsworth; 2013. [ Google Scholar ]

References for Procedure 5.6

  • Norušis MJ. IBM SPSS statistics 19 guide to data analysis. Upper Saddle River, NJ: Prentice Hall; 2012. [ Google Scholar ]
  • Field A. Discovering statistics using SPSS for Windows. 5. Los Angeles: Sage; 2018. [ Google Scholar ]
  • Hintze JL. NCSS 8 help system: Introduction. Kaysville, UT: Number Cruncher Statistical System; 2012. [ Google Scholar ]
  • SYSTAT Software Inc . SYSTAT 13: Statistics - I. Chicago, IL: SYSTAT Software Inc; 2009. [ Google Scholar ]

Useful Additional Readings for Procedure 5.6

  • Hartwig F, Dearing BE. Exploratory data analysis. Beverly Hills, CA: Sage; 1979. [ Google Scholar ]
  • Leinhardt G, Leinhardt L. Exploratory data analysis. In: Keeves JP, editor. Educational research, methodology, and measurement: An international handbook. 2. Oxford: Pergamon Press; 1997. pp. 519–528. [ Google Scholar ]
  • Rosenthal R, Rosnow RL. Essentials of behavioral research: Methods and data analysis. 2. New York: McGraw-Hill, Inc.; 1991. [ Google Scholar ]
  • Tukey JW. Exploratory data analysis. Reading, MA: Addison-Wesley Publishing; 1977. [ Google Scholar ]
  • Velleman PF, Hoaglin DC. ABC’s of EDA. Boston: Duxbury Press; 1981. [ Google Scholar ]

Useful Additional Readings for Procedure 5.7

References for Fundamental Concept II

  • Diebold FX, Schuermann T, Stroughair D. Pitfalls and opportunities in the use of extreme value theory in risk management. The Journal of Risk Finance. 2000; 1 (2):30–35. doi: 10.1108/eb043443. [ CrossRef ] [ Google Scholar ]
  • Lane D. Online statistics education: A multimedia course of study. Houston, TX: Rice University; 2007. [ Google Scholar ]

Descriptive Statistics

At this point, we need to consider the basics of data analysis in psychological research in more detail. In this chapter, we focus on descriptive statistics—a set of techniques for summarizing and displaying the data from your sample. We look first at some of the most common techniques for describing single variables, followed by some of the most common techniques for describing statistical relationships between variables. We then look at how to present descriptive statistics in writing and also in the form of tables and graphs that would be appropriate for an American Psychological Association (APA)-style research report. We end with some practical advice for organizing and carrying out your analyses.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Research Methods in Psychology

13. Descriptive Statistics

Statistics is the grammar of science. —Karl Pearson

13.1. Describing Single Variables

13.1.1. Learning Objectives

Use frequency tables and histograms to display and interpret the distribution of a variable.

Compute and interpret the mean, median, and mode of a distribution and identify situations in which the mean, median, or mode is the most appropriate measure of central tendency.

Compute and interpret the range and standard deviation of a distribution.

Compute and interpret percentile ranks and z scores.

Define APA style and list several of its most important characteristics.

Identify three levels of APA style and give examples of each.

Identify multiple sources of information about APA style.

Descriptive statistics are a set of techniques for summarizing and displaying data. Let us assume here that the data are quantitative and consist of scores on one or more variables for each of several study participants. Although in most cases the primary research question will be about one or more statistical relationships between variables, it is also important to describe each variable individually. For this reason, we begin by looking at some of the most common techniques for describing single variables.

13.1.2. The Distribution of a Variable

Every variable has a distribution, which is the way the scores are distributed across the levels of that variable. For example, in a sample of 100 university students, the distribution of the variable “number of siblings” might be such that 10 students have no siblings, 30 have one sibling, 40 have two siblings, and so on. In the same sample, the distribution of the variable “sex” might be such that 44 have a score of “male” and 56 have a score of “female”.

13.1.3. Frequency Tables

One way to display the distribution of a variable is in a frequency table. Figure 13.1, for example, is a frequency table showing a hypothetical distribution of scores on the Rosenberg Self-Esteem Scale for a sample of 40 college students. The first column lists the possible scores on the Rosenberg scale and the second column lists the frequency of each score. This table shows that there were three students who had self-esteem scores of 24, five who had self-esteem scores of 23, and so on. From a frequency table like this, one can quickly see several important aspects of a distribution, including the range of scores (from 15 to 24), the most and least common scores (22 and 17, respectively), and any extreme scores that stand out from the rest.

Fig. 13.1 Frequency table showing a hypothetical distribution of scores on the Rosenberg self-esteem scale.
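A frequency table like this one can be tallied in a few lines of code. The sketch below is illustrative only: the counts in `assumed_counts` are hypothetical values chosen to agree with the facts stated in the text (40 students, three scores of 24, five of 23, a mode of 22, no scores of 17, and a range from 15 to 24), since the full table is not reproduced here.

```python
from collections import Counter

# Hypothetical counts (assumed), chosen only to match the facts given in the text
assumed_counts = {24: 3, 23: 5, 22: 10, 21: 8, 20: 5, 19: 3, 18: 3, 16: 2, 15: 1}

# Expand the counts into a list of 40 individual scores, as raw data would arrive
scores = [score for score, count in assumed_counts.items() for _ in range(count)]

# Tally the raw scores; Counter returns 0 for levels that never occur (e.g. 17)
freq = Counter(scores)

# List levels from highest to lowest, without extending beyond the observed scores
table = [(level, freq[level]) for level in range(max(scores), min(scores) - 1, -1)]

print("Score  Frequency")
for level, count in table:
    print(f"{level:>5}  {count:>9}")
```

Listing every level between the highest and lowest observed scores, including those with a frequency of zero, keeps gaps in the distribution visible.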

There are a few other points worth noting about frequency tables. First, the levels listed in the first column usually go from the highest at the top to the lowest at the bottom, and they usually do not extend beyond the highest and lowest scores in the data. For example, although scores on the Rosenberg scale can vary from a high of 30 to a low of 0, Figure 13.1 only includes levels from 24 to 15 because that range includes all the scores in this particular data set. Second, when there are many different scores across a wide range of values, it is often better to create a grouped frequency table, in which the first column lists ranges of values and the second column lists the frequency of scores in each range. Figure 13.2, for example, is a grouped frequency table showing a hypothetical distribution of simple reaction times for a sample of 20 participants. In a grouped frequency table, the ranges must all be of equal width, and there are usually between five and 15 of them. Finally, frequency tables can also be used for categorical variables, in which case the levels are category labels. The order of the category labels is somewhat arbitrary, but they are often listed from the most frequent at the top to the least frequent at the bottom.

Fig. 13.2 A grouped frequency table showing a hypothetical distribution of reaction times (ms).
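Grouping scores into equal-width ranges can be sketched as follows. This is a minimal example under stated assumptions: the 20 reaction times are hypothetical whole-number values in milliseconds, and the helper name `grouped_frequency_table` and the chosen width of 20 ms are illustrative choices, not values taken from the figure.

```python
def grouped_frequency_table(values, width):
    """Count integer scores in equal-width ranges of the form (lower, upper)."""
    start = (min(values) // width) * width
    counts = {}
    lower = start
    # Create every range up to the one containing the maximum, even if empty
    while lower <= max(values):
        counts[(lower, lower + width - 1)] = 0
        lower += width
    for v in values:
        lower = (v // width) * width
        counts[(lower, lower + width - 1)] += 1
    return counts

# Hypothetical simple reaction times (ms) for 20 participants (assumed data)
reaction_times = [148, 152, 161, 165, 170, 172, 178, 181, 184, 186,
                  189, 193, 197, 201, 205, 212, 218, 224, 233, 251]

table = grouped_frequency_table(reaction_times, width=20)

# Print from the highest range to the lowest, as in an ungrouped frequency table
for (low, high), count in sorted(table.items(), reverse=True):
    print(f"{low}-{high}: {count}")
```

A width of 20 ms yields six ranges for these data, which falls within the usual guideline of five to 15 ranges.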

13.1.4. Histograms

A histogram is a graphical display of a distribution. It presents the same information as a frequency table but in a way that is even quicker and easier to grasp. The histogram in Figure 13.3 presents the distribution of self-esteem scores in Figure 13.1. The x-axis of the histogram represents the variable and the y-axis represents frequency. Above each level of the variable on the x-axis is a vertical bar that represents the number of individuals with that score. When the variable is quantitative, as in this example, there is usually no gap between the bars. When the variable is categorical, however, there is usually a small gap between them. The gap at 17 in this histogram reflects the fact that there were no scores of 17 in this data set.

Fig. 13.3 Histogram showing the distribution of self-esteem scores presented in Figure 13.1.
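Even without graphing software, the shape of a distribution can be previewed as a text histogram. The sketch below turns the bars on their side (one row per score); the counts are hypothetical values chosen to be consistent with the description of the self-esteem data, and `text_histogram` is a helper invented for this illustration.

```python
# Hypothetical counts (assumed), consistent with the description in the text
assumed_counts = {15: 1, 16: 2, 18: 3, 19: 3, 20: 5, 21: 8, 22: 10, 23: 5, 24: 3}

def text_histogram(freq, lowest, highest):
    """Return one line per level; each '#' stands for one participant."""
    return [f"{level:>2} | " + "#" * freq.get(level, 0)
            for level in range(lowest, highest + 1)]

lines = text_histogram(assumed_counts, 15, 24)
print("\n".join(lines))
```

The empty row at 17 corresponds to the gap in the histogram, and the longest row marks the most frequent score.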

13.1.5. Distribution Shapes

When the distribution of a quantitative variable is displayed in a histogram, it has a shape. The shape of the distribution of self-esteem scores in Figure 13.3 is typical. There is a peak somewhere near the middle of the distribution and “tails” that taper in either direction from the peak. The distribution of Figure 13.3 is unimodal, meaning it has one distinct peak, but distributions can also be bimodal, meaning they have two distinct peaks. Figure 13.4 , for example, shows a hypothetical bimodal distribution of scores on the Beck Depression Inventory (BDI). Distributions can also have more than two distinct peaks, but these are relatively rare in psychological research.

Fig. 13.4 Histogram showing a hypothetical bimodal distribution of scores on the Beck Depression Inventory.

Another characteristic of the shape of a distribution is whether it is symmetrical or skewed. The distribution in the center of Figure 13.5 is symmetrical. Its left and right halves are mirror images of each other. The distribution on the left is negatively skewed, with its peak shifted toward the upper (right) end of its range and a relatively long left tail. The distribution on the right is positively skewed, with its peak shifted toward the lower (left) end of its range and a relatively long right tail.

Fig. 13.5 Histograms showing negatively skewed, symmetrical, and positively skewed distributions.

An outlier is an extreme score that is much higher or lower than the rest of the scores in the distribution. Sometimes outliers represent truly extreme scores on the variable of interest. For example, on the Beck Depression Inventory, a single clinically depressed person might be an outlier in a sample of otherwise happy and high-functioning peers. However, outliers may also reflect errors or misunderstandings on the part of the researcher or participant, equipment malfunctions, or similar problems. We will say more about how to interpret outliers and what to do about them later in this chapter.

13.1.6. Measures of Central Tendency and Variability

It is also useful to be able to describe the characteristics of a distribution more precisely. Here we look at how to do this in terms of two important characteristics: central tendency and variability.

13.1.7. Central Tendency

The central tendency of a distribution is its middle, the point around which the scores in the distribution tend to cluster. (Another term for central tendency is average.) Looking back at Figure 13.3 , for example, we can see that the self-esteem scores tend to cluster around the values of 20 to 22. Here we will consider the three most common measures of central tendency: the mean, the median, and the mode.

The mean of a distribution (symbolized M) is the sum of the scores divided by the number of scores. As a formula, it looks like this:

\[M = \frac{\sum X}{N}\]

In this formula, the symbol \(\sum\) (the Greek letter sigma) is the summation sign and means to sum across the values of the variable X. N represents the number of scores. The mean is by far the most common measure of central tendency, and there are some good reasons for this. It usually provides a good indication of the central tendency of a distribution, and it is easily understood by most people. In addition, the mean has statistical properties that make it especially useful in doing inferential statistics.

An alternative to the mean is the median. The median is the middle score in the sense that half the scores in the distribution are less than the median and half are greater than the median. The simplest way to find the median is to organize the scores from lowest to highest and locate the score in the middle. Consider, for example, the following set of seven scores:

8 4 12 14 3 2 3

To find the median, simply rearrange the scores from lowest to highest and locate the one in the middle.

2 3 3 4 8 12 14

In this case, the median is 4 because there are three scores lower than 4 and three scores higher than 4. When there is an even number of scores, there are two scores in the middle of the distribution, in which case the median is the value halfway between them. For example, if we were to add a score of 15 to the preceding data set, there would be two scores (both 4 and 8) in the middle of the distribution, and the median would be halfway between them (i.e., 6).

One final measure of central tendency is the mode. The mode is the most frequent score in a distribution. In the self-esteem distribution presented in Figure 13.1 and Figure 13.3 , for example, the mode is 22. More students had that score than any other. The mode is the only measure of central tendency that can also be used for categorical variables.

In a distribution that is both unimodal and symmetrical, the mean, median, and mode should be very close to each other. In a bimodal or asymmetrical distribution, the mean, median, and mode can be quite different from one another. In a bimodal distribution, the mean and median will tend to be between the peaks, whereas the mode will be at the tallest peak. In a skewed distribution, the mean will differ from the median in the direction of the skew (i.e., the direction of the longer tail). For highly skewed distributions, the mean can be pulled so far in the direction of the skew that it is no longer a good measure of the central tendency of that distribution. Imagine, for example, a set of four simple reaction times of 200, 250, 280, and 250 milliseconds (ms). The mean is 245 ms. But the addition of one more score of 5,000 ms (e.g., perhaps because the participant was not paying attention) would increase the mean to 1,196 ms. Not only is this measure of central tendency greater than 80% of the scores in the distribution, but it also does not seem to represent the behavior of anyone in the distribution very well. This is why researchers often prefer the median for highly skewed distributions (such as distributions of reaction times).

Keep in mind, though, that you are not required to choose a single measure of central tendency in analyzing your data. Each one provides slightly different information, and all of them can be useful.
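The three measures can be computed with Python's standard `statistics` module. The sketch below uses a set of seven scores whose median is 4 (matching the median example in this section) and the reaction times from the skewness example; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Seven scores with a median of 4, as in the median example
scores = [8, 4, 12, 14, 3, 2, 3]
median_odd = median(scores)            # middle score after sorting
median_even = median(scores + [15])    # halfway between the two middle scores
modes = multimode(scores)              # list of the most frequent score(s)

# Reaction times (ms) without and with one extreme score
reaction_times = [200, 250, 280, 250]
with_outlier = reaction_times + [5000]

print("median (7 scores):", median_odd)
print("median (8 scores):", median_even)
print("mode(s):", modes)
print("mean without outlier:", mean(reaction_times))
print("mean with outlier:", mean(with_outlier))
print("median with outlier:", median(with_outlier))
```

Comparing the last two lines of output shows why the median is often preferred for highly skewed data: one extreme score moves the mean a great deal but leaves the median essentially unchanged.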

13.1.8. Measures of Variability

The variability of a distribution describes how spread out a group of scores is. Consider the two distributions in Figure 13.6, both of which have the same central tendency: the means, medians, and modes of each distribution are all 10. Notice, however, that the two distributions differ in terms of their variability. The top one has relatively low variability, with all the scores relatively close to the center. The bottom one has relatively high variability, with the scores spread across a much greater range.

Fig. 13.6 Histograms showing hypothetical distributions with the same mean, median, and mode (10) but with low variability (top) and high variability (bottom).

One simple measure of variability is the range, which is simply the difference between the highest and lowest scores in the distribution. The range of the self-esteem scores in Figure 13.1 , for example, is the difference between the highest score (24) and the lowest score (15). That is, the range is \(24 - 15 = 9\) . Although the range is easy to compute and understand, it can be misleading when there are outliers. Imagine, for example, an exam on which all the students scored between 90 and 100. It has a range of 10. But if there was a single student who scored 20, the range would increase to 80, giving the impression that the scores were quite variable when in fact only one student differed substantially from the rest.

By far the most common measure of variability is the standard deviation. The standard deviation of a distribution is, roughly speaking, the average distance between the scores and the mean. For example, the standard deviations of the distributions in Figure 13.6 are 1.69 for the top distribution and 4.30 for the bottom one. That is, whereas the scores in the top distribution differ from the mean by about 1.69 units on average, the scores in the bottom distribution differ from the mean by about 4.30 units on average.

Computing the standard deviation involves a slight complication. Specifically, it involves finding the difference between each score and the mean, squaring each difference, finding the mean of these squared differences, and finally finding the square root of that mean. The formula looks like this:

\[SD = \sqrt{\frac{\sum (X - M)^2}{N}}\]

The computations for the standard deviation are illustrated for a small set of data in Figure 13.7 . The first column is a set of eight scores that has a mean of 5. The second column is the difference between each score and the mean. The third column is the square of each of these differences. Notice that although the differences can be negative, the squared differences are always positive, meaning that the standard deviation is always positive. At the bottom of the third column is the mean of the squared differences, which is also called the variance (symbolized \(SD^2\) ). Although the variance is itself a measure of variability, it generally plays a larger role in inferential statistics than in descriptive statistics. Finally, below the variance is the square root of the variance, which is the standard deviation.

Fig. 13.7 Computations for the standard deviation

If you have already taken a statistics course, you may have learned to divide the sum of the squared differences by N - 1 rather than by N when you compute the variance and standard deviation. Why is this?

By definition, the standard deviation is the square root of the mean of the squared differences. This implies dividing the sum of squared differences by N, as in the formula just presented. Computing the standard deviation this way is appropriate when your goal is simply to describe the variability in a sample. And learning it this way emphasizes that the variance is in fact the mean of the squared differences—and the standard deviation is the square root of this mean. However, most calculators and software packages divide the sum of squared differences by N - 1. This is because the standard deviation of a sample tends to be a bit lower than the standard deviation of the population the sample was selected from. Dividing the sum of squares by N - 1 corrects for this tendency and results in a better estimate of the population standard deviation. Because researchers generally think of their data as representing a sample selected from a larger population, and because they are generally interested in drawing conclusions about the population, it makes sense to routinely apply this correction.
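The difference between dividing by N and dividing by N - 1 is easy to see in code. In the sketch below, the eight scores are hypothetical values with a mean of 5 (assumed for illustration, not copied from the figure), and `standard_deviation` is a helper written for this example; Python's built-in `statistics.pstdev` and `statistics.stdev` implement the same two versions.

```python
import math
from statistics import pstdev, stdev

def standard_deviation(xs, sample=False):
    """Square root of the mean squared difference from the mean.

    Divides by N by default, or by N - 1 when sample=True.
    """
    m = sum(xs) / len(xs)
    sum_of_squares = sum((x - m) ** 2 for x in xs)
    denominator = len(xs) - 1 if sample else len(xs)
    return math.sqrt(sum_of_squares / denominator)

# Eight hypothetical scores with a mean of 5 (assumed data)
scores = [3, 5, 4, 2, 7, 6, 5, 8]

sd_n = standard_deviation(scores)                      # descriptive version (N)
sd_n_minus_1 = standard_deviation(scores, sample=True) # estimate version (N - 1)

print(f"dividing by N:     {sd_n:.2f}  (pstdev gives {pstdev(scores):.2f})")
print(f"dividing by N - 1: {sd_n_minus_1:.2f}  (stdev gives {stdev(scores):.2f})")
```

The N - 1 version is always slightly larger than the N version for the same data, and the gap shrinks as the number of scores grows.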

13.1.9. Percentile Ranks and z Scores

In many situations, it is useful to have a way to describe the location of an individual score within its distribution. One approach is the percentile rank. The percentile rank of a score is the percentage of scores in the distribution that are lower than that score. Consider, for example, the distribution in Figure 13.1 . For any score in the distribution, we can find its percentile rank by counting the number of scores in the distribution that are lower than that score and converting that number to a percentage of the total number of scores. Notice, for example, that five of the students represented by the data in Figure 13.1 had self-esteem scores of 23. In this distribution, 32 of the 40 scores (80%) are lower than 23. Thus each of these students has a percentile rank of 80. Alternatively, one can say that these students scored “at the 80th percentile”. Percentile ranks are often used to report the results of standardized tests of ability or achievement. If your percentile rank on a test of verbal ability were 40, for example, this would mean that you scored higher than 40% of the people who took the test.
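The percentile rank definition above translates directly into a short function. As before, the counts used here are hypothetical values chosen to agree with what the text says about the self-esteem data (40 scores, five of them equal to 23, and 32 lower than 23); `percentile_rank` is an illustrative helper name.

```python
def percentile_rank(score, scores):
    """Percentage of scores in the distribution that are lower than `score`."""
    lower = sum(1 for s in scores if s < score)
    return 100 * lower / len(scores)

# Hypothetical counts (assumed), consistent with the facts stated in the text
assumed_counts = {24: 3, 23: 5, 22: 10, 21: 8, 20: 5, 19: 3, 18: 3, 16: 2, 15: 1}
scores = [score for score, count in assumed_counts.items() for _ in range(count)]

rank_23 = percentile_rank(23, scores)
print(f"A score of 23 has a percentile rank of {rank_23:.0f}")
```

Note that this follows the "percentage lower than" definition used in this section; some sources instead count scores at or below the given score, which gives larger values.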

Another approach is the z score. The z score for a particular individual is the difference between that individual’s score and the mean of the distribution, divided by the standard deviation of the distribution:

\[z = \frac{X - M}{SD}\]

A z score indicates how far above or below the mean a raw score is, but it expresses this in terms of the standard deviation. For example, in a distribution of intelligence quotient (IQ) scores with a mean of 100 and a standard deviation of 15, an IQ score of 110 would have a z score of (110 - 100) / 15 = +0.67. In other words, a score of 110 is 0.67 standard deviations (approximately two thirds of a standard deviation) above the mean. Similarly, a raw score of 85 would have a z score of (85 - 100) / 15 = -1.00. In other words, a score of 85 is one standard deviation below the mean.

There are several reasons that z scores are important. Again, they provide a way of describing where an individual’s score is located within a distribution and are sometimes used to report the results of standardized tests. They also provide one way of defining outliers. For example, outliers are sometimes defined as scores that have z scores less than -3.00 or greater than +3.00. In other words, they are defined as scores that are more than three standard deviations from the mean. Finally, z scores play an important role in understanding and computing other statistics, as we will see shortly.
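The outlier rule mentioned above can be sketched as follows. The scores are hypothetical, the function name `find_outliers` is an assumption, and the standard deviation is computed by dividing by N (via `statistics.pstdev`). The sample deliberately contains more than ten scores, because in very small samples no single score can reach a z score beyond ±3.

```python
import statistics


def find_outliers(scores, cutoff=3.0):
    """Return the scores whose z scores fall beyond +/- cutoff."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # divides by N
    return [x for x in scores if abs((x - mean) / sd) > cutoff]


# Hypothetical scores with one extreme value
scores = [12, 14, 13, 15, 14, 13, 12, 15, 14, 13, 14, 13, 40]

print(find_outliers(scores))  # [40]
```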

Online Statistical Tools

Although many researchers use commercially available software such as SPSS and Excel to analyze their data, there are several free online analysis tools that can also be extremely useful. Many allow you to enter or upload your data and then make one click to conduct several descriptive statistical analyses. Among them are the following.

Rice Virtual Lab in Statistics https://onlinestatbook.com/rvls.html

VassarStats http://www.vassarstats.net/

Bright Stat http://www.brightstat.com

For a more complete list, see http://statpages.org/index.html

13.1.10. Key Takeaways ¶

Every variable has a distribution—a way that the scores are distributed across the levels. The distribution can be described using a frequency table and histogram. It can also be described in words in terms of its shape, including whether it is unimodal or bimodal, and whether it is symmetrical or skewed.

The central tendency, or middle, of a distribution can be described precisely using three statistics—the mean, median, and mode. The mean is the sum of the scores divided by the number of scores, the median is the middle score, and the mode is the most common score.

The variability, or spread, of a distribution can be described precisely using the range and standard deviation. The range is the difference between the highest and lowest scores, and the standard deviation is roughly the average amount by which the scores differ from the mean.

The location of a score within its distribution can be described using percentile ranks or z scores. The percentile rank of a score is the percentage of scores below that score, and the z score is the difference between the score and the mean divided by the standard deviation.

13.1.11. Exercises ¶

Practice: Make a frequency table and histogram for the following data. Then write a short description of the shape of the distribution in words.

11, 8, 9, 12, 9, 10, 12, 13, 11, 13, 12, 6, 10, 17, 13, 11, 12, 12, 14, 14

Practice: For the data in Exercise 1, compute the mean, median, mode, standard deviation, and range.

Practice: Using the data in Exercises 1 and 2, find a. the percentile ranks for scores of 9 and 14 b. the z scores for scores of 8 and 12.

13.2. Describing Statistical Relationships ¶

13.2.1. Learning Objectives ¶

Describe differences between groups in terms of their means and standard deviations, and in terms of Cohen’s d.

Describe correlations between quantitative variables in terms of Pearson’s r.

As we have seen throughout this book, most interesting research questions in psychology are about statistical relationships between variables. Recall that there is a statistical relationship between two variables when the average score on one differs systematically across the levels of the other. In this section, we revisit the two basic forms of statistical relationship introduced earlier in the book: differences between groups or conditions and relationships between quantitative variables. We will also consider how to describe them in more detail.

13.2.2. Differences Between Groups or Conditions ¶

Differences between groups or conditions are usually described in terms of the mean and standard deviation of each group or condition. For example, Thomas Ollendick and his colleagues conducted a study in which they evaluated two one-session treatments for simple phobias in children [OOstReuterskiold+09]. They randomly assigned children with an intense fear (e.g., of dogs) to one of three conditions. In the exposure condition, the children actually confronted the object of their fear under the guidance of a trained therapist. In the education condition, they learned about phobias and some strategies for coping with them. In the wait-list control condition, they waited to receive a treatment until after the study was over. The severity of each child’s phobia was then rated on a 1-to-8 scale by a clinician who did not know which treatment the child had received. The mean fear rating in the education condition was 4.83 with a standard deviation of 1.52, while the mean fear rating in the exposure condition was 3.47 with a standard deviation of 1.77. The mean fear rating in the control condition was 5.56 with a standard deviation of 1.21. In other words, both treatments worked, but the exposure treatment worked better than the education treatment.

../_images/Fig12-5.png

Fig. 13.8 Bar graph showing mean clinician phobia ratings for children in two treatment conditions ¶

As we have seen, differences between group or condition means can be presented in a bar graph like that in Figure 13.8 , where the heights of the bars represent the group or condition means. We will look more closely at creating American Psychological Association (APA)-style bar graphs shortly.

13.2.3. Effect size ¶

It is also important to be able to describe the strength of a statistical relationship, which is often referred to as the effect size. The most widely used measure of effect size for differences between group or condition means is called Cohen’s d, which is the difference between the two means divided by the standard deviation:

\[ d = \frac{M_1 - M_2}{SD} \]

In this formula, it does not really matter which mean is M1 and which is M2. If there is a treatment group and a control group, the treatment group mean is usually M1 and the control group mean is M2. Otherwise, the larger mean is usually M1 and the smaller mean M2 so that Cohen’s d turns out to be positive.

The standard deviation in this formula is usually a kind of average of the two group standard deviations called the pooled within-groups standard deviation. To compute the pooled within-groups standard deviation, add the sum of the squared differences for Group 1 to the sum of the squared differences for Group 2, divide this by the sum of the two sample sizes, and then take the square root of the result. Informally, however, the standard deviation of either group can be used instead.

Conceptually, Cohen’s d is the difference between the two means expressed in standard deviation units. Notice its similarity to a z score, which expresses the difference between an individual score and a mean in standard deviation units. A Cohen’s d of 0.50 means that the two group means differ by 0.50 standard deviations (half a standard deviation). A Cohen’s d of 1.20 means that they differ by 1.20 standard deviations. But how should we interpret these values in terms of the strength of the relationship or the size of the difference between the means? Table 13.1 presents some guidelines for interpreting Cohen’s d values in psychological research [Coh92] . Values near 0.20 are considered small, values near 0.50 are considered medium, and values near 0.80 are considered large. Thus a Cohen’s d value of 0.50 represents a medium-sized difference between two means, and a Cohen’s d value of 1.20 represents a very large difference in the context of psychological research. In the research by Ollendick and his colleagues, there was a large difference (d = 0.82) between the exposure and education conditions.
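The computation of Cohen’s d with a pooled within-groups standard deviation can be sketched in Python. The function names are hypothetical. The first two functions work from raw scores and follow the description in the text. The third works from summary statistics and rests on two assumptions that the text does not confirm: that the two groups are the same size and that each standard deviation was computed by dividing by N. Under those assumptions, the education and exposure results reported earlier give a value close to the d = 0.82 cited above.

```python
import math


def pooled_sd(group1, group2):
    """Pooled within-groups SD: combined sum of squares divided by N1 + N2."""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    ss1 = sum((x - mean1) ** 2 for x in group1)
    ss2 = sum((x - mean2) ** 2 for x in group2)
    return math.sqrt((ss1 + ss2) / (len(group1) + len(group2)))


def cohens_d(group1, group2):
    """Difference between the group means in pooled-SD units."""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    return (mean1 - mean2) / pooled_sd(group1, group2)


def cohens_d_from_summary(mean1, sd1, mean2, sd2):
    """Cohen's d from means and SDs, ASSUMING equal group sizes."""
    pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled


# Hypothetical raw scores for two small groups
d_example = cohens_d([4, 6, 8], [1, 3, 5])

# Education (M = 4.83, SD = 1.52) versus exposure (M = 3.47, SD = 1.77)
d_ollendick = cohens_d_from_summary(4.83, 1.52, 3.47, 1.77)
print(round(d_ollendick, 2))  # 0.82
```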

Cohen’s d is useful because it has the same meaning regardless of the variable being compared or the scale it was measured on. A Cohen’s d of 0.20 means that the two group means differ by 0.20 standard deviations whether we are talking about scores on the Rosenberg Self-Esteem scale, reaction times measured in milliseconds, numbers of siblings, or diastolic blood pressures measured in millimeters of mercury. Not only does this make it easier for researchers to communicate with each other about their results, it also makes it possible to combine and compare results across different studies using different measures.

Be aware that the term effect size can be misleading because it suggests a causal relationship: that the difference between the two means is an “effect” of being in one group or condition as opposed to another. Imagine, for example, a study showing that a group of exercisers is happier on average than a group of nonexercisers, with an “effect size” of d = 0.35. If the study was an experiment, with participants randomly assigned to exercise and no-exercise conditions, then one could conclude that exercising caused a small to medium-sized increase in happiness. If the study was correlational, however, then one could conclude only that the exercisers were happier than the nonexercisers by a small to medium-sized amount. In other words, simply calling the difference an “effect size” does not make the relationship a causal one.

Sex Differences Expressed as Cohen’s d

Researcher Janet Shibley Hyde has looked at the results of numerous studies on psychological sex differences and expressed the results in terms of Cohen’s d [Hyd07] . Following are a few of the values she has found, averaging across several studies in each case. Note that because she always treats the mean for men as M1 and the mean for women as M2, positive values indicate that men score higher and negative values indicate that women score higher.

Hyde points out that although men and women differ by a large amount on some variables (e.g., attitudes toward casual sex), they differ by only a small amount on the vast majority. In many cases, Cohen’s d is less than 0.10, which she terms a “trivial” difference. The difference in talkativeness discussed in Chapter 1 was also trivial: d = 0.06. Although researchers and non-researchers alike often emphasize sex differences, Hyde has argued that it makes at least as much sense to think of men and women as fundamentally similar. She refers to this as the “gender similarities hypothesis.”

13.2.4. Correlations Between Quantitative Variables ¶

As we have seen throughout the book, many interesting statistical relationships take the form of correlations between quantitative variables. For example, researchers Kurt Carlson and Jacqueline Conard conducted a study on the relationship between the alphabetical position of the first letter of people’s last names (from A = 1 to Z = 26) and how quickly those people responded to consumer appeals [CC11] . In one study, they sent e-mails to a large group of MBA students, offering free basketball tickets from a limited supply. The result was that the further toward the end of the alphabet students’ last names were, the faster they tended to respond. These results are summarized in Figure 13.9 .

../_images/Fig12-6.png

Fig. 13.9 Line graph showing the relationship between the alphabetical position of people’s last names and how quickly those people respond to offers of consumer goods. ¶

Such relationships are often presented using line graphs or scatterplots, which show how the level of one variable differs across the range of the other. In the line graph in Figure 13.9 , for example, each point represents the mean response time for participants with last names in the first, second, third, and fourth quartiles (or quarters) of the name distribution. It clearly shows how response time tends to decline as people’s last names get closer to the end of the alphabet. The scatterplot in Figure 13.10 , which is reproduced from Chapter 5 , shows the relationship between 25 research methods students’ scores on the Rosenberg Self-Esteem Scale given on two occasions a week apart. Here the points represent individuals, and we can see that the higher students scored on the first occasion, the higher they tended to score on the second occasion. In general, line graphs are used when the variable on the x-axis has (or is organized into) a small number of distinct values, such as the four quartiles of the name distribution. Scatterplots are used when the variable on the x-axis has a large number of values, such as the different possible self-esteem scores.

../_images/Fig12-7.png

Fig. 13.10 Statistical relationship between several university students’ scores on the Rosenberg Self-Esteem Scale given on two occasions a week apart. ¶

The data presented in Figure 13.10 provide a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other (so that the points go from the lower left to the upper right of the graph). The data presented in Figure 13.9 provide a good example of a negative relationship, in which higher scores on one variable tend to be associated with lower scores on the other (so that the points go from the upper left to the lower right).

Both of these examples are also linear relationships, in which the points are reasonably well fit by a single straight line. Nonlinear relationships are those in which the points are better fit by a curved line. Figure 13.11 , for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best fits the points is a curve, an upside down “U”, because people who get about eight hours of sleep tend to be the least depressed, while those who get less sleep and those who get more sleep tend to be more depressed. Nonlinear relationships are not uncommon in psychology, but a detailed discussion of them is beyond the scope of this book.

../_images/Fig12-8.png

Fig. 13.11 A hypothetical nonlinear relationship between how much sleep people get per night and how depressed they are. ¶

As we saw earlier in the book, the strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s r. As Figure 13.12 shows, its possible values range from -1.00, through zero, to +1.00. A value of 0 means there is no relationship between the two variables. In addition to his guidelines for interpreting Cohen’s d, Cohen offered guidelines for interpreting Pearson’s r in psychological research (see Table 13.1 ). Values near \(\pm .10\) are considered small, values near \(\pm .30\) are considered medium, and values near \(\pm .50\) are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and -.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. Like Cohen’s d, Pearson’s r is also referred to as a measure of “effect size” even though the relationship may not be causal.

../_images/Fig12-9.png

Fig. 13.12 Pearson’s r ranges from -1.00 (representing the strongest possible negative relationship), through 0 (representing no relationship), to +1.00 (representing the strongest possible positive relationship). ¶

The computations for Pearson’s r are more complicated than those for Cohen’s d. Although you may never have to do them by hand, it is still instructive to see how. Computationally, Pearson’s r is the “mean cross-product of z scores”. To compute it, one starts by transforming all the scores to z scores. For the X variable, subtract the mean of X from each score and divide each difference by the standard deviation of X. For the Y variable, subtract the mean of Y from each score and divide each difference by the standard deviation of Y. Then, for each individual, multiply the two z scores together to form a cross-product. Finally, take the mean of the cross-products. The formula looks like this:

\[ r = \frac{\sum (z_x z_y)}{N} \]

Table 12.5 illustrates these computations for a small set of data. The first column lists the scores for the X variable, which has a mean of 4.00 and a standard deviation of 1.90. The second column is the z-score for each of these raw scores. The third and fourth columns list the raw scores for the Y variable, which has a mean of 40 and a standard deviation of 11.78, and the corresponding z scores. The fifth column lists the cross-products. For example, the first one is 0.00 multiplied by -0.85, which is equal to 0.00. The second is 1.58 multiplied by 1.19, which is equal to 1.88. The mean of these cross-products, shown at the bottom of that column, is Pearson’s r, which in this case is +.53. There are other formulas for computing Pearson’s r by hand that may be quicker. This approach, however, is much clearer in terms of communicating conceptually what Pearson’s r is.
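The same steps can be sketched in Python. The five pairs of scores below are illustrative stand-ins, chosen so that their means and standard deviations match the values quoted above (4.00 and 1.90 for X, 40 and 11.78 for Y); they are an assumption rather than a reproduction of the table. The function names are hypothetical, and the standard deviations divide by N, which is what makes the mean cross-product equal Pearson’s r.

```python
import math


def z_scores(scores):
    """Convert raw scores to z scores using the SD that divides by N."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]


def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z scores."""
    cross_products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(cross_products) / len(cross_products)


# Illustrative stand-in data (assumption, see lead-in)
x = [4, 7, 2, 5, 2]
y = [30, 54, 23, 43, 50]

r = pearson_r(x, y)
print(round(r, 2))  # 0.53
```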

There are several common situations in which the value of Pearson’s r can be misleading. One is when the relationship under study is nonlinear. Even though Figure 13.11 shows a fairly strong relationship between depression and sleep, Pearson’s r would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s r. Another is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 13.13.

../_images/Fig12-10.png

Fig. 13.13 Hypothetical data showing how a strong overall correlation can appear to be weak when one variable has a restricted range. The overall correlation here is -.77, but the correlation for the 18- to 24-year-olds (in the blue box) is 0. ¶

Pearson’s r here is -.77. However, if we were to collect data only from 18- to 24-year-olds (the shaded area of Figure 13.13 ) then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.
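Restriction of range can be demonstrated with a small simulation. Everything in this sketch is hypothetical: the ages, the enjoyment ratings, and the repeating "noise" pattern are invented for illustration and are not the data behind Figure 13.13. The point is only that the same underlying relationship produces a much weaker Pearson’s r when the ages are limited to 18 to 24.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z scores (SD divides by N)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / n


# Hypothetical data: enjoyment declines with age, plus a fixed noise pattern
ages = list(range(18, 71))
noise = [(i * 7) % 5 - 2 for i in range(len(ages))]
enjoyment = [10 - 0.12 * age + e for age, e in zip(ages, noise)]

# Correlation across the full range of ages
r_full = pearson_r(ages, enjoyment)

# Correlation when only 18- to 24-year-olds are sampled
young = [(a, e) for a, e in zip(ages, enjoyment) if a <= 24]
r_restricted = pearson_r([a for a, _ in young], [e for _, e in young])

print(round(r_full, 2), round(r_restricted, 2))
```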

13.2.5. Key Takeaways ¶

Differences between groups or conditions are typically described in terms of the means and standard deviations of the groups or conditions or in terms of Cohen’s d and are presented in bar graphs.

Cohen’s d is a measure of relationship strength (or effect size) for differences between two group or condition means. It is the difference of the means divided by the standard deviation. In general, values of ±0.20, ±0.50, and ±0.80 can be considered small, medium, and large, respectively.

Correlations between quantitative variables are typically described in terms of Pearson’s r and presented in line graphs or scatterplots.

Pearson’s r is a measure of relationship strength (or effect size) for relationships between quantitative variables. It is the mean cross-product of the two sets of z scores. In general, values of ±.10, ±.30, and ±.50 can be considered small, medium, and large, respectively.

13.2.6. Exercises ¶

Practice: The following data represent scores on the Rosenberg Self-Esteem Scale for a sample of 10 Japanese university students and 10 American university students. Although hypothetical, these data are consistent with empirical findings [SA05]. Compute the means and standard deviations of the two groups, make a bar graph, compute Cohen’s d, and describe the strength of the relationship in words.

Japan    United States
25       27
20       30
24       34
28       37
30       26
32       24
21       28
24       35
20       33
26       36

Practice: The hypothetical data that follow are extraversion scores and the number of Facebook friends for 15 university students. Make a scatterplot for these data, compute Pearson’s r, and describe the relationship in words.

Extraversion    Facebook Friends
8               75
10              315
4               28
6               214
12              176
14              95
10              120
11              150
4               32
13              250
5               99
7               136
8               185
11              88
10              144

13.3. Expressing Your Results ¶

13.3.1. Learning Objectives ¶

Write out simple descriptive statistics in American Psychological Association (APA) style.

Interpret and create simple APA-style graphs—including bar graphs, line graphs, and scatterplots.

Interpret and create simple APA-style tables—including tables of group or condition means and correlation matrices.

Once you have conducted your descriptive statistical analyses, you will need to present them to others. In this section, we focus on presenting descriptive statistical results in writing, in graphs, and in tables—following American Psychological Association (APA) guidelines for written research reports. These principles can be adapted easily to other presentation formats such as posters and slide show presentations.

13.3.2. Presenting Descriptive Statistics in Writing ¶

When you have a small number of results to report, it is often most efficient to write them out. There are a few important APA style guidelines here. First, statistical results are always presented in the form of numerals rather than words and are usually rounded to two decimal places (e.g., “2.00” rather than “two” or “2”). They can be presented either in the narrative description of the results or parenthetically, much like reference citations. Here are some examples:

The mean age of the participants was 22.43 years with a standard deviation of 2.34.

Among the low self-esteem participants, those in a negative mood expressed stronger intentions to have unprotected sex (M = 4.05, SD = 2.32) than those in a positive mood (M = 2.15, SD = 2.27).

The treatment group had a mean of 23.40 (SD = 9.33), and the control group had a mean of 20.87 (SD = 8.45).

The test-retest correlation was .96.

There was a moderate negative correlation between the alphabetical position of respondents’ last names and their response time (r = -.27).

Notice that when presented in the narrative, the terms “mean” and “standard deviation” are written out, but when presented parenthetically, the symbols M and SD are used instead.

Notice also that it is especially important to use parallel construction to express similar or comparable results in similar ways. Parallel construction refers to using consistent language in a sentence. For example, the third sentence above has good parallel construction, because it uses the same format to describe the treatment and control group.

Consider the nonparallel alternative, which is more difficult to read.

The treatment group had a mean of 23.40 (SD = 9.33), while 20.87 was the mean of the control group, which had a standard deviation of 8.45.
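The formatting conventions illustrated in this section (two decimal places, the symbols M and SD in parentheses, and correlations written without a leading zero, as in the examples above) can be applied consistently with small helper functions. This is only a sketch; the function names `format_mean_sd` and `format_r` are hypothetical.

```python
def format_mean_sd(mean, sd):
    """Format a mean and standard deviation for parenthetical reporting."""
    return f"M = {mean:.2f}, SD = {sd:.2f}"


def format_r(r):
    """Format a correlation to two decimals without a leading zero."""
    text = f"{r:.2f}"
    # Drop the leading zero, e.g. "-0.27" becomes "-.27"
    return "r = " + text.replace("0.", ".", 1)


print(format_mean_sd(4.05, 2.32))  # M = 4.05, SD = 2.32
print(format_r(-0.27))             # r = -.27
print(format_r(0.96))              # r = .96
```

Generating every parenthetical result from the same function is one simple way to keep comparable results in parallel form.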

13.3.3. Presenting Descriptive Statistics in Graphs ¶

When you have a large number of results to report, you can often do it more clearly and efficiently with a graph. When you prepare graphs for an APA-style research report, there are some general guidelines that you should keep in mind. First, the graph should always add important information rather than repeat information that already appears in the text or in a table. If a graph presents information more clearly or efficiently, then you may choose to keep the graph and eliminate the text and/or table. Second, graphs should be as simple as possible. For example, the Publication Manual discourages the use of color unless it is absolutely necessary (although color can still be an effective element in posters, slide show presentations, or textbooks). Third, graphs should be interpretable on their own. A reader should be able to understand the basic result based only on the graph and its caption and should not have to refer to the text for an explanation.

There are also several more technical guidelines for graphs that include the following:

The graph should be slightly wider than it is tall.

The independent variable should be plotted on the x-axis and the dependent variable on the y-axis.

Values should increase from left to right on the x-axis and from bottom to top on the y-axis.

Axis Labels and Legends

Axis labels should be clear and concise and include the units of measurement if they do not appear in the caption.

Axis labels should be parallel to the axis.

Legends should appear within the boundaries of the graph.

Text should be in the same simple font throughout, and font sizes should differ by no more than four points.

Captions should briefly describe the figure, explain any abbreviations, and include the units of measurement if they do not appear in the axis labels.

Captions in an APA manuscript should be typed on a separate page that appears at the end of the manuscript. See Chapter 11 for more information.

13.3.4. Bar Graphs ¶

As we have seen throughout this book, bar graphs are generally used to present and compare the mean scores for two or more groups or conditions. The bar graph in Figure 13.14 is an APA-style version of Figure 13.8. Notice that it conforms to all the guidelines listed. A new element in Figure 13.14 is the smaller vertical bars that extend both upward and downward from the top of each main bar. These are error bars, and they represent the variability in each group or condition. Although they sometimes extend one standard deviation in each direction, they are more likely to extend one standard error in each direction (as in Figure 13.14). The standard error is the standard deviation of the group divided by the square root of the sample size of the group. The standard error is used because, in general, a difference between group means that is greater than two standard errors is statistically significant. Thus one can “see” whether a difference is statistically significant based on a bar graph with error bars.
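The standard error and the end points of an error bar can be computed as in the following sketch. The function names are hypothetical, and the group size of 25 is an assumption made only for illustration, because the text does not report how many children were in each condition.

```python
import math


def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)


def error_bar(mean, sd, n):
    """Lower and upper ends of an error bar extending one standard error."""
    se = standard_error(sd, n)
    return (mean - se, mean + se)


# Education condition: M = 4.83, SD = 1.52; n = 25 is an ASSUMED group size
se_education = standard_error(1.52, 25)
low, high = error_bar(4.83, 1.52, 25)

print(round(se_education, 3))          # 0.304
print(round(low, 3), round(high, 3))   # 4.526 5.134
```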

../_images/Fig12-11.png

Fig. 13.14 Sample APA-Style Bar Graph, With Error Bars Representing the Standard Errors, Based on Research by Ollendick and Colleagues ¶

13.3.5. Line Graphs ¶

Line graphs are used to present correlations between quantitative variables when the independent variable has, or is organized into, a relatively small number of distinct levels. Each point in a line graph represents the mean score on the dependent variable for participants at one level of the independent variable. Figure 13.15 is an APA-style version of the results of Carlson and Conard. Notice that it includes error bars representing the standard error and conforms to all the stated guidelines.

../_images/Fig12-12.png

Fig. 13.15 Sample APA-style line graph based on research by Carlson and Conard ¶

In most cases, the information in a line graph could just as easily be presented in a bar graph. In Figure 13.15 , for example, one could replace each point with a bar that reaches up to the same level and leave the error bars right where they are. This emphasizes the fundamental similarity of the two types of statistical relationship. Both are differences in the average score on one variable across levels of another. The convention followed by most researchers, however, is to use a bar graph when the variable plotted on the x-axis is categorical and a line graph when it is quantitative.

13.3.6. Scatterplots ¶

Scatterplots are used to present relationships between quantitative variables when the variable on the x-axis (typically the independent variable) has a large number of levels. Each point in a scatterplot represents an individual rather than the mean for a group of individuals, and there are no lines connecting the points. The graph in Figure 13.16 is an APA-style version of Figure 13.10 , which illustrates a few additional points. First, when the variables on the x-axis and y-axis are conceptually similar and measured on the same scale (here they are measures of the same variable on two different occasions) this can be emphasized by making the axes the same length. Second, when two or more individuals fall at exactly the same point on the graph, one way this can be indicated is by offsetting the points slightly along the x-axis. Other ways are by displaying the number of individuals in parentheses next to the point or by making the point larger or darker in proportion to the number of individuals. Finally, the straight line that best fits the points in the scatterplot, which is called the regression line, can also be included.

../_images/Fig12-13.png

Fig. 13.16 Sample APA-Style Scatterplot ¶

13.3.7. Expressing Descriptive Statistics in Tables ¶

Like graphs, tables can be used to present large amounts of information clearly and efficiently. The same general principles apply to tables as apply to graphs. They should add important information to the presentation of your results, be as simple as possible, and be interpretable on their own. Again, we focus here on tables for an APA-style manuscript.

The most common use of tables is to present several means and standard deviations, usually for complex research designs with multiple independent and dependent variables. Figure 13.17 , for example, shows the results of a hypothetical study similar to the one by MacDonald and Martineau [MM02] discussed in Chapter 5 . The means in Figure 13.17 are the means reported by MacDonald and Martineau, but the standard errors are not. Recall that these researchers categorized participants as having low or high self-esteem, put them into a negative or positive mood, and measured their intentions to have unprotected sex. Although not mentioned in Chapter 5 , they also measured participants’ attitudes toward unprotected sex. Notice that the table includes horizontal lines spanning the entire table at the top and bottom, and just beneath the column headings. Furthermore, every column has a heading—including the leftmost column—and there are additional headings that span two or more columns that help to organize the information and present it more efficiently. Finally, notice that APA-style tables are numbered consecutively starting at 1 (Table 1, Table 2, and so on) and given a brief but clear and descriptive title.

../_images/Fig12-14.png

Fig. 13.17 Sample APA-style table presenting means and standard deviations. ¶

Another common use of tables is to present correlations (e.g., Pearson’s r) among several variables. This kind of table is called a correlation matrix. Figure 13.18 is a correlation matrix based on a study by David McCabe and colleagues [MRIM+10] . They were interested in the relationships between working memory and several other variables.

../_images/Fig12-15.png

Fig. 13.18 Sample APA-style table (correlation matrix) based on research by McCabe and colleagues. ¶

We can see from the table that the correlation between working memory and executive function, for example, was an extremely strong .96, that the correlation between working memory and vocabulary was a medium .27, and that all the measures except vocabulary tend to decline with age. Notice here that only half the table is filled in because the other half would have identical values. For example, the Pearson’s r value in the upper right corner (working memory and age) would be the same as the one in the lower left corner (age and working memory). The correlation of a variable with itself is always 1.00, so these values are replaced by dashes to make the table easier to read.

As with graphs, precise statistical results that appear in a table do not need to be repeated in the text. Instead, the writer can note major trends and alert the reader to details (e.g., specific correlations) that are of particular interest.

13.3.8. Key Takeaways ¶

In an APA-style article, simple results are most efficiently presented in the text, while more complex results are most efficiently presented in graphs or tables.

APA style includes several rules for presenting numerical results in the text. These include using words only for numbers less than 10 that do not represent precise statistical results, rounding results to two decimal places, and using words (e.g., “mean”) in the text and symbols (e.g., “M”) in parentheses.

APA style includes several rules for presenting results in graphs and tables. Graphs and tables should add information rather than repeating information, be as simple as possible, and be interpretable on their own with a descriptive caption (for graphs) or a descriptive title (for tables).

13.3.9. Exercises

Practice: In a classic study, men and women rated the importance of physical attractiveness in both a short-term mate and a long-term mate (Buss & Schmitt, 1993). The means and standard deviations are as follows. Men / Short Term: M = 5.67, SD = 2.34; Men / Long Term: M = 4.43, SD = 2.11; Women / Short Term: M = 5.67, SD = 2.48; Women / Long Term: M = 4.22, SD = 1.98. Present these results (a) in writing, (b) in a graph, and (c) in a table.

13.4. Conducting Your Analyses

13.4.1. Learning Objectives

Describe the steps involved in preparing and analyzing a typical set of raw data.

Even when you understand the statistics involved, analyzing data can be a complicated process. It is likely that for each of several participants, you have data for several different variables: demographics such as sex and age, one or more independent variables, one or more dependent variables, and perhaps a manipulation check. Furthermore, the “raw” (unanalyzed) data might take several different forms: completed paper-and-pencil questionnaires, computer files filled with numbers or text, videos, or written notes. Each of these may have to be organized, coded, or combined in some way. There might even be missing, incorrect, or just “suspicious” responses that must be dealt with. In this section, we consider some practical advice to make this process as organized and efficient as possible.

13.4.2. Prepare Your Data for Analysis

Whether your raw data are on paper or in a computer file (or both), there are a few things you should do before you begin analyzing them. First, be sure they do not include any information that might identify individual participants and be sure that you have a secure location where you can store the data and a separate secure location where you can store any consent forms. Unless the data are highly sensitive, a locked room or password-protected computer is usually good enough. It is also a good idea to make photocopies or backup files of your data and store them in yet another secure location, at least until the project is complete. Professional researchers usually keep a copy of their raw data and consent forms for several years in case questions about the procedure, the data, or participant consent arise after the project is completed.

Next, you should check your raw data to make sure that they are complete and appear to have been accurately recorded (whether it was participants, yourself, or a computer program that did the recording). At this point, you might find that there are illegible or missing responses, or obvious misunderstandings (e.g., a response of “12” on a 1-to-10 rating scale). You will have to decide whether such problems are severe enough to make a participant’s data unusable. If information about the main independent or dependent variable is missing, or if several responses are missing or suspicious, you may have to exclude that participant’s data from the analyses. If you do decide to exclude any data, do not throw them away or delete them because you or another researcher might want to see them later. Instead, set them aside and keep notes about why you decided to exclude them because you will need to report this information.

Now you are ready to enter your data in a spreadsheet program or, if they are already in a computer file, to format them for analysis. You can use a general spreadsheet program like Microsoft Excel or a statistical analysis program like SPSS to create your data file. Data files created in one program can usually be converted to work with other programs. The most common format is for each row to represent a participant and for each column to represent a variable (with the variable name at the top of each column). A sample data file is shown in Table 13.3. The first column contains participant identification numbers. This is followed by columns containing demographic information (sex and age), independent variables (mood, four self-esteem items, and the total of the four self-esteem items), and finally dependent variables (intentions and attitudes). Categorical variables can usually be entered as category labels (e.g., “M” and “F” for male and female) or as numbers (e.g., “0” for negative mood and “1” for positive mood). Although category labels are often clearer, some analyses might require numbers. SPSS allows you to enter numbers but also attach a category label to each number.

If you have multiple-response measures, such as the self-esteem measure in Table 13.3 , you could combine the items by hand and then enter the total score in your spreadsheet. However, it is much better to enter each response as a separate variable in the spreadsheet, as with the self-esteem measure in Table 13.3 , and use the software to combine them (e.g., using the “AVERAGE” function in Excel or the “Compute” function in SPSS). Not only is this approach more accurate, but it allows you to detect and correct errors, to assess internal consistency, and to analyze individual responses if you decide to do so later.
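The same idea can be sketched in a general-purpose language. The Python example below is only an illustration: the participant rows, the item names (se1 to se4), and the helper add_composites are hypothetical and are not taken from Table 13.3. It keeps each item as its own variable and lets the software compute the total and the mean.

```python
from statistics import mean

# Hypothetical data: one dict per participant, one key per variable.
rows = [
    {"id": 1, "se1": 3, "se2": 4, "se3": 3, "se4": 4},
    {"id": 2, "se1": 2, "se2": 1, "se3": 2, "se4": 2},
    {"id": 3, "se1": 4, "se2": 4, "se3": 5, "se4": 4},
]
ITEMS = ["se1", "se2", "se3", "se4"]  # hypothetical item names


def add_composites(row):
    """Return a copy of the row with total and mean item scores added."""
    scores = [row[item] for item in ITEMS]
    return {**row, "se_total": sum(scores), "se_mean": mean(scores)}


scored = [add_composites(r) for r in rows]
for r in scored:
    print(r["id"], r["se_total"], r["se_mean"])
```

Because the individual items remain in the file, a mistyped response can be corrected and the composite recomputed without re-entering anything.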

13.4.3. Preliminary Analyses

Before turning to your primary research questions, there are often several preliminary analyses to conduct. For multiple-response measures, you should assess the internal consistency of the measure. Statistical programs like SPSS will allow you to compute Cronbach’s \(\alpha\) or Cohen’s \(\kappa\) . If this is beyond your comfort level, you can still compute and evaluate a split-half correlation.
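For readers who want to see what a statistical program is doing, here is a minimal Python sketch of Cronbach's \(\alpha\). It assumes the standard formula, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{s_T^2}\right)\), where \(k\) is the number of items, \(s_i^2\) are the item variances, and \(s_T^2\) is the variance of the total scores. The response matrix is hypothetical.

```python
from statistics import variance

# Hypothetical responses: one row per participant, one column per item.
responses = [
    [3, 4, 3, 4],
    [2, 1, 2, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [5, 4, 4, 5],
]


def cronbach_alpha(data):
    """Cronbach's alpha from a participants-by-items matrix."""
    k = len(data[0])                      # number of items
    items = list(zip(*data))              # transpose to get item columns
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - item_variances / total_variance)


alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Sample variances (dividing by N − 1) are used for both the items and the totals; what matters is that the same kind of variance is used in both places.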

Next, you should analyze each important variable separately. This step is not necessary for manipulated independent variables, of course, because you as the researcher determined what the distribution would be. Make histograms for each one, note their shapes, and compute the common measures of central tendency and variability. Be sure you understand what these statistics mean in terms of the variables you are interested in. For example, a distribution of self-report happiness ratings on a 1-to-10-point scale might be unimodal and negatively skewed with a mean of 8.25 and a standard deviation of 1.14. But what this means is that most participants rated themselves fairly high on the happiness scale, with a small number rating themselves noticeably lower.
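As a rough illustration of this step, the Python sketch below prints a text histogram and the common summary statistics for a set of hypothetical happiness ratings (the data are invented for the example). A mean that falls below the median is one informal hint of negative skew, although it should not replace looking at the histogram.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical self-report happiness ratings on a 1-to-10 scale.
ratings = [9, 8, 9, 10, 8, 9, 7, 9, 8, 10, 9, 4, 8, 9, 6, 9]

# Text histogram: one row per possible rating, one '#' per participant.
counts = Counter(ratings)
for value in range(1, 11):
    print(f"{value:>2} | {'#' * counts[value]}")

summary = {
    "mean": mean(ratings),
    "median": median(ratings),
    "mode": mode(ratings),
    "sd": stdev(ratings),
    "range": max(ratings) - min(ratings),
}
print(summary)

# Informal skew check: a few low scores pull the mean below the median.
looks_negatively_skewed = summary["mean"] < summary["median"]
```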

Now is the time to identify any outliers, examine them more closely, and decide what to do about them. You might discover that what at first appears to be an outlier is the result of a response being entered incorrectly in the data file, in which case you only need to correct the data file and move on. Alternatively, you might suspect that an outlier represents some other kind of error, misunderstanding, or lack of effort by a participant. For example, in a reaction time distribution in which most participants took only a few seconds to respond, a participant who took 3 minutes to respond would probably be considered an outlier. It seems likely that this participant did not understand the task (or at least was not paying very close attention). Also, including his or her reaction time would have a large impact on the mean and standard deviation for the sample. In situations like this, it may be justifiable to exclude the outlying response or participant from the analyses. If you do this, however, you should keep notes on which responses or participants you have excluded and why, and apply those same criteria consistently to every response and every participant. When you present your results, you should indicate how many responses or participants you excluded and the specific criteria that you used. And again, do not literally throw away or delete the data that you choose to exclude. Just set them aside because you or another researcher might want to see them later.

Keep in mind that outliers do not necessarily represent an error, misunderstanding, or lack of effort. They might represent truly extreme responses or participants. For example, in one large university student sample, the vast majority of participants reported having had fewer than 15 sexual partners, but there were also a few extreme scores of 60 or 70 [BS99] . Although these scores might represent errors, misunderstandings, or even intentional exaggerations, it is also plausible that they represent honest and even accurate estimates. One strategy here would be to use the median and other statistics that are not strongly affected by the outliers. Another would be to analyze the data both including and excluding any outliers. If the results are essentially the same, which they often are, then it makes sense to leave the outliers. If the results differ depending on whether the outliers are included or excluded, then both analyses can be reported and the differences between them discussed.
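The two strategies can be sketched in a few lines of Python. The scores and the cutoff below are hypothetical; in a real study the exclusion rule would need to be justified and reported. Note that the excluded values are set aside in their own list rather than deleted.

```python
from statistics import mean, median, stdev

# Hypothetical scores with two extreme values.
scores = [2, 5, 1, 3, 8, 4, 0, 6, 2, 60, 3, 70]
CUTOFF = 15  # hypothetical exclusion rule, applied to every score alike


def split_outliers(values, cutoff):
    """Separate values into those kept and those set aside (not deleted)."""
    kept = [v for v in values if v <= cutoff]
    set_aside = [v for v in values if v > cutoff]
    return kept, set_aside


def describe(values):
    """Basic descriptive statistics, rounded for reporting."""
    return {
        "n": len(values),
        "mean": round(mean(values), 2),
        "median": median(values),
        "sd": round(stdev(values), 2),
    }


kept, set_aside = split_outliers(scores, CUTOFF)
print("all scores:   ", describe(scores))
print("outliers out: ", describe(kept))
print("set aside:    ", set_aside)
```

In this example the median barely moves when the extreme scores are removed, while the mean and standard deviation change substantially, which is why the median is a useful companion statistic when outliers are present.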

13.4.4. Answer Your Research Questions

Finally, you are ready to answer your primary research questions. If you are interested in a difference between group or condition means, you can compute the relevant group or condition means and standard deviations, make a bar graph to display the results, and compute Cohen’s d. If you are interested in a correlation between quantitative variables, you can make a line graph or scatterplot (be sure to check for nonlinearity and restriction of range) and compute Pearson’s r.
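Both statistics are straightforward to compute by hand or in code. The Python sketch below uses hypothetical data, and it assumes that the standard deviation in Cohen's d is the pooled within-groups standard deviation, which is one common choice. Pearson's r is computed from sums of cross-products of deviation scores.

```python
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Difference between means divided by the pooled within-groups SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * stdev(group1) ** 2 + (n2 - 1) * stdev(group2) ** 2
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / pooled_var ** 0.5


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data for a two-group comparison and for a correlation.
treatment = [12, 15, 11, 14, 13]
control = [10, 9, 12, 8, 11]
hours = [1, 2, 3, 4, 5]
score = [52, 60, 57, 68, 73]

d = cohens_d(treatment, control)
r = pearson_r(hours, score)
print(f"d = {d:.2f}, r = {r:.2f}")
```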

At this point, you should also explore your data for other interesting results that might provide the basis for future research (and material for the discussion section of your paper). Bem [BZD87] suggests that you “examine your data from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something, anything, interesting” (pp. 186–187).

It is important to be cautious, however, because complex sets of data are likely to include “patterns” that occurred entirely by chance. Thus results discovered while “fishing” should be replicated in at least one new study before being presented as new phenomena in their own right.

13.4.5. Understand Your Descriptive Statistics

In the next chapter, we will consider inferential statistics: a set of techniques for deciding whether the results for your sample are likely to apply to the population. Although inferential statistics are important for reasons that will be explained shortly, beginning researchers sometimes forget that their descriptive statistics really tell “what happened” in their study. For example, imagine that a treatment group of 50 participants has a mean score of 34.32 (SD = 10.45), a control group of 50 participants has a mean score of 21.45 (SD = 9.22), and Cohen’s d is an extremely strong 1.31. Although conducting and reporting inferential statistics (like a t test) would certainly be a required part of any formal report on this study, it should be clear from the descriptive statistics alone that the treatment worked. Or imagine that a scatterplot shows an indistinct “cloud” of points and Pearson’s r is a trivial -.02. Again, although conducting and reporting inferential statistics would be a required part of any formal report on this study, it should be clear from the descriptive statistics alone that the variables are essentially unrelated. The point is that you should always be sure that you thoroughly understand your results at a descriptive level first, and then move on to the inferential statistics.
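As a quick check on the first example, Cohen's d can be recomputed from the reported means and standard deviations. The working below assumes that the denominator is the pooled within-groups standard deviation, which for two groups of equal size is the square root of the average of the two variances; the text does not say which standard deviation was used.

```latex
\[
SD_{\text{pooled}} = \sqrt{\frac{10.45^2 + 9.22^2}{2}}
                   = \sqrt{\frac{109.20 + 85.01}{2}} \approx 9.85
\]
\[
d = \frac{M_1 - M_2}{SD_{\text{pooled}}}
  = \frac{34.32 - 21.45}{9.85}
  = \frac{12.87}{9.85} \approx 1.31
\]
```

Under that assumption the treatment mean lies about 1.3 standard deviations above the control mean, which agrees with the value of 1.31 given in the example.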

13.4.6. Key Takeaways

Raw data must be prepared for analysis by examining them for possible errors, organizing them, and entering them into a spreadsheet program.

Preliminary analyses on any data set include checking the reliability of measures, evaluating the effectiveness of any manipulations, examining the distributions of individual variables, and identifying outliers.

Outliers that appear to be the result of an error, a misunderstanding, or a lack of effort can be excluded from the analyses. The criteria for excluded responses or participants should be applied in the same way to all the data and described when you present your results. Excluded data should be set aside rather than destroyed or deleted in case they are needed later.

Descriptive statistics tell the story of what happened in a study. Although inferential statistics are also important, it is essential to understand the descriptive statistics first.

13.4.7. Exercises

Discussion: What are at least two reasonable ways to deal with each of the following outliers based on the discussion in this chapter? (a) A participant estimating ordinary people’s heights estimates one woman’s height to be “84 inches” tall. (b) In a study of memory for ordinary objects, one participant scores 0 out of 15. (c) In response to a question about how many “close friends” she has, one participant writes “32.”

12.5: Descriptive Statistics (Summary)

  • Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton
  • Kwantlen Polytechnic U., Washington State U., & Texas A&M U.—Texarkana

Key Takeaways

  • Every variable has a distribution—a way that the scores are distributed across the levels. The distribution can be described using a frequency table and histogram. It can also be described in words in terms of its shape, including whether it is unimodal or bimodal, and whether it is symmetrical or skewed.
  • The central tendency, or middle, of a distribution can be described precisely using three statistics—the mean, median, and mode. The mean is the sum of the scores divided by the number of scores, the median is the middle score, and the mode is the most common score.
  • The variability, or spread, of a distribution can be described precisely using the range and standard deviation. The range is the difference between the highest and lowest scores, and the standard deviation is the average amount by which the scores differ from the mean.
  • The location of a score within its distribution can be described using percentile ranks or z scores. The percentile rank of a score is the percentage of scores below that score, and the z score is the difference between the score and the mean divided by the standard deviation.
  • Differences between groups or conditions are typically described in terms of the means and standard deviations of the groups or conditions or in terms of Cohen’s d and are presented in bar graphs.
  • Cohen’s d is a measure of relationship strength (or effect size) for differences between two group or condition means. It is the difference of the means divided by the standard deviation. In general, values of ±0.20, ±0.50, and ±0.80 can be considered small, medium, and large, respectively.
  • Correlations between quantitative variables are typically described in terms of Pearson’s r and presented in line graphs or scatterplots.
  • Pearson’s r is a measure of relationship strength (or effect size) for relationships between quantitative variables. It is the mean cross-product of the two sets of z scores. In general, values of ±.10, ±.30, and ±.50 can be considered small, medium, and large, respectively.
  • In an APA-style article, simple results are most efficiently presented in the text, while more complex results are most efficiently presented in graphs or tables.
  • APA style includes several rules for presenting numerical results in the text. These include using words only for numbers less than 10 that do not represent precise statistical results, rounding results to two decimal places, and using words (e.g., “mean”) in the text and symbols (e.g., “M”) in parentheses.
  • APA style includes several rules for presenting results in graphs and tables. Graphs and tables should add information rather than repeating information, be as simple as possible, and be interpretable on their own with a descriptive caption (for graphs) or a descriptive title (for tables).
  • Raw data must be prepared for analysis by examining them for possible errors, organizing them, and entering them into a spreadsheet program.
  • Preliminary analyses on any data set include checking the reliability of measures, evaluating the effectiveness of any manipulations, examining the distributions of individual variables, and identifying outliers.
  • Outliers that appear to be the result of an error, a misunderstanding, or a lack of effort can be excluded from the analyses. The criteria for excluded responses or participants should be applied in the same way to all the data and described when you present your results. Excluded data should be set aside rather than destroyed or deleted in case they are needed later.
  • Descriptive statistics tell the story of what happened in a study. Although inferential statistics are also important, it is essential to understand the descriptive statistics first.
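Several of the definitions in this summary translate directly into code. The Python sketch below uses hypothetical data to compute percentile ranks, z scores, and Pearson's r as the mean cross-product of z scores. One assumption is worth flagging: the z scores here use the standard deviation that divides by N, so that the mean cross-product (also divided by N) equals r exactly. If the N − 1 standard deviation were used, the cross-products would need to be divided by N − 1 instead.

```python
from statistics import mean, pstdev


def percentile_rank(score, values):
    """Percentage of scores in the distribution that fall below `score`."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)


def z_scores(values):
    """(score - mean) / SD for every score, using the divide-by-N SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def pearson_r_from_z(x, y):
    """Pearson's r as the mean cross-product of the two sets of z scores."""
    products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return mean(products)


# Hypothetical data.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 57, 68, 73]

print(percentile_rank(3, hours))
print([round(z, 2) for z in z_scores(hours)])
print(round(pearson_r_from_z(hours, score), 2))
```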

Ollendick, T. H., Öst, L.-G., Reuterskiöld, L., Costa, N., Cederlund, R., Sirbu, C., … Jarrett, M. A. (2009). One-session treatments of specific phobias in youth: A randomized clinical trial in the United States and Sweden. Journal of Consulting and Clinical Psychology, 77, 504–516.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

Carlson, K. A., & Conard, J. M. (2011). The last name effect: How last name influences acquisition timing. Journal of Consumer Research, 38(2), 300–307. doi:10.1086/658470

MacDonald, T. K., & Martineau, A. M. (2002). Self-esteem, mood, and intentions to use condoms: When does low self-esteem lead to risky health behaviors? Journal of Experimental Social Psychology, 38, 299–306.

McCabe, D. P., Roediger, H. L., McDaniel, M. A., Balota, D. A., & Hambrick, D. Z. (2010). The relationship between working memory capacity and executive functioning. Neuropsychology, 24(2), 222–243. doi:10.1037/a0017619

Brown, N. R., & Sinclair, R. C. (1999). Estimating number of lifetime sexual partners: Men and women do it differently. The Journal of Sex Research, 36, 292–297.

Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The complete academic: A career guide (2nd ed., pp. 185–219). Washington, DC: American Psychological Association.

Schmitt, D. P., & Allik, J. (2005). Simultaneous administration of the Rosenberg Self-Esteem Scale in 53 nations: Exploring the universal and culture-specific features of global self-esteem. Journal of Personality and Social Psychology, 89, 623–642.

Buss, D. M., & Schmitt, D. P. (1993). Sexual strategies theory: A contextual evolutionary analysis of human mating. Psychological Review, 100, 204–232.

  • 11, 8, 9, 12, 9, 10, 12, 13, 11, 13, 12, 6, 10, 17, 13, 11, 12, 12, 14, 14
  • Practice: For the data in Exercise 1, compute the mean, median, mode, standard deviation, and range.
  • Practice: For the data in Exercise 1, find (a) the percentile ranks for scores of 9 and 14 and (b) the z scores for scores of 8 and 12.
  • Practice: The following data represent scores on the Rosenberg Self-Esteem Scale for a sample of 10 Japanese university students and 10 American university students. (Although hypothetical, these data are consistent with empirical findings [Schmitt & Allik, 2005].) Compute the means and standard deviations of the two groups, make a bar graph, compute Cohen’s d, and describe the strength of the relationship in words.
  • Practice: The hypothetical data that follow are extraversion scores and the number of Facebook friends for 15 university students. Make a scatterplot for these data, compute Pearson’s r, and describe the relationship in words.
  • in a figure
  • Discussion: What are at least two reasonable ways to deal with each of the following outliers based on the discussion in this chapter? (a) A participant estimating ordinary people’s heights estimates one woman’s height to be “84 inches” tall. (b) In a study of memory for ordinary objects, one participant scores 0 out of 15. (c) In response to a question about how many “close friends” she has, one participant writes “32.”

Organization of data. Graphical representation of data is typically the first organizational step. Frequency distributions, histograms, and/or frequency polygons are usually prepared in this process.

  • A frequency distribution , often the first organizational step, is an ordered arrangement of the values of a variable, which shows the number of occurrences in each category. Table shows a frequency distribution concerning how much time students spent studying for an exam. Note that the total number tallied (counted) in each category by the researcher equals the number listed in the frequency column.

Such a frequency distribution can be presented graphically as a frequency histogram or frequency polygon.

  • Frequency histograms are bar graphs. Figure shows a frequency histogram derived from the data in the frequency distribution in Table . The frequency (number of students) determined from the tally is the ordinate (vertical, or Y, axis), and the number of hours studied is the abscissa (horizontal, or X, axis). Each one‐hour interval is presented sequentially, and the height of each bar represents the number of students who studied that number of hours.

  • Frequency polygons are graphs in which the frequency of occurrence of the variable measured is shown by using connected points rather than bars. Figure shows, in a frequency polygon, the same data displayed in Figure . (Note that if the midpoints of each of the bars in Figure were connected, the result would be this frequency polygon.)

Measures of central tendency. The three measures of central tendency , the mean, median, and mode, describe a distribution of data and are an index of the average, or typical, value of a distribution of scores.

  • The mean is the arithmetic average of the scores: the sum of all the scores divided by the number of scores.

  • The median is the point at which 50% of the observations fall below and 50% above or, in other words, the middle number of a set of numbers arranged in ascending or descending order. (If the list includes an even number of entries, the median is the arithmetic average of the middle two numbers.) Based on the data in Table , the full list of each student's study hours would be written 10, 9, 9, 9, 8, 8, 8, 8, and so on. If the list were written out in full, it would be clear that the middle two numbers of the 40 entries are 6 and 6, which average 6. So the median of the hours studied is 6.
  • The mode is the number that appears most often. Based on the data in Table , the mode of the number of hours studied is also 6 (8 students studied for 6 hours, so 6 appears 8 times in the list, more than any other number).

Graphical representations of the measures of central tendency may be presented in frequency polygons that take the form of curves, which may be normal or skewed.

  • For many variables, if enough measures are taken and plotted as a frequency polygon, the result approximates a normal curve (bell‐shaped curve), or normal distribution (Figure 3a). The curve is symmetrical, and the mean, median, and mode fall at the highest point on the curve.

  • Skewed distributions are asymmetrical, with most of the scores grouped toward one end. The mean, median, and mode fall at different points. Distributions may be skewed to the left ( negatively skewed ) (Figure 3b) or to the right ( positively skewed ) (Figure 3c).
  • A bimodal frequency distribution has two peaks, which represent two scores that share the highest frequency. In such a distribution, the mean and median may be at the same point or at different points.

Measures of variation. Variability refers to the extent that scores differ from one another and from the mean. Widely used measures of variability are the range, variance, and standard deviation.

  • The range describes the spread of scores in a distribution. It is calculated by subtracting the lowest from the highest score in the distribution. (In the example of hours of study, the range is 10 − 1 = 9 hours.)
  • The variance is a measure of variation based on the squared deviations of the scores about the mean of a distribution. Using the data from Table as an example, the variance for the entire distribution is computed by
  • determining the mean of the distribution of data

  • subtracting the mean from each score to determine the deviation score for that item (Table , column 1)

TABLE 2 COMPUTATION OF DEVIATION SCORE, VARIANCE, STANDARD DEVIATION:   HOURS STUDIED FOR AN EXAM

  • squaring each deviation score (to eliminate minus signs) and multiplying it by the frequency of that score to account for the total number of scores (Table , column 2)
  • summing the results of the previous multiplication step (Table , last entry in column 2) to arrive at the total of all squared deviation scores and dividing by ( N − 1) ( N = number of scores)

Some variance computations simply divide by N , but when the scores are a sample, dividing by ( N − 1) gives a less biased estimate of the population variance. The variance gives one indication of how much the scores differ.

  • The standard deviation (SD) is the square root of the variance.
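The steps listed above can be sketched in Python. The frequency table here is hypothetical (it is not the study-hours table referred to in the text), and the names freq, sum_sq, and expanded are chosen only for the example. The result is checked against the standard library's sample variance, which also divides by N − 1.

```python
from statistics import variance

# Hypothetical frequency table: hours studied -> number of students.
freq = {1: 1, 2: 2, 3: 4, 4: 2, 5: 1}

# Step 1: mean of the distribution.
n = sum(freq.values())
mean_hours = sum(score * f for score, f in freq.items()) / n

# Steps 2 and 3: square each deviation score and weight it by its frequency.
sum_sq = sum(f * (score - mean_hours) ** 2 for score, f in freq.items())

# Step 4: divide the total by (N - 1); the SD is the square root.
var = sum_sq / (n - 1)
sd = var ** 0.5

# Cross-check against the full list of individual scores.
expanded = [score for score, f in freq.items() for _ in range(f)]
print(n, mean_hours, round(var, 2), round(sd, 2), round(variance(expanded), 2))
```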

Chapter 12: Descriptive Statistics

Expressing Your Results

Learning Objectives

  • Write out simple descriptive statistics in American Psychological Association (APA) style.
  • Interpret and create simple APA-style graphs—including bar graphs, line graphs, and scatterplots.
  • Interpret and create simple APA-style tables—including tables of group or condition means and correlation matrixes.

Once you have conducted your descriptive statistical analyses, you will need to present them to others. In this section, we focus on presenting descriptive statistical results in writing, in graphs, and in tables—following American Psychological Association (APA) guidelines for written research reports. These principles can be adapted easily to other presentation formats such as posters and slide show presentations.

Presenting Descriptive Statistics in Writing

When you have a small number of results to report, it is often most efficient to write them out. There are a few important APA style guidelines here. First, statistical results are always presented in the form of numerals rather than words and are usually rounded to two decimal places (e.g., “2.00” rather than “two” or “2”). They can be presented either in the narrative description of the results or parenthetically—much like reference citations. Here are some examples:

The mean age of the participants was 22.43 years with a standard deviation of 2.34.

Among the low self-esteem participants, those in a negative mood expressed stronger intentions to have unprotected sex (M = 4.05, SD = 2.32) than those in a positive mood (M = 2.15, SD = 2.27).

The treatment group had a mean of 23.40 (SD = 9.33), while the control group had a mean of 20.87 (SD = 8.45).

The test-retest correlation was .96.

There was a moderate negative correlation between the alphabetical position of respondents’ last names and their response time (r = −.27).

Notice that when presented in the narrative, the terms mean and standard deviation are written out, but when presented parenthetically, the symbols M and SD are used instead. Notice also that it is especially important to use parallel construction to express similar or comparable results in similar ways. The third example is much better than the following nonparallel alternative:

The treatment group had a mean of 23.40 (SD = 9.33), while 20.87 was the mean of the control group, which had a standard deviation of 8.45.
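The formatting conventions shown in these examples can be sketched in a few lines of Python. This is only an illustrative helper, not part of APA style itself: the function names are hypothetical, and the rule of dropping the leading zero for correlations is inferred from the examples above (".96", "−.27").

```python
def format_stat(value: float, decimals: int = 2) -> str:
    """Format a statistic as a numeral with fixed decimals, e.g. 2 -> '2.00'."""
    return f"{value:.{decimals}f}"


def format_mean_sd(mean: float, sd: float) -> str:
    """Parenthetical form, which uses the symbols M and SD."""
    return f"(M = {format_stat(mean)}, SD = {format_stat(sd)})"


def format_correlation(r: float) -> str:
    """Correlation with no leading zero, as in the examples ('.96', '−.27')."""
    text = format_stat(abs(r))
    if text.startswith("0"):
        text = text[1:]
    sign = "−" if r < 0 else ""  # typographic minus sign, as used in the text
    return f"r = {sign}{text}"


print(format_mean_sd(23.4, 9.33))
print(format_correlation(-0.27))
```

Keeping the formatting in one place makes it easier to express comparable results in parallel form.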

Presenting Descriptive Statistics in Graphs

When you have a large number of results to report, you can often do it more clearly and efficiently with a graph. When you prepare graphs for an APA-style research report, there are some general guidelines that you should keep in mind. First, the graph should always add important information rather than repeat information that already appears in the text or in a table. (If a graph presents information more clearly or efficiently, then you should keep the graph and eliminate the text or table.) Second, graphs should be as simple as possible. For example, the Publication Manual discourages the use of colour unless it is absolutely necessary (although colour can still be an effective element in posters, slide show presentations, or textbooks). Third, graphs should be interpretable on their own. A reader should be able to understand the basic result based only on the graph and its caption and should not have to refer to the text for an explanation.

There are also several more technical guidelines for graphs that include the following:

  • The graph should be slightly wider than it is tall.
  • The independent variable should be plotted on the x-axis and the dependent variable on the y-axis.
  • Values should increase from left to right on the x-axis and from bottom to top on the y-axis.
  • Axis labels should be clear and concise and include the units of measurement if they do not appear in the caption.
  • Axis labels should be parallel to the axis.
  • Legends should appear within the boundaries of the graph.
  • Text should be in the same simple font throughout and differ by no more than four points.
  • Captions should briefly describe the figure, explain any abbreviations, and include the units of measurement if they do not appear in the axis labels.
  • Captions in an APA manuscript should be typed on a separate page that appears at the end of the manuscript. See Chapter 11 for more information.

As we have seen throughout this book, bar graphs are generally used to present and compare the mean scores for two or more groups or conditions. The bar graph in Figure 12.11 is an APA-style version of Figure 12.4. Notice that it conforms to all the guidelines listed. A new element in Figure 12.11 is the smaller vertical bars that extend both upward and downward from the top of each main bar. These are error bars, and they represent the variability in each group or condition. Although they sometimes extend one standard deviation in each direction, they are more likely to extend one standard error in each direction (as in Figure 12.11). The standard error is the standard deviation of the group divided by the square root of the sample size of the group. The standard error is used because, in general, a difference between group means that is greater than two standard errors is statistically significant. Thus one can “see” whether a difference is statistically significant based on a bar graph with error bars.
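As a minimal sketch of the standard error calculation described above, the following Python computes the standard error and the ends of an error bar that extends one standard error in each direction. The function names are hypothetical, and the sample size of 25 is an assumption made purely for illustration, since the text does not report group sizes.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard deviation of the group divided by the square root of its size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


def error_bar_limits(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Lower and upper ends of an error bar of one standard error each way."""
    se = standard_error(sd, n)
    return mean - se, mean + se


# Treatment-group mean and SD from the earlier written example;
# n = 25 is a hypothetical sample size, not a value from the text.
low, high = error_bar_limits(23.40, 9.33, 25)
print(f"SE = {standard_error(9.33, 25):.2f}, bar from {low:.2f} to {high:.2f}")
```

Because the standard error shrinks as the sample grows, error bars based on it become shorter with larger groups even when the standard deviation stays the same.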

Figure 12.11. Sample APA-style bar graph. Long description available.

Line Graphs

Line graphs are used to present correlations between quantitative variables when the independent variable has, or is organized into, a relatively small number of distinct levels. Each point in a line graph represents the mean score on the dependent variable for participants at one level of the independent variable. Figure 12.12 is an APA-style version of the results of Carlson and Conard. Notice that it includes error bars representing the standard error and conforms to all the stated guidelines.

Figure 12.12. Sample APA-style line graph. Long description available.

In most cases, the information in a line graph could just as easily be presented in a bar graph. In Figure 12.12, for example, one could replace each point with a bar that reaches up to the same level and leave the error bars right where they are. This emphasizes the fundamental similarity of the two types of statistical relationship. Both are differences in the average score on one variable across levels of another. The convention followed by most researchers, however, is to use a bar graph when the variable plotted on the x-axis is categorical and a line graph when it is quantitative.

Scatterplots

Scatterplots are used to present relationships between quantitative variables when the variable on the x-axis (typically the independent variable) has a large number of levels. Each point in a scatterplot represents an individual rather than the mean for a group of individuals, and there are no lines connecting the points. The graph in Figure 12.13 is an APA-style version of Figure 12.7, which illustrates a few additional points. First, when the variables on the x-axis and y-axis are conceptually similar and measured on the same scale—as here, where they are measures of the same variable on two different occasions—this can be emphasized by making the axes the same length. Second, when two or more individuals fall at exactly the same point on the graph, one way this can be indicated is by offsetting the points slightly along the x-axis. Other ways are by displaying the number of individuals in parentheses next to the point or by making the point larger or darker in proportion to the number of individuals. Finally, the straight line that best fits the points in the scatterplot, which is called the regression line, can also be included.

Figure 12.13. Sample APA-style scatterplot. Long description available.

Expressing Descriptive Statistics in Tables

Like graphs, tables can be used to present large amounts of information clearly and efficiently. The same general principles apply to tables as apply to graphs. They should add important information to the presentation of your results, be as simple as possible, and be interpretable on their own. Again, we focus here on tables for an APA-style manuscript.

The most common use of tables is to present several means and standard deviations—usually for complex research designs with multiple independent and dependent variables. Figure 12.14, for example, shows the results of a hypothetical study similar to the one by MacDonald and Martineau (2002) [1] discussed in Chapter 5. (The means in Figure 12.14 are the means reported by MacDonald and Martineau, but the standard errors are not.) Recall that these researchers categorized participants as having low or high self-esteem, put them into a negative or positive mood, and measured their intentions to have unprotected sex. Although not mentioned in Chapter 5, they also measured participants’ attitudes toward unprotected sex. Notice that the table includes horizontal lines spanning the entire table at the top and bottom, and just beneath the column headings. Furthermore, every column has a heading—including the leftmost column—and there are additional headings that span two or more columns that help to organize the information and present it more efficiently. Finally, notice that APA-style tables are numbered consecutively starting at 1 (Table 1, Table 2, and so on) and given a brief but clear and descriptive title.

Figure 12.14. Sample APA-style table presenting means and standard deviations. Long description available.

Another common use of tables is to present correlations—usually measured by Pearson’s r—among several variables. This kind of table is called a correlation matrix. Figure 12.15 is a correlation matrix based on a study by David McCabe and colleagues (McCabe, Roediger, McDaniel, Balota, & Hambrick, 2010) [2]. They were interested in the relationships between working memory and several other variables. We can see from the table that the correlation between working memory and executive function, for example, was an extremely strong .96, that the correlation between working memory and vocabulary was a medium .27, and that all the measures except vocabulary tend to decline with age. Notice here that only half the table is filled in because the other half would have identical values. For example, the Pearson’s r value in the upper right corner (working memory and age) would be the same as the one in the lower left corner (age and working memory). The correlation of a variable with itself is always 1.00, so these values are replaced by dashes to make the table easier to read.

Figure 12.15. Sample APA-style table (correlation matrix). Long description available.

As with graphs, precise statistical results that appear in a table do not need to be repeated in the text. Instead, the writer can note major trends and alert the reader to details (e.g., specific correlations) that are of particular interest.

Key Takeaways

  • In an APA-style article, simple results are most efficiently presented in the text, while more complex results are most efficiently presented in graphs or tables.
  • APA style includes several rules for presenting numerical results in the text. These include using words only for numbers less than 10 that do not represent precise statistical results, rounding results to two decimal places, and using words (e.g., “mean”) in the text and symbols (e.g., “M”) in parentheses.
  • APA style includes several rules for presenting results in graphs and tables. Graphs and tables should add information rather than repeating information, be as simple as possible, and be interpretable on their own with a descriptive caption (for graphs) or a descriptive title (for tables).

Long Descriptions

“Convincing” long description: A four-panel comic strip. In the first panel, a man says to a woman, “I think we should give it another shot.” The woman says, “We should break up, and I can prove it.”

In the second panel, there is a line graph with a downward trend titled “Our Relationship.”

In the third panel, the man, bent over and looking at the graph in the woman’s hands, says, “Huh.”

In the fourth panel, the man says, “Maybe you’re right.” The woman says, “I knew data would convince you.” The man replies, “No, I just think I can do better than someone who doesn’t label her axes.”

Figure 12.11 long description: A sample APA-style bar graph, with a horizontal axis labelled “Condition” and a vertical axis labelled “Clinician Rating of Severity.” The caption of the graph says, “Figure X. Mean clinician’s rating of phobia severity for participants receiving the education treatment and the exposure treatment. Error bars represent standard errors.” At the top of each data bar is an error bar, which looks like a capital I: a vertical line with short horizontal lines attached to its top and bottom. The bottom half of each error bar hangs over the top of the data bar, while each top half sticks out the top of the data bar.

Figure 12.12 long description: A sample APA-style line graph with a horizontal axis labelled “Last Name Quartile” and a vertical axis labelled “Response Times (z Scores).” The caption of the graph says, “Figure X. Mean response time by the alphabetical position of respondents’ names in the alphabet. Response times are expressed as z scores. Error bars represent standard errors.” Each data point has an error bar sticking out of its top and bottom.

Figure 12.13 long description: Sample APA-style scatterplot with a horizontal axis labelled “Time 1” and a vertical axis labelled “Time 2.” Each axis has values from 10 to 30. The caption of the scatterplot says, “Figure X. Relationship between scores on the Rosenberg self-esteem scale taken by 25 research methods students on two occasions one week apart. Pearson’s r = .96.” Most of the data points are clustered around the dashed regression line that extends from approximately (12, 11) to (29, 22).

Figure 12.14 long description: Sample APA-style table presenting means and standard deviations. The table is titled “Table X” and is captioned, “Means and Standard Deviations of Intentions to Have Unprotected Sex and Attitudes Toward Unprotected Sex as a Function of Both Mood and Self-Esteem.” The data is organized into negative and positive mood and details intentions and attitudes toward unprotected sex.

Negative mood:

  • Intentions, high self-esteem: M = 2.46, SD = 1.97
  • Intentions, low self-esteem: M = 4.05, SD = 2.32
  • Attitudes, high self-esteem: M = 1.65, SD = 2.23
  • Attitudes, low self-esteem: M = 1.95, SD = 2.01

Positive mood:

  • Intentions, high self-esteem: M = 2.45, SD = 2.00
  • Intentions, low self-esteem: M = 2.15, SD = 2.27
  • Attitudes, high self-esteem: M = 1.82, SD = 2.32
  • Attitudes, low self-esteem: M = 1.23, SD = 1.75


Figure 12.15 long description: Sample APA-style correlation matrix, titled “Table X: Correlations Between Five Cognitive Variables and Age.” The five cognitive variables are:

  • Working memory
  • Executive function
  • Processing speed
  • Vocabulary
  • Episodic memory

The data is as such:

Media Attributions

  • Convincing by XKCD  CC BY-NC (Attribution NonCommercial)
  • MacDonald, T. K., & Martineau, A. M. (2002). Self-esteem, mood, and intentions to use condoms: When does low self-esteem lead to risky health behaviours? Journal of Experimental Social Psychology, 38 , 299–306. ↵
  • McCabe, D. P., Roediger, H. L., McDaniel, M. A., Balota, D. A., & Hambrick, D. Z. (2010). The relationship between working memory capacity and executive functioning. Neuropsychology, 24 (2), 222–243. doi:10.1037/a0017619 ↵
  • Buss, D. M., & Schmitt, D. P. (1993). Sexual strategies theory: A contextual evolutionary analysis of human mating. Psychological Review, 100 , 204–232. ↵

Bar graph: A figure in which the heights of the bars represent the group means.

Error bars: Small bars at the top of each main bar in a bar graph that represent the variability in each group or condition.

Standard error: The standard deviation of the group divided by the square root of the sample size of the group.

Line graph: A graph used to present correlations between quantitative variables when the independent variable has, or is organized into, a relatively small number of distinct levels.

Scatterplot: A graph which shows correlations between quantitative variables; each point represents one person’s score on both variables.

Correlation matrix: A table showing the correlation between every possible pair of variables in the study.

Research Methods in Psychology - 2nd Canadian Edition by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Descriptive Statistics | Definitions, Types, Examples

Published on 4 November 2022 by Pritha Bhandari . Revised on 9 January 2023.

Descriptive statistics summarise and organise characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics, which help you decide whether your data confirms or refutes your hypothesis and whether it is generalisable to a larger population.

Table of contents

  • Types of descriptive statistics
  • Frequency distribution
  • Measures of central tendency
  • Measures of variability
  • Univariate descriptive statistics
  • Bivariate descriptive statistics
  • Frequently asked questions

Types of descriptive statistics

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

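The examples in this article refer to a survey that asks participants how often they do each of the following activities:

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park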

Frequency distribution

A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarise the frequency of every possible value of a variable in numbers or percentages.

  • Simple frequency distribution table
  • Grouped frequency distribution table

From this table, you can see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.
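A minimal sketch of both kinds of frequency distribution, using only the Python standard library. The function names, the gender responses, and the visit counts are hypothetical and are not the survey data referred to in the text.

```python
from collections import Counter


def simple_frequencies(values):
    """Map each distinct value to (count, percentage of all responses)."""
    counts = Counter(values)
    total = len(values)
    return {
        value: (count, round(100 * count / total, 1))
        for value, count in sorted(counts.items())
    }


def grouped_frequencies(values, width):
    """Group whole-number values into equal-width intervals and count each."""
    counts = Counter((value // width) * width for value in values)
    return {(low, low + width - 1): count for low, count in sorted(counts.items())}


# Hypothetical responses, for illustration only.
genders = ["woman", "man", "woman", "other", "woman", "man"]
library_visits = [0, 3, 4, 7, 12, 15, 3, 24]

print(simple_frequencies(genders))
print(grouped_frequencies(library_visits, width=5))
```

The grouped version assumes non-negative whole numbers and labels each interval by its lowest and highest possible value.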

Measures of central tendency

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean, or M, is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N .

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.
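The three measures can be computed with Python's built-in `statistics` module, as in the sketch below. The six scores are illustrative values chosen for this example, not necessarily the survey responses the text refers to, and `central_tendency` is a hypothetical helper name.

```python
from statistics import mean, median, multimode


def central_tendency(scores):
    """Return the mean, the median, and every mode of a list of scores."""
    return {
        "mean": mean(scores),         # sum of the scores divided by N
        "median": median(scores),     # middle value, or mean of the two middle values
        "modes": multimode(scores),   # most frequent value(s); there can be several
    }


# Illustrative scores (an assumption, not data taken from the text).
scores = [15, 3, 12, 0, 24, 3]
print(central_tendency(scores))
```

Note that `multimode` returns every value when all values occur equally often, which corresponds to the case the text describes as having no mode.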

Measures of variability

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation ( s ) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  • List each score and find their mean.
  • Subtract the mean from each score to get the deviation from the mean.
  • Square each of these deviations.
  • Add up all of the squared deviations.
  • Divide the sum of the squared deviations by N – 1.
  • Find the square root of the number you found.

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18
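The six steps can be traced in code as follows. The scores are an assumption: they were chosen so that the sum of squared deviations equals the 421.5 shown in Step 5, and the function name is hypothetical.

```python
import math


def sample_standard_deviation(scores):
    """Follow the six steps for the standard deviation of a sample."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    mean = sum(scores) / n                               # Step 1: find the mean
    deviations = [score - mean for score in scores]      # Step 2: deviations from the mean
    squared = [deviation ** 2 for deviation in deviations]  # Step 3: square each deviation
    sum_of_squares = sum(squared)                        # Step 4: add them up
    mean_square = sum_of_squares / (n - 1)               # Step 5: divide by N - 1
    return math.sqrt(mean_square)                        # Step 6: take the square root


# Illustrative scores (an assumption) whose squared deviations sum to 421.5.
scores = [15, 3, 12, 0, 24, 3]
print(round(sample_standard_deviation(scores), 2))
```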

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².
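As a quick check of the relationship between these measures, the sketch below computes the range and confirms that the sample variance equals the squared sample standard deviation. It uses the standard library's `stdev` and `variance`, which both divide by N − 1; the scores and the `score_range` helper are illustrative assumptions.

```python
from statistics import stdev, variance


def score_range(scores):
    """Highest value minus lowest value."""
    return max(scores) - min(scores)


# Illustrative scores (an assumption, not data taken from the text).
scores = [15, 3, 12, 0, 24, 3]

print("range:", score_range(scores))
print("variance:", round(variance(scores), 2))
print("stdev squared:", round(stdev(scores) ** 2, 2))
```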

Univariate descriptive statistics

Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

If you consider only the mean as a measure of central tendency, your impression of the ‘middle’ of the data set can be skewed by outliers. The median and mode are much less affected by extreme values.

Likewise, while the range is sensitive to extreme values, you should also consider the standard deviation and variance to get easily comparable measures of spread.
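A small sketch of why it helps to look at more than one measure: adding a single extreme score moves the mean a long way but barely changes the median. The scores and the function name are hypothetical.

```python
from statistics import mean, median


def centre_summary(scores):
    """Return (mean, median) so the two can be compared side by side."""
    return mean(scores), median(scores)


# Hypothetical scores, first without and then with one extreme value.
typical = [2, 3, 3, 4, 5]
with_outlier = typical + [40]

print(centre_summary(typical))
print(centre_summary(with_outlier))
```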

Bivariate descriptive statistics

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read ‘across’ the table to see how the independent and dependent variables relate to each other.

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

From this table, it is clearer that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.
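Converting a contingency table of raw counts to row percentages can be sketched as below. The counts are hypothetical and do not reproduce the table discussed in the text; `row_percentages` is an illustrative helper, and it appends each group's N at the end of the row as described above.

```python
def row_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert each row of counts to percentages and append the row's N."""
    result = {}
    for group, counts in table.items():
        n = sum(counts.values())
        row = {category: round(100 * count / n, 1) for category, count in counts.items()}
        row["N"] = n
        result[group] = row
    return result


# Hypothetical counts of library visits per year by age group.
counts = {
    "Children": {"0-4": 8, "5-8": 16, "9-12": 10, "13-16": 4, "17+": 2},
    "Adults": {"0-4": 10, "5-8": 12, "9-12": 18, "13-16": 35, "17+": 5},
}
for group, row in row_percentages(counts).items():
    print(group, row)
```

With percentages, the two groups can be compared directly even though one group has twice as many respondents as the other.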

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.
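One common follow-up test of correlation is Pearson's r, which can be computed directly from its definition as in this sketch. The function name and the paired counts are hypothetical; the data are invented to show a negative relationship like the one described.

```python
import math


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their separate variation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical yearly counts for six respondents.
movies_seen = [1, 3, 5, 8, 10, 12]
library_visits = [20, 18, 12, 9, 5, 2]
print(round(pearson_r(movies_seen, library_visits), 2))
```

A value near −1 indicates that library visits fall steadily as movie attendance rises, matching the downward pattern a scatter plot of such data would show.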

Frequently asked questions

Descriptive statistics summarise the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarise only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.


Bhandari, P. (2023, January 09). Descriptive Statistics | Definitions, Types, Examples. Scribbr. Retrieved 12 March 2024, from https://www.scribbr.co.uk/stats/descriptive-statistics-explained/


Descriptive Research in Psychology

Sometimes you need to dig deeper than the pure statistics

John Loeppky is a freelance journalist based in Regina, Saskatchewan, Canada, who has written about disability and health for outlets of all kinds.


  • Types of Descriptive Research and the Methods Used
  • Advantages & Limitations of Descriptive Research
  • Best Practices for Conducting Descriptive Research

Descriptive research is one of the key tools needed in any psychology researcher’s toolbox in order to create and lead a project that is both equitable and effective. Because psychology, as a field, loves definitions, let’s start with one. The University of Minnesota’s Introduction to Psychology defines this type of research as one that is “...designed to provide a snapshot of the current state of affairs.” That's pretty broad, so what does that mean in practice? Dr. Heather Derry-Vick (PhD), an assistant professor in psychiatry at Hackensack Meridian School of Medicine, helps us put it into perspective. “Descriptive research really focuses on defining, understanding, and measuring a phenomenon or an experience,” she says. “Not trying to change a person's experience or outcome, or even really looking at the mechanisms for why that might be happening, but more so describing an experience or a process as it unfolds naturally.”

Types of Descriptive Research and the Methods Used

Within the descriptive research methodology there are multiple types, including the following.

Descriptive Survey Research

This involves going beyond a typical tool like a Likert scale—where you typically place your response to a prompt on a one to five scale. We already know that scales like this can be ineffective, particularly when studying pain, for example.

When that's the case, using a descriptive methodology can help dig deeper into how a person is thinking, feeling, and acting rather than simply quantifying it in a way that might be unclear or confusing.

Descriptive Observational Research

Think of observational research like an ethically-focused version of people-watching. One example would be watching the patterns of children on a playground—perhaps when looking at a concept like risky play or seeking to observe social behaviors between children of different ages.

Descriptive Case Study Research

A descriptive approach to a case study is akin to a biography of a person, homing in on the experiences of a small group to extrapolate to larger themes. We most commonly see descriptive case studies when those in the psychology field are using past clients as an example to illustrate a point.

Correlational Descriptive Research

While descriptive research is often about the here and now, this form of the methodology allows researchers to make connections between groups of people. As an example from her research, Derry-Vick says she uses this method to identify how gender might play a role in cancer scan anxiety, aka scanxiety.

Dr. Derry-Vick's research uses surveys and interviews to get a sense of how cancer patients are feeling and what they are experiencing both in the course of their treatment and in the lead-up to their next scan, which can be a significant source of stress.

David Marlon, PsyD, MBA, who works as a clinician and as CEO at Vegas Stronger, and whose research focused on leadership styles at community-based clinics, says that using descriptive research allowed him to get beyond the numbers.

In his case, that means going beyond data points like how many unhoused people found stable housing over a certain period or how many people became drug-free, and identifying the reasons for those changes.

“Those [data points] are some practical, quantitative tools that are helpful. But when I question them on how safe they feel, when I question them on the depth of the bond or the therapeutic alliance, when I talk to them about their processing of traumas, wellbeing...these are things that don't really fall on to a yes, no, or even on a Likert scale.”

For the portion of his thesis that was focused on descriptive research, Marlon used semi-structured interviews to look at the how and the why of transformational leadership and its impact on clinics’ clients and staff.

Advantages & Limitations of Descriptive Research

So, if the advantages of using descriptive research include that it centers the research participants, gives us a clear picture of what is happening to a person in a particular moment, and gives us very nuanced insights into how a particular situation is being perceived by the very person affected, are there drawbacks? Yes, there are. Dr. Derry-Vick says that it’s important to keep in mind that just because descriptive research tells us something is happening doesn’t mean it necessarily leads us to the resolution of a given problem.

“I think that, by design, the descriptive research might not tell you why a phenomenon is happening. So it might tell you, very well, how often it's happening, or what the levels are, or help you understand it in depth. But that may or may not always tell you information about the causes or mechanisms for why something is happening.”

Another limitation she identifies is that it also can’t tell you, on its own, whether a particular treatment pathway is having the desired effect.

“Descriptive research in and of itself can't really tell you whether a specific approach is going to be helpful until you take in a different approach to actually test it.”

Marlon, who believes in a multi-disciplinary approach, says that his subfield—addictions—is one where descriptive research has its limits but helps readers go beyond preconceived notions of what addictions treatment looks and feels like when it is effective. “If we talked to and interviewed and got descriptive information from the clinicians and the clients, a much more precise picture would be painted, showing the need for a client's specific multidisciplinary approach augmented with a variety of modalities,” he says. “If you tried to look at my discipline in a pure quantitative approach, it wouldn't begin to tell the real story.”

Best Practices for Conducting Descriptive Research

Because you’re controlling far fewer variables than in other forms of research, it’s important to identify whether those you are describing, your study participants, should be informed that they are part of a study.

For example, if you’re observing and describing who is buying what in a grocery store to identify patterns, then you might not need to identify yourself.

However, if you’re asking people about their fear of a certain treatment, or how their marginalized identities impact their mental health in a particular way, there is far more pressure to think deeply about how you, as the researcher, are connected to the people you are researching.

Many descriptive research projects use interviews to gather data and, as a result, descriptive research that is focused on this type of data gathering also has ethical and practical concerns attached. Thankfully, there are plenty of guides from established researchers about how to best conduct these interviews and/or formulate surveys.

While descriptive research has its limits, it is commonly used by researchers to get a clear vantage point on what is happening in a given situation.

Tools like surveys, interviews, and observation are often employed to dive deeper into a given issue and really highlight the human element in psychological research. At its core, descriptive research is rooted in a collaborative style that allows deeper insights when used effectively.

University of Minnesota. Introduction to Psychology .

By John Loeppky. John Loeppky is a freelance journalist based in Regina, Saskatchewan, Canada, who has written about disability and health for outlets of all kinds.

2.6 Analyzing the Data

Learning objectives.

  • Distinguish between descriptive and inferential statistics.
  • Identify the different kinds of descriptive statistics researchers use to summarize their data.
  • Describe the purpose of inferential statistics.
  • Distinguish between Type I and Type II errors.

Once the study is complete and the observations have been made and recorded, the researchers need to analyze the data and draw their conclusions. Typically, data are analyzed using both descriptive and inferential statistics. Descriptive statistics are used to summarize the data, and inferential statistics are used to generalize the results from the sample to the population. In turn, inferential statistics are used to draw conclusions about whether a theory has been supported, has been refuted, or requires modification.

Descriptive Statistics

Descriptive statistics are used to organize or summarize a set of data. Examples include percentages, measures of central tendency (mean, median, mode), measures of dispersion (range, standard deviation, variance), and correlation coefficients.

Measures of central tendency are used to describe the typical, average, or central score in a distribution of scores. The mode is the most frequently occurring score in a distribution. The median is the midpoint of a distribution of scores: half the scores fall below it and half fall above it. The mean is the arithmetic average of a distribution of scores.

Measures of dispersion are also considered descriptive statistics. They are used to describe the degree of spread in a set of scores: are all of the scores similar and clustered around the mean, or is there a lot of variability in the scores? The range is a measure of dispersion that measures the distance between the highest and lowest scores in a distribution. The standard deviation is a more sophisticated measure of dispersion that measures, roughly, the average distance of scores from the mean. The variance is just the standard deviation squared, so it also measures the distance of scores from the mean, but in squared units.

Typically means and standard deviations are computed for experimental research studies in which an independent variable was manipulated to produce two or more groups and a dependent variable was measured quantitatively. The means from each experimental group or condition are calculated separately and are compared to see if they differ.

For nonexperimental research, simple percentages may be computed to describe the percentage of people who engaged in some behavior or held some belief. More commonly, though, nonexperimental research involves computing the correlation between two variables. A correlation coefficient describes the strength and direction of the relationship between two variables. The values of a correlation coefficient can range from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. Positive correlation coefficients indicate that as the values of one variable increase, so do the values of the other variable. A good example of a positive correlation is the correlation between height and weight. Negative correlation coefficients indicate that as the values of one variable increase, the values of the other variable decrease. An example of a negative correlation is the correlation between stressful life events and happiness: as stress increases, happiness is likely to decrease.
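As a rough illustration of how a correlation coefficient can be computed, the sketch below implements Pearson's r directly from deviations around each mean. The function name `pearson_r` and all of the data values are hypothetical and were made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: heights (cm) and weights (kg) for five people
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 90]
r_positive = pearson_r(heights, weights)

# Hypothetical data: number of stressful life events and happiness ratings
stress = [1, 2, 3, 4, 5]
happiness = [9, 7, 6, 4, 2]
r_negative = pearson_r(stress, happiness)

print(round(r_positive, 2), round(r_negative, 2))
```

With these made-up values the first coefficient comes out positive and the second negative, matching the direction of each relationship described above.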

Inferential Statistics

As you learned in the section of this chapter on sampling, researchers typically sample from a population, but ultimately they want to generalize their results from the sample to that broader population. In other words, they want to infer what the population is like based on the sample they studied. Inferential statistics are used for that purpose: they allow researchers to draw conclusions about a population based on data from a sample. Inferential statistics are crucial because the effects (i.e., the differences in the means or the correlation coefficient) that researchers find in a study may be due simply to random chance variability, or they may be due to a real effect (i.e., they may reflect a real relationship between variables or a real effect of an independent variable on a dependent variable).

Researchers use inferential statistics to determine whether their effects are statistically significant. A statistically significant effect is one that is unlikely to be due to random chance and therefore likely represents a real effect in the population. More specifically, results that would occur less than 5% of the time by chance alone, if there were no real effect in the population, are typically considered statistically significant. When an effect is statistically significant, it is appropriate to generalize the results from the sample to the population. In contrast, if inferential statistics reveal that results like these would occur more than 5% of the time by chance alone, then the researcher must conclude that the result is not statistically significant.

It is important to keep in mind that statistics are probabilistic in nature. They allow researchers to determine whether the chances are low that their results are due to random error, but they don’t provide any absolute certainty. Hopefully, when we conclude that an effect is statistically significant, it is a real effect that we would find if we tested the entire population. And hopefully, when we conclude that an effect is not statistically significant, there really is no effect, and if we tested the entire population we would find no effect. The threshold is conventionally set at 5% to ensure that there is a high probability that we make a correct decision and that our determination of statistical significance is an accurate reflection of reality.

But mistakes can always be made. Specifically, two kinds of mistakes can be made. First, researchers can make a Type I error, which is a false positive. It occurs when a researcher concludes that the results are statistically significant (so they say there is an effect in the population) when in reality there is no real effect in the population and the results are just due to chance (they are a fluke). When the threshold is set to 5%, which is the convention, the researcher has a 5% chance or less of making a Type I error when there is no real effect. You might wonder why researchers don’t set the threshold even lower to reduce the chances of making a Type I error. The reason is that when the chances of making a Type I error are reduced, the chances of making a Type II error are increased. A Type II error is a missed opportunity. It occurs when a researcher concludes that the results are not statistically significant when in reality there is a real effect in the population and they just missed detecting it.
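The trade-off between the two kinds of error can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: scores are drawn from normal populations with a known standard deviation of 1, each simulated study compares two groups of 20 with a two-tailed z test, and the critical values 1.96 and 2.576 correspond to the 5% and 1% thresholds. The function names and the effect size of 0.5 are hypothetical choices, not part of the text.

```python
import math
import random

def significant(group_a, group_b, critical_z):
    """Two-sample z test, assuming equal group sizes and a known population SD of 1."""
    n = len(group_a)
    difference = sum(group_a) / n - sum(group_b) / n
    standard_error = math.sqrt(2 / n)
    return abs(difference / standard_error) > critical_z

def rejection_rate(true_difference, critical_z, n=20, studies=2000, seed=42):
    """Fraction of simulated studies that come out statistically significant."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    hits = 0
    for _ in range(studies):
        group_a = [rng.gauss(true_difference, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        if significant(group_a, group_b, critical_z):
            hits += 1
    return hits / studies

# No real effect: every significant result is a Type I error.
type_1_rate = rejection_rate(true_difference=0.0, critical_z=1.96)

# A real effect exists: every non-significant result is a Type II error.
type_2_rate_at_5_percent = 1 - rejection_rate(true_difference=0.5, critical_z=1.96)
type_2_rate_at_1_percent = 1 - rejection_rate(true_difference=0.5, critical_z=2.576)

print(type_1_rate, type_2_rate_at_5_percent, type_2_rate_at_1_percent)
```

Under these assumptions the Type I error rate should land close to 5%, and tightening the threshold from 5% to 1% should raise the proportion of studies that miss the real effect.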


Chapter 12 Descriptive Statistics

At this point, we need to consider the basics of data analysis in psychological research in more detail. In this chapter, we focus on descriptive statistics—a set of techniques for summarizing and displaying the data from your sample. We look first at some of the most common techniques for describing single variables, followed by some of the most common techniques for describing statistical relationships between variables. We then look at how to present descriptive statistics in writing and also in the form of tables and graphs that would be appropriate for an American Psychological Association (APA)-style research report. We end with some practical advice for organizing and carrying out your analyses.

12.1 Describing Single Variables

Learning objectives.

  • Use frequency tables and histograms to display and interpret the distribution of a variable.
  • Compute and interpret the mean, median, and mode of a distribution and identify situations in which the mean, median, or mode is the most appropriate measure of central tendency.
  • Compute and interpret the range and standard deviation of a distribution.
  • Compute and interpret percentile ranks and z scores.

Descriptive statistics refers to a set of techniques for summarizing and displaying data. Let us assume here that the data are quantitative and consist of scores on one or more variables for each of several study participants. Although in most cases the primary research question will be about one or more statistical relationships between variables, it is also important to describe each variable individually. For this reason, we begin by looking at some of the most common techniques for describing single variables.

The Distribution of a Variable

Every variable has a distribution, which is the way the scores are distributed across the levels of that variable. For example, in a sample of 100 college students, the distribution of the variable “number of siblings” might be such that 10 of them have no siblings, 30 have one sibling, 40 have two siblings, and so on. In the same sample, the distribution of the variable “sex” might be such that 44 have a score of “male” and 56 have a score of “female.”

Frequency Tables

One way to display the distribution of a variable is in a frequency table. Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale", for example, is a frequency table showing a hypothetical distribution of scores on the Rosenberg Self-Esteem Scale for a sample of 40 college students. The first column lists the values of the variable (the possible scores on the Rosenberg scale), and the second column lists the frequency of each score. This table shows that there were three students who had self-esteem scores of 24, five who had self-esteem scores of 23, and so on. From a frequency table like this, one can quickly see several important aspects of a distribution, including the range of scores (from 15 to 24), the most and least common scores (22 and 17, respectively), and any extreme scores that stand out from the rest.

Table 12.1 Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale

There are a few other points worth noting about frequency tables. First, the levels listed in the first column usually go from the highest at the top to the lowest at the bottom, and they usually do not extend beyond the highest and lowest scores in the data. For example, although scores on the Rosenberg scale can vary from a high of 30 to a low of 0, Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale" only includes levels from 24 to 15 because that range includes all the scores in this particular data set. Second, when there are many different scores across a wide range of values, it is often better to create a grouped frequency table, in which the first column lists ranges of values and the second column lists the frequency of scores in each range. Table 12.2 "A Grouped Frequency Table Showing a Hypothetical Distribution of Reaction Times", for example, is a grouped frequency table showing a hypothetical distribution of simple reaction times for a sample of 20 participants. In a grouped frequency table, the ranges must all be of equal width, and there are usually between five and 15 of them. Finally, frequency tables can also be used for categorical variables, in which case the levels are category labels. The order of the category labels is somewhat arbitrary, but they are often listed from the most frequent at the top to the least frequent at the bottom.
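The conventions just described can be sketched in a few lines of Python. This is a minimal illustration that assumes whole-number scores; the helper names `frequency_table` and `grouped_frequency_table` and the sample data are hypothetical and are not the values from the tables referred to in the text.

```python
from collections import Counter

def frequency_table(scores):
    """Rows of (value, frequency), from the highest observed score down to the lowest."""
    counts = Counter(scores)
    # Levels inside the observed range are listed even if their frequency is 0.
    return [(value, counts[value]) for value in range(max(scores), min(scores) - 1, -1)]

def grouped_frequency_table(scores, start, width):
    """Rows of ((low, high), frequency) for equal-width ranges, highest range first."""
    rows = []
    low = start
    while low <= max(scores):
        high = low + width - 1
        rows.append(((low, high), sum(low <= s <= high for s in scores)))
        low += width
    return rows[::-1]

# Hypothetical self-esteem scores
self_esteem = [22, 21, 23, 22, 20, 19, 22, 24, 21, 18]
for value, frequency in frequency_table(self_esteem):
    print(value, frequency)

# Hypothetical reaction times in milliseconds, grouped into 20 ms ranges
reaction_times = [141, 155, 160, 178, 199]
for (low, high), frequency in grouped_frequency_table(reaction_times, start=140, width=20):
    print(f"{low}-{high}", frequency)
```

Passing the lowest range boundary and the width explicitly keeps every range the same size, which is the requirement stated above for grouped tables.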

Table 12.2 A Grouped Frequency Table Showing a Hypothetical Distribution of Reaction Times

A histogram is a graphical display of a distribution. It presents the same information as a frequency table but in a way that is even quicker and easier to grasp. The histogram in Figure 12.1 "Histogram Showing the Distribution of Self-Esteem Scores Presented in Table 12.1" presents the distribution of self-esteem scores in Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale". The x-axis of the histogram represents the variable and the y-axis represents frequency. Above each level of the variable on the x-axis is a vertical bar that represents the number of individuals with that score. When the variable is quantitative, as in this example, there is usually no gap between the bars. When the variable is categorical, however, there is usually a small gap between them. (The gap at 17 in this histogram reflects the fact that there were no scores of 17 in this data set.)

Figure 12.1 Histogram Showing the Distribution of Self-Esteem Scores Presented in Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale"


Distribution Shapes

When the distribution of a quantitative variable is displayed in a histogram, it has a shape. The shape of the distribution of self-esteem scores in Figure 12.1 "Histogram Showing the Distribution of Self-Esteem Scores Presented in Table 12.1" is typical. There is a peak somewhere near the middle of the distribution and “tails” that taper in either direction from the peak. The distribution in Figure 12.1 is unimodal, meaning it has one distinct peak, but distributions can also be bimodal, meaning they have two distinct peaks. Figure 12.2 "Histogram Showing a Hypothetical Bimodal Distribution of Scores on the Beck Depression Inventory", for example, shows a hypothetical bimodal distribution of scores on the Beck Depression Inventory. Distributions can also have more than two distinct peaks, but these are relatively rare in psychological research.

Figure 12.2 Histogram Showing a Hypothetical Bimodal Distribution of Scores on the Beck Depression Inventory


Another characteristic of the shape of a distribution is whether it is symmetrical or skewed. The distribution in the center of Figure 12.3 "Histograms Showing Negatively Skewed, Symmetrical, and Positively Skewed Distributions" is symmetrical. Its left and right halves are mirror images of each other. The distribution on the left is negatively skewed, with its peak shifted toward the upper end of its range and a relatively long negative tail. The distribution on the right is positively skewed, with its peak toward the lower end of its range and a relatively long positive tail.

Figure 12.3 Histograms Showing Negatively Skewed, Symmetrical, and Positively Skewed Distributions


An outlier is an extreme score that is much higher or lower than the rest of the scores in the distribution. Sometimes outliers represent truly extreme scores on the variable of interest. For example, on the Beck Depression Inventory, a single clinically depressed person might be an outlier in a sample of otherwise happy and high-functioning peers. However, outliers can also represent errors or misunderstandings on the part of the researcher or participant, equipment malfunctions, or similar problems. We will say more about how to interpret outliers and what to do about them later in this chapter.

Measures of Central Tendency and Variability

It is also useful to be able to describe the characteristics of a distribution more precisely. Here we look at how to do this in terms of two important characteristics: their central tendency and their variability.

Central Tendency

The central tendency of a distribution is its middle, the point around which the scores in the distribution tend to cluster. (Another term for central tendency is average.) Looking back at Figure 12.1 "Histogram Showing the Distribution of Self-Esteem Scores Presented in Table 12.1", for example, we can see that the self-esteem scores tend to cluster around the values of 20 to 22. Here we will consider the three most common measures of central tendency: the mean, the median, and the mode.

The mean of a distribution (symbolized M) is the sum of the scores divided by the number of scores. As a formula, it looks like this:

$$M = \frac{\sum X}{N}$$

In this formula, the symbol Σ (the Greek letter sigma) is the summation sign and means to sum across the values of the variable X. N represents the number of scores. The mean is by far the most common measure of central tendency, and there are some good reasons for this. It usually provides a good indication of the central tendency of a distribution, and it is easily understood by most people. In addition, the mean has statistical properties that make it especially useful in doing inferential statistics.

An alternative to the mean is the median. The median is the middle score in the sense that half the scores in the distribution are less than it and half are greater than it. The simplest way to find the median is to organize the scores from lowest to highest and locate the score in the middle. Consider, for example, the following set of seven scores:

To find the median, simply rearrange the scores from lowest to highest and locate the one in the middle.

In this case, the median is 4 because there are three scores lower than 4 and three scores higher than 4. When there is an even number of scores, there are two scores in the middle of the distribution, in which case the median is the value halfway between them. For example, if we were to add a score of 15 to the preceding data set, there would be two scores (both 4 and 8) in the middle of the distribution, and the median would be halfway between them (6).
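The procedure for odd and even numbers of scores can be sketched as follows. The seven scores used here are hypothetical: they were chosen only so that they agree with the description above (a median of 4, and middle scores of 4 and 8 once a score of 15 is added). The result is cross-checked against `statistics.median` from the Python standard library.

```python
import statistics

def median(scores):
    """Middle score of an odd-length list, or the halfway point of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical set of seven scores consistent with the worked example
seven_scores = [8, 4, 12, 14, 3, 2, 3]
eight_scores = seven_scores + [15]

print(sorted(seven_scores), median(seven_scores))
print(sorted(eight_scores), median(eight_scores))

# The standard library agrees with the hand-written version.
print(statistics.median(seven_scores), statistics.median(eight_scores))
```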

One final measure of central tendency is the mode. The mode is the most frequent score in a distribution. In the self-esteem distribution presented in Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale" and Figure 12.1 "Histogram Showing the Distribution of Self-Esteem Scores Presented in Table 12.1", for example, the mode is 22. More students had that score than any other. The mode is the only measure of central tendency that can also be used for categorical variables.

In a distribution that is both unimodal and symmetrical, the mean, median, and mode will be very close to each other at the peak of the distribution. In a bimodal or asymmetrical distribution, the mean, median, and mode can be quite different. In a bimodal distribution, the mean and median will tend to be between the peaks, while the mode will be at the tallest peak. In a skewed distribution, the mean will differ from the median in the direction of the skew (i.e., the direction of the longer tail). For highly skewed distributions, the mean can be pulled so far in the direction of the skew that it is no longer a good measure of the central tendency of that distribution. Imagine, for example, a set of four simple reaction times of 200, 250, 280, and 250 milliseconds (ms). The mean is 245 ms. But the addition of one more score of 5,000 ms (perhaps because the participant was not paying attention) would raise the mean to 1,196 ms. Not only is this measure of central tendency greater than 80% of the scores in the distribution, but it also does not seem to represent the behavior of anyone in the distribution very well. This is why researchers often prefer the median for highly skewed distributions (such as distributions of reaction times).
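The reaction-time example can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative only; the five reaction times are the ones given in the paragraph above.

```python
import statistics

# The four reaction times from the example, in milliseconds
times = [200, 250, 280, 250]

# The same data with one extreme score added
times_with_lapse = times + [5000]

for label, data in [("without extreme score", times), ("with extreme score", times_with_lapse)]:
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

The mean shifts a great deal when the extreme score is added, while the median stays where it was, which is the reason the median is often preferred for highly skewed data.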

Keep in mind, though, that you are not required to choose a single measure of central tendency in analyzing your data. Each one provides slightly different information, and all of them can be useful.

Measures of Variability

The variability of a distribution is the extent to which the scores vary around their central tendency. Consider the two distributions in Figure 12.4 "Histograms Showing Hypothetical Distributions With the Same Mean, Median, and Mode (10) but With Low Variability (Top) and High Variability (Bottom)", both of which have the same central tendency. The mean, median, and mode of each distribution are 10. Notice, however, that the two distributions differ in terms of their variability. The top one has relatively low variability, with all the scores relatively close to the center. The bottom one has relatively high variability, with the scores spread across a much greater range.

Figure 12.4 Histograms Showing Hypothetical Distributions With the Same Mean, Median, and Mode (10) but With Low Variability (Top) and High Variability (Bottom)


One simple measure of variability is the range, which is simply the difference between the highest and lowest scores in the distribution. The range of the self-esteem scores in Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale", for example, is the difference between the highest score (24) and the lowest score (15). That is, the range is 24 − 15 = 9. Although the range is easy to compute and understand, it can be misleading when there are outliers. Imagine, for example, an exam on which all the students scored between 90 and 100. It has a range of 10. But if there was a single student who scored 20, the range would increase to 80, giving the impression that the scores were quite variable when in fact only one student differed substantially from the rest.
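A quick sketch of the exam illustration shows how a single outlier inflates the range. The individual exam scores below are hypothetical; only the endpoints (90, 100, and the single score of 20) come from the text, and `score_range` is an illustrative helper name.

```python
def score_range(scores):
    """Difference between the highest and lowest scores."""
    return max(scores) - min(scores)

# Hypothetical exam scores, all between 90 and 100
exam_scores = [90, 92, 95, 97, 100]

# The same class with one student who scored 20
exam_scores_with_outlier = exam_scores + [20]

print(score_range(exam_scores))
print(score_range(exam_scores_with_outlier))
```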

By far the most common measure of variability is the standard deviation. The standard deviation of a distribution is, roughly speaking, the average distance between the scores and the mean. For example, the standard deviations of the distributions in Figure 12.4 "Histograms Showing Hypothetical Distributions With the Same Mean, Median, and Mode (10) but With Low Variability (Top) and High Variability (Bottom)" are 1.69 for the top distribution and 4.30 for the bottom one. That is, while the scores in the top distribution differ from the mean by about 1.69 units on average, the scores in the bottom distribution differ from the mean by about 4.30 units on average.

Computing the standard deviation involves a slight complication. Specifically, it involves finding the difference between each score and the mean, squaring each difference, finding the mean of these squared differences, and finally finding the square root of that mean. The formula looks like this:

$$SD = \sqrt{\frac{\sum (X - M)^2}{N}}$$

The computations for the standard deviation are illustrated for a small set of data in Table 12.3 "Computations for the Standard Deviation". The first column is a set of eight scores that has a mean of 5. The second column is the difference between each score and the mean. The third column is the square of each of these differences. Notice that although the differences can be negative, the squared differences are never negative, meaning that the standard deviation is never negative either. At the bottom of the third column is the mean of the squared differences, which is also called the variance (symbolized SD²). Although the variance is itself a measure of variability, it generally plays a larger role in inferential statistics than in descriptive statistics. Finally, below the variance is the square root of the variance, which is the standard deviation.
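The column-by-column computation described above can be sketched in Python. The eight scores are hypothetical, chosen only because they have a mean of 5 as in the description; they are not claimed to be the values in the table.

```python
import math

# Hypothetical set of eight scores with a mean of 5
scores = [3, 5, 4, 2, 7, 6, 5, 8]

mean = sum(scores) / len(scores)

# Second column: difference between each score and the mean
differences = [x - mean for x in scores]

# Third column: square of each difference
squared_differences = [d ** 2 for d in differences]

# Variance: mean of the squared differences (dividing by N)
variance = sum(squared_differences) / len(scores)

# Standard deviation: square root of the variance
standard_deviation = math.sqrt(variance)

for x, d, sq in zip(scores, differences, squared_differences):
    print(x, d, sq)
print("variance:", variance, "SD:", round(standard_deviation, 2))
```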

Table 12.3 Computations for the Standard Deviation

If you have already taken a statistics course, you may have learned to divide the sum of the squared differences by N − 1 rather than by N when you compute the variance and standard deviation. Why is this?

By definition, the standard deviation is the square root of the mean of the squared differences. This implies dividing the sum of squared differences by N, as in the formula just presented. Computing the standard deviation this way is appropriate when your goal is simply to describe the variability in a sample. And learning it this way emphasizes that the variance is in fact the mean of the squared differences, and that the standard deviation is the square root of this mean.

However, most calculators and software packages divide the sum of squared differences by N − 1. This is because the standard deviation of a sample tends to be a bit lower than the standard deviation of the population the sample was selected from. Dividing the sum of squares by N − 1 corrects for this tendency and results in a better estimate of the population standard deviation. Because researchers generally think of their data as representing a sample selected from a larger population—and because they are generally interested in drawing conclusions about the population—it makes sense to routinely apply this correction.
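The difference between the two divisors can be seen in a short sketch. In Python's standard `statistics` module, `pstdev` divides by N and `stdev` divides by N − 1; the eight scores below are hypothetical and serve only as an illustration.

```python
import math
import statistics

# Hypothetical sample of eight scores
scores = [3, 5, 4, 2, 7, 6, 5, 8]

mean = sum(scores) / len(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Dividing by N describes the variability of this sample itself.
sd_descriptive = math.sqrt(sum_of_squares / len(scores))

# Dividing by N - 1 gives a better estimate of the population value.
sd_estimate = math.sqrt(sum_of_squares / (len(scores) - 1))

print(round(sd_descriptive, 2), round(sd_estimate, 2))

# The standard library exposes both versions.
print(round(statistics.pstdev(scores), 2), round(statistics.stdev(scores), 2))
```

Because N − 1 is smaller than N, the corrected value is always a little larger, and the gap shrinks as the sample size grows.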

Percentile Ranks and z Scores

In many situations, it is useful to have a way to describe the location of an individual score within its distribution. One approach is the percentile rank. The percentile rank of a score is the percentage of scores in the distribution that are lower than that score. Consider, for example, the distribution in Table 12.1 "Frequency Table Showing a Hypothetical Distribution of Scores on the Rosenberg Self-Esteem Scale". For any score in the distribution, we can find its percentile rank by counting the number of scores in the distribution that are lower than that score and converting that number to a percentage of the total number of scores. Notice, for example, that five of the students represented by the data in Table 12.1 had self-esteem scores of 23. In this distribution, 32 of the 40 scores (80%) are lower than 23. Thus each of these students has a percentile rank of 80. (It can also be said that they scored “at the 80th percentile.”) Percentile ranks are often used to report the results of standardized tests of ability or achievement. If your percentile rank on a test of verbal ability were 40, for example, this would mean that you scored higher than 40% of the people who took the test.
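The counting procedure for percentile ranks can be sketched as a small function. The 40 scores below are a hypothetical distribution constructed so that 32 of them fall below 23, as in the example; the frequencies are assumptions and not the values from the table, and `percentile_rank` is an illustrative helper name.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the distribution that are lower than `score`."""
    lower = sum(1 for s in scores if s < score)
    return 100 * lower / len(scores)

# Hypothetical distribution of 40 scores (no one scored 17)
scores = (
    [15] * 2 + [16] * 2 + [18] * 3 + [19] * 5 + [20] * 6
    + [21] * 6 + [22] * 8 + [23] * 5 + [24] * 3
)

print(len(scores))
print(percentile_rank(scores, 23))
```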

Another approach is the z score. The z score for a particular individual is the difference between that individual’s score and the mean of the distribution, divided by the standard deviation of the distribution:

$$z = \frac{X - M}{SD}$$

A z score indicates how far above or below the mean a raw score is, but it expresses this in terms of the standard deviation. For example, in a distribution of intelligence quotient (IQ) scores with a mean of 100 and a standard deviation of 15, an IQ score of 110 would have a z score of (110 − 100) / 15 = +0.67. In other words, a score of 110 is 0.67 standard deviations (approximately two thirds of a standard deviation) above the mean. Similarly, a raw score of 85 would have a z score of (85 − 100) / 15 = −1.00. In other words, a score of 85 is one standard deviation below the mean.

There are several reasons that z scores are important. Again, they provide a way of describing where an individual’s score is located within a distribution and are sometimes used to report the results of standardized tests. They also provide one way of defining outliers. For example, outliers are sometimes defined as scores that have z scores less than −3.00 or greater than +3.00. In other words, they are defined as scores that are more than three standard deviations from the mean. Finally, z scores play an important role in understanding and computing other statistics, as we will see shortly.
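The IQ examples and the outlier rule described above can be sketched in Python. The function names are hypothetical, and the cutoff of 3.00 is just the convention mentioned in the text, not a fixed standard.

```python
def z_score(raw_score, mean, standard_deviation):
    """How many standard deviations a raw score lies above or below the mean."""
    return (raw_score - mean) / standard_deviation

def is_outlier(raw_score, mean, standard_deviation, cutoff=3.0):
    """Flag scores whose z score is less than -cutoff or greater than +cutoff."""
    return abs(z_score(raw_score, mean, standard_deviation)) > cutoff

# IQ scores with a mean of 100 and a standard deviation of 15
print(round(z_score(110, 100, 15), 2))
print(z_score(85, 100, 15))

# A score of 150 is more than three standard deviations above the mean.
print(is_outlier(150, 100, 15), is_outlier(130, 100, 15))
```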

Online Descriptive Statistics

Although many researchers use commercially available software such as SPSS and Excel to analyze their data, there are several free online analysis tools that can also be extremely useful. Many allow you to enter or upload your data and then make one click to conduct several descriptive statistical analyses. Among them are the following.

Rice Virtual Lab in Statistics

http://onlinestatbook.com/stat_analysis/index.html

VassarStats

http://faculty.vassar.edu/lowry/VassarStats.html

Bright Stat

http://www.brightstat.com

For a more complete list, see http://statpages.org/index.html.

Key Takeaways

  • Every variable has a distribution—a way that the scores are distributed across the levels. The distribution can be described using a frequency table and histogram. It can also be described in words in terms of its shape, including whether it is unimodal or bimodal, and whether it is symmetrical or skewed.
  • The central tendency, or middle, of a distribution can be described precisely using three statistics—the mean, median, and mode. The mean is the sum of the scores divided by the number of scores, the median is the middle score, and the mode is the most common score.
  • The variability, or spread, of a distribution can be described precisely using the range and standard deviation. The range is the difference between the highest and lowest scores, and the standard deviation is roughly the average amount by which the scores differ from the mean.
  • The location of a score within its distribution can be described using percentile ranks or z scores. The percentile rank of a score is the percentage of scores below that score, and the z score is the difference between the score and the mean divided by the standard deviation.

  1. Practice: Make a frequency table and histogram for the following data. Then write a short description of the shape of the distribution in words.

     11, 8, 9, 12, 9, 10, 12, 13, 11, 13, 12, 6, 10, 17, 13, 11, 12, 12, 14, 14

  2. Practice: For the data in Exercise 1, compute the mean, median, mode, standard deviation, and range.
  3. Practice: Using the data in Exercises 1 and 2, find (a) the percentile ranks for scores of 9 and 14 and (b) the z scores for scores of 8 and 12.

12.2 Describing Statistical Relationships

Learning objectives.

  • Describe differences between groups in terms of their means and standard deviations, and in terms of Cohen’s d.
  • Describe correlations between quantitative variables in terms of Pearson’s r.

As we have seen throughout this book, most interesting research questions in psychology are about statistical relationships between variables. Recall that there is a statistical relationship between two variables when the average score on one differs systematically across the levels of the other. In this section, we revisit the two basic forms of statistical relationship introduced earlier in the book—differences between groups or conditions and relationships between quantitative variables—and we consider how to describe them in more detail.

Differences Between Groups or Conditions

Differences between groups or conditions are usually described in terms of the mean and standard deviation of each group or condition. For example, Thomas Ollendick and his colleagues conducted a study in which they evaluated two one-session treatments for simple phobias in children (Ollendick et al., 2009). Ollendick, T. H., Öst, L.-G., Reuterskiöld, L., Costa, N., Cederlund, R., Sirbu, C.,…Jarrett, M. A. (2009). One-session treatments of specific phobias in youth: A randomized clinical trial in the United States and Sweden. Journal of Consulting and Clinical Psychology, 77, 504–516. They randomly assigned children with an intense fear (e.g., of dogs) to one of three conditions. In the exposure condition, the children actually confronted the object of their fear under the guidance of a trained therapist. In the education condition, they learned about phobias and some strategies for coping with them. In the waitlist control condition, they waited to receive a treatment after the study was over. The severity of each child’s phobia was then rated on a 1-to-8 scale by a clinician who did not know which treatment the child had received. (This was one of several dependent variables.) The mean fear rating in the education condition was 4.83 with a standard deviation of 1.52, while the mean fear rating in the exposure condition was 3.47 with a standard deviation of 1.77. The mean fear rating in the control condition was 5.56 with a standard deviation of 1.21. In other words, both treatments worked relative to the waitlist control, but the exposure treatment worked better than the education treatment. As we have seen, differences between group or condition means can be presented in a bar graph like that in Figure 12.5 "Bar Graph Showing Mean Clinician Phobia Ratings for Children in Two Treatment Conditions", where the heights of the bars represent the group or condition means. We will look more closely at creating American Psychological Association (APA)-style bar graphs shortly.

Figure 12.5 Bar Graph Showing Mean Clinician Phobia Ratings for Children in Two Treatment Conditions

It is also important to be able to describe the strength of a statistical relationship, which is often referred to as the effect size (another name for measures of relationship strength, including Cohen’s d and Pearson’s r). The most widely used measure of effect size for differences between group or condition means is called Cohen’s d (a measure of relationship strength, or “effect size,” for a difference between two groups or conditions), which is the difference between the two means divided by the standard deviation:

$$d = \frac{M_1 - M_2}{SD}$$

In this formula, it does not really matter which mean is M1 and which is M2. If there is a treatment group and a control group, the treatment group mean is usually M1 and the control group mean is M2. Otherwise, the larger mean is usually M1 and the smaller mean M2 so that Cohen’s d turns out to be positive. The standard deviation in this formula is usually a kind of average of the two group standard deviations called the pooled within-groups standard deviation. To compute the pooled within-groups standard deviation, add the sum of the squared differences (between each score and its group mean) for Group 1 to the sum of squared differences for Group 2, divide this by the sum of the two sample sizes, and then take the square root of that. Informally, however, the standard deviation of either group can be used instead.
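
The steps just described can be sketched in Python. This is only an illustration: the function names and the two small groups of scores are hypothetical, and the pooled standard deviation divides by the sum of the two sample sizes, as in the description above (other sources divide by the sum of the sample sizes minus 2).

```python
from math import sqrt
from statistics import mean

def pooled_sd(group1, group2):
    """Pooled within-groups SD: sqrt((SS1 + SS2) / (n1 + n2))."""
    m1, m2 = mean(group1), mean(group2)
    ss1 = sum((x - m1) ** 2 for x in group1)  # sum of squared differences, Group 1
    ss2 = sum((x - m2) ** 2 for x in group2)  # sum of squared differences, Group 2
    return sqrt((ss1 + ss2) / (len(group1) + len(group2)))

def cohens_d(group1, group2):
    """Difference between the two means in pooled standard deviation units."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)

# Hypothetical scores, chosen only for illustration.
treatment = [2, 4, 6]
control = [1, 2, 3]
print(round(cohens_d(treatment, control), 2))
```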

Conceptually, Cohen’s d is the difference between the two means expressed in standard deviation units. (Notice its similarity to a z score, which expresses the difference between an individual score and a mean in standard deviation units.) A Cohen’s d of 0.50 means that the two group means differ by 0.50 standard deviations (half a standard deviation). A Cohen’s d of 1.20 means that they differ by 1.20 standard deviations. But how should we interpret these values in terms of the strength of the relationship or the size of the difference between the means? Table 12.4 "Guidelines for Referring to Cohen’s d and Pearson’s r Values as “Strong,” “Medium,” or “Weak”" presents some guidelines for interpreting Cohen’s d values in psychological research (Cohen, 1992). Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. Values near 0.20 are considered small, values near 0.50 are considered medium, and values near 0.80 are considered large. Thus a Cohen’s d value of 0.50 represents a medium-sized difference between two means, and a Cohen’s d value of 1.20 represents a very large difference in the context of psychological research. In the research by Ollendick and his colleagues, there was a large difference (d = 0.82) between the exposure and education conditions.
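
As a rough check on the value reported for the exposure and education conditions, Cohen’s d can also be computed from the means and standard deviations alone. The sketch below assumes the two groups were the same size (the group sizes are not given here), in which case the pooled variance is simply the average of the two group variances; the function name is hypothetical.

```python
from math import sqrt

def cohens_d_from_summary(m1, sd1, m2, sd2):
    """Cohen's d from group means and SDs, assuming equal group sizes."""
    pooled = sqrt((sd1 ** 2 + sd2 ** 2) / 2)  # average the variances, then take the root
    return (m1 - m2) / pooled

# Education condition: M = 4.83, SD = 1.52; exposure condition: M = 3.47, SD = 1.77.
d = cohens_d_from_summary(4.83, 1.52, 3.47, 1.77)
print(round(d, 2))  # 0.82
```

Under that equal-size assumption the result agrees with the reported d = 0.82, which by the guidelines above counts as a large difference.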

Table 12.4 Guidelines for Referring to Cohen’s d and Pearson’s r Values as “Strong,” “Medium,” or “Weak”

Relationship strength    Cohen’s d    Pearson’s r
Strong                   ±0.80        ±.50
Medium                   ±0.50        ±.30
Weak                     ±0.20        ±.10

Cohen’s d is useful because it has the same meaning regardless of the variable being compared or the scale it was measured on. A Cohen’s d of 0.20 means that the two group means differ by 0.20 standard deviations whether we are talking about scores on the Rosenberg Self-Esteem scale, reaction time measured in milliseconds, number of siblings, or diastolic blood pressure measured in millimeters of mercury. Not only does this make it easier for researchers to communicate with each other about their results, it also makes it possible to combine and compare results across different studies using different measures.

Be aware that the term effect size can be misleading because it suggests a causal relationship—that the difference between the two means is an “effect” of being in one group or condition as opposed to another. Imagine, for example, a study showing that a group of exercisers is happier on average than a group of nonexercisers, with an “effect size” of d = 0.35. If the study was an experiment—with participants randomly assigned to exercise and no-exercise conditions—then one could conclude that exercising caused a small to medium-sized increase in happiness. If the study was correlational, however, then one could conclude only that the exercisers were happier than the nonexercisers by a small to medium-sized amount. In other words, simply calling the difference an “effect size” does not make the relationship a causal one.

Sex Differences Expressed as Cohen’s d

Researcher Janet Shibley Hyde has looked at the results of numerous studies on psychological sex differences and expressed the results in terms of Cohen’s d (Hyde, 2007). Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263. Following are a few of the values she has found, averaging across several studies in each case. (Note that because she always treats the mean for men as M1 and the mean for women as M2, positive values indicate that men score higher and negative values indicate that women score higher.)

Hyde points out that although men and women differ by a large amount on some variables (e.g., attitudes toward casual sex), they differ by only a small amount on the vast majority. In many cases, Cohen’s d is less than 0.10, which she terms a “trivial” difference. (The difference in talkativeness discussed in Chapter 1 "The Science of Psychology" was also trivial: d = 0.06.) Although researchers and nonresearchers alike often emphasize sex differences , Hyde has argued that it makes at least as much sense to think of men and women as fundamentally similar . She refers to this as the “gender similarities hypothesis.”

Correlations Between Quantitative Variables

As we have seen throughout the book, many interesting statistical relationships take the form of correlations between quantitative variables. For example, researchers Kurt Carlson and Jacqueline Conard conducted a study on the relationship between the alphabetical position of the first letter of people’s last names (from A = 1 to Z = 26) and how quickly those people responded to consumer appeals (Carlson & Conard, 2011). Carlson, K. A., & Conard, J. M. (2011). The last name effect: How last name influences acquisition timing. Journal of Consumer Research . doi: 10.1086/658470 In one study, they sent e-mails to a large group of MBA students, offering free basketball tickets from a limited supply. The result was that the further toward the end of the alphabet students’ last names were, the faster they tended to respond. These results are summarized in Figure 12.7 "Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods" .

Figure 12.7 Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods

Such relationships are often presented using line graphs or scatterplots, which show how the level of one variable differs across the range of the other. In the line graph in Figure 12.7 "Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods" , for example, each point represents the mean response time for participants with last names in the first, second, third, and fourth quartiles (or quarters) of the name distribution. It clearly shows how response time tends to decline as people’s last names get closer to the end of the alphabet. The scatterplot in Figure 12.8 "Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart" , which is reproduced from Chapter 5 "Psychological Measurement" , shows the relationship between 25 research methods students’ scores on the Rosenberg Self-Esteem Scale given on two occasions a week apart. Here the points represent individuals, and we can see that the higher students scored on the first occasion, the higher they tended to score on the second occasion. In general, line graphs are used when the variable on the x- axis has (or is organized into) a small number of distinct values, such as the four quartiles of the name distribution. Scatterplots are used when the variable on the x- axis has a large number of values, such as the different possible self-esteem scores.

Figure 12.8 Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart

The data presented in Figure 12.8 "Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart" provide a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other (so that the points go from the lower left to the upper right of the graph). The data presented in Figure 12.7 "Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods" provide a good example of a negative relationship, in which higher scores on one variable tend to be associated with lower scores on the other (so that the points go from the upper left to the lower right).

Both of these examples are also linear relationships, in which the points are reasonably well fit by a single straight line. Nonlinear relationships are those in which the points are better fit by a curved line: as the X variable increases, the Y variable does not increase or decrease at a constant rate. Figure 12.9 "A Hypothetical Nonlinear Relationship Between How Much Sleep People Get per Night and How Depressed They Are", for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best fits the points is a curve, a kind of “U,” because people who get about eight hours of sleep tend to be the least depressed, while those who get too little sleep and those who get too much sleep tend to be more depressed. Nonlinear relationships are not uncommon in psychology, but a detailed discussion of them is beyond the scope of this book.

Figure 12.9 A Hypothetical Nonlinear Relationship Between How Much Sleep People Get per Night and How Depressed They Are

As we saw earlier in the book, the strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s r. As Figure 12.10 shows, its possible values range from −1.00, through zero, to +1.00. A value of 0 means there is no linear relationship between the two variables. In addition to his guidelines for interpreting Cohen’s d, Cohen offered guidelines for interpreting Pearson’s r in psychological research (see Table 12.4 "Guidelines for Referring to Cohen’s d and Pearson’s r Values as “Strong,” “Medium,” or “Weak”"). Values near ±.10 are considered small, values near ±.30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. Like Cohen’s d, Pearson’s r is also referred to as a measure of “effect size” even though the relationship may not be a causal one.

Figure 12.10 Pearson’s r Ranges From −1.00 (Representing the Strongest Possible Negative Relationship), Through 0 (Representing No Relationship), to +1.00 (Representing the Strongest Possible Positive Relationship)

The computations for Pearson’s r are more complicated than those for Cohen’s d. Although you may never have to do them by hand, it is still instructive to see how. Computationally, Pearson’s r is the “mean cross-product of z scores.” To compute it, one starts by transforming all the scores to z scores. For the X variable, subtract the mean of X from each score and divide each difference by the standard deviation of X. For the Y variable, subtract the mean of Y from each score and divide each difference by the standard deviation of Y. Then, for each individual, multiply the two z scores together to form a cross-product. Finally, take the mean of the cross-products. The formula looks like this:

$$r = \frac{\sum (z_x z_y)}{N}$$

Table 12.5 "Sample Computations for Pearson’s r" illustrates these computations for a small set of data. The first column lists the scores for the X variable, which has a mean of 4.00 and a standard deviation of 1.90. The second column is the z score for each of these raw scores. The third and fourth columns list the raw scores for the Y variable, which has a mean of 40 and a standard deviation of 11.78, and the corresponding z scores. The fifth column lists the cross-products. For example, the first one is 0.00 multiplied by −0.85, which is equal to 0.00. The second is 1.58 multiplied by 1.19, which is equal to 1.88. The mean of these cross-products, shown at the bottom of that column, is Pearson’s r, which in this case is +.53. There are other formulas for computing Pearson’s r by hand that may be quicker. This approach, however, is much clearer in terms of communicating conceptually what Pearson’s r is.
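
The “mean cross-product of z scores” procedure can be sketched directly in Python. The data below are hypothetical (they are not the values from Table 12.5), and the standard deviations are assumed to divide by N, which is what makes the mean of the cross-products come out as Pearson’s r.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)        # SDs dividing by N (assumption)
    zx = [(x - mx) / sx for x in xs]       # z scores for X
    zy = [(y - my) / sy for y in ys]       # z scores for Y
    cross_products = [a * b for a, b in zip(zx, zy)]
    return mean(cross_products)

# Hypothetical scores for five individuals.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 1, 4, 3, 5]
print(round(pearson_r(x_scores, y_scores), 2))  # 0.8
```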

Table 12.5 Sample Computations for Pearson’s r

There are two common situations in which the value of Pearson’s r can be misleading. One is when the relationship under study is nonlinear. Even though Figure 12.9 "A Hypothetical Nonlinear Relationship Between How Much Sleep People Get per Night and How Depressed They Are" shows a fairly strong relationship between depression and sleep, Pearson’s r would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s r. The other is when one or both of the variables have a limited range in the sample relative to the population. This is referred to as restriction of range: the data used to assess a statistical relationship include a limited range of scores on either the X or Y variable, relative to the range of scores in the population, which makes the statistical relationship appear weaker than it actually is. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 12.11 "Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range". Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds (represented by the shaded area of Figure 12.11), then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.)
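
Restriction of range can be demonstrated with a small constructed data set. The sketch below is an illustration under stated assumptions, not the data behind Figure 12.11: the ages, the enjoyment scores, and the alternating ±12 “noise” are all made up so that the result is deterministic.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z scores (SDs divide by N)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# Made-up data: one person at each age from 18 to 70. Enjoyment drops one
# point per year of age, plus alternating +12 / -12 individual differences.
ages = list(range(18, 71))
enjoyment = [100 - age + (12 if i % 2 == 0 else -12) for i, age in enumerate(ages)]

# Correlation across the full range of ages.
r_full = pearson_r(ages, enjoyment)

# Correlation when only 18- to 24-year-olds are sampled.
young = [(a, e) for a, e in zip(ages, enjoyment) if a <= 24]
r_restricted = pearson_r([a for a, _ in young], [e for _, e in young])

print(round(r_full, 2), round(r_restricted, 2))
```

With these made-up numbers the full-range correlation is strongly negative (roughly −.79), while the correlation among the 18- to 24-year-olds is weak (roughly −.17), even though the underlying relationship between age and enjoyment is identical in both cases.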

Figure 12.11 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range

The overall correlation here is −.77, but the correlation for the 18- to 24-year-olds (in the blue box) is 0.

  • Differences between groups or conditions are typically described in terms of the means and standard deviations of the groups or conditions or in terms of Cohen’s d and are presented in bar graphs.
  • Cohen’s d is a measure of relationship strength (or effect size) for differences between two group or condition means. It is the difference of the means divided by the standard deviation. In general, values of ±0.20, ±0.50, and ±0.80 can be considered small, medium, and large, respectively.
  • Correlations between quantitative variables are typically described in terms of Pearson’s r and presented in line graphs or scatterplots.
  • Pearson’s r is a measure of relationship strength (or effect size) for relationships between quantitative variables. It is the mean cross-product of the two sets of z scores. In general, values of ±.10, ±.30, and ±.50 can be considered small, medium, and large, respectively.

Practice: The following data represent scores on the Rosenberg Self-Esteem Scale for a sample of 10 Japanese college students and 10 American college students. (Although hypothetical, these data are consistent with empirical findings [Schmitt & Allik, 2005]. Schmitt, D. P., & Allik, J. (2005). Simultaneous administration of the Rosenberg Self-Esteem Scale in 53 nations: Exploring the universal and culture-specific features of global self-esteem. Journal of Personality and Social Psychology, 89 , 623–642. ) Compute the means and standard deviations of the two groups, make a bar graph, compute Cohen’s d , and describe the strength of the relationship in words.

Practice: The hypothetical data that follow are extroversion scores and the number of Facebook friends for 15 college students. Make a scatterplot for these data, compute Pearson’s r , and describe the relationship in words.

12.3 Expressing Your Results

  • Write out simple descriptive statistics in American Psychological Association (APA) style.
  • Interpret and create simple APA-style graphs—including bar graphs, line graphs, and scatterplots.
  • Interpret and create simple APA-style tables—including tables of group or condition means and correlation matrixes.

Once you have conducted your descriptive statistical analyses, you will need to present them to others. In this section, we focus on presenting descriptive statistical results in writing, in graphs, and in tables—following American Psychological Association (APA) guidelines for written research reports. These principles can be adapted easily to other presentation formats such as posters and slide show presentations.

Presenting Descriptive Statistics in Writing

When you have a small number of results to report, it is often most efficient to write them out. There are a few important APA style guidelines here. First, statistical results are always presented in the form of numerals rather than words and are usually rounded to two decimal places (e.g., “2.00” rather than “two” or “2”). They can be presented either in the narrative description of the results or parenthetically—much like reference citations. Here are some examples:

The mean age of the participants was 22.43 years with a standard deviation of 2.34.

Among the low self-esteem participants, those in a negative mood expressed stronger intentions to have unprotected sex ( M = 4.05, SD = 2.32) than those in a positive mood ( M = 2.15, SD = 2.27).

The treatment group had a mean of 23.40 ( SD = 9.33), while the control group had a mean of 20.87 ( SD = 8.45).

The test-retest correlation was .96.

There was a moderate negative correlation between the alphabetical position of respondents’ last names and their response time ( r = −.27).

Notice that when presented in the narrative, the terms mean and standard deviation are written out, but when presented parenthetically, the symbols M and SD are used instead. Notice also that it is especially important to use parallel construction to express similar or comparable results in similar ways. The third example is much better than the following nonparallel alternative:

The treatment group had a mean of 23.40 ( SD = 9.33), while 20.87 was the mean of the control group, which had a standard deviation of 8.45.
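
If you report many statistics, a small helper can keep the formatting consistent. The following sketch is only an illustration: the function names are hypothetical, and the formatting choices (two decimal places, and no leading zero for correlations) are inferred from the examples above.

```python
def apa_mean_sd(m, sd):
    """Format a mean and standard deviation for parenthetical reporting."""
    return f"M = {m:.2f}, SD = {sd:.2f}"

def apa_r(r):
    """Format a correlation to two decimals without a leading zero."""
    rounded = round(r, 2)
    sign = "−" if rounded < 0 else ""      # minus sign, as in the examples above
    digits = f"{abs(rounded):.2f}"
    if digits.startswith("0"):
        digits = digits[1:]                # ".27" rather than "0.27"
    return f"r = {sign}{digits}"

print(apa_mean_sd(4.05, 2.32))  # M = 4.05, SD = 2.32
print(apa_r(-0.27))             # r = −.27
print(apa_r(0.96))              # r = .96
```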

Presenting Descriptive Statistics in Graphs

When you have a large number of results to report, you can often do it more clearly and efficiently with a graph. When you prepare graphs for an APA-style research report, there are some general guidelines that you should keep in mind. First, the graph should always add important information rather than repeat information that already appears in the text or in a table. (If a graph presents information more clearly or efficiently, then you should keep the graph and eliminate the text or table.) Second, graphs should be as simple as possible. For example, the Publication Manual discourages the use of color unless it is absolutely necessary (although color can still be an effective element in posters, slide show presentations, or textbooks). Third, graphs should be interpretable on their own. A reader should be able to understand the basic result based only on the graph and its caption and should not have to refer to the text for an explanation.

There are also several more technical guidelines for graphs that include the following:

  • The graph should be slightly wider than it is tall.
  • The independent variable should be plotted on the x- axis and the dependent variable on the y- axis.
  • Values should increase from left to right on the x- axis and from bottom to top on the y- axis.

Axis Labels and Legends

  • Axis labels should be clear and concise and include the units of measurement if they do not appear in the caption.
  • Axis labels should be parallel to the axis.
  • Legends should appear within the boundaries of the graph.
  • Text should be in the same simple font throughout and differ by no more than four points.
  • Captions should briefly describe the figure, explain any abbreviations, and include the units of measurement if they do not appear in the axis labels.
  • Captions in an APA manuscript should be typed on a separate page that appears at the end of the manuscript. See Chapter 11 "Presenting Your Research" for more information.

Bar Graphs

As we have seen throughout this book, bar graphs are generally used to present and compare the mean scores for two or more groups or conditions. The bar graph in Figure 12.12 "Sample APA-Style Bar Graph, With Error Bars Representing the Standard Errors, Based on Research by Ollendick and Colleagues" is an APA-style version of Figure 12.5 "Bar Graph Showing Mean Clinician Phobia Ratings for Children in Two Treatment Conditions". Notice that it conforms to all the guidelines listed. A new element in Figure 12.12 is the smaller vertical bars that extend both upward and downward from the top of each main bar. These are error bars: in bar graphs and line graphs, vertical lines that show the amount of variability around the mean in each group or condition. Although they sometimes extend one standard deviation in each direction, they are more likely to extend one standard error in each direction (as in Figure 12.12). The standard error is the standard deviation of the group divided by the square root of the sample size of the group. The standard error is used because, in general, a difference between group means that is greater than two standard errors is statistically significant. Thus, as a rough guide, one can “see” whether a difference is statistically significant based on a bar graph with error bars.
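
The standard error, and the error bar it defines, can be computed as follows. This is a minimal sketch: the scores are hypothetical, the function names are illustrative, and the standard deviation is assumed to be the sample version that divides by N − 1 (`statistics.stdev`).

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(scores):
    """Standard deviation divided by the square root of the sample size."""
    return stdev(scores) / sqrt(len(scores))

def error_bar(scores):
    """Lower and upper ends of an error bar: mean minus/plus one standard error."""
    m, se = mean(scores), standard_error(scores)
    return (m - se, m + se)

# Hypothetical ratings for one group, chosen only for illustration.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error(ratings), 3))
print(error_bar(ratings))
```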

Figure 12.12 Sample APA-Style Bar Graph, With Error Bars Representing the Standard Errors, Based on Research by Ollendick and Colleagues

Line Graphs

Line graphs are used to present correlations between quantitative variables when the independent variable has, or is organized into, a relatively small number of distinct levels. Each point in a line graph represents the mean score on the dependent variable for participants at one level of the independent variable, and the points are connected by lines. Figure 12.13 "Sample APA-Style Line Graph Based on Research by Carlson and Conard" is an APA-style version of the results of Carlson and Conard. Notice that it includes error bars representing the standard error and conforms to all the stated guidelines.

Figure 12.13 Sample APA-Style Line Graph Based on Research by Carlson and Conard

In most cases, the information in a line graph could just as easily be presented in a bar graph. In Figure 12.13 "Sample APA-Style Line Graph Based on Research by Carlson and Conard" , for example, one could replace each point with a bar that reaches up to the same level and leave the error bars right where they are. This emphasizes the fundamental similarity of the two types of statistical relationship. Both are differences in the average score on one variable across levels of another. The convention followed by most researchers, however, is to use a bar graph when the variable plotted on the x- axis is categorical and a line graph when it is quantitative.

Scatterplots

Scatterplots are used to present relationships between quantitative variables when the variable on the x-axis (typically the independent variable) has a large number of levels. Each point in a scatterplot represents an individual’s scores on both the X and Y variables rather than the mean for a group of individuals, and there are no lines connecting the points. The graph in Figure 12.14 "Sample APA-Style Scatterplot" is an APA-style version of Figure 12.8 "Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart", which illustrates a few additional points. First, when the variables on the x-axis and y-axis are conceptually similar and measured on the same scale (as here, where they are measures of the same variable on two different occasions), this can be emphasized by making the axes the same length. Second, when two or more individuals fall at exactly the same point on the graph, one way this can be indicated is by offsetting the points slightly along the x-axis. Other ways are by displaying the number of individuals in parentheses next to the point or by making the point larger or darker in proportion to the number of individuals. Finally, the straight line that best fits the points in the scatterplot, which is called the regression line, can also be included.

Figure 12.14 Sample APA-Style Scatterplot

Expressing Descriptive Statistics in Tables

Like graphs, tables can be used to present large amounts of information clearly and efficiently. The same general principles apply to tables as apply to graphs. They should add important information to the presentation of your results, be as simple as possible, and be interpretable on their own. Again, we focus here on tables for an APA-style manuscript.

The most common use of tables is to present several means and standard deviations, usually for complex research designs with multiple independent and dependent variables. Figure 12.15 "Sample APA-Style Table Presenting Means and Standard Deviations", for example, shows the results of a hypothetical study similar to the one by MacDonald and Martineau (2002) discussed in Chapter 5 "Psychological Measurement". MacDonald, T. K., & Martineau, A. M. (2002). Self-esteem, mood, and intentions to use condoms: When does low self-esteem lead to risky health behaviors? Journal of Experimental Social Psychology, 38, 299–306. (The means in Figure 12.15 are the means reported by MacDonald and Martineau, but the standard errors are not.) Recall that these researchers categorized participants as having low or high self-esteem, put them into a negative or positive mood, and measured their intentions to have unprotected sex. Although not mentioned in Chapter 5 "Psychological Measurement", they also measured participants’ attitudes toward unprotected sex. Notice that the table includes horizontal lines spanning the entire table at the top and bottom, and just beneath the column headings. Furthermore, every column has a heading, including the leftmost column, and there are additional headings that span two or more columns that help to organize the information and present it more efficiently. Finally, notice that APA-style tables are numbered consecutively starting at 1 (Table 1, Table 2, and so on) and given a brief but clear and descriptive title.

Figure 12.15 Sample APA-Style Table Presenting Means and Standard Deviations

Another common use of tables is to present correlations, usually measured by Pearson’s r, among several variables. Such a table is called a correlation matrix: a table that shows the correlations among several variables. Figure 12.16 "Sample APA-Style Table (Correlation Matrix) Based on Research by McCabe and Colleagues" is a correlation matrix based on a study by David McCabe and colleagues (McCabe, Roediger, McDaniel, Balota, & Hambrick, 2010) [McCabe, D. P., Roediger, H. L., McDaniel, M. A., Balota, D. A., & Hambrick, D. Z. (2010). The relationship between working memory capacity and executive functioning. Neuropsychology, 24, 222–243]. They were interested in the relationships between working memory and several other variables. We can see from the table that the correlation between working memory and executive function, for example, was an extremely strong .96, that the correlation between working memory and vocabulary was a medium .27, and that all the measures except vocabulary tend to decline with age. Notice here that only half the table is filled in because the other half would have identical values. For example, the Pearson’s r value in the upper right corner (working memory and age) would be the same as the one in the lower left corner (age and working memory). The correlation of a variable with itself is always 1.00, so these values are replaced by dashes to make the table easier to read.

Figure 12.16 Sample APA-Style Table (Correlation Matrix) Based on Research by McCabe and Colleagues

As with graphs, precise statistical results that appear in a table do not need to be repeated in the text. Instead, the writer can note major trends and alert the reader to details (e.g., specific correlations) that are of particular interest.

  • In an APA-style article, simple results are most efficiently presented in the text, while more complex results are most efficiently presented in graphs or tables.
  • APA style includes several rules for presenting numerical results in the text. These include using words only for numbers less than 10 that do not represent precise statistical results, rounding results to two decimal places, and using words (e.g., “mean”) in the text and symbols (e.g., “M”) in parentheses.
  • APA style includes several rules for presenting results in graphs and tables. Graphs and tables should add information rather than repeating information, be as simple as possible, and be interpretable on their own with a descriptive caption (for graphs) or a descriptive title (for tables).
  • Practice: In a classic study, men and women rated the importance of physical attractiveness in both a short-term mate and a long-term mate (Buss & Schmitt, 1993) [Buss, D. M., & Schmitt, D. P. (1993). Sexual strategies theory: A contextual evolutionary analysis of human mating. Psychological Review, 100, 204–232]. The means and standard deviations are as follows. Men / Short Term: M = 5.67, SD = 2.34; Men / Long Term: M = 4.43, SD = 2.11; Women / Short Term: M = 5.67, SD = 2.48; Women / Long Term: M = 4.22, SD = 1.98. Present these results (a) in writing, (b) in a graph, and (c) in a table.

12.4 Conducting Your Analyses

Learning objective.

  • Describe the steps involved in preparing and analyzing a typical set of raw data.

Even when you understand the statistics involved, analyzing data can be a complicated process. It is likely that for each of several participants, there are data for several different variables: demographics such as sex and age, one or more independent variables, one or more dependent variables, and perhaps a manipulation check. Furthermore, the “raw” (unanalyzed) data might take several different forms—completed paper-and-pencil questionnaires, computer files filled with numbers or text, videos, or written notes—and these may have to be organized, coded, or combined in some way. There might even be missing, incorrect, or just “suspicious” responses that must be dealt with. In this section, we consider some practical advice to make this process as organized and efficient as possible.

Prepare Your Data for Analysis

Whether your raw data are on paper or in a computer file (or both), there are a few things you should do before you begin analyzing them. First, be sure they do not include any information that might identify individual participants and be sure that you have a secure location where you can store the data and a separate secure location where you can store any consent forms. Unless the data are highly sensitive, a locked room or password-protected computer is usually good enough. It is also a good idea to make photocopies or backup files of your data and store them in yet another secure location—at least until the project is complete. Professional researchers usually keep a copy of their raw data and consent forms for several years in case questions about the procedure, the data, or participant consent arise after the project is completed.

Next, you should check your raw data (data in the form in which they were originally collected, such as completed questionnaires) to make sure that they are complete and appear to have been accurately recorded (whether it was participants, yourself, or a computer program that did the recording). At this point, you might find that there are illegible or missing responses, or obvious misunderstandings (e.g., a response of “12” on a 1-to-10 rating scale). You will have to decide whether such problems are severe enough to make a participant’s data unusable. If information about the main independent or dependent variable is missing, or if several responses are missing or suspicious, you may have to exclude that participant’s data from the analyses. If you do decide to exclude any data, do not throw them away or delete them because you or another researcher might want to see them later. Instead, set them aside and keep notes about why you decided to exclude them because you will need to report this information.
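The check described above can also be done with a few lines of code once the data are in a file. The following is only a minimal sketch in Python: the participant IDs, the responses, and the helper name `find_problems` are all hypothetical, and `None` is assumed to mark a missing answer.

```python
# Hypothetical raw responses on a 1-to-10 rating scale.
# None is assumed to mark a missing or illegible answer.
responses = {"P01": 7, "P02": 12, "P03": None, "P04": 3}

SCALE_MIN, SCALE_MAX = 1, 10


def find_problems(data, low, high):
    """Return {participant_id: reason} for missing or out-of-range responses."""
    problems = {}
    for pid, value in data.items():
        if value is None:
            problems[pid] = "missing"
        elif not low <= value <= high:
            problems[pid] = f"out of range ({value})"
    return problems


# Flag the problems instead of deleting anything, so the original
# responses stay available and the reasons can be reported later.
flagged = find_problems(responses, SCALE_MIN, SCALE_MAX)
print(flagged)
```

Note that the sketch only flags suspicious responses. Deciding whether a participant’s data are unusable remains a judgment call for the researcher.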

Now you are ready to enter your data in a spreadsheet program or, if they are already in a computer file, to format them for analysis. You can use a general spreadsheet program like Microsoft Excel or a statistical analysis program like SPSS to create your data file (a computer file that contains data formatted for statistical analysis). Data files created in one program can usually be converted to work with other programs. The most common format is for each row to represent a participant and for each column to represent a variable (with the variable name at the top of each column). A sample data file is shown in Table 12.6 "Sample Data File". The first column contains participant identification numbers. This is followed by columns containing demographic information (sex and age), independent variables (mood, four self-esteem items, and the total of the four self-esteem items), and finally dependent variables (intentions and attitudes). Categorical variables can usually be entered as category labels (e.g., “M” and “F” for male and female) or as numbers (e.g., “0” for negative mood and “1” for positive mood). Although category labels are often clearer, some analyses might require numbers. SPSS allows you to enter numbers but also attach a category label to each number.

Table 12.6 Sample Data File

If you have multiple-response measures, such as the self-esteem measure in Table 12.6 "Sample Data File", you could combine the items by hand and then enter the total score in your spreadsheet. However, it is much better to enter each response as a separate variable in the spreadsheet (as with the self-esteem measure in Table 12.6 "Sample Data File") and use the software to combine them (e.g., using the “AVERAGE” function in Excel or the “Compute” function in SPSS). Not only is this approach more accurate, but it allows you to detect and correct errors, to assess internal consistency, and to analyze individual responses if you decide to do so later.
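The same idea can be sketched in Python, as an alternative to the Excel and SPSS functions mentioned above. The rows, the column names (`se1` to `se4`), and the new variable names are hypothetical and are not taken from Table 12.6.

```python
from statistics import mean

# Hypothetical data file: one row per participant, one entry per
# self-esteem item. Each item is kept as its own variable.
rows = [
    {"id": 1, "se1": 3, "se2": 4, "se3": 2, "se4": 3},
    {"id": 2, "se1": 4, "se2": 4, "se3": 3, "se4": 4},
    {"id": 3, "se1": 1, "se2": 2, "se3": 2, "se4": 1},
]
ITEMS = ["se1", "se2", "se3", "se4"]

# Let the software combine the items, so the individual responses
# stay available for error checking and internal-consistency analyses.
for row in rows:
    scores = [row[item] for item in ITEMS]
    row["se_total"] = sum(scores)
    row["se_mean"] = mean(scores)

for row in rows:
    print(row["id"], row["se_total"], row["se_mean"])
```

Because the item-level responses are never overwritten, a mistyped item can be corrected and the totals recomputed without re-entering anything by hand.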

Preliminary Analyses

Before turning to your primary research questions, there are often several preliminary analyses to conduct. For multiple-response measures, you should assess the internal consistency of the measure. Statistical programs like SPSS will allow you to compute Cronbach’s α or Cohen’s κ. If this is beyond your comfort level, you can still compute and evaluate a split-half correlation.
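A split-half correlation can be computed with very little machinery. The sketch below assumes one common way of splitting a measure (odd-numbered items versus even-numbered items) and uses hypothetical responses from five participants to four items. The function name `pearson_r` is an illustrative helper, not part of any statistics package.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z scores."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


# Hypothetical responses: each inner list holds one participant's
# answers to items 1 to 4 of a multiple-response measure.
item_responses = [
    [3, 4, 2, 3],
    [4, 4, 3, 4],
    [1, 2, 2, 1],
    [2, 3, 3, 2],
    [4, 3, 4, 4],
]

# Assumed split: items 1 and 3 form one half, items 2 and 4 the other.
odd_half = [p[0] + p[2] for p in item_responses]
even_half = [p[1] + p[3] for p in item_responses]

split_half_r = pearson_r(odd_half, even_half)
print(round(split_half_r, 2))
```

A high positive value suggests that the two halves are measuring the same thing. With so few participants the estimate is very unstable, so the example is meant only to show the mechanics.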

Next, you should analyze each important variable separately. (This is not necessary for manipulated independent variables, of course, because you as the researcher determined what the distribution would be.) Make histograms for each one, note their shapes, and compute the common measures of central tendency and variability. Be sure you understand what these statistics mean in terms of the variables you are interested in. For example, a distribution of self-report happiness ratings on a 1-to-10-point scale might be unimodal and negatively skewed with a mean of 8.25 and a standard deviation of 1.14. But what this means is that most participants rated themselves fairly high on the happiness scale, with a small number rating themselves noticeably lower.
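As a rough illustration of this step, the following Python sketch summarizes a small set of hypothetical happiness ratings. The data are invented, and the standard deviation is computed with N in the denominator (`pstdev`), which is an assumption about the convention being used.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical self-report happiness ratings on a 1-to-10 scale.
ratings = [9, 8, 9, 10, 8, 9, 7, 8, 9, 3, 8, 9]

# Frequency counts printed as a simple text histogram.
counts = Counter(ratings)
for score in range(1, 11):
    print(f"{score:>2} | {'*' * counts[score]}")

mean_rating = mean(ratings)
median_rating = median(ratings)
sd_rating = pstdev(ratings)  # divides by N; use stdev() for N - 1

print(f"M = {mean_rating:.2f}, Mdn = {median_rating}, SD = {sd_rating:.2f}")

# A mean below the median is one informal sign of negative skew:
# a few low ratings pull the mean down.
negatively_skewed = mean_rating < median_rating
```

Here most of the ratings cluster at 8 and 9, while a single rating of 3 pulls the mean below the median, which is the pattern described in the happiness example above.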

Now is the time to identify outliers, examine them more closely, and decide what to do about them. You might discover that what at first appears to be an outlier is the result of a response being entered incorrectly in the data file, in which case you only need to correct the data file and move on. Alternatively, you might suspect that an outlier represents some other kind of error, misunderstanding, or lack of effort by a participant. For example, in a reaction time distribution in which most participants took only a few seconds to respond, a participant who took 3 minutes to respond would be an outlier. It seems likely that this participant did not understand the task (or at least was not paying very close attention). Also, including his or her reaction time would have a large impact on the mean and standard deviation for the sample. In situations like this, it can be justifiable to exclude the outlying response or participant from the analyses. If you do this, however, you should keep notes on which responses or participants you have excluded and why, and apply those same criteria consistently to every response and every participant. When you present your results, you should indicate how many responses or participants you excluded and the specific criteria that you used. And again, do not literally throw away or delete the data that you choose to exclude. Just set them aside because you or another researcher might want to see them later.

Keep in mind that outliers do not necessarily represent an error, misunderstanding, or lack of effort. They might represent truly extreme responses or participants. For example, in one large college student sample, the vast majority of participants reported having had fewer than 15 sexual partners, but there were also a few extreme scores of 60 or 70 (Brown & Sinclair, 1999) [Brown, N. R., & Sinclair, R. C. (1999). Estimating number of lifetime sexual partners: Men and women do it differently. The Journal of Sex Research, 36, 292–297]. Although these scores might represent errors, misunderstandings, or even intentional exaggerations, it is also plausible that they represent honest and even accurate estimates. One strategy here would be to use the median and other statistics that are not strongly affected by the outliers. Another would be to analyze the data both including and excluding any outliers. If the results are essentially the same, which they often are, then it makes sense to leave the outliers in. If the results differ depending on whether the outliers are included or excluded, then both analyses can be reported and the differences between them discussed.
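Both strategies can be sketched in a few lines of Python. The scores and the cutoff below are hypothetical. In a real analysis the exclusion criterion would have to be justified, applied consistently, and reported.

```python
from statistics import mean, median

# Hypothetical self-reported counts with one extreme score.
scores = [2, 3, 1, 4, 5, 2, 3, 60]

# Hypothetical exclusion criterion, applied the same way to every score.
CUTOFF = 50
kept = [s for s in scores if s <= CUTOFF]
set_aside = [s for s in scores if s > CUTOFF]  # kept on file, not deleted

mean_all, median_all = mean(scores), median(scores)
mean_kept, median_kept = mean(kept), median(kept)

print(f"With outlier:    M = {mean_all:.2f}, Mdn = {median_all}")
print(f"Without outlier: M = {mean_kept:.2f}, Mdn = {median_kept}")
```

In this example the mean changes a great deal when the extreme score is set aside, whereas the median does not change at all, which is why the median is often preferred when outliers are present.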

Answer Your Research Questions

Finally, you are ready to answer your primary research questions. If you are interested in a difference between group or condition means, you can compute the relevant group or condition means and standard deviations, make a bar graph to display the results, and compute Cohen’s d . If you are interested in a correlation between quantitative variables, you can make a line graph or scatterplot (be sure to check for nonlinearity and restriction of range) and compute Pearson’s r .

At this point, you should also explore your data for other interesting results that might provide the basis for future research (and material for the discussion section of your paper). Daryl Bem (2003) suggests that you

[e]xamine [your data] from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something—anything—interesting. (pp. 186–187) [Bem, D. J. (2003). Writing the empirical journal article. In J. M. Darley, M. P. Zanna, & H. L. Roediger III (Eds.), The compleat academic: A career guide (2nd ed., pp. 185–219). Washington, DC: American Psychological Association.]

It is important to be cautious, however, because complex sets of data are likely to include “patterns” that occurred entirely by chance. Thus results discovered while “fishing” should be replicated in at least one new study before being presented as new phenomena in their own right.

Understand Your Descriptive Statistics

In the next chapter, we will consider inferential statistics—a set of techniques for deciding whether the results for your sample are likely to apply to the population. Although inferential statistics are important for reasons that will be explained shortly, beginning researchers sometimes forget that their descriptive statistics really tell “what happened” in their study. For example, imagine that a treatment group of 50 participants has a mean score of 34.32 ( SD = 10.45), a control group of 50 participants has a mean score of 21.45 ( SD = 9.22), and Cohen’s d is an extremely strong 1.31. Although conducting and reporting inferential statistics (like a t test) would certainly be a required part of any formal report on this study, it should be clear from the descriptive statistics alone that the treatment worked. Or imagine that a scatterplot shows an indistinct “cloud” of points and Pearson’s r is a trivial −.02. Again, although conducting and reporting inferential statistics would be a required part of any formal report on this study, it should be clear from the descriptive statistics alone that the variables are essentially unrelated. The point is that you should always be sure that you thoroughly understand your results at a descriptive level first, and then move on to the inferential statistics.
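The Cohen’s d quoted in this example can be checked directly from the summary statistics. The sketch below assumes that, because the two groups are the same size, the pooled standard deviation is the square root of the average of the two variances. The variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the example above (50 participants per group).
m_treatment, sd_treatment = 34.32, 10.45
m_control, sd_control = 21.45, 9.22

# With equal group sizes, the pooled standard deviation is the square
# root of the mean of the two variances (an assumption of this sketch).
sd_pooled = sqrt((sd_treatment**2 + sd_control**2) / 2)

cohens_d = (m_treatment - m_control) / sd_pooled
print(round(cohens_d, 2))
```

The result rounds to the value of 1.31 given in the text: the two group means differ by about 1.3 standard deviations.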

  • Raw data must be prepared for analysis by examining them for possible errors, organizing them, and entering them into a spreadsheet program.
  • Preliminary analyses on any data set include checking the reliability of measures, evaluating the effectiveness of any manipulations, examining the distributions of individual variables, and identifying outliers.
  • Outliers that appear to be the result of an error, a misunderstanding, or a lack of effort can be excluded from the analyses. The criteria for excluding responses or participants should be applied in the same way to all the data and described when you present your results. Excluded data should be set aside rather than destroyed or deleted in case they are needed later.
  • Descriptive statistics tell the story of what happened in a study. Although inferential statistics are also important, it is essential to understand the descriptive statistics first.
  • Discussion: What are at least two reasonable ways to deal with each of the following outliers based on the discussion in this chapter? (a) A participant estimating ordinary people’s heights estimates one woman’s height to be “84 inches” tall. (b) In a study of memory for ordinary objects, one participant scores 0 out of 15. (c) In response to a question about how many “close friends” she has, one participant writes “32.”

Descriptive Statistics in Psychology

Chapter 12: Descriptive Statistics

At this point, we need to consider the basics of data analysis in psychological research in more detail. In this chapter, we focus on descriptive statistics—a set of techniques for summarizing and displaying the data from your sample. We look first at some of the most common techniques for describing single variables, followed by some of the most common techniques for describing statistical relationships between variables. We then look at how to present descriptive statistics in writing and also in the form of tables and graphs that would be appropriate for an American Psychological Association (APA)-style research report. We end with some practical advice for organizing and carrying out your analyses.

Research Methods in Psychology Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

12.2 Describing Statistical Relationships

Learning objectives.

  • Describe differences between groups in terms of their means and standard deviations, and in terms of Cohen’s d .
  • Describe correlations between quantitative variables in terms of Pearson’s r .

As we have seen throughout this book, most interesting research questions in psychology are about statistical relationships between variables. Recall that there is a statistical relationship between two variables when the average score on one differs systematically across the levels of the other. In this section, we revisit the two basic forms of statistical relationship introduced earlier in the book—differences between groups or conditions and relationships between quantitative variables—and we consider how to describe them in more detail.

Differences Between Groups or Conditions

Differences between groups or conditions are usually described in terms of the mean and standard deviation of each group or condition. For example, Thomas Ollendick and his colleagues conducted a study in which they evaluated two one-session treatments for simple phobias in children (Ollendick et al., 2009). They randomly assigned children with an intense fear (e.g., of dogs) to one of three conditions. In the exposure condition, the children actually confronted the object of their fear under the guidance of a trained therapist. In the education condition, they learned about phobias and some strategies for coping with them. In the waitlist control condition, they were waiting to receive a treatment after the study was over. The severity of each child’s phobia was then rated on a 1-to-8 scale by a clinician who did not know which treatment the child had received. (This was one of several dependent variables.) The mean fear rating in the education condition was 4.83 with a standard deviation of 1.52, while the mean fear rating in the exposure condition was 3.47 with a standard deviation of 1.77. The mean fear rating in the control condition was 5.56 with a standard deviation of 1.21. In other words, both treatments worked, but the exposure treatment worked better than the education treatment. As we have seen, differences between group or condition means can be presented in a bar graph like that in Figure 12.5 “Bar Graph Showing Mean Clinician Phobia Ratings for Children in Two Treatment Conditions”, where the heights of the bars represent the group or condition means. We will look more closely at creating American Psychological Association (APA)-style bar graphs shortly.

Figure 12.5 Bar Graph Showing Mean Clinician Phobia Ratings for Children in Two Treatment Conditions

It is also important to be able to describe the strength of a statistical relationship, which is often referred to as the effect size . The most widely used measure of effect size for differences between group or condition means is called Cohen’s d , which is the difference between the two means divided by the standard deviation:

\[ d = \frac{M_1 - M_2}{SD} \]

In this formula, it does not really matter which mean is M1 and which is M2. If there is a treatment group and a control group, the treatment group mean is usually M1 and the control group mean is M2. Otherwise, the larger mean is usually M1 and the smaller mean M2 so that Cohen’s d turns out to be positive. The standard deviation in this formula is usually a kind of average of the two group standard deviations called the pooled within-groups standard deviation. To compute the pooled within-groups standard deviation, add the sum of the squared differences (each score minus its own group mean, squared) for Group 1 to the sum of squared differences for Group 2, divide this by the sum of the two sample sizes, and then take the square root of that. Informally, however, the standard deviation of either group can be used instead.
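The recipe above translates directly into code. The following Python sketch uses hypothetical scores for two small groups, and the function names are illustrative. It follows the text in dividing by the sum of the two sample sizes. Some other sources divide by that sum minus 2, which gives a slightly larger standard deviation.

```python
from math import sqrt
from statistics import mean


def sum_of_squares(scores):
    """Sum of squared differences between each score and its group mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def cohens_d(group1, group2):
    """Cohen's d using the pooled within-groups standard deviation."""
    # Add the two sums of squares, divide by n1 + n2, take the square root.
    pooled_sd = sqrt(
        (sum_of_squares(group1) + sum_of_squares(group2))
        / (len(group1) + len(group2))
    )
    return (mean(group1) - mean(group2)) / pooled_sd


# Hypothetical scores; the treatment group plays the role of M1.
treatment = [6, 7, 5, 8, 9]
control = [4, 5, 3, 6, 2]

d = cohens_d(treatment, control)
print(round(d, 2))
```

In this made-up example the means differ by 3 points and the pooled standard deviation is about 1.41, so the groups differ by a little more than two standard deviations.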

Conceptually, Cohen’s d is the difference between the two means expressed in standard deviation units. (Notice its similarity to a z score, which expresses the difference between an individual score and a mean in standard deviation units.) A Cohen’s d of 0.50 means that the two group means differ by 0.50 standard deviations (half a standard deviation). A Cohen’s d of 1.20 means that they differ by 1.20 standard deviations. But how should we interpret these values in terms of the strength of the relationship or the size of the difference between the means? Table 12.4 “Guidelines for Referring to Cohen’s d and Pearson’s r Values as ‘Strong,’ ‘Medium,’ or ‘Weak’” presents some guidelines for interpreting Cohen’s d values in psychological research (Cohen, 1992). Values near 0.20 are considered small, values near 0.50 are considered medium, and values near 0.80 are considered large. Thus a Cohen’s d value of 0.50 represents a medium-sized difference between two means, and a Cohen’s d value of 1.20 represents a very large difference in the context of psychological research. In the research by Ollendick and his colleagues, there was a large difference (d = 0.82) between the exposure and education conditions.

Table 12.4 Guidelines for Referring to Cohen’s d and Pearson’s r Values as “Strong,” “Medium,” or “Weak”

Cohen’s d is useful because it has the same meaning regardless of the variable being compared or the scale it was measured on. A Cohen’s d of 0.20 means that the two group means differ by 0.20 standard deviations whether we are talking about scores on the Rosenberg Self-Esteem scale, reaction time measured in milliseconds, number of siblings, or diastolic blood pressure measured in millimeters of mercury. Not only does this make it easier for researchers to communicate with each other about their results, it also makes it possible to combine and compare results across different studies using different measures.

Be aware that the term effect size can be misleading because it suggests a causal relationship—that the difference between the two means is an “effect” of being in one group or condition as opposed to another. Imagine, for example, a study showing that a group of exercisers is happier on average than a group of nonexercisers, with an “effect size” of d = 0.35. If the study was an experiment—with participants randomly assigned to exercise and no-exercise conditions—then one could conclude that exercising caused a small to medium-sized increase in happiness. If the study was correlational, however, then one could conclude only that the exercisers were happier than the nonexercisers by a small to medium-sized amount. In other words, simply calling the difference an “effect size” does not make the relationship a causal one.

Sex Differences Expressed as Cohen’s d

Researcher Janet Shibley Hyde has looked at the results of numerous studies on psychological sex differences and expressed the results in terms of Cohen’s d (Hyde, 2007). Following are a few of the values she has found, averaging across several studies in each case. (Note that because she always treats the mean for men as M 1 and the mean for women as M 2 , positive values indicate that men score higher and negative values indicate that women score higher.)

Hyde points out that although men and women differ by a large amount on some variables (e.g., attitudes toward casual sex), they differ by only a small amount on the vast majority. In many cases, Cohen’s d is less than 0.10, which she terms a “trivial” difference. (The difference in talkativeness discussed in Chapter 1 “The Science of Psychology” was also trivial: d = 0.06.) Although researchers and nonresearchers alike often emphasize sex differences , Hyde has argued that it makes at least as much sense to think of men and women as fundamentally similar . She refers to this as the “gender similarities hypothesis.”

Figure 12.6

Research on psychological sex differences has shown that there is essentially no difference in the leadership effectiveness of women and men.

innovate360 – HP manager Regine Pohl – CC BY 2.0.

Correlations Between Quantitative Variables

As we have seen throughout the book, many interesting statistical relationships take the form of correlations between quantitative variables. For example, researchers Kurt Carlson and Jacqueline Conard conducted a study on the relationship between the alphabetical position of the first letter of people’s last names (from A = 1 to Z = 26) and how quickly those people responded to consumer appeals (Carlson & Conard, 2011). In one study, they sent e-mails to a large group of MBA students, offering free basketball tickets from a limited supply. The result was that the further toward the end of the alphabet students’ last names were, the faster they tended to respond. These results are summarized in Figure 12.7 “Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods” .

Figure 12.7 Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods

Such relationships are often presented using line graphs or scatterplots, which show how the level of one variable differs across the range of the other. In the line graph in Figure 12.7 “Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods” , for example, each point represents the mean response time for participants with last names in the first, second, third, and fourth quartiles (or quarters) of the name distribution. It clearly shows how response time tends to decline as people’s last names get closer to the end of the alphabet. The scatterplot in Figure 12.8 “Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart” , which is reproduced from Chapter 5 “Psychological Measurement” , shows the relationship between 25 research methods students’ scores on the Rosenberg Self-Esteem Scale given on two occasions a week apart. Here the points represent individuals, and we can see that the higher students scored on the first occasion, the higher they tended to score on the second occasion. In general, line graphs are used when the variable on the x- axis has (or is organized into) a small number of distinct values, such as the four quartiles of the name distribution. Scatterplots are used when the variable on the x- axis has a large number of values, such as the different possible self-esteem scores.

Figure 12.8 Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart

The data presented in Figure 12.8 “Statistical Relationship Between Several College Students’ Scores on the Rosenberg Self-Esteem Scale Given on Two Occasions a Week Apart” provide a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other (so that the points go from the lower left to the upper right of the graph). The data presented in Figure 12.7 “Line Graph Showing the Relationship Between the Alphabetical Position of People’s Last Names and How Quickly Those People Respond to Offers of Consumer Goods” provide a good example of a negative relationship, in which higher scores on one variable tend to be associated with lower scores on the other (so that the points go from the upper left to the lower right).

Both of these examples are also linear relationships, in which the points are reasonably well fit by a single straight line. Nonlinear relationships are those in which the points are better fit by a curved line. Figure 12.9 “A Hypothetical Nonlinear Relationship Between How Much Sleep People Get per Night and How Depressed They Are”, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best fits the points is a curve, a kind of “U,” because people who get about eight hours of sleep tend to be the least depressed, while those who get too little sleep and those who get too much sleep tend to be more depressed. Nonlinear relationships are not uncommon in psychology, but a detailed discussion of them is beyond the scope of this book.

Figure 12.9 A Hypothetical Nonlinear Relationship Between How Much Sleep People Get per Night and How Depressed They Are

As we saw earlier in the book, the strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s r. As Figure 12.10 “Pearson’s r Ranges From −1.00 (Representing the Strongest Possible Negative Relationship), Through 0 (Representing No Relationship), to +1.00 (Representing the Strongest Possible Positive Relationship)” shows, its possible values range from −1.00, through zero, to +1.00. A value of 0 means there is no relationship between the two variables. In addition to his guidelines for interpreting Cohen’s d, Cohen offered guidelines for interpreting Pearson’s r in psychological research (see Table 12.4 “Guidelines for Referring to Cohen’s d and Pearson’s r Values as ‘Strong,’ ‘Medium,’ or ‘Weak’”). Values near ±.10 are considered small, values near ±.30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. Like Cohen’s d, Pearson’s r is also referred to as a measure of “effect size” even though the relationship may not be a causal one.

Figure 12.10 Pearson’s r Ranges From −1.00 (Representing the Strongest Possible Negative Relationship), Through 0 (Representing No Relationship), to +1.00 (Representing the Strongest Possible Positive Relationship)

The computations for Pearson’s r are more complicated than those for Cohen’s d. Although you may never have to do them by hand, it is still instructive to see how. Computationally, Pearson’s r is the “mean cross-product of z scores.” To compute it, one starts by transforming all the scores to z scores. For the X variable, subtract the mean of X from each score and divide each difference by the standard deviation of X. For the Y variable, subtract the mean of Y from each score and divide each difference by the standard deviation of Y. Then, for each individual, multiply the two z scores together to form a cross-product. Finally, take the mean of the cross-products. The formula looks like this:

$$r = \frac{\sum (z_x z_y)}{N}$$

Table 12.5 “Sample Computations for Pearson’s r” illustrates these computations for a small set of data. The first column lists the scores for the X variable, which has a mean of 4.00 and a standard deviation of 1.90. The second column is the z score for each of these raw scores. The third and fourth columns list the raw scores for the Y variable, which has a mean of 40 and a standard deviation of 11.78, and the corresponding z scores. The fifth column lists the cross-products. For example, the first one is 0.00 multiplied by −0.85, which is equal to 0.00. The second is 1.58 multiplied by 1.19, which is equal to 1.88. The mean of these cross-products, shown at the bottom of that column, is Pearson’s r, which in this case is +.53. There are other formulas for computing Pearson’s r by hand that may be quicker. This approach, however, is much clearer in terms of communicating conceptually what Pearson’s r is.
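The mean cross-product computation can also be sketched in a few lines of Python. This is a minimal illustration, not a replacement for statistical software. The raw scores below are an assumption: they are not copied from the table, but were chosen so that they reproduce the summary values quoted in the text (a mean of 4.00 and standard deviation of 1.90 for X, a mean of 40 and standard deviation of 11.78 for Y). The function names `z_scores` and `pearson_r` are likewise illustrative. The standard deviation here divides by N (the population formula), which is what matches the values quoted above.

```python
from statistics import fmean, pstdev


def z_scores(scores):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    mean = fmean(scores)
    sd = pstdev(scores)
    return [(score - mean) / sd for score in scores]


def pearson_r(xs, ys):
    """Pearson's r computed as the mean cross-product of z scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of scores")
    cross_products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return fmean(cross_products)


# Assumed raw scores, chosen to match the summary statistics in the text.
x = [4, 7, 2, 5, 2]
y = [30, 54, 23, 43, 50]

r = pearson_r(x, y)
print(f"r = {r:+.2f}")  # prints r = +0.53
```

Keeping the z-score step as its own function mirrors the hand computation: each column of the table corresponds to one intermediate list, so any single value can be inspected and compared with the worked example.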

Table 12.5 Sample Computations for Pearson’s r

There are two common situations in which the value of Pearson’s r can be misleading. One is when the relationship under study is nonlinear. Even though Figure 12.9 “A Hypothetical Nonlinear Relationship Between How Much Sleep People Get per Night and How Depressed They Are” shows a fairly strong relationship between depression and sleep, Pearson’s r would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s r.

The other is when one or both of the variables have a limited range in the sample relative to the population. This is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 12.11 “Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range”. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds (represented by the shaded area of Figure 12.11), then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.)

Figure 12.11 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range

The overall correlation here is −.77, but the correlation for the 18- to 24-year-olds (in the blue box) is 0.
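Both pitfalls can be reproduced with small made-up data sets. The sketch below is only an illustration under stated assumptions: the numbers are invented for the demonstration and are not the data behind Figures 12.9 and 12.11, so the correlations it produces will not match the −.77 reported above. The helper `pearson_r` is an illustrative name that uses the mean cross-product of z scores described earlier.

```python
from statistics import fmean, pstdev


def pearson_r(xs, ys):
    """Pearson's r as the mean cross-product of z scores (population SD)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return fmean(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    )


# Pitfall 1: a strong but curved relationship (invented data).
# Depression is lowest at 8 hours of sleep and rises symmetrically on both sides.
sleep_hours = [4, 5, 6, 7, 8, 9, 10, 11, 12]
depression = [(hours - 8) ** 2 + 2 for hours in sleep_hours]
r_curve = pearson_r(sleep_hours, depression)

# Pitfall 2: restriction of range (invented data).
# Enjoyment falls with age overall but is flat among the 18- to 24-year-olds.
ages = [18, 20, 22, 24, 30, 40, 50, 60, 70]
enjoyment = [7, 8, 8, 7, 6, 5, 4, 3, 2]
r_full = pearson_r(ages, enjoyment)

young = [(age, score) for age, score in zip(ages, enjoyment) if 18 <= age <= 24]
r_young = pearson_r([age for age, _ in young], [score for _, score in young])

print(f"curved relationship:  |r| = {abs(r_curve):.2f}")
print(f"full age range:        r  = {r_full:.2f}")
print(f"ages 18 to 24 only:   |r| = {abs(r_young):.2f}")
```

In the first data set the points lie exactly on a curve, yet r is essentially zero because the rising and falling halves cancel. In the second, the same participants who show no relationship on their own contribute to a strong negative correlation once older participants are included.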

Key Takeaways

  • Differences between groups or conditions are typically described in terms of the means and standard deviations of the groups or conditions or in terms of Cohen’s d and are presented in bar graphs.
  • Cohen’s d is a measure of relationship strength (or effect size) for differences between two group or condition means. It is the difference of the means divided by the standard deviation. In general, values of ±0.20, ±0.50, and ±0.80 can be considered small, medium, and large, respectively.
  • Correlations between quantitative variables are typically described in terms of Pearson’s r and presented in line graphs or scatterplots.
  • Pearson’s r is a measure of relationship strength (or effect size) for relationships between quantitative variables. It is the mean cross-product of the two sets of z scores. In general, values of ±.10, ±.30, and ±.50 can be considered small, medium, and large, respectively.

Practice: The following data represent scores on the Rosenberg Self-Esteem Scale for a sample of 10 Japanese college students and 10 American college students. (Although hypothetical, these data are consistent with empirical findings [Schmitt & Allik, 2005].) Compute the means and standard deviations of the two groups, make a bar graph, compute Cohen’s d , and describe the strength of the relationship in words.

Practice: The hypothetical data that follow are extroversion scores and the number of Facebook friends for 15 college students. Make a scatterplot for these data, compute Pearson’s r , and describe the relationship in words.

Carlson, K. A., & Conard, J. M. (2011). The last name effect: How last name influences acquisition timing. Journal of Consumer Research. doi: 10.1086/658470.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

Ollendick, T. H., Öst, L.-G., Reuterskiöld, L., Costa, N., Cederlund, R., Sirbu, C., … Jarrett, M. A. (2009). One-session treatments of specific phobias in youth: A randomized clinical trial in the United States and Sweden. Journal of Consulting and Clinical Psychology, 77, 504–516.

Schmitt, D. P., & Allik, J. (2005). Simultaneous administration of the Rosenberg Self-Esteem Scale in 53 nations: Exploring the universal and culture-specific features of global self-esteem. Journal of Personality and Social Psychology, 89, 623–642.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

The field of statistics is concerned with collecting, analyzing, interpreting, and presenting data. Learn statistics and probability for free, in simple and easy steps starting from basic to advanced concepts.

Scientific Method

The scientific method is a step-by-step process used by researchers and scientists to determine if there is a relationship between two or more variables. Psychologists use this method to conduct psychological research, gather data, process information, and describe behaviors.

Variables apply to experimental investigations. The independent variable is the variable the experimenter manipulates or changes. The dependent variable is the variable being tested and measured in an experiment, and is 'dependent' on the independent variable.

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis. By convention, a p-value of 0.05 or less is considered statistically significant.

Frequently Asked Questions

What does p-value of 0.05 mean?

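By convention, a p-value of 0.05 or less is considered statistically significant. The p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true, so a small p-value means the data would be unlikely under the null hypothesis. It is not the probability that the results occurred by chance, nor the probability that the null hypothesis is true. When p ≤ 0.05, researchers typically reject the null hypothesis in favor of the alternative hypothesis.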

However, it is important to note that the p-value is not the only factor that should be considered when interpreting the results of a hypothesis test. Other factors, such as effect size, should also be considered.

What does z-score tell you?

A z-score describes the position of a raw score in terms of its distance from the mean when measured in standard deviation units. It is also known as a standard score because it allows the comparison of scores on different variables by standardizing the distribution. The z-score is positive if the value lies above the mean and negative if it lies below the mean.

What is an independent vs dependent variable?

The independent variable is the variable the experimenter manipulates or changes and is assumed to have a direct effect on the dependent variable. For example, allocating participants to either drug or placebo conditions (independent variable) to measure any changes in the intensity of their anxiety (dependent variable).

What is the difference between qualitative and quantitative?

Quantitative data are numerical information about quantities, whereas qualitative data are descriptive and concern phenomena that can be observed but not measured numerically, such as language.
