Suppose we have the following dataset of 20 students along with their final exam scores in three subjects:
In this dataset, the individuals being observed (often called “subjects” in statistics) are the 20 students, and the variables are the math, science, and reading final exam scores.
In statistics, we often want to summarize variables so we can make sense of them. Here, we may want to summarize the exam scores for each of the three subjects.
The most common way to summarize a variable is to find the mean.
The mean is the average value. We find it by adding up all the individual values and dividing by the total number of values.
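Written as a formula, for a variable with n values x_1, x_2, ..., x_n, and using the common notation \bar{x} for the mean (a standard convention, not notation introduced elsewhere here):

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i
```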
For example, to find the mean math score, we add up each individual math score and divide by the total number of scores (in this case, 20):
(92+88+84+88+98+88+82+74+73+71+78+90+94+90+66+58+96+92+77+85) / 20 = 83.2
We can do the same calculation to find the mean score on the science exam:
(88+82+90+98+74+86+90+78+75+76+84+85+99+89+75+77+84+92+87+90) / 20 = 84.95
And the mean score on the reading exam:
(75+88+90+74+86+98+88+96+92+90+90+67+99+84+85+85+87+90+68+67) / 20 = 84.95
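The same three calculations can be checked with a short Python sketch. It assumes the scores are listed in the same student order as the sums above (the original table is not reproduced here), and it compares a hand-written sum-divided-by-count helper, `manual_mean` (a hypothetical name used only for this illustration), with `statistics.mean` from Python's standard library.

```python
from statistics import mean

# Final exam scores for the 20 students, in the same order as the sums above.
math_scores = [92, 88, 84, 88, 98, 88, 82, 74, 73, 71,
               78, 90, 94, 90, 66, 58, 96, 92, 77, 85]
science_scores = [88, 82, 90, 98, 74, 86, 90, 78, 75, 76,
                  84, 85, 99, 89, 75, 77, 84, 92, 87, 90]
reading_scores = [75, 88, 90, 74, 86, 98, 88, 96, 92, 90,
                  90, 67, 99, 84, 85, 85, 87, 90, 68, 67]


def manual_mean(values):
    """Add up all the values and divide by how many values there are."""
    return sum(values) / len(values)


for name, scores in [("math", math_scores),
                     ("science", science_scores),
                     ("reading", reading_scores)]:
    # Both approaches should agree: 83.2 for math, 84.95 for science and reading.
    print(name, manual_mean(scores), mean(scores))
```

Note that science and reading happen to share the same total (1699), which is why both means come out to 84.95.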
Another way to summarize a variable is to find the median.
The median is the “middle” value. To find it, we arrange the values from smallest to largest. If there is an odd number of values, the median is the single middle value. If there is an even number of values, the median is the average of the two middle values.
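The rule above can be written as a small Python function. This is a minimal sketch: the name `median_of` is hypothetical, and the toy inputs are made up solely to exercise the odd and even cases. Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 3, 9]))     # odd count: sorted [3, 7, 9], middle value is 7
print(median_of([7, 3, 9, 5]))  # even count: sorted [3, 5, 7, 9], average of 5 and 7
```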
For example, since there are 20 math scores, the median math score is the average of the two middle values, 85 and 88, which is 86.5:
58, 66, 71, 73, 74, 77, 78, 82, 84, 85, 88, 88, 88, 90, 90, 92, 92, 94, 96, 98
The median science score is 85.5, the average of 85 and 86:
74, 75, 75, 76, 77, 78, 82, 84, 84, 85, 86, 87, 88, 89, 90, 90, 90, 92, 98, 99
And the median reading score is 87.5, the average of 87 and 88:
67, 67, 68, 74, 75, 84, 85, 85, 86, 87, 88, 88, 90, 90, 90, 90, 92, 96, 98, 99
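A quick way to confirm these three medians is `statistics.median` from Python's standard library, which sorts the data itself and averages the two middle values when the count is even, as it is here with 20 scores. The lists below assume the same student order as the mean calculations above, although the order does not affect the median.

```python
from statistics import median

# The same 20 scores per subject (order does not matter for the median).
math_scores = [92, 88, 84, 88, 98, 88, 82, 74, 73, 71,
               78, 90, 94, 90, 66, 58, 96, 92, 77, 85]
science_scores = [88, 82, 90, 98, 74, 86, 90, 78, 75, 76,
                  84, 85, 99, 89, 75, 77, 84, 92, 87, 90]
reading_scores = [75, 88, 90, 74, 86, 98, 88, 96, 92, 90,
                  90, 67, 99, 84, 85, 85, 87, 90, 68, 67]

for name, scores in [("math", math_scores),
                     ("science", science_scores),
                     ("reading", reading_scores)]:
    # With 20 values, the median is the average of the 10th and 11th sorted values.
    print(name, median(scores))
```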
A third, less commonly used way to summarize a variable is to find the mode.
The mode is the value that occurs most often. For example, the mode on the math exam is 88 since that score shows up three times, more than any other score.
The mode on the science exam is 90, which appears three times.
The mode on the reading exam is 90, which appears four times.
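Counting how often each score occurs is all it takes to find the mode. This sketch uses `collections.Counter` and returns every value tied for the highest count, since a dataset can have more than one mode; the helper name `modes_of` is hypothetical. (`statistics.multimode`, available since Python 3.8, does the same job.)

```python
from collections import Counter

math_scores = [92, 88, 84, 88, 98, 88, 82, 74, 73, 71,
               78, 90, 94, 90, 66, 58, 96, 92, 77, 85]
science_scores = [88, 82, 90, 98, 74, 86, 90, 78, 75, 76,
                  84, 85, 99, 89, 75, 77, 84, 92, 87, 90]
reading_scores = [75, 88, 90, 74, 86, 98, 88, 96, 92, 90,
                  90, 67, 99, 84, 85, 85, 87, 90, 68, 67]


def modes_of(values):
    """Return every value that occurs as often as the most frequent one."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


for name, scores in [("math", math_scores),
                     ("science", science_scores),
                     ("reading", reading_scores)]:
    modes = modes_of(scores)
    # Report the mode(s) together with how many times the mode appears.
    print(name, modes, "appears", Counter(scores)[modes[0]], "times")
```

Returning a list rather than a single value makes ties visible: if two scores shared the highest count, both would be reported.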
Measuring “The Center”
The mean, median, and mode are commonly referred to as “measures of center” since they give us a sense of where the “center” value of a variable is located.
Among these three descriptive statistics, the mean and the median are the most commonly used.
For datasets with outliers, the median is usually a better measure of center than the mean because it is more resistant to extreme values. For example, consider the following dataset of incomes for ten people:
The mean income for these ten people is $298,900. The median income is $57,500.
Person J is an extreme outlier whose income pulls the mean much higher. In this case, the mean is a poor indicator of what the “center” or “typical” income is for this group of people, which is why we would prefer to use the median.
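The income table is not reproduced here, so this Python sketch illustrates the same effect with the math scores instead. The outlier is hypothetical: it pretends the top score of 98 was mistakenly recorded as 980, a value invented purely for illustration.

```python
from statistics import mean, median

math_scores = [92, 88, 84, 88, 98, 88, 82, 74, 73, 71,
               78, 90, 94, 90, 66, 58, 96, 92, 77, 85]

# Hypothetical data-entry error, for illustration only: 98 recorded as 980.
with_outlier = [980 if score == 98 else score for score in math_scores]

print("original:     mean =", mean(math_scores), "median =", median(math_scores))
print("with outlier: mean =", mean(with_outlier), "median =", median(with_outlier))
```

The mean jumps from 83.2 to 127.3, while the median stays at 86.5: the outlier only replaces the largest value, so the two middle values (85 and 88) do not move.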
For large datasets without outliers, the median and mean will typically be similar in value. For example, the following table shows the number of wins that the Cincinnati Reds have recorded each year from 1882 to 2018:
Note: This data comes from Wander Cincinnati.
It turns out that the mean number of wins per year is 76.8 and the median number of wins is 76. Because this dataset is quite large (137 total observations) and has no outliers, it’s no surprise that the values for the mean and the median are similar to each other.