The median represents the middle value of a dataset.
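It is calculated by arranging all of the observations in a dataset from smallest to largest and then identifying the middle value. If the dataset has an even number of observations, the median is the average of the two middle values.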
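As a minimal sketch of this procedure in Python: the helper name `median_of` and the small sample lists are hypothetical, chosen only for illustration, and the even-length case averages the two middle values, which is the usual convention.

```python
import statistics


def median_of(values):
    """Return the median by sorting the data and picking the middle value."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value exists.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 1, 5]      # hypothetical values
even_data = [7, 1, 5, 3]  # hypothetical values

print(median_of(odd_data))   # 5
print(median_of(even_data))  # 4.0

# The standard library's statistics.median gives the same results.
print(statistics.median(odd_data), statistics.median(even_data))
```

In practice, `statistics.median` from the standard library is likely the simpler choice; the hand-written version is only meant to make the sorting step visible.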
There are two main advantages of using the median to describe the center of a dataset:
Advantage #1: The median is not affected by outliers. Because the median depends only on the middle of the ordered data, extremely small or large values on either end of a dataset have little or no effect on it.
Advantage #2: The median is a good measure of center for skewed datasets. When a dataset is skewed to the left or right, the median still does a good job of identifying the center value of the dataset, unlike the mean, which is pulled toward the long tail of a skewed distribution.
However, there are two potential disadvantages of using the median to summarize a dataset:
Disadvantage #1: The median does not use all of the observations in a dataset in its calculation. In statistics, a measure that uses every observation is generally preferred because it draws on all of the available information in the data. The median, however, only uses the ordering of the observations, so it ignores how extreme the smallest and largest values in a dataset actually are.
Disadvantage #2: The median cannot be used to find the sum of all observations in the dataset. If we know the mean and the total sample size of a dataset, we can find the sum of all values in the dataset. However, we cannot do the same with the median.
The following examples illustrate these advantages and disadvantages in practice.
Example 1: The Advantages of Using the Median
Suppose we have a distribution of salaries that is right skewed and we decide to calculate both the mean and median salary:
The mean salary is about $47,000 per year, while the median salary is only about $32,000 per year, which is much more representative of what the typical individual earns.
In this example, the mean is affected by the higher values on the right tail of the distribution while the median is not.
Or suppose we have another distribution that contains information about the square footage of houses on a certain street and we decide to calculate both the mean and median of the dataset:
The mean is influenced by a couple of extremely large houses, which causes it to take on a much larger value.
However, the median is unaffected by these outliers and thus provides a much better measure of the “typical” square footage of a house on this street.
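The effect of a few large values can be sketched in Python. The square footages below are hypothetical numbers invented for illustration; they are not the data behind the example above.

```python
import statistics

# Hypothetical square footages for a street of similar houses.
typical_houses = [1400, 1500, 1550, 1600, 1650, 1700, 1800]

# The same street with two extremely large houses added (also hypothetical).
with_large_houses = typical_houses + [6000, 7500]

mean_typical = statistics.mean(typical_houses)
median_typical = statistics.median(typical_houses)
mean_large = statistics.mean(with_large_houses)
median_large = statistics.median(with_large_houses)

print(f"Without large houses: mean = {mean_typical:.0f}, median = {median_typical}")
print(f"With large houses:    mean = {mean_large:.0f}, median = {median_large}")
```

With these made-up numbers, the mean jumps from 1600 to roughly 2744 square feet, while the median only moves from 1600 to 1650. The small shift in the median comes from two more houses sitting above the middle, not from how large those houses are: making them 60,000 square feet each would leave the median at 1650.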
Example 2: The Disadvantages of Using the Median
Recall the first potential disadvantage of the median:
Disadvantage #1: The median does not use all of the observations in a dataset in its calculation.
For example, suppose we have the following dataset that shows the distribution of exam scores for students in a class:
Scores: 68, 70, 71, 75, 78, 82, 83, 83, 85, 90, 91, 91, 92
The median exam score is 83.
Now suppose we have the same dataset but the lowest three exam scores are much lower:
Scores: 22, 35, 38, 75, 78, 82, 83, 83, 85, 90, 91, 91, 92
The median exam score in this distribution is still 83.
This is why we say the median does not use all of the available information in a dataset: it depends only on the value at the middle position of the ordered data, so changing the other values has no effect on it as long as they stay on the same side of the middle.
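A short Python check of the two sets of exam scores, using the standard library's `statistics` module, shows the median staying put while the mean drops. The variable names are chosen here for illustration.

```python
import statistics

original_scores = [68, 70, 71, 75, 78, 82, 83, 83, 85, 90, 91, 91, 92]
lowered_scores = [22, 35, 38, 75, 78, 82, 83, 83, 85, 90, 91, 91, 92]

# The middle (7th of 13) value is the same in both datasets.
print(statistics.median(original_scores))  # 83
print(statistics.median(lowered_scores))   # 83

# The mean uses every value, so it reflects the three lower scores.
print(round(statistics.mean(original_scores), 2))  # 81.46
print(round(statistics.mean(lowered_scores), 2))   # 72.69
```

The mean falls by almost nine points, which is the information about the three lowest scores that the median leaves out.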
Now recall the second potential disadvantage of the median:
Disadvantage #2: The median cannot be used to find the sum of all observations in the dataset.
Suppose we have the following dataset that contains information about the total sales made by 11 different employees during a particular quarter:
Sales: 12, 12, 15, 19, 22, 24, 28, 30, 32, 35, 38
We know the median value is 24 and we know that there are 11 total employees. However, we can’t use this information to find the total sum of sales for all the employees.
By contrast, if we knew that the mean value was 24 and there were 11 total employees, we could simply multiply 24 by 11 to find that the total sum of sales is 24 * 11 = 264.
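To make this concrete with the sales figures listed above, the following Python sketch compares the true total with what the mean and the median each give when multiplied by the number of employees. The variable names are chosen here for illustration.

```python
import math
import statistics

sales = [12, 12, 15, 19, 22, 24, 28, 30, 32, 35, 38]
n = len(sales)

true_total = sum(sales)                  # 267
mean_sales = statistics.mean(sales)      # about 24.27
median_sales = statistics.median(sales)  # 24

# Mean times sample size recovers the total (up to floating-point rounding).
total_from_mean = mean_sales * n

# Median times sample size is just some number; it is not the total.
total_from_median = median_sales * n     # 264

print(true_total, round(total_from_mean, 6), total_from_median)
```

For this particular dataset the actual mean is about 24.27 rather than 24, so the true total is 267. Multiplying the median of 24 by 11 gives 264, which happens to be close here but is not the total and in general can be far from it.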
Note: Depending on the distribution of your data and the problem you’re trying to solve, the mean or median could turn out to be the preferred metric to use.