When to Use Mean vs. Median (With Examples)


The mean of a dataset represents the average value of the dataset. It is calculated as:

Mean = Σxi / n

where:

  • Σ: A symbol that means “sum”
  • xi: The ith observation in a dataset
  • n: The total number of observations in the dataset

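The median represents the middle value of a dataset. It is calculated by arranging all of the observations in a dataset from smallest to largest and then identifying the middle value. If the dataset has an even number of observations, the median is the average of the two middle values.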

For example, suppose we have the following dataset with 11 observations:

Dataset: 3, 4, 4, 6, 7, 8, 12, 13, 15, 16, 17

The mean of the dataset is calculated as:

Mean = (3+4+4+6+7+8+12+13+15+16+17) / 11 = 105 / 11 ≈ 9.55

The median of the dataset is the value directly in the middle, which turns out to be 8:

3, 4, 4, 6, 7, 8, 12, 13, 15, 16, 17
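The same calculation can be checked in a few lines of code. The following is a minimal sketch using Python's built-in statistics module; the variable names are illustrative, and the 10-value dataset at the end is an assumption added only to show how the median behaves when there is an even number of observations.

```python
from statistics import mean, median

# The 11-observation dataset from the example above
data = [3, 4, 4, 6, 7, 8, 12, 13, 15, 16, 17]

print(round(mean(data), 2))  # 9.55
print(median(data))          # 8

# Illustrative variation (not from the article): drop the largest value
# so that 10 observations remain. With an even count, median() averages
# the two middle values, here 7 and 8.
even_data = data[:-1]
print(median(even_data))     # 7.5
```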

Both the mean and the median estimate where the center of a dataset is located. However, depending on the nature of the data, either the mean or the median may be more useful for describing the center of the dataset.

When to Use the Mean

It’s best to use the mean to describe the center of a dataset when the distribution is mostly symmetrical and there are no outliers.

For example, suppose we have the following distribution that shows the salaries of residents in a certain city:

Since this distribution is fairly symmetrical (if you split it down the middle, each half would look roughly equal) and there are no outliers, we can use the mean to describe the center of this dataset.

The mean turns out to be $63,000, which is located approximately in the center of the distribution.
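To see why the mean works well for symmetrical data, here is a small sketch in Python. The salary values are hypothetical, chosen to be symmetrical around $63,000; they are not the data behind the distribution described above.

```python
from statistics import mean, median

# Hypothetical salaries, symmetrical around 63,000 with no outliers
# (illustrative values only, not the article's data)
symmetric_salaries = [51_000, 55_000, 59_000, 61_000, 63_000,
                      65_000, 67_000, 71_000, 75_000]

print(mean(symmetric_salaries))    # 63000
print(median(symmetric_salaries))  # 63000
```

With symmetrical data and no outliers, the mean and the median land on the same value, so the mean describes the center without distortion.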

When to Use the Median

It is best to use the median when the distribution is skewed or when outliers are present.

Skewed Data:

When a distribution is skewed, the median does a better job of describing the center of the distribution than the mean.

For example, consider the following distribution of salaries for residents in a certain city:

The median does a better job of capturing the “typical” salary of a resident than the mean. This is because the large values on the tail end of the distribution tend to pull the mean away from the center and towards the long tail.

In this example, the mean suggests that the typical individual earns about $47,000 per year, while the median suggests that the typical individual earns only about $32,000 per year, which is much more representative of the typical individual.
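The pull of a long tail on the mean can be reproduced with a small sketch in Python. The ten salaries below are hypothetical, made up to be right-skewed; they are not the data behind the distribution described above, so the resulting figures only roughly resemble the ones in the text.

```python
from statistics import mean, median

# Hypothetical right-skewed salaries: most values are modest,
# and two large values form the long tail (illustrative only)
salaries = [22_000, 25_000, 28_000, 30_000, 32_000,
            34_000, 38_000, 45_000, 90_000, 126_000]

mean_salary = mean(salaries)
median_salary = median(salaries)

print(f"Mean:   ${mean_salary:,.0f}")    # Mean:   $47,000
print(f"Median: ${median_salary:,.0f}")  # Median: $33,000

# Count how many residents earn less than the mean
below_mean = sum(s < mean_salary for s in salaries)
print(f"{below_mean} of {len(salaries)} earn less than the mean")  # 8 of 10
```

In this made-up sample, eight of the ten residents earn less than the mean, which is why the median is the better description of a typical salary here.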

Outliers:

The median also does a better job of capturing the central location of a distribution when there are outliers present in the data. For example, consider the following chart that shows the square footage of houses on a certain street:


The mean is heavily influenced by a couple of extremely large houses, while the median is not. Thus, the median does a better job of capturing the “typical” square footage of a house on this street compared to the mean.
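The effect of outliers can be sketched the same way in Python. The square footage values below are hypothetical and are not taken from the chart; they simply compare a street of similar houses with the same street after two very large houses are added.

```python
from statistics import mean, median

# Hypothetical square footage of similar houses on one street
typical_houses = [1400, 1500, 1550, 1600, 1650, 1700, 1800]

# The same street with two very large houses added (the outliers)
with_outliers = typical_houses + [6000, 7500]

print(mean(typical_houses), median(typical_houses))  # 1600 1600
print(f"{mean(with_outliers):,.0f}")                 # 2,744
print(median(with_outliers))                         # 1650
```

In this sketch, the two outliers raise the mean by more than 1,100 square feet, while the median moves by only 50.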

Summary

In summary:

  • Both the mean and the median can be used to describe where the “center” of a dataset is located.
  • It’s best to use the mean when the distribution of the data values is symmetrical and there are no clear outliers.
  • It’s best to use the median when the distribution of data values is skewed or when there are clear outliers.


5 Replies to “When to Use Mean vs. Median (With Examples)”

  1. Thanks for this post.
    I have a question: why would using the mean be more beneficial if the distribution of the data values is symmetrical? Wouldn’t the median in that case also be the same as the mean?
    So you might as well just always use the median?

  2. In a fictitious society where ALL adult females are exactly 4 ft tall, and ALL adult males are exactly 6 ft tall, how would you avoid the inaccurate and false conclusion that ALL people within the society have an average (mean) height of 5 ft, when absolutely no one exists in that society who is of that height? Interpreting by median value would not solve this paradox either. Is any result simply considered a deception of data?

    1. Never thought of THAT, but average and mean are not exact values, are they? Let’s say in this fictitious group of 10, 5 females and 5 males, we come up with 4.4, meaning the average height in this group is 4.4, meaning again the shortest person’s height is close to this
      let me try another group of 100
      50 females=200
      50 males=300
      500/100=5
      here the mean/ average increases but still is within the 6 ft limit
      let me try another
      500 females=2000
      500 males=3000
      5000/1000=5
      From these calculations you can see the mean or average falls between 4 and 6. I don’t think it will go below or above either value since, as I said, it’s just an average, not an exact value. When you say 5, what you are saying is that the average height of adults in this group of 1000 is 5, not the exact height
      no time for more calculations
      if you can come up with a value higher than 6 or lower than 4 let me know

  3. Thanks for the useful explanation Zack

    Would it be fair to say that in a normal distribution, the mean and the median would be close together i.e. similar?

    The reason I ask is that if I don’t know whether the data is skewed or not, then would I be better off using the median?

    Regards
    Aztrix
