How to Interpret Variability in Box Plots


A box plot is a type of plot that displays the five-number summary of a dataset, which includes:

  • The minimum value
  • The first quartile (the 25th percentile)
  • The median value
  • The third quartile (the 75th percentile)
  • The maximum value
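In R, these five values can be computed with the built-in fivenum() function. The following is a minimal sketch; the vector x holds made-up values used purely for illustration and is not part of the example later in this tutorial.

```r
# illustrative values only (not from the tutorial's dataset)
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c('min', 'Q1', 'median', 'Q3', 'max')

print(five)
```

For this vector the five values are 2, 4, 7, 10, and 15. Note that fivenum() reports hinges, which are the box ends that boxplot() draws in R. For some sample sizes they differ slightly from the quartiles returned by quantile().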

Here is how a typical box plot looks:

The most common way to measure variation in a box plot is by analyzing the interquartile range.

The interquartile range represents the spread of the middle 50% of the data.

In a box plot, it is represented by the length of the box (its width in a horizontal plot, its height in a vertical one), which extends from the first quartile (Q1) to the third quartile (Q3).


Often we draw multiple box plots in a single figure to compare the distributions of several datasets at once.

The following example shows how to compare the variability between several box plots in practice.

Note: We prefer to use the interquartile range to measure variability in box plots instead of the range (max value – min value) because the interquartile range is resistant to outliers.
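To see this resistance in practice, the sketch below compares the range and the interquartile range of a small vector before and after its largest value is replaced by an extreme one. The values are hypothetical and chosen only for illustration.

```r
# hypothetical values for illustration
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# same data, but with the largest value replaced by an extreme outlier
x_outlier <- c(2, 4, 4, 5, 7, 9, 10, 12, 150)

# range = max value - min value
range_before <- diff(range(x))
range_after  <- diff(range(x_outlier))

# interquartile range = Q3 - Q1
iqr_before <- IQR(x)
iqr_after  <- IQR(x_outlier)

cat('Range:', range_before, '->', range_after, '\n')
cat('IQR:  ', iqr_before, '->', iqr_after, '\n')
```

Here the range jumps from 13 to 148 because of a single value, while the interquartile range stays at 6.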

Example: How to Analyze Variability in Box Plots

Suppose we collect data on the points scored by basketball players on three different teams.

We then create the following three side-by-side box plots to visualize the distribution of points scored by players on each team:

From the box plots we can see that Team B has the greatest variation in points scored because they have the greatest distance between the two ends of their box.

The interquartile range for Team B is roughly 21 – 12 = 9.

Conversely, we can see that Team C has the least variation in points scored because their box plot has the least distance between the two ends of the box.

The interquartile range for Team C is roughly 27 – 23 = 4.

This example demonstrates the benefit of using box plots to analyze variability in datasets.

By simply looking at several box plots side by side, we are able to visually compare the variability in the underlying data.

Note: Here is the exact code that we used to generate these side-by-side box plots in R:

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=8),
                 points=c(5, 5, 6, 6, 8, 9, 13, 15,
                          11, 11, 12, 14, 15, 19, 22, 24,
                          19, 23, 23, 23, 24, 26, 29, 33))

#create vertical side-by-side boxplots
boxplot(df$points ~ df$team,
        col='steelblue',
        main='Points by Team',
        xlab='Team',
        ylab='Points') 
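
The interquartile ranges quoted in the example were read off the plot, so they are approximate. As a rough check, the sketch below computes the length of each box as boxplot() draws it (upper hinge minus lower hinge, from fivenum()) and, for comparison, the value returned by IQR(), which uses a different quartile definition by default. The data frame is rebuilt so the snippet runs on its own. The helper name box_length is our own and is not part of R.

```r
# same data as above, rebuilt so this snippet is self-contained
df <- data.frame(team=rep(c('A', 'B', 'C'), each=8),
                 points=c(5, 5, 6, 6, 8, 9, 13, 15,
                          11, 11, 12, 14, 15, 19, 22, 24,
                          19, 23, 23, 23, 24, 26, 29, 33))

# hypothetical helper: length of the box as drawn by boxplot()
# (upper hinge minus lower hinge from fivenum())
box_length <- function(x) {
  hinges <- fivenum(x)
  hinges[4] - hinges[2]
}

# box length per team, based on hinges
box_by_team <- tapply(df$points, df$team, box_length)

# interquartile range per team, based on the quantile() defaults
iqr_by_team <- tapply(df$points, df$team, IQR)

print(box_by_team)
print(iqr_by_team)
```

The box lengths come out to 5.5, 9, and 4.5 for Teams A, B, and C, while IQR() gives 4.25, 8, and 3.75. Under both definitions the ordering is the same: Team B has the most variation and Team C the least.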

Additional Resources

The following tutorials provide additional information about box plots:

How to Compare Box Plots
How to Identify Skewness in Box Plots
How to Create Side-by-Side Boxplots in R
