How to Calculate Five Number Summary in R (With Examples)


five number summary is a way to summarize a dataset using the following five values:

  • The minimum
  • The first quartile
  • The median
  • The third quartile
  • The maximum

The five number summary is useful because it provides a concise summary of the distribution of the data in the following ways:

  • It tells us where the middle value is located, using the median.
  • It tells us how spread out the data is, using the first and third quartiles.
  • It tells us the range of the data, using the minimum and the maximum.

The easiest way to calculate a five number summary of a dataset in R is to use the fivenum() function from base R:

fivenum(data)

The following example shows how to use this syntax in practice.

Example 1: Five Number Summary of Vector

The following code shows how to calculate the five number summary of a numeric vector in R:

#define numeric vector
data <- c(4, 6, 6, 7, 8, 9, 12, 13, 14, 15, 15, 18, 22)

#calculate five number summary of data
fivenum(data)

[1]  4  7 12 15 22

From the output we can see:

  • The minimum: 4
  • The first quartile: 7
  • The median: 12
  • The third quartile: 15
  • The maximum: 22

We can quickly visualize the five number summary by creating a boxplot:

boxplot(data)

[1]  4  7 12 15 22

Here’s how to interpret the boxplot:

  • The line at the bottom of the plot represents the minimum value (4).
  • The line at the bottom of the box represents the first quartile (7).
  • The line in the middle of the box represents the median (12).
  • The line at the top of the box represents the third quartile (15).
  • The line at the top of the plot represents the maximum value (22).

Example 2: Five Number Summary of Column in Data Frame

The following code shows how to calculate the five number summary of a specific column in a data frame:

#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(99, 90, 86, 88, 95, 87, 85, 89),
                 assists=c(33, 28, 31, 39, 34, 30, 29, 25),
                 rebounds=c(30, 28, 24, 24, 28, 30, 31, 35))

#calculate five number summary of points column
fivenum(df$points)

[1] 85.0 86.5 88.5 92.5 99.0

Example 3: Five Number Summary of Multiple Columns

The following code shows how to use the sapply() function to calculate the five number summary of several columns in a data frame at once:

#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(99, 90, 86, 88, 95, 87, 85, 89),
                 assists=c(33, 28, 31, 39, 34, 30, 29, 25),
                 rebounds=c(30, 28, 24, 24, 28, 30, 31, 35))

#calculate five number summary of points, assists, and rebounds column
sapply(df[c('points', 'assists', 'rebounds')], fivenum)

     points assists rebounds
[1,]   85.0    25.0     24.0
[2,]   86.5    28.5     26.0
[3,]   88.5    30.5     29.0
[4,]   92.5    33.5     30.5
[5,]   99.0    39.0     35.0

Related: A Guide to apply(), lapply(), sapply(), and tapply() in R

Additional Resources

How to Create Summary Tables in R
How to Find the Range in R
How to Remove Outliers in R

Leave a Reply

Your email address will not be published.