The interquartile range (IQR) is the difference between the third quartile and the first quartile of a dataset. It measures the spread of the middle 50% of the values.
This tutorial explains how to calculate the interquartile range of a dataset in R.
Calculating the Interquartile Range in R
R has a built-in function IQR() that calculates the interquartile range of a dataset.
#create dataset
data <- c(1, 3, 4, 5, 5, 6, 6, 7, 8, 12, 13, 14)

#find interquartile range of dataset
IQR(data)

# 4.25
Under the hood, the IQR() function uses the quantile() function in R to find the first and third quartiles and then returns the difference between them.
For example, the following code produces the same value for the interquartile range:
#find first quartile
Q1 <- quantile(data, 0.25)

#find third quartile
Q3 <- quantile(data, 0.75)

#find the difference between third quartile and first quartile
Q3 - Q1

# 75%
# 4.25
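The 75% label in that output appears because quantile() returns a named vector and the subtraction keeps the name of the first operand. The following is a minimal sketch, assuming the same example dataset, of two ways to get an unlabeled result: the names = FALSE argument of quantile(), or unname() applied afterwards. The variable names q and iqr_manual are introduced only for illustration.

```r
# same example dataset as above
data <- c(1, 3, 4, 5, 5, 6, 6, 7, 8, 12, 13, 14)

# request both quartiles at once, without the percentage labels
q <- quantile(data, probs = c(0.25, 0.75), names = FALSE)

# difference between third quartile and first quartile
iqr_manual <- diff(q)
iqr_manual
# 4.25

# alternatively, strip the label after subtracting
unname(quantile(data, 0.75) - quantile(data, 0.25))
# 4.25
```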
Unfortunately, there is no universally accepted formula for calculating quartiles, and therefore no single accepted formula for the interquartile range of a dataset.
In fact, the IQR() function supports nine different quantile algorithms. They are described on the R help page for quantile(), which you can open by typing ?quantile in the R console.
You can specify the type of quantile algorithm to use with the type argument. By default, R uses type = 7.
IQR(data, type = 7) # 4.25
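To make the default more concrete, here is a minimal sketch of the type 7 rule as described on the ?quantile help page: the quantile for probability p sits at position 1 + (n - 1) * p in the sorted data, and values between two positions are found by linear interpolation. The helper name quantile_type7 is hypothetical and exists only for this illustration; it assumes probabilities between 0 and 1 and ignores edge cases such as missing values.

```r
# hypothetical helper: type 7 quantile by linear interpolation
quantile_type7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- 1 + (n - 1) * p   # fractional position in the sorted data
  lo <- floor(h)         # index of the value just below the position
  hi <- ceiling(h)       # index of the value just above the position
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

data <- c(1, 3, 4, 5, 5, 6, 6, 7, 8, 12, 13, 14)

Q1 <- quantile_type7(data, 0.25)  # position 3.75: 4 + 0.75 * (5 - 4) = 4.75
Q3 <- quantile_type7(data, 0.75)  # position 9.25: 8 + 0.25 * (12 - 8) = 9

Q3 - Q1
# 4.25
```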
Type 3 is the method used by the statistical software SAS.
IQR(data, type = 3) # 4
Type 6 is the method used by the statistical software packages Minitab and SPSS.
IQR(data, type = 6) # 6.75
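To compare all nine algorithms on the same data, one option is to loop over the possible type values. This is a minimal sketch using the example dataset from earlier; the variable name iqr_by_type is just for illustration.

```r
data <- c(1, 3, 4, 5, 5, 6, 6, 7, 8, 12, 13, 14)

# compute the interquartile range once for each quantile type
iqr_by_type <- sapply(1:9, function(k) IQR(data, type = k))
names(iqr_by_type) <- paste0("type", 1:9)

iqr_by_type
```

The entries for types 3, 6, and 7 should match the values 4, 6.75, and 4.25 shown above.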
Which Method is Best for Calculating the Interquartile Range?
Notice that depending on the type of quantile algorithm we specify, we can get different results for the interquartile range. This raises the question: which method is best for calculating the interquartile range?
Unfortunately, there is no clear answer. Even major statistical software packages don’t agree on which method to use: SAS, Minitab and SPSS, and R use three different methods. Some statistics textbooks use still other methods.
In general, the differences between these methods become negligible for large sample sizes. For example, suppose we have a dataset with 500 values. The following code illustrates the interquartile range value produced by three different methods:
#make this example reproducible
set.seed(0)

#create dataset with 500 values uniformly distributed between 1 and 100
data <- runif(500, 1, 100)

IQR(data, type = 3) #SAS method
# 47.39249

IQR(data, type = 6) #Minitab and SPSS method
# 47.46096

IQR(data, type = 7) #R default method
# 47.31338
For even larger sample sizes, the interquartile range values produced by these different methods are even closer together:
#make this example reproducible
set.seed(0)

#create dataset with 5000 values uniformly distributed between 1 and 100
data <- runif(5000, 1, 100)

IQR(data, type = 3) #SAS method
# 51.13757

IQR(data, type = 6) #Minitab and SPSS method
# 51.15794

IQR(data, type = 7) #R default method
# 51.11651
This is good news because analyzing quantiles makes the most sense for large sample sizes. Since quartiles are intended to separate a dataset into four “groups” of equal size, it doesn’t make much sense to do that with a small dataset (e.g. how do you do that with a dataset of only 10 elements?).
Thus, if your sample size is large enough, the method you use to calculate the interquartile range in R doesn’t matter very much.
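One way to check this on your own data is to measure how far apart the three methods are at several sample sizes. The sketch below is a minimal illustration, assuming uniformly distributed data as in the examples above. The helper name iqr_spread and the chosen sample sizes are arbitrary, and the exact numbers depend on the random seed.

```r
# hypothetical helper: gap between the largest and smallest IQR
# across the SAS (3), Minitab/SPSS (6), and R default (7) methods
iqr_spread <- function(x, types = c(3, 6, 7)) {
  iqrs <- sapply(types, function(k) IQR(x, type = k))
  max(iqrs) - min(iqrs)
}

# make this example reproducible
set.seed(0)

# draw one uniform sample per sample size and record the gap
sizes <- c(10, 100, 1000, 10000)
spreads <- sapply(sizes, function(n) iqr_spread(runif(n, 1, 100)))

data.frame(n = sizes, spread = spreads)
```

If the gap for your sample size is small compared to the interquartile range itself, the choice of type is unlikely to change your conclusions.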