How to Specify Histogram Breaks in R (With Examples)


By default, the hist() function in R uses Sturges’ Rule to determine how many bins to use in a histogram.

Sturges’ Rule uses the following formula to determine the optimal number of bins to use in a histogram:

Optimal Bins = ⌈log2(n) + 1⌉

where:

  • n: The total number of observations in the dataset.
  • ⌈ ⌉: Symbols that mean “ceiling” – i.e. round the answer up to the nearest integer.

For example, if there are 31 observations in a dataset, Sturges’ Rule gives the following optimal number of bins to use in a histogram:

Optimal Bins = ⌈log2(31) + 1⌉ = ⌈4.954 + 1⌉ = ⌈5.954⌉ = 6.

According to Sturges’ Rule, we should use 6 bins in the histogram to visualize this dataset.
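The same calculation can be reproduced in R. The snippet below is a minimal sketch: the variable names n, sturges_bins, and x are chosen here for illustration, and x is a placeholder vector that only supplies 31 observations.

```r
# Number of observations (the value used in the example above)
n <- 31

# Sturges' Rule: ceiling(log2(n) + 1)
sturges_bins <- ceiling(log2(n) + 1)
print(sturges_bins)

# nclass.Sturges() from the grDevices package applies the same rule to a vector;
# x is placeholder data whose only purpose is to have 31 observations
x <- seq_len(n)
print(grDevices::nclass.Sturges(x))
```

Both calls should print 6, matching the hand calculation.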

If you use the hist() function in R, Sturges’ Rule will be used to automatically choose the number of bins to display in the histogram.

hist(data)

Even if you pass a single number to the breaks argument, R treats it only as a “suggestion”: hist() rounds the break points to “pretty” values, so the final number of bins may differ from the number you requested.

hist(data, breaks=7)

However, you can use the following code to force R to use a specific number of bins in a histogram:

#create histogram with 7 bins
hist(data, breaks = seq(min(data), max(data), length.out = 8))

Note: The value of length.out must be one more than the desired number of bins, because k bins require k + 1 break points. Here, 7 bins require length.out = 8.
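One way to avoid the off-by-one mistake is to wrap the seq() call in a small function. The sketch below is only an illustration: equal_width_breaks is a hypothetical helper (it is not part of base R), and x holds made-up values. It assumes x is numeric, has no missing values, and contains at least two distinct values.

```r
# Hypothetical helper (not part of base R): returns the k + 1 equally spaced
# break points that split the range of x into exactly k bins
equal_width_breaks <- function(x, k) {
  stopifnot(is.numeric(x), length(k) == 1, k >= 1, min(x) < max(x))
  seq(min(x), max(x), length.out = k + 1)
}

# Made-up sample values, for illustration only
x <- c(1, 2, 2, 3, 5, 8, 13, 21)

# Ask for 4 bins, which requires 5 break points
breaks <- equal_width_breaks(x, k = 4)
print(breaks)
print(length(breaks) - 1)  # number of bins

# plot = FALSE returns the binning without drawing anything
h <- hist(x, breaks = breaks, plot = FALSE)
print(h$counts)
```

Passing the result to hist(x, breaks = breaks) without plot = FALSE would draw the histogram with the same 4 bins.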

The following example shows how to use this code in practice.

Example: Specify Histogram Breaks in R

Suppose we have the following dataset in R with 16 values:

#create vector of 16 values
data <- c(2, 3, 3, 3, 4, 4, 5, 6, 8, 10, 12, 14, 15, 18, 20, 21)

If we use the hist() function, R will create the following histogram with 5 bins:

#create histogram
hist(data)

Note: R used Sturges’ Rule, ⌈log2(16) + 1⌉ = ⌈4 + 1⌉ = 5, to determine that 5 bins was the optimal number to use to visualize a dataset with 16 observations.

If we attempt to use the breaks argument to specify 7 bins to use in the histogram, R will only take this as a “suggestion” and instead choose to use 10 bins:

#attempt to create histogram with 7 bins
hist(data, breaks=7)
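To see which break points R actually chose, the histogram can be computed without being drawn. The snippet below is a minimal sketch that repeats the 16 values from this example; the variable name h is chosen here for illustration. The last line reflects the behavior described in the hist() documentation, where a single number is turned into “pretty” break points.

```r
# Same 16 values as in the example
data <- c(2, 3, 3, 3, 4, 4, 5, 6, 8, 10, 12, 14, 15, 18, 20, 21)

# plot = FALSE computes the histogram without drawing it
h <- hist(data, breaks = 7, plot = FALSE)

print(h$breaks)              # the break points R actually used
print(length(h$breaks) - 1)  # the number of bins that results

# pretty() favours round break points over hitting the requested count exactly
print(pretty(range(data), n = 7))
```

The number of bins printed here should match the number of bars in the plot, which explains why the request for 7 bins is not honored exactly.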

However, we can use the following code to force R to use 7 bins in the histogram:

#create histogram with 7 bins
hist(data, breaks = seq(min(data), max(data), length.out = 8))

Notice that the result is a histogram with 7 equal-width bins that span the range from the minimum to the maximum value in the dataset.

Additional Resources

The following tutorials explain how to perform other common operations in R:

How to Create a Relative Frequency Histogram in R
How to Plot Multiple Histograms in R
