Histograms are commonly used to analyze the “shape” of a data distribution.
Shapes of Distributions
Suppose we have this histogram that shows the number of pets that each family in your neighborhood owns:
We would describe the distribution below as left-skewed because it has a “tail” on the left side that skews the distribution to the left:
We would describe the distribution below as right-skewed because it has a “tail” on the right side that skews the distribution to the right:
We would describe the distribution below as bimodal because it has two (hence the bi) “peaks”, one on the left and one on the right:
We would describe the distribution below as uniform because the bars are roughly the same height all the way across:
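The direction of skew can also be checked numerically. The sketch below is one possible approach, not part of the original example: it computes the skewness (the average cubed z-score), which is positive when the tail is on the right and negative when the tail is on the left. The function name `skewness` and the pet counts are made up for illustration.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the average cubed z-score of the values."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical pet counts, invented for this illustration.
right_skewed = [0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 5, 8]
# Mirror image of the data above, so the tail moves to the left.
left_skewed = [8 - v for v in right_skewed]

print(skewness(right_skewed))  # positive value: tail on the right
print(skewness(left_skewed))   # negative value: tail on the left
```

A value near zero suggests a roughly symmetric distribution, although it says nothing about whether the distribution is bimodal or uniform, so the histogram is still worth looking at.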
Histograms also help us identify the center (the median) and the spread of a distribution.
In the distribution below, the center is located at three:
And between the two distributions below, A is more “spread out” than B:
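To make “center” and “spread” concrete, here is a minimal sketch that compares two made-up datasets. It assumes the interquartile range (IQR) as the measure of spread, which is only one of several reasonable choices. The lists `a` and `b` and the helper `iqr` are hypothetical and are not the data behind the histograms above.

```python
from statistics import median, quantiles


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical pet counts for two neighborhoods.
a = [0, 1, 1, 2, 3, 3, 3, 4, 5, 5, 6]
b = [2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4]

print(median(a), median(b))  # both centers are at 3
print(iqr(a), iqr(b))        # a has the larger IQR, so it is more spread out
```

Both datasets share the same center, yet `a` covers a wider range of values, which is what a wider histogram shows visually.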
Identifying Unusual Features
Histograms can also be used to identify unusual features like gaps and outliers.
A gap is simply an area in a distribution with no observations. In the distribution below, there is a gap in the middle:
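For whole-number data such as pet counts, a gap can be found by looking for values inside the observed range that never occur. The sketch below assumes integer data and a bin width of one. The function name `find_gaps` and the sample data are hypothetical.

```python
from collections import Counter


def find_gaps(values):
    """Return the integers between min and max that have no observations."""
    counts = Counter(values)
    return [v for v in range(min(values), max(values) + 1) if counts[v] == 0]


# Hypothetical pet counts with no families owning 3, 4, or 5 pets.
pets = [0, 0, 1, 1, 1, 2, 2, 6, 6, 7, 7, 7, 8]

print(find_gaps(pets))  # [3, 4, 5]
```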
An outlier is a value that is significantly different from the other values in a dataset. In the distribution below, there is a family with 15 pets, which could be considered an outlier:
Note: A common rule of thumb is to consider a value an outlier if it is more than 1.5 interquartile ranges above the third quartile (Q3) or more than 1.5 interquartile ranges below the first quartile (Q1).
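The interquartile range rule can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `iqr_outliers` and the pet counts are made up, and the quartiles come from `statistics.quantiles` with its default method, so other quartile conventions may give slightly different fences.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return values more than 1.5 IQRs below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower fence
    upper = q3 + 1.5 * iqr  # upper fence
    return [v for v in values if v < lower or v > upper]


# Hypothetical pet counts, including one family with 15 pets.
pets = [0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 15]

print(iqr_outliers(pets))  # [15]
```

With this data, Q1 is 1 and Q3 is 3, so the fences sit at -2 and 6, and only the family with 15 pets falls outside them.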