Two plots that are commonly used to visualize the distribution of values in a dataset are dot plots and histograms.
A dot plot displays individual data values along the x-axis and uses stacked dots to represent the frequency of each individual value.
A histogram displays data ranges along the x-axis and uses rectangular bars to represent the frequencies of values that fall into each range.
The following example shows how to create a dot plot and histogram for the same dataset.
Example: Creating a Dot Plot & Histogram for Same Dataset
Suppose we have the following dataset with 18 values:
Data: 1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10
Here is what a dot plot would look like for this dataset:
The x-axis shows the individual data values and the y-axis shows the frequency of each value.
For example, we can see the value “2” occurs three times in the dataset because there are three dots above it. Similarly, we can see that the value “3” occurs just once because there is only one dot above it.
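The counts that the dot plot displays can also be checked numerically. The following is a minimal sketch that assumes the same dataset as above and uses base R's table() function to tally each distinct value; the variable name freq is our own choice for illustration.

```r
# dataset from the example above
data <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10)

# count how many times each distinct value occurs
freq <- table(data)
print(freq)

# look up the frequency of a single value by its label
freq[["2"]]  # 3, matching the three dots above "2"
freq[["3"]]  # 1, matching the single dot above "3"
```

Each entry in freq corresponds to the height of one stack of dots in the dot plot.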
And here is what a histogram would look like for this dataset:
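The x-axis shows ranges of values (0-2, 2-4, 4-6, 6-8, 8-10) and the y-axis shows the frequency of values in the dataset that fall into each range, with each frequency drawn as a rectangular bar.
For example, we can see that seven values fall in the range from 0 to 2, two values fall in the range from 2 to 4, and so on. Note that R's hist() function includes the upper endpoint of each range by default, so the three 2s are counted in the 0 to 2 range rather than the 2 to 4 range.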
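To see the bar heights as numbers, we can ask hist() to return the counts without drawing anything. This is a small sketch that assumes the same dataset and passes the ranges shown on the x-axis explicitly through the breaks argument; the variable name h is our own choice.

```r
# dataset from the example above
data <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10)

# compute the histogram without plotting it,
# using the ranges 0-2, 2-4, 4-6, 6-8, 8-10
h <- hist(data, breaks = seq(0, 10, by = 2), plot = FALSE)

h$breaks  # 0 2 4 6 8 10
h$counts  # 7 2 6 2 1
```

The first two counts match the seven values and two values read off the histogram, and the counts add up to the 18 values in the dataset.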
Bonus: For those who are curious, we used the following R code to create the dot plot and histogram shown above:
#define dataset
data <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10)

#create dot plot
stripchart(data, method = "stack", offset = .5, at = 0, pch = 19, cex = 5,
           col = "steelblue", main = "Dot Plot", xlab = "Data Values", ylab = "Frequency")

#create histogram
hist(data, col = 'steelblue', main = 'Histogram', xlab = 'Data Values')
Dot Plot vs. Histogram: Which Should You Use?
As mentioned earlier, both a dot plot and a histogram can be used to visualize the distribution of values in a dataset.
As a rule of thumb, we typically use dot plots when our dataset is small because they allow us to see exactly how many times each individual value occurs.
Conversely, we typically use histograms when our dataset is large because it’s cumbersome to create a dot to represent every single individual value in a large dataset.
Keep in mind that one drawback of using a histogram is that we can’t tell exactly how many times each individual value occurs.
For example, in the histogram from earlier we saw that seven values fell in the range of 0 to 2, but we don’t know exactly how many values were equal to 1 and how many values were equal to 2.
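One way to see this loss of detail is to compare two datasets that produce identical histograms. The sketch below assumes the same ranges as before; the altered dataset is hypothetical and was made up for illustration by changing two of the 2s in the original data to 1s.

```r
# original dataset from the example
original <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10)

# hypothetical dataset: two of the 2s replaced by 1s
altered <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10)

# same ranges for both histograms
breaks <- seq(0, 10, by = 2)
counts_original <- hist(original, breaks = breaks, plot = FALSE)$counts
counts_altered <- hist(altered, breaks = breaks, plot = FALSE)$counts

# the bar heights are the same for both datasets
all(counts_original == counts_altered)  # TRUE

# but the number of 1s differs
table(original)[["1"]]  # 4
table(altered)[["1"]]   # 6
```

Both datasets put seven values in the first range, so the histogram alone cannot tell them apart, while a dot plot would.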
If we’re just interested in understanding the general “shape” of a distribution, then it usually isn’t a big deal that we don’t know the individual values in a dataset.
Also keep in mind that we can’t calculate the exact median or average by just looking at a histogram because we don’t know the individual values.
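To illustrate, the sketch below compares the exact mean and median of the raw data with a rough estimate of the mean built only from the histogram. The estimate treats every value in a range as if it sat at the midpoint of that range, which is a common approximation and an assumption of this sketch, not something the histogram itself provides.

```r
# dataset from the example above
data <- c(1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 6, 7, 8, 10)

# exact summary statistics from the raw values
exact_mean <- mean(data)      # 76 / 18, about 4.22
exact_median <- median(data)  # 4.5

# histogram information only: range midpoints and counts
h <- hist(data, breaks = seq(0, 10, by = 2), plot = FALSE)

# estimate the mean by placing every value at its range midpoint
approx_mean <- sum(h$mids * h$counts) / sum(h$counts)  # 66 / 18, about 3.67
```

The estimate of about 3.67 is noticeably lower than the exact mean of about 4.22, which shows how much precision is lost once the individual values are grouped into ranges.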