How to Plot Distribution of Column Values in R


You can use the following methods to plot a distribution of column values in R:

Method 1: Plot Distribution of Values Using Density Plot

plot(density(df$my_column))

Method 2: Plot Distribution of Values Using Histogram

hist(df$my_column)

The following examples show how to use each method in practice with this data frame:

#create data frame
df = data.frame(team=rep(c('A', 'B'), each=10),
                points=c(3, 3, 4, 5, 4, 7, 7, 7, 10, 11, 8,
                         7, 8, 9, 12, 12, 12, 14, 15, 17))

#view data frame
df

   team points
1     A      3
2     A      3
3     A      4
4     A      5
5     A      4
6     A      7
7     A      7
8     A      7
9     A     10
10    A     11
11    B      8
12    B      7
13    B      8
14    B      9
15    B     12
16    B     12
17    B     12
18    B     14
19    B     15
20    B     17

Example 1: Plot Distribution of Values Using Density Plot

The following code shows how to plot the distribution of values in the points column using a density plot:

#plot distribution of values in points column
plot(density(df$points)) 

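The density() function computes a kernel density estimate of the values, and plot() draws it as a smooth curve that summarizes the shape of the distribution. The x-axis shows the values of points and the y-axis shows the estimated density.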
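density() does the estimation and plot() only draws the result, so it can help to look at the object density() returns. The minimal sketch below assumes the same df as above. It inspects the default bandwidth (chosen by bw.nrd0()), checks that the area under the curve is roughly 1, and uses the adjust argument to draw a less smooth curve for comparison. The names d, d_narrow, and area are only illustrative.

```r
# Recreate the example data frame used throughout this tutorial
df <- data.frame(team = rep(c('A', 'B'), each = 10),
                 points = c(3, 3, 4, 5, 4, 7, 7, 7, 10, 11, 8,
                            7, 8, 9, 12, 12, 12, 14, 15, 17))

# density() returns an object of class "density"; it does not draw anything
d <- density(df$points)

# The bandwidth controls smoothness; by default it comes from bw.nrd0()
d$bw

# The curve is evaluated on an evenly spaced grid of 512 x values by default
length(d$x)

# Approximate the area under the curve with a Riemann sum; it should be close to 1
area <- sum(d$y) * diff(d$x[1:2])
area

# adjust = 0.5 halves the bandwidth, giving a bumpier curve
d_narrow <- density(df$points, adjust = 0.5)
plot(d, main = 'Default vs. Half Bandwidth', xlab = 'Points',
     ylim = c(0, max(d$y, d_narrow$y)))
lines(d_narrow, lty = 2)
```

A smaller bandwidth follows individual values more closely, while a larger one smooths over them. The default is a reasonable starting point, not a guaranteed best choice.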

Note that we can also modify the title, axis labels, and color of the line in the density plot if we’d like:

#plot distribution of values in points column
plot(density(df$points), col='red', main='Density Plot of Points', xlab='Points')

[Figure: density plot of the points column]
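The data frame also has a team column. One way to compare the two teams is to draw one density curve per team on the same axes. The sketch below does this with base R's lines() and legend() and assumes the df defined above. The object names d_a, d_b, and y_max are just illustrative.

```r
# Recreate the example data frame used throughout this tutorial
df <- data.frame(team = rep(c('A', 'B'), each = 10),
                 points = c(3, 3, 4, 5, 4, 7, 7, 7, 10, 11, 8,
                            7, 8, 9, 12, 12, 12, 14, 15, 17))

# Compute one density estimate per team
d_a <- density(df$points[df$team == 'A'])
d_b <- density(df$points[df$team == 'B'])

# Choose axis limits so that neither curve is clipped
y_max <- max(d_a$y, d_b$y)

# Draw team A first, then add team B to the same plot
plot(d_a, col = 'red', ylim = c(0, y_max), xlim = range(d_a$x, d_b$x),
     main = 'Density Plot of Points by Team', xlab = 'Points')
lines(d_b, col = 'blue')
legend('topright', legend = c('Team A', 'Team B'), col = c('red', 'blue'), lty = 1)
```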

Example 2: Plot Distribution of Values Using Histogram

The following code shows how to plot the distribution of values in the points column using a histogram:

#plot distribution of values in points column using histogram
hist(df$points)

A histogram uses bars to show how many values in the points column fall into each interval (bin), as opposed to a smooth curve that summarizes the shape of the distribution.
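To see the counts behind the bars, hist() can return its binning without drawing anything. By default the breakpoints come from Sturges' rule and each bin is right-closed. The minimal sketch below assumes the df defined above, and bins is a hypothetical name.

```r
# Recreate the example data frame used throughout this tutorial
df <- data.frame(team = rep(c('A', 'B'), each = 10),
                 points = c(3, 3, 4, 5, 4, 7, 7, 7, 10, 11, 8,
                            7, 8, 9, 12, 12, 12, 14, 15, 17))

# plot = FALSE computes the bins without drawing the histogram
h <- hist(df$points, plot = FALSE)

# One row per bar: lower and upper bin boundaries and the number of values in the bin
bins <- data.frame(lower = head(h$breaks, -1),
                   upper = tail(h$breaks, -1),
                   count = h$counts)
bins
```

The count column gives the bar heights that hist(df$points) draws.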

Note that we can also modify the title, axis labels, color, and number of breaks used in the histogram:

#plot distribution of values in points column using histogram
hist(df$points, main='Histogram of Points', xlab='Points', col='steelblue', breaks=12)

[Figure: histogram of the points column]

Note: When breaks is a single number, R treats it only as a suggestion for the number of bins and picks "pretty" breakpoints near it. Larger values generally produce more, narrower bars in the histogram.
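A single number for breaks is only a suggestion. One way to control the bins exactly is to pass a vector of breakpoints. The sketch below assumes the same df and uses width-2 bins, which is an arbitrary choice. It also overlays the density curve from Example 1 by drawing the bars on the density scale. The names h and d are illustrative.

```r
# Recreate the example data frame used throughout this tutorial
df <- data.frame(team = rep(c('A', 'B'), each = 10),
                 points = c(3, 3, 4, 5, 4, 7, 7, 7, 10, 11, 8,
                            7, 8, 9, 12, 12, 12, 14, 15, 17))

# Explicit breakpoints fix the bins exactly: width-2 bins from 2 to 18
h <- hist(df$points, breaks = seq(2, 18, by = 2), plot = FALSE)

# Bins are right-closed, (a, b], by default (the first bin also includes its
# lower bound), so a value of exactly 4 is counted in the first bin
h$counts

# freq = FALSE draws bars on the density scale so a density curve can share the y-axis
d <- density(df$points)
plot(h, freq = FALSE, col = 'steelblue', main = 'Histogram of Points', xlab = 'Points',
     xlim = range(h$breaks, d$x), ylim = c(0, max(h$density, d$y)))
lines(d, col = 'red', lwd = 2)
```

With freq = FALSE, each bar's area equals the share of values in that bin. The bar areas therefore sum to 1, just like the area under the density curve.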

