How to Create Beautiful Violin Plots in R


Like boxplots, violin plots are used to visualize the distribution of continuous data. Unlike boxplots, though, violin plots show a kernel density estimate of the data rather than just the five-number summary.
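To make the difference concrete, here is a minimal sketch in base R that computes both summaries for the hp column of the built-in mtcars dataset. The variable names are our own, and density() is called with its default settings, which may not match the bandwidth a plotting function chooses.

```r
#horsepower values from the built-in mtcars dataset
hp <- mtcars$hp

#five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(hp)
print(five_num)

#kernel density estimate: a smooth curve evaluated on a grid of hp values
dens <- density(hp)

#a violin plot is essentially this curve mirrored around a central axis
plot(dens, main = 'Kernel density estimate of hp')
```

The five-number summary reduces the data to five values, while the density estimate describes the whole shape of the distribution.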

This tutorial explains how to create violin plots in R using the ggplot2 package.

Violin Plots in R: The Basics

For each of the following examples, we will use the built-in R dataset mtcars:

#view first six rows of mtcars dataset
head(mtcars)

#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The following code illustrates how to create a basic violin plot using gear as the x-axis categorical variable and hp as the y-axis continuous variable:

#load ggplot2 
library(ggplot2)

#create violin plot
ggplot(data = mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin()

Basic violin plot in R with ggplot2
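Note that gear is stored as a number in mtcars, which is why the code above wraps it in factor(): this tells ggplot2 to treat it as a categorical variable and draw one violin per level. As a small sketch, we can check this directly:

```r
#gear is numeric in mtcars
class(mtcars$gear)
#[1] "numeric"

#converting to a factor yields one level per distinct number of gears
levels(factor(mtcars$gear))
#[1] "3" "4" "5"
```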

If we compare this violin plot to a box plot using the same data, we can see that the two plots are similar, but the violin plot offers a more granular view of the shape of the distributions:

#load necessary libraries
library(ggplot2) #for creating plots
library(gridExtra) #for putting two plots side by side

#create boxplot
box <- ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_boxplot()

#create violin plot
violin <- ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin()

#show boxplot and violin plot side by side
grid.arrange(box, violin, ncol=2)

Boxplot and violin plot side by side in ggplot2

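The boxplots show a five-number summary of the data (the minimum, first quartile, median, third quartile, and maximum), while the violin plots show a kernel density estimate of the data. Note that geom_boxplot() draws the whiskers only as far as 1.5 times the interquartile range from the box and plots any more extreme values as individual points.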

Depending on who your audience is, the boxplot may be preferable if you want a simple plot to summarize your data. If you want to show a more granular look at the data, though, a violin plot is likely a better choice.
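If we want both views at once, one common approach is to draw a narrow boxplot inside each violin. The following is only a sketch of that idea; the width of 0.1 is an arbitrary value that we picked for illustration.

```r
#load ggplot2
library(ggplot2)

#violin for the shape of the distribution, narrow boxplot for the summary
combined <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  geom_boxplot(width = 0.1)

combined
```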

Flipping the Coordinates

While it’s most common to display the categorical variable along the x-axis and the continuous variable along the y-axis, it’s also possible to flip the coordinates of the violin plot by adding the coord_flip() function:

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin() +
  coord_flip()

Violin plot with flipped coordinates in ggplot2
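As an alternative sketch, recent versions of ggplot2 (3.3.0 and later, to our knowledge) can infer the orientation from the aesthetics, so mapping the continuous variable to x and the categorical variable to y should produce horizontal violins without coord_flip():

```r
#load ggplot2
library(ggplot2)

#continuous variable on x, categorical variable on y
horizontal <- ggplot(mtcars, aes(x = hp, y = factor(gear))) +
  geom_violin()

horizontal
```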

Displaying Raw Data Values

We can also display the raw data values as individual points on the plot using geom_jitter(), which adds a small amount of random noise to the position of each point so that identical data values are less likely to overlap. Using this approach, we see not only the distribution of the data but also the raw data itself.

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin() +
  geom_jitter(width = 0.1)

Violin plots with raw data points in ggplot2
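Because geom_jitter() positions the points using random noise, the plot looks slightly different each time the code runs. One way to make it reproducible, shown here as a sketch, is to pass position_jitter() with a seed. The seed value of 1 is arbitrary, and height = 0 is our own choice so that the points keep their exact hp values on the y-axis.

```r
#load ggplot2
library(ggplot2)

#fixed seed: the same horizontal offsets on every run
#height = 0: no vertical noise, so y positions equal the raw hp values
jittered <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  geom_jitter(position = position_jitter(width = 0.1, height = 0, seed = 1))

jittered
```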

We can also change the size and color of the individual points:

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin() +
  geom_jitter(size = 3, color = 'steelblue', width = 0.1)

Violin plot with colored individual points in R using ggplot2

Displaying Quantiles

Earlier we noted that a nice feature of boxplots is that we can easily see the median of the data along with the first and third quartiles. We can show the same values on violin plots by using the draw_quantiles argument:

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))

Violin plots with quantiles in ggplot2
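To check the numbers behind these lines, we can compute the quartiles of hp for each value of gear directly. This is a sketch in base R. Note that, according to the ggplot2 documentation for this argument, draw_quantiles draws lines at quantiles of the density estimate, so the lines on the plot may not match these sample quartiles exactly.

```r
#sample quartiles of hp within each gear group
quartiles <- aggregate(hp ~ gear, data = mtcars,
                       FUN = quantile, probs = c(0.25, 0.5, 0.75))

print(quartiles)
```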

Adding Colors

We can easily modify the border color and the fill color of the violin plots by using the following code:

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin(fill = 'steelblue', color = 'black')

geom_violin example in R

We can also map the fill color to another variable in the dataset, such as am:

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin(aes(fill = factor(am)))

Violin plot color by variable in R
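In mtcars, am is coded as 0 for automatic and 1 for manual transmissions, so the default legend is not very descriptive. As a sketch, we could relabel it with scale_fill_discrete(); the legend title 'Transmission' is our own wording.

```r
#load ggplot2
library(ggplot2)

#relabel the legend so that 0 and 1 read as transmission types
labeled <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(aes(fill = factor(am))) +
  scale_fill_discrete(name = 'Transmission', labels = c('Automatic', 'Manual'))

labeled
```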

Customizing Labels

We can customize the title and axis labels of the chart as well:

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin(fill = 'steelblue', color = 'black') +
  labs(title = 'Distribution of hp by gear', x = 'Gear', y = 'hp')

Violin plot with labels in R

Customizing Themes

Lastly, we can modify the theme of the chart to make it look even better. Using the code below, we make the following modifications:

  • Enlarge, center, and bold the title
  • Use the built-in ggplot2 classic theme, which gives the chart a minimalist look
  • Enlarge the remaining text on the chart, including the text on each axis

ggplot(mtcars, aes(x = factor(gear), y = hp)) + 
  geom_violin(fill = 'steelblue', color = 'black') +
  labs(title = 'Distribution of hp by gear', x = 'Gear', y = 'hp') +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = 'bold'),
        text = element_text(size = 16))

Violin plot in R with custom theme
