# How to Create Beautiful Violin Plots in R

Like boxplots, violin plots are used to visualize the distribution of continuous data. Unlike boxplots, though, violin plots show a kernel density estimate of the data rather than just a handful of summary statistics such as the median and quartiles.

This tutorial explains how to create violin plots in R using the ggplot2 package.

## Violin Plots in R: The Basics

For each of the following examples, we will use the built-in R dataset mtcars:

```r
#view first six rows of mtcars dataset
head(mtcars)

#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```
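Each violin is a density estimate built only from the observations in its group, so it helps to know how many cars fall into each gear level before reading too much into the shapes. A quick check with base R's `table()`:

```r
# Count how many cars have 3, 4, and 5 forward gears
gear_counts <- table(mtcars$gear)
print(gear_counts)
```

The 5-gear group contains only five cars, so its violin rests on very little data.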

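The following code creates a basic violin plot with gear as the categorical variable on the x-axis and hp as the continuous variable on the y-axis. Because gear is stored as a number, we wrap it in factor() so that ggplot2 treats it as categorical and draws one violin per gear level: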

```r
#load ggplot2
library(ggplot2)

#create violin plot
ggplot(data = mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin()
```
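By default, geom_violin() relies on stat_ydensity(), which estimates a Gaussian kernel density for each group using R's "nrd0" bandwidth rule and trims the curve to the range of the data; each violin is that curve mirrored around a vertical axis. The sketch below reproduces the per-group density estimates with base R's density(), assuming those defaults. It is an illustration, not ggplot2's own code, and the variable names are hypothetical:

```r
# Split hp into one vector per gear level
hp_by_gear <- split(mtcars$hp, mtcars$gear)

# Estimate a Gaussian kernel density for each group ("nrd0" is R's default bandwidth rule)
dens_by_gear <- lapply(hp_by_gear, density, bw = "nrd0", kernel = "gaussian")

# For each group, report the chosen bandwidth and the hp value where the density peaks
density_summary <- sapply(dens_by_gear, function(d) {
  c(bandwidth = d$bw, peak_hp = d$x[which.max(d$y)])
})
print(density_summary)
```

To make the violins smoother or more detailed, geom_violin() accepts an adjust argument that multiplies the bandwidth, for example geom_violin(adjust = 0.5).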

If we compare this violin plot to a boxplot of the same data, we can see that the two plots are similar, but the violin plot offers a more granular view of the shape of each distribution:

```r
#load necessary libraries
library(ggplot2) #for creating plots
library(gridExtra) #for putting two plots side by side

#create boxplot
box <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_boxplot()

#create violin plot
violin <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin()

#show boxplot and violin plot side by side
grid.arrange(box, violin, ncol = 2)
```

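The boxplots summarize each group with a box running from the first to the third quartile, a line at the median, and whiskers that extend to the most extreme values within 1.5 times the interquartile range (ggplot2 draws any points beyond that individually as outliers). The violin plots, by contrast, show the full estimated density of the data.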
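To see which numbers a boxplot is actually drawing, the following minimal sketch computes them per gear level, assuming the default rule documented for geom_boxplot() (quartiles from R's default quantile type, whiskers limited to 1.5 × IQR beyond the box). The helper box_stats() is hypothetical and written only for this illustration:

```r
# Hypothetical helper: box and whisker positions for one numeric vector
box_stats <- function(v) {
  q <- quantile(v, c(0.25, 0.5, 0.75))   # first quartile, median, third quartile
  iqr <- q[[3]] - q[[1]]                  # interquartile range
  # Values inside the fences; anything outside would be drawn as an outlier
  inside <- v[v >= q[[1]] - 1.5 * iqr & v <= q[[3]] + 1.5 * iqr]
  c(lower_whisker = min(inside), q1 = q[[1]], median = q[[2]],
    q3 = q[[3]], upper_whisker = max(inside))
}

# One column per gear level, one row per statistic
box_summary <- sapply(split(mtcars$hp, mtcars$gear), box_stats)
print(box_summary)
```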

Depending on your audience, a boxplot may be preferable when you want a simple summary of the data. If you want to show the shape of each distribution in more detail, such as skew or multiple peaks, a violin plot is likely the better choice.
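You do not always have to choose between the two. A common compromise is to layer a narrow boxplot on top of each violin, so readers get both the density shape and the familiar quartile summary. A minimal sketch (the light grey fill and the box width of 0.1 are arbitrary choices):

```r
library(ggplot2)

# Draw the violins first, then a narrow boxplot inside each one
p_combo <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(fill = 'grey90') +
  geom_boxplot(width = 0.1)
print(p_combo)
```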

### Flipping the Coordinates

While it’s most common to display the categorical variable along the x-axis and the continuous variable along the y-axis, we can also flip the coordinates of the violin plot by adding coord_flip() to the plot:

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  coord_flip()
```
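In newer versions of ggplot2 (3.3.0 and later, to our knowledge), you can also get horizontal violins without coord_flip() by simply swapping the aesthetics; ggplot2 infers the orientation from which axis holds the categorical variable. A sketch assuming such a version:

```r
library(ggplot2)

# Continuous variable on x, categorical variable on y:
# geom_violin() infers a horizontal orientation
p_horizontal <- ggplot(mtcars, aes(x = hp, y = factor(gear))) +
  geom_violin()
print(p_horizontal)
```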

### Displaying Raw Data Values

We can also display the raw data values as individual points on the plot using geom_jitter(), which adds a small random offset to each point so that identical values are less likely to sit on top of each other. This way we see not only the distribution of the data but also the individual observations.

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  geom_jitter(width = 0.1)
```
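Because the jitter is random, the points land in slightly different places every time the plot is drawn, and geom_jitter() also nudges points vertically by default. The following sketch uses position_jitter() with a fixed seed so the layout is reproducible, and with height = 0 so each point keeps its exact hp value (the seed value 1 is arbitrary):

```r
library(ggplot2)

# Spread points horizontally only (height = 0) and fix the random seed
jitter_pos <- position_jitter(width = 0.1, height = 0, seed = 1)

p_points <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  geom_point(position = jitter_pos)
print(p_points)
```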

We can also change the size and color of the individual points:

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  geom_jitter(size = 3, color = 'steelblue', width = 0.1)
```
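Larger points overlap more easily. One option is to make them semi-transparent with the alpha argument, so spots where several points stack up appear darker. A minimal sketch (alpha = 0.5 is an arbitrary choice):

```r
library(ggplot2)

# alpha = 0.5 makes each point half transparent, so overlapping points look darker
p_alpha <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin() +
  geom_jitter(size = 3, color = 'steelblue', alpha = 0.5, width = 0.1)
print(p_alpha)
```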

### Displaying Quantiles

We mentioned earlier that a nice feature of boxplots is that they make it easy to see the median along with the first and third quartiles. We can add the same information to violin plots with the draw_quantiles argument of geom_violin(), which draws a horizontal line across each violin at every requested quantile:

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))
```
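At least in ggplot2 3.x, the lines drawn by draw_quantiles are computed from the estimated density rather than directly from the raw values, so they can differ slightly from the sample quartiles, especially in small groups. To compare, you can compute the sample quartiles of hp for each gear level directly (using R's default quantile type):

```r
# Sample quartiles of hp per gear level: one column per gear, one row per quantile
hp_quartiles <- sapply(split(mtcars$hp, mtcars$gear), quantile,
                       probs = c(0.25, 0.5, 0.75))
print(hp_quartiles)
```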

### Customizing Colors

We can change the border color and the fill color of the violins with the color and fill arguments:

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(fill = 'steelblue', color = 'black')
```

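We can also map the fill color to another variable in the dataset, such as am (transmission type, where 0 = automatic and 1 = manual). ggplot2 then draws a separate, side-by-side violin for each transmission type within each gear level: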

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(aes(fill = factor(am)))
```
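The default legend shows the raw codes 0 and 1. A minimal sketch that picks the colors yourself and gives the legend readable labels with scale_fill_manual() (the colors 'tomato' and 'steelblue' and the legend title are arbitrary choices):

```r
library(ggplot2)

# am is coded 0 = automatic, 1 = manual; assign each level a color and a readable label
p_am <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(aes(fill = factor(am))) +
  scale_fill_manual(values = c('0' = 'tomato', '1' = 'steelblue'),
                    breaks = c('0', '1'),
                    labels = c('Automatic', 'Manual'),
                    name = 'Transmission')
print(p_am)
```

In mtcars, all 3-gear cars are automatic and all 5-gear cars are manual, so only the 4-gear position shows two violins side by side.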

### Customizing Labels

We can customize the title and axis labels of the chart as well:

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(fill = 'steelblue', color = 'black') +
  labs(title = 'Distribution of hp by gear', x = 'Gear', y = 'hp')
```
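Besides the title and axis labels, labs() can also set a subtitle, a caption, and the legend title for a mapped aesthetic such as fill. A sketch combining these with the am mapping (the subtitle and caption text are illustrative):

```r
library(ggplot2)

# labs() also sets the subtitle, caption, and legend title for the fill aesthetic
p_labeled <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(aes(fill = factor(am))) +
  labs(title = 'Distribution of hp by gear',
       subtitle = 'Grouped by number of forward gears',
       caption = 'Data: mtcars',
       x = 'Gear', y = 'hp', fill = 'am')
print(p_labeled)
```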

### Customizing Themes

Lastly, we can modify the theme of the chart. The code below makes the following changes:

• Enlarge, center, and bold the title
• Use ggplot2's built-in theme_classic(), which draws axis lines and removes the gridlines for a minimalist look
• Enlarge the rest of the text on the chart, including the axis titles and labels

```r
ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(fill = 'steelblue', color = 'black') +
  labs(title = 'Distribution of hp by gear', x = 'Gear', y = 'hp') +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = 'bold'),
        text = element_text(size = 16))
```
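If you make several violin plots, you can store these theme settings in a variable once and add it to each plot, or make it the session default with theme_set(). A minimal sketch (the name violin_theme is hypothetical):

```r
library(ggplot2)

# Store the theme settings once; adding theme() to theme_classic() yields a theme object
violin_theme <- theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = 'bold'),
        text = element_text(size = 16))

# Apply the stored theme to a plot
p_themed <- ggplot(mtcars, aes(x = factor(gear), y = hp)) +
  geom_violin(fill = 'steelblue', color = 'black') +
  labs(title = 'Distribution of hp by gear', x = 'Gear', y = 'hp') +
  violin_theme
print(p_themed)

# Alternatively, make it the default for every plot in the session:
# theme_set(violin_theme)
```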