A **Q-Q plot**, short for “quantile-quantile” plot, is used to assess whether a set of data plausibly came from some theoretical distribution. It plots the quantiles of the sample against the corresponding quantiles of that theoretical distribution.

In most cases, this type of plot is used to determine whether or not a set of data follows a normal distribution.
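To make the idea concrete, here is a minimal sketch of the computation behind a normal Q-Q plot, using only base R. It sorts the sample and pairs each value with a standard normal quantile taken at the plotting positions returned by `ppoints()`. The simulated data and the names `sample_q` and `theoretical_q` are illustrative assumptions. The cross-check uses base R's `qqnorm()`, which, as far as we know, uses the same plotting positions.

```r
# Minimal sketch: compute the points of a normal Q-Q plot by hand.
set.seed(1)
y <- rnorm(20)  # illustrative data

n <- length(y)
sample_q <- sort(y)                  # empirical (sample) quantiles
theoretical_q <- qnorm(ppoints(n))   # standard normal quantiles at ppoints()

qq_points <- data.frame(theoretical = theoretical_q, sample = sample_q)
head(qq_points)

# Cross-check: qqnorm() returns the same pairs, but in the original
# data order rather than sorted order, so sort both before comparing
base_qq <- qqnorm(y, plot.it = FALSE)
all.equal(sort(base_qq$x), theoretical_q)
all.equal(sort(base_qq$y), sample_q)
```

Each row of `qq_points` is one point on the plot. If the data really are normal, `sample` grows roughly linearly with `theoretical`.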

If the data is normally distributed, the points in a Q-Q plot will lie approximately along a straight diagonal line.

Conversely, if the points deviate significantly from the straight diagonal line, then it’s less likely that the data is normally distributed.
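The reference line is not necessarily y = x. Base R's `qqline()` and, by default, ggplot2's `stat_qq_line()` (assuming its documented default `line.p = c(0.25, 0.75)`) draw the line through the points for the first and third quartiles. The following sketch computes that line by hand for simulated standard normal data. The variable names are illustrative.

```r
# Minimal sketch: the Q-Q reference line through the quartile points.
set.seed(1)
y <- rnorm(200)  # illustrative standard normal data

probs <- c(0.25, 0.75)
sample_quartiles <- quantile(y, probs, names = FALSE)  # type 7 quantiles by default
theory_quartiles <- qnorm(probs)                       # standard normal quartiles

# Straight line through (theory_quartiles[1], sample_quartiles[1])
# and (theory_quartiles[2], sample_quartiles[2])
slope <- diff(sample_quartiles) / diff(theory_quartiles)
intercept <- sample_quartiles[1] - slope * theory_quartiles[1]

c(intercept = intercept, slope = slope)
```

For normal data, the slope roughly estimates the standard deviation and the intercept roughly estimates the mean. Here those values are about 1 and 0.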

To create a Q-Q plot in ggplot2, you can use the **stat_qq()** and **stat_qq_line()** functions as follows:

```r
library(ggplot2)

ggplot(df, aes(sample=y)) +
  stat_qq() +
  stat_qq_line()
```
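To see what this syntax computes, you can inspect the data ggplot2 builds for the `stat_qq()` layer with `layer_data()`. As we understand it, `x` holds the theoretical (normal) quantiles and `y` holds the sorted sample values. The data frame below is a hypothetical stand-in for `df`.

```r
library(ggplot2)

# Hypothetical example data: any data frame with a numeric column works
set.seed(1)
df <- data.frame(y = rnorm(50))

p <- ggplot(df, aes(sample = y)) +
  stat_qq() +
  stat_qq_line()

# layer_data() returns the data ggplot2 computed for a layer;
# layer 1 is stat_qq(), whose x/y are the theoretical/sample quantiles
qq_layer <- layer_data(p, 1)
head(qq_layer[, c("x", "y")])
```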

The following examples show how to use this syntax to create a Q-Q plot in two different scenarios.

**Example 1: Q-Q Plot for Normal Data**

The following code shows how to generate a normally distributed dataset with 200 observations and create a Q-Q plot for the dataset in R:

```r
library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows a normal distribution
df <- data.frame(y=rnorm(200))

#create Q-Q plot
ggplot(df, aes(sample=y)) +
  stat_qq() +
  stat_qq_line()
```

We can see that the points lie mostly along the straight diagonal line with some minor deviations along each of the tails.

Based on this plot, it is reasonable to assume that this set of data is normally distributed.

Note that we can also pass the **color** and **size** arguments to stat_qq() to change the color and size of the points in the plot:

```r
library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows a normal distribution
df <- data.frame(y=rnorm(200))

#create Q-Q plot with larger, red points
ggplot(df, aes(sample=y)) +
  stat_qq(size=2.5, color='red') +
  stat_qq_line()
```
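Color can also be mapped inside `aes()` instead of being set as a constant. In that case, ggplot2 groups the data by that variable and computes a separate set of quantiles and a separate reference line for each group. This is a sketch under that assumption. The data frame `df_groups` and its `group` column are hypothetical.

```r
library(ggplot2)

# Hypothetical data: one normal sample and one exponential sample
set.seed(1)
df_groups <- data.frame(
  y = c(rnorm(200), rexp(200, rate = 3)),
  group = rep(c("normal", "exponential"), each = 200)
)

# Mapping colour to `group` splits the data, so stat_qq() and
# stat_qq_line() are computed once per group
p <- ggplot(df_groups, aes(sample = y, colour = group)) +
  stat_qq() +
  stat_qq_line()
p
```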

**Example 2: Q-Q Plot for Non-Normal Data**

The following code shows how to create a Q-Q plot for a dataset that follows an exponential distribution with 200 observations:

```r
library(ggplot2)

#make this example reproducible
set.seed(1)

#create some fake data that follows an exponential distribution
df <- data.frame(y=rexp(200, rate=3))

#create Q-Q plot
ggplot(df, aes(sample=y)) +
  stat_qq() +
  stat_qq_line()
```

We can see that the points deviate significantly from the straight diagonal line. This is a clear indication that the set of data is not normally distributed.

This should make sense considering we specified that the data should follow an exponential distribution.
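A Q-Q plot is not limited to the normal distribution. `stat_qq()` and `stat_qq_line()` accept a `distribution` argument, which takes a quantile function, and a `dparams` argument, which supplies that function's extra parameters. This minimal sketch compares the same exponential sample against exponential quantiles, so the points should fall close to the line. Passing `rate = 3` is an assumption we can make only because the data are simulated. Because the rate only rescales exponential quantiles, the points would still fall along a straight line with any other rate.

```r
library(ggplot2)

# Same exponential sample as in Example 2
set.seed(1)
df <- data.frame(y = rexp(200, rate = 3))

# Compare against exponential quantiles instead of normal ones
p <- ggplot(df, aes(sample = y)) +
  stat_qq(distribution = stats::qexp, dparams = list(rate = 3)) +
  stat_qq_line(distribution = stats::qexp, dparams = list(rate = 3))
p
```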
