How to Create and Interpret Pairs Plots in R


A pairs plot is a matrix of scatterplots that shows the relationship between every pair of variables in a dataset.

Fortunately, it’s easy to create a pairs plot in R with the built-in pairs() function. This tutorial provides several examples of how to use this function in practice.

Example 1: Pairs Plot of All Variables

The following code illustrates how to create a basic pairs plot for all variables in a data frame in R:

#make this example reproducible 
set.seed(0)

#create data frame 
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
 
df <- data.frame(var1, var2, var3)

#create pairs plot 
pairs(df)

The way to interpret the matrix is as follows:

  • The variable names are shown in the diagonal boxes.
  • All other boxes display a scatterplot for one pairwise combination of variables. Each box plots the variable named in its column on the x-axis against the variable named in its row on the y-axis. For example, the box in the top right corner of the matrix displays a scatterplot of var1 (y-axis) against var3 (x-axis). The box in the middle left displays a scatterplot of var2 (y-axis) against var1 (x-axis), and so on.

This single plot gives us an idea of the relationship between each pair of variables in our dataset. For example, var1 and var2 appear to be positively correlated, while var1 and var3 show a much weaker relationship.
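As a rough numerical check on this visual reading, the correlation matrix can be computed with the base R cor() function. The following is a minimal sketch that rebuilds the simulated data frame from Example 1 so it runs on its own; the object name cor_matrix is only an illustrative choice.

```r
#recreate the simulated data frame from Example 1
set.seed(0)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)

#compute the Pearson correlation for every pair of variables
cor_matrix <- cor(df)

#round the values for easier reading
round(cor_matrix, 3)
```

Each off-diagonal entry corresponds to one scatterplot in the matrix, so a larger absolute value should match a tighter, more clearly sloped cloud of points.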

Example 2: Pairs Plot of Specific Variables

The following code illustrates how to create a basic pairs plot for just the first two variables in a dataset:

#create pairs plot for var1 and var2 only
pairs(df[, 1:2])

[Figure: Pairs plot of specific variables in R]
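Columns do not have to be selected by position. As a small sketch of two alternatives, the code below selects columns by name and then draws the same plot with the formula interface of pairs(). It rebuilds df so that it runs on its own, and the choice of var1 and var3 is arbitrary.

```r
#recreate the simulated data frame from Example 1
set.seed(0)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)

#select columns by name instead of by position
pairs(df[, c('var1', 'var3')])

#draw the same plot using the formula interface
pairs(~ var1 + var3, data = df)
```

Selecting by name tends to be more robust than selecting by position, since the code keeps working if the column order of the data frame changes.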

Example 3: Modify the Aesthetics of a Pairs Plot

The following code illustrates how to modify the aesthetics of a pairs plot, including the title, the color, and the labels:

pairs(df,
      col = 'blue', #modify color
      labels = c('First', 'Second', 'Third'), #modify labels
      main = 'Custom Title') #modify title

[Figure: Custom pairs plot in R]
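Beyond colors and labels, pairs() also accepts the lower.panel and upper.panel arguments, which control what is drawn below and above the diagonal. The sketch below is one possible setup, not the only one: panel_cor is a hypothetical helper written for this example, while panel.smooth ships with base R.

```r
#recreate the simulated data frame from Example 1
set.seed(0)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)

#hypothetical helper: print the Pearson correlation inside a panel
panel_cor <- function(x, y, ...) {
  old_par <- par(usr = c(0, 1, 0, 1)) #use 0-1 coordinates to place the text
  on.exit(par(old_par))               #restore the previous coordinates on exit
  text(0.5, 0.5, round(cor(x, y), 2), cex = 1.5)
}

pairs(df,
      lower.panel = panel.smooth, #scatterplots with a smoothed trend line
      upper.panel = panel_cor,    #correlation coefficients above the diagonal
      pch = 19,                   #use solid points
      cex = 0.4)                  #use smaller points to reduce overplotting
```

Because the panels above and below the diagonal show the same pairs of variables, replacing one half with a summary statistic loses no information.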

Example 4: Obtaining Correlations with ggpairs

You can also display the Pearson correlation coefficient for each pair of variables by using the ggpairs() function from the GGally package. The following code illustrates how to use this function:

#install necessary packages (only needed once)
install.packages('ggplot2')
install.packages('GGally')

#load packages
library(ggplot2)
library(GGally)

#create pairs plot
ggpairs(df)

[Figure: ggpairs function in R example]

The way to interpret this matrix is as follows:

  • The variable names are displayed on the outer edges of the matrix.
  • The boxes along the diagonal display the density plot for each variable.
  • The boxes below the diagonal display the scatterplot for each pair of variables.
  • The boxes above the diagonal display the Pearson correlation coefficient for each pair of variables. For example, the correlation between var1 and var2 is 0.425.

The benefit of using ggpairs() over the base R function pairs() is that you can obtain more information about the variables. Specifically, you can see the correlation coefficient between each pairwise combination of variables as well as a density plot for each individual variable.
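The ggpairs() function can also be restricted to certain columns and customized. The following is a minimal sketch, assuming GGally is already installed; it relies on the columns, title, and upper arguments and the wrap() helper as described in the GGally documentation, and the title text and the choice of Spearman correlation are illustrative only.

```r
library(GGally)

#recreate the simulated data frame from Example 1
set.seed(0)
var1 <- rnorm(1000)
var2 <- var1 + rnorm(1000, 0, 2)
var3 <- var2 - rnorm(1000, 0, 5)
df <- data.frame(var1, var2, var3)

#plot var1 and var2 only, add a title, and show Spearman correlations
p <- ggpairs(df,
             columns = c('var1', 'var2'),
             title = 'var1 vs. var2',
             upper = list(continuous = wrap('cor', method = 'spearman')))

#display the plot
p
```

Storing the result in an object such as p makes it possible to display or save the plot later.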

You can find the complete documentation for the ggpairs() function by running ?ggpairs in R after loading the GGally package.
