How to Use the ggpairs() Function in R


Often you may want to create a matrix of plots in ggplot2 so that you can visualize the relationships between several variables in a data frame at once.

The easiest way to do so is by using the ggpairs() function from the GGally package, which uses the following basic syntax:

ggpairs(data, columns, …)

where:

  • data: Name of the data frame that contains the variables to plot
  • columns: The names or positions of the columns from the data frame to use when plotting

Note that by default the ggpairs() function plots every variable in the data frame, but you can pass a vector of specific columns to the columns argument to plot only those.

The following example shows how to use the ggpairs() function in practice.

Note: Before you can use the ggpairs() function, you may first need to install the GGally package by using the following syntax:

install.packages('GGally')

Once the GGally package is installed, you can load it with library(GGally) and use the ggpairs() function.
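If the same script may run on a machine where GGally is already installed, one common pattern is to install the package only when it is missing. The following is a minimal sketch that uses the base R function requireNamespace(); the variable name pkg is only illustrative:

```r
#name of the package to check (illustrative variable name)
pkg <- "GGally"

#install the package only if it is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}
```

This avoids reinstalling the package every time the script is run.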

Example: How to Use the ggpairs() Function in R

Suppose that we create the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#view data frame
df

  team points assists rebounds
1    A     99      22       30
2    A     68      28       28
3    A     86      31       24
4    A     88      35       24
5    B     95      34       30
6    B     74      45       36
7    B     78      28       30
8    B     93      31       29

Now suppose that we would like to create a matrix of plots to visualize the relationship between only the numeric variables in the data frame.

We can see that there are three numeric variables: points, assists and rebounds.
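For a larger data frame it may be easier to find the numeric columns programmatically rather than by eye. The following is a minimal sketch that uses base R only; the names is_num and num_cols are illustrative, and the data frame is recreated so that the snippet runs on its own:

```r
#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#flag which columns are numeric
is_num <- sapply(df, is.numeric)

#keep the names of the numeric columns
num_cols <- names(df)[is_num]

#view names of numeric columns
num_cols

[1] "points"   "assists"  "rebounds"
```

A vector of names like num_cols can be supplied to the columns argument of ggpairs().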

We can use the following syntax with ggpairs() to create a matrix of plots to visualize the relationships between these three variables:

library(GGally)

#create matrix of plots for numeric variables only
ggpairs(df, columns=c('points', 'assists', 'rebounds'))

This produces the following output:

[Figure: matrix of plots for points, assists and rebounds]

The way to interpret the matrix is as follows:

  • The variable names are shown along the borders.
  • The plots along the diagonal show the density plot of each variable.
  • The panels below the diagonal show a scatterplot for each pairwise combination of variables.
  • The panels above the diagonal show the Pearson correlation coefficient for each pairwise combination of variables.

From the correlation coefficients in the panels above the diagonal we can see the following:

  • The correlation coefficient between assists and points is -0.326.
  • The correlation coefficient between rebounds and points is -0.215.
  • The correlation coefficient between rebounds and assists is 0.404.

Note that a positive correlation coefficient indicates a positive linear relationship while a negative correlation coefficient indicates a negative linear relationship between the variables.
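The coefficients displayed in the matrix can be cross-checked with the base R function cor(), which calculates Pearson correlation coefficients by default. The following is a minimal sketch; the name cor_mat is illustrative, and the data frame is recreated so that the snippet runs on its own:

```r
#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#calculate correlation matrix for numeric variables, rounded to 3 decimals
cor_mat <- round(cor(df[, c('points', 'assists', 'rebounds')]), 3)

#view correlation matrix
cor_mat
```

The off-diagonal values of this matrix should match the three coefficients listed above.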

This single matrix of plots shows how the values of each of the three numeric variables are distributed, along with the pairwise relationships between them.
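The columns argument also accepts column positions instead of names. The following is a minimal sketch, assuming the GGally package is installed, that selects the same three variables by position and stores the result in an object; the name p is illustrative:

```r
library(GGally)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 31, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

#columns 2 through 4 are points, assists and rebounds
p <- ggpairs(df, columns=2:4)

#display the matrix of plots
p
```

Storing the matrix in an object makes it possible to display it again later without calling ggpairs() a second time.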

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Create a Density Plot with ggplot2
How to Sort Bars by Value in ggplot2
How to Add Panel Border to ggplot2
How to Fix the Aspect Ratio in ggplot2
