How to Group by Multiple Columns in R


Often you may want to group the rows of a data frame in R by multiple columns and then calculate an aggregate statistic for each group.

Fortunately, this is easy to do with the group_by() function from the dplyr package, which is designed for exactly this task.

You can use the following basic syntax to group by multiple columns using the group_by() function:

library(dplyr)

#group by team and position, then calculate sum of points in each group
df %>%
  group_by(team, position) %>%
  summarize(points_sum=sum(points))

This particular example groups the data frame named df by the columns named team and position, then calculates the sum of the values in the points column for each unique combination of team and position.

Note that you can pass as many column names as you like to group_by(); summarize() then calculates the summary statistic for each unique combination of values in those columns.
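Because summarize() returns one row per unique combination of the grouping columns, you can sanity-check the grouping by listing those combinations with distinct(). The minimal sketch below assumes dplyr is installed and reuses the team, position and points values from the basketball example later in this tutorial.

```r
library(dplyr)

# example data (same values as the basketball example in this tutorial)
df <- data.frame(team = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points = c(22, 28, 31, 35, 34, 45, 28, 31))

# unique combinations of the grouping columns
combos <- distinct(df, team, position)

# one summary row per team/position combination
res <- df %>%
  group_by(team, position) %>%
  summarize(points_sum = sum(points), .groups = "drop")

# the two row counts should match
nrow(combos) == nrow(res)
```

Adding more columns to group_by() (and to distinct()) simply produces finer groups and, in general, more rows.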

The following example shows how to use the group_by() function from the dplyr package in practice to group by multiple columns.

Note that if the dplyr package is not already installed, you may first need to install it with the following syntax:

install.packages('dplyr')
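If you run the same script repeatedly, one common pattern is to install dplyr only when requireNamespace() reports that it is missing. This is a small sketch; depending on your setup, install.packages() may require a CRAN mirror to be configured when R runs non-interactively.

```r
# install dplyr only if it is not already available
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# load the package for use in this session
library(dplyr)
```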

Once the package is installed, you can proceed to use the group_by() function to group by multiple columns.

Example: How to Group by Multiple Columns in R

Suppose we create the following data frame that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(22, 28, 31, 35, 34, 45, 28, 31),
                 assists=c(8, 10, 12, 12, 8, 4, 3, 9))

#view data frame
df

The data frame contains the following columns:

  • team: The team name the player belongs to
  • position: The position of the player (G=Guard, F=Forward)
  • points: The total points scored by the player
  • assists: The total assists made by the player

Suppose that we would like to group the rows by the team and position columns and then calculate the sum of points scored.

We can use the following syntax to do so:

library(dplyr)

#group by team and position columns, then calculate sum of points
df %>%
  group_by(team, position) %>%
  summarize(points_sum=sum(points))

# A tibble: 4 x 3
# Groups:   team [2]
  team  position points_sum
  <chr> <chr>         <dbl>
1 A     F                66
2 A     G                50
3 B     F                59
4 B     G                79

The output displays the sum of points scored, grouped by team and position.

For example, we can see:

  • Players on team A and in position F scored a total of 66 points.
  • Players on team A and in position G scored a total of 50 points.
  • Players on team B and in position F scored a total of 59 points.
  • Players on team B and in position G scored a total of 79 points.
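The line # Groups: team [2] in the output shows that the result is still grouped by team: by default, summarize() drops only the last grouping variable (position here). If you want a plain, ungrouped result, dplyr 1.0.0 and later accept a .groups argument, as in this sketch, which reuses the example data.

```r
library(dplyr)

df <- data.frame(team = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points = c(22, 28, 31, 35, 34, 45, 28, 31))

# default behavior: the last grouping variable is dropped, team remains
grouped <- df %>%
  group_by(team, position) %>%
  summarize(points_sum = sum(points))

group_vars(grouped)
# [1] "team"

# .groups = "drop" removes all grouping from the result
ungrouped <- df %>%
  group_by(team, position) %>%
  summarize(points_sum = sum(points), .groups = "drop")

group_vars(ungrouped)
# character(0)
```

Dropping the grouping matters if you pipe the result into further verbs such as mutate(), which would otherwise be computed within each team.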

Note that you can replace the sum function with any other function you’d like (mean, max, median, etc.) to calculate a different summary statistic.
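You can also calculate several summary statistics at once by listing more than one expression inside summarize(). The sketch below reuses the example data frame; the output column names points_mean, points_max and assists_sum are chosen only for illustration.

```r
library(dplyr)

df <- data.frame(team = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position = c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points = c(22, 28, 31, 35, 34, 45, 28, 31),
                 assists = c(8, 10, 12, 12, 8, 4, 3, 9))

# several summary statistics for each team/position group in one call
stats <- df %>%
  group_by(team, position) %>%
  summarize(points_mean = mean(points),
            points_max = max(points),
            assists_sum = sum(assists),
            .groups = "drop")

stats
```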

Note: You can find the complete documentation for the group_by() function in the dplyr package documentation, which you can open in R by running ?group_by after loading dplyr.

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Insert Row into Data Frame in R
How to Append Values to List in R
How to Convert Data Frame Column to List in R
How to Count Number of Elements in List in R
