Often you may want to group a data frame in R by multiple columns and calculate an aggregate statistic for each group.

Fortunately this is easy to do by using the **group_by()** function from the **dplyr** package in R, which is designed to perform this exact task.

You can use the following basic syntax to group by multiple columns using the **group_by()** function:

    library(dplyr)

    df %>%
      group_by(team, position) %>%
      summarize(points_sum=sum(points))

This particular example groups the data frame named **df** by the columns named **team** and **position**, then calculates the sum of the values in the **points** column for each combination of team and position.

Note that you can pass as many column names as you’d like to the **group_by()** function before using the **summarize()** function to calculate a summary statistic for each group.
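As a minimal sketch of grouping by more than two columns, the following example groups by three columns at once. The data frame name **df3** and the **starter** column are hypothetical and exist only for illustration. The `.groups='drop'` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

#hypothetical data frame; the 'starter' column is invented for this sketch
df3 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                  starter=c('Y', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'N'),
                  points=c(22, 28, 31, 35, 34, 45, 28, 31))

#group by three columns, then calculate sum of points for each group
#.groups='drop' returns an ungrouped result
result3 <- df3 %>%
  group_by(team, position, starter) %>%
  summarize(points_sum=sum(points), .groups='drop')

#view result
result3
```

The result contains one row for each combination of **team**, **position** and **starter** that appears in the data.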

The following example shows how to use the **group_by()** function from the **dplyr** package in practice to group by multiple columns.

Note that you may need to first use the following syntax to install the **dplyr** package if it is not already installed:

install.packages('dplyr')

Once the package is installed, you can proceed to use the **group_by()** function to group by multiple columns.

**Example: How to Group by Multiple Columns in R**

Suppose we create the following data frame that contains information about various basketball players:

    #create data frame
    df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                     position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                     points=c(22, 28, 31, 35, 34, 45, 28, 31),
                     assists=c(8, 10, 12, 12, 8, 4, 3, 9))

    #view data frame
    df

The data frame contains the following columns:

- **team**: The team name the player belongs to
- **position**: The position of the player (G=Guard, F=Forward)
- **points**: The total points scored by the player
- **assists**: The total assists made by the player

Suppose that we would like to group the rows by the **team** and **position** columns and then calculate the sum of **points** scored.

We can use the following syntax to do so:

    library(dplyr)

    #group by team and position columns, then calculate sum of points
    df %>%
      group_by(team, position) %>%
      summarize(points_sum=sum(points))

    # A tibble: 4 x 3
    # Groups:   team [2]
      team  position points_sum
    1 A     F                66
    2 A     G                50
    3 B     F                59
    4 B     G                79

The output displays the sum of points scored, grouped by team and position.

For example, we can see:

- Players on team A and in position F scored a total of **66** points.
- Players on team A and in position G scored a total of **50** points.
- Players on team B and in position F scored a total of **59** points.
- Players on team B and in position G scored a total of **79** points.
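The **Groups: team [2]** line in the output indicates that the result is still grouped by **team**, because **summarize()** removes only the last grouping column by default. The following sketch, which assumes dplyr 1.0.0 or later, shows one way to check the remaining grouping with **group_vars()** and to return an ungrouped result using the `.groups` argument. The names **by_default** and **ungrouped** are chosen only for illustration.

```r
library(dplyr)

#recreate the data frame from the example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(22, 28, 31, 35, 34, 45, 28, 31),
                 assists=c(8, 10, 12, 12, 8, 4, 3, 9))

#default behavior: only the last grouping column (position) is removed
by_default <- df %>%
  group_by(team, position) %>%
  summarize(points_sum=sum(points))

#check which grouping columns remain
group_vars(by_default)

#remove all grouping from the result
ungrouped <- df %>%
  group_by(team, position) %>%
  summarize(points_sum=sum(points), .groups='drop')

#check which grouping columns remain
group_vars(ungrouped)
```

Dropping the grouping can be helpful if you plan to apply further operations to the summarized data frame and do not want them to be performed per team.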

Note that you can replace the **sum** function with any other function you’d like (**mean**, **max**, **median**, etc.) to calculate a different summary statistic.
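For instance, the following sketch calculates several summary statistics in a single **summarize()** call. The output column names (**points_mean**, **points_max**, **assists_median**) are chosen only for illustration, and the `.groups='drop'` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

#recreate the data frame from the example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(22, 28, 31, 35, 34, 45, 28, 31),
                 assists=c(8, 10, 12, 12, 8, 4, 3, 9))

#group by team and position, then calculate several summary statistics
result <- df %>%
  group_by(team, position) %>%
  summarize(points_mean=mean(points),
            points_max=max(points),
            assists_median=median(assists),
            .groups='drop')

#view result
result
```

Each named argument passed to **summarize()** becomes its own column in the output, so you can combine statistics from different columns in one step.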

**Note**: You can find the complete documentation for the **group_by()** function in the official **dplyr** package reference.
