R: How to Group By and Count with Condition


You can use the following basic syntax to perform a group by and count with condition in R:

library(dplyr)

df %>%
  group_by(var1) %>%
  summarize(count = sum(var2 == 'val'))

This particular syntax groups the rows of the data frame by var1 and then counts the number of rows in each group where var2 is equal to ‘val’.
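
The count works because sum() treats each TRUE as 1 and each FALSE as 0. The short sketch below illustrates this on a small made-up vector (the values are hypothetical and not part of the example data). It also assumes you want missing values skipped, which is what na.rm = TRUE does; without it, a single NA makes the whole count NA.

```r
#hypothetical vector standing in for var2
var2 <- c('val', 'other', 'val', NA)

#the comparison returns a logical vector (the NA stays NA)
matches <- var2 == 'val'

#TRUE counts as 1 and FALSE as 0, so sum() counts the matches
#na.rm = TRUE skips the missing comparison
count_matches <- sum(matches, na.rm = TRUE)
count_matches
```

If the column used in the condition may contain missing values, the same na.rm = TRUE argument can be passed to sum() inside summarize().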

The following example shows how to use this syntax in practice.

Example: Group By and Count with Condition in R

Suppose we have the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 pos=c('Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28))


#view data frame
df

  team pos points
1    A  Gu     18
2    A  Fo     22
3    A  Fo     19
4    A  Fo     14
5    B  Gu     14
6    B  Gu     11
7    B  Fo     20
8    B  Fo     28

The following code shows how to group the data frame by the team variable and count the number of rows where the pos variable is equal to ‘Gu’:

library(dplyr)

#group by team and count rows where pos is 'Gu'
df %>%
  group_by(team) %>%
  summarize(count = sum(pos == 'Gu'))

# A tibble: 2 x 2
  team  count
  <chr> <int>
1 A         1
2 B         2

From the output we can see:

  • Team A has 1 row where the pos column is equal to ‘Gu’
  • Team B has 2 rows where the pos column is equal to ‘Gu’
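
A conditional count is often easier to read next to the group size. The following sketch, which is one possible extension rather than part of the original example, adds the total number of rows per team with n() and the share of rows that meet the condition with mean(). The column names guards, total, and share are chosen here for illustration.

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 pos=c('Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28))

#count 'Gu' rows, total rows, and share of 'Gu' rows for each team
result <- df %>%
  group_by(team) %>%
  summarize(guards = sum(pos == 'Gu'),
            total = n(),
            share = mean(pos == 'Gu'))

result
```

With this data, Team A has 1 ‘Gu’ row out of 4 (a share of 0.25) and Team B has 2 out of 4 (a share of 0.5).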

We can use similar syntax to perform a group by and count with a numerical condition.

For example, the following code shows how to group by the team variable and count the number of rows where the points variable is greater than 15:

library(dplyr)

#group by team and count rows where points is greater than 15
df %>%
  group_by(team) %>%
  summarize(count = sum(points > 15))

# A tibble: 2 x 2
  team  count
  <chr> <int>
1 A         3
2 B         2

From the output we can see:

  • Team A has 3 rows where the points column is greater than 15
  • Team B has 2 rows where the points column is greater than 15 

You can use similar syntax to perform a group by and count with any specific condition you’d like.
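
For instance, conditions can be combined with & (and) or | (or) inside sum(). The sketch below is a minimal illustration using the same data frame; the column names fo_over_15 and gu_or_over_20 are made up for this example.

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 pos=c('Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28))

#count rows per team that meet combined conditions
combined <- df %>%
  group_by(team) %>%
  summarize(fo_over_15 = sum(pos == 'Fo' & points > 15),
            gu_or_over_20 = sum(pos == 'Gu' | points > 20))

combined
```

With this data, each team has 2 rows where pos is ‘Fo’ and points is greater than 15, while Team A has 2 rows and Team B has 3 rows where pos is ‘Gu’ or points is greater than 20.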

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Count Values in Column with Condition in R
How to Select Top N Values by Group in R
