You can use the following basic syntax to create a new column based on multiple conditions with the mutate() function in dplyr:
library(dplyr)

df <- df %>%
  mutate(class = case_when((team == 'A' & points >= 20) ~ 'A_Good',
                           (team == 'A' & points < 20) ~ 'A_Bad',
                           (team == 'B' & points >= 20) ~ 'B_Good',
                           TRUE ~ 'B_Bad'))
This particular syntax creates a new column called class that takes on the following values:
- A_Good if team is equal to A and points is greater than or equal to 20.
- A_Bad if team is equal to A and points is less than 20.
- B_Good if team is equal to B and points is greater than or equal to 20.
- B_Bad if none of the previous conditions are met.
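Because the last rule is a catch-all, any row that fails the first three conditions is labeled B_Bad, even if its team is not B or its points value is missing. The sketch below illustrates this on a small made-up data frame (df_check, the team 'C' row, and the NA value are assumptions for illustration only) and shows one possible stricter variant, stored in a hypothetical class_strict column, that returns NA for unmatched rows instead:

```r
library(dplyr)

# hypothetical data: one row for a team 'C' and one row with missing points
df_check <- data.frame(team = c('A', 'B', 'C', 'B'),
                       points = c(25, 21, 30, NA))

df_check <- df_check %>%
  mutate(
    # same rules as above: TRUE catches every row not matched earlier
    class = case_when(
      team == 'A' & points >= 20 ~ 'A_Good',
      team == 'A' & points < 20  ~ 'A_Bad',
      team == 'B' & points >= 20 ~ 'B_Good',
      TRUE                       ~ 'B_Bad'
    ),
    # stricter variant (an assumption, not part of the syntax above):
    # B_Bad is spelled out and unmatched rows become NA
    class_strict = case_when(
      team == 'A' & points >= 20 ~ 'A_Good',
      team == 'A' & points < 20  ~ 'A_Bad',
      team == 'B' & points >= 20 ~ 'B_Good',
      team == 'B' & points < 20  ~ 'B_Bad',
      TRUE                       ~ NA_character_
    )
  )

df_check
```

In this sketch the team 'C' row and the row with missing points both receive B_Bad in class but NA in class_strict, so the stricter form may be preferable when the data could contain unexpected values.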
The following example shows how to use this syntax in practice.
Related: How to Use case_when() in dplyr
Example: Use mutate() in dplyr with Multiple Conditions
Suppose we have the following data frame in R that contains information about various basketball players:
#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'),
                 points=c(22, 30, 34, 19, 14, 12, 39, 15, 22, 25))

#view data frame
df

   team points
1     A     22
2     A     30
3     A     34
4     A     19
5     A     14
6     B     12
7     B     39
8     B     15
9     B     22
10    B     25
We can use the following syntax with the mutate() function to create a new column called class whose values are based on the values of the team and points columns:
library(dplyr)

#add new column based on values in team and points columns
df <- df %>%
  mutate(class = case_when((team == 'A' & points >= 20) ~ 'A_Good',
                           (team == 'A' & points < 20) ~ 'A_Bad',
                           (team == 'B' & points >= 20) ~ 'B_Good',
                           TRUE ~ 'B_Bad'))

#view updated data frame
df

   team points  class
1     A     22 A_Good
2     A     30 A_Good
3     A     34 A_Good
4     A     19  A_Bad
5     A     14  A_Bad
6     B     12  B_Bad
7     B     39 B_Good
8     B     15  B_Bad
9     B     22 B_Good
10    B     25 B_Good
The new class column takes on values based on the values in the team and points columns.
For example, the first row had a value of A in the team column and a points value greater than or equal to 20, so it received a value of A_Good in the new class column.
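One quick way to sanity-check the new column is to tally how many rows fall into each team and class combination. The sketch below rebuilds the example data under the assumed names df_count and class_counts and uses the count() function from dplyr:

```r
library(dplyr)

# rebuild the example data (df_count is a name chosen for this sketch)
df_count <- data.frame(team = c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'),
                       points = c(22, 30, 34, 19, 14, 12, 39, 15, 22, 25))

# apply the same classification rules used in the example
df_count <- df_count %>%
  mutate(class = case_when(team == 'A' & points >= 20 ~ 'A_Good',
                           team == 'A' & points < 20  ~ 'A_Bad',
                           team == 'B' & points >= 20 ~ 'B_Good',
                           TRUE                       ~ 'B_Bad'))

# count the rows in each team/class combination
class_counts <- df_count %>%
  count(team, class)

class_counts
```

For this data, each team should show three players in the Good class and two in the Bad class, which matches the row-by-row output.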
Note that in this example we used the & symbol as an “AND” operator to check if two conditions were both met before assigning a value in the class column.
However, we could have used the | symbol as an “OR” operator to instead check whether at least one of two conditions was met before assigning a value in the class column.
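As a minimal sketch of the “OR” case, the code below assigns a label when a player is on team A or scored at least 20 points. The data frame name df_or, the column name flag, and the labels 'Keep' and 'Drop' are hypothetical and chosen only for illustration:

```r
library(dplyr)

# same example data, stored under a separate name for this sketch
df_or <- data.frame(team = c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'),
                    points = c(22, 30, 34, 19, 14, 12, 39, 15, 22, 25))

# 'Keep' if team is A OR points is at least 20, otherwise 'Drop'
df_or <- df_or %>%
  mutate(flag = case_when(team == 'A' | points >= 20 ~ 'Keep',
                          TRUE                       ~ 'Drop'))

df_or
```

Here every team A player receives Keep regardless of points, while team B players receive Keep only when they scored 20 or more, so the players with 12 and 15 points are the only ones labeled Drop.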
Additional Resources
The following tutorials explain how to perform other common tasks in dplyr:
dplyr: How to Mutate Variable if Column Contains String
dplyr: How to Change Factor Levels Using mutate()
dplyr: How to Use the across() Function