dplyr: How to Mutate Variable if Column Contains String


You can use the following basic syntax in dplyr to mutate every variable whose column name contains a particular string:

library(dplyr)
df %>% mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

This syntax applies the scale() function to every column in the data frame whose name contains the string ‘starter’.
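The as.vector step in the syntax above is there because scale() returns a one-column matrix (with centering and scaling attributes) rather than a plain numeric vector. A minimal sketch in base R, using an illustrative vector x that is not part of the original example:

```r
# illustrative input vector (an assumption for this sketch)
x <- c(22, 26, 25, 13, 15, 22)

# scale() returns a matrix, even for a single vector
scaled <- scale(x)
is.matrix(scaled)
dim(scaled)

# as.vector() drops the matrix structure and the scaling attributes
flat <- as.vector(scaled)
is.matrix(flat)
```

Without this conversion, each mutated column would hold a matrix instead of an ordinary numeric vector, which can cause surprises in later steps.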

The following example shows how to use this syntax in practice.

Example: Mutate Variable if Column Contains String

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

#view data frame
df

  team starter_points starter_assists bench_points bench_assists
1    A             22               4            7             2
2    B             26               5            7             5
3    C             25              10            9             5
4    D             13              14           14             4
5    E             15              12           13             9
6    F             22              10           10            14

We can use the following syntax to apply the scale() function to every column whose name contains the string ‘starter’:

library(dplyr)

#apply scale() function to each variable that contains 'starter' in the name
df %>% mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

  team starter_points starter_assists bench_points bench_assists
1    A      0.2819668      -1.3180158            7             2
2    B      1.0338784      -1.0629159            7             5
3    C      0.8459005       0.2125832            9             5
4    D     -1.4098342       1.2329825           14             4
5    E     -1.0338784       0.7227828           13             9
6    F      0.2819668       0.2125832           10            14

Using this syntax, we scaled each column whose name contains ‘starter’ so that each one now has a mean of 0 and a standard deviation of 1.

Notice that the following columns were modified:

  • starter_points
  • starter_assists

All other columns remained unchanged.
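The claim that the scaled columns have a mean of 0 and a standard deviation of 1 can be checked directly. The sketch below rebuilds the same data frame and computes both statistics for the ‘starter’ columns; the names scaled, starter_cols, means and sds are introduced here for illustration only. The means should be 0 only up to floating-point rounding, which is why they are rounded before printing.

```r
library(dplyr)

#recreate the data frame from the example
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

#scale each column that contains 'starter' in the name
scaled <- df %>% mutate_at(vars(contains('starter')), ~ (scale(.) %>% as.vector))

#keep only the scaled columns
starter_cols <- scaled %>% select(contains('starter'))

#compute mean and standard deviation of each scaled column
means <- sapply(starter_cols, mean)
sds <- sapply(starter_cols, sd)

round(means, 10)
round(sds, 10)
```

Both scale() and sd() use the sample standard deviation (denominator n - 1), so the two are consistent with each other.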

Also note we can apply any function we’d like using this syntax.

In the previous example, we chose to scale each column with the string ‘starter’ in the name.

However, we could do something simpler such as multiply the values by two for each column with ‘starter’ in the name:

library(dplyr)

#multiply values by two for each variable that contains 'starter' in the name
df %>% mutate_at(vars(contains('starter')), ~ (. * 2))

  team starter_points starter_assists bench_points bench_assists
1    A             44               8            7             2
2    B             52              10            7             5
3    C             50              20            9             5
4    D             26              28           14             4
5    E             30              24           13             9
6    F             44              20           10            14

Notice that the values in the starter_points and starter_assists columns have been multiplied by two, while all other columns have remained unchanged.
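Since dplyr 1.0.0, the scoped verbs such as mutate_at() are marked as superseded in favor of across(). As a sketch, assuming dplyr 1.0.0 or later is installed, the same multiplication can be written with mutate() and across(); the names old_style and new_style are introduced here only for the comparison.

```r
library(dplyr)

#recreate the data frame from the example
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F'),
                 starter_points=c(22, 26, 25, 13, 15, 22),
                 starter_assists=c(4, 5, 10, 14, 12, 10),
                 bench_points=c(7, 7, 9, 14, 13, 10),
                 bench_assists=c(2, 5, 5, 4, 9, 14))

#original approach using mutate_at()
old_style <- df %>% mutate_at(vars(contains('starter')), ~ (. * 2))

#equivalent approach using across() (requires dplyr >= 1.0.0)
new_style <- df %>% mutate(across(contains('starter'), ~ .x * 2))

#compare the two results
all.equal(old_style, new_style)
```

The mutate_at() syntax still works, so either form can be used; across() is simply the form the dplyr documentation recommends for new code.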

Additional Resources

The following tutorials explain how to perform other common tasks in dplyr:

How to Remove Rows Using dplyr
How to Select Columns by Index Using dplyr
How to Filter Rows that Contain a Certain String Using dplyr
