How to Use the across() Function in dplyr (3 Examples)


You can use the across() function from the dplyr package in R to apply a transformation to multiple columns.

There are countless ways to use this function, but the following methods illustrate some common uses:

Method 1: Apply Function to Multiple Columns

#multiply values in col1 and col2 by 2
df %>% 
  mutate(across(c(col1, col2), function(x) x*2))

Method 2: Calculate One Summary Statistic for Multiple Columns

#calculate mean of col1 and col2
df %>%
  summarise(across(c(col1, col2), mean, na.rm=TRUE))

Method 3: Calculate Multiple Summary Statistics for Multiple Columns

#calculate mean and standard deviation for col1 and col2
df %>%
  summarise(across(c(col1, col2), list(mean=mean, sd=sd), na.rm=TRUE))

The following examples show how to each method with the following data frame:

#create data frame
df <- data.frame(conf=c('East', 'East', 'East', 'West', 'West', 'West'),
                 points=c(22, 25, 29, 13, 22, 30),
                 rebounds=c(12, 10, 6, 6, 8, 11))

#view data frame
df

  conf points rebounds
1 East     22       12
2 East     25       10
3 East     29        6
4 West     13        6
5 West     22        8
6 West     30       11

Example 1: Apply Function to Multiple Columns

The following code shows how to use the across() function to multiply the values in both the points and rebounds columns by 2:

library(dplyr)

#multiply values in points and rebounds columns by 2
df %>% 
  mutate(across(c(points, rebounds), function(x) x*2))

  conf points rebounds
1 East     44       24
2 East     50       20
3 East     58       12
4 West     26       12
5 West     44       16
6 West     60       22

Example 2: Calculate One Summary Statistic for Multiple Columns

The following code shows how to use the across() function to calculate the mean value for both the points and rebounds columns:

library(dplyr) 

#calculate mean value of points an rebounds columns
df %>%
  summarise(across(c(points, rebounds), mean, na.rm=TRUE))

  points rebounds
1   23.5 8.833333

Note that we can also use the is.numeric function to automatically calculate a summary statistic for all of the numeric columns in the data frame:

library(dplyr) 

#calculate mean value for every numeric column in data frame
df %>%
  summarise(across(where(is.numeric), mean, na.rm=TRUE))

  points rebounds
1   23.5 8.833333

Example 3: Calculate Multiple Summary Statistics for Multiple Columns

The following code shows how to use the across() function to calculate the mean and standard deviation of both the points and rebounds columns:

library(dplyr) 

#calculate mean and standard deviation for points and rebounds columns
df %>%
  summarise(across(c(points, rebounds), list(mean=mean, sd=sd), na.rm=TRUE))

  points_mean points_sd rebounds_mean rebounds_sd
1        23.5  6.156298      8.833333    2.562551

Note: You can find the complete documentation for the across() function here.

Additional Resources

The following tutorials explain how to perform other common functions using dplyr:

How to Remove Rows Using dplyr
How to Arrange Rows Using dplyr
How to Filter by Multiple Conditions Using dplyr

Leave a Reply

Your email address will not be published.