How to Scale Only Numeric Columns in R (With Example)


You can use the following syntax from the dplyr package to scale only the numeric columns in a data frame in R:

library(dplyr)

df %>% mutate(across(where(is.numeric), scale))

The following example shows how to use this syntax in practice.

Example: Scale Only Numeric Columns Using dplyr

Suppose we have the following data frame in R that contains information about various basketball players:

#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                 points=c(22, 34, 30, 12, 18),
                 assists=c(7, 9, 9, 12, 14),
                 rebounds=c(5, 10, 10, 8, 8))

#view data frame
df

  team points assists rebounds
1    A     22       7        5
2    B     34       9       10
3    C     30       9       10
4    D     12      12        8
5    E     18      14        8

Suppose we would like to use the scale() function to scale only the numeric columns in the data frame.

We can use the following syntax to do so:

library(dplyr)

#scale only the numeric columns in the data frame
df %>% mutate(across(where(is.numeric), scale))

  team     points   assists    rebounds
1    A -0.1348400 -1.153200 -1.56144012
2    B  1.2135598 -0.432450  0.87831007
3    C  0.7640932 -0.432450  0.87831007
4    D -1.2585064  0.648675 -0.09759001
5    E -0.5843065  1.369425 -0.09759001

Notice that the values in the three numeric columns (points, assists, and rebounds) have been scaled while the team column has remained unchanged.
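
One detail worth knowing is that scale() returns a one-column matrix, so each scaled column is stored as a matrix inside the data frame. The following is a minimal sketch, assuming plain numeric vectors are preferred, that wraps scale() in as.numeric() and then checks that every scaled column has a mean of roughly 0 and a standard deviation of 1. The name scaled_df is only illustrative.

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                 points=c(22, 34, 30, 12, 18),
                 assists=c(7, 9, 9, 12, 14),
                 rebounds=c(5, 10, 10, 8, 8))

#scale numeric columns and convert each result back to a plain numeric vector
scaled_df <- df %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

#check the mean and standard deviation of each scaled column
scaled_df %>%
  summarise(across(where(is.numeric), list(mean = mean, sd = sd)))
```

The means may print as tiny nonzero values because of floating-point rounding.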

Technical Notes

The scale() function in R uses the following basic syntax:

scale(x, center = TRUE, scale = TRUE)

where:

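  • x: The numeric object to scale (for example, a vector or matrix)
  • center: Whether to subtract the mean when scaling. Default is TRUE.
  • scale: Whether to divide by the standard deviation when scaling. Default is TRUE. If center is FALSE, the values are divided by the root mean square instead.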
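
To illustrate how these arguments behave, here is a short sketch that only centers a vector (subtracts the mean without dividing by the standard deviation). The vector reuses the points values from the example above, and the name centered is only illustrative.

```r
#points values from the example data frame
points <- c(22, 34, 30, 12, 18)

#center only: subtract the mean but do not divide by the standard deviation
centered <- scale(points, center = TRUE, scale = FALSE)

#scale() returns a one-column matrix; as.numeric() converts it to a plain vector
as.numeric(centered)

#the mean that was subtracted is stored as an attribute of the result
attr(centered, "scaled:center")
```

Since the mean of points is 23.2, the centered values are simply each original value minus 23.2.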

This function uses the following formula to calculate scaled values:

x_scaled = (x_original - x̄) / s

where:

  • x_original: The original x-value
  • x̄: The sample mean
  • s: The sample standard deviation

This is also known as standardizing data, which simply converts each original value into a z-score.
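
As a quick check of this formula, the following sketch computes the z-scores by hand with mean() and sd() and compares them to the output of scale(). It reuses the points values from the example above. The names z_manual and z_scale are only illustrative.

```r
#points values from the example data frame
points <- c(22, 34, 30, 12, 18)

#apply the formula directly: subtract the sample mean, divide by the sample sd
z_manual <- (points - mean(points)) / sd(points)

#compute the same values with scale(), dropping the matrix structure
z_scale <- as.numeric(scale(points))

#compare the two results (all.equal allows for floating-point rounding)
all.equal(z_manual, z_scale)
```

The two approaches should agree, which confirms that scale() with its default arguments produces ordinary z-scores.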

Additional Resources

The following tutorials explain how to perform other common tasks using dplyr:

How to Select Columns by Name Using dplyr
How to Select Columns by Index Using dplyr
How to Use select_if with Multiple Conditions in dplyr