How to Use the scale() Function in R (With Examples)


The scale() function in R can be used to center and/or scale the values in a numeric vector, matrix, or data frame. The result is always returned as a matrix.

This function uses the following basic syntax:

scale(x, center = TRUE, scale = TRUE)

where:

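  • x: The numeric vector, matrix, or data frame to scale
  • center: Whether to subtract the mean of each column when scaling. Default is TRUE.
  • scale: Whether to divide by the standard deviation of each column when scaling. Default is TRUE. (If center = FALSE, R divides by each column's root mean square instead.)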
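The interaction between the two arguments is easy to miss. According to R's documentation for scale(), when center = FALSE but scale = TRUE, the divisor is the root mean square of each column, sqrt(sum(x^2) / (n - 1)), rather than the standard deviation. The sketch below checks this on an illustrative vector:

```r
# illustrative vector (same values as Example 1 below)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# scale without centering: R divides by the root mean square, not sd()
no_center <- scale(x, center = FALSE, scale = TRUE)

# root mean square as described in ?scale: sqrt(sum(x^2) / (n - 1))
rms <- sqrt(sum(x^2) / (length(x) - 1))

# compare the divisor R stored with rms and with sd(x)
attr(no_center, "scaled:scale")  # equals rms
rms
sd(x)                            # a different, smaller value
```

In other words, you only get true z-scores when center and scale are both TRUE.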

This function uses the following formula to calculate scaled values:

$$x_{\text{scaled}} = \frac{x_{\text{original}} - \bar{x}}{s}$$

where:

  • $x_{\text{original}}$: The original x-value
  • $\bar{x}$: The sample mean
  • $s$: The sample standard deviation

This is also known as standardizing the data: each original value is converted into a z-score, which measures how many standard deviations that value lies above or below the mean.
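As a quick sanity check that the formula above is what scale() computes, the sketch below applies the formula by hand and compares it with the function's output. The vector x is only an illustrative example:

```r
# illustrative vector
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# apply the formula directly: (x - mean) / sd
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes; as.vector() drops them
z_scale <- as.vector(scale(x))

# the two approaches agree (up to floating-point tolerance)
all.equal(z_manual, z_scale)
# [1] TRUE
```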

The following examples show how to use this function in practice.

Example 1: Scale the Values in a Vector

Suppose we have the following vector of values in R:

#define vector of values
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

#view mean and standard deviation of values
mean(x)

[1] 5

sd(x)

[1] 2.738613

The following code shows how to scale the values in the vector using the scale() function:

#scale the values of x
x_scaled <- scale(x)

#view scaled values
x_scaled

            [,1]
 [1,] -1.4605935
 [2,] -1.0954451
 [3,] -0.7302967
 [4,] -0.3651484
 [5,]  0.0000000
 [6,]  0.3651484
 [7,]  0.7302967
 [8,]  1.0954451
 [9,]  1.4605935

Here is how each scaled value was calculated:

  • Value 1: (1 – 5) / 2.738613 = -1.46
  • Value 2: (2 – 5) / 2.738613 = -1.10
  • Value 3: (3 – 5) / 2.738613 = -0.73

And so on.
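scale() also stores the mean and standard deviation it used as attributes named "scaled:center" and "scaled:scale" (attribute names as given in R's documentation). The sketch below, which reuses the vector from this example, reads those attributes back and uses them to reverse the scaling:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
x_scaled <- scale(x)

# retrieve the values scale() used
center <- attr(x_scaled, "scaled:center")  # mean(x), i.e. 5
spread <- attr(x_scaled, "scaled:scale")   # sd(x), about 2.738613

# reverse the transformation: multiply by the sd, then add the mean back
x_restored <- as.vector(x_scaled) * spread + center
x_restored
# [1] 1 2 3 4 5 6 7 8 9
```

Keeping these attributes around is handy when you need to apply the same transformation to new data or convert predictions back to the original units.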

Note that if we specify scale = FALSE, the function only subtracts the mean and does not divide by the standard deviation:

#scale the values of x but don't divide by standard deviation
x_scaled <- scale(x, scale = FALSE)

#view scaled values
x_scaled

      [,1]
 [1,]   -4
 [2,]   -3
 [3,]   -2
 [4,]   -1
 [5,]    0
 [6,]    1
 [7,]    2
 [8,]    3
 [9,]    4

Here is how each scaled value was calculated:

  • Value 1: 1 – 5 = -4
  • Value 2: 2 – 5 = -3
  • Value 3: 3 – 5 = -2

And so on.
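Centering on its own is equivalent to subtracting the mean yourself. The sketch below checks this and also shows as.vector() as one way to get a plain numeric vector back instead of the one-column matrix that scale() returns:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# center only: subtract the mean, no division
x_centered <- as.vector(scale(x, scale = FALSE))
x_centered
# [1] -4 -3 -2 -1  0  1  2  3  4

# same result computed directly
all.equal(x_centered, x - mean(x))
# [1] TRUE
```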

Example 2: Scale the Column Values in a Data Frame

In practice, scale() is most often used to standardize several columns of a data frame at once, so that each column has a mean of 0 and a standard deviation of 1.

For example, suppose we have the following data frame in R:

#create data frame
df <- data.frame(x=c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 y=c(10, 20, 30, 40, 50, 60, 70, 80, 90))

#view data frame
df

  x  y
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90

Notice that the range of values for the y variable is much larger than the range of values for the x variable.

We can use the scale() function to scale the values in both columns such that the scaled values of x and y both have a mean of 0 and a standard deviation of 1:

#scale values in each column of data frame
df_scaled <- scale(df)

#view scaled values (note that scale() returns a matrix, not a data frame)
df_scaled

               x          y
 [1,] -1.4605935 -1.4605935
 [2,] -1.0954451 -1.0954451
 [3,] -0.7302967 -0.7302967
 [4,] -0.3651484 -0.3651484
 [5,]  0.0000000  0.0000000
 [6,]  0.3651484  0.3651484
 [7,]  0.7302967  0.7302967
 [8,]  1.0954451  1.0954451
 [9,]  1.4605935  1.4605935

Both the x column and the y column now have a mean of 0 and a standard deviation of 1.
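Because scale() returns a matrix, you may want to convert the result back to a data frame before further analysis. The sketch below, which rebuilds the data frame from this example, uses as.data.frame() for the conversion and then confirms the column means and standard deviations with colMeans() and sd():

```r
df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 y = c(10, 20, 30, 40, 50, 60, 70, 80, 90))

df_scaled <- scale(df)
is.matrix(df_scaled)   # TRUE: scale() converts the data frame to a matrix

# convert back to a data frame if later code expects one
df_scaled <- as.data.frame(df_scaled)

# column means are 0 (up to floating-point rounding) and sds are 1
round(colMeans(df_scaled), 10)
sapply(df_scaled, sd)
```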
