The **scale()** function in R can be used to scale the values in a vector, matrix, or data frame.

This function uses the following basic syntax:

    scale(x, center = TRUE, scale = TRUE)

where:

- **x**: Name of the object to scale
- **center**: Whether to subtract the mean when scaling. Default is TRUE.
- **scale**: Whether to divide by the standard deviation when scaling. Default is TRUE.

This function uses the following formula to calculate scaled values:

$$x_{\text{scaled}} = \frac{x_{\text{original}} - \bar{x}}{s}$$

where:

- $x_{\text{original}}$: The original x-value
- $\bar{x}$: The sample mean
- $s$: The sample standard deviation

This is also known as *standardizing* data, which simply converts each original value into a z-score.
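As a quick check of the formula, the short sketch below applies it by hand and compares the result with the output of **scale()**. The vector `v` is a made-up example that does not appear elsewhere in this tutorial.

```r
# hypothetical example vector (not taken from the examples in this tutorial)
v <- c(2, 4, 4, 6, 9)

# apply the formula by hand: subtract the sample mean, divide by the sample sd
z_manual <- (v - mean(v)) / sd(v)

# scale() returns a one-column matrix, so flatten it before comparing
z_scale <- as.vector(scale(v))

# compare the two results (numerically equal up to floating-point tolerance)
all.equal(z_manual, z_scale)
```

The comparison should return TRUE, since both approaches subtract the sample mean and divide by the sample standard deviation.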

The following examples show how to use this function in practice.

**Example 1: Scale the Values in a Vector**

Suppose we have the following vector of values in R:

    #define vector of values
    x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

    #view mean and standard deviation of values
    mean(x)

    [1] 5

    sd(x)

    [1] 2.738613

The following code shows how to scale the values in the vector using the **scale()** function:

    #scale the values of x
    x_scaled <- scale(x)

    #view scaled values
    x_scaled

               [,1]
    [1,] -1.4605935
    [2,] -1.0954451
    [3,] -0.7302967
    [4,] -0.3651484
    [5,]  0.0000000
    [6,]  0.3651484
    [7,]  0.7302967
    [8,]  1.0954451
    [9,]  1.4605935

Here is how each scaled value was calculated:

- Value 1: (1 - 5) / 2.738613 = **-1.46**
- Value 2: (2 - 5) / 2.738613 = **-1.10**
- Value 3: (3 - 5) / 2.738613 = **-0.73**

And so on.
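It can also help to know that **scale()** returns a one-column matrix rather than a plain vector, and that it stores the mean and standard deviation it used as the attributes `scaled:center` and `scaled:scale`. The sketch below reads those attributes and uses them to reverse the scaling. The variable names `center_used`, `scale_used`, `x_vec`, and `x_back` are chosen here purely for illustration.

```r
# same vector as in the example above
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
x_scaled <- scale(x)

# the mean and standard deviation used for scaling are stored as attributes
center_used <- attr(x_scaled, "scaled:center")
scale_used <- attr(x_scaled, "scaled:scale")

# drop the matrix structure to get a plain numeric vector
x_vec <- as.vector(x_scaled)

# reverse the scaling to recover the original values
x_back <- x_vec * scale_used + center_used
x_back
```

Keeping these two attributes around is useful when the same centering and scaling needs to be undone later.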

Note that if we instead specify **scale = FALSE**, the function only subtracts the mean and does not divide by the standard deviation:

    #scale the values of x but don't divide by standard deviation
    x_scaled <- scale(x, scale = FALSE)

    #view scaled values
    x_scaled

         [,1]
    [1,]   -4
    [2,]   -3
    [3,]   -2
    [4,]   -1
    [5,]    0
    [6,]    1
    [7,]    2
    [8,]    3
    [9,]    4

Here is how each scaled value was calculated:

- Value 1: 1 - 5 = **-4**
- Value 2: 2 - 5 = **-3**
- Value 3: 3 - 5 = **-2**

And so on.
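The opposite setting behaves a little differently. According to the R documentation for **scale()**, when **center = FALSE** and **scale = TRUE** the values are divided by the root mean square, sqrt(sum(x^2) / (n - 1)), rather than by the standard deviation. The sketch below checks this on the same vector. The names `x_uncentered` and `rms` are introduced here for illustration only.

```r
# same vector as in the example above
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# scale the values without subtracting the mean first
x_uncentered <- as.vector(scale(x, center = FALSE))

# root mean square as described in ?scale: sqrt(sum(x^2) / (n - 1))
rms <- sqrt(sum(x^2) / (length(x) - 1))

# the uncentered result matches x / rms, not x / sd(x)
all.equal(x_uncentered, x / rms)
```

In other words, turning off centering changes the divisor as well, so the result is not simply the original values divided by their standard deviation.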

**Example 2: Scale the Column Values in a Data Frame**

More often than not, we use the scale() function when we want to scale the values in multiple columns of a data frame such that each column has a mean of 0 and a standard deviation of 1.

For example, suppose we have the following data frame in R:

    #create data frame
    df <- data.frame(x=c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                     y=c(10, 20, 30, 40, 50, 60, 70, 80, 90))

    #view data frame
    df

      x  y
    1 1 10
    2 2 20
    3 3 30
    4 4 40
    5 5 50
    6 6 60
    7 7 70
    8 8 80
    9 9 90

Notice that the range of values for the y variable is much larger than the range of values for the x variable.

We can use the **scale()** function to scale the values in both columns such that the scaled values of x and y both have a mean of 0 and a standard deviation of 1:

    #scale values in each column of data frame
    df_scaled <- scale(df)

    #view scaled data frame
    df_scaled

                  x          y
    [1,] -1.4605935 -1.4605935
    [2,] -1.0954451 -1.0954451
    [3,] -0.7302967 -0.7302967
    [4,] -0.3651484 -0.3651484
    [5,]  0.0000000  0.0000000
    [6,]  0.3651484  0.3651484
    [7,]  0.7302967  0.7302967
    [8,]  1.0954451  1.0954451
    [9,]  1.4605935  1.4605935

Both the x column and the y column now have a mean of 0 and a standard deviation of 1.
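One way to confirm this is to compute the column means and standard deviations directly. The sketch below does that, and also converts the result back to a data frame, since **scale()** returns a matrix even when given a data frame. The names `col_means` and `col_sds` are chosen here for illustration.

```r
# same data frame as in the example above
df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 y = c(10, 20, 30, 40, 50, 60, 70, 80, 90))

# scale() returns a matrix, so convert back if a data frame is needed
df_scaled <- as.data.frame(scale(df))

# mean of each scaled column (zero up to floating-point error)
col_means <- colMeans(df_scaled)

# standard deviation of each scaled column
col_sds <- sapply(df_scaled, sd)

col_means
col_sds
```

The means may print as extremely small numbers instead of exactly 0 because of floating-point rounding.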

**Additional Resources**

The following tutorials explain how to perform other common operations in R:

How to Normalize Data in R

How to Standardize Data in R

How to Average Across Columns in R