How to Calculate Standard Deviation of Columns in R


You can use the following basic syntax to calculate the standard deviation of columns in R:

#calculate standard deviation of one column
sd(df$col1)

#calculate standard deviation of all columns
sapply(df, sd)

#calculate standard deviation of specific columns
sapply(df[c('col1', 'col2', 'col5')], sd)

The examples below show how to use this syntax in practice with this data frame:

#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                 points=c(99, 91, 86, 88, 95),
                 assists=c(33, 28, 31, 39, 34),
                 rebounds=c(30, 28, 24, 24, 28))

#view data frame
df

  team points assists rebounds
1    A     99      33       30
2    B     91      28       28
3    C     86      31       24
4    D     88      39       24
5    E     95      34       28

Example 1: Standard Deviation of One Column

The following code shows how to calculate the standard deviation of one column in the data frame:

#calculate standard deviation of 'points' column
sd(df$points)

[1] 5.263079

The sample standard deviation of the values in the ‘points’ column is 5.263079.
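R's sd() function divides by n - 1 rather than n, so it returns the sample standard deviation. As a rough check, the sketch below recomputes the value by hand from the same five ‘points’ values. The variable names (manual_sd, builtin_sd, pop_sd) are illustrative only and not part of the original example.

```r
# values from the 'points' column of the example data frame
points <- c(99, 91, 86, 88, 95)

# sample standard deviation by hand: divide the squared deviations by n - 1
n <- length(points)
manual_sd <- sqrt(sum((points - mean(points))^2) / (n - 1))

# built-in result for comparison
builtin_sd <- sd(points)

# TRUE if the two agree up to floating-point tolerance
isTRUE(all.equal(manual_sd, builtin_sd))

# population standard deviation (divide by n), which sd() does not compute directly
pop_sd <- sqrt(sum((points - mean(points))^2) / n)
```

If the population standard deviation is what you need, the last line shows one way to get it. It is always somewhat smaller than the value sd() returns.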

Example 2: Standard Deviation of All Columns

The following code shows how to calculate the standard deviation of every column in the data frame:

#calculate standard deviation of all columns in data frame
sapply(df, sd)

    team   points  assists rebounds 
      NA 5.263079 4.062019 2.683282 
Warning message:
In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
  NAs introduced by coercion

Because the ‘team’ column holds character values, sd() cannot convert them to numbers, so R returns NA for that column and issues a warning.

The standard deviations of the three numeric columns are still computed correctly.
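One way to avoid the NA and the warning is to keep only the numeric columns before calling sd. The sketch below assumes the same data frame as above. The names is_num and sd_numeric are illustrative.

```r
# same data frame as in the examples above
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                 points=c(99, 91, 86, 88, 95),
                 assists=c(33, 28, 31, 39, 34),
                 rebounds=c(30, 28, 24, 24, 28))

# logical vector marking which columns are numeric
is_num <- sapply(df, is.numeric)

# standard deviation of the numeric columns only; 'team' is skipped
sd_numeric <- sapply(df[is_num], sd)
sd_numeric
```

The result contains only the ‘points’, ‘assists’, and ‘rebounds’ columns, with the same values as before.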

Example 3: Standard Deviation of Specific Columns

The following code shows how to calculate the standard deviation of specific columns in the data frame:

#calculate standard deviation of 'points' and 'rebounds' columns
sapply(df[c('points', 'rebounds')], sd)

  points rebounds 
5.263079 2.683282 

We could also select the same columns by their index positions:

#calculate standard deviation of 'points' and 'rebounds' columns
sapply(df[c(2, 4)], sd)

  points rebounds 
5.263079 2.683282 
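The example data has no missing values, but real data often does. By default sd() returns NA for any column that contains an NA. The sketch below uses a hypothetical data frame, df_na, which is not part of the original example, to show how the na.rm argument can be passed through sapply.

```r
# hypothetical data frame with one missing value in 'points'
df_na <- data.frame(points=c(99, NA, 86, 88, 95),
                    rebounds=c(30, 28, 24, 24, 28))

# without na.rm, a column containing NA yields NA
sapply(df_na, sd)

# extra arguments to sapply are forwarded to sd, so NA values are dropped first
sd_clean <- sapply(df_na, sd, na.rm=TRUE)
sd_clean
```

With na.rm=TRUE, the ‘points’ value is computed from the four non-missing observations only.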

Additional Resources

The following tutorials explain how to perform other common operations on columns in R:

How to Calculate the Mean of Multiple Columns in R
How to Find the Max Value Across Multiple Columns in R
How to Select Specific Columns in R
