How to Calculate Standard Deviation Using dplyr (With Examples)


You can use the following methods to calculate the standard deviation of values in a data frame with dplyr:

Method 1: Calculate Standard Deviation of One Variable

library(dplyr)

df %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE))
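The template passes na.rm=TRUE because sd() returns NA whenever the column contains a missing value. Here is a minimal sketch of that behavior. It uses a small hypothetical data frame, df_na, created only for this illustration:

```r
library(dplyr)

# hypothetical data frame with one missing value
df_na <- data.frame(var1 = c(1, 2, NA))

# without na.rm the NA propagates; with na.rm=TRUE it is dropped before computing
res <- df_na %>%
  summarise(sd_keep_na = sd(var1),
            sd_drop_na = sd(var1, na.rm=TRUE))

res
# sd_keep_na is NA; sd_drop_na is 0.7071068 (the standard deviation of 1 and 2)
```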

Method 2: Calculate Standard Deviation of Multiple Variables

library(dplyr)

df %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE),
            sd_var2 = sd(var2, na.rm=TRUE))

Method 3: Calculate Standard Deviation of Multiple Variables, Grouped by Another Variable

library(dplyr)

df %>%
  group_by(var3) %>%
  summarise(sd_var1 = sd(var1, na.rm=TRUE),
            sd_var2 = sd(var2, na.rm=TRUE))

This tutorial explains how to use each method in practice with the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

#view data frame
df

  team points assists
1    A     12       4
2    A     15       4
3    A     18       3
4    A     22       6
5    B     14       7
6    B     17       8
7    B     29       3
8    B     35      10

Example 1: Calculate Standard Deviation of One Variable

The following code shows how to calculate the standard deviation of the points variable:

library(dplyr)

#calculate standard deviation of points variable
df %>%
  summarise(sd_points = sd(points, na.rm=TRUE))

  sd_points
1  7.995534

From the output we can see that the standard deviation of the points variable is 7.995534.
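Note that sd() returns the sample standard deviation. It divides the sum of squared deviations from the mean by n - 1 rather than by n. As a quick sanity check, you can compute that formula by hand inside summarise() and compare the two values. This is a sketch for verification only and is not part of the method above:

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

# n() returns the number of rows in the current group (here, the whole data frame)
check <- df %>%
  summarise(sd_points = sd(points),
            sd_manual = sqrt(sum((points - mean(points))^2) / (n() - 1)))

check
# both columns should equal 7.995534
```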

Example 2: Calculate Standard Deviation of Multiple Variables

The following code shows how to calculate the standard deviation of the points and the assists variables:

library(dplyr)

#calculate standard deviation of points and assists variables
df %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE))

  sd_points sd_assists
1  7.995534   2.559994

The output displays the standard deviation for both the points and assists variables.
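With many columns, writing one sd() call per variable inside summarise() becomes repetitive. The sketch below shows an alternative that should produce the same two columns. It assumes dplyr 1.0.0 or later, the release that introduced across():

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

# apply sd() to each listed column; .names builds the output names from the column names
res <- df %>%
  summarise(across(c(points, assists), ~ sd(.x, na.rm=TRUE), .names = "sd_{.col}"))

res
# expected columns: sd_points (7.995534) and sd_assists (2.559994)
```

To apply sd() to every numeric column instead, you could replace c(points, assists) with where(is.numeric).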

Example 3: Calculate Standard Deviation of Multiple Variables, Grouped by Another Variable

The following code shows how to calculate the standard deviation of the points and the assists variables, grouped by team:

library(dplyr)

#calculate standard deviation of points and assists variables
df %>%
  group_by(team) %>%
  summarise(sd_points = sd(points, na.rm=TRUE),
            sd_assists = sd(assists, na.rm=TRUE))

# A tibble: 2 x 3
  team  sd_points sd_assists
  <chr>     <dbl>      <dbl>
1 A          4.27       1.26
2 B          9.91       2.94

The output displays the standard deviation for both the points and assists variables for team A and team B.
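You may want to keep every row of the original data frame and attach each team's standard deviation to it. In that case, you can use mutate() in place of summarise(). A minimal sketch:

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

# mutate() keeps all 8 rows; each row gets the standard deviation of its own team
res <- df %>%
  group_by(team) %>%
  mutate(sd_points = sd(points, na.rm=TRUE)) %>%
  ungroup()

res
# rows for team A show about 4.27 and rows for team B about 9.91, matching the summary above
```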

Note: To group by multiple variables, pass each of them to group_by(), separated by commas (e.g. group_by(var3, var4)).
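The data frame above has only one grouping column. The sketch below therefore adds a hypothetical half column purely for illustration and then groups by both team and half. The .groups = "drop" argument, available in dplyr 1.0.0 and later, returns an ungrouped result:

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(12, 15, 18, 22, 14, 17, 29, 35),
                 assists=c(4, 4, 3, 6, 7, 8, 3, 10))

# 'half' is a hypothetical second grouping column, alternating first/second row by row
df2 <- df %>%
  mutate(half = rep(c('first', 'second'), times = 4))

# one row per team/half combination (4 groups of 2 rows each)
res <- df2 %>%
  group_by(team, half) %>%
  summarise(sd_points = sd(points, na.rm=TRUE), .groups = "drop")

res
```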

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Filter for Unique Values Using dplyr
How to Filter by Multiple Conditions Using dplyr
How to Count Number of Occurrences in Columns in R
