How to Calculate a Rolling Standard Deviation in Pandas


Often you may want to calculate a rolling standard deviation for a specific column of a pandas DataFrame.

A rolling standard deviation is the standard deviation of a fixed-size window of the most recent values in a column, recalculated at each row.

The easiest way to calculate a rolling standard deviation in pandas is by using the Rolling.std() function, which uses the following basic syntax:

Rolling.std(ddof=1, numeric_only=False, engine=None, engine_kwargs=None)

where:

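  • ddof: Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of values in the window (default is 1)
  • numeric_only: Whether to include only float, int and boolean columns
  • engine: The engine to use for performing calculations ('cython' or 'numba')
  • engine_kwargs: An optional dictionary of keyword arguments to pass to the engine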
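To illustrate what ddof changes, here is a minimal sketch that compares the default (ddof=1, the sample standard deviation) with ddof=0 (the population standard deviation). The Series s and the variable names are made up for this illustration only.

```python
import pandas as pd

# Hypothetical values used only for illustration
s = pd.Series([12, 15, 29, 22, 30])

# ddof=1 (the default) divides by N - 1: the sample standard deviation
sample_std = s.rolling(3).std()

# ddof=0 divides by N: the population standard deviation
population_std = s.rolling(3).std(ddof=0)

# Show both results side by side
print(pd.DataFrame({'ddof=1': sample_std, 'ddof=0': population_std}))
```

For a window of 3 values, the ddof=0 result is always smaller than the ddof=1 result by a factor of the square root of 2/3.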

The following example shows how to use this syntax in practice to calculate a rolling standard deviation in a pandas DataFrame.

Example: How to Calculate Rolling Standard Deviation in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, 15, 29, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, 14, 18]})

#view DataFrame
print(df)

  team  points  assists
0    A      12        8
1    A      15       10
2    B      29       11
3    B      22       11
4    C      30        7
5    C      41       14
6    C      12       18

Suppose that we would like to calculate a rolling standard deviation for the values in the points column.

We can use the following syntax with the rolling.std() function to do so:

#calculate rolling standard deviation using the 3 most recent values
df['points'].rolling(3).std()

0          NaN
1          NaN
2     9.073772
3     7.000000
4     4.358899
5     9.539392
6    14.640128
Name: points, dtype: float64

The output displays the rolling standard deviation for the 3 most recent values in the points column.

In practice, we typically add a new column to the DataFrame to hold these rolling standard deviation values:

#calculate rolling standard deviation of values in points column
df['points_rolling_std'] = df['points'].rolling(3).std()

#view updated DataFrame
print(df)

  team  points  assists  points_rolling_std
0    A      12        8                 NaN
1    A      15       10                 NaN
2    B      29       11            9.073772
3    B      22       11            7.000000
4    C      30        7            4.358899
5    C      41       14            9.539392
6    C      12       18           14.640128

The new column named points_rolling_std contains the rolling standard deviation of the 3 most recent values in the points column.

For example, we can see:

  • The standard deviation of the first 3 points values (12, 15, 29) is 9.0738.
  • The standard deviation of the next window of 3 points values (15, 29, 22) is 7.
  • The standard deviation of the window after that (29, 22, 30) is 4.3589.

And so on.
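One way to check these values is to compute each window by hand. The sketch below rebuilds the same points values as a plain list and uses statistics.stdev from the Python standard library, which also computes the sample standard deviation. The names manual and rolling are illustrative only.

```python
import statistics

import pandas as pd

points = [12, 15, 29, 22, 30, 41, 12]
window = 3

# Sample standard deviation of each full window of 3 consecutive values
manual = [statistics.stdev(points[i - window + 1:i + 1])
          for i in range(window - 1, len(points))]

# The same calculation with pandas, dropping the leading NaN values
rolling = pd.Series(points).rolling(window).std().dropna().tolist()

# Print the two results next to each other for comparison
for m, r in zip(manual, rolling):
    print(round(m, 6), round(r, 6))
```

Each window overlaps the previous one by two values, which is why the results change gradually from row to row.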

Note that the first two values in the points_rolling_std column contain NaN values because there aren’t at least 3 values available to calculate the rolling standard deviation.
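If partial windows are acceptable for your analysis, one option is the min_periods argument of rolling(), which sets the minimum number of values a window needs before a result is returned. The following is a minimal sketch that assumes a 2-value standard deviation is meaningful for your data; the variable name partial is illustrative only.

```python
import pandas as pd

points = pd.Series([12, 15, 29, 22, 30, 41, 12], name='points')

# min_periods=2 lets a window return a result once it holds 2 values
partial = points.rolling(3, min_periods=2).std()

print(partial)
```

With this setting only the first row is NaN, since a single value is still below the minimum of 2. The second row uses just the values 12 and 15.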

To use a different number of recent values to calculate a rolling standard deviation, simply change the value in the rolling() function.

For example, we can use the following syntax to calculate a rolling standard deviation of values in the points column using the 4 most recent values:

#calculate rolling standard deviation using the 4 most recent values
df['points_rolling_std'] = df['points'].rolling(4).std()

#view updated DataFrame
print(df)

  team  points  assists  points_rolling_std
0    A      12        8                 NaN
1    A      15       10                 NaN
2    B      29       11                 NaN
3    B      22       11            7.593857
4    C      30        7            6.976150
5    C      41       14            7.852813
6    C      12       18           12.284814

The new column named points_rolling_std contains the rolling standard deviation of the 4 most recent values in the points column.

Note that the first three values in the column are all NaN because there aren’t at least 4 values available to calculate the rolling standard deviation.
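To compare window sizes side by side rather than overwriting a single column, one approach is to create one column per window size. This is a sketch only, and the column names points_std_3 and points_std_4 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'points': [12, 15, 29, 22, 30, 41, 12]})

# Create one column per window size (column names are illustrative)
for window in (3, 4):
    df[f'points_std_{window}'] = df['points'].rolling(window).std()

print(df)
```

In each column, the number of leading NaN values is one less than the window size.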

Note: The complete documentation for the Rolling.std() function is available in the official pandas API reference.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Calculate Rolling Correlation in Pandas
How to Calculate Rolling Median in Pandas
How to Calculate a Rolling Maximum in Pandas