How to Calculate a Rolling Sum in Pandas


Often you may want to calculate a rolling sum for a specific column of a pandas DataFrame.

A rolling sum is the sum of a fixed number of consecutive values in a given column, ending at the current row: the current value plus the values from the preceding periods.

The easiest way to calculate a rolling sum in pandas is by using the Rolling.sum() method, which uses the following basic syntax:

Rolling.sum(numeric_only=False, engine=None, engine_kwargs=None)

where:

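  • numeric_only: Whether to include only float, int and boolean columns (default is False)
  • engine: The engine to use for performing calculations, either 'cython' or 'numba' (None uses the default, normally 'cython')
  • engine_kwargs: An optional dictionary of keyword arguments to pass to the engine (only used with the 'numba' engine)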
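As a rough illustration of the numeric_only argument, the sketch below applies a rolling sum to an entire DataFrame that includes a text column. It uses the same data as the example later in this tutorial, and it assumes a pandas version in which Rolling.sum() accepts numeric_only (to my knowledge, 1.5 or later). The variable name numeric_rolling is only illustrative.

```python
import pandas as pd

#create DataFrame with one text column and two numeric columns
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, 15, 29, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, 14, 18]})

#calculate rolling sum for every numeric column, skipping the team column
#(assumes a pandas version where Rolling.sum() accepts numeric_only)
numeric_rolling = df.rolling(3).sum(numeric_only=True)

print(numeric_rolling)
```

With numeric_only=True, the team column should be left out of the result, so only the points and assists columns are summed.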

The following example shows how to use this syntax in practice to calculate a rolling sum in a pandas DataFrame.

Example: How to Calculate Rolling Sum in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [12, 15, 29, 22, 30, 41, 12],
                   'assists': [8, 10, 11, 11, 7, 14, 18]})

#view DataFrame
print(df)

  team  points  assists
0    A      12        8
1    A      15       10
2    B      29       11
3    B      22       11
4    C      30        7
5    C      41       14
6    C      12       18

Suppose that we would like to calculate a rolling sum for the values in the points column.

We can use the following syntax with the rolling.sum() function to do so:

#calculate rolling sum using the 3 most recent values
df['points'].rolling(3).sum()

0     NaN
1     NaN
2    56.0
3    66.0
4    81.0
5    93.0
6    83.0
Name: points, dtype: float64

The output displays the rolling sum for the 3 most recent values in the points column.

In practice, we typically add a new column to the DataFrame to hold these rolling sum values:

#calculate rolling sum of values in points column
df['points_rolling'] = df['points'].rolling(3).sum()

#view updated DataFrame
print(df)

  team  points  assists  points_rolling
0    A      12        8             NaN
1    A      15       10             NaN
2    B      29       11            56.0
3    B      22       11            66.0
4    C      30        7            81.0
5    C      41       14            93.0
6    C      12       18            83.0

The new column named points_rolling contains the rolling sum of the 3 most recent values in the points column.

For example, we can see:

  • The sum of the first 3 points values is 12 + 15 + 29 = 56.
  • The sum of the next 3 points values is 15 + 29 + 22 = 66.
  • The sum of the next 3 points values is 29 + 22 + 30 = 81.

And so on.
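To double-check the arithmetic above, one option is to compute the same sums by hand with plain Python and compare them to what pandas returns. This is only a verification sketch; the names manual_sums and pandas_sums are illustrative and not part of pandas.

```python
import pandas as pd

points = [12, 15, 29, 22, 30, 41, 12]
window = 3

#sum each run of 3 consecutive values by hand
manual_sums = [sum(points[i - window + 1:i + 1])
               for i in range(window - 1, len(points))]

#calculate the same sums with pandas, dropping the leading NaN values
pandas_sums = pd.Series(points).rolling(window).sum().dropna().tolist()

print(manual_sums)
print(pandas_sums)
```

Both lists should hold the same five sums, starting with 56 and ending with 83.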

Note that the first two values in the points_rolling column contain NaN values because there aren’t at least 3 values available to calculate the rolling sum.
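If the leading NaN values are not wanted, one possible approach is the min_periods argument of the rolling() function, which sets the minimum number of values required to produce a result. This argument is not used elsewhere in this tutorial, so treat the following as a sketch rather than the recommended method; the name partial_sums is illustrative.

```python
import pandas as pd

points = pd.Series([12, 15, 29, 22, 30, 41, 12], name='points')

#calculate rolling sum of up to 3 values, allowing windows with as few as 1 value
partial_sums = points.rolling(3, min_periods=1).sum()

print(partial_sums)
```

With min_periods=1, the first two rows hold the sums of the values available so far (12 and 12 + 15 = 27) instead of NaN, and the remaining rows are unchanged.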

To use a different number of recent values to calculate a rolling sum, simply change the value in the rolling() function.

For example, we can use the following syntax to calculate a rolling sum of values in the points column using the 4 most recent values:

#calculate rolling sum of values in points column using the 4 most recent values
df['points_rolling'] = df['points'].rolling(4).sum()

#view updated DataFrame
print(df)

  team  points  assists  points_rolling
0    A      12        8             NaN
1    A      15       10             NaN
2    B      29       11             NaN
3    B      22       11            78.0
4    C      30        7            96.0
5    C      41       14           122.0
6    C      12       18           105.0

The new column named points_rolling contains the rolling sum of the 4 most recent values in the points column.

Note that the first three values in the column are all NaN because there aren’t at least 4 values available to calculate the rolling sum.
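To see how the window size relates to the number of leading NaN values, the following sketch counts the NaN values for a few window sizes. The dictionary name nan_counts is illustrative only.

```python
import pandas as pd

points = pd.Series([12, 15, 29, 22, 30, 41, 12], name='points')

#count NaN values in the rolling sum for several window sizes
nan_counts = {}
for window in (2, 3, 4):
    rolling_sum = points.rolling(window).sum()
    nan_counts[window] = int(rolling_sum.isna().sum())

print(nan_counts)
```

With the default settings, a window of size n leaves the first n - 1 rows as NaN, so the counts here are 1, 2 and 3.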

Note: You can find the complete documentation for the rolling.sum() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Calculate Rolling Correlation in Pandas
How to Calculate Rolling Median in Pandas
How to Calculate a Rolling Maximum in Pandas
