Pandas: How to Calculate Cumulative Sum by Group


You can use the following syntax to calculate a cumulative sum by group in pandas:

df['cumsum_col'] = df.groupby(['col1'])['col2'].cumsum()

This particular formula calculates the cumulative sum of col2, grouped by col1, and displays the results in a new column titled cumsum_col.

The following example shows how to use this syntax in practice.

Example: Calculate Cumulative Sum by Group in Pandas

Suppose we have the following pandas DataFrame that contains information about sales for various stores:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'sales': [4, 7, 10, 5, 8, 9, 12, 15, 10, 8]})

#view DataFrame
print(df)

  store  sales
0     A      4
1     A      7
2     A     10
3     A      5
4     A      8
5     B      9
6     B     12
7     B     15
8     B     10
9     B      8

We can use the following syntax to calculate the cumulative sum of sales for each store:

#add column that shows cumulative sum of sales by store
df['cumsum_sales'] = df.groupby(['store'])['sales'].cumsum()

#view updated DataFrame
print(df)

  store  sales  cumsum_sales
0     A      4             4
1     A      7            11
2     A     10            21
3     A      5            26
4     A      8            34
5     B      9             9
6     B     12            21
7     B     15            36
8     B     10            46
9     B      8            54

The cumsum_sales column shows the cumulative sales, grouped by each store.

Note: You can find the complete documentation for the cumsum function in pandas here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Sum Specific Columns in Pandas
How to Sum Columns Based on a Condition in Pandas
How to Calculate a Reversed Cumulative Sum in Pandas

Leave a Reply

Your email address will not be published.