How to Use the cumprod() Function in Pandas


Often you may want to calculate a cumulative product of values in a pandas Series or column of a pandas DataFrame.

A cumulative product is the running product of a sequence of numbers: each entry is the product of all values up to and including that position. It’s useful whenever the result at each step depends on multiplying together everything that came before it.
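One setting where a running product comes up is compounding growth. The sketch below is only an illustration: the growth rates are made up, and the names growth, cumulative_factor and cumulative_growth are hypothetical. It turns each rate into a growth factor and takes the cumulative product of those factors.

```python
import pandas as pd

# hypothetical period-over-period growth rates: +10%, -5%, +20%
growth = pd.Series([0.10, -0.05, 0.20])

# turn each rate into a growth factor, then take the running product
cumulative_factor = (1 + growth).cumprod()

# total growth relative to the starting value after each period
cumulative_growth = cumulative_factor - 1

print(cumulative_growth.round(4).tolist())  # [0.1, 0.045, 0.254]
```

Each value is the total growth since the start, so after three periods the quantity is up about 25.4%.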

The easiest way to calculate this value in pandas is by using the cumprod() function, which performs this exact task.

The cumprod() function uses the following syntax:

pandas.DataFrame.cumprod(axis=None, skipna=True, …)

where:

  • axis: The axis along which to calculate the product (0 or 'index' works down each column; 1 or 'columns' works across each row)
  • skipna: Whether to exclude null values (default is True)

Note: If you’re using this function with a pandas Series then the axis parameter is not used and defaults to 0.
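To make the axis parameter concrete, here is a minimal sketch on a small, made-up DataFrame (the name demo and its values are illustrative only and are not used elsewhere in this tutorial):

```python
import pandas as pd

# small illustrative DataFrame
demo = pd.DataFrame({'a': [1, 2, 3],
                     'b': [4, 5, 6]})

# axis=0: running product down each column
down_columns = demo.cumprod(axis=0)
print(down_columns)
#    a    b
# 0  1    4
# 1  2   20
# 2  6  120

# axis=1: running product across each row
across_rows = demo.cumprod(axis=1)
print(across_rows)
#    a   b
# 0  1   4
# 1  2  10
# 2  3  18
```

With axis=1 the first column is unchanged, since it is the first value in each row.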

The following example shows how to use the cumprod() function in practice with a pandas DataFrame.

Example: How to Use the cumprod() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about total sales made by an employee at a company during 10 consecutive sales periods:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [4, 5, 5, 3, np.nan, 6, 12, 14, 9, 2]})

#view DataFrame
print(df)

   period  sales
0       1    4.0
1       2    5.0
2       3    5.0
3       4    3.0
4       5    NaN
5       6    6.0
6       7   12.0
7       8   14.0
8       9    9.0
9      10    2.0

Suppose that we would like to calculate the cumulative product of values in the sales column.

We can use the cumprod() function with the following syntax to do so:

#calculate cumulative product of sales
df['sales'].cumprod()

0          4.0
1         20.0
2        100.0
3        300.0
4          NaN
5       1800.0
6      21600.0
7     302400.0
8    2721600.0
9    5443200.0
Name: sales, dtype: float64

This returns the cumulative product of all values in the sales column of the DataFrame.

Here is how to interpret the output:

  • The cumulative product of the first value is 4.
  • The cumulative product of the first two values is 4*5=20.
  • The cumulative product of the first three values is 4*5*5=100.
  • The cumulative product of the first four values is 4*5*5*3=300.

And so on.
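The arithmetic above can be checked with a plain Python loop. This is only a sketch for verification, using the first four sales values; the variable names are illustrative.

```python
import pandas as pd

# first four sales values from the example
values = [4, 5, 5, 3]

# manual running product
running = []
product = 1
for v in values:
    product *= v
    running.append(product)

# same calculation with pandas
pandas_result = pd.Series(values).cumprod().tolist()

print(running)        # [4, 20, 100, 300]
print(pandas_result)  # [4, 20, 100, 300]
```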

A couple of notes about the values in the output:

  • The cumulative product of the first value will always be equal to the value itself.
  • The cumulative product is shown as NaN for any missing values in the column. 

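Note that the default behavior of the cumprod() function (skipna=True) is to skip null values in the column: the output is NaN at the missing position, but the running product continues with the next non-null value.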

However, suppose we instead specified skipna=False as follows:

#calculate cumulative product of sales, don't skip null values
df['sales'].cumprod(skipna=False)

0      4.0
1     20.0
2    100.0
3    300.0
4      NaN
5      NaN
6      NaN
7      NaN
8      NaN
9      NaN
Name: sales, dtype: float64

Notice that once the first null value is encountered in this example, every value after it in the column is set to null as well.

Depending on the type of data that you’re working with, you may want to specify skipna=False so that the cumulative product is no longer calculated once a missing value is encountered in the column.
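If neither behavior is quite what you want, one further option is to keep the default skipna=True and then forward-fill the gap with ffill(), so that the row with the missing value shows the last computed product instead of NaN. This is a minimal sketch, not part of the example above; it uses a shortened copy of the sales values.

```python
import numpy as np
import pandas as pd

# shortened copy of the sales column, including the missing value
sales = pd.Series([4, 5, 5, 3, np.nan, 6], name='sales')

# default behavior: NaN at the missing position, product continues afterwards
running = sales.cumprod()

# carry the last computed product forward into the gap
filled = running.ffill()

print(filled.tolist())  # [4.0, 20.0, 100.0, 300.0, 300.0, 1800.0]
```

Whether a filled value is appropriate depends on what a missing period means in your data.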

Note: You can find the complete documentation for the cumprod() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Calculate Cumulative Sum by Group in Pandas
How to Calculate a Reversed Cumulative Sum in Pandas
How to Use the nunique() Function in Pandas
How to Use idxmin() Function in Pandas
