How to Use the Rolling.apply() Function in Pandas


Often you may want to calculate some rolling value based on a custom function in a pandas DataFrame.

The easiest way to do so is by using the Rolling.apply() function, which uses the following syntax:

Rolling.apply(func, raw=False, …)

where:

  • func: A custom function that takes each window of values and returns a single value
  • raw: Whether to pass each window to the function as a NumPy array (True) or as a Series (False, the default)
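To make the raw argument more concrete, here is a minimal sketch that records the type of object pandas hands to the custom function under each setting. The Series values and the helper names sum_default and sum_raw are made up for this illustration only:

```python
import pandas as pd

#small made-up Series used only for this illustration
s = pd.Series([1, 2, 3, 4, 5])

#names of the object types that pandas passes to each function
seen_default = []
seen_raw = []

def sum_default(x):
    seen_default.append(type(x).__name__)
    return x.sum()

def sum_raw(x):
    seen_raw.append(type(x).__name__)
    return x.sum()

#raw=False is the default, so each window arrives as a Series
result_default = s.rolling(3).apply(sum_default)

#raw=True passes each window as a NumPy array instead
result_raw = s.rolling(3).apply(sum_raw, raw=True)

print(seen_default[0], seen_raw[0])

#Series ndarray
```

Both calls return the same rolling sums. Using raw=True skips building a Series for every window, which is usually faster when the function only needs the values and not the index.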

This function is particularly useful when you want to perform some calculation on a rolling basis that is more complex than a simple aggregation such as a sum, mean, standard deviation, etc.

The following example shows how to use the Rolling.apply() function in practice with a pandas DataFrame.

Example: How to Use the Rolling.apply() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about the total sales made by some employee at a company during 10 consecutive sales periods:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [30, 28, 28, 22, 30, 41, 12, 30, 40, 44]})

#view DataFrame
print(df)

   period  sales
0       1     30
1       2     28
2       3     28
3       4     22
4       5     30
5       6     41
6       7     12
7       8     30
8       9     40
9      10     44

Suppose that we would like to calculate a rolling range (max sales – min sales) for the sales based on a 4-period rolling basis.

We can use the following syntax to do so:

#calculate rolling range based on 4-period rolling basis
df['sales'].rolling(4).apply(lambda x: x.max() - x.min())

0     NaN
1     NaN
2     NaN
3     8.0
4     8.0
5    19.0
6    29.0
7    29.0
8    29.0
9    32.0
Name: sales, dtype: float64

The output displays the rolling range based on a 4-period rolling basis.

Note: The first three values in the output are NaN values because we didn’t have at least four periods to use for those calculations.

Here is how to interpret the output:

  • The range for rows 0 through 3 (sales periods 1 through 4) is calculated as 30 – 22 = 8.
  • The range for rows 1 through 4 (sales periods 2 through 5) is calculated as 30 – 22 = 8.
  • The range for rows 2 through 5 (sales periods 3 through 6) is calculated as 41 – 22 = 19.
  • The range for rows 3 through 6 (sales periods 4 through 7) is calculated as 41 – 12 = 29.

And so on.
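One way to sanity-check a custom function is to compare its output against built-in rolling methods when an equivalent exists. Since the range is just the maximum minus the minimum, the following sketch rebuilds the same DataFrame and compares the two approaches (the variable names custom_range and builtin_range are chosen only for this illustration):

```python
import pandas as pd

#recreate the DataFrame from this example
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [30, 28, 28, 22, 30, 41, 12, 30, 40, 44]})

#rolling range using a custom function
custom_range = df['sales'].rolling(4).apply(lambda x: x.max() - x.min())

#rolling range using the built-in rolling max and min
builtin_range = df['sales'].rolling(4).max() - df['sales'].rolling(4).min()

#check that both approaches agree (NaN values in the same rows count as equal)
print(custom_range.equals(builtin_range))

#True
```

When a built-in equivalent like this exists, it is generally the faster choice. Rolling.apply() is most useful for calculations that have no built-in counterpart.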

Note that the value passed to the rolling() function specifies the number of values to use in each rolling calculation.

We chose to use a value of 4 in this example but you can choose any value that you’d like based on the number of periods that makes sense for your particular DataFrame.

You can use any custom function that you’d like within the apply() function as well.

For example, you could use the following syntax to calculate the mean sales multiplied by 2 during a 4-period rolling basis:

#calculate rolling mean * 2 based on 4-period rolling basis
df['sales'].rolling(4).apply(lambda x: x.mean() * 2)

0     NaN
1     NaN
2     NaN
3    54.0
4    54.0
5    60.5
6    52.5
7    56.5
8    61.5
9    63.0
Name: sales, dtype: float64

The output displays the mean sales multiplied by 2 during a 4-period rolling basis.
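The custom function does not have to be a lambda. As a rough sketch, the hypothetical named function scaled_mean() below reproduces the same calculation, with the multiplier supplied through the args parameter of apply() and raw=True so that each window arrives as a NumPy array:

```python
import pandas as pd

#recreate the DataFrame from this example
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [30, 28, 28, 22, 30, 41, 12, 30, 40, 44]})

#hypothetical helper: mean of the window multiplied by some factor
def scaled_mean(x, factor):
    return x.mean() * factor

#pass the factor as an extra positional argument through args
scaled = df['sales'].rolling(4).apply(scaled_mean, raw=True, args=(2,))

#same calculation written as a lambda, for comparison
with_lambda = df['sales'].rolling(4).apply(lambda x: x.mean() * 2)

print(scaled.equals(with_lambda))

#True
```

A named function is easier to reuse and test than a lambda, and passing the multiplier through args lets you change it without rewriting the function.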

Once again note that NaN values are shown for the first three values in the column because we didn’t have at least 4 periods available to perform the calculation for those rows.
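If you would rather not have NaN values at the start, the rolling() function also accepts a min_periods argument that sets the minimum number of values a window needs before a result is calculated. The following sketch applies the rolling range function with min_periods=1, which is an arbitrary choice made for this illustration:

```python
import pandas as pd

#recreate the DataFrame from this example
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [30, 28, 28, 22, 30, 41, 12, 30, 40, 44]})

#calculate rolling range, allowing windows with as few as 1 value
partial_range = df['sales'].rolling(4, min_periods=1).apply(lambda x: x.max() - x.min())

#view the first four values
print(partial_range.head(4).tolist())

#[0.0, 2.0, 2.0, 8.0]
```

With this setting the first three rows are calculated from however many values are available so far, so they are based on fewer than four periods and should be interpreted with that in mind.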

Note: You can find the complete documentation for the Rolling.apply() function in the official pandas API reference.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Use the mask() Function in Pandas
How to Use the nunique() Function in Pandas
How to Use the get_loc() Function in Pandas
How to Create a Tuple from Two Columns in Pandas
