How to Calculate a Rolling Mean in Pandas


A rolling mean is simply the mean of a certain number of previous periods in a time series.

To calculate the rolling mean for one or more columns in a pandas DataFrame, we can use the following syntax:

df['column_name'].rolling(rolling_window).mean()

This tutorial provides several examples of how to use this function in practice.

Example: Calculate the Rolling Mean in Pandas

Suppose we have the following pandas DataFrame:

import numpy as np
import pandas as pd

#make this example reproducible
np.random.seed(0)

#create dataset
period = np.arange(1, 101, 1)
leads = np.random.uniform(1, 20, 100)
sales = 60 + 2*period + np.random.normal(loc=0, scale=.5*period, size=100)
df = pd.DataFrame({'period': period, 'leads': leads, 'sales': sales})

#view first 10 rows
df.head(10)

   period	    leads	    sales
0	1	11.427457	61.417425
1	2	14.588598	64.900826
2	3	12.452504	66.698494
3	4	11.352780	64.927513
4	5	9.049441	73.720630
5	6	13.271988	77.687668
6	7	9.314157	78.125728
7	8	17.943687	75.280301
8	9	19.309592	73.181613
9	10	8.285389	85.272259

We can use the following syntax to create a new column that contains the rolling mean of ‘sales’ for the previous 5 periods:

#find rolling mean of previous 5 sales periods
df['rolling_sales_5'] = df['sales'].rolling(5).mean()

#view first 10 rows
df.head(10)

	period	    leads	    sales	rolling_sales_5
0	1	11.427457	61.417425	NaN
1	2	14.588598	64.900826	NaN
2	3	12.452504	66.698494	NaN
3	4	11.352780	64.927513	NaN
4	5	9.049441	73.720630	66.332978
5	6	13.271988	77.687668	69.587026
6	7	9.314157	78.125728	72.232007
7	8	17.943687	75.280301	73.948368
8	9	19.309592	73.181613	75.599188
9	10	8.285389	85.272259	77.909514

We can manually verify that the rolling mean sales displayed for period 5 is the mean of the previous 5 periods:

Rolling mean at period 5: (61.417+64.900+66.698+64.927+73.720)/5 = 66.33

We can use similar syntax to calculate the rolling mean of multiple columns:

#find rolling mean of previous 5 leads periods 
df['rolling_leads_5'] = df['leads'].rolling(5).mean() 

#find rolling mean of previous 5 leads periods
df['rolling_sales_5'] = df['sales'].rolling(5).mean()

#view first 10 rows
df.head(10)

	period	    leads	    sales	rolling_sales_5	 rolling_leads_5
0	1	11.427457	61.417425	NaN	NaN
1	2	14.588598	64.900826	NaN	NaN
2	3	12.452504	66.698494	NaN	NaN
3	4	11.352780	64.927513	NaN	NaN
4	5	9.049441	73.720630	66.332978	 11.774156
5	6	13.271988	77.687668	69.587026	 12.143062
6	7	9.314157	78.125728	72.232007	 11.088174
7	8	17.943687	75.280301	73.948368	 12.186411
8	9	19.309592	73.181613	75.599188	 13.777773
9	10	8.285389	85.272259	77.909514	 13.624963

We can also create a quick line plot using Matplotlib to visualize the raw sales compared to the rolling mean of sales:

import matplotlib.pyplot as plt
plt.plot(df['rolling_sales_5'], label='Rolling Mean')
plt.plot(df['sales'], label='Raw Data')
plt.legend()
plt.ylabel('Sales')
plt.xlabel('Period')
plt.show()

Plot rolling mean in pandas in Python

The blue line displays the 5-period rolling mean of sales and the orange line displays the raw sales data.

Additional Resources

How to Calculate Rolling Correlation in Pandas
How to Calculate the Mean of Columns in Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *