To resample time series data means to aggregate the data by a new time period.
If you’d like to resample a time series in pandas while using the groupby operator, you can use the following basic syntax:
grouper = df.groupby([pd.Grouper(freq='W'), 'store']) result = grouper['sales'].sum().unstack('store').fillna(0)
This particular example groups the rows in the DataFrame by the store column, then resamples the time series by week (freq=’W’), then calculates the sum of values in the sales column.
Note that we can resample the time series data by various time periods, including:
- S: Seconds
- min: Minutes
- H: Hours
- D: Day
- W: Week
- M: Month
- Q: Quarter
- A: Year
The following example shows how to resample time series data with a groupby operation in practice.
Example: Resample Time Series with groupby in Pandas
Suppose we have the following pandas DataFrame that shows the total sales made each day at two different stores:
import pandas as pd #create DataFrame df = pd.DataFrame({'sales': [13, 14, 17, 17, 16, 22, 28, 10, 17, 10, 11], 'store': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B']}, index=pd.date_range('2023-01-06', '2023-01-16', freq='d')) #view DataFrame print(df) sales store 2023-01-06 13 A 2023-01-07 14 A 2023-01-08 17 A 2023-01-09 17 A 2023-01-10 16 A 2023-01-11 22 B 2023-01-12 28 B 2023-01-13 10 B 2023-01-14 17 B 2023-01-15 10 B 2023-01-16 11 B
Suppose we would like to group the rows by store, then resamples the time series by week, then calculates the sum of values in the sales column.
We can use the following syntax to do so:
#group by store and resample time series by week grouper = df.groupby([pd.Grouper(freq='W'), 'store']) #calculate sum of sales each week by store result = grouper['sales'].sum().unstack('store').fillna(0) #view results print(result) store A B 2023-01-08 14.0 0.0 2023-01-15 16.5 17.0 2023-01-22 0.0 11.0
From the output we cans see:
- The sum of sales on the week ending 1/8/2023 at store A is 14.
- The sum of sales on the week ending 1/8/2023 at store B is 0.
And so on.
Note that in this example we chose to calculate the sum of values in the sales column.
However, we could choose some other metric such as the count, mean, median, min, max, etc.
Simply replace sum() in the code above with count(), mean(), median(), etc. to calculate whatever metric you’d like.
Additional Resources
The following tutorials explain how to perform other common operations in Python:
How to Plot a Time Series in Matplotlib
How to Plot a Time Series in Seaborn
How to Calculate MAPE of Time Series in Python