You can use the following syntax to group data by hour and perform some aggregation in pandas:
df.groupby([df['time'].dt.hour]).sales.sum()
This particular example groups the values by hour in a column called time and then calculates the sum of values in the sales column for each hour.
The following example shows how to use this syntax in practice.
Example: Group Data by Hour in Pandas
Suppose we have the following pandas DataFrame that shows the number of sales made at various times throughout the day for some store:
import pandas as pd #create DataFrame df = pd.DataFrame({'time': ['2022-01-01 01:14:00', '2022-01-01 01:24:15', '2022-01-01 02:52:19', '2022-01-01 02:54:00', '2022-01-01 04:05:10', '2022-01-01 05:35:09'], 'sales': [18, 20, 15, 14, 10, 9]}) #convert date column to datetime df['time'] = pd.to_datetime(df['time']) #view DataFrame print(df) time sales 0 2022-01-01 01:14:00 18 1 2022-01-01 01:24:15 20 2 2022-01-01 02:52:19 15 3 2022-01-01 02:54:00 14 4 2022-01-01 04:05:10 10 5 2022-01-01 05:35:09 9
We can use the following syntax to group the time column by hours and calculate the sum of sales for each hour:
#group by hours in time column and calculate sum of sales
df.groupby([df['time'].dt.hour]).sales.sum()
time
1 38
2 29
4 10
5 9
Name: sales, dtype: int64
From the output we can see:
- A total of 38 sales were made during the first hour.
- A total of 29 sales were made during the second hour.
- A total of 10sales were made during the fourth hour.
- A total of 9 sales were made during the fifth hour.
Note that we can also perform some other aggregation.
For example, we could calculate the mean number of sales per hour:
#group by hours in time column and calculate mean of sales
df.groupby([df['time'].dt.hour]).sales.mean()
time
1 19.0
2 14.5
4 10.0
5 9.0
Name: sales, dtype: float64
We can also group by hours and minutes if we’d like.
For example, the following code shows how to calculate the sum of sales, grouped by hours and minutes:
#group by hours and minutes in time column and calculate mean of sales
df.groupby([df['time'].dt.hour, df['time'].dt.minute]).sales.mean()
time time
1 14 18
24 20
2 52 15
54 14
4 5 10
5 35 9
Name: sales, dtype: int64
From the output we can see:
- The mean number of sales during 1:14 was 18.
- The mean number of sales during 1:23 was 20.
- The mean number of sales during 2:52 was 15.
And so on.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Create a Date Range in Pandas
How to Extract Month from Date in Pandas
How to Convert Timestamp to Datetime in Pandas