How to Group Data by Hour in Pandas (With Example)


You can use the following syntax to group data by hour and perform some aggregation in pandas:

df.groupby([df['time'].dt.hour]).sales.sum()

This particular example groups the values by hour in a column called time and then calculates the sum of values in the sales column for each hour.

The following example shows how to use this syntax in practice.

Example: Group Data by Hour in Pandas

Suppose we have the following pandas DataFrame that shows the number of sales made at various times throughout the day for some store:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'time': ['2022-01-01 01:14:00', '2022-01-01 01:24:15',
                            '2022-01-01 02:52:19', '2022-01-01 02:54:00',
                            '2022-01-01 04:05:10', '2022-01-01 05:35:09'],
                   'sales': [18, 20, 15, 14, 10, 9]})

#convert date column to datetime
df['time'] = pd.to_datetime(df['time'])

#view DataFrame
print(df)

                 time  sales
0 2022-01-01 01:14:00     18
1 2022-01-01 01:24:15     20
2 2022-01-01 02:52:19     15
3 2022-01-01 02:54:00     14
4 2022-01-01 04:05:10     10
5 2022-01-01 05:35:09      9

We can use the following syntax to group the time column by hours and calculate the sum of sales for each hour:

#group by hours in time column and calculate sum of sales
df.groupby([df['time'].dt.hour]).sales.sum()

time
1    38
2    29
4    10
5     9
Name: sales, dtype: int64

From the output we can see:

  • A total of 38 sales were made during the first hour.
  • A total of 29 sales were made during the second hour.
  • A total of 10sales were made during the fourth hour.
  • A total of 9 sales were made during the fifth hour.

Note that we can also perform some other aggregation.

For example, we could calculate the mean number of sales per hour:

#group by hours in time column and calculate mean of sales
df.groupby([df['time'].dt.hour]).sales.mean()

time
1    19.0
2    14.5
4    10.0
5     9.0
Name: sales, dtype: float64

We can also group by hours and minutes if we’d like.

For example, the following code shows how to calculate the sum of sales, grouped by hours and minutes:

#group by hours and minutes in time column and calculate mean of sales
df.groupby([df['time'].dt.hour, df['time'].dt.minute]).sales.mean()

time  time
1     14      18
      24      20
2     52      15
      54      14
4     5       10
5     35       9
Name: sales, dtype: int64

From the output we can see:

  • The mean number of sales during 1:14 was 18.
  • The mean number of sales during 1:23 was 20.
  • The mean number of sales during 2:52 was 15.

And so on.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Create a Date Range in Pandas
How to Extract Month from Date in Pandas
How to Convert Timestamp to Datetime in Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *