How to Use the floor() Function in Pandas


Often you may want to round each datetime value in a column or index of a pandas DataFrame down to a given frequency, such as the hour or the day.

The most efficient way to do so is by using the floor() function, which is designed to perform this exact task.

The floor() function uses the following syntax:

pandas.Series.dt.floor(freq, …)

where:

  • freq: The frequency to use to floor values
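The same method is also available directly on a DatetimeIndex, without the .dt accessor. The following is a minimal sketch of both forms, using three made-up timestamps that are not part of the main example:

```python
import pandas as pd

#create a DatetimeIndex with three example timestamps (values chosen for illustration)
idx = pd.DatetimeIndex(['2018-01-01 11:59:00',
                        '2018-01-01 12:29:00',
                        '2018-01-01 12:59:00'])

#floor a DatetimeIndex directly (no .dt accessor needed)
idx_floor = idx.floor('h')

#floor a datetime Series through the .dt accessor
ser_floor = pd.Series(idx).dt.floor('h')

print(idx_floor)
print(ser_floor)
```

Both calls should produce the same floored timestamps. The only difference is whether the values live in an index or in a column.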

Here are some of the most common values you may provide to the freq argument:

  • ms: milliseconds
  • s: seconds
  • min: minutes
  • h: hours
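  • D: days

Note: The floor() function only accepts fixed frequencies. Aliases such as W (weeks) and ME (month end) do not have a fixed length, so passing them to floor() raises a ValueError.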

You can find a complete list of frequency aliases in the offset aliases section of the pandas time series documentation.
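To get a feel for how the freq value changes the result, the sketch below floors one made-up timestamp to several fixed frequencies. The timestamp is an assumption chosen for illustration, and the last entry, '15min', relies on the fact that pandas accepts a numeric multiplier in front of a fixed-frequency alias:

```python
import pandas as pd

#create a Series with a single example timestamp (value chosen for illustration)
s = pd.Series(pd.to_datetime(['2018-01-01 11:59:37.250']))

#floor the same timestamp to several fixed frequencies
floored = {freq: s.dt.floor(freq)[0] for freq in ['s', 'min', '15min', 'h', 'D']}

#view the result for each frequency
for freq, value in floored.items():
    print(freq, value)
```

Each result keeps everything above the chosen unit and sets everything below it to zero, so the coarser the frequency, the more of the timestamp is discarded.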

The following example shows how to use the floor() function in practice with a pandas DataFrame.

Example: How to Use the floor() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about the total sales made by an employee at a company, recorded at 10 timestamps spaced 30 minutes apart:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'sales': [2, 5, 5, 4, 7, 8, 9, 12, 10, 14],
                   'time': pd.date_range('1/1/2018 11:59:00', periods=10, freq='30min')})

#view DataFrame
print(df)

   sales                time
0      2 2018-01-01 11:59:00
1      5 2018-01-01 12:29:00
2      5 2018-01-01 12:59:00
3      4 2018-01-01 13:29:00
4      7 2018-01-01 13:59:00
5      8 2018-01-01 14:29:00
6      9 2018-01-01 14:59:00
7     12 2018-01-01 15:29:00
8     10 2018-01-01 15:59:00
9     14 2018-01-01 16:29:00

Currently the values in the time column of the DataFrame are spaced 30 minutes apart.

However, suppose that we would like to floor each of the values in the time column so that each time is rounded down to the nearest hour.

We can use the floor() function with the following syntax to do so:

#floor the values of the 'time' column to nearest hour
df['time'].dt.floor('h')

0   2018-01-01 11:00:00
1   2018-01-01 12:00:00
2   2018-01-01 12:00:00
3   2018-01-01 13:00:00
4   2018-01-01 13:00:00
5   2018-01-01 14:00:00
6   2018-01-01 14:00:00
7   2018-01-01 15:00:00
8   2018-01-01 15:00:00
9   2018-01-01 16:00:00
Name: time, dtype: datetime64[ns]

Notice that each value in the returned Series is the corresponding time rounded down to the start of its hour. The original DataFrame is not modified.

By specifying ‘h’ as the frequency argument of the floor() function, we specified that we wanted to use hours as the frequency.
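Because floor() always rounds down, it can help to see it next to the related dt.round() and dt.ceil() methods, which round to the nearest value and round up, respectively. The following is a small sketch that recreates only the first three times from the example. The time and compare names are introduced here for illustration:

```python
import pandas as pd

#recreate the first three values of the 'time' column from the example
time = pd.Series(pd.date_range('1/1/2018 11:59:00', periods=3, freq='30min'))

#compare rounding down, rounding to nearest, and rounding up to the hour
compare = pd.DataFrame({'time': time,
                        'floor': time.dt.floor('h'),
                        'round': time.dt.round('h'),
                        'ceil': time.dt.ceil('h')})

#view comparison
print(compare)
```

For the time 11:59:00, floor() returns 11:00:00 while both round() and ceil() return 12:00:00, which shows why floor() is the right choice when each time should stay within its own hour.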

We could also create a new column that contains these floored values.

For example, we could use the following syntax to create a new column named time_floor:

#create new column to floor the values of the 'time' column to nearest hour
df['time_floor'] = df['time'].dt.floor('h')

#view updated DataFrame
print(df)

   sales                time          time_floor
0      2 2018-01-01 11:59:00 2018-01-01 11:00:00
1      5 2018-01-01 12:29:00 2018-01-01 12:00:00
2      5 2018-01-01 12:59:00 2018-01-01 12:00:00
3      4 2018-01-01 13:29:00 2018-01-01 13:00:00
4      7 2018-01-01 13:59:00 2018-01-01 13:00:00
5      8 2018-01-01 14:29:00 2018-01-01 14:00:00
6      9 2018-01-01 14:59:00 2018-01-01 14:00:00
7     12 2018-01-01 15:29:00 2018-01-01 15:00:00
8     10 2018-01-01 15:59:00 2018-01-01 15:00:00
9     14 2018-01-01 16:29:00 2018-01-01 16:00:00

The new column named time_floor contains the values from the time column floored down to the nearest hour.
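One common reason to floor times is to group rows by hour. As a sketch of that idea (the hourly_sales name is introduced here and is not part of the original example), we could sum the sales that fall within each hour:

```python
import pandas as pd

#recreate the DataFrame from the example
df = pd.DataFrame({'sales': [2, 5, 5, 4, 7, 8, 9, 12, 10, 14],
                   'time': pd.date_range('1/1/2018 11:59:00', periods=10, freq='30min')})

#floor each time to the hour, then sum the sales within each hour
hourly_sales = df.groupby(df['time'].dt.floor('h'))['sales'].sum()

#view total sales by hour
print(hourly_sales)
```

The result has one row per hour. For example, the two rows at 12:29:00 and 12:59:00 both fall in the 12:00:00 group, so their sales of 5 and 5 add up to 10.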

Also note that we could floor the values to a larger frequency interval if we’d like.

For example, we could use the alias ‘D’ in the freq argument to specify that we’d like each time value rounded down to the start of its day.

The following syntax shows how to do so:

#create new column to floor the values of the 'time' column to nearest day
df['time_floor'] = df['time'].dt.floor('D')

#view updated DataFrame
print(df)


   sales                time time_floor
0      2 2018-01-01 11:59:00 2018-01-01
1      5 2018-01-01 12:29:00 2018-01-01
2      5 2018-01-01 12:59:00 2018-01-01
3      4 2018-01-01 13:29:00 2018-01-01
4      7 2018-01-01 13:59:00 2018-01-01
5      8 2018-01-01 14:29:00 2018-01-01
6      9 2018-01-01 14:59:00 2018-01-01
7     12 2018-01-01 15:29:00 2018-01-01
8     10 2018-01-01 15:59:00 2018-01-01
9     14 2018-01-01 16:29:00 2018-01-01

The new time_floor column contains the values from the time column floored to the nearest day.

In this particular example, each of the time values is floored to 2018-01-01 since every timestamp occurs during this day.
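Frequencies larger than a day, such as weeks or months, do not have a fixed length, and floor() raises a ValueError for them. One possible workaround, sketched below with three made-up timestamps, is to convert the values to monthly periods with dt.to_period() and take the start time of each period. This is an alternative technique rather than a use of floor() itself:

```python
import pandas as pd

#create a Series of three example timestamps (values chosen for illustration)
s = pd.Series(pd.date_range('2018-01-10 11:59:00', periods=3, freq='D'))

#floor() rejects non-fixed frequencies such as weeks
try:
    s.dt.floor('W')
    floor_error = None
except ValueError as err:
    floor_error = str(err)

#assumed workaround: convert to monthly periods, then take each period's start
month_start = s.dt.to_period('M').dt.start_time

print(floor_error)
print(month_start)
```

Here every value in month_start is the first day of January 2018, since all three timestamps fall in that month.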

Note: You can find the complete documentation for the floor() function in the pandas API reference for Series.dt.floor.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Use the Rolling.apply() Function in Pandas
How to Use the nunique() Function in Pandas
How to Use the get_loc() Function in Pandas
How to Use idxmin() Function in Pandas
