How to Use Rolling.quantile() Function in Pandas


Often you may want to calculate a rolling quantile for a specific column of a pandas DataFrame.

A rolling quantile is simply a specific quantile of a certain number of previous periods in a given column.

The easiest way to calculate a rolling quantile in pandas is by using the rolling.quantile() function, which uses the following basic syntax:

rolling.quantile(q, interpolation=’linear’, numeric_only=False)

where:

  • q: Quantile to calculate (between 0 and 1, inclusive)
  • interpolation: Interpolation method to use (default is linear)
  • numeric_only: Whether to include only int, float and boolean columns

The following example shows how to use this syntax in practice to calculate a rolling quantile in a pandas DataFrame.

Example: How to Calculate Rolling Quantiles in Pandas

Suppose we create the following pandas DataFrame that contains information about sales made by a particular employee at a company during 15 consecutive days:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
                   'sales': [4, 5, 7, 7, 6, 8, 10, 11, 15, 19, 13, 12, 15, 10, 11]})

#view DataFrame
print(df)

    day  sales
0     1      4
1     2      5
2     3      7
3     4      7
4     5      6
5     6      8
6     7     10
7     8     11
8     9     15
9    10     19
10   11     13
11   12     12
12   13     15
13   14     10
14   15     11

Suppose that we would like to calculate the rolling 90th percentile value for the sales column.

We can use the following syntax with the rolling.quantile() function to do so:

#calculate rolling 90th percentile using most 5 most recent values
df['sales'].rolling(5).quantile(0.9)

0      NaN
1      NaN
2      NaN
3      NaN
4      7.0
5      7.6
6      9.2
7     10.6
8     13.4
9     17.4
10    17.4
11    17.4
12    17.4
13    17.4
14    14.2

The output displays the rolling 90th percentile for the 5 most recent values int the sales column.

In practice, we typically add a new column to the DataFrame to hold these rolling percentile values:

#calculate rolling 90th percentile of values in sales column
df['sales_rolling90'] = df['sales'].rolling(5).quantile(0.9)

#view updated DataFrame
print(df)

    day  sales  sales_rolling90
0     1      4              NaN
1     2      5              NaN
2     3      7              NaN
3     4      7              NaN
4     5      6              7.0
5     6      8              7.6
6     7     10              9.2
7     8     11             10.6
8     9     15             13.4
9    10     19             17.4
10   11     13             17.4
11   12     12             17.4
12   13     15             17.4
13   14     10             17.4
14   15     11             14.2

The new column named saleS_rolling90 contains the rolling 90th percentile of the 5 most recent values in the sales column.

For example, we can see:

  • The 90th percentile of the first 5 sales values is 7.
  • The 90th percentile of the next 5 sales values is 7.6.
  • The 90th percentile of the next 5 sales values is 9.2.
  • The 90th percentile of the next 5 sales values is 10.6.

And so on.

Note that the first four values in the sales_rolling90 column contain NaN values because there aren’t at least 5 values available to calculate the rolling 90th percentile.

To use a different number of recent values to calculate a rolling 90th percentile, simply change the value in the rolling() function.

For example, we can use the following syntax to calculate a rolling 90th percentile of values in the sales column using the 7 most recent values:

#calculate rolling 90th percentile of values in sales column
df['sales_rolling90'] = df['sales'].rolling(7).quantile(0.9)

#view updated DataFrame
print(df)

    day  sales  sales_rolling90
0     1      4              NaN
1     2      5              NaN
2     3      7              NaN
3     4      7              NaN
4     5      6              NaN
5     6      8              NaN
6     7     10              8.8
7     8     11             10.4
8     9     15             12.6
9    10     19             16.6
10   11     13             16.6
11   12     12             16.6
12   13     15             16.6
13   14     10             16.6
14   15     11             16.6

The new column named sales_rolling90 contains the rolling 90th percentile of the 7 most recent values in the sales column.

Note that the first six values in the column are all NaN because we don’t have at least seven recent values to use to calculate the rolling 90th percentile.

Note: You can find the complete documentation for the rolling.quantile() function in pandas here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Calculate a Rolling Standard Deviation in Pandas
How to Calculate Rolling Correlation in Pandas
How to Calculate Rolling Median in Pandas
How to Calculate a Rolling Maximum in Pandas

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *