How to Use qcut() in Pandas (With Examples)


Often you may want to cut the values in a pandas Series into a specific number of quantile-based bins, so that each bin holds roughly the same number of values.

The easiest way to do so is by using the qcut() function, which uses the following syntax:

pandas.qcut(x, q, labels=None, …)

where:

  • x: The pandas Series (or 1-D array) of values to bin
  • q: Number of quantiles (e.g. 10 for deciles, 4 for quartiles), or a list of quantile edges
  • labels: Labels for the resulting bins

Note that if you don’t specify labels for the resulting bins, each bin is labeled with the interval it covers, such as (14.0, 15.0]. The lower bound of each interval is excluded and the upper bound is included.
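As a quick illustration of these arguments, here is a minimal sketch on a small made-up Series (the data and variable names are hypothetical, not part of the example that follows). It passes q as an integer and as a list of quantile edges, and it uses labels=False, which returns integer bin codes instead of interval labels:

```python
import pandas as pd

# Hypothetical data, used only for illustration
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# q as an integer: two quantile-based bins, split at the median (4.5)
halves = pd.qcut(s, q=2)

# q as a list of quantile edges: the same split, spelled out explicitly
halves_explicit = pd.qcut(s, q=[0, 0.5, 1])

# labels=False returns the integer position of each bin instead of intervals
codes = pd.qcut(s, q=2, labels=False)

print(halves.cat.categories)
print(codes.tolist())
```

Both calls with q=2 and q=[0, 0.5, 1] should assign every value to the same bin, since the list simply writes out the quantile edges that the integer form implies.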

The following example shows how to use the qcut() function in practice with a pandas DataFrame.

Example: How to Use qcut() in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                   'points': [8, 12, 14, 14, 18, 15, 39, 24, 28]})

#view DataFrame
print(df)

  player  points
0      A       8
1      B      12
2      C      14
3      D      14
4      E      18
5      F      15
6      G      39
7      H      24
8      I      28

The DataFrame contains information about the names of various basketball players along with the number of points they scored.

Suppose that we would like to categorize each player into bins based on the number of points they scored.

We can use the qcut() function in pandas, which “cuts” a pandas Series into bins based on the quantiles of its values.

We can use the following syntax to categorize each player into one of four bins based on the values in the points column of the DataFrame:

#cut values in 'points' column into four groups
pd.qcut(df['points'], q=4)

0    (7.999, 14.0]
1    (7.999, 14.0]
2    (7.999, 14.0]
3    (7.999, 14.0]
4     (15.0, 24.0]
5     (14.0, 15.0]
6     (24.0, 39.0]
7     (15.0, 24.0]
8     (24.0, 39.0]
Name: points, dtype: category
Categories (4, interval[float64, right]):
[(7.999, 14.0] < (14.0, 15.0] < (15.0, 24.0] < (24.0, 39.0]]

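The qcut() function places each value in the points column into one of four quartile-based ranges. Each range excludes its lower bound and includes its upper bound, and the lower bound of the first range is set slightly below the minimum (7.999) so that the smallest value, 8, is included.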

From the output we can see the ranges for the four groups:

  • (7.999, 14]
  • (14, 15]
  • (15, 24]
  • (24, 39]
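To inspect these ranges programmatically, one option is the retbins argument of qcut(), which also returns the bin edges. The sketch below rebuilds the points column as a standalone Series (the variable names groups, edges and counts are our own) and counts how many players fall into each range:

```python
import pandas as pd

# Same values as the 'points' column in the example DataFrame
points = pd.Series([8, 12, 14, 14, 18, 15, 39, 24, 28], name='points')

# retbins=True returns the bin edges alongside the binned values
groups, edges = pd.qcut(points, q=4, retbins=True)

# Count the values in each bin, listed in bin order
counts = groups.value_counts().sort_index()

print(edges)
print(counts)
```

With this data the counts per range are 4, 1, 2 and 2 rather than perfectly equal. The value 14 appears twice and sits exactly on the first quartile, so both copies land in the first range.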

One useful way to use the qcut() function is by assigning these results as values in a new column of the DataFrame.

We can use the following syntax to do so:

#cut values in 'points' column into four groups
df['points_group'] = pd.qcut(df['points'], q=4)

#view updated DataFrame
print(df)

  player  points   points_group
0      A       8  (7.999, 14.0]
1      B      12  (7.999, 14.0]
2      C      14  (7.999, 14.0]
3      D      14  (7.999, 14.0]
4      E      18   (15.0, 24.0]
5      F      15   (14.0, 15.0]
6      G      39   (24.0, 39.0]
7      H      24   (15.0, 24.0]
8      I      28   (24.0, 39.0]

Notice that this produces a new column in the DataFrame named points_group, which contains the ranges that result from the qcut() function.

Note that we could also use the labels argument to specify text labels that should be assigned to each resulting bin instead of bin ranges:

#cut values in 'points' column into four groups with labels
df['points_group'] = pd.qcut(df['points'], q=4, labels=['Bad', 'OK', 'Good', 'Great'])

#view updated DataFrame
print(df)

  player  points points_group
0      A       8          Bad
1      B      12          Bad
2      C      14          Bad
3      D      14          Bad
4      E      18         Good
5      F      15           OK
6      G      39        Great
7      H      24         Good
8      I      28        Great

The values in the new points_group column now display Bad, OK, Good or Great based on the points scored by each player.
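Because qcut() returns an ordered categorical, the text labels keep the order of the bins they replace. As a minimal sketch of how that can be used (the variable name top is hypothetical), the following filters for players in the Good group or higher:

```python
import pandas as pd

# Recreate the example DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                   'points': [8, 12, 14, 14, 18, 15, 39, 24, 28]})

# Cut 'points' into four labeled groups
df['points_group'] = pd.qcut(df['points'], q=4,
                             labels=['Bad', 'OK', 'Good', 'Great'])

# The labels are ordered (Bad < OK < Good < Great), so comparisons work
top = df[df['points_group'] >= 'Good']

print(top)
```

This should keep only players E, G, H and I, whose points fall in the two highest ranges.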

Note: The complete reference for the qcut() function is available in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Select Only Numeric Columns in Pandas
How to Convert Categorical Variable to Numeric in Pandas
How to Extract Number from String in Pandas
