How to Calculate Deciles in Python (With Examples)


In statistics, deciles are numbers that split a dataset into ten groups of equal frequency.

The first decile is the point where 10% of all data values lie below it. The second decile is the point where 20% of all data values lie below it, and so on.

We can use the following syntax to calculate the deciles for a dataset in Python:

import numpy as np

np.percentile(var, np.arange(0, 100, 10))

The following example shows how to use this function in practice.

Example: Calculate Deciles in Python

The following code shows how to create a fake dataset with 20 values and then calculate the values for the deciles of the dataset:

import numpy as np

#create data
data = np.array([56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                 89, 90, 91, 92, 93, 93, 94, 95, 97, 99])

#calculate deciles of data
np.percentile(data, np.arange(0, 100, 10))

array([56. , 63.4, 67.8, 76.5, 83.6, 88.5, 90.4, 92.3, 93.2, 95.2])

The way to interpret the deciles is as follows:

  • 10% of all data values lie below 63.4.
  • 20% of all data values lie below 67.8.
  • 30% of all data values lie below 76.5.
  • 40% of all data values lie below 83.6.
  • 50% of all data values lie below 88.5.
  • 60% of all data values lie below 90.4.
  • 70% of all data values lie below 92.3.
  • 80% of all data values lie below 93.2.
  • 90% of all data values lie below 95.2.
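
One way to check this interpretation is to count the share of data values that lie strictly below each cut point. The following is a minimal sketch using the same dataset; the names cut_points and share_below are introduced here for illustration only:

```python
import numpy as np

#create data
data = np.array([56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                 89, 90, 91, 92, 93, 93, 94, 95, 97, 99])

#calculate the same cut points as above
cut_points = np.percentile(data, np.arange(0, 100, 10))

#calculate share of data values strictly below each cut point
share_below = np.array([np.mean(data < c) for c in cut_points])

#display each cut point next to the share of values below it
for c, s in zip(cut_points, share_below):
    print(f"{s:.0%} of values lie below {c:.1f}")
```

For this dataset the shares come out to exactly 0%, 10%, 20%, and so on up to 90%. In datasets with many tied values, or where a cut point coincides with a data value, the shares may only be approximate.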

Note that the first value in the output (56) is the 0th percentile, which is simply the minimum value in the dataset rather than one of the nine deciles.
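
If only the nine deciles are wanted, one option is to start the percentile grid at 10 instead of 0. The sketch below also shows np.quantile, which takes fractions between 0 and 1 rather than percentages; under NumPy's default linear interpolation both calls should return the same values. The names deciles and deciles_q are illustrative only:

```python
import numpy as np

#create data
data = np.array([56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                 89, 90, 91, 92, 93, 93, 94, 95, 97, 99])

#calculate the nine deciles only (10th through 90th percentiles)
deciles = np.percentile(data, np.arange(10, 100, 10))

#equivalent call using fractions instead of percentages
deciles_q = np.quantile(data, np.linspace(0.1, 0.9, 9))

print(deciles)
print(deciles_q)
```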

Example: Place Values into Deciles in Python

To place each data value into a decile, we can use the pandas qcut() function.

Here’s how to use this function for the dataset we created in the previous example:

import pandas as pd

#create data frame
df = pd.DataFrame({'values': [56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                              89, 90, 91, 92, 93, 93, 94, 95, 97, 99]})

#calculate decile of each value in data frame
df['Decile'] = pd.qcut(df['values'], 10, labels=False)

#display data frame
df

	values	Decile
0	56	0
1	58	0
2	64	1
3	67	1
4	68	2
5	73	2
6	78	3
7	83	3
8	84	4
9	88	4
10	89	5
11	90	5
12	91	6
13	92	6
14	93	7
15	93	7
16	94	8
17	95	8
18	97	9
19	99	9

The way to interpret the output is as follows (because we used labels=False, the deciles are numbered from 0 to 9):

  • The data value 56 falls between the 0th and 10th percentiles, so it falls in decile 0.
  • The data value 58 falls between the 0th and 10th percentiles, so it falls in decile 0.
  • The data value 64 falls between the 10th and 20th percentiles, so it falls in decile 1.
  • The data value 67 falls between the 10th and 20th percentiles, so it falls in decile 1.
  • The data value 68 falls between the 20th and 30th percentiles, so it falls in decile 2.

And so on.
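
To see which boundaries qcut used, the function can also return the bin edges through its retbins argument. The sketch below assumes the same data frame as above; the column name Decile_1_to_10 is hypothetical and simply shifts the zero-based labels to a 1 to 10 numbering, which some readers may prefer:

```python
import pandas as pd

#create data frame
df = pd.DataFrame({'values': [56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                              89, 90, 91, 92, 93, 93, 94, 95, 97, 99]})

#calculate decile of each value and also return the bin edges
df['Decile'], bin_edges = pd.qcut(df['values'], 10, labels=False, retbins=True)

#shift labels so that deciles run from 1 to 10 instead of 0 to 9
df['Decile_1_to_10'] = df['Decile'] + 1

#display bin edges and data frame
print(bin_edges)
print(df)
```

The eleven bin edges should be the minimum, the nine deciles, and the maximum of the column, so they can be compared directly with the np.percentile output from the first example.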

Additional Resources

How to Calculate Percentiles in Python
How to Calculate The Interquartile Range in Python
