How to Calculate Percentiles in Python (With Examples)


The nth percentile of a dataset is the value that cuts off the first n percent of the data values when all of the values are sorted from least to greatest.

For example, the 90th percentile of a dataset is the value that separates the bottom 90% of the data values from the top 10% of the data values.

We can quickly calculate percentiles in Python by using the numpy.percentile() function, which uses the following syntax:

numpy.percentile(a, q)

where:

  • a: Array of values
  • q: Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.

This tutorial explains how to use this function to calculate percentiles in Python.
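To make the definition concrete, here is a minimal sketch of how a percentile can be computed by hand. It assumes the default linear interpolation method of numpy.percentile(), and the helper name percentile_linear is hypothetical (it is not part of NumPy):

```python
import numpy as np

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile using linear interpolation."""
    #sort the values from least to greatest
    ordered = np.sort(np.asarray(values, dtype=float))
    #fractional (0-based) position of the percentile in the sorted array
    pos = (q / 100) * (len(ordered) - 1)
    lower = int(np.floor(pos))
    upper = int(np.ceil(pos))
    #interpolate between the two neighboring values
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

values = [25, 12, 15, 14, 19, 23, 25, 29, 33, 35]

manual = percentile_linear(values, 95)
builtin = np.percentile(values, 95)

#the two results should agree up to floating-point rounding
print(np.isclose(manual, builtin))
```

In practice there is no need to write this helper. It is only meant to show what happens when the requested percentile falls between two data values.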

How to Find Percentiles of an Array

The following code illustrates how to find various percentiles for a given array in Python:

import numpy as np

#make this example reproducible
np.random.seed(0)

#create array of 100 random integers between 0 and 499 (the upper bound of 500 is excluded)
data = np.random.randint(0, 500, 100)

#find the 37th percentile of the array
np.percentile(data, 37)

173.26

#find the quartiles (25th, 50th, and 75th percentiles) of the array
np.percentile(data, [25, 50, 75])

array([116.5, 243.5, 371.5])
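As a quick sanity check, the 0th, 50th, and 100th percentiles should match the minimum, median, and maximum of an array. The short sketch below uses a small made-up array (the name sample is only for illustration) rather than the random data above:

```python
import numpy as np

#small illustrative array
sample = np.array([4, 8, 15, 16, 23, 42])

#find the 0th, 50th, and 100th percentiles in one call
p0, p50, p100 = np.percentile(sample, [0, 50, 100])

#compare each percentile with the matching summary statistic
print(p0 == sample.min())
print(np.isclose(p50, np.median(sample)))
print(p100 == sample.max())
```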

How to Find Percentiles of a DataFrame Column

The following code shows how to find the 95th percentile value for a single pandas DataFrame column:

import numpy as np 
import pandas as pd

#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

#find 95th percentile of var1 column
np.percentile(df.var1, 95)

34.1
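The same value can also be found with the pandas quantile() function on the column itself. The sketch below rebuilds the var1 column as a standalone Series so it can be run on its own. Note that quantile() expects a fraction between 0 and 1, so the 95th percentile is written as 0.95:

```python
import numpy as np
import pandas as pd

#recreate the var1 column as a Series
var1 = pd.Series([25, 12, 15, 14, 19, 23, 25, 29, 33, 35], name='var1')

#95th percentile using NumPy (q is on a 0 to 100 scale)
from_numpy = np.percentile(var1, 95)

#95th percentile using pandas (q is on a 0 to 1 scale)
from_pandas = var1.quantile(0.95)

#both approaches should agree up to floating-point rounding
print(np.isclose(from_numpy, from_pandas))
```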

How to Find Percentiles of Several DataFrame Columns

The following code shows how to find the 95th percentile value for several columns in a pandas DataFrame:

import numpy as np 
import pandas as pd

#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

#find 95th percentile of each column
df.quantile(.95)

var1    34.10
var2    14.55
var3    14.65

#find 95th percentile of just columns var1 and var2
df[['var1', 'var2']].quantile(.95)

var1    34.10
var2    14.55

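Note that we used the pandas quantile() function in the examples above to calculate percentiles. This function takes a value between 0 and 1 rather than between 0 and 100, so .95 corresponds to the 95th percentile.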
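As a further sketch, quantile() also accepts a list of values, and numpy.percentile() can work column by column when given axis=0. The variable names quartiles and col_p95 below are only for illustration:

```python
import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

#find the quartiles of each column (one row per quantile)
quartiles = df.quantile([0.25, 0.5, 0.75])
print(quartiles)

#find 95th percentile of each column with NumPy (axis=0 works down the rows)
col_p95 = np.percentile(df.to_numpy(), 95, axis=0)

#the NumPy result should match df.quantile(.95)
print(np.allclose(col_p95, df.quantile(0.95).to_numpy()))
```

The pandas version keeps the column names in its output, which is usually more convenient when working with a DataFrame.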

Related: How to Calculate Percentiles in R (With Examples)
