How to Perform Bootstrapping in Python (With Example)


Bootstrapping is a method that can be used to construct a confidence interval for a statistic when the sample size is small and the underlying distribution is unknown.

The basic process for bootstrapping is as follows:

  • Take k repeated samples with replacement from a given dataset.
  • For each sample, calculate the statistic you’re interested in.
  • This results in k different estimates for a given statistic, which you can then use to calculate a confidence interval for the statistic.
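The three steps above can be sketched by hand with NumPy before turning to a library function. This is a minimal illustration, not how any particular library implements it: the helper name percentile_bootstrap_ci, its parameters, and the sample values are all made up for this sketch, and it uses the simple percentile approach to form the interval.

```python
import numpy as np

def percentile_bootstrap_ci(values, statistic, k=9999, confidence_level=0.95, seed=1):
    """Hand-rolled percentile bootstrap (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)

    # Steps 1 and 2: draw k samples with replacement and compute the statistic on each
    estimates = np.array([
        statistic(rng.choice(values, size=values.size, replace=True))
        for _ in range(k)
    ])

    # Step 3: use the middle share of the k estimates as the confidence interval
    tail = (1 - confidence_level) / 2 * 100
    low, high = np.percentile(estimates, [tail, 100 - tail])
    return low, high

# Made-up sample values, used only to exercise the helper
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]
low, high = percentile_bootstrap_ci(sample, np.median)
print(low, high)
```

Each resample has the same size as the original sample, so the spread of the k estimates reflects how much the statistic could vary from sample to sample.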

The easiest way to perform bootstrapping in Python is to use the bootstrap function from the SciPy library.

The following example shows how to use this function in practice.

Example: Perform Bootstrapping in Python

Suppose we create a dataset in Python that contains 15 values:

#define array of data values
data = [7, 9, 10, 10, 12, 14, 15, 16, 16, 17, 19, 20, 21, 21, 23]

We can use the following code to calculate a 95% bootstrapped confidence interval for the median value:

from scipy.stats import bootstrap
import numpy as np

#wrap data in a tuple, since bootstrap() expects a sequence of samples
data = (data,)

#calculate 95% bootstrapped confidence interval for median
bootstrap_ci = bootstrap(data, np.median, confidence_level=0.95,
                         random_state=1, method='percentile')

#view 95% bootstrapped confidence interval
print(bootstrap_ci.confidence_interval)

ConfidenceInterval(low=10.0, high=20.0)

The 95% bootstrapped confidence interval for the median turns out to be [10.0, 20.0].

Here’s what the bootstrap() function actually did under the hood:

  • The bootstrap() function generated 9,999 samples with replacement, each the same size as the original dataset. (The default is 9,999, but you can use the n_resamples argument to change this number.)
  • For each bootstrapped sample, the median was calculated.
  • The 9,999 medians were arranged from smallest to largest, and the values at the 2.5th and 97.5th percentiles were used as the lower and upper limits of the 95% confidence interval.
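One way to check these steps is to look at the resampled medians directly. The sketch below assumes your installed SciPy version exposes the bootstrap_distribution attribute on the result object (it may be missing in older releases); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import bootstrap

values = [7, 9, 10, 10, 12, 14, 15, 16, 16, 17, 19, 20, 21, 21, 23]

res = bootstrap((values,), np.median, confidence_level=0.95,
                random_state=1, method='percentile')

# One median per bootstrapped sample
# (assumes the result object has a bootstrap_distribution attribute)
medians = res.bootstrap_distribution
print(medians.shape)

# Take the 2.5th and 97.5th percentiles of the resampled medians by hand
manual_low, manual_high = np.percentile(medians, [2.5, 97.5])
print(manual_low, manual_high)

# Compare with the interval reported by bootstrap()
print(res.confidence_interval)
```

If the manual limits match the reported interval, that confirms the percentile method is doing nothing more than cutting the tails off the sorted estimates.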

Note that you can calculate a bootstrapped confidence interval for virtually any statistic.

For example, we can change np.median to np.std within the bootstrap() function to instead calculate a 95% confidence interval for the standard deviation:

from scipy.stats import bootstrap
import numpy as np

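#wrap the original data values in a tuple, since bootstrap() expects a sequence of samples
data = ([7, 9, 10, 10, 12, 14, 15, 16, 16, 17, 19, 20, 21, 21, 23],)

#calculate 95% bootstrapped confidence interval for standard deviation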
bootstrap_ci = bootstrap(data, np.std, confidence_level=0.95,
                         random_state=1, method='percentile')

#view 95% bootstrapped confidence interval
print(bootstrap_ci.confidence_interval)

ConfidenceInterval(low=3.3199732261303283, high=5.66478399066117)

The 95% bootstrapped confidence interval for the standard deviation turns out to be [3.32, 5.67]. Keep in mind that np.std calculates the population standard deviation (ddof=0) by default.
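The same pattern should also work with a statistic you write yourself. As a hedged sketch, the example below bootstraps the interquartile range using a made-up helper named iqr. It passes vectorized=False so that bootstrap() calls the helper on one resample at a time, which means the helper does not need to accept an axis argument.

```python
import numpy as np
from scipy.stats import bootstrap

values = [7, 9, 10, 10, 12, 14, 15, 16, 16, 17, 19, 20, 21, 21, 23]

def iqr(sample):
    """Interquartile range of a 1-D sample (hypothetical helper for illustration)."""
    q1, q3 = np.percentile(sample, [25, 75])
    return q3 - q1

# vectorized=False: the statistic receives a single 1-D resample per call
res = bootstrap((values,), iqr, confidence_level=0.95, vectorized=False,
                random_state=1, method='percentile')

print(res.confidence_interval)
```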

Note: For these examples we chose to create 95% confidence intervals, but you can change the value of the confidence_level argument to construct an interval with a different confidence level.
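To see the effect of confidence_level, the following sketch repeats the median example at three levels with the same seed. The loop and the intervals dictionary are illustrative choices; the point is that a higher confidence level should produce an interval at least as wide.

```python
import numpy as np
from scipy.stats import bootstrap

values = [7, 9, 10, 10, 12, 14, 15, 16, 16, 17, 19, 20, 21, 21, 23]

intervals = {}
for level in (0.90, 0.95, 0.99):
    # Same seed each time, so only the confidence level changes
    res = bootstrap((values,), np.median, confidence_level=level,
                    random_state=1, method='percentile')
    intervals[level] = res.confidence_interval
    print(level, res.confidence_interval)
```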

Additional Resources

The following tutorials explain how to perform bootstrapping in other statistical software:

How to Perform Bootstrapping in R
How to Perform Bootstrapping in Excel
