Bootstrapping is a method that can be used to construct a confidence interval for a statistic when the sample size is small and the underlying distribution is unknown.
The basic process for bootstrapping is as follows:
- Take k repeated samples with replacement from a given dataset.
- For each sample, calculate the statistic you’re interested in.
- This results in k different estimates for a given statistic, which you can then use to calculate a confidence interval for the statistic.
The following step-by-step example shows how to perform bootstrapping in Excel.
Step 1: Enter the Original Data
First, we’ll enter the values for some dataset:
Step 2: Generate Bootstrap Samples
Next, we’ll use the following formula to generate bootstrap samples:
=INDEX($A$2:$A$16, RANDBETWEEN(1, ROWS($A$2:$A$16)),1)
We can type this formula into cell D2 to randomly select one value from the original dataset.
We can then drag this formula to the right for 10 cells to generate our first bootstrapped sample.
We can then drag this formula down 300 rows to create 300 bootstrapped samples:
Note: Bootstrapping uses sampling with replacement, which means that one value from the original dataset may appear multiple times in any given sample.
Step 3: Calculate Statistic of Interest for Each Sample
Next, we can calculate the statistic of interest for each sample.
For example, we could calculate the mean, median, standard deviation, interquartile range, etc. for every sample.
For this particular example, we’ll calculate the median value for each sample:
We can see:
- The first bootstrapped sample has a median value of 14.
- The second bootstrapped sample has a median value of 16.
- The third bootstrapped sample has a median value of 13.5.
And so on.
Step 4: Calculate Bootstrapped Confidence Interval
Lastly, we can calculate a 95% bootstrapped confidence interval for the median by finding the value located at percentile 2.5% and percentile 97.5% in column N.
We can use the following formulas to do so:
=PERCENTILE(N2:N301, 0.025) =PERCENTILE(N2:N301, 0.975)
The following screenshot shows how to use these formulas in practice:
From the output we can see that the 95% bootstrapped confidence interval for the median value of the original dataset is [10.475, 19.7625].
Note that in this example we chose to generate 300 bootstrapped samples each with a sample size of n=10, but you can generate as many bootstrapped samples as you’d like.
When using statistical software, it’s common to generate thousands of bootstrapped samples which can then be used to construct a confidence interval.
The following tutorials explain how to perform other common tasks in Excel: