Box-Cox Transformation in Excel (Step-by-Step)

box-cox transformation is a commonly used method for transforming a non-normally distributed dataset into a more normally distributed one.

The basic idea is to find some value for λ such that the transformed data is as close to normally distributed as possible, using the following formula:

  • y(λ) = (yλ – 1) / λ  if y ≠ 0
  • y(λ) = log(y)  if y = 0

The following step-by-step example shows how to perform a box-cox transformation on a dataset in Excel.

Step 1: Enter the Data

First, let’s enter the values for a dataset:

Step 2: Sort the Data

Next, create an index column and a column of sorted data:

Step 3: Choose an Arbitrary Value for Lambda

Next, we’ll choose an arbitrary value of 1 for lambda and apply a temporary box-cox transformation to the data:

Step 4: Calculate the Z-Scores

Next, we’ll calculate the z-score for each value in the index:

We’ll then calculate the correlation between the box-cox transformed values and the z-scores:

Step 5: Find the Optimal Lambda Value

Next, we’ll use Goal Seek to find the optimal lambda value to use in the box-cox transformation.

To do so, click the Data tab along the top ribbon. Then click What-If-Analysis within the Forecast group.

In the dropdown menu, click Goal Seek and fill in the following values:

Box-cox transformation in Excel

Once you click OK, Goal Seek will automatically find the optimal lambda value to be -0.5225.

Step 6: Perform the Box-Cox Transformation

Lastly, we’ll apply the box-cox transformation to the original data, using a lambda value of -0.5225:

Bonus: We can confirm that the transformed data is normally distributed by performing a Jarque-Bera test in Excel.

Additional Resources

How to Transform Data in Excel (Log, Square Root, Cube Root)
How to Calculate Z-Scores in Excel

6 Replies to “Box-Cox Transformation in Excel (Step-by-Step)”

  1. Thanks for this informative article. The conditions in the mathematical statements introducing the Box-Cox transformation procedure need to be restated as follows;
    y(λ) = (yλ – 1) / λ if λ ≠ 0
    y(λ) = log(y) if λ = 0
    This fixes the problem of an undefined value for log(0).

  2. I believe there is a typo in the function definition.
    The conditions should be
    y(lambda) = (y^lambda-1)/lambda if LAMBDA != 0
    log(y) if LAMBDA = 0

Leave a Reply

Your email address will not be published. Required fields are marked *