A **box-cox transformation** is a commonly used method for transforming a non-normally distributed dataset into a more normally distributed one.

The basic idea is to find some value for λ such that the transformed data is as close to normally distributed as possible, using the following formula:

- y(λ) = (y
^{λ}– 1) / λ if y ≠ 0 - y(λ) = log(y) if y = 0

The following step-by-step example shows how to perform a box-cox transformation on a dataset in Excel.

**Step 1: Enter the Data**

First, let’s enter the values for a dataset:

**Step 2: Sort the Data**

Next, create an index column and a column of sorted data:

**Step 3: Choose an Arbitrary Value for Lambda**

Next, we’ll choose an arbitrary value of 1 for lambda and apply a temporary box-cox transformation to the data:

**Step 4: Calculate the Z-Scores**

Next, we’ll calculate the z-score for each value in the index:

We’ll then calculate the correlation between the box-cox transformed values and the z-scores:

**Step 5: Find the Optimal Lambda Value**

Next, we’ll use Goal Seek to find the optimal lambda value to use in the box-cox transformation.

To do so, click the **Data** tab along the top ribbon. Then click **What-If-Analysis** within the **Forecast** group.

In the dropdown menu, click **Goal Seek** and fill in the following values:

Once you click **OK**, Goal Seek will automatically find the optimal lambda value to be **-0.5225**.

**Step 6: Perform the Box-Cox Transformation**

Lastly, we’ll apply the box-cox transformation to the original data, using a lambda value of -0.5225:

**Bonus:** We can confirm that the transformed data is normally distributed by performing a Jarque-Bera test in Excel.

**Additional Resources**

How to Transform Data in Excel (Log, Square Root, Cube Root)

How to Calculate Z-Scores in Excel

when calculating the z scores, why did you subtract by .5?

Wow. Thanks for the information and sample to work with

Thanks for this informative article. The conditions in the mathematical statements introducing the Box-Cox transformation procedure need to be restated as follows;

y(λ) = (yλ – 1) / λ if λ ≠ 0

y(λ) = log(y) if λ = 0

This fixes the problem of an undefined value for log(0).

I believe there is a typo in the function definition.

The conditions should be

y(lambda) = (y^lambda-1)/lambda if LAMBDA != 0

log(y) if LAMBDA = 0

I wonder if this is easier in Excel “ToolPack”? I am using Excel 2019.

Nice work.. I tried to use your information

Thank you