What is a Modified Z-Score? (Definition & Example)


In statistics, a z-score tells us how many standard deviations away a value is from the mean. We use the following formula to calculate a z-score:

Z-Score = (xi – μ) / σ

where:

  • xi: A single data value
  • μ: The mean of the dataset
  • σ: The standard deviation of the dataset

Z-scores are often used to detect outliers in a dataset. For example, observations with a z-score less than -3 or greater than 3 are often deemed to be outliers.
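As a minimal sketch of this rule, the snippet below computes z-scores with Python's standard library and flags values beyond ±3. The data values and the `z_scores` helper name are hypothetical (they are not from this article), and the sketch assumes σ is the population standard deviation.

```python
import statistics

def z_scores(values):
    """Return the z-score of each value (assumes the population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

# Hypothetical data for illustration only: 19 typical values and one extreme value
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 12, 10, 13, 60]

scores = z_scores(data)

# Flag observations whose z-score is less than -3 or greater than 3
outliers = [x for x, z in zip(data, scores) if abs(z) > 3]
print(outliers)  # -> [60]
```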

However, the mean and standard deviation used in a z-score are themselves pulled toward unusually large or small data values, so an extreme observation can partly hide itself. A more robust way to detect outliers is to use a modified z-score, which is calculated as:

Modified z-score = 0.6745(xi – x̃) / MAD

where:

  • xi: A single data value
  • x̃: The median of the dataset
  • MAD: The median absolute deviation of the dataset

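A modified z-score is more robust because it uses the median in place of the mean and the median absolute deviation in place of the standard deviation, both of which are far less influenced by outliers. The constant 0.6745 is approximately the 75th percentile of the standard normal distribution; it rescales the MAD so that, for normally distributed data, modified z-scores are on roughly the same scale as ordinary z-scores.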

Iglewicz and Hoaglin recommend that values with modified z-scores less than -3.5 or greater than 3.5 be labeled as potential outliers.
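The formula and the ±3.5 cutoff can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a reference one: the `modified_z_scores` name and the sample data are hypothetical, and raising an error when the MAD is zero is one possible design choice among several.

```python
import statistics

def modified_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for each value."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # Assumption: treat a zero MAD as an error rather than dividing by zero
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in values]

# Hypothetical data for illustration only
data = [2, 3, 4, 5, 6, 7, 50]

scores = modified_z_scores(data)

# Label values beyond the +/-3.5 cutoff as potential outliers
flagged = [x for x, m in zip(data, scores) if abs(m) > 3.5]
print(flagged)  # -> [50]
```

A MAD of zero occurs whenever more than half of the values are identical, so real data may need an explicit fallback for that case.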

The following step-by-step example shows how to calculate modified z-scores for a given dataset.

Step 1: Create the Data

Suppose we have a dataset with 16 values, the first of which is 6.

Step 2: Find the Median

Next, we will find the median. This represents the middle point in the dataset, which turns out to be 16.

Step 3: Find the Absolute Difference Between Each Value & the Median

Next, we will find the absolute difference between each individual data value and the median. For example, the absolute difference between the first data value and the median is calculated as:

Absolute Difference = |6 – 16| = 10

We can use the same formula to calculate the absolute difference between every other data value and the median.

Step 4: Find the Median Absolute Deviation

Next, we’ll find the median absolute deviation. This is the median of the absolute differences calculated in Step 3, which turns out to be 8.
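Steps 2 through 4 can be sketched as follows. The eight values below are hypothetical and are not the article's dataset; they were chosen to have an even count, like the 16-value dataset, so that each median is the average of the two middle values.

```python
import statistics

# Hypothetical data for illustration only (not the article's dataset)
data = [4, 7, 9, 12, 14, 15, 20, 31]

# Step 2: the median (average of the two middle values, 12 and 14)
median = statistics.median(data)
print(median)  # -> 13.0

# Step 3: absolute difference between each value and the median
abs_diffs = [abs(x - median) for x in data]
print(abs_diffs)  # -> [9.0, 6.0, 4.0, 1.0, 1.0, 2.0, 7.0, 18.0]

# Step 4: the median absolute deviation is the median of those differences
mad = statistics.median(abs_diffs)
print(mad)  # -> 5.0
```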

Step 5: Find the Modified Z-Score for Each Data Value

Lastly, we can calculate the modified z-score for each data value using the following formula:

Modified z-score = 0.6745(xi – x̃) / MAD

For example, the modified z-score for the first data value is calculated as:

Modified z-score = 0.6745*(6-16) / 8 = -0.843
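The arithmetic above can be checked directly, and the same numbers show how extreme a raw value would have to be to cross the ±3.5 cutoff. This is a small sketch using only the median (16) and MAD (8) stated in this example; the variable names are illustrative.

```python
# Values taken from the worked example: first data value, median, and MAD
x_i, median, mad = 6, 16, 8

score = 0.6745 * (x_i - median) / mad
print(round(score, 3))  # -> -0.843

# Solve |0.6745 * (x - median) / mad| = 3.5 for x to get the cutoffs in raw units
half_width = 3.5 * mad / 0.6745
lower, upper = median - half_width, median + half_width
print(round(lower, 2), round(upper, 2))  # -> -25.51 57.51
```

With this median and MAD, only values below about -25.5 or above about 57.5 would be labeled as potential outliers.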

We can repeat this calculation for every value in the dataset.

No value in the dataset has a modified z-score less than -3.5 or greater than 3.5, so we wouldn’t label any value in this dataset as a potential outlier.

How to Handle Outliers

If an outlier is present in your dataset, you have a few options:

  • Make sure the outlier is not the result of a data entry error. Sometimes an individual simply enters the wrong data value when recording data. If an outlier is present, first verify that the value was entered correctly and that it wasn’t an error.
  • Assign a new value to the outlier. If the outlier turns out to be the result of a data entry error, you may decide to assign a new value to it, such as the mean or the median of the dataset.
  • Remove the outlier. If the value is a true outlier, you may choose to remove it if it will have a significant impact on your overall analysis. Just make sure to mention in your final report or analysis that you removed an outlier.
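The last two options can be sketched in code. The example below is one possible approach rather than a recommended procedure: the data and the `flag_outliers` helper are hypothetical, and which option is appropriate still depends on first checking whether the value is an entry error.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return True for each value whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # Assumption: with a zero MAD, flag nothing instead of dividing by zero
        return [False] * len(values)
    return [abs(0.6745 * (x - med) / mad) > threshold for x in values]

# Hypothetical data for illustration only
data = [21, 23, 22, 24, 25, 23, 22, 95]
flags = flag_outliers(data)

# Option: remove the flagged values
removed = [x for x, is_outlier in zip(data, flags) if not is_outlier]
print(removed)  # -> [21, 23, 22, 24, 25, 23, 22]

# Option: replace the flagged values with the median of the dataset
med = statistics.median(data)
replaced = [med if is_outlier else x for x, is_outlier in zip(data, flags)]
print(replaced)  # -> [21, 23, 22, 24, 25, 23, 22, 23.0]
```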
