Anscombe’s quartet is a collection of four datasets that all have identical descriptive statistics, yet have wildly different distributions when shown on a graph.
The four datasets are shown below:
This tutorial provides a step-by-step example of how to create Ansombe’s Quartet in Excel.
Step 1: Create the Data
First, we can enter the raw data values for each dataset into Excel:
Step 2: Visualize the Data
Next, we can create scatterplots to visualize each dataset.
To create a scatterplot for the first dataset, highlight cells B3:C13 as shown below:
Next, click the Insert tab along the top ribbon. Then click Insert Scatter (X, Y) or Bubble Chart within the Charts group:
Once you click the first option in this group, the following scatterplot will appear:
Repeat this process for each dataset.
Step 3: Calculate the Descriptive Statistics
We can also calculate the mean, variance, and correlation of each dataset to show that they’re all equal.
The following formulas show how to calculate these metrics for the x-values in the first dataset:
Copy and paste these formulas to calculate the metrics for each dataset:
Notice that the metrics are all the exact same for each dataset.
The Importance of Anscombe’s Quartet
The whole point of Anscombe’s quartet is to show the importance of data visualization.
If we only look at the descriptive statistics for each dataset, it will appear that each dataset is identical. However, when we actually create graphs we can see that the datasets are quite different.
Descriptive statistics can summarize data numerically, but graphs can help us visualize data. This is why it’s important to create graphs and plots before analyzing a dataset and drawing conclusions.