A box plot is used to visualize the five-number summary of a dataset, which includes:
- The minimum
- The first quartile
- The median
- The third quartile
- The maximum
This tutorial explains how to create and modify box plots in SPSS.
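As a rough illustration of what the five-number summary contains, the sketch below computes it in Python rather than SPSS. The `points` values are made up for the example (they are not the tutorial's dataset) and `five_number_summary` is a hypothetical helper name. The quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is only one of several quartile conventions, so SPSS may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical points-per-game values, for illustration only.
points = [1, 2, 3, 4, 5, 6, 7, 8, 9]

summary = five_number_summary(points)
print(summary)
```

These five values are what the box plot draws: the ends of the whiskers, the two edges of the box, and the line inside the box.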
How to Create a Single Box Plot in SPSS
Suppose we have the following dataset that shows the average points scored per game by 16 basketball players on a certain team:
To create a box plot to visualize the distribution of these data values, we can click the Analyze tab, then Descriptive Statistics, then Explore:
This will bring up the following window:
To create a box plot, drag the variable points into the box labelled Dependent List. Then make sure Plots is selected under the option that says Display near the bottom of the box.
Once you click OK, the following box plot will appear:
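Here’s how to interpret this box plot:
- The end of the bottom whisker marks the minimum.
- The bottom of the box marks the first quartile.
- The line inside the box marks the median.
- The top of the box marks the third quartile.
- The end of the top whisker marks the maximum.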
A Note on Outliers
The interquartile range (IQR) is the distance between the third quartile and the first quartile. SPSS considers a data value to be an outlier if it lies more than 1.5 times the IQR above the third quartile or more than 1.5 times the IQR below the first quartile.
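The 1.5 × IQR rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of SPSS's internal calculation: the data values are hypothetical, `iqr_outliers` is a made-up helper name, and the quartiles come from the "inclusive" method of `statistics.quantiles`, so the fences may differ slightly from the ones SPSS uses.

```python
import statistics


def iqr_outliers(values):
    """Return (outliers, (lower_fence, upper_fence)) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # A value is flagged only if it falls strictly outside the fences.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return outliers, (lower_fence, upper_fence)


# Hypothetical values in which the largest observation is 50.
points = [1, 2, 3, 4, 5, 6, 7, 8, 50]

outliers, fences = iqr_outliers(points)
print("fences:", fences)
print("outliers:", outliers)
```

With these made-up values, Q1 is 3 and Q3 is 7, so the IQR is 4 and the fences sit at -3 and 13. Only the value 50 falls outside them.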
Outliers are displayed as tiny circles in SPSS. The previous example had no outliers, which is why no tiny circles appeared in the box plot. However, if the largest value in the dataset were actually 50, the box plot would show a tiny circle to indicate the outlier:
If an outlier is present in your dataset, you have a few options:
- Make sure the outlier is not a data entry error. Sometimes data values are simply recorded incorrectly, so first verify that the value was entered correctly.
- Assign a new value to the outlier. If the outlier turns out to be a result of a data entry error, you may decide to assign a new value to it such as the mean or the median of the dataset.
- Remove the outlier. If the value is a true outlier, you may choose to remove it if it will have a significant impact on your overall analysis. Just make sure to mention in your final report or analysis that you removed an outlier.
How to Create Multiple Box Plots in SPSS
If you have several variables, SPSS can also create multiple side-by-side box plots. For example, suppose we have the following data on average points scored by 16 players on three different teams:
To create a box plot for each of these variables, we can once again click on the Analyze tab, then Descriptive Statistics, then Explore. We can then drag all three variables into the box labelled Dependent List:
Once we click OK, the following box plots will appear:
This helps us easily visualize the differences in the distributions between these three teams.
We can also observe the following:
- The median points scored per game is highest for team B and lowest for team C.
- The variation in the number of points scored per game is highest for team B, which can be seen from how much longer its box plot is than those of team A and team C.
- The player with the highest points per game is on team B and the player with the lowest points per game is on team C.
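The same kinds of comparisons can be checked numerically. The sketch below is a hedged illustration in Python: the three lists are invented values chosen to mimic the pattern described above (they are not the tutorial's data), `describe` is a hypothetical helper, and the spread is measured by the IQR from the "inclusive" quartile method, which may not match SPSS's quartiles exactly.

```python
import statistics

# Hypothetical points-per-game values for three teams, for illustration only.
teams = {
    "A": [10, 12, 14, 16, 18],
    "B": [8, 14, 20, 26, 32],
    "C": [4, 7, 10, 13, 16],
}


def describe(values):
    """Summarize one group by its median, IQR, minimum, and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1, "min": min(values), "max": max(values)}


stats = {name: describe(values) for name, values in teams.items()}

# Pick out the team that stands out on each measure.
highest_median = max(stats, key=lambda t: stats[t]["median"])
lowest_median = min(stats, key=lambda t: stats[t]["median"])
widest_box = max(stats, key=lambda t: stats[t]["iqr"])
top_scorer_team = max(stats, key=lambda t: stats[t]["max"])
lowest_scorer_team = min(stats, key=lambda t: stats[t]["min"])

for name, s in stats.items():
    print(name, s)
```

Each summary value corresponds to something visible in the plot: the median is the line inside the box, the IQR is the length of the box, and the minimum and maximum are the whisker ends when there are no outliers.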
Box plots are useful because they can provide us with so much information about the distribution of datasets just from a single plot.