This tutorial provides a step-by-step example of how to label outliers in boxplots in ggplot2.

**Step 1: Create the Data Frame**

First, let’s create a data frame that contains the points scored by 60 basketball players, 20 on each of three teams:

```r
#make this example reproducible
set.seed(1)

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))

#view head of data frame
head(df)

#   team player points
# 1    A      A  23.74
# 2    A      B  31.84
# 3    A      C  21.64
# 4    A      D  45.95
# 5    A      E  33.30
# 6    A      F  21.80
```

**Note**: We used the set.seed() function to ensure that this example is reproducible.

**Step 2: Define a Function to Identify Outliers**

By default, **geom_boxplot()** in ggplot2 treats an observation as an outlier if it meets either of the following conditions:

- It lies more than 1.5 times the interquartile range (IQR) below the first quartile (Q1).
- It lies more than 1.5 times the IQR above the third quartile (Q3).
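To make the rule concrete, here is a minimal base-R sketch that computes the two cutoffs (often called fences) for a small made-up vector. The vector `x` and the names `lower_fence` and `upper_fence` are illustrative only. `quantile()` and `IQR()` use their default method (type 7), which is also how ggplot2's **stat_boxplot()** computes the box when no weights are supplied.

```r
# Toy data: nine ordinary values plus one extreme value (illustrative only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

q1  <- quantile(x, 0.25, names = FALSE)  # first quartile: 3.25
q3  <- quantile(x, 0.75, names = FALSE)  # third quartile: 7.75
iqr <- IQR(x)                            # Q3 - Q1 = 4.5

# An observation is an outlier if it falls outside these fences
lower_fence <- q1 - 1.5 * iqr  # 3.25 - 6.75 = -3.5
upper_fence <- q3 + 1.5 * iqr  # 7.75 + 6.75 = 14.5

# Only 100 lies outside the fences
x[x < lower_fence | x > upper_fence]
```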

The following R function returns TRUE for each observation that meets either condition and FALSE otherwise:

```r
find_outlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}
```
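One caveat: `quantile()` and `IQR()` stop with an error when `x` contains `NA` and `na.rm` is left at its default of FALSE. The sketch below is a hypothetical variant, `find_outlier_safe()`, that ignores missing values when computing the quartiles and exposes the 1.5 multiplier as an argument `coef`. The argument name is borrowed from **stat_boxplot()**, whose `coef` argument sets the whisker length. If you change one, change the other so that the labels stay in sync with the points ggplot2 draws.

```r
# Hypothetical NA-tolerant version of find_outlier(); coef defaults to the
# 1.5 multiplier that ggplot2's stat_boxplot() uses for its whiskers
find_outlier_safe <- function(x, coef = 1.5) {
  q1  <- quantile(x, 0.25, na.rm = TRUE, names = FALSE)
  q3  <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
  iqr <- IQR(x, na.rm = TRUE)
  # Comparisons involving NA return NA, so missing inputs stay missing
  x < q1 - coef * iqr | x > q3 + coef * iqr
}

# Example usage: TRUE only for 100, NA for the missing value
res <- find_outlier_safe(c(1:9, 100, NA))
res
```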

**Related:** How to Interpret Interquartile Range

**Step 3: Label Outliers in Boxplots in ggplot2**

Next, we can use the following code to label outliers in boxplots in ggplot2:

```r
library(ggplot2)
library(dplyr)

#add column that holds the points value for outliers and NA for all other rows
df <- df %>%
  group_by(team) %>%
  mutate(outlier = ifelse(find_outlier(points), points, NA))

#create box plot of points by team and label outliers
ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)
```
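Before plotting, it can help to check which rows the grouped **mutate()** flags. The sketch below does the same per-team calculation in base R with `ave()`, so it runs without dplyr or ggplot2. It rebuilds the Step 1 data and the Step 2 function so that the block is self-contained. `is_out` is an illustrative name and is not part of the original code.

```r
# Rebuild the Step 1 data and the Step 2 function so this block runs on its own
set.seed(1)
df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))
find_outlier <- function(x) {
  x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x)
}

# ave() applies find_outlier() within each team, like group_by(team) + mutate().
# It writes the flags back into a numeric vector as 0/1, so convert to logical.
is_out <- as.logical(ave(df$points, df$team, FUN = find_outlier))

# Show only the flagged rows
df[is_out, ]
```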

The resulting plot labels two outliers: a player on team A who scored **7.85** points and a player on team B who scored **10.11** points.

We could also label these outliers with a different variable.

For example, we could replace **points** with **player** as the second argument of **ifelse()** inside **mutate()** to label each outlier with the player’s name instead:

```r
library(ggplot2)
library(dplyr)

#add column that holds the player name for outliers and NA for all other rows
df <- df %>%
  group_by(team) %>%
  mutate(outlier = ifelse(find_outlier(points), player, NA))

#create box plot of points by team and label outliers
ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)
```
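To show both the player and the score in each label, you could build the label text with `paste0()` inside the same `ifelse()` call. The sketch below illustrates the idea in base R on a tiny made-up data frame. The `toy` data and the `"player: points"` format are illustrative choices. In the dplyr pipeline, the equivalent would be `mutate(outlier = ifelse(find_outlier(points), paste0(player, ": ", points), NA))`.

```r
# Same outlier rule as find_outlier() from Step 2
find_outlier <- function(x) {
  x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x)
}

# Tiny made-up data frame: ten players, one extreme score
toy <- data.frame(player = paste0("P", 1:10),
                  points = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100))

# Label outliers as "player: points"; all other rows get NA,
# which geom_text(na.rm = TRUE) would skip when plotting
toy$outlier <- ifelse(find_outlier(toy$points),
                      paste0(toy$player, ": ", toy$points),
                      NA)

toy$outlier[!is.na(toy$outlier)]
```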

The outlier on team A is now labeled **N** and the outlier on team B is labeled **D**, because these are the players whose points values are outliers.

**Note**: The **hjust** argument in **geom_text()** sets the horizontal justification of each label. A negative value such as -.5 shifts the label to the right of its point so that it doesn’t overlap the dot in the plot.

**Additional Resources**

The following tutorials explain how to perform other common tasks in ggplot2:

How to Change Font Size in ggplot2

How to Remove a Legend in ggplot2

How to Rotate Axis Labels in ggplot2