How to Use stat_smooth() Function in R


You can use the stat_smooth() function in ggplot2 to “smooth” the results of a scatterplot and gain a better understanding of the general pattern of points in a plot.

This function is extremely versatile and can be used to summarize both linear and non-linear trends in a dataset with and without standard error bars.

The following example shows how to use the stat_smooth() function in practice in R.

Example: How to Use stat_smooth() in R

For this particular example we will use the built-in mtcars dataset in R, which contains various measurements on different cars.

We can use the head() function to view the first six rows of this dataset:

#view first six rows of mtcars dataset
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Suppose that we would like to create a scatterplot to visualize the relationship between the mpg (miles per gallon) and wt (weight) of each vehicle in the data frame.

We can use the following syntax to do so:

library(ggplot2)

#generate scatterplot of mpg vs wt
ggplot(mtcars, aes(mpg, wt)) +
  geom_point()

This produces the following scatterplot:

The x-axis displays the mpg values while the y-axis displays the wt values.

Just from looking at the scatterplot we can see that there is a general trend of higher mpg values being associated with lower wt values.
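
One way to put a number on this trend is the Pearson correlation coefficient. The snippet below is a quick sketch using the base R cor() function, which is not part of the original example:

```r
#calculate correlation between mpg and wt
r <- cor(mtcars$mpg, mtcars$wt)

#view correlation coefficient
r
```

The result is roughly -0.87, and a value this close to -1 indicates a strong negative linear relationship, which matches what the scatterplot shows.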

To make this trend even easier to view, we can add the stat_smooth() function to the plot.

We can use the following syntax to do so:

library(ggplot2)

#generate scatterplot of mpg vs wt and add stat_smooth
ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  stat_smooth()

This produces the following result:

stat_smooth in R with loess line

The same scatterplot points still exist in the plot, but now a “smooth” line with standard error boundaries (drawn as a shaded 95% confidence band by default) is also shown, which captures the general trend of the data.

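It’s worth noting that the default smoothing method for a dataset of this size is loess, which allows flexibility to capture a trend without using a straight line. (For datasets with 1,000 or more observations, ggplot2 switches to a generalized additive model by default.)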
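
The flexibility of the loess line can itself be adjusted. The following is a minimal sketch, assuming the span and level arguments of stat_smooth() behave as described in the ggplot2 documentation: span controls how closely the loess curve follows the points, and level sets the confidence level of the shaded band. The values 0.5 and 0.90 are arbitrary choices for illustration:

```r
library(ggplot2)

#generate scatterplot of mpg vs wt with an explicitly tuned loess line
#span = 0.5 and level = 0.90 are illustrative values, not recommendations
p_loess <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  stat_smooth(method = 'loess', formula = y ~ x, span = 0.5, level = 0.90)

#display plot
p_loess
```

Smaller span values tend to produce a wigglier line, while larger values produce a smoother one.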

You can instead specify method='lm' to force the smoothing method to be a linear trend.

It’s also worth noting that standard error boundaries are shown by default, but you can specify se=FALSE within the stat_smooth() function to hide these boundaries.

We can use the following syntax to add a smooth straight line with no standard error bars instead:

library(ggplot2)

#generate scatterplot of mpg vs wt and add straight trend line with no standard error band
ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  stat_smooth(method='lm', se=FALSE)

This produces the following result:

stat_smooth function in R with no standard error bars

Notice that the stat_smooth() function produces a straight line this time with no standard error bars.

Note that this line is also the “line of best fit” that we would get from a simple linear regression with wt as the response variable and mpg as the predictor variable.
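
To check this, we can compare the line drawn by stat_smooth() with the fitted values from the lm() function. This is a sketch rather than part of the original example: it assumes the layer_data() helper from ggplot2 returns the points computed for the smoothing layer (layer 2 in this plot), and the object names fit, p, smooth_data and lm_pred are chosen purely for illustration:

```r
library(ggplot2)

#fit simple linear regression with wt as response and mpg as predictor
fit <- lm(wt ~ mpg, data = mtcars)

#generate same plot as before and store it
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  stat_smooth(method='lm', se=FALSE)

#extract the points that stat_smooth() computed (second layer of the plot)
smooth_data <- layer_data(p, 2)

#calculate lm() predictions at the same x values
lm_pred <- predict(fit, newdata = data.frame(mpg = smooth_data$x))

#compare the two sets of fitted values
all.equal(smooth_data$y, unname(lm_pred))
```

If the two sets of values agree, all.equal() returns TRUE, and coef(fit) gives the intercept and slope of the plotted line.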

Note: You can find the complete documentation for the stat_smooth() function in the ggplot2 reference documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in ggplot2:

How to Use scale_y_continuous in ggplot2
How to Rotate Axis Labels in ggplot2
How to Change Legend Labels in ggplot2
How to Use the ggarrange Function in R
