How to Plot Predicted Values in R (With Examples)


Often you may want to plot the predicted values of a regression model in R to see how closely they match the actual values.

This tutorial provides examples of how to create this type of plot in base R and ggplot2.

Example 1: Plot of Predicted vs. Actual Values in Base R

The following code shows how to fit a multiple linear regression model in R and then create a plot of predicted vs. actual values:

#create data
df <- data.frame(x1=c(3, 4, 4, 5, 5, 6, 7, 8, 11, 12),
                 x2=c(6, 6, 7, 7, 8, 9, 11, 13, 14, 14),
                 y=c(22, 24, 24, 25, 25, 27, 29, 31, 32, 36))

#fit multiple linear regression model
model <- lm(y ~ x1 + x2, data=df)

#plot predicted vs. actual values
plot(x=predict(model), y=df$y,
     xlab='Predicted Values',
     ylab='Actual Values',
     main='Predicted vs. Actual Values')

#add diagonal line (intercept 0, slope 1) where predicted values equal actual values
abline(a=0, b=1)

Plot of predicted values in R

The x-axis displays the predicted values from the model and the y-axis displays the actual values from the dataset. The diagonal line in the middle of the plot has an intercept of 0 and a slope of 1, so any point that falls exactly on it has a predicted value equal to its actual value.

Since each of the data points lies fairly close to the diagonal line, the predicted values are close to the actual values, which tells us that the regression model fits the data well.
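To put a number on how close the points are to the diagonal line, one option is to compute the R-squared and the root mean squared error (RMSE) of the model. The following is a minimal sketch that reuses the data from this example; the variable names r_squared and rmse are chosen here for illustration:

```r
#create data (same values as in the example above)
df <- data.frame(x1=c(3, 4, 4, 5, 5, 6, 7, 8, 11, 12),
                 x2=c(6, 6, 7, 7, 8, 9, 11, 13, 14, 14),
                 y=c(22, 24, 24, 25, 25, 27, 29, 31, 32, 36))

#fit multiple linear regression model
model <- lm(y ~ x1 + x2, data=df)

#proportion of the variance in y that is explained by the model
r_squared <- summary(model)$r.squared

#root mean squared gap between actual and predicted values, in units of y
rmse <- sqrt(mean((df$y - predict(model))^2))

#view both values
r_squared
rmse
```

An R-squared near 1 and an RMSE that is small relative to the range of y both correspond to points that sit close to the diagonal line.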

We can also create a data frame that shows the actual and predicted values for each data point:

#create data frame of actual and predicted values
values <- data.frame(actual=df$y, predicted=predict(model))

#view data frame
values

   actual predicted
1      22  22.54878
2      24  23.56707
3      24  23.96341
4      25  24.98171
5      25  25.37805
6      27  26.79268
7      29  28.60366
8      31  30.41463
9      32  33.86585
10     36  34.88415
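
The gap between the two columns in each row is the residual for that data point. As a small sketch (the column name residual is chosen here for illustration), it can be added to the same data frame:

```r
#create data (same values as in the example above)
df <- data.frame(x1=c(3, 4, 4, 5, 5, 6, 7, 8, 11, 12),
                 x2=c(6, 6, 7, 7, 8, 9, 11, 13, 14, 14),
                 y=c(22, 24, 24, 25, 25, 27, 29, 31, 32, 36))

#fit multiple linear regression model
model <- lm(y ~ x1 + x2, data=df)

#create data frame of actual and predicted values
values <- data.frame(actual=df$y, predicted=predict(model))

#add residual (actual minus predicted) for each data point
values$residual <- values$actual - values$predicted

#view data frame
values
```

Positive residuals correspond to points above the diagonal line in the plot, where the model predicts too low, and negative residuals correspond to points below it.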

Example 2: Plot of Predicted vs. Actual Values in ggplot2

The following code shows how to create a plot of predicted vs. actual values using the ggplot2 data visualization package:

library(ggplot2) 

#create data
df <- data.frame(x1=c(3, 4, 4, 5, 5, 6, 7, 8, 11, 12),
                 x2=c(6, 6, 7, 7, 8, 9, 11, 13, 14, 14),
                 y=c(22, 24, 24, 25, 25, 27, 29, 31, 32, 36))

#fit multiple linear regression model
model <- lm(y ~ x1 + x2, data=df)

#plot predicted vs. actual values
ggplot(df, aes(x=predict(model), y=y)) + 
  geom_point() +
  geom_abline(intercept=0, slope=1) +
  labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')

Once again, the x-axis displays the predicted values from the model and the y-axis displays the actual values from the dataset.
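
An alternative that some may find easier to read is to store the predictions as a column of the data frame first, so that aes() refers only to column names. This is a sketch of that variation; the column name predicted and the object name p are chosen here for illustration:

```r
library(ggplot2)

#create data (same values as in the example above)
df <- data.frame(x1=c(3, 4, 4, 5, 5, 6, 7, 8, 11, 12),
                 x2=c(6, 6, 7, 7, 8, 9, 11, 13, 14, 14),
                 y=c(22, 24, 24, 25, 25, 27, 29, 31, 32, 36))

#fit multiple linear regression model
model <- lm(y ~ x1 + x2, data=df)

#store predicted values as a new column in the data frame
df$predicted <- predict(model)

#plot predicted vs. actual values using column names only
p <- ggplot(df, aes(x=predicted, y=y)) +
  geom_point() +
  geom_abline(intercept=0, slope=1) +
  labs(x='Predicted Values', y='Actual Values', title='Predicted vs. Actual Values')

#display plot
p
```

The resulting plot should match the one above. Keeping the predictions in the data frame also makes them available for later steps.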

Additional Resources

How to Create a Residual Plot in R
How to Create a Histogram of Residuals in R
How to Calculate Standardized Residuals in R
