Principal components analysis (PCA) is an unsupervised machine learning technique that finds linear combinations of the original variables, called principal components, that explain a large portion of the variation in a dataset.
To visualize the results of PCA for a given dataset we can create a biplot, which is a plot that displays every observation in a dataset on a plane formed by the first two principal components, along with an arrow for each of the original variables.
We can use the following basic syntax in R to create a biplot:
#perform PCA
results <- princomp(df)

#create biplot to visualize results of PCA
biplot(results)
The following example shows how to use this syntax in practice.
Example: How to Create a Biplot in R
For this example we’ll use the built-in R dataset called USArrests:
#view first six rows of USArrests dataset
head(USArrests)

           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7
We can use the following code to perform PCA and visualize the results in a biplot:
#perform PCA
results <- princomp(USArrests)

#visualize results of PCA in biplot
biplot(results)

The x-axis displays the first principal component and the y-axis displays the second principal component. Each observation (state) is plotted by name, and the four original variables are drawn as red arrows.
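To see the numbers behind the plot, it can help to look at the fitted object directly. The following is a minimal sketch that pulls out the scores (the coordinates of each state), the loadings (which set the direction of each arrow), and the share of variance explained by each component. The name prop_var is just an illustrative variable, and note that biplot rescales the scores and loadings before drawing, so the axis values in the plot will not match the raw scores exactly.

```r
# Fit the same PCA as in the example (covariance-based, the princomp default)
results <- princomp(USArrests)

# Scores: coordinates of each state on the first two principal components
head(results$scores[, 1:2])

# Loadings: weight of each original variable in the first two components
# (these determine the directions of the arrows in the biplot)
unclass(results$loadings)[, 1:2]

# Proportion of total variance explained by each component
prop_var <- results$sdev^2 / sum(results$sdev^2)
round(prop_var, 3)
```

If the first two values of prop_var add up to most of the total, the biplot is a reasonable two-dimensional summary of the data.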
Note that there are several arguments we can use within the biplot function to modify the appearance of the plot.
For example, we can use the following code to modify the colors, font size, axis limits, plot title, axis titles, and size of the arrows in the plot:
#create biplot with custom appearance
biplot(results,
       col=c('blue', 'red'),
       cex=c(1, 1.3),
       xlim=c(-.4, .4),
       main='PCA Results',
       xlab='First Component',
       ylab='Second Component',
       expand=1.2)
This biplot is a bit easier to read than the previous one.
You can find a full list of arguments that you can use to modify the appearance of the biplot in the R documentation for the function, which you can open by running ?biplot.princomp.
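One caveat worth keeping in mind: princomp works on the covariance matrix by default, so a variable measured on a much larger scale (here, Assault) can dominate the first component. The sketch below assumes you would rather give every variable equal weight, which princomp supports through its cor argument. The names results_std, var_raw, and var_std are illustrative only.

```r
# Assumption: each variable should carry equal weight, so PCA is run
# on the correlation matrix instead of the covariance matrix
results_std <- princomp(USArrests, cor = TRUE)

# Variance captured by each component, with and without standardizing
var_raw <- princomp(USArrests)$sdev^2
var_std <- results_std$sdev^2

# Share of total variance explained by the first component in each case
round(var_raw[1] / sum(var_raw), 3)
round(var_std[1] / sum(var_std), 3)

# Biplot of the standardized PCA
biplot(results_std, col = c('blue', 'red'), main = 'PCA Results (standardized)')
```

Whether to standardize depends on whether the original units are meaningful for your comparison. When the variables are on very different scales, as they are in USArrests, the standardized version often gives a more balanced picture.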
The following tutorials provide additional information about principal components analysis:
A Quick Introduction to Supervised vs. Unsupervised Learning
Principal Components Analysis in R: Step-by-Step Example