How to Calculate Matthews Correlation Coefficient in R


The Matthews correlation coefficient (MCC) is a metric we can use to assess the performance of a binary classification model.

It is calculated as:

MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

where:

  • TP: Number of true positives
  • TN: Number of true negatives
  • FP: Number of false positives
  • FN: Number of false negatives
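To make the formula concrete, here is a minimal sketch of it as a plain R function. The name mcc_from_counts() is a hypothetical helper written for this illustration, not part of mltools or base R. It converts the counts to doubles, because counts that come from table() are integers and a large product in the denominator could overflow R's 32-bit integer type. It also returns NA when the denominator is 0 (an empty row or column in the confusion matrix), which is an assumption on our part; some tools report 0 in that case instead.

```r
# Hypothetical helper (not from any package): MCC from the four counts
mcc_from_counts <- function(TP, TN, FP, FN) {
  # Work in doubles so the denominator product cannot overflow integers
  TP <- as.numeric(TP); TN <- as.numeric(TN)
  FP <- as.numeric(FP); FN <- as.numeric(FN)
  numerator <- TP * TN - FP * FN
  denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # MCC is undefined when any marginal total is 0; return NA here
  if (denominator == 0) return(NA_real_)
  numerator / denominator
}

mcc_from_counts(TP = 15, TN = 375, FP = 5, FN = 5)  # 0.7368421
```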

This metric is particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
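As a rough illustration of why imbalance matters, consider a hypothetical weak model (these counts are invented for illustration) on the same 20 vs. 380 class split used later in this tutorial. It catches only 2 of the 20 positives, yet its accuracy still looks excellent because the negative class dominates. MCC exposes the weakness.

```r
# Hypothetical counts: the model finds only 2 of the 20 actual positives
TP <- 2; FN <- 18; FP <- 2; TN <- 378

# Accuracy is dominated by the large negative class
accuracy <- (TP + TN) / (TP + TN + FP + FN)

# MCC uses all four cells, so missed positives pull it down sharply
mcc_value <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

accuracy   # 0.95
mcc_value  # roughly 0.21
```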

The value for MCC ranges from -1 to 1, where:

  • -1 indicates total disagreement between predicted classes and actual classes
  • 0 indicates the model does no better than random guessing
  • 1 indicates total agreement between predicted classes and actual classes
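A quick way to see these three reference points is to plug extreme confusion matrices into the formula. The sketch below uses a hypothetical helper, mcc_counts(), and made-up counts for 20 positives and 380 negatives. The "random" case assumes the model flags 5% of each class as positive regardless of the true class, so its predictions carry no information about the actual class.

```r
# Hypothetical helper: the MCC formula applied directly to counts
mcc_counts <- function(TP, TN, FP, FN) {
  (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}

# Every prediction correct: total agreement
mcc_counts(TP = 20, TN = 380, FP = 0, FN = 0)   # 1

# Every prediction wrong: total disagreement
mcc_counts(TP = 0, TN = 0, FP = 380, FN = 20)   # -1

# 5% of each class predicted positive, independent of the truth
mcc_counts(TP = 1, TN = 361, FP = 19, FN = 19)  # 0
```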

For example, suppose a sports analyst uses a logistic regression model to predict whether or not 400 different college basketball players get drafted into the NBA.

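The confusion matrix for the model's predictions contains the following counts:

  • True positives (predicted drafted, actually drafted): 15
  • False negatives (predicted not drafted, actually drafted): 5
  • False positives (predicted drafted, actually not drafted): 5
  • True negatives (predicted not drafted, actually not drafted): 375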

To calculate the MCC of the model, we can use the following formula:

  • MCC = (TP*TN - FP*FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
  • MCC = (15*375 - 5*5) / √((15+5)(15+5)(375+5)(375+5))
  • MCC = 5600 / √(20*20*380*380) = 5600 / 7600
  • MCC = 0.7368
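If you want to double-check the hand calculation, the same arithmetic can be reproduced step by step in base R, with no packages needed:

```r
# Counts from the draft-prediction example
TP <- 15; TN <- 375; FP <- 5; FN <- 5

numerator <- TP * TN - FP * FN                                      # 5625 - 25 = 5600
denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # sqrt(20*20*380*380) = 7600
numerator / denominator                                             # 0.7368421
```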

Matthews correlation coefficient turns out to be 0.7368.

This value is fairly close to 1, which indicates that the model does a good job of predicting whether players will be drafted.

The following example shows how to calculate MCC for this exact scenario using the mcc() function from the mltools package in R.

Example: Calculating Matthews Correlation Coefficient in R

The following code shows how to define a vector of predicted classes and a vector of actual classes, then calculate Matthews correlation coefficient using the mcc() function from the mltools package:

library(mltools)

#define vector of actual classes
actual <- rep(c(1, 0), times=c(20, 380))

#define vector of predicted classes
preds <- rep(c(1, 0, 1, 0), times=c(15, 5, 5, 375))

#calculate Matthews correlation coefficient
mcc(preds, actual)

[1] 0.7368421

Matthews correlation coefficient is 0.7368.

This matches the value that we calculated earlier by hand.
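As an optional cross-check without mltools, you can build the confusion matrix with base R's table() and compute MCC from its cells. For two 0/1 vectors, MCC is also identical to the Pearson correlation (the phi coefficient), so cor() should return the same value. This is a sketch that assumes the classes are coded as 0 and 1, as in the vectors above.

```r
# Same vectors as in the mltools example
actual <- rep(c(1, 0), times = c(20, 380))
preds  <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Cross-tabulate predictions (rows) against actual classes (columns)
tab <- table(predicted = preds, actual = actual)
tab

# Pull out the four counts by label ("1" = drafted, "0" = not drafted);
# as.numeric() guards against integer overflow with large counts
TP <- as.numeric(tab["1", "1"]); TN <- as.numeric(tab["0", "0"])
FP <- as.numeric(tab["1", "0"]); FN <- as.numeric(tab["0", "1"])
(TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

# For 0/1 vectors, the Pearson correlation equals MCC
cor(preds, actual)
```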

If you already have a confusion matrix, you can pass it to mcc() through the confusionM argument instead:

library(mltools)

#create confusion matrix
conf_matrix <- matrix(c(15, 5, 5, 375), nrow=2)

#view confusion matrix
conf_matrix

     [,1] [,2]
[1,]   15    5
[2,]    5  375

#calculate Matthews correlation coefficient for confusion matrix
mcc(confusionM = conf_matrix)

[1] 0.7368421

Once again, the Matthews correlation coefficient is 0.7368.
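A useful property here is that MCC does not depend on how the 2x2 matrix is oriented: transposing it (putting predictions in columns instead of rows) or reversing the class order leaves the value unchanged. The sketch below checks this with a hypothetical base-R helper, mcc_matrix(), which is our own illustration rather than how mltools computes the value internally.

```r
# Hypothetical helper: MCC from any 2x2 confusion matrix.
# Diagonal cells are correct predictions, off-diagonal cells are errors.
mcc_matrix <- function(m) {
  m <- matrix(as.numeric(m), nrow = 2)
  (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]) /
    sqrt(prod(rowSums(m)) * prod(colSums(m)))
}

conf_matrix <- matrix(c(15, 5, 5, 375), nrow = 2)

mcc_matrix(conf_matrix)            # original layout
mcc_matrix(t(conf_matrix))         # rows and columns swapped
mcc_matrix(conf_matrix[2:1, 2:1])  # class order reversed
```

All three calls give the same result because swapping rows with columns, or one class with the other, only reorders the factors in the numerator and denominator of the formula.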

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Perform Logistic Regression in R
How to Plot a ROC Curve Using ggplot2
How to Calculate F1 Score in R
