**Matthews correlation coefficient** (MCC) is a metric we can use to assess the performance of a classification model.

It is calculated as:

**MCC** = (TP×TN - FP×FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]

where:

- **TP**: Number of true positives
- **TN**: Number of true negatives
- **FP**: Number of false positives
- **FN**: Number of false negatives

This metric is particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
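To make the point about imbalance concrete, here is a minimal sketch in base R. The counts are hypothetical (invented for illustration, not taken from any dataset): a model that almost always predicts the majority class scores high accuracy, while its MCC stays low.

```r
# Hypothetical counts for an imbalanced problem (20 positives, 380 negatives)
TP <- 1    # positives correctly flagged
FN <- 19   # positives missed
FP <- 0    # negatives wrongly flagged
TN <- 380  # negatives correctly identified

# Accuracy is dominated by the many easy negatives
accuracy <- (TP + TN) / (TP + TN + FP + FN)

# MCC uses all four cells, so the missed positives pull it down
mcc_value <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

accuracy   # above 0.95
mcc_value  # roughly 0.22
```

Accuracy alone would suggest a strong model here, whereas MCC reflects that 19 of the 20 positives were missed.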

The value for MCC ranges from -1 to 1 where:

- **-1** indicates total disagreement between predicted classes and actual classes
- **0** is synonymous with completely random guessing
- **1** indicates total agreement between predicted classes and actual classes
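The three reference points can be checked with a small sketch in base R. The helper `mcc_from_counts()` is a hypothetical name introduced here for illustration (it is not part of any package), and the counts are made up to hit each extreme.

```r
# Hypothetical helper (not from any package): MCC from the four counts
mcc_from_counts <- function(TP, TN, FP, FN) {
  numerator <- TP * TN - FP * FN
  denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  numerator / denominator
}

# Every prediction correct
perfect <- mcc_from_counts(TP = 10, TN = 10, FP = 0, FN = 0)

# Every prediction wrong
inverted <- mcc_from_counts(TP = 0, TN = 0, FP = 10, FN = 10)

# Predictions unrelated to the actual classes
chance <- mcc_from_counts(TP = 5, TN = 5, FP = 5, FN = 5)

c(perfect, inverted, chance)  # 1, -1, 0
```

Note that this sketch does not handle the case where a row or column total is zero (for example, a model that never predicts the positive class); the denominator is then zero and the function returns `NaN`.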

For example, suppose a sports analyst uses a logistic regression model to predict whether or not 400 different college basketball players get drafted into the NBA.

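The following confusion matrix summarizes the predictions made by the model:

| | Actual: Drafted | Actual: Not Drafted |
|---|---|---|
| **Predicted: Drafted** | 15 | 5 |
| **Predicted: Not Drafted** | 5 | 375 |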

To calculate the MCC of the model, we can use the following formula:

**MCC** = (TP×TN - FP×FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]

**MCC** = (15×375 - 5×5) / √[(15+5)(15+5)(375+5)(375+5)]

**MCC** = 0.7368

Matthews correlation coefficient turns out to be **0.7368**.
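The same arithmetic can be reproduced step by step in base R, which makes the numerator and denominator easy to inspect separately. This is only a sketch of the hand calculation; the variable names are chosen here for illustration.

```r
# Counts from the example: 15 true positives, 375 true negatives,
# 5 false positives, 5 false negatives
TP <- 15
TN <- 375
FP <- 5
FN <- 5

# Numerator: 5625 - 25 = 5600
numerator <- TP * TN - FP * FN

# Denominator: sqrt(20 * 20 * 380 * 380) = 7600
denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

mcc_value <- numerator / denominator
round(mcc_value, 4)  # 0.7368
```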

This value is somewhat close to one, which indicates that the model does a decent job of predicting whether or not players will get drafted.

The following example shows how to calculate MCC for this exact scenario using the **mcc()** function from the **mltools** package in R.

**Example: Calculating Matthews Correlation Coefficient in R**

The following code defines a vector of actual classes and a vector of predicted classes, then calculates Matthews correlation coefficient using the **mcc()** function from the **mltools** package:

```r
library(mltools)

# define vector of actual classes
actual <- rep(c(1, 0), times=c(20, 380))

# define vector of predicted classes
preds <- rep(c(1, 0, 1, 0), times=c(15, 5, 5, 375))

# calculate Matthews correlation coefficient
mcc(preds, actual)

# [1] 0.7368421
```

Matthews correlation coefficient is **0.7368**.

This matches the value that we calculated earlier by hand.
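As a cross-check that needs no extra package: for two binary 0/1 vectors, MCC is the same quantity as the Pearson correlation between them (also known as the phi coefficient), so base R's `cor()` should return the same number. A minimal sketch using the same vectors:

```r
# Same vectors as in the example above
actual <- rep(c(1, 0), times = c(20, 380))
preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Pearson correlation of two 0/1 vectors equals MCC
phi <- cor(preds, actual)
round(phi, 4)  # 0.7368
```

This equivalence only holds for the two-class case with classes coded as two distinct numbers.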

If you’d like to calculate Matthews correlation coefficient for a confusion matrix, you can use the **confusionM** argument as follows:

```r
library(mltools)

# create confusion matrix
conf_matrix <- matrix(c(15, 5, 5, 375), nrow=2)

# view confusion matrix
conf_matrix

#      [,1] [,2]
# [1,]   15    5
# [2,]    5  375

# calculate Matthews correlation coefficient for confusion matrix
mcc(confusionM = conf_matrix)

# [1] 0.7368421
```

Once again, Matthews correlation coefficient is **0.7368**.
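For readers who want to see what happens to the matrix, here is a rough base R sketch that computes the same value directly from its cells. It assumes the layout used above, with true positives in `[1,1]` and true negatives in `[2,2]`; it is an illustration of the formula, not a description of how **mltools** implements it.

```r
conf_matrix <- matrix(c(15, 5, 5, 375), nrow = 2)

# Assumed layout: [1,1] = TP, [2,2] = TN, off-diagonal cells = FP and FN
TP <- conf_matrix[1, 1]
TN <- conf_matrix[2, 2]
off_diag_a <- conf_matrix[1, 2]
off_diag_b <- conf_matrix[2, 1]

# The four sums in the denominator are the row and column totals
denominator <- sqrt(prod(rowSums(conf_matrix)) * prod(colSums(conf_matrix)))

mcc_value <- (TP * TN - off_diag_a * off_diag_b) / denominator
round(mcc_value, 4)  # 0.7368
```

Transposing the matrix swaps FP and FN, which leaves both the numerator and the denominator unchanged, so the result does not depend on whether predictions are in rows or columns.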

**Additional Resources**

The following tutorials explain how to perform other common tasks in R:

How to Perform Logistic Regression in R

How to Plot a ROC Curve Using ggplot2

How to Calculate F1 Score in R