**Matthews correlation coefficient** (MCC) is a metric we can use to assess the performance of a classification model.

It is calculated as:

**MCC** = (TP*TN – FP*FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]

where:

- **TP**: Number of true positives
- **TN**: Number of true negatives
- **FP**: Number of false positives
- **FN**: Number of false negatives
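As a minimal sketch, the formula above can be written as a small Python function. The name `mcc_from_counts` and the example counts are hypothetical, and returning 0 when the denominator is zero is an assumed convention rather than part of the formula itself.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: if any row or column of the confusion matrix
    # is empty, the denominator is 0, so return 0.0 instead of dividing.
    if denominator == 0:
        return 0.0
    return numerator / denominator

# Hypothetical counts, chosen only to illustrate the call
print(round(mcc_from_counts(tp=90, tn=80, fp=20, fn=10), 4))
```

Note that the square root covers the product of all four sums, not just the first one.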

This metric is particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
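To see why imbalance matters, the sketch below compares accuracy with MCC. The counts are hypothetical: 400 observations, of which only 20 are positive, and a model that finds just 2 of them.

```python
import math

# Hypothetical confusion-matrix counts for a heavily imbalanced problem:
# 20 actual positives, 380 actual negatives
tp, fn = 2, 18    # only 2 of the 20 positives are detected
fp, tn = 2, 378   # almost all negatives are classified correctly

# Accuracy is dominated by the large negative class
accuracy = (tp + tn) / (tp + tn + fp + fn)

# MCC uses all four cells, so the missed positives pull it down
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"Accuracy: {accuracy:.4f}")
print(f"MCC:      {mcc:.4f}")
```

Under these assumed counts the accuracy is 0.95 even though the model misses 18 of the 20 positives, while the MCC stays far below 1.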

The value for MCC ranges from -1 to 1 where:

- **-1** indicates total disagreement between predicted classes and actual classes
- **0** indicates predictions that are no better than random guessing
- **1** indicates total agreement between predicted classes and actual classes
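The three reference values can be reproduced with a tiny example. This is a sketch only: the helper `mcc_from_labels` and the four-item label lists are hypothetical, and the helper assumes that both classes appear in the actual and the predicted labels, so the denominator is never zero.

```python
import math

def mcc_from_labels(actual, pred):
    """MCC for two equal-length lists of 0/1 labels."""
    pairs = list(zip(actual, pred))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator

actual = [1, 1, 0, 0]

print(mcc_from_labels(actual, [1, 1, 0, 0]))  # every prediction correct
print(mcc_from_labels(actual, [0, 0, 1, 1]))  # every prediction wrong
print(mcc_from_labels(actual, [1, 0, 1, 0]))  # half correct in each class
```

The three calls return 1.0, -1.0, and 0.0 respectively.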

For example, suppose a sports analyst uses a logistic regression model to predict whether or not 400 different college basketball players get drafted into the NBA.

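The following confusion matrix summarizes the predictions made by the model:

| | Actual: Drafted | Actual: Not Drafted |
|---|---|---|
| **Predicted: Drafted** | 15 (TP) | 5 (FP) |
| **Predicted: Not Drafted** | 5 (FN) | 375 (TN) |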

To calculate the MCC of the model, we can use the following formula:

**MCC** = (TP*TN – FP*FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]

**MCC** = (15*375 – 5*5) / √[(15+5)(15+5)(375+5)(375+5)]

**MCC** = 5600 / 7600

**MCC** = 0.7368

The Matthews correlation coefficient turns out to be **0.7368**. This value is fairly close to one, which indicates that the model does a decent job of predicting whether or not players will get drafted.

The following example shows how to calculate MCC for this exact scenario using the **matthews_corrcoef()** function from the **sklearn** library in Python.

**Example: Calculating Matthews Correlation Coefficient in Python**

The following code shows how to define an array of predicted classes and an array of actual classes, then calculate Matthews correlation coefficient of a model in Python:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

#define array of actual classes
actual = np.repeat([1, 0], repeats=[20, 380])

#define array of predicted classes
pred = np.repeat([1, 0, 1, 0], repeats=[15, 5, 5, 375])

#calculate Matthews correlation coefficient
matthews_corrcoef(actual, pred)

0.7368421052631579
```

The MCC is **0.7368**. This matches the value that we calculated earlier by hand.
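As a cross-check, the four counts can be recovered from the same arrays with the **confusion_matrix()** function from **sklearn** and plugged into the formula. This is a sketch rather than a required step, and the variable name `mcc_by_hand` is only illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

#define the same arrays of actual and predicted classes
actual = np.repeat([1, 0], repeats=[20, 380])
pred = np.repeat([1, 0, 1, 0], repeats=[15, 5, 5, 375])

#for binary 0/1 labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(actual, pred).ravel()

#convert NumPy integers to plain Python ints for the arithmetic
tp, tn, fp, fn = int(tp), int(tn), int(fp), int(fn)
print(tp, tn, fp, fn)

#apply the MCC formula to the recovered counts
mcc_by_hand = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

#compare with the library result
print(round(mcc_by_hand, 4), round(matthews_corrcoef(actual, pred), 4))
```

The recovered counts are 15, 375, 5, and 5, and both approaches give the same coefficient.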

**Note**: You can find the complete documentation for the **matthews_corrcoef()** function in the scikit-learn documentation.

**Additional Resources**

The following tutorials explain how to calculate other common metrics for classification models in Python:

An Introduction to Logistic Regression in Python

How to Calculate F1 Score in Python

How to Calculate Balanced Accuracy in Python