# What is the Rand Index? (Definition & Examples)

The Rand index measures how similar the results of two clustering methods are when both are applied to the same dataset.

Often denoted R, the Rand index is calculated as:

R = (a+b) / (nC2)

where:

• a: The number of pairs of elements that are placed in the same cluster by both clustering methods.
• b: The number of pairs of elements that are placed in different clusters by both clustering methods.
• nC2: The number of unordered pairs in a set of n elements.
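
In standard notation, the same formula can be restated as follows. This adds nothing new beyond the usual identity for the number of unordered pairs:

$$
R = \frac{a + b}{\binom{n}{2}}, \qquad \binom{n}{2} = \frac{n(n-1)}{2}
$$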

The Rand index always takes on a value between 0 and 1 where:

• 0: Indicates that two clustering methods do not agree on the clustering of any pair of elements.
• 1: Indicates that two clustering methods perfectly agree on the clustering of every pair of elements.
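
To make the two endpoints concrete, here is a minimal Python sketch that counts agreeing pairs by brute force. The helper name `rand_index_pairs` and the toy label lists are illustrative assumptions, not part of any library.

```python
from itertools import combinations

def rand_index_pairs(labels1, labels2):
    """Brute-force Rand index: share of pairs on which both labelings agree."""
    pairs = list(combinations(range(len(labels1)), 2))
    agree = 0
    for i, j in pairs:
        same1 = labels1[i] == labels1[j]  # pair together under labeling 1?
        same2 = labels2[i] == labels2[j]  # pair together under labeling 2?
        if same1 == same2:
            agree += 1
    return agree / len(pairs)

# Same grouping with different label names: every pair agrees
print(rand_index_pairs([1, 1, 2, 2], [7, 7, 9, 9]))  # 1.0

# One big cluster vs. all singletons: no pair agrees
print(rand_index_pairs([1, 1, 1, 1], [1, 2, 3, 4]))  # 0.0
```

Note that the label values themselves do not matter, only which elements are grouped together.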

The following example illustrates how to calculate the Rand index between two clustering methods for a simple dataset.

### Example: How to Calculate the Rand Index

Suppose we have the following dataset of five elements:

• Dataset: {A, B, C, D, E}

And suppose we use two clustering methods that place each element in the following clusters:

• Method 1 Clusters: {1, 1, 1, 2, 2}
• Method 2 Clusters: {1, 1, 2, 2, 3}

To calculate the Rand index between these clustering methods, we need to first write out every possible unordered pair in the dataset of five elements:

• Unordered pairs: {A, B}, {A, C}, {A, D}, {A, E}, {B, C}, {B, D}, {B, E}, {C, D}, {C, E}, {D, E}

There are 10 unordered pairs.
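
As a quick check, the same list of pairs can be generated in Python with the standard library. This is a small sketch; the variable names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb

elements = ["A", "B", "C", "D", "E"]

# Every unordered pair of distinct elements
pairs = list(combinations(elements, 2))
print(pairs)

# The count matches nC2 for n = 5
print(len(pairs), comb(len(elements), 2))  # 10 10
```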

Next, we need to calculate a, which represents the number of unordered pairs that belong to the same cluster across both clustering methods:

• {A, B}

In this case, a = 1.

Next, we need to calculate b, which represents the number of unordered pairs that belong to different clusters across both clustering methods:

• {A, D}, {A, E}, {B, D}, {B, E}, {C, E}

In this case, b = 5.
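
The counts for a and b can also be reproduced with a short Python sketch that checks each pair against both methods. The dictionaries and list names below are illustrative assumptions; they simply map each element to the cluster labels given above.

```python
from itertools import combinations

elements = ["A", "B", "C", "D", "E"]
method1 = dict(zip(elements, [1, 1, 1, 2, 2]))
method2 = dict(zip(elements, [1, 1, 2, 2, 3]))

same_in_both, different_in_both = [], []
for x, y in combinations(elements, 2):
    same1 = method1[x] == method1[y]  # together under method 1?
    same2 = method2[x] == method2[y]  # together under method 2?
    if same1 and same2:
        same_in_both.append((x, y))
    elif not same1 and not same2:
        different_in_both.append((x, y))

a, b = len(same_in_both), len(different_in_both)
print(same_in_both)       # [('A', 'B')]
print(different_in_both)
print(a, b)               # 1 5
```

The remaining four pairs, {A, C}, {B, C}, {C, D}, and {D, E}, are grouped together by one method but separated by the other, so they count toward neither a nor b.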

Lastly, we can calculate the Rand index as:

• R = (a+b) / (nC2)
• R = (1+5) / 10
• R = 6/10

The Rand index is 0.6.

### How to Calculate the Rand Index in R

We can use the rand.index() function from the fossil package to calculate the Rand index between two clustering methods in R:

```r
library(fossil)

#define clusters
method1 <- c(1, 1, 1, 2, 2)
method2 <- c(1, 1, 2, 2, 3)

#calculate Rand index between clustering methods
rand.index(method1, method2)

[1] 0.6
```

The Rand index is 0.6. This matches the value that we calculated by hand.

### How to Calculate the Rand Index in Python

We can define the following function in Python to calculate the Rand index between two clusterings:

```python
import numpy as np
from scipy.special import comb

#define Rand index function
def rand_index(actual, pred):
    #labels must be non-negative integers for np.bincount
    tp_plus_fp = comb(np.bincount(actual), 2).sum()
    tp_plus_fn = comb(np.bincount(pred), 2).sum()
    A = np.c_[(actual, pred)]
    tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
             for i in set(actual))
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    tn = comb(len(A), 2) - tp - fp - fn
    return (tp + tn) / (tp + fp + fn + tn)

#calculate Rand index
rand_index([1, 1, 1, 2, 2], [1, 1, 2, 2, 3])

0.6
```

The Rand index turns out to be 0.6. This matches the value calculated in the previous examples.