Implementation of Rand index and Adjusted Rand index in Python

This code snippet demonstrates the use of the rand_score and adjusted_rand_score functions from the sklearn.metrics module in Python’s scikit-learn library.

The example uses two small lists of cluster labels. The parameter labels_true holds the true cluster assignments, while labels_pred holds the predicted cluster assignments produced by some clustering algorithm.

Python3

from sklearn.metrics import rand_score, adjusted_rand_score

# Example labels_true and labels_pred
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

sklearn_rand_score = rand_score(labels_true, labels_pred)  # Calculate Rand Score
sklearn_adjusted_rand_score = adjusted_rand_score(labels_true, labels_pred)  # Calculate Adjusted Rand Score

print("Rand Score (sklearn):", sklearn_rand_score)
print("Adjusted Rand Score (sklearn):", sklearn_adjusted_rand_score)

Output:

Rand Score (sklearn): 0.7333333333333333
Adjusted Rand Score (sklearn): 0.4444444444444444

  • A Rand Score of 0.733 means the predicted and true labelings agree on 11 of the 15 possible pairs of points, which is a relatively high level of agreement.
  • An Adjusted Rand Score of 0.444 suggests only a moderate level of agreement between the clusterings once agreement expected by chance is discounted.

These scores indicate that the clustering algorithm has produced clusters that are somewhat similar to the ground truth (or some reference clustering) but there is still room for improvement, especially when considering chance agreement.
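To see where these two numbers come from, the scores can be recomputed by hand with only the standard library. This is a minimal sketch, not scikit-learn's implementation: the helper names rand_index and adjusted_rand_index are hypothetical, the code assumes at least two data points, and returning 1.0 when the denominator is zero is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb

labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]


def rand_index(true, pred):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true)), 2))
    agree = 0
    for i, j in pairs:
        same_true = true[i] == true[j]
        same_pred = pred[i] == pred[j]
        # A pair agrees if both labelings group it together
        # or both labelings keep it apart.
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


def adjusted_rand_index(true, pred):
    """Chance-corrected score built from pair counts (hypothetical helper)."""
    total_pairs = comb(len(true), 2)
    # Pairs grouped together by both labelings, and by each labeling alone.
    together_both = sum(comb(c, 2) for c in Counter(zip(true, pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(true).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # simplifying assumption for degenerate labelings
    return (together_both - expected) / (maximum - expected)


ri = rand_index(labels_true, labels_pred)
ari = adjusted_rand_index(labels_true, labels_pred)
print("Rand Index (by hand):", ri)            # 11 agreeing pairs out of 15
print("Adjusted Rand Index (by hand):", ari)  # approximately 0.444
```

For this example, 3 pairs are grouped together in both labelings and 8 pairs are separated in both, which gives the 11 agreeing pairs out of 15.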

Rand-Index in Machine Learning

Cluster analysis, also known as clustering, is a method used in unsupervised learning to group similar objects or data points into clusters. It’s a fundamental technique in data mining, machine learning, pattern recognition, and exploratory data analysis.

To assess the quality of the clustering results, evaluation metrics are used. Internal metrics such as the Silhouette Score and the Davies-Bouldin Index measure the coherence within clusters and the separation between clusters, while external metrics such as the Rand Index and the Adjusted Rand Index compare a clustering against a reference labeling.

In this article, we’ll explore how the Rand Index and the Adjusted Rand Index work in cluster analysis.

Table of Contents

  • What is Rand Index in Machine Learning?
  • Adjusted Rand Index in Machine Learning
  • Applications of Rand Index in Machine Learning
  • Implementation of Rand index and Adjusted Rand index in Python
  • Limitations of Rand Index
  • When to use: Rand Index vs Adjusted Rand Index

Similar Reads

What is Rand Index in Machine Learning?

The Rand Index is a metric for evaluating the quality of a clustering. Clustering is an unsupervised machine learning technique that groups similar data points into clusters, and the Rand Index tells us how well those clusters match a reference grouping. It compares how pairs of data points are grouped together in the predicted clustering versus the true clustering. The Rand Index provides a single score that indicates the proportion of pairs on which the two clusterings agree....
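The pair comparison described above is usually written as the formula below. The symbols a, b, and n are notation introduced here for illustration and do not come from the article: a is the number of pairs placed in the same cluster by both clusterings, b is the number of pairs placed in different clusters by both, and n is the number of data points.

```latex
\[
\mathrm{RI} = \frac{a + b}{\binom{n}{2}}, \qquad 0 \le \mathrm{RI} \le 1
\]
```

The denominator counts every possible pair of points, so a value of 1 means the two clusterings agree on all pairs.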

Adjusted Rand Index in Machine Learning

The Adjusted Rand Index (ARI) is a variation of the Rand Index (RI) that adjusts for chance when evaluating the similarity between two clusterings of data. It’s a measure used in clustering analysis to assess how well the clusters produced by different methods or algorithms agree with each other or with a reference clustering (ground truth)....
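The effect of adjusting for chance can be illustrated by scoring two labelings that are unrelated by construction. The sketch below is an illustration under stated assumptions, not the article's code: ri_and_ari is a hypothetical helper written with the standard library only, and the seed, sample size, and number of clusters are arbitrary choices.

```python
import random
from collections import Counter
from math import comb


def ri_and_ari(a, b):
    """Return (Rand Index, Adjusted Rand Index) for two labelings of length >= 2."""
    total = comb(len(a), 2)
    # Pairs grouped together by both labelings, and by each labeling alone.
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    in_a = sum(comb(c, 2) for c in Counter(a).values())
    in_b = sum(comb(c, 2) for c in Counter(b).values())
    # Agreeing pairs = together in both + apart in both.
    ri = (total + 2 * both - in_a - in_b) / total
    expected = in_a * in_b / total
    maximum = (in_a + in_b) / 2
    # Returning 1.0 for a zero denominator is a simplifying assumption.
    ari = 1.0 if maximum == expected else (both - expected) / (maximum - expected)
    return ri, ari


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n, k = 1000, 3          # arbitrary sample size and number of clusters
labels_a = [rng.randrange(k) for _ in range(n)]
labels_b = [rng.randrange(k) for _ in range(n)]  # drawn independently of labels_a

ri, ari = ri_and_ari(labels_a, labels_b)
print(f"RI  = {ri:.3f}")   # stays well above 0 although the labelings are unrelated
print(f"ARI = {ari:.3f}")  # close to 0
```

Because most pairs are separated in both random labelings, the Rand Index stays above 0.5 even though the labelings share no structure, while the Adjusted Rand Index lands near 0.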

Applications of Rand Index in Machine Learning

The Rand Index (RI) and its adjusted version (ARI) are widely used in machine learning for evaluating clustering algorithms and assessing the quality of clustering results. Here are some applications of the Rand Index in machine learning:...

Limitations of Rand Index

While the Rand Index (RI) and its adjusted version (ARI) are widely used metrics for evaluating clustering algorithms, they do have some limitations:...

When to use: Rand Index vs Adjusted Rand Index

Deciding whether to use the Rand Index (RI) or the Adjusted Rand Index (ARI) depends on the specific characteristics of the clustering evaluation task and the presence of a ground truth clustering....