Implementation of Rand index and Adjusted Rand index in Python
This code snippet demonstrates the use of the rand_score and adjusted_rand_score functions from the sklearn.metrics module in Python’s scikit-learn library.
We use example cluster labels. The parameter labels_true represents the true cluster assignments, while labels_pred represents the predicted cluster assignments produced by some clustering algorithm.
from sklearn.metrics import rand_score, adjusted_rand_score
# Example labels_true and labels_pred
labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
sklearn_rand_score = rand_score(labels_true, labels_pred) # Calculate Rand Score
sklearn_adjusted_rand_score = adjusted_rand_score(labels_true, labels_pred) # Calculate Adjusted Rand Score
print("Rand Score (sklearn):", sklearn_rand_score)
print("Adjusted Rand Score (sklearn):", sklearn_adjusted_rand_score)
Output:
Rand Score (sklearn): 0.7333333333333333
Adjusted Rand Score (sklearn): 0.4444444444444444
- A Rand Score of 0.733 means that 11 of the 15 possible sample pairs (about 73%) are treated the same way by the predicted labels and the ground truth: either grouped together in both or separated in both.
- An Adjusted Rand Score of 0.444 indicates moderate agreement once the agreement expected by chance is taken into account.
Together, these scores indicate that the clustering algorithm has produced clusters that are somewhat similar to the ground truth (or some reference clustering), but there is still room for improvement. The gap between the two values shows how much of the raw Rand Score can be attributed to chance agreement.
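To see where these two numbers come from, the scores can be reproduced by hand with pair counting. The following is a minimal sketch using only the standard library, not scikit-learn's actual implementation; the helper names pair_counts, rand_index, and adjusted_rand_index are hypothetical and chosen for illustration. It assumes both label lists have the same length.

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Classify every pair of samples by whether each labeling groups it together."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # together in both labelings
        elif same_pred:
            fp += 1  # together only in the prediction
        elif same_true:
            fn += 1  # together only in the ground truth
        else:
            tn += 1  # separated in both labelings
    return tp, fp, fn, tn

def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two labelings agree."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    total = tp + fp + fn + tn
    if total == 0:  # fewer than two samples: treat as perfect agreement
        return 1.0
    return (tp + tn) / total

def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for the agreement expected by chance."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    total = tp + fp + fn + tn
    if total == 0:
        return 1.0
    same_true = tp + fn                        # pairs grouped together in the ground truth
    same_pred = tp + fp                        # pairs grouped together in the prediction
    expected = same_true * same_pred / total   # chance-level count of "together in both"
    max_index = (same_true + same_pred) / 2
    if max_index == expected:                  # both labelings are trivial and identical
        return 1.0
    return (tp - expected) / (max_index - expected)

labels_true = [0, 0, 1, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
print(pair_counts(labels_true, labels_pred))                    # (3, 0, 4, 8)
print(round(rand_index(labels_true, labels_pred), 3))           # 0.733
print(round(adjusted_rand_index(labels_true, labels_pred), 3))  # 0.444
```

For the example labels, 3 pairs are together in both labelings and 8 are separated in both, so the Rand Score is (3 + 8) / 15. The 4 disagreements all come from the prediction splitting the second true cluster in two.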
Rand-Index in Machine Learning
Cluster analysis, also known as clustering, is a method used in unsupervised learning to group similar objects or data points into clusters. It’s a fundamental technique in data mining, machine learning, pattern recognition, and exploratory data analysis.
To assess the quality of the clustering results, evaluation metrics are used. Internal metrics, such as the Silhouette Score and the Davies-Bouldin Index, measure the coherence within clusters and the separation between clusters. External metrics, such as the Rand Index and the Adjusted Rand Index, compare the clustering against a reference labeling.
In this article, we’ll explore how the Rand Index and the Adjusted Rand Index work in cluster analysis.
Table of Contents
- What is Rand Index in Machine Learning?
- Adjusted Rand Index in Machine Learning
- Applications of Rand Index in Machine Learning
- Implementation of Rand index and Adjusted Rand index in Python
- Limitations of Rand Index
- When to use: Rand Index vs Adjusted Rand Index