Implementing K-Modes Clustering with Scikit-Learn
Scikit-Learn, a popular machine learning library in Python, does not include K-Modes itself. The algorithm is provided by the separate kmodes package, whose estimators follow Scikit-Learn's familiar fit/predict conventions. Let’s walk through the steps to implement K-Modes clustering and reveal cluster features.
Step 1: Install Required Libraries
First, make sure the necessary libraries are installed. The example also uses pandas, which is installed separately if it is not already available. We can install the kmodes package using pip:
pip install kmodes
Step 2: Import Libraries and Load Data
Next, import the required libraries and load the categorical dataset. This example builds a small eight-row sample dataset with three categorical features: Color, Shape, and Size.
import pandas as pd
from kmodes.kmodes import KModes

data = {
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red', 'Green', 'Red', 'Blue'],
    'Shape': ['Circle', 'Square', 'Triangle', 'Circle', 'Square', 'Triangle', 'Circle', 'Square'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large', 'Medium', 'Small', 'Large']
}
df = pd.DataFrame(data)
Step 3: Apply K-Modes Clustering
Now, apply the K-Modes algorithm to cluster the data. We’ll specify the number of clusters (k) and fit the model.
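# Initialize K-Modes with 2 clusters; n_init=5 repeats the run from five
# different initializations and keeps the lowest-cost result
km = KModes(n_clusters=2, init='Huang', n_init=5, verbose=1)
# Fit the model and get one cluster label per row
clusters = km.fit_predict(df)
df['Cluster'] = clusters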
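To make the fitting step less of a black box, the following is a minimal, library-free sketch of what a single K-Modes iteration does on the sample data: measure distance with simple matching (the number of attributes that differ), assign each record to its nearest centroid, then recompute each centroid as the per-column mode. The starting centroids are hand-picked for illustration, the helper names are hypothetical, and the tie-breaking shown here is an assumption that may differ from what the kmodes package does internally.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching distance: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value per column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def assign(records, centroids):
    """Index of the nearest centroid for each record (lowest index wins ties)."""
    return [
        min(range(len(centroids)),
            key=lambda j: matching_dissimilarity(rec, centroids[j]))
        for rec in records
    ]


# Same eight rows as the sample dataset, as (Color, Shape, Size) tuples
records = [
    ("Red", "Circle", "Small"),
    ("Blue", "Square", "Large"),
    ("Green", "Triangle", "Medium"),
    ("Blue", "Circle", "Small"),
    ("Red", "Square", "Large"),
    ("Green", "Triangle", "Medium"),
    ("Red", "Circle", "Small"),
    ("Blue", "Square", "Large"),
]

# Assumption: starting centroids chosen by hand, not by Huang initialization
centroids = [("Red", "Circle", "Small"), ("Blue", "Square", "Large")]

# Assignment step: nearest centroid under simple matching
labels = assign(records, centroids)

# Cost: total mismatches between each record and its assigned centroid
cost = sum(matching_dissimilarity(r, centroids[l]) for r, l in zip(records, labels))

# Update step: each centroid becomes the per-column mode of its members
# (this sketch assumes no cluster ends up empty)
new_centroids = [
    column_modes([r for r, l in zip(records, labels) if l == j])
    for j in range(len(centroids))
]

print(labels)
print(cost)
print(new_centroids)
```

A full run repeats the assignment and update steps until the labels stop changing. In this sketch the recomputed modes equal the starting centroids, so a second pass would change nothing.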
Step 4: Reveal Cluster Features
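To understand the characteristics of each cluster, we analyze the cluster centroids and the distribution of data points within each cluster. In K-Modes, a centroid is the mode of its cluster: for each feature, the category that occurs most often among the cluster's members.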
# Cluster centroids: one mode (most frequent category) per feature
centroids = km.cluster_centroids_
print("Cluster Centroids:")
print(centroids)

# Cluster analysis: summarize the rows assigned to each cluster
for cluster in range(km.n_clusters):
    print(f"\nCluster {cluster}:")
    cluster_data = df[df['Cluster'] == cluster]
    print(cluster_data.describe(include='all'))
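Output (illustrative; the listing shows rows assigned to each cluster rather than the describe() summary of counts, unique values, and top categories, it omits the two Green rows at indices 2 and 5, and exact assignments can vary between runs because initialization is random):
Cluster Centroids:
[['Red' 'Circle' 'Small']
 ['Blue' 'Square' 'Large']]

Cluster 0:
   Color   Shape   Size  Cluster
0    Red  Circle  Small        0
4    Red  Square  Large        0
6    Red  Circle  Small        0

Cluster 1:
   Color   Shape   Size  Cluster
1   Blue  Square  Large        1
3   Blue  Circle  Small        1
7   Blue  Square  Large        1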
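Centroids show only the single most frequent category per feature, so it can also help to look at how every category is distributed inside each cluster. The sketch below computes, for each cluster and feature, the share of rows taking each value. It is library-free and uses hand-written, hypothetical labels rather than the result of a real K-Modes run. With the DataFrame from this walkthrough, the rows could instead come from df.drop(columns='Cluster').to_dict('records') and the labels from km.labels_. The function name cluster_profiles is an assumption introduced for illustration.

```python
from collections import Counter, defaultdict


def cluster_profiles(rows, labels):
    """Map cluster -> feature -> {value: share of that cluster's rows}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    sizes = Counter(labels)
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            counts[label][feature][value] += 1
    return {
        label: {
            feature: {v: n / sizes[label] for v, n in counter.most_common()}
            for feature, counter in features.items()
        }
        for label, features in counts.items()
    }


# Same eight rows as the sample dataset, one dict per row
rows = [
    {"Color": "Red", "Shape": "Circle", "Size": "Small"},
    {"Color": "Blue", "Shape": "Square", "Size": "Large"},
    {"Color": "Green", "Shape": "Triangle", "Size": "Medium"},
    {"Color": "Blue", "Shape": "Circle", "Size": "Small"},
    {"Color": "Red", "Shape": "Square", "Size": "Large"},
    {"Color": "Green", "Shape": "Triangle", "Size": "Medium"},
    {"Color": "Red", "Shape": "Circle", "Size": "Small"},
    {"Color": "Blue", "Shape": "Square", "Size": "Large"},
]

# Hypothetical labels written by hand for illustration (not a real model output)
labels = [0, 1, 1, 0, 1, 1, 0, 1]

profiles = cluster_profiles(rows, labels)

# Print each cluster's category shares, most frequent value first
for label in sorted(profiles):
    print(f"Cluster {label}:")
    for feature, shares in profiles[label].items():
        summary = ", ".join(f"{value} {share:.0%}" for value, share in shares.items())
        print(f"  {feature}: {summary}")
```

A feature where one category holds nearly all of a cluster's rows is a strong descriptor of that cluster, while a feature with evenly spread shares says little about it.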
Revealing K-Modes Cluster Features with Scikit-Learn
Clustering is an unsupervised machine learning technique that groups similar records to expose patterns and structure in data. While K-Means is widely used for clustering numerical data, K-Modes is a variant designed specifically for categorical data. In this article, we will delve into the K-Modes algorithm, its implementation with the kmodes package (which follows Scikit-Learn conventions), and how to reveal cluster features effectively.
Table of Contents
- Understanding K-Modes Clustering
- Implementing K-Modes Clustering with Scikit-Learn
- Use-Cases and Applications of K-Modes Clustering
- Tips for Effective K-Modes Clustering