Elbow Method for optimal value of k in KMeans

Prerequisites: K-Means Clustering

In this article, we will discuss how to select the best k (number of clusters) in the k-means clustering algorithm.

Implementation of the Elbow Method Using Sklearn in Python
We will implement the elbow method in four steps. First, we will create a small dataset of points, then we will apply k-means to this dataset and calculate the WCSS (within-cluster sum of squares) for k ranging from 1 to 9.
Step 1: Importing the required libraries
Python3
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt
Step 2: Creating and Visualizing the data
We will create two arrays of sample points and visualize their distribution.
Python3
# Creating the data
x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6,
               7, 8, 9, 8, 9, 9, 8, 4, 4, 5, 4])
x2 = np.array([5, 4, 5, 6, 5, 8, 6, 7, 6, 7,
               1, 2, 1, 2, 3, 2, 3, 9, 10, 9, 10])
X = np.column_stack((x1, x2))  # shape (21, 2)

# Visualizing the data
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
Output:
From the above visualization, we might guess that the optimal number of clusters is 3 or 4, but visual inspection alone cannot always give the right answer. Hence we demonstrate the following steps.
We now define the following quantities:
Distortion: the average distance from each data point to the center of its nearest cluster, using the Euclidean distance metric. (Some references define distortion with squared distances; here we follow the computation used in Step 3, which averages plain Euclidean distances, so distortion is not simply inertia divided by n.)
Distortion = 1/n * Σ distance(point, centroid)
Inertia: It is the sum of the squared distances of samples to their closest cluster center.
Inertia = Σ(distance(point, centroid)^2)
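To make these two quantities concrete, here is a minimal sketch that computes both by hand for a tiny made-up dataset with two assumed cluster centers (the points and centers below are illustrative, not the article's data):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical toy data: two tight pairs of points, one assumed center per pair
X = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
centers = np.array([[1.0, 1.5], [5.5, 5.0]])

# Distance from every point to its closest center
nearest = np.min(cdist(X, centers, 'euclidean'), axis=1)

distortion = nearest.sum() / X.shape[0]  # average distance, as in Step 3
inertia = (nearest ** 2).sum()           # sum of squared distances

print(distortion)  # 0.5  (every point is exactly 0.5 away from its center)
print(inertia)     # 1.0  (4 * 0.5^2)
```

Note that because distortion averages plain distances while inertia sums squared distances, the two curves have different scales but exhibit the same elbow behaviour.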
We iterate over values of k from 1 to 9 and, for each k, fit the model and compute its distortion and inertia.
Step 3: Building the clustering model and calculating the values of the Distortion and Inertia:
Python3
distortions = []
inertias = []
mapping1 = {}
mapping2 = {}
K = range(1, 10)

for k in K:
    # Building and fitting the model
    kmeanModel = KMeans(n_clusters=k).fit(X)

    # Average Euclidean distance to the nearest cluster center (distortion)
    distortion = sum(np.min(cdist(X, kmeanModel.cluster_centers_,
                                  'euclidean'), axis=1)) / X.shape[0]
    inertia = kmeanModel.inertia_

    distortions.append(distortion)
    inertias.append(inertia)
    mapping1[k] = distortion
    mapping2[k] = inertia
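As a side note, the same quantities can be read off a fitted model directly: `KMeans.transform` returns the distance from each sample to every cluster center, so the distortion can be computed without `cdist`. A small sketch on made-up data (the points below are illustrative, not the article's dataset):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated pairs of points; k = 2 should place one center per pair
X = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# transform(X) gives distances to every center; the row minimum is the
# distance to the nearest center
distortion = model.transform(X).min(axis=1).mean()
inertia = model.inertia_

print(distortion)  # 0.5
print(inertia)     # 1.0
```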
Step 4: Tabulating and Visualizing the Results
a) Using the different values of Distortion:
Python3
for key, val in mapping1.items():
    print(f'{key} : {val}')
Output:
1 : 3.625551331197001
2 : 2.0318238533112596
3 : 1.2423303391744152
4 : 0.8367738708386461
5 : 0.736979754424859
6 : 0.6898254810112422
7 : 0.6020311621770951
8 : 0.5234596363982826
9 : 0.4587221418509788
Next, we will plot the distortion values against k:
Python3
plt.plot(K, distortions, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Distortion')
plt.title('The Elbow Method using Distortion')
plt.show()
Output:
b) Using the different values of Inertia:
Python3
for key, val in mapping2.items():
    print(f'{key} : {val}')
Output:
1 : 312.95238095238096
2 : 108.07142857142856
3 : 39.51746031746031
4 : 17.978571428571428
5 : 14.445238095238096
6 : 11.416666666666668
7 : 9.266666666666667
8 : 7.25
9 : 6.5
Python3
plt.plot(K, inertias, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Inertia')
plt.title('The Elbow Method using Inertia')
plt.show()
Output:
To determine the optimal number of clusters, we select the value of k at the "elbow", i.e. the point after which the distortion/inertia starts decreasing in a roughly linear fashion. Thus, for the given data, we conclude that the optimal number of clusters is 4.
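The elbow can also be picked programmatically. One common heuristic (a sketch, not part of the original article) is to choose the k whose point lies farthest below the straight line joining the first and last points of the curve. Applied to the distortion values printed in Step 4(a), it selects k = 4, in agreement with the plot:

```python
import numpy as np

# Distortion values from Step 4(a), for k = 1..9
ks = np.arange(1, 10)
distortions = np.array([3.625551331197001, 2.0318238533112596,
                        1.2423303391744152, 0.8367738708386461,
                        0.736979754424859, 0.6898254810112422,
                        0.6020311621770951, 0.5234596363982826,
                        0.4587221418509788])

# Straight line (chord) from the first point of the curve to the last
chord = np.interp(ks, [ks[0], ks[-1]], [distortions[0], distortions[-1]])

# The elbow is the k where the curve sags farthest below the chord
elbow = ks[np.argmax(chord - distortions)]
print(elbow)  # 4
```

This is essentially the idea behind the "kneedle" algorithm; for real projects, a dedicated library such as kneed automates this detection.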