KMeans Clustering with Iris Dataset

K-means clustering is an unsupervised machine learning algorithm that groups data points into K clusters around centroids.

1. Choose the number of clusters, K.
2. Randomly select K points from the dataset as the initial centroids.
3. Assign every point to its closest centroid.
4. Recompute each centroid as the mean of the points assigned to it.
5. Repeat steps 3 and 4 until the centroids converge.
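The steps above can be sketched directly in NumPy. This is only an illustrative sketch, not the scikit-learn implementation used in the rest of the article: the function name `kmeans_sketch`, its parameters, and the toy data are all hypothetical, and it uses plain random initialization rather than k-means++.

```python
import numpy as np


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm) with random initial centroids."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its closest centroid (Euclidean distance)
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated blobs of 20 points each
rng = np.random.default_rng(42)
toy = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(5.0, 0.5, size=(20, 2)),
])
centroids, labels = kmeans_sketch(toy, k=2)
print(centroids)
```

Random initialization can land in a poor local optimum on harder data, which is one reason the scikit-learn code in this article uses `init='k-means++'` together with several restarts (`n_init=10`).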

Python3

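from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: the data-loading step is not shown in this section. Here iris is
# loaded from scikit-learn as a DataFrame and X holds the four feature columns.
iris = load_iris(as_frame=True).frame
X = iris.drop(columns='target')

# Within-cluster sum of squares (inertia) for k = 1 to 10
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters=i,
                    init='k-means++',
                    max_iter=300,
                    n_init=10,
                    random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# The elbow in the wcss values suggests 3 clusters,
# so fit the final model with k = 3.
kmeans = KMeans(n_clusters=3,
                init='k-means++',
                max_iter=300,
                n_init=10,
                random_state=0)
y_kmeans = kmeans.fit_predict(X)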

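The code above computes the within-cluster sum of squares (WCSS, exposed by scikit-learn as inertia_) for k from 1 to 10, which is the input the elbow method needs. Plotting WCSS against k shows the curve flattening after k = 3, so 3 is used as the number of clusters.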
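The elbow plot itself is not shown in the code, so here is a minimal sketch of how it could be drawn. It assumes the data comes from scikit-learn's `load_iris` (the article's own loading step is not shown) and writes the figure to a hypothetical file name, `elbow.png`; in an interactive session `plt.show()` would work just as well.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: the four iris feature columns, loaded from scikit-learn
X = load_iris(as_frame=True).frame.drop(columns="target")

# Inertia (WCSS) for k = 1 to 10, using the same settings as the article
wcss = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init="k-means++", max_iter=300,
                   n_init=10, random_state=0)
    model.fit(X)
    wcss.append(model.inertia_)

# Plot WCSS against k; the "elbow" is where the curve stops dropping sharply
fig, ax = plt.subplots()
ax.plot(range(1, 11), wcss, marker="o")
ax.set_xlabel("Number of clusters (k)")
ax.set_ylabel("WCSS (inertia)")
ax.set_title("Elbow method")
fig.savefig("elbow.png")
```

Reading the elbow is a judgment call: the point is chosen where adding another cluster stops reducing WCSS by much.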

Visualizing the Clusters

Python3

import matplotlib.pyplot as plt

# Visualising the clusters on the first two feature columns.
# Note: cluster ids are arbitrary, so the species names in the legend assume
# that clusters 0, 1 and 2 line up with these species; check the crosstab.
cols = iris.columns
plt.scatter(X.loc[y_kmeans == 0, cols[0]],
            X.loc[y_kmeans == 0, cols[1]],
            s=100, c='purple',
            label='Iris-setosa')
plt.scatter(X.loc[y_kmeans == 1, cols[0]],
            X.loc[y_kmeans == 1, cols[1]],
            s=100, c='orange',
            label='Iris-versicolour')
plt.scatter(X.loc[y_kmeans == 2, cols[0]],
            X.loc[y_kmeans == 2, cols[1]],
            s=100, c='green',
            label='Iris-virginica')

# Plotting the centroids of the clusters (same two feature dimensions)
plt.scatter(kmeans.cluster_centers_[:, 0],
            kmeans.cluster_centers_[:, 1],
            s=100, c='red',
            label='Centroids')

plt.legend()
plt.show()

Output:

Clusters obtained by using the K-means algorithm

Accuracy and Performance of Model

Now let’s check the performance of the model.

Python3

import pandas as pd
pd.crosstab(iris.target, y_kmeans)

Output:

Because K-means is an unsupervised algorithm, there is no test set to score the model against; instead, the cluster assignments are compared with the known species labels. The Setosa class is clustered perfectly, and Versicolor has only 2 misclassifications. Virginica overlaps with Versicolor, which accounts for 14 misclassifications.
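Since cluster ids carry no meaning on their own, one way to turn the crosstab into a single number is to map each cluster to the species it contains most often and then measure agreement. The sketch below does that and also reports the adjusted Rand index, which ignores label permutations. It assumes the data is loaded from scikit-learn; the variable names `cluster_to_species` and `agreement` are illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

# Assumption: iris loaded from scikit-learn as a DataFrame with a 'target' column
iris = load_iris(as_frame=True).frame
X = iris.drop(columns="target")

kmeans = KMeans(n_clusters=3, init="k-means++", max_iter=300,
                n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# Rows are the true species codes, columns are the cluster ids
table = pd.crosstab(iris["target"], y_kmeans,
                    rownames=["species"], colnames=["cluster"])

# Map each cluster to the species that appears in it most often
cluster_to_species = table.idxmax(axis=0)
predicted = pd.Series(y_kmeans).map(cluster_to_species)
agreement = (predicted.to_numpy() == iris["target"].to_numpy()).mean()

# Adjusted Rand index: agreement score that is invariant to cluster relabeling
ari = adjusted_rand_score(iris["target"], y_kmeans)

print(table)
print(f"agreement after majority mapping: {agreement:.3f}")
print(f"adjusted Rand index: {ari:.3f}")
```

The majority mapping is only a convenience for reading the result; it uses the true labels, so it describes how well the clusters match the species rather than how the model would perform on unseen data.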

Analyzing Decision Tree and K-means Clustering using Iris dataset

The Iris dataset is one of the best-known datasets in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.

Attribute Information:

Sepal Length in cm
Sepal Width in cm
Petal Length in cm
Petal Width in cm
Class:
- Iris Setosa
- Iris Versicolour
- Iris Virginica

Let’s perform exploratory data analysis on the dataset as an initial investigation.
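A first look at the data might be as simple as the sketch below. It assumes the dataset is loaded through scikit-learn's `load_iris`, which may differ from the source the article actually uses, and it only checks the facts stated above: 150 rows, four measurements, and three classes of 50 instances.

```python
from sklearn.datasets import load_iris

# Assumption: load the dataset from scikit-learn as a pandas DataFrame
iris = load_iris(as_frame=True).frame

# Four feature columns plus the integer-coded 'target' column
print(iris.shape)
print(iris.head())

# Summary statistics for each measurement (all in cm)
print(iris.describe())

# Number of instances per class
class_counts = iris["target"].value_counts()
print(class_counts)
```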
