Performing Hierarchical clustering on Dataset

We apply the hierarchical clustering algorithm to the dataset with hclust(), which is part of the stats package that ships with every R installation.


# Finding distance matrix
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat

# Fitting Hierarchical clustering Model
# to the dataset
set.seed(240)  # Not required: hclust() is deterministic
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

# Plotting dendrogram
plot(Hierar_cl)

# Choosing no. of clusters
# Cutting tree by height (draws a line on the existing plot)
abline(h = 110, col = "green")

# Cutting tree by no. of clusters
fit <- cutree(Hierar_cl, k = 3)
fit
table(fit)  # number of cars in each cluster
rect.hclust(Hierar_cl, k = 3, border = "green")


  • Distance matrix:

  • The output shows the pairwise distances between the 32 cars, computed by dist() with the euclidean method.
  • Model Hierar_cl:

  • Printing the model reports that the cluster method is average, the distance is euclidean, and the number of objects is 32.
  • Plot dendrogram:

  • The dendrogram lists the cars along the x-axis (labelled distance_mat) and shows the merge height on the y-axis.
  • Cut tree:

  • The tree is cut into k = 3 clusters. fit holds the cluster label of each car, and tabulating it, for example with table(fit), gives the number of cars in each cluster.
  • Plotting dendrogram after cutting:

  • The plot shows the dendrogram after cutting. The green line marks the cut height of 110, and the green rectangles outline the three clusters chosen by rule of thumb.
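The outputs described in the list above can also be inspected directly in the console. The following is a minimal sketch that rebuilds the same objects so it runs on its own; the name fit_h is introduced here only for illustration and does not appear in the original script.

```r
# Rebuild the objects from this section so the block is self-contained
distance_mat <- dist(mtcars, method = "euclidean")
Hierar_cl <- hclust(distance_mat, method = "average")

# Peek at a corner of the distance matrix (first three cars)
dist_full <- as.matrix(distance_mat)
round(dist_full[1:3, 1:3], 2)

# Cluster label of each car for k = 3, and the size of each cluster
fit <- cutree(Hierar_cl, k = 3)
table(fit)

# Car names grouped by cluster label
split(names(fit), fit)

# Alternative (fit_h is an illustrative name): cut at a fixed height
# instead of asking for a fixed number of clusters
fit_h <- cutree(Hierar_cl, h = 110)
length(unique(fit_h))  # number of clusters produced by this height
```

Cutting by k fixes the number of clusters, while cutting by h fixes the dissimilarity level at which merging stops, so the two calls need not return the same grouping.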

Hierarchical Clustering in R Programming

Hierarchical clustering in the R programming language is an unsupervised, non-linear algorithm in which clusters are arranged in a hierarchy (a pre-determined ordering). For example, consider a family spanning three generations: the grandparents have children, who in turn become the parents of their own children. All of them belong to the same family, so together they form a hierarchy.
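One way to see this hierarchy concretely is to cut a single tree at two levels and check that the finer clusters sit inside the coarser ones. The sketch below assumes the mtcars data with Euclidean distance and average linkage; the names k2, k3 and nesting are illustrative.

```r
# Build one tree (Euclidean distance, average linkage) on mtcars
Hierar_cl <- hclust(dist(mtcars, method = "euclidean"), method = "average")

# Cut the same tree at two levels of the hierarchy
k2 <- cutree(Hierar_cl, k = 2)
k3 <- cutree(Hierar_cl, k = 3)

# Cross-tabulate the two labelings: rows are the k = 3 clusters,
# columns are the k = 2 clusters
nesting <- table(k3 = k3, k2 = k2)
nesting

# TRUE when every k = 3 cluster lies entirely inside one k = 2 cluster
is_nested <- all(rowSums(nesting > 0) == 1)
is_nested
```

Like the children and grandchildren in the family example, each smaller cluster belongs to exactly one larger cluster higher up the tree.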

