R – Hierarchical Clustering

Hierarchical clustering is of two types: 

  • Agglomerative hierarchical clustering: It starts with the individual leaves and successively merges clusters together. It's a bottom-up approach.
  • Divisive hierarchical clustering: It starts at the root and recursively splits the clusters. It's a top-down approach.
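
To make the two directions concrete, here is a minimal sketch that runs both on the same toy data. It assumes the `cluster` package is installed (it normally ships with R as a recommended package) and uses `diana()` from it for the divisive case. The toy matrix, the seed, and the choice of average linkage are illustrative assumptions, not part of the original article.

```r
# Agglomerative (bottom-up): hclust() from the base 'stats' package.
# Divisive (top-down): diana() from the 'cluster' package (assumed installed).
library(cluster)

set.seed(42)
# Hypothetical toy data: two well-separated groups of 5 points each in 2-D
toy <- rbind(matrix(rnorm(10, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(10, mean = 5, sd = 0.3), ncol = 2))

d <- dist(toy)                          # Euclidean distances between rows

agg <- hclust(d, method = "average")    # merges clusters from the leaves up
div <- diana(d)                         # splits clusters from the root down

# Cut both trees into two groups and compare the assignments
agg_groups <- cutree(agg, k = 2)
div_groups <- cutree(as.hclust(div), k = 2)
table(agg_groups, div_groups)
```

On data this cleanly separated, both approaches should recover the same two groups; on noisier data they can disagree.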

Theory:

In hierarchical clustering, objects are organized into a tree-shaped hierarchy, which is then used to interpret the clustering model. The agglomerative algorithm is as follows:

  1. Treat each data point as a single-point cluster, which gives N clusters.
  2. Take the two closest data points and merge them into one cluster, which leaves N-1 clusters.
  3. Take the two closest clusters and merge them into one cluster, which leaves N-2 clusters.
  4. Repeat step 3 until only one cluster remains.
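
The steps above can be sketched as a short, deliberately naive R function. This is an illustration only: the function name `naive_agglomerative`, the four sample points, and the use of single linkage (distance between clusters = smallest distance between any two members) are assumptions made for the example. In practice you would call `hclust()` instead.

```r
# Naive agglomerative clustering with single linkage (illustrative only).
# Returns the distance at which each merge happens.
naive_agglomerative <- function(x) {
  d <- as.matrix(dist(x))             # pairwise Euclidean distances
  clusters <- as.list(seq_len(nrow(d)))  # step 1: N single-point clusters
  merge_heights <- numeric(0)

  while (length(clusters) > 1) {      # step 4: repeat until one cluster remains
    best <- c(1, 2)
    best_dist <- Inf
    # steps 2-3: search for the two closest clusters
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        # single linkage: smallest distance between any two members
        dij <- min(d[clusters[[i]], clusters[[j]]])
        if (dij < best_dist) {
          best_dist <- dij
          best <- c(i, j)
        }
      }
    }
    # merge the closest pair, reducing the cluster count by one
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    merge_heights <- c(merge_heights, best_dist)
  }
  merge_heights
}

# Hypothetical sample points: two pairs far apart from each other
pts <- matrix(c(0, 0,
                0, 1,
                5, 5,
                5, 7), ncol = 2, byrow = TRUE)

heights <- naive_agglomerative(pts)
heights

# Cross-check against the built-in implementation
hclust(dist(pts), method = "single")$height
```

The two printed vectors should agree, since `hclust()` with `method = "single"` applies the same merge rule far more efficiently.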

A dendrogram is a tree diagram of the cluster hierarchy in which merge distances are drawn as heights. It shows how n units (objects), each with p features, are grouped into clusters. Units in the same cluster are joined by a horizontal line. The leaves at the bottom represent individual units. It provides a visual representation of the clusters.
Rule of thumb: Find the longest vertical distance that is not crossed by any horizontal line. A horizontal cut through that stretch suggests a reasonable number of clusters, equal to the number of vertical lines the cut crosses.
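
As a rough sketch of how this looks in R, the snippet below builds a hierarchy with `hclust()`, draws the dendrogram, and applies the rule of thumb numerically by looking for the largest gap between consecutive merge heights. The use of the built-in `mtcars` data, column scaling, and average linkage are assumptions chosen for the example; other linkages or distances may suggest a different cut.

```r
# Build the hierarchy on standardized columns of the built-in mtcars data
hc <- hclust(dist(scale(mtcars)), method = "average")

# Draw the dendrogram; merge distances appear as heights
plot(hc, cex = 0.6, main = "Dendrogram (average linkage)")

# Rule of thumb: locate the largest gap between consecutive merge heights
gaps <- diff(hc$height)

# After merge i there are n - i clusters, and n = length(hc$height) + 1,
# so cutting inside gap i leaves this many clusters:
k <- length(hc$height) + 1 - which.max(gaps)

# Cut the tree into k groups and count the members of each
groups <- cutree(hc, k = k)
table(groups)
```

The gap heuristic is only a starting point. It often favors a very small number of clusters, so it is worth checking the resulting groups against what you know about the data.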

Hierarchical Clustering in R Programming

Hierarchical clustering in the R programming language is an unsupervised algorithm in which clusters are created such that they have a hierarchy (or a predetermined ordering). For example, consider a family of up to three generations. A grandfather and grandmother have children, who in turn become the fathers and mothers of their own children. They are all grouped together into the same family, i.e., they form a hierarchy.

The Dataset

mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. It comes pre-installed with the datasets package in base R, so dplyr is not required to load it....
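
A quick way to inspect the data before clustering is sketched below. It relies only on base R. The scaling step and the name `mtcars_scaled` are assumptions added for illustration, on the grounds that distance-based methods are sensitive to columns measured in different units.

```r
# mtcars is available in a default R session; data() just makes that explicit
data(mtcars)

dim(mtcars)     # number of rows (cars) and columns (variables)
str(mtcars)     # column names and types
head(mtcars)    # first six rows

# Standardize each column to mean 0 and standard deviation 1
# (hypothetical preprocessing step, not prescribed by the article)
mtcars_scaled <- scale(mtcars)
round(colMeans(mtcars_scaled), 10)
```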

Performing Hierarchical clustering on Dataset

...