Performing Hierarchical Clustering on the Dataset
Here the hierarchical clustering algorithm is applied to the mtcars dataset with hclust(), which is part of the stats package that comes pre-installed with R.
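```r
# Finding distance matrix (Euclidean distance between every pair of cars)
distance_mat <- dist(mtcars, method = "euclidean")
distance_mat

# Fitting hierarchical clustering model to the dataset
# (hclust() itself is deterministic; the seed is kept only for reproducibility habits)
set.seed(240)  # Setting seed
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

# Plotting dendrogram
plot(Hierar_cl)

# Choosing no. of clusters
# Marking a cut height on the dendrogram (this only draws a line)
abline(h = 110, col = "green")

# Cutting tree by no. of clusters
fit <- cutree(Hierar_cl, k = 3)
fit         # cluster label of each car
table(fit)  # number of cars in each cluster

# Drawing rectangles around the k = 3 clusters on the current plot
rect.hclust(Hierar_cl, k = 3, border = "green")
```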
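The abline() call only draws a horizontal line at height 110; it does not assign any cluster labels. As a sketch, cutree() also accepts a height argument h, so the same cut can be turned into cluster labels directly. Because average linkage produces non-decreasing merge heights, cutting at a height gives the same labels as cutting with k set to the number of clusters that height yields.

```r
# Rebuild the same average-linkage tree on mtcars
distance_mat <- dist(mtcars, method = "euclidean")
Hierar_cl <- hclust(distance_mat, method = "average")

# Cut the tree at height 110 instead of fixing the number of clusters
fit_h <- cutree(Hierar_cl, h = 110)
table(fit_h)   # cars per cluster at this height

# Number of clusters produced by the height cut
n_clusters <- max(fit_h)
n_clusters

# The same partition, requested by number of clusters
fit_k <- cutree(Hierar_cl, k = n_clusters)
identical(fit_h, fit_k)
```

Cutting by height is useful when you want to read the number of clusters off the dendrogram; cutting by k is useful when that number is fixed in advance.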
Output:
- Distance matrix:
- The pairwise Euclidean distances between the 32 cars (the rows of mtcars) are printed.
- Model Hierar_cl:
- The model summary reports the cluster method (average), the distance (euclidean) and the number of objects (32).
- Plot dendrogram:
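- The dendrogram shows the 32 cars as leaves along the x-axis (labelled with the name of the distance object, distance_mat) and, on the y-axis, the height at which clusters are merged.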
- Cut tree:
- The tree is cut into k = 3 clusters: fit holds the cluster label assigned to each car, and table(fit) shows how many cars fall into each cluster.
- Plotting dendrogram after cutting:
- The plot shows the dendrogram after being cut: the horizontal green line marks the height 110, and the green rectangles drawn by rect.hclust() outline the k = 3 clusters.
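To interpret the three clusters, it can help to list which cars ended up together and compare a few column averages per cluster. The following is a minimal sketch that reuses the same distance, linkage and k as above; the choice of the mpg, disp and hp columns is only an illustrative assumption.

```r
# Rebuild the model and the k = 3 cut
distance_mat <- dist(mtcars, method = "euclidean")
Hierar_cl <- hclust(distance_mat, method = "average")
fit <- cutree(Hierar_cl, k = 3)

# Car names grouped by cluster label
members <- split(rownames(mtcars), fit)
members

# Mean of a few (illustrative) columns within each cluster
cluster_means <- aggregate(mtcars[, c("mpg", "disp", "hp")],
                           by = list(cluster = fit), FUN = mean)
cluster_means
```

Since mtcars is clustered without scaling, columns with large values such as disp and hp dominate the Euclidean distances, which usually shows up clearly in these per-cluster means.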
Hierarchical Clustering in R Programming
Hierarchical clustering in R Programming Language is an unsupervised, non-linear algorithm in which clusters are created such that they form a hierarchy, i.e., smaller clusters are nested inside larger ones. For example, consider a family spanning three generations: a grandfather and grandmother have children, who in turn become fathers and mothers of their own children. They are all grouped into the same family, i.e., they form a hierarchy.
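The nesting can be seen on a tiny example. The five one-dimensional points below are hypothetical values chosen only for illustration. With average linkage, a and b merge at height 1, c and d at height 2, those two pairs at height 5.5 (the mean of the four cross distances 5, 7, 4 and 6), and e joins last at height 15.75. Cutting the tree at different levels gives groupings in which each coarser cluster is a union of finer ones.

```r
# Five hypothetical 1-D points, named so the clusters are easy to read
x <- c(a = 1, b = 2, c = 6, d = 8, e = 20)

# Average-linkage hierarchical clustering on Euclidean distances
hc <- hclust(dist(x), method = "average")

# Heights of the successive merges
hc$height

# Three clusters: {a, b}, {c, d}, {e}
cutree(hc, k = 3)

# Two clusters: {a, b, c, d}, {e}; each is a union of the groups above
cutree(hc, k = 2)
```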