Introduction to Hierarchical Clustering
Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. It is particularly useful when the number of clusters is not known beforehand. The main idea is to create a tree-like structure (dendrogram) that represents the nested grouping of data points.
Types of Hierarchical Clustering
There are two main types of hierarchical clustering:
- Agglomerative Clustering: a “bottom-up” approach. It starts with each data point as its own cluster and iteratively merges the closest pair of clusters until all points belong to a single cluster or a stopping criterion is met.
- Divisive Clustering: a “top-down” approach. It starts with all data points in a single cluster and recursively splits clusters into smaller ones until each data point is its own cluster or a stopping criterion is met.
In this article, we will focus on agglomerative clustering, as it is more commonly used and is well-supported by Scikit-Learn.
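A minimal sketch of agglomerative clustering with Scikit-Learn's `AgglomerativeClustering` estimator; the tiny two-group dataset here is invented purely for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy dataset: two well-separated groups of 2-D points (illustrative only)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Bottom-up merging; stop once two clusters remain.
# Ward linkage merges the pair of clusters that minimizes
# the increase in within-cluster variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

print(labels)  # the first three points share one label, the last three the other
```

Note that `n_clusters` here only tells the algorithm where to cut the hierarchy; the merge tree itself is built the same way regardless.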
Hierarchical Clustering with Scikit-Learn
Hierarchical clustering is a popular method in data science for grouping similar data points into clusters. Unlike other clustering techniques like K-means, hierarchical clustering does not require the number of clusters to be specified in advance. Instead, it builds a hierarchy of clusters that can be visualized as a dendrogram. In this article, we will explore hierarchical clustering using Scikit-Learn, a powerful Python library for machine learning.
Table of Contents
- Introduction to Hierarchical Clustering
- How Does Hierarchical Clustering Work?
- Dendrograms: Visualizing Hierarchical Clustering
- How to Read a Dendrogram?
- Implementing Hierarchical Clustering with Scikit-Learn
- Advantages and Disadvantages of Hierarchical Clustering
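Before diving into the sections above, here is a short sketch of how a dendrogram can be produced. Scikit-Learn does not plot dendrograms directly, so this example uses SciPy's `scipy.cluster.hierarchy` module (the same hierarchy Scikit-Learn computes); the toy dataset is illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy dataset: two well-separated groups of 2-D points (illustrative only)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# linkage() returns the (n-1) x 4 merge history: for each merge,
# the indices of the two merged clusters, the merge distance,
# and the size of the new cluster.
Z = linkage(X, method="ward")

fig, ax = plt.subplots()
dendrogram(Z, ax=ax)
ax.set_xlabel("Data point index")
ax.set_ylabel("Merge distance")
fig.savefig("dendrogram.png")
```

The height at which two branches join in the plot is the merge distance, which is what "How to Read a Dendrogram?" builds on.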