A few types of clustering algorithms
Researchers designed a large number of clustering algorithms over the last few decades. While there are many different types of clustering algorithms, scientists and professionals use the following three major types of clustering algorithms widely.
- Partitional clustering,
- Hierarchical clustering, and
- Density-based clustering.
Partitional clustering
Partitional clustering algorithms attempt to partition the data space into k divisions forming k clusters. Points in each division form a cluster. The following figure shows three partitions formed in a two-dimensional space representing a two-dimensional dataset. The points in each of the partitions form a cluster.
Partitional clustering algorithms deal with the data space and focus on creating a certain number of divisions of the space. The most popular partitional clustering algorithm — and probably the most used clustering algorithm in the history of humankind — is k-means clustering algorithm k-means clustering algorithm, which is a topic of the next lesson.
A limitation of a partitional clustering algorithm is that an explicit vector space is required for the clustering algorithm. Objects for which the data dimensions in the mathematical space cannot be explicitly defined, even if there exists a distance measure for objects, a partitional clustering algorithm is not a good choice. For example, if we plan to cluster tweets (messages with short strings), a partitional clustering algorithm is not a good choice. A partitional clustering algorithm goes well with numerical data and not so well with strings data.
Hierarchical clustering
Hierarchical clustering constructs a hierarchy representing how clusters of objects can be developed gradually either by merging clusters starting from many or by splitting clusters starting from one. The approach that merges clusters gradually to form a large cluster is called an agglomerative approach. The approach that splits clusters in each iteration is called a divisive approach or a top-down approach.
The agglomerative approach is more commonly used by practitioners than the divisive approach. We will discuss a standard hierarchical clustering algorithm named Hierarchical Agglomerative Clustering in one of the future lessons.
Density-based clustering algorithm
A density-based clustering algorithm can generate arbitrary shaped clusters. Density-based clustering algorithms are used when the points in the space form different dense regions instead of most points being surrounding a point. A distribution where most points surround a mean, and the density of the points reduces as they drift from the mean is called a normal distribution. Partitional clustering algorithms work well when points in clusters are normally distributed. A density-based clustering algorithm is designed to work well even when points are not normally distributed.
In the figure above, notice that the clusters of points on the left side of arbitrary shapes. It is difficult to create a partition of the space to separate the three clusters of points. A density-based clustering algorithm is a suitable algorithm to separate the arbitrary-shaped clusters.
On the other hand, the points in the clusters on the right side are normally distributed (that is, more points in the center area of a cluster and lesser number of points in the boundary.) Compared to the clusters of the left side, it is easier to create three partitions of the space to separate the three clusters of the right side. A partitional clustering algorithm is suitable for the data illustrated on the right side of the figure.
A widely used density-based clustering algorithm is DBSCAN. DBSCAN stands for Density-based Spatial Clustering of Applications with Noise. Here is the link for the original paper on DBSCAN: Density-based Spatial Clustering of Applications with Noise. We will study DBSCAN in a future lesson.
4 Comments
Nice
Assalamu Alaikum, Sir.
The strengths of your lectures are simplicity and effectiveness. I have been enjoying these lessons, as I had enjoyed your Operating System course in SUST, a couple of decades ago!
Thanks
You are welcome!