Computing for All

Data Science Workshop 3 (Part 2): Choosing the number of clusters

  • Posted by M. Shahriar Hossain
  • Categories Workshops
  • Date July 7, 2021
Workshop 3 Part 2: Finding the optimal number of clusters

Today’s video discusses a way to find the optimal number of clusters, especially when we do not have any benchmark data. The question is: given a dataset and no supervision, how can we figure out which number of clusters, k, gives us the best results?

Contents

  • 1 The YouTube Video
  • 2 Dataset
  • 3 Clustering evaluation technique used in the video
  • 4 The Notebook Code

The YouTube Video

The YouTube video is here.

Dataset

We applied the k-means clustering algorithm to the Pecan Yield Data.

Clustering evaluation technique used in the video

Using a score called the silhouette coefficient, we evaluated the clustering results to find the optimal number of clusters.

There are many other mechanisms to evaluate clusters. Whichever evaluation metric you use, it is always better to look at data points from different clusters and check why they ended up in different clusters. Clustering helps us get an initial idea about a dataset, and it is often used for exploratory data analysis.

The following link contains the description of Silhouette score or coefficient along with many other clustering evaluation techniques: https://computing4all.com/courses/introductory-data-science/lessons/evaluation-of-clustering-results/
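As a rough illustration of the idea (not the notebook's code), the sketch below runs k-means for several values of k and keeps the k with the highest mean silhouette coefficient. For a single point, the silhouette coefficient is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster; values close to 1 suggest well-separated clusters. The helper name silhouette_by_k, the range of k, and the synthetic data are assumptions made for this example; the data here is a stand-in and not the Pecan Yield Data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Run k-means once per k and return {k: mean silhouette coefficient}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # silhouette_score averages the per-point coefficients over all points.
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in data: four tight, well-separated groups (an assumption
# for illustration only; replace X with the numeric columns of your dataset).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10], [10, -10]],
    cluster_std=0.5,
    random_state=0,
)

# The silhouette coefficient needs at least 2 clusters, so start at k=2.
scores = silhouette_by_k(X, range(2, 9))
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: mean silhouette = {s:.3f}")
print("k with the highest silhouette:", best_k)
```

The highest score is only a suggestion for k. As noted above, it is still worth inspecting points from the resulting clusters before trusting the number.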

The Notebook Code

You can download the notebook file from this zip file. After extracting the file, open it with Jupyter Notebook or Jupyter Lab. Keep the Pecan.csv file and the notebook file in the same directory because the read_csv function in the notebook assumes that the Pecan.csv file is in the current directory.
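The point about the current directory can be sketched as follows. This is a hedged example and not taken from the notebook: the helper name load_pecan and its parameters are hypothetical, and it only assumes that the file is a regular CSV readable by pandas.

```python
from pathlib import Path

import pandas as pd


def load_pecan(csv_name="Pecan.csv", directory="."):
    """Read the CSV from `directory` (default: the current working directory)."""
    path = Path(directory) / csv_name
    if not path.is_file():
        # A relative file name is resolved against the current directory,
        # which is why the CSV should sit next to the notebook.
        raise FileNotFoundError(
            f"{path.resolve()} not found; keep the CSV next to the notebook "
            "or pass the directory that contains it."
        )
    return pd.read_csv(path)


# Example usage (assumes Pecan.csv is in the current directory):
# df = load_pecan()
# print(df.head())
```

Calling pd.read_csv("Pecan.csv") directly behaves the same way when the file is present; the explicit check only makes the error message easier to act on when it is not.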

The notebook code is as follows.

M. Shahriar Hossain

I am an Associate Professor in the Department of Computer Science at the University of Texas at El Paso (UTEP). My specialization is Data Science (Data Mining and Machine Learning).
