Data Science Workshop 2 (Part 1): What is Machine Learning?
We are starting Workshop 2. Starting from this video, we will discuss machine learning topics using the Python programming language. As I said earlier in this workshop series, these videos come from workshops that I ran for experienced professionals in the software and information technology industry. For privacy reasons, I do not include the conversations I had with the audience. Instead, I create a voice-over for each of these videos.
Here is the video presentation answering the question: What is machine learning?
Contents
- 1 What we discussed in the previous videos
- 2 What is machine learning?
- 3 An Example: Machine learning in detecting spam emails
- 4 A few more examples of the use of machine learning
- 5 Is machine learning a form of artificial intelligence?
- 6 Types of machine learning algorithms
- 7 Supervised learning
- 8 Unsupervised learning
- 9 Reinforcement learning
What we discussed in the previous videos
In the previous videos, you watched what I discussed with the audience regarding NumPy and Pandas in the Python programming language.
We are now moving to some core machine learning algorithms using Python. Before getting to those algorithms, I explained to the audience what machine learning is.
What is machine learning?
Machine learning refers to a set of “smart” algorithms that can learn from data to solve computational problems. You can think of these algorithms as gaining experience by looking at data.
An Example: Machine learning in detecting spam emails
Please allow me to give a familiar and simple example. You have your emails. Some of your emails are detected as spam. They move to the Spam folder automatically.
If a spam email arrives in your inbox, you probably report it as spam.
Sometimes a legitimate email is automatically moved to the Spam folder, and you probably report it as not spam.
In other words, you can report an email as spam, and you can also un-spam a good email that was mistakenly marked as spam.
Spam detection software learns which types of emails are spam and which are not spam for you. This is personal: an email that is spam for you might be a legitimate email for another person.
The spam detection algorithm learns from what it observes of your actions. The data accumulated over time about what you mark as spam and not-spam can be used to make the algorithm more intelligent over time.
A few more examples of the use of machine learning
A few more examples are recognizing people’s faces and recognizing the types of objects detected in images or videos.
Such smart algorithms can be used to detect cancerous regions and tumors, and to help diagnose many other complicated health-related conditions.
Machine learning algorithms are widely used for automation and often for prediction. The use of machine learning in programming robots, predicting stock markets, and even automatically analyzing public sentiment from social media data is no longer a dream.
Is machine learning a form of artificial intelligence?
That’s an excellent question. Machine learning is considered a part of Artificial Intelligence. One can say that Machine Learning is data-driven artificial intelligence.
Machine learning also has considerable overlap with computational statistics.
Also, the phrases data mining and machine learning are sometimes used interchangeably, even though some differences can be listed between them.
Whatever the terminology is, Data Science is all about open-minded exploration of data to solve computational problems. Therefore, you will see that data scientists come up with many different types of algorithms that resemble concepts from various disciplines.
Types of machine learning algorithms
There are many machine learning algorithms, and which one is relevant depends on the task you need to solve. The three most prominent types of learning algorithms are:
- Supervised learning,
- Unsupervised learning, and
- Reinforcement learning
Supervised learning
Supervised learning is appropriate when the data comes with supervision, that is, with known labels.
For example, you have a bunch of emails. You have a table where each row represents an email and each column records how many times a particular word was seen in that email. Additionally, you know whether each email is spam or not spam. In this example, the labels (spam and not spam) are the supervision. A machine learning algorithm can be designed to learn which words are prominent in spam and which words are prominent in non-spam.
Then for any new email, the algorithm can automatically declare the email as a spam email or not a spam email, based on what words are prominent in the new email. That is, the algorithm can apply its learning to detect spam.
This type of solution is called classification.
In the example of spam detection, there were two classes: spam and not-spam. The algorithm learns what spam emails look like and what not-spam emails look like. Then it applies its learned knowledge to identify spam and not-spam among new, unseen emails.
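The word-count idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the training emails are made up, and the classifier used here is a small naive Bayes model with add-one smoothing, which is one common choice for this kind of task and not necessarily what a real spam filter uses. The names `train` and `classify` are hypothetical helpers written for this sketch.

```python
import math
from collections import Counter

# Made-up training emails with their labels (the supervision).
training_emails = [
    ("win free money now", "spam"),
    ("free prize claim your money", "spam"),
    ("limited offer win a free prize", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("please review the project report", "not-spam"),
    ("lunch on monday with the project team", "not-spam"),
]


def train(emails):
    """Count how often each word appears in each class."""
    word_counts = {"spam": Counter(), "not-spam": Counter()}
    class_counts = Counter()
    for text, label in emails:
        word_counts[label].update(text.split())
        class_counts[label] += 1
    vocabulary = set(word_counts["spam"]) | set(word_counts["not-spam"])
    return word_counts, class_counts, vocabulary


def classify(text, word_counts, class_counts, vocabulary):
    """Return the class with the higher log-probability score."""
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the prior: how common this class is in the training data.
        score = math.log(class_counts[label] / total_emails)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs away from log(0).
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_emails)
print(classify("claim your free money", *model))
print(classify("project meeting on monday", *model))
```

With this toy data, the first new email is declared spam and the second not-spam, because of which words were prominent in each class during training.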
The faces detected in your social media images might have name tags, and these names are sometimes set automatically. Those name tags are sometimes the result of supervised learning. The algorithm received labeled samples in the past from other images and trained itself on what it observed. For the faces in a new image, the algorithm can say which face belongs to whom.
Now, let us talk about unsupervised learning.
Unsupervised learning
In unsupervised learning, we do not have labels. That is, the data does not come with any supervision, hence the name “unsupervised learning.”
One unsupervised learning task is finding groups of objects in data. This is called a clustering task.
For example, let’s assume that we have a collection of images. A clustering algorithm will be able to group the photos based on their similarities. When a clustering algorithm is applied to the following set of images, the dogs should go in one cluster and the cats in another.
As humans, we know that one cluster contains all the images of dogs in the example above, and another cluster contains all the images of cats.
However, the clustering algorithm does not know that the first cluster it created has all dogs. The algorithm only knows that these images have something similar. Also, the algorithm does not know that the other cluster contains all cats. The algorithm only knows that those images have similar entities inside them.
The algorithm was never told what a dog or a cat looks like. Hence the algorithm was never supervised. The clustering algorithm just figured out, on its own, that there are two groups or clusters in these images.
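To make the idea concrete, here is a minimal sketch of clustering in plain Python. It is not tied to images: it assumes each object has already been turned into two numbers (a 2-D point), generates a synthetic dataset with two groups, and applies k-means, which is one popular clustering algorithm. The farthest-first choice of starting centroids is an assumption of this sketch, made so that the result is deterministic.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iterations=20):
    # Farthest-first initialization (an assumption of this sketch):
    # start with the first point, then keep adding the point that is
    # farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = mean_point(members)
    return assignments, centroids


# Synthetic data: two groups of 2-D points. No labels are given to k-means.
rng = random.Random(0)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
group_b = [(rng.gauss(6, 0.5), rng.gauss(6, 0.5)) for _ in range(20)]
points = group_a + group_b
rng.shuffle(points)

assignments, centroids = kmeans(points, k=2)
for i, centroid in enumerate(centroids):
    print(f"cluster {i}: {assignments.count(i)} points, centroid {centroid}")
```

Note that the output only says "cluster 0" and "cluster 1". Just as in the dogs-and-cats example, the algorithm finds the groups but has no idea what they mean.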
I have a more detailed article and a video on clustering. The article and the video can be found here.
Another example of unsupervised learning is finding which commodities in a shop are purchased together most frequently.
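A very small version of the "purchased together" problem is counting how often each pair of items appears in the same basket. The baskets below are made up for illustration, and real systems use more elaborate frequent-itemset methods, so treat this only as a sketch of the idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort the items so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

No labels are involved here either: the pattern (bread and butter bought together in three baskets) comes out of the data alone.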
Another unsupervised problem is dimensionality reduction: when the data has too many features, the goal is to reduce the number of features so that computations become more feasible.
Now, what is reinforcement learning?
Reinforcement learning
Reinforcement learning is a reward-based learning mechanism. It is neither supervised nor unsupervised. How does it work then? Consider how children learn. Children learn by trial and error. When a child puts a toy in her mouth, parents become anxious and immediately say, “No, no, no, you should not do that.”
When children try to draw something, parents praise them. That is a reward. When a child tries to walk, she gets a reward based on how she steps on the ground. The reward is being able to walk, and failure is falling on the floor.
Similarly, a reinforcement learning algorithm learns from a reward mechanism. The reward mechanism is problem-specific. The reward is generally based on whether the state of the problem is closer to a goal state or not.
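As a rough sketch of reward-based learning, consider a toy problem invented for this illustration: an agent stands on a line of five positions and gets a reward of 1 only when it reaches the last position, the goal state. The code uses tabular Q-learning, one well-known reinforcement learning algorithm. The learning rate, discount, and exploration values are arbitrary choices for this sketch.

```python
import random

N_STATES = 5            # positions 0..4 on a line; position 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # action 0 steps left, action 1 steps right

ALPHA = 0.5     # learning rate
GAMMA = 0.9     # discount applied to future rewards
EPSILON = 0.2   # probability of trying a random action (exploration)

rng = random.Random(0)
# q_table[state][action] estimates how good an action is in a state.
q_table = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(200):
    state = 0
    while state != GOAL:
        if rng.random() < EPSILON:
            action = rng.randrange(2)  # explore: trial and error
        else:
            # Exploit: take the best-known action, breaking ties randomly.
            best = max(q_table[state])
            action = rng.choice(
                [a for a in range(2) if q_table[state][a] == best]
            )
        # Move, but stay inside the line.
        next_state = min(max(state + ACTIONS[action], 0), GOAL)
        # The reward is problem-specific: here, 1 for reaching the goal.
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge the estimate toward the observed reward
        # plus the discounted value of the best action in the next state.
        target = reward + GAMMA * max(q_table[next_state])
        q_table[state][action] += ALPHA * (target - q_table[state][action])
        state = next_state

policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
print(policy)
```

After training, the learned policy is to step right from every position. Nobody told the agent which way the goal was; it worked that out from the rewards alone.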
We now have some idea of supervised learning, unsupervised learning, and reinforcement learning.
We will start our machine learning coding in Python with unsupervised learning. We will create a synthetic dataset and then apply a clustering algorithm to it.