
5 Essential Machine Learning Techniques To Help You Get Started


Introduction

By now, most of us know what Machine Learning is. But Machine Learning is a broad area, and within it there are several techniques you can use to analyse your data. Before tackling the advanced concepts, it helps to learn some common Machine Learning tools and techniques, so you can understand what is really happening in the much-hyped Machine Learning world. In this post, we are going to explore different types of Machine Learning techniques.

But before diving into that, let’s talk about terminology. Here I am going to use three different terms: Techniques, Algorithms, and Models. Let me explain each one.

A Technique in Machine Learning refers to a way of solving a problem. For example, Regression (which we will see later on) is a technique to predict a value. To perform regression, a Data Scientist would apply a specific Algorithm, such as linear regression, to get the job done. And finally, having applied an algorithm to some data, the end result is a trained Model, to which you can feed new data to generate new results. Don’t worry if you didn’t get all of this; it will become clear as you read on.

5 Essential Machine Learning Techniques

Let’s get right into some of the techniques in Machine Learning.

#1 Regression

In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. For example, given data about the sizes of houses on the real estate market, we try to predict their prices. Another example: given a picture of a person, we predict their age.

Linear Regression is one of the most widely used and best understood regression algorithms in Machine Learning as well as Statistics. It simply estimates real values based on continuous variable(s). In more technical terms, we establish a relation between independent and dependent variables by fitting a best-fit line (as in the real estate example). This line is known as the regression line, and it is represented by the linear equation Y = a*X + b, where Y is the dependent variable, a is the slope, X is the independent variable, and b is the intercept.

There are also several other techniques for creating a linear regression model, including simple linear regression, ordinary least squares, gradient descent, and regularization. Regularization techniques in Machine Learning come in handy when your model overfits the training data.
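The real estate example above can be sketched with scikit-learn. The house sizes and prices here are made-up illustrative numbers, not real market data:

```python
# A minimal sketch of fitting Y = a*X + b with scikit-learn's LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical house sizes (sq ft) and prices; any 2-D X and 1-D y will do.
X = np.array([[600], [800], [1000], [1200], [1400]])
y = np.array([150000, 190000, 230000, 270000, 310000])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # slope a and intercept b
print(model.predict([[1100]]))            # predicted price for a 1100 sq ft house
```

Once fitted, `coef_` and `intercept_` are exactly the a and b of the regression-line equation above.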

#2 Classification

Classification is basically predicting the class to which our data points belong. As we know, the Machine Learning community is not always good at naming things, so classes are sometimes also referred to as targets, labels, or categories. Classification falls into the same category as regression: supervised learning. For instance, spam detection in emails is a classification problem, and a binary one, since there are only two classes: spam or not spam. The applications of classification are wide; it is useful in domains such as credit approval, medical diagnosis, target marketing, and so on.

Diving a little deeper, there are two types of learners in classification: lazy learners and eager learners. Lazy learners store the training data and wait until testing data appears; hence, they spend less time in training but more in predicting. An example is k-nearest neighbours. Eager learners, on the other hand, construct a classification model from the training data before any classification happens. Examples of eager learners are decision trees and Naïve Bayes.
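The lazy-versus-eager contrast can be seen directly in scikit-learn; this sketch uses the library's built-in iris dataset purely for illustration:

```python
# A lazy learner (k-NN) versus an eager learner (decision tree) on iris.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# k-NN "fits" almost instantly: it just stores the training data.
lazy = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# The decision tree does its work up front, building a model before prediction.
eager = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(lazy.score(X_test, y_test), eager.score(X_test, y_test))
```

Both classifiers reach high accuracy on this easy dataset; the difference is where the computational work happens, not the quality of the result.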


#3 Clustering

Clustering is a common unsupervised technique in Machine Learning which groups similar data points together. Given some data points, we can use a clustering algorithm to assign each data point to a specific group. In Data Science, clustering is widely used to gain valuable insights from data. The most popular and widely used algorithm for solving clustering problems is k-means clustering.

A typical k-means visualisation shows data points grouped into a handful of clusters, with each cluster built around a computed centre and boundary. This approach is frequently used for customer segmentation. You can evaluate credit risk, or even find similarities between written documents. Basically, if you have a large amount of data and don’t know where to start, clustering similar data points together is a good way to begin.
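Here is a minimal k-means sketch on synthetic data; the two "blobs" of points are made up to stand in for, say, two customer segments:

```python
# Grouping unlabeled points into clusters with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two hypothetical blobs of points, around (0, 0) and around (5, 5).
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # one centroid near each blob
print(kmeans.labels_[:5])        # cluster assignment for the first few points
```

Note that we had to tell k-means how many clusters to look for (`n_clusters=2`); choosing that number is itself a common practical problem.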

#4 Regularization

One major aspect of training your model is avoiding overfitting. An overfitted model will have low accuracy on new data, because it has tried too hard to capture the noise in the training dataset. There are several methods to deal with overfitting, such as cross-validation; another one is Regularization. It is a technique which discourages learning an overly complex or flexible model, and thus reduces the risk of overfitting.

So, what do we achieve from Regularization? If you know your way around the bias–variance trade-off, regularization significantly reduces the variance of the model without substantially increasing its bias. A popular library for implementing all these algorithms is scikit-learn; find yourself some data and play with it to get a better idea of how these things work.
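The shrinking effect of regularization can be seen by comparing plain least squares with ridge regression (L2 regularization) on noisy synthetic data; the dataset here is invented for the demonstration:

```python
# L2 regularization (ridge) shrinks coefficients compared with plain OLS.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))           # few samples, many features
y = X[:, 0] + rng.normal(0, 0.1, 30)    # only the first feature truly matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)     # alpha controls the penalty strength

# Ridge pulls the coefficients toward zero, trading a little bias
# for a reduction in variance.
print(np.abs(ols.coef_).sum(), np.abs(ridge.coef_).sum())
```

Increasing `alpha` shrinks the coefficients further; `alpha=0` would recover ordinary least squares.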

#5 Anomaly detection

Sometimes you don’t want to group things or classify them into categories. Instead, what you are looking for is something unusual, something that stands out in some way. That approach is Anomaly Detection. It is a classic technique in practical data mining, because in real life finding outliers is one of the most tedious tasks. Anomalies can broadly be categorized as:

Point Anomalies- A single instance of data is anomalous. For example, detecting credit card fraud based on the amount spent.

Contextual Anomalies- The abnormality is context-specific. For example, spending $100 on food every day during the holiday season is normal, but may be suspicious otherwise.

Collective Anomalies- A set of data instances is anomalous only taken together. For example, a potential cyber attack flagged because someone is trying to copy data from a remote machine to the local host.
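The point-anomaly case above can be sketched with scikit-learn's Isolation Forest; the spending figures are invented for the example:

```python
# Flagging a point anomaly (one huge spend among normal ones) with IsolationForest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical daily food spends: mostly around $20, plus one $500 outlier.
spend = np.concatenate([rng.normal(20, 5, 100), [500.0]]).reshape(-1, 1)

clf = IsolationForest(contamination=0.01, random_state=0).fit(spend)
labels = clf.predict(spend)            # +1 = normal, -1 = anomaly
print(spend[labels == -1].ravel())     # the flagged outlier(s)
```

`contamination` tells the model roughly what fraction of the data you expect to be anomalous; in practice this is often unknown and has to be tuned.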

Endnotes

These are five of the Machine Learning techniques, though by no means all of them, that I find the most basic to start with. The applications and advantages of Machine Learning techniques are endless, from detecting cancer to predicting stocks, driving cars autonomously, and so on. Finding the right technique to solve the right problem is the key to success in Machine Learning. If you have any doubts about these techniques, do let us know in the comments. We will be happy to resolve your doubts.

Happy learning.

Guest Blogger (Data Science) at Digital Vidya. A data enthusiast who loves reading and diving deeper into Machine Learning and Data Science. Always eager to learn about new research and new ways to solve problems using ML and AI.
