Data Analytics Blog

# Different Types Of Machine Learning

## What is Machine Learning really?

As you probably know, there are several definitions of Machine Learning available on the internet. One widely quoted definition, usually attributed to Arthur Samuel, is: “the field of study that gives computers the ability to learn without being explicitly programmed.” However, this is an older, informal definition.

Tom Mitchell provides a more modern definition, which is: “*A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.*”

Scratching your head? Don’t worry, let’s break it down with an example. Take playing checkers, where:

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

If the machine’s performance at playing checkers, measured by how many games it wins, improves as it plays more games of checkers, we can say that the machine is learning by itself. That is Machine Learning.

So, how do we identify Machine Learning problems? In general, there are two types of machine learning algorithms: *Supervised Machine Learning* and *Unsupervised Machine Learning*. In addition, as the field has developed, a third category has emerged: *Reinforcement Learning*. Let’s dive into what these categories are and how they work.

## Supervised Learning

In supervised learning, we are given a data set in which the correct output for each input is already known, on the assumption that there is a relationship between the input and the output. Supervised learning problems are further categorized into *regression* and *classification* problems.

*Regression*

In a regression problem, we are trying to *predict* results within a continuous output, meaning that we are trying to map input variables to some continuous function. For example, given data about the size of houses on the real estate market, try to predict their price. Another example would be, given a picture of a person, to *predict* their age.

*Classification*

Classification, on the other hand, is finding the category of the input variable, or in more academic terms, mapping input variables into discrete categories. A good way to spot a classification problem is that the question can be phrased as *whether this or that*: yes or no, 0 or 1, true or false. For example, taking the house price example above, if we change the output to “sells for more or less than the asking price,” then it is a classification problem. Another example: given a patient with a tumour, we have to predict *whether* the tumour is *malignant or benign*.

*How Supervised learning works*

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
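To make the idea of a hypothesis h : X → Y a little more concrete, here is a minimal sketch. It assumes a very simple learner (predict the output of the closest training input, a one-nearest-neighbour rule), and the function name `learn_nearest_neighbour` and all the numbers are made up for illustration. The same learner gives a regression hypothesis when y is continuous and a classification hypothesis when y is discrete.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Y = TypeVar("Y")


def learn_nearest_neighbour(
    training_set: Sequence[Tuple[float, Y]]
) -> Callable[[float], Y]:
    """Return a hypothesis h that predicts the y of the closest training x."""

    def h(x: float) -> Y:
        # Pick the training pair whose input is nearest to x.
        _, nearest_y = min(training_set, key=lambda pair: abs(pair[0] - x))
        return nearest_y

    return h


# Regression: living area (m^2) -> price in thousands (illustrative numbers).
price_h = learn_nearest_neighbour([(50.0, 150.0), (80.0, 240.0), (120.0, 360.0)])

# Classification: living area (m^2) -> dwelling type (illustrative labels).
type_h = learn_nearest_neighbour(
    [(45.0, "apartment"), (60.0, "apartment"), (150.0, "house")]
)

print(price_h(85.0))   # a continuous value
print(type_h(140.0))   # one of a small number of discrete values
```

Real learners are usually smarter than this, but the shape is the same: a training set goes in, a function h comes out.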

*Algorithms*

There are plenty of different algorithms for solving different kinds of problems. No algorithm is right or wrong; some simply suit certain problems better than others. Supervised machine learning algorithms include Linear regression, Logistic regression, Random forest, KNN, Decision tree and so on. Let’s look at how some of these supervised learning algorithms work.

#### Linear Regression

Linear regression estimates real values based on continuous variable(s). In more technical terms, we establish a relation between the independent and dependent variables by fitting a best line (as in the real estate example). This line is known as the regression line, and is represented by the linear equation Y = a*X + b, where,

Y— Dependent variable

a— Slope

X— Independent variable

b— Intercept

Moreover, linear regression comes in two main types, simple and multiple. In simple linear regression there is only one independent variable, whereas in multiple linear regression, as the name suggests, there is more than one independent variable.
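As a rough illustration of fitting the line Y = a*X + b, here is a small sketch of simple linear regression using the ordinary least-squares formulas. The function name and the house data are assumptions made up for this example (the prices were generated from price = 3 * size + 10, so we know what the fit should recover).

```python
def fit_simple_linear_regression(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


# Illustrative data: house size (m^2) and price (thousands).
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [160.0, 250.0, 310.0, 370.0]

a, b = fit_simple_linear_regression(sizes, prices)
predicted_price = a * 90.0 + b  # prediction for a 90 m^2 house
print(a, b, predicted_price)
```

With noisy real-world data the points will not sit exactly on the line, and the fitted a and b are just the values that minimise the squared error.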

#### Logistic Regression

Logistic Regression is a classification algorithm; don’t be misled by its name. It estimates discrete values based on independent variable(s). Since it predicts the probability of occurrence of a particular event by fitting data to a logistic function, its output is, as expected, between 0 and 1.
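Here is a minimal sketch of that idea, assuming a single input variable and plain gradient descent as the fitting method (one common choice, not the only one). The tumour sizes, labels, learning rate and number of epochs are all made up for illustration.

```python
import math


def sigmoid(z):
    """The logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(a*x + b) by gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(a * x + b) - y for x, y in zip(xs, ys)]
        grad_a = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Illustrative data: tumour size (cm) and label (0 = benign, 1 = malignant).
tumour_sizes = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 1, 1]

a, b = fit_logistic_regression(tumour_sizes, labels)
p_small = sigmoid(a * 1.0 + b)   # probability that a 1.0 cm tumour is malignant
p_large = sigmoid(a * 4.5 + b)   # probability that a 4.5 cm tumour is malignant
print(p_small, p_large)
```

To turn the probability into a discrete prediction, a threshold is applied, typically 0.5: above it we predict class 1, below it class 0.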

#### Decision Tree

This is one of the most popular algorithms. It is used mainly for classification problems and is a supervised algorithm with a pre-defined target variable. In this algorithm, we split the sample into two or more sub-parts based on the most significant differentiator among the input variables, which is chosen using techniques such as Gini, Chi-square, entropy etc.
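To show what “most significant differentiator” can mean in practice, here is a small sketch that uses Gini impurity to choose a single split on one numeric input variable. The function names and the data are assumptions for illustration; a full decision tree would repeat this search recursively on each sub-part and across all input variables.

```python
from collections import Counter


def gini_impurity(labels):
    """0.0 for a pure group; higher when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one variable."""
    best_threshold, best_score = None, float("inf")
    candidates = sorted(set(values))
    # Try the midpoint between each pair of neighbouring distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (
            len(left) * gini_impurity(left) + len(right) * gini_impurity(right)
        ) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Illustrative data: tumour size (cm) and diagnosis.
tumour_sizes = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
diagnoses = ["benign"] * 3 + ["malignant"] * 3

threshold, score = best_split(tumour_sizes, diagnoses)
print(threshold, score)
```

A lower weighted Gini means the two sub-parts are purer, so the split with the lowest score is the one the tree would pick.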

## Unsupervised Learning

In contrast to Supervised learning, Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data. With Unsupervised learning there is no feedback based on the prediction results. For example, take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on. This is a good example of clustering. A non-clustering problem, on the other hand, is the “*Cocktail Party Problem*”, where the goal is to identify individual voices and music from a mesh of sounds at a cocktail party.

*Algorithms*

Unsupervised learning algorithms help with a wide range of problems such as Social Network Analysis, Astronomical Data Analysis, and so on. Google News uses this approach as well. Neural networks can also be used for unsupervised learning, although they are not limited to it. Let’s understand how one of these algorithms works.

#### K-means (Clustering)

The goal of clustering is to create groups of data points such that points in different clusters are dissimilar while points within a cluster are similar. With k-means clustering, we want to cluster our data points into k groups. A larger k creates smaller groups with more granularity, while a lower k means larger groups and less granularity.
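Here is a bare-bones sketch of k-means, assuming the usual two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points). To keep the example deterministic it simply uses the first k points as the starting centroids, which is a simplification; real implementations normally use a smarter or randomised initialisation. The points are made up for illustration.

```python
import math


def k_means(points, k, iterations=20):
    """Cluster points (tuples of floats) into k groups; return (centroids, assignments)."""
    centroids = list(points[:k])  # naive initialisation, assumed for this sketch
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignments


# Illustrative 2-D data with two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]

centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Notice that no labels are given anywhere: the groups come purely from how close the points are to each other.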

## Reinforcement Learning

Reinforcement Learning is how a machine, when exposed to an environment, trains itself using trial and error. The machine learns mainly from past experience and tries to find the best possible solution to a given problem. In the past couple of years, a lot of improvement has been seen in this particular area. The main example is DeepMind’s AlphaGo, which beat a champion of the game Go in 2016.

*The Reinforcement Learning Process*

Let’s understand the learning process of a machine/agent through the example of an agent learning to play Super Mario Bros. The process can be modelled as a loop that works like this:

- The agent receives *state S0* from the environment, which in our case is the first frame of our game.
- Based on that *state S0*, the agent takes an action A0, moving right/forward.
- Right after that, the environment transitions to a new *state S1*, which is basically a new frame.
- The environment gives some *reward R1* to the agent (not dead: +1).
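The loop above can be sketched in a few lines. Everything here is hypothetical: `ToyLevel` is a made-up stand-in for the game (a short corridor instead of real frames), and the policy is a fixed “always move right” rule. A real agent would also use the rewards to improve its policy over many episodes, which is not shown here.

```python
class ToyLevel:
    """Hypothetical stand-in for the game: a corridor of `length` tiles."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state S0."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (+1 right, -1 left); return (next_state, reward, done)."""
        self.position = max(0, self.position + action)
        done = self.position >= self.length  # reached the end of the level
        reward = 1  # not dead: +1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run the state -> action -> new state + reward loop for one episode."""
    state = env.reset()                               # agent receives S0
    total_reward = 0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                        # agent takes action A
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


def always_right(state):
    """A fixed policy used only for illustration."""
    return 1


total_reward, trajectory = run_episode(ToyLevel(length=5), always_right)
print(total_reward)
print(trajectory[0])  # (S0, A0, R1, S1)
```

Each entry of `trajectory` is one pass through the loop: the state the agent saw, the action it took, the reward it got, and the state it ended up in.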

#### Endnotes

By now, I am sure that you have a good enough idea of the different machine learning types and algorithms to get you started. Machine Learning is a field in which you learn far faster by doing it than by studying it. I would suggest taking up small problems, thinking about how you could solve them with Machine Learning, then finding an appropriate algorithm to solve them, and having fun. Do let us know in the comments if you have any questions about anything written above; we are happy to help.

Happy learning.

Data Science aspirant