Tutorial on Python Linear Regression With Example

by | Oct 25, 2018 | Data Analytics


The world is changing rapidly, and so is technology. Every day you come across something you haven’t heard of before. Machine Learning is one such field, where new advances appear all the time. However, to reach the summit (in our case, mastering Machine Learning techniques) you must start from the bottom. Linear regression is one of the earliest and most widely used algorithms in Machine Learning, and a good starting point for novice Machine Learning wizards.

In this tutorial on linear regression using Python, we will first look at the model representation of the linear regression problem and then at how the hypothesis is represented. After that, we will see how the cost function works and get a brief idea of what gradient descent is, before ending the tutorial with an example. Let’s get started.

Model Representation

Stated more technically, the linear regression problem is this: given a training set, our goal is to learn a function h: X -> Y so that h(x) is a good predictor for the corresponding value of y. Because we, people in Machine Learning, are not good at naming things, this ‘h’ is called the hypothesis, which is basically a function. To visualize it, look at the diagram below.


When the target variable is continuous, as our y is, the problem is a regression problem and we use regression algorithms to solve it. If y can take only discrete values, we call it a classification problem.

How do we represent the Hypothesis?

Now that we know what our hypothesis is, let’s take a look at how it works. Linear regression comes in two forms, univariate and multivariate, and the hypothesis is different for each. If you are familiar with basic algebra, you will know that the equation of a straight line is

Y = mX + c

where m is the slope of the line and c is a constant (the intercept). Our hypothesis is the same thing in one way or another,

h(x) = theta0 + theta1 * x  (Univariate)

h(x) = theta0 + theta1 * x1 + theta2 * x2 + ...  (Multivariate)

Here x is our input variable, and theta0 and theta1 are what we call parameters. Compared with the straight-line equation, theta0 plays the role of the constant c and theta1 plays the role of the slope m. As you can see from the image below, the line is our hypothesis, which is linear in nature. You must be wondering about theta0 and theta1: what are their values, and where do we use them? Let’s get to it.

Now that you have the equation for the hypothesis, you need to select the values of theta0 and theta1 such that the line we plot through the data fits it as closely as possible and gives us the output we want. The question remains: how do we find those values? There are several techniques for that, which we will see later in this post.
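To make the two forms of the hypothesis concrete, here is a minimal sketch in plain Python. The function names and the sample numbers are made up for illustration and are not part of any library.

```python
def hypothesis_univariate(theta0, theta1, x):
    """h(x) = theta0 + theta1 * x for a single input variable."""
    return theta0 + theta1 * x


def hypothesis_multivariate(thetas, xs):
    """h(x) = theta0 + theta1 * x1 + theta2 * x2 + ...

    thetas holds [theta0, theta1, ..., thetaN] and xs holds [x1, ..., xN].
    """
    if len(thetas) != len(xs) + 1:
        raise ValueError("expected one more parameter than input variables")
    return thetas[0] + sum(t * x for t, x in zip(thetas[1:], xs))


# Illustrative values only: theta0 = 1, theta1 = 2, x = 3
print(hypothesis_univariate(1.0, 2.0, 3.0))                # 7.0
# Illustrative values only: theta0 = 1, theta1 = 2, theta2 = 3, x1 = x2 = 1
print(hypothesis_multivariate([1.0, 2.0, 3.0], [1.0, 1.0]))  # 6.0
```

Changing theta0 shifts the line up or down, and changing theta1 changes its slope, which is why choosing these two values is the whole learning problem in the univariate case.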

Let’s recap what we have seen so far. Linear regression is a Supervised Learning problem. Given some data, we predict the value of something. For example, given the areas and prices of 50 houses, we predict what the price of our own house will be. To predict, we need a hypothesis, a function to which we feed the data and which gives us the output. That hypothesis must fit the data well to give us the most accurate output. Clear until now? If not, I’d suggest you go back and read it once more. If you are clear, let’s move on to understand how we derive theta0 and theta1.

Cost Function

The idea behind the cost function is that we choose the values of theta0 and theta1 such that h(x), our hypothesis (or, if you like, the output of the function), is close to y, the output variable, for each training example (x, y). This cost function is also called the squared error function.


Written out, the squared error cost function is

J(theta0, theta1) = (1 / (2m)) * sum over i = 1..m of (h(x_i) - y_i)^2

Here m is the number of training examples, and the ½ is included only to simplify the calculation at a later stage, when we take the derivative. Let’s take an example to make the picture clearer. Our goal in this example is to find the values of theta0 and theta1 that give us the global minimum of J.

  1. Let’s assume theta0 = 0 and theta1 = 1. Given that, if we plot our hypothesis, it would look something like this,


Now that we have theta0 and theta1, let’s find J(theta), our cost function. All the output points lie exactly on our hypothesis, so the difference between them is, of course, zero. As the image on the right side shows, the value of J(theta1) here, which is J(1), is 0.

  2. Now let’s find the value for theta0 = 0 and theta1 = 0.5. For that, the plots would look like this,


As you can see, there’s a difference between the output value and our predicted value. If we find those differences and put them into our cost function equation, it gives us a value near 0.58, as marked.

  3. For the third example, let’s take theta0 and theta1 both equal to zero. That is a line along the x-axis.


That gives us a value near 2.3, as marked. Plotting J for different values of theta1 would give us something like this,

From this you can tell that J(theta1) is at its minimum at theta1 = 1, so we select that as our value for theta1. Now you know the core ideas behind linear regression for machine learning in Python.
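The three cases above can be checked with a few lines of plain Python. The original plots are not reproduced here, so the training set below, the points (1, 1), (2, 2) and (3, 3), is an assumption; it is chosen because it reproduces the values quoted in the text. The function names are made up for this sketch.

```python
def hypothesis(theta0, theta1, x):
    """h(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost: J = (1 / (2m)) * sum of (h(x_i) - y_i)^2."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Assumed training set (not given explicitly in the article)
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# theta0 is fixed at 0, as in the three cases above
for theta1 in (1.0, 0.5, 0.0):
    print(f"theta1 = {theta1}: J = {cost(0.0, theta1, xs, ys):.2f}")
# theta1 = 1.0: J = 0.00
# theta1 = 0.5: J = 0.58
# theta1 = 0.0: J = 2.33
```

Evaluating the cost over a finer range of theta1 values traces out a bowl-shaped curve with its lowest point at theta1 = 1.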

A Step Further

Gradient Descent: it is a first-order optimization algorithm, meaning it uses only the first derivative. Now that we have our hypothesis and cost function, we minimize the cost by using the derivative of the cost function with respect to each parameter. The slope tells us which way to move, and the size of each step is determined by the learning rate alpha.
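As a rough illustration, here is a minimal sketch of batch gradient descent for the univariate hypothesis, in plain Python. Each step moves both parameters against the derivative of the squared error cost, scaled by alpha. The function name, the starting point of zero, the values of alpha and the iteration count, and the three training points are all assumptions made for this example.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # assumed starting point
    for _ in range(iterations):
        # Prediction error for every training example
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1 / (2m)) * sum(errors^2)
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Assumed training set: points lying exactly on the line y = x
theta0, theta1 = gradient_descent([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(f"theta0 is about {theta0:.4f}, theta1 is about {theta1:.4f}")
```

On this data the result should land very close to theta0 = 0 and theta1 = 1. If alpha is too large the updates can overshoot and diverge, and if it is too small convergence becomes slow, so alpha usually needs some tuning.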


Let’s look at some Python regression code to get a clearer idea of what we saw earlier. A convenient way to implement many machine learning algorithms is to use the scikit-learn library. To learn more about scikit-learn, visit its official website.


import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes = datasets.load_diabetes()

# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# Coefficient of determination (R^2): 1 is perfect prediction
print('Variance score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

# Display the figure (needed when running as a script)
plt.show()




Coefficients: [938.23786125]
Mean squared error: 2548.07
Variance score: 0.47
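To connect these numbers back to the hypothesis: LinearRegression performs an ordinary least squares fit, and for a single feature that fit can also be written in closed form. The sketch below is plain Python and is not scikit-learn's implementation. The function names and the four data points are made up for illustration. It computes theta0 and theta1, then the same two metrics used above, mean squared error and the coefficient of determination R^2.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of h(x) = theta0 + theta1 * x."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def mean_squared_error(ys, preds):
    """Average of the squared prediction errors."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r2_score(ys, preds):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data lying close to the line y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

theta0, theta1 = fit_line(xs, ys)
preds = [theta0 + theta1 * x for x in xs]
print(f"theta0 = {theta0:.2f}, theta1 = {theta1:.2f}")
print(f"Mean squared error: {mean_squared_error(ys, preds):.4f}")
print(f"R^2: {r2_score(ys, preds):.4f}")
```

In the scikit-learn example, regr.intercept_ corresponds to theta0 and regr.coef_[0] corresponds to theta1.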



If you don’t know where to start, learning Machine Learning can be a tedious task. But now that you have what you need to get started with your first algorithm, linear regression in Python, you are set to embark on your journey to become a Data Science or AI wizard. If any part of the article isn’t clear to you, feel free to leave a comment in the comment box below and we will make sure you don’t stay stuck.

Happy Learning.
