Before discussing linear regression, we will briefly discuss machine learning and where linear regression fits within it.

Machine learning is commonly divided into two main parts: supervised learning and unsupervised learning. Supervised learning is further divided into two parts: classification and regression.


What is Supervised and Unsupervised Learning?


Supervised learning is where the data is divided into x variables (the inputs) and a y variable (the output we want to predict). The model learns how the x variables map to y.

Unsupervised learning is where we have only x variables and no y variable.

In layman's terms, in supervised learning human supervision is needed at the initial stage to identify (label) the Y variable. In unsupervised learning no such supervision is required; the machine looks for patterns or groups in the data on its own.

In classification, we need to predict a class, e.g., yes or no.

In regression, we need to predict a numeric value.

Linear regression comes under regression (we will predict a numeric value every time).


Linear Regression:


The basic assumption in linear regression is that Y depends linearly on the X variables, i.e., a change in X produces a proportional change in Y.

The equation of linear regression (with a single X variable) is Y = mX + c, where m is the slope and c is the intercept.

Here we are trying to find the values of m and c so that we can predict the value of Y for given X values.
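
As a minimal sketch of how m and c can be found, the snippet below uses the ordinary least-squares formulas for a single X variable. The function name fit_line and the sample numbers are made up for illustration; they are not from any particular library.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: chosen so the line passes through the point of means
    c = mean_y - m * mean_x
    return m, c


# Made-up data that follows y = 2x + 1 exactly
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

m, c = fit_line(xs, ys)
prediction = m * 6.0 + c  # predict Y for a new X value of 6
print(m, c, prediction)
```

With real data the points will not lie exactly on a line, and m and c are then the values that make the squared prediction errors as small as possible.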


Steps in Linear regression:


1. The first step is always understanding the data through exploratory data analysis (EDA).

2. We need to treat the nulls in the dataset.

  As beginners, we can go with the following rules:

  If a column has more than 40% to 50% nulls, we can remove that column.

  If a column has fewer nulls, we can replace them with the mean or median (for numeric columns) or the mode (for categorical columns).

Note: this is not always the correct way to replace nulls, but since we are beginners we are going with it.
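
The beginner rules above can be sketched as follows, using only plain Python. The helper names (null_fraction, treat_nulls), the 40% threshold, and the tiny dataset are assumptions made for illustration, and nulls are represented as None.

```python
from statistics import median, mode


def null_fraction(values):
    """Fraction of entries in a column that are null (None)."""
    return sum(v is None for v in values) / len(values)


def treat_nulls(columns, drop_threshold=0.4):
    """Drop columns with too many nulls; fill the rest with median or mode."""
    cleaned = {}
    for name, values in columns.items():
        if null_fraction(values) > drop_threshold:
            continue  # too many nulls: remove the column
        present = [v for v in values if v is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill = median(present)  # numeric column
        else:
            fill = mode(present)  # categorical column
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Made-up dataset: each key is a column, None marks a null
data = {
    "age": [25, None, 35, 45, 30],
    "city": ["Pune", "Delhi", None, "Pune", "Pune"],
    "notes": [None, None, None, "ok", None],
}

cleaned = treat_nulls(data)
print(cleaned)
```

Here the "notes" column is 80% null and is dropped, while the single nulls in "age" and "city" are filled with the median and the mode respectively.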

3. We need to scale or transform the values if required. Some of the techniques are z-score scaling and log, square, and cube transformations.
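
As a small illustration of two of these techniques, the sketch below applies z-score scaling and a log transformation to a made-up list of incomes. The function name z_score and the numbers are assumptions for the example; the population standard deviation is used here, which is one common choice.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to have mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Made-up income values
incomes = [20000.0, 30000.0, 40000.0, 50000.0, 60000.0]

scaled = z_score(incomes)
logged = [math.log(v) for v in incomes]  # log transform compresses large values

print(scaled)
print(logged)
```

After z-score scaling, the middle value (the mean) becomes 0 and the other values are expressed as a number of standard deviations above or below it.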

5. Divide the data into X variables and the Y variable.

6. Now we need to split the data into train and test sets. The training data is used to train the model and the test data is used to test the performance of the model. While splitting the data we need to specify a split ratio and a random seed.

These two are settings that we choose, much like hyperparameters: changing them changes which rows land in the train and test sets, so the measured accuracy of the model will change.
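
To show what the split ratio and the random seed actually control, here is a minimal sketch of a train/test split in plain Python. The function split_train_test and the data are hypothetical; machine learning libraries usually ship their own helper for this step.

```python
import random


def split_train_test(X, y, test_ratio=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then cut them into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # same seed gives the same shuffle
    n_test = round(len(X) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Made-up data: 10 rows with one X variable each, and y = 2x + 1
X = [[float(i)] for i in range(10)]
y = [2.0 * i + 1.0 for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(X, y, test_ratio=0.2, seed=42)
print(len(X_train), len(X_test))
```

The ratio decides how many rows go to each set, and the seed decides which rows they are. Keeping the seed fixed makes the split, and therefore the measured score, reproducible.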

7. We train the model with the training data.

8. To measure the model we use R-squared and adjusted R-squared as metrics.
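
As a rough sketch of how these two metrics are computed, the snippet below uses the standard formulas: R-squared = 1 - SS_res / SS_tot, and adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of rows and p is the number of X variables. The function names and the actual/predicted values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Share of the variation in y that the predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # error left over
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total variation in y
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """R-squared penalised for the number of X variables used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Made-up actual values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_features=1)
print(r2, adj_r2)
```

A value close to 1 means the predictions are close to the actual values. Adjusted R-squared is never higher than R-squared, and it drops when extra X variables are added without improving the fit.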