
## Thursday, February 28, 2019

### LINEAR REGRESSION

Regression analysis:
Regression analysis is a statistical tool for investigating the relationships between dependent and independent variables.

Regression Modeling:
Regression modeling establishes a relationship between a set of explanatory (independent) variables X1, X2, X3, ... and the response (dependent) variable Y.
Y=f(X1,X2,X3..)
There are two types of linear regression:
1.      Simple linear regression
2.      Multiple linear regression

1. Simple Linear Regression:
Simple linear regression models the statistical relationship between two continuous variables.
One is the predictor (independent variable) and the other is the response (dependent variable).

Y=B0+B1X

2. Multiple Linear Regression:
Multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.

Y=B0+B1X1+B2X2+…+BnXn
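Both equations can be evaluated with the same small function, since the simple case is just the multiple case with one predictor. The sketch below is illustrative only: the function name `predict` and all coefficient values are made up for the example.

```python
def predict(intercept, coefficients, features):
    """Return B0 + B1*X1 + ... + Bn*Xn for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Simple linear regression: one predictor (hypothetical B0 = 2, B1 = 3).
simple = predict(2.0, [3.0], [4.0])  # 2 + 3*4

# Multiple linear regression: three predictors (hypothetical coefficients).
multiple = predict(1.0, [0.5, 2.0, -1.0], [4.0, 3.0, 2.0])  # 1 + 2 + 6 - 2

print(simple, multiple)
```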

Assumptions
•  Linearity
•  No multicollinearity
•  Normal distribution of residuals
•  Homoscedasticity
1. Linearity
There should be a linear relationship between the dependent variable and the independent variables.
How to check:
Draw a scatter plot of X (independent variable) against Y (dependent variable). The plot shows whether the relationship is linear or curvilinear.
Draw a scatter plot of the residuals (y-axis) against the predicted Y values (x-axis). Residuals scattered randomly around zero suggest a linear relationship; a curved pattern suggests a curvilinear one.
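The residual check can also be done numerically, without a plot. The sketch below fits a straight line to one truly linear data set and one curved data set (both invented for illustration) and prints the residuals. The helper names `fit_line` and `residuals` are assumptions of this example, not part of any library.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean, slope


def residuals(xs, ys):
    """Observed minus predicted value for every data point."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
linear_y = [2.0 * x + 1.0 for x in xs]  # made-up linear data
curved_y = [x ** 2 for x in xs]         # made-up curvilinear data

linear_res = residuals(xs, linear_y)  # no pattern: all residuals near zero
curved_res = residuals(xs, curved_y)  # positive, then negative, then positive

print(linear_res)
print(curved_res)
```

For the curved data the residuals are positive at both ends and negative in the middle, which is the U-shaped pattern a residual plot would show when the linearity assumption fails.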
2. Multicollinearity:
Multiple linear regression assumes that the independent variables are not highly correlated with each other.
How to check:
1. VIF (Variance Inflation Factor)
2. Correlation matrix
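Both checks can be sketched for the special case of two predictors, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them. With more predictors, each VIF comes from regressing one predictor on all the others, which this example does not cover. The data and the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_related = [2.1, 3.9, 6.2, 8.0, 9.8]    # made up: almost 2 * x1
x2_unrelated = [1.0, 3.0, 2.0, 3.0, 1.0]  # made up: uncorrelated with x1

vif_related = vif_two_predictors(x1, x2_related)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)

print(pearson(x1, x2_related), vif_related)
print(pearson(x1, x2_unrelated), vif_unrelated)
```

A VIF close to 1 means no inflation. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.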

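3. Normality:
The errors between observed and predicted values (the residuals) should be normally distributed.
How to check:
Plot a histogram or a Q-Q plot of the residuals. If the histogram is roughly symmetric and bell-shaped (not skewed), or the Q-Q points lie close to a straight line, the assumption is satisfied.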
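As a numeric companion to the histogram, the skewness of the residuals can be computed directly. The sketch below uses the moment-based definition (third central moment divided by the 1.5 power of the second). The residual values are invented, and a skewness near zero is only a necessary sign of normality, not proof of it.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical residuals from two different fitted models.
symmetric_res = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
skewed_res = [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, 0.0, 1.0, 5.0]

skew_symmetric = skewness(symmetric_res)  # symmetric around zero
skew_skewed = skewness(skewed_res)        # long right tail, positive skew

print(skew_symmetric, skew_skewed)
```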

4. Homoscedasticity
Homoscedasticity means “having the same scatter”: the residuals have the same finite variance across all values of the independent variables.
How to check:
Draw a scatter plot of the residuals against the predicted values. There should be no pattern in the distribution; if there is a cone-shaped pattern, the data is heteroscedastic.
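A rough numeric version of this check compares the residual variance for small predicted values with the variance for large ones. This is only a heuristic sketch, not a formal test. The function names and data are hypothetical, and it assumes the lower half of the residuals is not all identical (which would cause a division by zero).

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def spread_ratio(predicted, residuals):
    """Residual variance in the upper half of predictions over the lower half."""
    pairs = sorted(zip(predicted, residuals))  # order by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)


predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
even_res = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # same scatter everywhere
cone_res = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]  # scatter grows: cone shape

even_ratio = spread_ratio(predicted, even_res)
cone_ratio = spread_ratio(predicted, cone_res)

print(even_ratio, cone_ratio)
```

A ratio near 1 is consistent with homoscedasticity, while a ratio far from 1 matches the cone-shaped pattern described above.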

Optimization algorithms for calculating the weight coefficients of a linear regression model:
•     OLS (Ordinary Least Squares)

OLS is suitable for small datasets with a small number of features.
In OLS, the number of data points should be greater than the number of features.
Limitations of OLS:
·         Computational speed: the closed-form solution requires inverting an n x n matrix (n = number of features), which becomes slow as n grows.
·         Generic optimization: regularized models (e.g. Lasso) have no closed-form solution, so they need an iterative optimizer instead.
For high-dimensional data, gradient descent is usually the preferred method.
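To make the gradient descent alternative concrete, here is a minimal sketch for the simple model Y = B0 + B1X, minimizing the mean squared error. The learning rate, step count, and data are arbitrary choices for this toy example and would need tuning on real data.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Estimate B0 and B1 by batch gradient descent on the mean squared error."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point under the current coefficients.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. B0 and B1.
        grad_b0 = 2.0 / n * sum(errors)
        grad_b1 = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up data lying exactly on Y = 1 + 2X

b0, b1 = gradient_descent(xs, ys)
print(b0, b1)
```

Each step costs time proportional to the number of data points and involves no matrix inversion, which is why the approach scales to many features.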
What is OLS?
OLS: Ordinary Least Squares

OLS is a statistical method that estimates the relationship between one or more independent variables and a dependent variable. It fits a straight line by minimizing the sum of the squared differences between the observed and predicted values of the dependent variable.
Simple linear Regression analysis:

The relationship between a continuous response variable(Y) and a continuous explanatory variable(X) may be represented using a line of best fit.
Where Y = predicted value
X = independent variable
The relationship can be represented using a linear equation.
Simple linear equation:
Y=B0+B1X+E
Multiple linear equation:
Y=B0+B1X1+B2X2+B3X3+…+BnXn+E


Draw a line through the scatter plot so as to minimize the deviations of the individual observations from the line.
Derivation Of OLS Parameters:

Minimize the sum of all squared deviations from the line (the squared residuals):
S = Σ (Yi - alpha - beta*Xi)^2
Here alpha is the intercept (B0) and beta is the slope (B1). The coefficients are the values of alpha and beta at which S reaches its minimum.

Squaring the residuals is necessary so that positive and negative deviations do not cancel each other out. Taking the partial derivative of S with respect to alpha and beta in turn, and setting each to zero, gives the two equations below, which are solved to estimate the coefficients:
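For reference, the minimization can be written out as follows. This is a sketch of the standard textbook derivation for the simple model, assuming alpha and beta denote the intercept B0 and the slope B1, and that there are n observations (X_i, Y_i).

```latex
% Sum of squared residuals to be minimized
S(\alpha, \beta) = \sum_{i=1}^{n} \left( Y_i - \alpha - \beta X_i \right)^2

% Partial derivative with respect to alpha, set to zero
\frac{\partial S}{\partial \alpha}
  = -2 \sum_{i=1}^{n} \left( Y_i - \alpha - \beta X_i \right) = 0

% Partial derivative with respect to beta, set to zero
\frac{\partial S}{\partial \beta}
  = -2 \sum_{i=1}^{n} X_i \left( Y_i - \alpha - \beta X_i \right) = 0
```

The first condition says the residuals sum to zero; the second says they are uncorrelated with X.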

Calculation of intercept (alpha value):
alpha = mean(Y) - beta * mean(X)

Calculation of coefficient (beta):
beta = Σ (Xi - mean(X)) * (Yi - mean(Y)) / Σ (Xi - mean(X))^2
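The standard closed-form OLS solution for one predictor is beta = Σ (Xi - mean(X)) * (Yi - mean(Y)) / Σ (Xi - mean(X))^2 and alpha = mean(Y) - beta * mean(X). The sketch below applies these two formulas to a small made-up data set. The function name `ols_simple` is hypothetical.

```python
def ols_simple(xs, ys):
    """Closed-form OLS estimates (alpha, beta) for Y = alpha + beta * X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    beta = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))
    # Intercept: forces the line through the point (mean(X), mean(Y)).
    alpha = y_mean - beta * x_mean
    return alpha, beta


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]  # made-up observations

alpha, beta = ols_simple(xs, ys)
residual_sum = sum(y - (alpha + beta * x) for x, y in zip(xs, ys))

print(alpha, beta, residual_sum)
```

For this data the slope works out to 6 / 10 = 0.6 and the intercept to 4 - 0.6 * 3 = 2.2, and the residuals sum to zero up to floating-point rounding, as the first derivative condition requires.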