Written by
TeaSpecialist
1:58 AM

👉LINEAR REGRESSION👇
Regression analysis:😇
Regression analysis is a statistical tool for investigating the relationships between dependent and independent variables.
Regression Modeling:
Establishing a relationship between a set of explanatory (independent) variables X1, X2, X3, ... and the response (dependent) variable Y.
Y=f(X1,X2,X3,...)
There are two types of linear regression:
1. Simple linear regression
2. Multiple linear regression
1.Simple Linear Regression:
Finding the statistical relationship between two continuous variables is called simple linear regression.
Here one variable is the predictor (independent variable) and the other is the response (dependent variable).
Y=B0+B1X
2.Multiple Linear Regression:
Multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.
Y=B0+B1X1+B2X2+…+BnXn
Assumptions
· Linearity
· No multicollinearity
· Normal distribution of residuals
· Homoscedasticity
1.Linearity
There should be a linear relationship between the dependent variable and the independent variables.
How to check:
Plot a scatter plot of X (independent variable) against Y (dependent variable). The scatter plot shows whether the relationship is linear or curvilinear.
Draw a scatter plot of the residuals (y axis) against the fitted y values (x axis). If the residuals show no systematic pattern, the relationship is linear; a curved pattern indicates a curvilinear relationship.
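The residual check can also be done numerically. Below is a minimal sketch, assuming made-up data and hypothetical helper names (`fit_line`, `residuals`): it fits a straight line to one linear and one quadratic data set and prints the residuals of each.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(xs, ys):
    """Observed minus predicted values for the fitted straight line."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up data: one linear relationship, one curvilinear (quadratic).
xs = [1, 2, 3, 4, 5, 6, 7]
linear_ys = [2 * x + 1 for x in xs]
curved_ys = [x ** 2 for x in xs]

linear_res = residuals(xs, linear_ys)
curved_res = residuals(xs, curved_ys)

print([round(r, 2) for r in linear_res])
print([round(r, 2) for r in curved_res])
```

For the linear data the residuals are all zero. For the quadratic data they are positive at both ends and negative in the middle, which is the U-shaped pattern that signals a curvilinear relationship.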
2.Multicollinearity:
Multiple linear regression assumes that the independent variables are not highly correlated with each other.
How to check:
1.VIF (Variance Inflation Factor)
2.Correlation matrix
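Both checks can be sketched for the simplest case of two predictors. The data and function names below are hypothetical. VIF is defined as 1 / (1 - R²), where R² comes from regressing one predictor on the others; with exactly two predictors, R² is just the squared correlation between them, which is what this sketch relies on.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with exactly two predictors R^2 equals r^2."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors: x2 is almost 2 * x1, x3 is barely related to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [5, 1, 4, 2, 6, 3]

vif_correlated = vif_two_predictors(x1, x2)
vif_unrelated = vif_two_predictors(x1, x3)

print(round(pearson_r(x1, x2), 3), round(vif_correlated, 1))
print(round(pearson_r(x1, x3), 3), round(vif_unrelated, 3))
```

A VIF close to 1 means the predictor is not explained by the other one. A commonly quoted rule of thumb (not from this post) treats values above roughly 5 to 10 as a sign of problematic multicollinearity.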
3.Normality:
The errors between observed and predicted values (residuals) should be normally distributed.
How to check:
Plot a histogram or Q-Q plot of the residuals. If the histogram is roughly symmetric and bell-shaped (not skewed), or the Q-Q points lie close to a straight line, the assumption is satisfied.
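The "not skewed" part of this check can be expressed as a number. The sketch below uses made-up residuals and a hypothetical `skewness` helper (the population form: mean cubed deviation divided by the cubed standard deviation). A skewness near zero only confirms symmetry, so it complements the histogram or Q-Q plot and does not replace it.

```python
def skewness(values):
    """Population skewness: third central moment over std ** 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up residuals: one symmetric set, one with a long right tail.
symmetric_res = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
skewed_res = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 3.0]

print(round(skewness(symmetric_res), 3))
print(round(skewness(skewed_res), 3))
```

The symmetric residuals give a skewness of zero, while the single large positive residual in the second set pushes the skewness well above 1.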
4.Homoscedasticity
Homoscedasticity means “having the same scatter”: all random variables in the sequence (here, the residuals) have the same finite variance.
How to check:
Draw a scatter plot of the residuals against the predicted values. There should be no pattern in the distribution; if there is a cone-shaped pattern, the data is heteroscedastic.
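As a rough numeric stand-in for eyeballing the cone shape, one can compare the residual variance at low predicted values with the variance at high predicted values. This is only a sketch with made-up numbers and a hypothetical `spread_ratio` helper, not a formal statistical test.

```python
def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def spread_ratio(fitted, resid):
    """Residual variance in the upper half of fitted values over the lower half.

    With an odd number of points the middle one is left out of both halves.
    """
    pairs = sorted(zip(fitted, resid))  # order residuals by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)


# Made-up predicted values and two sets of residuals.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
even_resid = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]  # same scatter everywhere
cone_resid = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.6, -1.6]  # scatter grows with fitted value

ratio_even = spread_ratio(fitted, even_resid)
ratio_cone = spread_ratio(fitted, cone_resid)

print(round(ratio_even, 2), round(ratio_cone, 2))
```

A ratio near 1 is consistent with homoscedasticity. A ratio far above (or far below) 1 means the scatter changes with the predicted value, which is the cone shape described above.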
Optimization algorithms for calculating the weight coefficients of a linear regression model:
· OLS (Ordinary Least Squares)
· Gradient Descent Method
OLS vs Gradient Descent Method:
OLS is suitable for small datasets with a small number of features.
In OLS, the number of data points should be greater than the number of features; otherwise the coefficients cannot be determined uniquely.
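The data-point requirement can be seen by solving the OLS normal equations (XᵀX)B = XᵀY directly. The sketch below is one possible stdlib-only implementation with hypothetical function names and made-up data: with enough points it recovers the coefficients, and with fewer points than coefficients the matrix XᵀX is singular and no unique solution exists.

```python
def solve_linear_system(a, b, tol=1e-9):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("X^T X is singular: coefficients are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(rows, ys):
    """Return [B0, B1, ..., Bn] from the normal equations (X^T X) B = X^T Y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 = intercept
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from Y = 1 + 2*X1 + 3*X2 (no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coeffs = ols_fit(rows, ys)
print([round(c, 6) for c in coeffs])

# Only two data points for three coefficients: no unique solution.
try:
    ols_fit(rows[:2], ys[:2])
    too_few_points_failed = False
except ValueError:
    too_few_points_failed = True
print(too_few_points_failed)
```

The fixed tolerance `tol` is an assumption that suits this small example; real data would normally call for a library solver.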
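Limitations of OLS:
· Computational speed: the closed-form solution requires inverting an n×n matrix (n = number of features), which becomes slow as n grows.
· Generic optimization: regularized models (e.g. Lasso) cannot be fitted with the plain OLS formula.
· No closed-form solution: such regularized models in general have no closed-form solution, so an iterative method is needed.
For high-dimensional data, gradient descent is usually the preferred method.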
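A minimal sketch of the gradient descent method for the simple model Y=B0+B1X is shown below. The data, the learning rate, and the number of steps are illustrative assumptions; in practice the learning rate has to be tuned, and too large a value makes the updates diverge.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = b0 + b1 * x by minimizing the mean squared error."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current coefficients.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to b0 and b1.
        grad_b0 = 2.0 / n * sum(errors)
        grad_b1 = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

b0, b1 = gradient_descent(xs, ys)
print(round(b0, 4), round(b1, 4))
```

Unlike OLS, this never inverts a matrix: each step only needs sums over the data, which is why it scales better when the number of features is large.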
What is OLS?
OLS: Ordinary Least Squares
OLS is a statistical method of analysis that estimates the relationship between one or more independent variables and a dependent variable. It does so by minimizing the sum of the squared differences between the observed and predicted values of the dependent variable, with the fitted relationship taking the form of a straight line.
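To make "minimizing the sum of squares" concrete, the sketch below fits a simple regression with the standard closed-form least-squares formulas and then checks that nudging either coefficient only makes the sum of squared residuals larger. The data and function names are made up for illustration.

```python
def ols_simple(xs, ys):
    """Closed-form least-squares intercept (b0) and slope (b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up observations that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = ols_simple(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)

# Sum of squared residuals for four slightly different lines.
nudged = [sum_squared_residuals(xs, ys, b0 + d0, b1 + d1)
          for d0 in (-0.1, 0.1) for d1 in (-0.1, 0.1)]

print(round(b0, 4), round(b1, 4), round(best, 4))
print(all(best < s for s in nudged))
```

Every nudged line has a larger sum of squared residuals than the OLS line, which is exactly the property that defines the OLS estimate.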
Simple Linear Regression Analysis:
The relationship between a continuous response variable (Y) and a continuous explanatory variable (X) may be represented using a line of best fit, where
Y = predicted value of the response variable
X = independent variable
The relationship can be represented using a linear equation.
Simple linear equation:
Y=B0+B1X+E
Multiple linear equation:
Y=B0+B1X1+B2X2+B3X3+…+BnXn+E
[Figure: simple linear regression equation]
[Figure: multiple linear regression equation]
Draw a line through the scatter plot in a way that minimizes the deviations of the individual observations from the line.
Derivation Of OLS Parameters:
Minimize the sum of all squared deviations from the line (squared residuals).
So, to calculate the coefficients, we look for the values that give the minimal sum of squared residuals.
Squaring the residuals is necessary so that positive and negative deviations do not cancel each other out. Take the partial derivatives of the sum of squared residuals with respect to alpha (the intercept, B0) and beta (the slope, B1), and set each to zero. This gives two equations, which are solved for the coefficients.
Calculation of coefficient (Beta):
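A sketch of the standard derivation for simple linear regression follows, assuming alpha is the intercept (B0 above), beta is the slope (B1 above), and \bar{x}, \bar{y} are the sample means of X and Y:

```latex
% Sum of squared residuals to be minimized
S(\alpha, \beta) = \sum_{i=1}^{n} \left( y_i - \alpha - \beta x_i \right)^2

% Setting both partial derivatives to zero gives the two equations
\frac{\partial S}{\partial \alpha} = -2 \sum_{i=1}^{n} \left( y_i - \alpha - \beta x_i \right) = 0
\frac{\partial S}{\partial \beta}  = -2 \sum_{i=1}^{n} x_i \left( y_i - \alpha - \beta x_i \right) = 0

% The first equation gives the intercept in terms of the slope
\alpha = \bar{y} - \beta \bar{x}

% Substituting into the second equation gives the slope
\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

So beta is the covariance of X and Y divided by the variance of X, and alpha is chosen so that the line passes through the point (\bar{x}, \bar{y}).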