WHY ARE THERE TWO REGRESSION LINES?
In certain circumstances there may exist two regression lines.
When the variables X and Y are interchangeable with respect to
causal effect, one can treat X as the independent variable and Y as
the dependent variable, or Y as the independent variable and X as
the dependent variable. As a result, we have (1) the regression line of Y on X and (2) the regression line of X on Y.
Both are valid regression lines, but we must judiciously select the
one regression equation which is suitable to the given situation.
Note: If X alone causes Y, then there is only one regression line, that of Y on X.
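As a rough illustration, the following Python sketch fits both lines from the same data (the height and weight figures are invented purely for demonstration); the two lines coincide only when the correlation between X and Y is perfect.

```python
import numpy as np

# Hypothetical data: heights (X, cm) and weights (Y, kg)
x = np.array([150.0, 155.0, 160.0, 165.0, 170.0, 175.0, 180.0])
y = np.array([52.0, 55.0, 61.0, 60.0, 66.0, 70.0, 74.0])

# Regression line of Y on X:  Y = a + bX,  b = cov(X, Y) / var(X)
b_yx = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a_yx = y.mean() - b_yx * x.mean()

# Regression line of X on Y:  X = c + dY,  d = cov(X, Y) / var(Y)
d_xy = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)
c_xy = x.mean() - d_xy * y.mean()

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X on Y: X = {c_xy:.2f} + {d_xy:.2f} Y")
# Both lines pass through (mean of X, mean of Y), but they are different
# lines unless the correlation is exactly +1 or -1.
```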
In the general form of the simple linear regression equation of Y
on X
Y = a + bX + e,
the constants ‘a’ and ‘b’ are generally called
the regression coefficients.
The coefficient ‘b’ represents the rate of change in the
mean of Y for every unit change in the value of X.
When the range of X includes ‘0’, the intercept ‘a’ is E(Y|X
= 0). If the range of X does not include ‘0’, then ‘a’ has
no practical interpretation.
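For illustration (with purely hypothetical numbers): if the fitted equation is Ŷ = 3 + 2X and the observed values of X range from 0 to 10, then the mean of Y increases by 2 units for every unit increase in X, and the intercept 3 is the estimated mean of Y when X = 0. Had X instead ranged from 50 to 80, the value 3 would carry no practical interpretation.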
If (xi, yi), i = 1, 2, ..., n
is a set of n pairs of observations made on (X, Y), then
fitting the above regression equation means finding the estimates â
and b̂ of ‘a’ and ‘b’ respectively.
These estimates are determined based on the following general
assumptions:
(i) the relationship between Y and X
is linear (approximately).
(ii) the error term ‘e’ is a random
variable with mean zero.
(iii) the error term ‘e’ has constant variance.
There are other assumptions on ‘e’, which are not required
at this level of study.
Before proceeding further, the following points are to be
kept in mind.
Both the independent and dependent variables
must be measured on the interval scale.
There must be a linear relationship
between the independent and dependent variables.
Linear regression is very sensitive to outliers (extreme observations). A single outlier can
distort the regression line severely and, consequently, the estimated values of Y,
as the sketch below illustrates.
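A minimal sketch of this sensitivity, using invented data and numpy.polyfit purely as a convenient least-squares fitter:

```python
import numpy as np

# Hypothetical data lying close to the line Y = 1 + 2X
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1, 12.8])

slope, intercept = np.polyfit(x, y, deg=1)
print(f"Without outlier: Y = {intercept:.2f} + {slope:.2f} X")

# Add one extreme observation and refit
x_out = np.append(x, 7.0)
y_out = np.append(y, 40.0)   # an outlier far above the general pattern
slope_o, intercept_o = np.polyfit(x_out, y_out, deg=1)
print(f"With outlier:    Y = {intercept_o:.2f} + {slope_o:.2f} X")
# The single extreme point changes both the slope and the intercept
# noticeably, and hence every estimated value of Y.
```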
Based on assumption (ii), the response variable Y is
also a random variable with mean
E(Y|X = x) = a + bx.
In regression analysis, the main objective is finding the line of
best fit, which provides the fitted equation of Y on X.
The line of ‘best fit’ is the line (straight line equation) which
minimizes the error in the estimation of the dependent variable Y, for
any specified value of the independent variable X from its range.
The regression equation E(Y|X = x) = a + bx
represents a family of straight lines for different values of the coefficients
‘a’ and ‘b’. The problem is to determine the estimates of ‘a’
and ‘b’ by minimizing the error in the estimation of Y so that
the line is a best fit. This necessitates finding suitable values for the
estimates of ‘a’ and ‘b’.
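The criterion most commonly used for this purpose is the method of least squares (assumed in the sketch below, since the text has not yet named it), under which b̂ = Sxy / Sxx and â = ȳ - b̂x̄. A minimal Python illustration with invented observations:

```python
import numpy as np

# Hypothetical paired observations (x_i, y_i), i = 1, ..., n
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([5.0, 9.0, 12.0, 18.0, 21.0])

x_bar, y_bar = x.mean(), y.mean()

# Least squares estimates:  b_hat = S_xy / S_xx,  a_hat = y_bar - b_hat * x_bar
S_xy = np.sum((x - x_bar) * (y - y_bar))
S_xx = np.sum((x - x_bar) ** 2)
b_hat = S_xy / S_xx
a_hat = y_bar - b_hat * x_bar

print(f"Fitted line: Y = {a_hat:.3f} + {b_hat:.3f} X")

# Estimated (fitted) values of Y for the observed values of X
y_hat = a_hat + b_hat * x
print("Fitted values:", np.round(y_hat, 2))
```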