Complete Linear Regression Analysis

Much of mathematics is devoted to studying variables that are deterministically related. Saying that x and y are related in this manner means that once we are told the value of x, the value of y is completely specified.
The equation for a linear relationship between x and y is: \begin{equation} y = \beta_0 + \beta_1x \end{equation}

However, many variables appear to be related to one another, but not in a deterministic fashion; that is, even for a fixed value of x, there is uncertainty in the value of y.

Regression Analysis is the part of statistics that investigates the relationship between two or more variables related in a non-deterministic fashion. With p predictors, the linear regression model has the form: \begin{equation} f(X) = \beta_0 + \sum_{j=1}^{p}X_j\beta_j \end{equation}
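
To make the model form concrete, here is a minimal NumPy sketch that evaluates \(f(X)\) for a few observations; the design matrix and coefficients are made-up values, used purely for illustration:

```python
import numpy as np

# Made-up example: n = 4 observations of p = 2 predictors.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
beta0 = 0.5                    # intercept (illustrative value)
beta = np.array([1.2, -0.7])   # coefficients beta_1, beta_2 (illustrative values)

# f(X) = beta_0 + sum_j X_j * beta_j, evaluated for every row of X
f_X = beta0 + X @ beta
print(f_X)   # one fitted value per observation
```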

The variable whose value is fixed will be denoted by x and will be called the independent, predictor, or explanatory variable. For fixed x, the second variable will be random; we denote this random variable by Y, its observed values by \(y_i\), and refer to it as the dependent or response variable. The observations can be made for

• a number of settings of the independent variable, denoted by \(x_1, x_2, \ldots, x_n\)
• an input vector \(X^T = (X_1, X_2, \ldots, X_p)\)

The available bivariate data then consist of the n pairs \((x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\), which form the training dataset from which we estimate the parameters \(\beta\). The most popular estimation method is to minimize the residual sum of squares: \begin{equation} RSS = \sum_{i=1}^{n}(y_i-f(x_i))^2 = \sum_{i=1}^{n}\Big(y_i-\beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j\Big)^2 \end{equation} In the simple case of a single predictor x, minimizing the above equation gives the least squares estimates \begin{equation} \beta_1 = \frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2}, \qquad \beta_0 = \overline{y} - \beta_1\overline{x} \end{equation} where \(\overline{y} = \frac{1}{n}\sum_{i=1}^{n}y_i\) and \(\overline{x} = \frac{1}{n}\sum_{i=1}^{n}x_i\) are the sample means.
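
The following sketch computes these closed-form estimates directly from the formulas above, using a made-up toy dataset, and cross-checks them against NumPy's built-in degree-1 polynomial fit:

```python
import numpy as np

# Toy training data (x_i, y_i); the values are illustrative only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates for simple linear regression
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

print(beta0, beta1)
print(np.polyfit(x, y, deg=1))   # returns [beta1, beta0]; should agree
```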

RSE and \(R^2\) Statistics

The quality of a linear regression fit is typically assessed using two related quantities: the residual standard error (RSE) and the \(R^2\) statistic.

Due to the presence of an error term associated with each observation, even if we knew the true regression line, we would not be able to perfectly predict Y from X.

[Figure: the least squares criterion, showing the vertical deviations whose sum of squares is minimized]

The RSE is an estimate of the standard deviation of the error term \(\epsilon\). For simple linear regression it is calculated using the formula: \[RSE =\sqrt{ \frac{1}{n-2}RSS}\] where the divisor \(n-2\) reflects the two estimated parameters \(\beta_0\) and \(\beta_1\).

The RSE provides an absolute measure of lack of fit of the model to the data. But since it is measured in the units of Y , it is not always clear what constitutes a good RSE. The \(R^2\)  statistic provides an alternative measure of fit. It takes the form of a proportion—the proportion of variance explained—and so it always takes on a value between 0 and 1, and is independent of the scale of Y.

Formula to calculate \(R^2\):

\begin{equation}R^2 = \frac{TSS-RSS}{TSS} = 1-\frac{RSS}{TSS}\end{equation} where \(TSS=\sum_{i=1}^{n}(y_i - \overline{y})^2\) is the total sum of squares.
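
A short sketch computing RSS, TSS, RSE, and \(R^2\) for the same toy data used above (values are illustrative only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])

# Fit the simple linear regression as before
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares

n = len(x)
rse = np.sqrt(rss / (n - 2))         # residual standard error
r2 = 1 - rss / tss                   # proportion of variance explained

print(rse, r2)
```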

 
Pearson Correlation Coefficient

The Pearson Correlation Coefficient is another measure of the linear relationship between the predictor and the response variable. It is defined as: \begin{equation} Cor(X,Y) = \frac{\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\overline{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\overline{y})^2}}\end{equation} In simple linear regression, \(R^2\) equals the square of this correlation, \(R^2 = Cor(X,Y)^2\).
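
The sketch below computes the correlation from this definition on the same toy data, cross-checks it with np.corrcoef, and verifies the \(R^2 = Cor(X,Y)^2\) identity numerically:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])

# Pearson correlation computed from the definition above
num = np.sum((x - x.mean()) * (y - y.mean()))
den = np.sqrt(np.sum((x - x.mean()) ** 2)) * np.sqrt(np.sum((y - y.mean()) ** 2))
r = num / den

print(r)
print(np.corrcoef(x, y)[0, 1])   # built-in cross-check; should match r

# In simple linear regression, R^2 equals the squared correlation
print(r ** 2)
```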


In conclusion, linear regression is a simple approach to supervised learning. It often provides an adequate and interpretable description of how the inputs affect the output. For prediction purposes it can sometimes outperform fancier nonlinear models, especially in situations with small numbers of training cases, low signal-to-noise ratio, or sparse data.



