
Simple linear regression
For simple linear regression, we will cover its methodology, diagnostics and application.
How is sales (in thousands of units) for a particular product related to advertising budgets (in thousands of dollars) for TV, radio or newspaper?
How accurately can sales amount be predicted by advertising budgets for TV, radio or newspaper?
Is splitting the advertising budget between radio and TV better than allocating it all to TV (or radio) in terms of increasing sales?

How is Balance of credit card users related to Gender or marital status (Married or not)?
How is Balance of credit card users related to Age or Income?
Do users of different Gender in a given Income range have different Balance?

For the two motivating examples, simple linear models can be proposed as follows:
Sales \(\approx \beta_0 + \beta_1 \times\) TV for some constants \(\beta_0\) and \(\beta_1\)
Balance \(\approx \beta_0 + \beta_1 \times\) Income for some constants \(\beta_0\) and \(\beta_1\)
Balance \(\approx \beta_{0,0} + \beta_{1,0} \times\) Income for some constants \(\beta_{0,0}\) and \(\beta_{1,0}\) if Gender=Female
Balance \(\approx \beta_{0,1} + \beta_{1,1} \times\) Income for some constants \(\beta_{0,1}\) and \(\beta_{1,1}\) if Gender=Male
The model \[ \textsf{Sales} \approx \beta_0 + \beta_1 \times \textsf{TV}\] can be interpreted as \[E(Y) = \beta_0 + \beta_1 X\] with \(Y\)=Sales and \(X\)=TV.
Note: \(E(Y)\) is modelled as a linear function of \(X\).
The model \[\textsf{Balance} \approx \beta_{0,0} + \beta_{1,0} \times \textsf{Income} \quad \text{if} \quad \textsf{Gender=Female}\] can be interpreted as \[E(Y) = \beta_{0,0} + \beta_{1,0} X \quad \text{if} \quad X_1=0\] with \(Y\)=Balance, \(X\)=Income, and \(X_1\)=Gender, where Female is coded as \(X_1=0\).
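As a sketch of how such gender-specific models could be fit in R, the following uses simulated data as a stand-in for the Credit data set (the data frame name `Credit` and the columns `Balance`, `Income`, and `Gender` are assumptions here, not the original data):

```r
set.seed(1)
n <- 200
# Simulated stand-in for the Credit data: Income in thousands of dollars,
# Gender as a factor, Balance generated from a different line per group
Credit <- data.frame(
  Income = runif(n, 10, 100),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
Credit$Balance <- ifelse(Credit$Gender == "Female",
                         200 + 6 * Credit$Income,
                         100 + 8 * Credit$Income) + rnorm(n, sd = 50)

# One simple linear model per gender group
fit_f <- lm(Balance ~ Income, data = subset(Credit, Gender == "Female"))
fit_m <- lm(Balance ~ Income, data = subset(Credit, Gender == "Male"))
coef(fit_f)  # estimates of (beta_{0,0}, beta_{1,0})
coef(fit_m)  # estimates of (beta_{0,1}, beta_{1,1})
```

Fitting the two groups separately allows both the intercept and the slope to differ with Gender, exactly as in the two displayed models.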
Simple linear regression with a quantitative predictor is used to model the mean of a quantitative random variable (called the response, or dependent, variable) as a linear function of another quantitative random variable (called the predictor, or independent, variable).
For two quantitative random variables \(Y\) and \(X\), a simple linear model is \[E(Y) = \beta_0 + \beta_1 X,\] where \(\beta_0\) (intercept) and \(\beta_1\) (slope) are unknown, true model parameters (or coefficients), and \(\beta_1\) is called the regression coefficient.
The above model is equivalent to \[Y = \beta_0 + \beta_1 X + \varepsilon \quad \text{ with } \quad E(\varepsilon)=0,\] which is called the population regression line.
Why can a model \[Y = \beta_0^{\prime} + \beta_1^{\prime} X + \varepsilon^{\prime} \quad \text{ with } \quad E(\varepsilon^{\prime}) \ne 0\] always be rewritten as \[Y = \beta_0 + \beta_1 X + \varepsilon \quad \text{ with } \quad E(\varepsilon) = 0\] when \(\varepsilon\) is independent of \(X\)?
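One way to see this (a short derivation added here for completeness): the nonzero mean of \(\varepsilon^{\prime}\) can be absorbed into the intercept,
\[Y = \beta_0^{\prime} + \beta_1^{\prime} X + \varepsilon^{\prime} = \left[\beta_0^{\prime} + E(\varepsilon^{\prime})\right] + \beta_1^{\prime} X + \left[\varepsilon^{\prime} - E(\varepsilon^{\prime})\right],\]
so taking \(\beta_0 = \beta_0^{\prime} + E(\varepsilon^{\prime})\), \(\beta_1 = \beta_1^{\prime}\), and \(\varepsilon = \varepsilon^{\prime} - E(\varepsilon^{\prime})\) gives \(E(\varepsilon) = 0\).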
What does \(\varepsilon\) represent?
The model \[E(Y) = \beta_0 + \beta_1 X\] and its equivalent \[Y = \beta_0 + \beta_1 X + \varepsilon \quad \text{ with } \quad E(\varepsilon)=0\] postulate the following:
With observations \((x_i,y_i),i=1,\ldots,n\) for \((X,Y)\),
the model postulates \[y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,\] where \(\varepsilon_i\) is an unobservable realization of \(\varepsilon\);
an estimated model is \[\hat{y}=\hat{\beta}_0 + \hat{\beta}_1 x,\] where \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) is an estimate of \(({\beta}_0,{\beta}_1)\), and \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) is a function of the data \((x_i,y_i)\) for \(i=1,\ldots,n\).
Based on an estimated model \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\), each observation has a fitted value \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) and a residual \(e_i = y_i - \hat{y}_i\).
Note: Almost all information on \(\varepsilon\) is contained in the \(e_i\)’s.
\(X\)=TV and \(Y\)=sales. Check the correlation:
> cor(adData$sales,adData$TV)
[1] 0.7822244
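The correlation check and the least squares fit can both be carried out in R. The sketch below uses simulated data as a stand-in for the Advertising data (`adData` with columns `TV` and `sales`), so the numbers differ from the real-data output above:

```r
set.seed(42)
# Simulated stand-in for the Advertising data; the true line 7 + 0.05*TV
# is chosen to roughly mimic the fit reported for the real data
adData <- data.frame(TV = runif(200, 0, 300))
adData$sales <- 7 + 0.05 * adData$TV + rnorm(200, sd = 3)

cor(adData$sales, adData$TV)         # strong positive linear association
fit <- lm(sales ~ TV, data = adData) # least squares fit of E(Y) = beta0 + beta1*X
coef(fit)                            # (Intercept), TV
```

With the real Advertising data, the same `lm(sales ~ TV, data = adData)` call produces the estimates discussed below.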
Target: Estimate the model \(E(Y) = \beta_0 + \beta_1 X\).
Scatterplot of \(X\)=TV and \(Y\)=sales:

An estimated model \(\hat{y} = 5 + 0.05 x\):

Since an estimate \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) is a function of the observations \((x_i,y_i)\) for \(i=1,\ldots,n\), there are infinitely many lines \[\hat{l}(X)=\hat{\beta}_0 + \hat{\beta}_1X\] with \(\hat{\beta}_0,\hat{\beta}_1 \in \mathbb{R}\) that can be used as an estimate of \(E(Y) = \beta_0 + \beta_1 X\).
Which line should we use?
Estimate based on least squares \(\hat{y} = 7.03+0.047x\):

An estimate and an optimal estimate:

Each fitted (or estimated) model \(\hat{y}=\hat{\beta}_0 + \hat{\beta}_1 x\) produces residuals \(e_i = y_i - \left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)\) for \(i=1,\ldots,n\).
So, we seek the \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) for which the corresponding residual sum of squares (RSS) \[\textsf{RSS}=\sum_{i=1}^n e_i^2=\sum_{i=1}^n \left[y_i - \left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)\right]^2\] is minimized. This is called the least squares method, and \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) is called the least squares estimate of \(\left({\beta}_0,{\beta}_1\right)\).
The LS method gives the least squares estimate (LSE):
\(\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\) with \(\bar{y}=n^{-1}\sum_{i=1}^n y_i\)
\(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\) with \(\bar{x}=n^{-1}\sum_{i=1}^n x_i\)
Namely, the fitted model is \[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,\] also called the least squares line.
Note: “fitted model” is the same as “estimated model”.
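The closed-form LSE above can be checked directly against R's `lm()`, which minimizes the same RSS. A self-contained sketch with simulated data (the variable names are illustrative):

```r
set.seed(7)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100)

# Least squares estimates from the closed-form expressions
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)  # lm() computes the same least squares line
c(b0, b1)
coef(fit)         # agrees with (b0, b1) up to floating-point error
```

That the two agree is no accident: `lm()` solves the normal equations whose unique solution (when the \(x_i\) are not all equal) is exactly the pair \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) displayed above.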

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] knitr_1.21
loaded via a namespace (and not attached):
[1] compiler_3.5.0 magrittr_1.5 tools_3.5.0
[4] htmltools_0.3.6 revealjs_0.9 yaml_2.2.0
[7] Rcpp_1.0.0 stringi_1.2.4 rmarkdown_1.11
[10] stringr_1.3.1 xfun_0.4 digest_0.6.18
[13] evaluate_0.12