
Since the observations \((x_i,y_i),i=1,\ldots,n\) for \((X,Y)\) are random, so is the LSE \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\).
\(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) is unbiased, i.e., \(E(\hat{\beta}_0)=\beta_0\) and \(E(\hat{\beta}_1)=\beta_1\), when \(E(\varepsilon_i)=0\) for each \(i\)
How accurate is \(\left(\hat{\beta}_0,\hat{\beta}_1\right)\) with respect to \(({\beta}_0,{\beta}_1)\)?
Recall \[\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\] and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\).
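As a quick sanity check, these closed-form formulas can be evaluated directly and compared with `lm()` (a minimal sketch on simulated data; the seed and variable names are illustrative):

```r
# Compare the closed-form LSE with lm() on simulated data
set.seed(42)
x <- runif(30, 0, 10)
y <- 3 + 1.5 * x + rnorm(30)

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE
```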
When \(\varepsilon_i\)’s are uncorrelated with common variance \(\sigma^2\),
\[Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad Var(\hat{\beta}_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right]\]
How do the sample size \(n\) and the sample variance of the \(x_i\)’s affect these variances? Larger \(n\) and more spread-out \(x_i\)’s both increase \(\sum_{i=1}^n (x_i - \bar{x})^2\), and hence decrease both variances.
The variances of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) carry information on the accuracy of \(\hat{\beta}_0\) and \(\hat{\beta}_1\)
Without information on \(\sigma\), the variances of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) cannot be accurately assessed
An estimate of \(\sigma\) is the residual standard error (RSE):
\[RSE = \sqrt{RSS/(n-2)} = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2}}\]
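Computed by hand from the residuals, the RSE matches the `sigma` value reported by `summary()` (a sketch on simulated data):

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

rse <- sqrt(sum(residuals(fit)^2) / (n - 2))  # sqrt(RSS / (n - 2))
all.equal(rse, summary(fit)$sigma)            # TRUE
```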
The approximate \(95\%\) confidence interval (CI) for \(\beta_1\) is \[\hat{\beta}_1 \pm 2 \cdot SE(\hat{\beta}_1)\]
Note: the above follows a general principle for constructing a CI when the distribution of “estimate minus parameter” is symmetric around \(0\)
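The \(\pm 2 \cdot SE\) rule can be checked against the exact \(t\)-based interval from `confint()` (a sketch on simulated data; for \(n=50\) the exact multiplier is \(qt(0.975, 48) \approx 2.01\)):

```r
set.seed(1)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

se_b1 <- summary(fit)$coefficients["x", "Std. Error"]
approx_ci <- coef(fit)["x"] + c(-2, 2) * se_b1  # estimate +/- 2 * SE
confint(fit, "x", level = 0.95)                 # exact interval, nearly identical
```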
Caution: The random error \(\varepsilon\) may not be independent of \(X\) and \(Y\) due to latent dependence, which is common in genetics studies.
Note: For testing \(H_0: \beta_1=0\), the test statistic is \(t = \hat{\beta}_1 / SE(\hat{\beta}_1)\). When \(H_0\) holds, \(t\) approximately has a \(t\)-distribution with \(n-2\) degrees of freedom, provided the \(\varepsilon_i\)’s are not strongly dependent on each other; when \(n\) is large, the \(t\)-distribution is close to a Gaussian distribution.
Model: \(E\)(sales) = \(\beta_0\) + \(\beta_1\) TV:
```
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   7.03     0.458        15.4 1.41e-35
2 TV            0.0475   0.00269      17.7 1.47e-42
```
Model \(E\)(mpg) = \(\beta_0\) + \(\beta_1\) horsepower:
```
# A tibble: 2 x 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   39.9     0.717        55.7 1.22e-187
2 horsepower    -0.158   0.00645     -24.5 7.03e-81
```
If the linear model \[Y=\beta_0 + \beta_1 X + \varepsilon\] is plausible, then the residuals \(e_i = y_i - \hat{y}_i\) should behave like the random errors \(\varepsilon_i\).
So, the residuals \(e_i\)’s should show no systematic pattern when plotted against the fitted values \(\hat{y}_i\)’s or the \(x_i\)’s if the model is plausible.
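A residual-versus-fitted plot makes this check concrete (a minimal sketch on simulated data; `fit` stands in for any fitted simple linear model):

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # residuals should scatter randomly around 0
```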
Model: \(E\)(sales) = \(\beta_0\) + \(\beta_1\) TV: [residual plot]

Model \(E\)(mpg) = \(\beta_0\) + \(\beta_1\) horsepower: [residual plot]
Relatively accurate inference requires the \(e_i\)’s to be (approximately) uncorrelated and to have (approximately) constant variance.
Otherwise, the formulae for \(SE(\hat{\beta}_0)\) and \(SE(\hat{\beta}_1)\) are (usually) invalid, leading to invalid inference.
Model: \(E\)(sales) = \(\beta_0\) + \(\beta_1\) TV (\(p=1\), \(n=200\), \(\tilde{h}=(p+1)/n = 0.01\)): [leverage plot]

Model \(E\)(mpg) = \(\beta_0\) + \(\beta_1\) horsepower (\(p=1\), \(n=397\), \(\tilde{h}=(p+1)/n \approx 0.005\)): [leverage plot]
A Q-Q (quantile-quantile) plot displays the observed sample quantiles against the quantiles of a theoretical distribution. It therefore indicates whether the observations under investigation follow a distribution that matches this theoretical distribution.
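In R, a Normal Q-Q plot of the residuals can be drawn with `qqnorm()` (a sketch on simulated data):

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

qqnorm(rstandard(fit))  # standardized residuals vs Normal quantiles
qqline(rstandard(fit))  # points near this line suggest near-Normal errors
```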
Model: \(E\)(sales) = \(\beta_0\) + \(\beta_1\) TV: [Q-Q plot of residuals]

Model \(E\)(mpg) = \(\beta_0\) + \(\beta_1\) horsepower: [Q-Q plot of residuals]
Model: \(E\)(sales) = \(\beta_0\) + \(\beta_1\) TV. Fit1 is the object obtained from fitting the model:

```
	One-sample Kolmogorov-Smirnov test

data:  Fit1$residuals
D = 0.041533, p-value = 0.8806
alternative hypothesis: two-sided
```
Model \(E\)(mpg) = \(\beta_0\) + \(\beta_1\) horsepower. Fit2 is the object obtained from fitting the model:

```
	One-sample Kolmogorov-Smirnov test

data:  Fit2$residuals
D = 0.060525, p-value = 0.1131
alternative hypothesis: two-sided
```
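Output like the above can be produced with `ks.test()`. Since the one-sample test compares against a fully specified distribution, one common (approximate) approach is to standardize the residuals by the RSE first; note that using estimated parameters makes the test conservative. A sketch on simulated data:

```r
set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)
Fit <- lm(y ~ x)

std_res <- residuals(Fit) / summary(Fit)$sigma  # standardize by the RSE
ks.test(std_res, "pnorm")                       # compare with N(0, 1)
```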
It is extremely important that the error terms are uncorrelated. Correlated error terms are often present in time series data and in data with latent variables. Such correlation affects the standard errors \(SE(\hat{\beta}_0)\) and \(SE(\hat{\beta}_1)\), and hence the validity of confidence intervals and hypothesis tests.
The true model is \[Y = 1 + 2X + \varepsilon\] with \(n=1000\) observations \[y_{i} = 1+ 2 x_{i} + \varepsilon_{i}\]
The random errors are equally correlated: \[\varepsilon_{i} = \sqrt{1-0.3}\,X_{i} + \sqrt{0.3}\,X_{0},\] where \(X_{0}, X_{1}, \ldots, X_{1000}\) are i.i.d. standard Normal, so that \(Var(\varepsilon_i)=1\) and \(Cor(\varepsilon_i,\varepsilon_j)=0.3\) for \(i \neq j\).
Fit a simple linear model and obtain fitted values and residuals
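The simulation and fit can be sketched as follows (the seed and the predictor's distribution are illustrative assumptions; the latent variables \(X_0, X_1, \ldots\) from the setup are named `Z0` and `Z` here to avoid clashing with the predictor):

```r
set.seed(123)
n  <- 1000
x  <- rnorm(n)                    # predictor values (distribution assumed, not given)
Z0 <- rnorm(1)                    # shared latent variable X_0
Z  <- rnorm(n)                    # independent parts X_1, ..., X_n
eps <- sqrt(1 - 0.3) * Z + sqrt(0.3) * Z0  # equally correlated errors, Cor = 0.3

y   <- 1 + 2 * x + eps            # true model: Y = 1 + 2X + error
fit <- lm(y ~ x)
fitted_vals <- fitted(fit)        # fitted values
res <- residuals(fit)             # residuals
```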

For this example, when checking visually whether the random errors are independent or uncorrelated, we see the following:
For two quantitative random variables \(Y\) and \(X\), a simple linear model is \[E(Y) = \beta_0 + \beta_1 X,\] where \(\beta_0\) (intercept) and \(\beta_1\) (slope) are unknown, true model parameters (or coefficients), and \(\beta_1\) is called the regression coefficient.
The above model is equivalent to \[Y = \beta_0 + \beta_1 X + \varepsilon \quad \text{ with } \quad E(\varepsilon)=0,Var(\varepsilon)=\sigma^2\] which is called the population regression line.
With observations \((x_i,y_i),i=1,\ldots,n\) for \((X,Y)\), the LS method gives the least squares estimate (LSE):
\(\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\) with \(\bar{y}=n^{-1}\sum_{i=1}^n y_i\)
\(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\) with \(\bar{x}=n^{-1}\sum_{i=1}^n x_i\)
Namely, the fitted model is \[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,\] also called the least squares line.
```
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] ggplot2_3.1.0 broom_0.5.1   knitr_1.21

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3       plyr_1.8.4       pillar_1.3.1
 [4] compiler_3.5.0   tools_3.5.0      digest_0.6.18
 [7] evaluate_0.12    tibble_2.1.3     nlme_3.1-137
[10] gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.2
[13] rlang_0.4.4      cli_1.0.1        rstudioapi_0.8
[16] yaml_2.2.0       xfun_0.4         withr_2.1.2
[19] dplyr_0.8.4      stringr_1.3.1    generics_0.0.2
[22] revealjs_0.9     grid_3.5.0       tidyselect_0.2.5
[25] glue_1.3.0       R6_2.3.0         fansi_0.4.0
[28] rmarkdown_1.11   purrr_0.2.5      tidyr_0.8.2
[31] magrittr_1.5     backports_1.1.3  scales_1.0.0
[34] htmltools_0.3.6  assertthat_0.2.0 colorspace_1.3-2
[37] labeling_0.3     utf8_1.1.4       stringi_1.2.4
[40] lazyeval_0.2.1   munsell_0.5.0    crayon_1.3.4
```