Stat 435 Lecture Notes 4

Xiongzhi Chen

Washington State University

Bootstrap: motivation

Overview

  • The bootstrap is mainly used to estimate and quantify the uncertainty associated with a given estimate or statistical learning method

  • For example, it can be used to estimate the standard error of an estimate (such as an estimated coefficient in a regression model)

  • The bootstrap may not work well when the sample size is small or when the sample comes from a relatively small region of the distribution of an unknown data generating process

Illustration I: problem

Problem formulation:

  • suppose we wish to invest a fixed sum of money into two financial assets that yield (random) returns of \(X\) and \(Y\), respectively
  • we will invest a fraction \(\alpha\) of our money in \(X\), and the rest \(1-\alpha\) in \(Y\)
  • we need to choose \(\alpha\) that minimizes the risk, or variance, of our investment.

Namely, we need to find \(\alpha\) that minimizes \[\textrm{Var}(\alpha X + (1-\alpha)Y)\]

Illustration I: solution

  • By calculus, we know that \[ \alpha=\frac{\sigma_{Y}^{2}-\sigma_{XY}}{\sigma_{X}^{2}+\sigma_{Y}^{2}-2\sigma_{XY}} \] minimizes \[\textrm{Var}(\alpha X + (1-\alpha)Y),\] where \(\sigma_{X}^{2}=\textrm{Var}(X)\), \(\sigma_{Y}^{2}=\textrm{Var}(Y)\) and \(\sigma_{XY}=\textrm{Cov}(X,Y)\)

  • However, in reality, the quantities \(\sigma_{X}^{2}\), \(\sigma_{Y}^{2}\) and \(\sigma_{XY}\) are unknown, and need to be estimated
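
The calculus step can be spelled out as follows. This is a sketch that assumes \(\sigma_{X}^{2}+\sigma_{Y}^{2}-2\sigma_{XY}=\textrm{Var}(X-Y)>0\), so that the minimizer is unique; the name \(f\) is introduced here only for illustration.

\[ f(\alpha)=\textrm{Var}(\alpha X + (1-\alpha)Y)=\alpha^{2}\sigma_{X}^{2}+(1-\alpha)^{2}\sigma_{Y}^{2}+2\alpha(1-\alpha)\sigma_{XY} \]

\[ f'(\alpha)=2\alpha\sigma_{X}^{2}-2(1-\alpha)\sigma_{Y}^{2}+2(1-2\alpha)\sigma_{XY}=0 \iff \alpha\left(\sigma_{X}^{2}+\sigma_{Y}^{2}-2\sigma_{XY}\right)=\sigma_{Y}^{2}-\sigma_{XY} \]

\[ f''(\alpha)=2\left(\sigma_{X}^{2}+\sigma_{Y}^{2}-2\sigma_{XY}\right)=2\,\textrm{Var}(X-Y)>0 \]

Since the second derivative is positive, the stationary point is a minimum.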

Illustration I: estimate

  • With estimates \(\hat{\sigma}_{X}^{2}\), \(\hat{\sigma}_{Y}^{2}\) and \(\hat{\sigma}_{XY}\) for \(\sigma_{X}^{2}\), \(\sigma_{Y}^{2}\) and \(\sigma_{XY}\), respectively, we have the plug-in estimate \[\hat{\alpha}=\frac{\hat{\sigma}_{Y}^{2}-\hat{\sigma}_{XY}}{\hat{\sigma}_{X}^{2}+\hat{\sigma}_{Y}^{2}-2\hat{\sigma}_{XY}}\] for the optimal but unknown solution \[ \alpha=\frac{\sigma_{Y}^{2}-\sigma_{XY}}{\sigma_{X}^{2}+\sigma_{Y}^{2}-2\sigma_{XY}} \]
  • How accurate is \(\hat{\alpha}\)? Can we estimate the standard error of \(\hat{\alpha}\)?
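
The plug-in estimate can be computed directly from paired observations. The sketch below is one possible implementation in Python; it assumes the usual sample variances and sample covariance (with divisor \(n-1\)) are used as the estimates, and the function name `alpha_hat` and the toy data are hypothetical.

```python
def alpha_hat(xs, ys):
    """Plug-in estimate of the variance-minimizing weight alpha.

    Assumes xs and ys are equal-length sequences of paired returns and
    uses sample variances and covariance with divisor n - 1.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)


# Toy data, invented for illustration only.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 1.0, 4.0, 3.0]
print(alpha_hat(toy_x, toy_y))
```

For the toy data the two sample variances are equal, so by symmetry of the formula the estimate puts equal weight on both assets.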

Illustration I: estimate

  • If \(\hat{\sigma}_{X}^{2}\), \(\hat{\sigma}_{Y}^{2}\) and \(\hat{\sigma}_{XY}\) are accurate, then \(\hat{\alpha}\) should be accurate as well

  • How can we assess the accuracy of \(\hat{\alpha}\) (via the accuracy of \(\hat{\sigma}_{X}^{2}\), \(\hat{\sigma}_{Y}^{2}\) and \(\hat{\sigma}_{XY}\)) if we have only a sample of size \(n\) at hand?

  • Mini discussion on the question above: Case 1 “\(n\) small”, Case 2 “\(n\) moderate”, and Case 3 “\(n\) large”

Illustration I: simulated samples

If we know the population distribution, we can simulate samples:

Illustration I: simulated samples

  • Suppose we simulate \(B=1000\) independent samples for \((X,Y)\) (if we knew the truth), we will have \(B\) estimates \(\hat{\alpha}_{j},j=1,\ldots,B\) of \(\alpha\)

  • The sample mean \(\bar{\alpha}=\frac{1}{B}\sum_{j=1}^{B}\hat{\alpha}_{j}\) (of \(\hat{\alpha}_{j}\)’s) should be close to \(\alpha\)

  • The sample standard deviation \[s\left( \hat{\alpha}\right) =\sqrt{\frac{1}{B-1}\sum_{j=1}^{B}\left( \hat{\alpha}_{j}-\bar{\alpha}\right) ^{2}}\] (of \(\hat{\alpha}_{j}\)’s) should be close to \(\sigma_{\hat{\alpha}}=\sqrt{\textrm{Var}(\hat{\alpha})}\)

Illustration I: truth and estimate

  • Truth: \(\sigma_{X}^{2}=1\), \(\sigma_{Y}^{2}=1.25\), \(\sigma_{XY}=0.5\) and \(\alpha=0.6\)

  • Estimates based on \(B=1000\) simulated, independent samples: \(\bar{\alpha}=0.5996\) and \(s\left( \hat{\alpha}\right)=0.083\)

  • Interpretation: for a random sample from the population, we would expect \(\hat{\alpha}\) to differ from \(\alpha\) by approximately \(0.08\) on average

Note: is the “1 standard deviation” rule sensible?
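
The simulation experiment can be imitated with a short script. The sketch below rests on assumptions that the notes do not state: the pairs \((X,Y)\) are taken to be bivariate normal with mean zero, each simulated sample has \(n=100\) observations, and the seed is arbitrary. Only the variances, the covariance and \(B=1000\) come from the slides, so the printed numbers will be near, but not identical to, the values quoted above.

```python
import random
from statistics import mean, stdev


def alpha_hat(xs, ys):
    """Plug-in estimate of alpha from paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)


def simulate_sample(n, rng):
    """Draw n pairs with Var(X)=1, Var(Y)=1.25, Cov(X,Y)=0.5.

    Assumption: bivariate normal, built from independent standard
    normals Z1, Z2 via X = Z1 and Y = 0.5*Z1 + Z2.
    """
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(0.5 * z1 + z2)
    return xs, ys


rng = random.Random(435)   # arbitrary seed, for reproducibility only
B, n = 1000, 100           # n = 100 is an assumption
estimates = [alpha_hat(*simulate_sample(n, rng)) for _ in range(B)]
alpha_bar = mean(estimates)   # should be close to the true alpha = 0.6
s_alpha = stdev(estimates)    # estimates the standard deviation of alpha_hat
print(f"alpha_bar = {alpha_bar:.4f}, s(alpha_hat) = {s_alpha:.3f}")
```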

Bootstrap: definition and applications

Simulation and double-dipping

  • Simulated from the truth: when we know a data generating process, we can simulate samples to estimate a statistic of the process. However, if we know the truth, why do we need to estimate the statistic?
  • Simulated from the estimate: with a sample from a data generating process, we can estimate the process, use the estimated process to generate samples, and use the generated samples to estimate a statistic
  • Resampling from the sample: sample randomly from a sample from a data generating process, regard the sampled observations as a new data set, and use them to estimate a statistic

Bootstrap: definition

In order to assess the distributional properties of an estimate of a statistic, the bootstrap

  • takes a subset of a given data set as if it were a set of new observations independent of the given data set
  • uses the subset to obtain an estimate of the statistic
  • does so repeatedly and independently using different subsets of the given data set
  • takes the empirical distribution of estimates obtained from these subsets as an estimate of the distribution of the estimate of the statistic

Bootstrap: procedure

  • Given a sample of size \(n\), let \(\hat{\alpha}\) be an estimate of a statistic \(\alpha\) obtained from the sample
  • Sample randomly with replacement from the sample to obtain \(n\) observations, and do this independently \(B\) times to obtain \(B\) bootstrap samples \(S_{j},j=1,\ldots,B\)
  • Let \(\hat{\alpha}_{j}\) be the estimate of \(\alpha\) obtained from \(S_{j}\). Then the empirical distribution \(G\) of \(\hat{\alpha}_{j},j=1,\ldots,B\) is used as an approximation to the (true) distribution of \(\hat{\alpha}\), and statistics about \(\hat{\alpha}\) are obtained from \(G\)
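
The three steps above can be sketched as a small generic function. This is a minimal illustration rather than a reference implementation: the names `bootstrap_estimates` and `estimator`, the toy data and the seed are all hypothetical, and resampling with replacement is done with `random.choices` from the Python standard library.

```python
import random


def bootstrap_estimates(sample, estimator, B=1000, seed=None):
    """Return B bootstrap replicates of estimator(sample).

    sample    : list of observations (numbers, tuples, ...)
    estimator : function mapping a list of observations to a number
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Draw n observations with replacement: one bootstrap sample S_j.
        resample = rng.choices(sample, k=n)
        # Recompute the estimate on S_j.
        replicates.append(estimator(resample))
    return replicates


# Toy usage: bootstrap replicates of the sample mean (data invented).
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
reps = bootstrap_estimates(data, lambda s: sum(s) / len(s), B=500, seed=1)
print(len(reps), min(reps), max(reps))
```

The list `reps` plays the role of \(\hat{\alpha}_{j},j=1,\ldots,B\); its empirical distribution is \(G\).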

Bootstrap: graphical illustration

Bootstrap: statistics

  • The (bootstrap) estimated mean of \(\hat{\alpha}\) is \(\bar{\alpha}=\frac{1}{B}\sum_{j=1}^{B}\hat{\alpha}_{j}\)
  • The (bootstrap) estimated variance of \(\hat{\alpha}\) is \[ \textrm{SE}^2\left( \hat{\alpha}\right) = {(B-1)}^{-1}\sum_{j=1}^{B}\left( \hat{\alpha}_{j}-\bar{\alpha}\right)^{2} \]
  • For \(\gamma \in (0,1)\), the (bootstrap) \((1-\gamma)\times 100\) percent confidence interval for \(\alpha\) is \((c_L,c_U)\), where \(c_L\) is the \(\{0.5\gamma\times 100\}\)th percentile of \(G\), and \(c_U\) is the \(\{(1-0.5\gamma)\times 100\}\)th percentile of \(G\)
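
Given the bootstrap estimates \(\hat{\alpha}_{j}\), these three summaries take only a few lines. The sketch below is illustrative: the function name `bootstrap_summary` and its argument `level` (the miscoverage level of the interval, 0.05 for a 95 percent interval) are hypothetical, and percentiles are computed with the nearest-rank rule, which is only one of several common conventions.

```python
import math
from statistics import mean, stdev


def bootstrap_summary(estimates, level=0.05):
    """Bootstrap mean, standard error and percentile interval."""
    ordered = sorted(estimates)
    B = len(ordered)

    def percentile(p):
        # Nearest-rank rule: the value at rank ceil(p * B), clamped to 1..B.
        # round() guards against floating-point noise in p * B.
        rank = math.ceil(round(p * B, 9))
        rank = min(max(rank, 1), B)
        return ordered[rank - 1]

    return {
        "mean": mean(ordered),
        "se": stdev(ordered),  # square root of the (B - 1)-divisor variance
        "ci": (percentile(level / 2), percentile(1 - level / 2)),
    }


# Toy input: the values 1, 2, ..., 1000 stand in for bootstrap estimates.
summary = bootstrap_summary([float(v) for v in range(1, 1001)])
print(summary)
```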

Illustration of bootstrap

Bootstrap applied to a sample; \(\textrm{SE}\left( \hat{\alpha}\right)=0.087\): illusion or excellence?

Bootstrapping linear regression

Linear regression

  • Model: \(Y=\beta_0+\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon\) with \(E(\varepsilon)=0\) and \(\textrm{Var}(\varepsilon)=\sigma^2\)

  • Observations: \((y_i,x_{1i},x_{2i},\ldots,x_{pi}),i=1,\ldots,n\), where \(x_{ji}\) is the \(i\)th observation for \(X_j\)

  • Estimate: \(\hat{y}=\hat{\beta}_0+\hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \ldots + \hat{\beta}_p X_p\)

  • Fit: \(\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \ldots + \hat{\beta}_p x_{pi}\)

  • Residuals: \(e_i = y_i - \hat{y}_i\)

Bootstrapping from sample

Set \({\boldsymbol{\beta}}=({\beta}_0,{\beta}_1,\ldots,{\beta}_p)\) and \(\hat{\boldsymbol{\beta}}=(\hat{\beta}_0,\hat{\beta}_1,\ldots,\hat{\beta}_p)\)

  • Sample from data generating process: \[S=\{\mathbf{z}_i=(y_i,x_{1i},x_{2i},\ldots,x_{pi}),i=1,\ldots,n\}\]
  • Sample with replacement \(n\) observations from \(S\) and repeat this independently \(B\) times to obtain \(B\) bootstrap samples \(S_j,j=1,\ldots,B\)
  • Obtain \(\hat{\boldsymbol{\beta}}_j\) from \(S_j\) for each \(j=1,\ldots,B\)
  • Use the empirical distribution of \(\hat{\boldsymbol{\beta}}_j,j=1,\ldots,B\) as an estimate of the distribution of \(\hat{\boldsymbol{\beta}}\)
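
Resampling whole observations can be sketched as follows. To stay within the Python standard library the example is restricted to simple linear regression (\(p=1\)), which is a simplification of the general model above; the names `ols_fit` and `pairs_bootstrap` and the toy data are hypothetical. Resamples in which every \(x\) value coincides are redrawn, because the slope is undefined there; that is a practical shortcut of this sketch, not part of the procedure in the notes.

```python
import random


def ols_fit(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def pairs_bootstrap(xs, ys, B=1000, seed=None):
    """Bootstrap (b0, b1) by resampling the pairs (x_i, y_i) together."""
    rng = random.Random(seed)
    n = len(xs)
    fits = []
    while len(fits) < B:
        idx = [rng.randrange(n) for _ in range(n)]  # indices, with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope undefined, so redraw
        fits.append(ols_fit(bx, by))
    return fits


# Toy data, invented for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 13.0, 14.9]
fits = pairs_bootstrap(x, y, B=500, seed=2)
print(fits[:3])
```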

Bootstrapping residuals

  • Residuals: \(R=\{e_i=y_i-\hat{y}_i,i=1,\ldots,n\}\)
  • Sample with replacement \(n\) observations from \(R\) and repeat this independently \(B\) times to obtain \(B\) sets of residuals \(R_j=\{e_i^{(j)},i=1,\ldots,n\},j=1,\ldots,B\)
  • For each \(j\), set \(y_i^{(j)}=\hat{y}_i+e_i^{(j)}\) and fit the model with observations \[S_j=\{\mathbf{z}_i^{(j)}=(y_i^{(j)},x_{1i},x_{2i},\ldots,x_{pi}),i=1,\ldots,n\}\] and obtain estimate \(\hat{\boldsymbol{\beta}}_j\)
  • Use the empirical distribution of \(\hat{\boldsymbol{\beta}}_j,j=1,\ldots,B\) as an estimate of the distribution of \(\hat{\boldsymbol{\beta}}\)
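
The residual version keeps the design points fixed and resamples only the residuals. As a hedged sketch, again restricted to simple linear regression (\(p=1\)) so that it runs with the Python standard library alone; `ols_fit`, `residual_bootstrap` and the toy data are hypothetical names and values.

```python
import random
from statistics import mean


def ols_fit(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def residual_bootstrap(xs, ys, B=1000, seed=None):
    """Bootstrap (b0, b1) by resampling residuals around the fitted line."""
    rng = random.Random(seed)
    b0, b1 = ols_fit(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    residuals = [yi - fi for yi, fi in zip(ys, fitted)]
    fits = []
    for _ in range(B):
        # Resample n residuals with replacement: the set R_j.
        e_star = rng.choices(residuals, k=len(xs))
        # New responses: fitted value plus resampled residual.
        y_star = [fi + e for fi, e in zip(fitted, e_star)]
        # Refit on the original x values with the new responses.
        fits.append(ols_fit(xs, y_star))
    return fits


# Toy data, invented for illustration.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
fits = residual_bootstrap(x, y, B=500, seed=4)
print(mean(b1 for _, b1 in fits), ols_fit(x, y)[1])
```

Because the \(x\) values never change, this version leans on the fitted model being adequate, which is one way to read the comparison of the two schemes.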

Bootstrapping samples or residuals

  • Asymptotically (and under some conditions), bootstrapping samples and bootstrapping residuals are equivalent

  • Bootstrapping samples is less sensitive to model misspecification

  • Bootstrapping samples may be less sensitive to the assumptions concerning independence or exchangeability of the error terms

Bootstrap: failures

Bootstrap failures

Bootstrap can fail

  • when sample size is too small
  • for estimating extremal statistics
  • when observations are dependent
  • for survey sampling
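
One way to see the problem with extremal statistics is to bootstrap the sample maximum. When the \(n\) observations are distinct, a bootstrap sample contains the observed maximum with probability \(1-(1-1/n)^{n}\), which is about 0.63 for large \(n\), so the bootstrap distribution of the maximum piles most of its mass on a single point. The sketch below checks this numerically; the function name, the uniform toy data and the seeds are assumptions made for illustration.

```python
import random


def share_of_resamples_hitting_max(sample, B=2000, seed=None):
    """Fraction of bootstrap samples whose maximum equals max(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    top = max(sample)
    hits = sum(max(rng.choices(sample, k=n)) == top for _ in range(B))
    return hits / B


rng = random.Random(0)
sample = [rng.random() for _ in range(50)]   # toy data: 50 uniform draws
share = share_of_resamples_hitting_max(sample, seed=1)
theory = 1 - (1 - 1 / len(sample)) ** len(sample)
print(f"simulated share = {share:.3f}, theoretical value = {theory:.3f}")
```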

Note: the book “Bootstrap methods: a guide for practitioners and researchers” by Michael R. Chernick contains more information on this.

License and session Information

License

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] knitr_1.21

loaded via a namespace (and not attached):
 [1] compiler_3.5.0  magrittr_1.5    tools_3.5.0    
 [4] htmltools_0.3.6 revealjs_0.9    yaml_2.2.0     
 [7] Rcpp_1.0.3      stringi_1.2.4   rmarkdown_1.11 
[10] stringr_1.3.1   xfun_0.4        digest_0.6.18  
[13] evaluate_0.12