Stat 435 Lecture Notes 5b

Xiongzhi Chen

Washington State University

Different Error Measures

Hypothesis testing

  • Null hypothesis \(H_0\): the default or status-quo claim
  • Alternative hypothesis \(H_1\): the claim under intervention; the complement of \(H_0\)

  • Test statistic \(T\), rejection region \(\mathcal{T}\) and rejection rule \(\mathcal{R}\): reject \(H_0\) if \(T \in \mathcal{T}\)

  • Type I error \(\alpha\): probability of rejecting \(H_0\) when it is true
  • Type II error \(\gamma\): probability of not rejecting \(H_0\) when \(H_0\) is false
  • Power: \(1- \gamma\)

Example 1: One Hypothesis

  • Linear model: \(E(Y) = \beta_0 + \beta_1 X\)

  • \(H_0: \beta_1 =0\) versus \(H_1:\beta_1 \ne 0\)

  • Pick type I error level \(\alpha \in (0,1)\)

  • Test statistic \(T\); reject \(H_0\) if \(\vert T \vert > c_{\alpha}\)

  • How to interpret \(\alpha\)?
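
One way to read \(\alpha\): if \(H_0\) is true and the experiment is repeated many times, about a fraction \(\alpha\) of the repetitions falsely reject. A minimal simulation sketch in R (the sample size, coefficients, and number of repetitions are illustrative assumptions, not from the notes):

set.seed(1)
alpha <- 0.05; n <- 50; reps <- 2000
rejections <- replicate(reps, {
  x <- rnorm(n)
  y <- 1 + 0 * x + rnorm(n)   # data generated under H0: beta_1 = 0
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"] < alpha
})
mean(rejections)   # long-run false rejection rate, close to alpha = 0.05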

Example 2: Many Hypotheses

  • Linear model: \(E(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_{m} X_{m}\)

  • \(H_{i0}: \beta_i =0\) versus \(H_{i1}:\beta_i \ne 0\), \(i=1,\ldots,m\)

  • Test all \(H_{i0},i=1,\ldots,m\) simultaneously

  • If each \(H_{i0}, i=1,\ldots,m\) is tested individually at type I error level \(\alpha\), what happens to the number of rejected true null hypotheses as \(m\) grows? (See the calculation below.)
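
A quick calculation shows what happens: assuming all \(m\) nulls are true and the tests are independent, \[ \Pr(V \ge 1) = 1 - (1-\alpha)^m, \] which for \(\alpha = 0.05\) is about \(0.40\) at \(m=10\), \(0.92\) at \(m=50\), and \(0.994\) at \(m=100\); the expected number of false rejections is \(m\alpha\).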

Family-wise error rate (FWER)

  • \(H_{i0}: \beta_i =0\) versus \(H_{i1}:\beta_i \ne 0\), \(i=1,\ldots,m\)

  • Rejection of a true null hypothesis is called a “false rejection”

  • \(V\): number of false rejections, i.e., rejected true \(H_{i0}\)’s

  • Family-wise error rate (FWER): \[\Pr(V \ge 1) = 1 - \Pr(V=0)\]

  • Control FWER: \(\Pr(V \ge 1) \le \alpha\)
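
The simplest rule achieving this bound (not named in these notes) is the Bonferroni correction: test each \(H_{i0}\) at level \(\alpha/m\), so that by the union bound \(\Pr(V \ge 1) \le m \cdot \alpha/m = \alpha\). In R this corresponds to rejecting where p.adjust(p, method = "bonferroni") is at most \(\alpha\).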

Family-wise error rate (FWER)

  • Widely used, e.g., by FDA

  • Good when there are only a few hypotheses to test simultaneously

  • Too stringent when there are many hypotheses to test simultaneously; procedures controlling it may then suffer a severe loss of power

  • What about controlling “k-FWER”, i.e., \[\Pr(V \ge k) \le \alpha ?\]

False Discovery Rate

False discovery rate

  • Allows false rejections, and hence is much less stringent than FWER

  • The modern standard for testing many hypotheses simultaneously

  • Scales to very large numbers of simultaneous hypotheses
    • GWAS studies with a few million hypotheses tested simultaneously
    • Gene expression studies with a few thousand hypotheses tested simultaneously
  • A standard criterion in model/variable selection

Classification table

  • \(H_{i0}: \beta_i =0\) versus \(H_{i1}:\beta_i \ne 0\), \(i=1,\ldots,m\)

  • Outcomes of testing the \(m\) hypotheses, of which \(m_0\) nulls are true: \[ \begin{array}{l|cc|c} & \text{Not rejected} & \text{Rejected} & \text{Total} \\ \hline \text{True } H_{i0} & U & V & m_0 \\ \text{False } H_{i0} & T & S & m - m_0 \\ \hline \text{Total} & m - R & R & m \end{array} \]

False discovery rate (FDR)

  • \(\mathcal{R}\): decision rule

  • \(V\): number of false rejections

  • \(R\): number of rejections

  • False discovery proportion: \(\mathrm{FDP}(\mathcal{R}) = V/\max\{R,1\}\)

  • False discovery rate: \[\mathrm{FDR}(\mathcal{R}) = E\left[\frac{V}{\max\{R,1\}}\right]\]

FDR: Example

  • \(m=5\) hypotheses \(H_{i0}: \beta_i =0\), \(i=1,2,3,4,5\)

  • \(H_{i0},i=1,2,3\) are true nulls

  • Decision rule \(\mathcal{R}\) rejects \(H_{10}, H_{20}, H_{40}, H_{50}\)

  • What is the false discovery proportion?
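
Working it out: the rejected set is \(\{1,2,4,5\}\), of which hypotheses \(1\) and \(2\) are true nulls, so \[ V = 2, \quad R = 4, \quad \mathrm{FDP}(\mathcal{R}) = \frac{V}{\max\{R,1\}} = \frac{2}{4} = 0.5 \]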

Control false discovery rate

  • Computing the exact false discovery rate of a decision rule can be quite difficult

  • Control FDR at a nominal level:
    • Pick a nominal level \(\alpha \in (0,1)\)

    • Find a decision rule \(\mathcal{R}\), such that \[ \mathrm{FDR}(\mathcal{R}) \le \alpha \]

  • Finding such a decision rule can be hard in general

Benjamini-Hochberg procedure

Benjamini-Hochberg procedure

  • Given: \(m\) null hypotheses \(H_{i0}\)
  • Given: \(m\) p-values; \(p_i\) for testing \(H_{i0}\)
  • Benjamini-Hochberg (BH) procedure
    • Order p-values into \(p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}\)
    • Set \(r = \max \left\{ k \in \{1,\ldots,m \}: p_{(k)} \le k \alpha m^{-1} \right\}\)
    • Rejection rule:
      • If \(r\) is defined, reject the \(r\) \(H_{i0}\)’s corresponding to \(p_{(1)},\ldots,p_{(r)}\)
      • If \(r\) is not defined, do not reject any \(H_{i0}\)
  • Under some conditions, FDR of BH procedure is upper bounded by \(\alpha\)
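
A minimal sketch of the BH procedure in base R; the function and variable names are my own, and base R’s p.adjust implements the same rule through adjusted p-values:

# Sketch of the BH procedure; names are illustrative, not from the notes
bh_reject <- function(p, alpha) {
  m <- length(p)
  ord <- order(p)                           # indices sorting p ascending
  ok <- which(p[ord] <= (1:m) * alpha / m)  # k with p_(k) <= k * alpha / m
  if (length(ok) == 0) return(integer(0))   # r undefined: reject nothing
  r <- max(ok)
  ord[1:r]                                  # hypotheses attached to p_(1), ..., p_(r)
}

# Equivalent check with base R:
# which(p.adjust(p, method = "BH") <= alpha)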

BH procedure: example

  • \(5\) hypotheses \(H_{10}, H_{20},H_{30},H_{40},H_{50}\)

  • \(p_1 = 0.03\), \(p_2 = 0.1\), \(p_3 = 0.02\), \(p_4 = 0.05\), \(p_5 = 0.02\)

  • Implement BH procedure at nominal FDR level \(\alpha =0.05\)
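
Worked through by hand: the sorted p-values are \(0.02, 0.02, 0.03, 0.05, 0.1\) and the thresholds \(i\alpha/m\) are \(0.01, 0.02, 0.03, 0.04, 0.05\), so the largest \(k\) with \(p_{(k)} \le k\alpha/m\) is \(r=3\), and BH rejects the hypotheses with the three smallest p-values, namely \(H_{10}, H_{30}, H_{50}\). The same answer in R:

p <- c(0.03, 0.1, 0.02, 0.05, 0.02)
which(p.adjust(p, method = "BH") <= 0.05)   # 1 3 5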

Post-selection inference

Linear model and LASSO

  • Model: \[Y=\beta_0+\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon\]

  • The LASSO estimate \(\hat{\boldsymbol{\beta}}^L_{\lambda}=(\hat{\beta}_1,\ldots,\hat{\beta}_p)\) is the \({\boldsymbol{\beta}}=({\beta}_1,\ldots,{\beta}_p)\) that minimizes \[ L_1(\beta_0,\boldsymbol{\beta},\lambda)= \frac{1}{2}\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p\vert \beta_j \vert, \] where \(\hat{y}_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}\) is the fitted value at the candidate coefficients

  • The optimal value \(\lambda^{\ast}\) of the tuning parameter \(\lambda\) is often determined by \(k\)-fold cross-validation
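
A minimal sketch of this fit with the glmnet package; the simulated data and variable names are illustrative assumptions:

library(glmnet)
set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- 1 + 2 * x[, 1] - x[, 2] + rnorm(n)   # only beta_1 and beta_2 are nonzero

cv <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 selects the LASSO penalty
cv$lambda.min                      # lambda* chosen by k-fold CV (default k = 10)
coef(cv, s = "lambda.min")         # sparse coefficient estimates at lambda*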

Linear ridge regression

  • Model: \[Y=\beta_0+\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon\]

  • The ridge estimate \(\hat{\boldsymbol{\beta}}^R_{\lambda}=(\hat{\beta}_1,\ldots,\hat{\beta}_p)\) is the \({\boldsymbol{\beta}}=({\beta}_1,\ldots,{\beta}_p)\) that minimizes \[ L_2(\beta_0,\boldsymbol{\beta},\lambda)= \frac{1}{2}\sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2, \] where \(\hat{y}_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}\) is the fitted value at the candidate coefficients

  • The optimal value \(\lambda^{\ast}\) of the tuning parameter \(\lambda\) is often determined by \(k\)-fold cross-validation
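
The same glmnet workflow fits ridge regression; reusing x and y from the LASSO sketch above, only the penalty mixing parameter changes:

cv_ridge <- cv.glmnet(x, y, alpha = 0)   # alpha = 0 selects the ridge penalty
coef(cv_ridge, s = "lambda.min")         # coefficients shrunk toward, but not to, zero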

Penalized logistic regression

  • LASSO logistic regression, suited to settings where some \(\beta_{j}\)’s are zero: \[ \hat{\beta}=\left( \hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta} _{m}\right) \in\operatorname*{argmin}_{\beta}\left[ -\log L\left( \beta\right) +\lambda\sum_{j=1}^{m}\left\vert \beta_{j}\right\vert \right] \]
  • Ridge logistic regression: \[ \hat{\beta}=\left( \hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta} _{m}\right) \in\operatorname*{argmin}_{\beta}\left[ - \log L\left( \beta\right) +\lambda\sum_{j=1}^{m} \beta_{j}^{2}\right] \]
  • The optimal value \(\lambda^{\ast}\) of the tuning parameter \(\lambda\) is often determined by \(k\)-fold cross-validation
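
Both penalized logistic fits are available through the same glmnet interface with family = "binomial"; the simulated data below are illustrative assumptions:

library(glmnet)
set.seed(1)
n <- 200; m <- 10
x <- matrix(rnorm(n * m), n, m)
eta <- 1 + 2 * x[, 1] - x[, 2]              # only two nonzero coefficients
ybin <- rbinom(n, 1, 1 / (1 + exp(-eta)))   # binary response

cv_l1 <- cv.glmnet(x, ybin, family = "binomial", alpha = 1)   # LASSO penalty
cv_l2 <- cv.glmnet(x, ybin, family = "binomial", alpha = 0)   # ridge penalty
coef(cv_l1, s = "lambda.min")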

Note: \(\operatorname*{argmin}_{\beta}\) denotes the \(\beta^{\ast}\) that minimizes the corresponding objective function

Post-selection inference

  • Post-selection inference often aims at controlling false discovery rate (FDR)

  • The Benjamini-Hochberg procedure is commonly used to control FDR

  • Bias-correction methods or the knockoff method can also be used (see the sketch below)
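
A hedged sketch of the knockoff route, assuming the CRAN package knockoff is installed; knockoff.filter is that package’s main entry point, and x and y are reused from the earlier glmnet sketch:

library(knockoff)
# Select variables while controlling FDR at level 0.1 (the package default)
result <- knockoff.filter(x, y, fdr = 0.1)
result$selected   # indices of the selected variables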

Note: Please see practice files

License and Session Information

License

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] knitr_1.21

loaded via a namespace (and not attached):
 [1] compiler_3.5.0  magrittr_1.5    tools_3.5.0    
 [4] htmltools_0.3.6 revealjs_0.9    yaml_2.2.0     
 [7] Rcpp_1.0.12     stringi_1.2.4   rmarkdown_1.11 
[10] stringr_1.3.1   xfun_0.4        digest_0.6.18  
[13] evaluate_0.13