
Human cancer microarray data:
The data matrix has dimension \(6830 \times 64\), each of whose columns is an observation for all features.
We will pick 3 caner types “BREAST”, “LEUKEMIA” and “COLON”, and randomly select the same set of \(50\) genes for each cancer type
> library(ElemStatLearn) # library containing data
> data(nci); n0=dim(nci)[2]; p0=dim(nci)[1] #get dimensions
> set.seed(123)
> rSel = sample(1:p0, size=50, replace = FALSE)
> chk = colnames(nci) %in% c("BREAST", "LEUKEMIA", "COLON")
> cSel = which(chk ==TRUE)
> ncia = nci[rSel,cSel]
> colnames(ncia) = colnames(nci)[cSel]
colnames(nci): each column name is the cancer type for an observationchk: check if a cancer type is “BREAST”, “LEUKEMIA” or “COLON”nci, and take samples (i.e., observations for features) from columns of nci> # transposed subset data as `tmp`
> tmp = data.frame(t(ncia)); tmp$Class=colnames(ncia)
> rownames(tmp)=NULL; dim(tmp)
[1] 20 51
> table(tmp$Class)
BREAST COLON LEUKEMIA
7 7 6
> tmp[1:3,(ncol(tmp)-3):ncol(tmp)]
X48 X49 X50 Class
1 -0.415 -0.735 1.935 BREAST
2 -0.040 -0.780 0.060 BREAST
3 0.750 -0.320 1.800 BREAST
tmp is the cancer type of an observation, and the rest in each such row is an observation on the featuresIn terms of class memberships, observations overlap each other much?

Each observation on the \(p=50\) features (is in \(\mathbb{R}^{50}\) and) is a 50-dimensional vector.
LDA with 50 feature variables, 3 classes, and sample size 20:
> library(MASS)
> lda.fit = lda(Class~.,data=tmp)
Warning in lda.default(x, grouping, ...): variables are
collinear
> lda.pred = predict(lda.fit,tmp)
> TrueClassLabel=tmp$Class
> LDAEstimtedClassLabel=lda.pred$class
> table(LDAEstimtedClassLabel,TrueClassLabel)
TrueClassLabel
LDAEstimtedClassLabel BREAST COLON LEUKEMIA
BREAST 6 0 0
COLON 1 7 0
LEUKEMIA 0 0 6
Classwise error rates: “BREAST”, “LEUKEMIA” and “COLON”
[1] 0.1428571 0.0000000 0.0000000
The component covariance matrices may not be equal since gene profiles for different cancer types may have different dependence structures. So, let us apply QDA:
> library(MASS)
> qda.fit = qda(Class~.,data=tmp)
> # Error in qda.default(x, grouping, ...) :
> # some group is too small for 'qda'
Caution: when \(p >n\), \(\hat{\Sigma}\) and all \(\hat{\Sigma}_k\) are all singular, not being valid covariance matrices or forcing some feature variables to be linearly dependent!
When \(p >n\), we can use the regularized estimate proposed by “Friedman, J.H. (1989): Regularized Discriminant Analysis” and implemented by R library klaR.
The regularized estimate of \(\Sigma_k\) is \[ \hat{\Sigma}_{k,\lambda,\gamma}= (1-\gamma) \hat{\Sigma}_{k,\lambda} + \gamma p^{-1} \text{tr}(\hat{\Sigma}_{k,\lambda}) \mathbf{I}, \gamma \in [0,1] \] with \[\hat{\Sigma}_{k,\lambda}=(1-\lambda) \hat{\Sigma}_k + \lambda \hat{\Sigma}, \lambda \in [0,1]\] where \(\hat{\Sigma}\) is a pooled covariance matrix, \(\hat{\Sigma}_k\) is a suitable estimate of \(\Sigma_k\), and \(\text{tr}(A)\) is the trace of the square matrix \(A\), i.e., the sum of the diagonal entries of \(A\)
klaRWe need R library klaR, and command rda{klaR} for which the basic syntax is
rda(formula, data, gamma = 0.05, lambda = 0.2,
crossval = FALSE, ...)
formula: the same as in lda or qdagamma, lambda: same as \(\gamma\) and \(\lambda\) in \(\hat{\Sigma}_{k,\lambda,\gamma}\). If unspecified, either or both are determined by minimizing the estimated error ratecrossval: logical. If TRUE, then the error rate is estimated by Cross-Validation; otherwise, by drawing several training- and test-samples.klaRrda{klaR} returns a list of class rda containing the following components:
regularization: vector containing the two regularization parameters (gamma, lambda).classes and prior: the names of the classes, and the prior probabilities for the classes.error.rate: apparent error rate (if computation was not suppressed), and, if any optimization took place, the final (cross-validated or bootstrapped) error rate estimate as well.klaRrda{klaR} returns a list of class rda containing the following components:
means, covariances and covpooled: group means, array of group covariances, and pooled covariance.converged: (Logical) indicator of convergence (only for Nelder-Mead).iter: number of iterations actually performed (only for Nelder-Mead).klaRRDA on the subset data (\(p=50\) and \(n=20\)):
> library(klaR)
> rda.fit = rda(Class~.,data=tmp, gamma = 0.05, lambda = 0.2)
> rda.pred = predict(rda.fit, tmp) # classify obs in `tmp`
> TrueClassLabel=tmp$Class
> rDAEstimtedClassLabel=rda.pred$class
> table(rDAEstimtedClassLabel,TrueClassLabel)
TrueClassLabel
rDAEstimtedClassLabel BREAST COLON LEUKEMIA
BREAST 7 0 0
COLON 0 7 0
LEUKEMIA 0 0 6
predict acts on an object of class rda similarly as MASS::predict on an object of class lda or qdagamma or lambda via cross-validation due to n=20> # extract estimated covariance matrices
> hatSigma1 = rda.fit$covariances[,,1]
> hatSigma2 = rda.fit$covariances[,,2]
> hatSigma3 = rda.fit$covariances[,,3]
rda(...)$covariances: a \(p \times p \times K\) array. To obtain the estimated covariance matrix, \(\hat{\Sigma}_{k,\lambda,\gamma}\), of observations in Class \(k\), use rda(...)$covariances[,,k]. Note that class (and label) ordering is determined by rda using rules of R.> # convert estimated covariance matrices for plotting
> library(reshape2)
> melted_hatSigma1 = melt(hatSigma1)
> melted_hatSigma2 = melt(hatSigma2)
> melted_hatSigma3 = melt(hatSigma3)
> EstSigmaAll=rbind(melted_hatSigma1,melted_hatSigma2,
+ melted_hatSigma3)
> EstSigmaAll$Cancer=rep(c("BREAST","COLON","LEUKEMIA"),
+ each=nrow(melted_hatSigma1))
> EstSigmaAll$Cancer=factor(EstSigmaAll$Cancer)
melt(data,value.name ="value",...): convert data into a molten data.frame, and by default store entries of data as "value" in the molten data.frame, as if values in data are values of a function that are obtained on a 2D grid on two variables> library(ggplot2)
> ggplot(data=EstSigmaAll, aes(x=Var1, y=Var2, fill=value)) +
+ geom_tile()+scale_fill_gradient2(low="blue", high="red")+
+ facet_grid(~Cancer)+xlab("")+ylab("")+
+ ggtitle("Estimated covariance matrices")+
+ theme(plot.title = element_text(hjust = 0.5))
ggplot(...,aes(x=, y=, fill=value)) should be used with xlab("")+ylab("") since no x-axis or y-axis label is neededMuch less dependence among genes for “COLON” cancer than for “BREAST” or “LEUKEMIA” cancer

Data set Smarket{ISLR} contains observations for S&P 500 stock index over 1,250 days, from the beginning of 2001 until the end of 2005, such as
Lag1 through Lag5Direction (whether market was Up or Down on this date)> library(ISLR); dim(Smarket)
[1] 1250 9
> Smarket[1,]
Year Lag1 Lag2 Lag3 Lag4 Lag5 Volume Today Direction
1 2001 0.381 -0.192 -2.624 -1.055 5.01 1.1913 0.959 Up
> levels(Smarket$Direction)
[1] "Down" "Up"
Target: predict market Direction using Lag1 and Lag2
Model: 2 features \(X_1=\)Lag1 and \(X_2=\)Lag2; 2 component bivariate Gaussian densities \(f_1 \sim \text{Gaussian}(\mu_1,\Sigma_1)\) and \(f_2 \sim \text{Gaussian}(\mu_2,\Sigma_2)\); 2 mixing proportions (as priors) \(\pi_1\) and \(\pi_2\); 2 classes Down and Up
> train=(Smarket$Year<2005); Smarket.2005=Smarket[!train,]
> Direction.2005=Smarket$Direction[!train]
> library(MASS)
> lda.fit=lda(Direction~Lag1+Lag2,data=Smarket,subset=train)
subset=train in ldaEstimated prior (i.e., mixing) probabilities and mean vectors:
Prior probabilities of groups:
Down Up
0.491984 0.508016
Group means:
Lag1 Lag2
Down 0.04279022 0.03389409
Up -0.03954635 -0.03132544
Direction=Down; Class 2 is Direction=UpRecall the estimated linear discriminant function as \[\hat{\delta}_{k}(x)=x^T \underbrace{\hat{\Sigma}^{-1}\hat{\mu}_k}_{\hat{c}_k} -\frac{1}{2} \hat{\mu}_k^T \underbrace{\Sigma^{-1}\hat{\mu}_k}_{\hat{c}_k} + \log \hat{\pi}_k, k \in \mathcal{G}\]
Coefficients of linear discriminants:
LD1
Lag1 -0.6420190
Lag2 -0.5135293
Direction=Up and Up \(\mapsto 1\) (and Class 1 as Direction=Down and Down\(\mapsto 0\))Define estimated linear discriminant as \(\hat{d}_{k}(x)=x^T \hat{c}_k,k \in \mathcal{G}\)

Classification on test set Smarket.2005:
> dim(Smarket.2005)
[1] 252 9
> lda.pred=predict(lda.fit, Smarket.2005)
> names(lda.pred)
[1] "class" "posterior" "x"
> lda.class=lda.pred$class
> table(lda.class,Direction.2005)
Direction.2005
lda.class Down Up
Down 35 35
Up 76 106
> mean(lda.class==Direction.2005)
[1] 0.5595238
lda.pred$x: vector of estimated linear discriminant for each observationDown and Up respectively[1] 0.6846847 0.2482270
> qda.fit=qda(Direction~Lag1+Lag2,data=Smarket,subset=train)
> qda.fit
Call:
qda(Direction ~ Lag1 + Lag2, data = Smarket, subset = train)
Prior probabilities of groups:
Down Up
0.491984 0.508016
Group means:
Lag1 Lag2
Down 0.04279022 0.03389409
Up -0.03954635 -0.03132544
qda does not produce “coefficients of linear discriminants” since it uses a quadratic decision boundaryClassification on test set Smarket.2005:
> qda.class=predict(qda.fit,Smarket.2005)$class
> table(qda.class,Direction.2005)
Direction.2005
qda.class Down Up
Down 30 20
Up 81 121
> mean(qda.class==Direction.2005)
[1] 0.5992063
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] knitr_1.21
loaded via a namespace (and not attached):
[1] compiler_3.5.0 magrittr_1.5 tools_3.5.0
[4] htmltools_0.3.6 revealjs_0.9 yaml_2.2.0
[7] Rcpp_1.0.0 stringi_1.2.4 rmarkdown_1.11
[10] highr_0.7 stringr_1.3.1 xfun_0.4
[13] digest_0.6.18 evaluate_0.12