Stat 437 Lecture Notes 5b

Xiongzhi Chen

Washington State University

2D density estimation and visualization, and plotting bivariate functions: software implementation

R software and commands

R commands and libraries needed:

R library mvtnorm and command rmvnorm{mvtnorm} to generate multivariate Gaussian (i.e., normal) observations
R library MASS and command kde2d{MASS} for 2D density estimation
R built-in command persp to visualize 2D density function
R library ggplot2 and command stat_density_2d{ggplot2} for contours of 2D density
R library plot3D and command persp3D{plot3D} for 3D plots

Command `rmvnorm`

rmvnorm{mvtnorm} is a random number generator for the multivariate normal distribution with mean vector mean and covariance matrix sigma. Its basic syntax is

rmvnorm(n, mean = rep(0, nrow(sigma)),
        sigma = diag(length(mean)))

n: number of observations.
mean: mean vector; default is rep(0, nrow(sigma)).
sigma: covariance matrix; default is diag(length(mean)).

Command `kde2d{MASS}`

kde2d{MASS} implements two-dimensional kernel density estimation with an axis-aligned bivariate normal kernel, evaluated on a square grid. Its basic syntax is

kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))

x and y: x- and y-coordinate of data, respectively.
h: vector of bandwidths for x and y directions. It defaults to normal reference bandwidth. A scalar value will be taken to apply to both directions.
n: number of grid points in each direction. It can be scalar or a length-2 integer vector c(n[1],n[2]).

Command `kde2d{MASS}`

Basic syntax:

kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))

lims: limits of the rectangle covered by the grid as c(xl, xu, yl, yu)

kde2d{MASS} returns a list of three components:

x and y: x and y coordinates of the grid points, as vectors of length n.
z: an n[1] by n[2] matrix of the estimated density; z’s rows correspond to the value of x, and z’s columns to the value of y.

Command `persp`

persp{graphics} draws perspective plots of a surface over the x-y plane. persp is a generic function. Its basic syntax is

persp(z, xlab = NULL, ylab = NULL, zlab = NULL,
      main = NULL, sub = NULL,
      theta = 0, phi = 15, r = sqrt(3), d = 1,
      col = "white",ticktype = "simple")

z: a matrix containing the values to be plotted.
xlab, ylab, zlab: titles for the axes. These must be character strings; expressions are not accepted.
main, sub: main, sub title, as for axes title.

Command `persp`

Basic syntax:

persp(z, xlab = NULL, ylab = NULL, zlab = NULL,
      main = NULL, sub = NULL,
      theta = 0, phi = 15, r = sqrt(3), d = 1,
      col = "white",ticktype = "simple")

theta, phi: angles defining the viewing direction. theta gives the azimuthal direction, and phi the colatitude.
r: distance of the eyepoint from the centre of the plotting box.
d: a value which can be used to vary the strength of the perspective transformation. Values of d greater than 1 will lessen the perspective effect, and values less 1 will exaggerate it.

Command `persp`

Basic syntax:

persp(z, xlab = NULL, ylab = NULL, zlab = NULL,
      main = NULL, sub = NULL,
      theta = 0, phi = 15, r = sqrt(3), d = 1,
      col = "white",ticktype = "simple")

col: the color(s) of the surface facets. Transparent colours are ignored. This is recycled to the (nx-1)(ny-1) facets.
ticktype: a character; its value “simple” draws just an arrow parallel to the axis to indicate direction of increase, whereas its value “detailed” draws normal ticks as per 2D plots.

`stat_density_2d{ggplot2}`

stat_density_2d{ggplot2} draws probability contours for a 2D density. Its basic syntax is

library(ggplot2)
ggplot(data, aes(x,y))+stat_density_2d(aes(fill = ..level..),
         geom="polygon", colour="white")

data, x, y: data is a “data.frame” or matrix that contains x, y coordinate values
level: heights of contours, i.e., values of a density
stat_density_2d(aes(fill = ..level..): fill in plot with color gradient based on level
geom: curve type to be used to draw contours
colour: color of contours

Command `persp3D{plot3D}`

persp3D{plot3D} extends R’s persp function. Its basic syntax is

persp3D (x = seq(0, 1, length.out = nrow(z)), 
       y = seq(0, 1, length.out = ncol(z)), z, 
       contour=FALSE, phi = 40, theta = 40)

z: matrix (2-D) containing the values to be plotted as a persp plot.
x, y: vectors or matrices with x and y values. If they are vectors, then x should be of length equal to nrow(z) and y should be of length equal to ncol(z). If they are matrices, x and y should have the same dimension as z.

Command `persp3D{plot3D}`

Basic syntax:

persp3D (x = seq(0, 1, length.out = nrow(z)), 
       y = seq(0, 1, length.out = ncol(z)), z, 
       contour=FALSE, phi = 40, theta = 40)

theta, phi: angles defining the viewing direction. theta gives the azimuthal direction, and phi the colatitude, as in persp.
contour: logical; if it takes value TRUE, a contour will be plotted at the bottom.

Note: plot3D also has command hist3D

2D density estimation and visualization, and plotting bivariate functions: illustrations

Gaussian mixture: K=3,p=2

Gaussian mixture for $K=3$ classes and $p=2$ features with

Class 1, 2 and 3 mixing proportions: $\pi_1=0.4$, $\pi_2=0.3$, $\pi_3=0.3$
Class 1, 2 and 3 components: \[f_1(x)\sim \text{Gaussian}(\mu_1=(0,0)^T,\Sigma_1=\mathbf{I}),\] \[f_2(x)\sim \text{Gaussian}\left(\mu_2=(-1,-1)^T,\Sigma_2=\left(\begin{array}{cc} 1 & 0.7\\ 0.7 & 1 \end{array}\right)\right)\] and \[f_3(x)\sim \text{Gaussian}\left(\mu_3=(1,1)^T,\Sigma_3=0.25\mathbf{I}\right)\]
marginal: $f(x)= \sum_{k=1}^3 \pi_k f_k(x)$ (not Gaussian)

Generate data

Randomly generate $n=10^5$ observations:

> set.seed(2)
> n=100000; cpb=c(0.4,0.3,0.3)
> clbs = sample(1:3,n,replace = T, prob=cpb)
> n1=length(clbs[clbs==1]); n2=length(clbs[clbs==2]); 
> n3=length(clbs[clbs==3])
> library(mvtnorm)
> x1=rmvnorm(n1, mean=c(0,0), sigma=diag(2));
> Sg2= matrix(c(1,0.7,0.7,1),nrow=2);
> x2=rmvnorm(n2, mean=c(-1,-1), sigma=Sg2); 
> x3=rmvnorm(n3, mean=c(1,1), sigma=0.25*diag(2))

cpb: vector of mixing proportions
clbs= sample(1:3,n, replace=T, prob=cpb): class labels for observations in each class
x1, x2, x3: matrices whose rows are observations from Class 1, 2, 3, respectively

Generate data

Create data.frame for observations:

> proportion = c(rep(cpb[1],n1),rep(cpb[2],n2),rep(cpb[3],n3))
> component=c(rep(1,n1),rep(2,n2),rep(3,n3))
> observation = rbind(x1,x2,x3)
> colnames(observation)=c("X1","X2")
> df1=data.frame(cbind(observation,proportion,component))
> cpStg=c(expression(f[1]),expression(f[2]),expression(f[3]))
> df1$component = factor(df1$component,labels=cpStg)
> df1[1,]
         X1         X2 proportion component
1 0.5158212 -0.1061886        0.4      f[1]

cpStg: vector of math expressions $f_1,f_2,f_3$
X1, X2: feature variables

Density estimation

> library(MASS)
> # extract obs. from each class
> df1a=df1[df1$component=="f[1]",]
> df1b=df1[df1$component=="f[2]",]
> df1c=df1[df1$component=="f[3]",]
> # 2D density estimate
> fde1 = with(df1a, MASS::kde2d(X1, X2, n = 50),
+  lims=c(min(df1a$X1),max(df1a$X1),min(df1a$X2),max(df1a$X2)))
> fde2 = with(df1b, MASS::kde2d(X1, X2, n = 50),
+  lims=c(min(df1b$X1),max(df1b$X1),min(df1b$X2),max(df1b$X2)))
> fde3 = with(df1c, MASS::kde2d(X1, X2, n = 50),
+  lims=c(min(df1c$X1),max(df1c$X1),min(df1c$X2),max(df1c$X2)))

fde1, fde2, fde3: estimated 2D density for component $f_1,f_2,f_3$, respectively
2D density plot will be created using the z component of fde1, fde2, fde3, respectively

Create color scheme

> # Color palette (100 colors)
> col.pal = colorRampPalette(c("yellow", "red"))
> colors = col.pal(100)
> # centers of surface facets 
> nrz = ncz = 50
> z1.facet.center = (fde1$z[-1, -1] + fde1$z[-1, -ncz] +
+                   fde1$z[-nrz, -1] + fde1$z[-nrz, -ncz])/4
> # Range of colors
> z1.facet.range = cut(z1.facet.center, 100)
> # do the same for f2
> z2.facet.center = (fde2$z[-1, -1] + fde2$z[-1, -ncz] + 
+                   fde2$z[-nrz, -1] + fde2$z[-nrz, -ncz])/4
> z2.facet.range = cut(z2.facet.center, 100)
> # do the same for f3
> z3.facet.center = (fde3$z[-1, -1] + fde3$z[-1, -ncz] +
+                  fde3$z[-nrz, -1] + fde3$z[-nrz, -ncz])/4
> z3.facet.range = cut(z3.facet.center, 100)

*.facet.center: centers of surface facets for 2D density plot
*.facet.range: range of colors

Create density plot

> par(mfrow=c(2,2),mar = c(.6,0.5,.8,.5),oma=c(.3,.3,.3,.3))
> persp(fde1,phi=30,theta=20,d=5,xlab="X1",ylab="X2",
+       zlab="density",main=expression(f[1]),r = sqrt(.5),
+       ticktype="detailed",col=colors[z1.facet.range]) 
> persp(fde2,phi=30,theta=20,d=5,xlab="X1",ylab="X2",
+       zlab="density",main=expression(f[2]),r = sqrt(0.5),
+       ticktype="detailed",col=colors[z2.facet.range])
> persp(fde3,phi=30,theta=20,d=5,xlab="X1",ylab="X2",
+       zlab="density",main=expression(f[2]),r = sqrt(0.5),
+       ticktype="detailed",col=colors[z2.facet.range])

par(mfrow=...,mar=...,oma=...): set layout via mfrow, figure margins via mar, and outer margins oma
mar = c(bottom, left, top, right) and oma = c(bottom, left, top, right)
mfrow: vector c(nr, nc). Subsequent figures will be drawn in an nr-by-nc array on the device by rows

3D plot of density

Contours of 2D density

> library(ggplot2)
> ggplot(df1, aes(X1, X2))+theme_bw()+
+   facet_grid(~component,labeller = label_parsed)+
+   stat_density_2d(aes(fill = ..level..), 
+                   geom = "polygon", colour="white")+
+  theme(legend.position="top", legend.direction="horizontal")

labeller = label_parsed: show strip text (that in this example is designated by component as f[1], etc) as math expression ($f_1$, etc)

Contours of 2D density

Plot marginal density

> library(MASS)
> fde <- with(df1, MASS::kde2d(X1, X2, n = 50),
+   lims = c(min(df1$X1),max(df1$X1),min(df1$X2),max(df1$X2)))
> # create coloring scheme
> col.pal <- colorRampPalette(c("yellow", "red"))
> colors <- col.pal(100)
> nrz = ncz = 50
> z.facet.center <- (fde$z[-1, -1] + fde$z[-1, -ncz] +
+                 fde$z[-nrz, -1] + fde$z[-nrz, -ncz])/4
> z.facet.range <- cut(z.facet.center, 100)
> # create perspective plot
> par(mfrow=c(1,1),mar = c(.5,.5,.8,.5),oma=c(.3,.0,.3,.0))
> persp(fde, phi = 30, theta = 20, d = 5,xlab = "X1", 
+       ylab = "X2",zlab="density",main=expression(f), 
+   r=sqrt(.5),ticktype="detailed",col=colors[z.facet.range])

same coding strategy and commands as for component densities
almost the same codes as for component densities

Marginal density: 3D plot

Obtain contours of marginal

> library(ggplot2)
> ggplot(df1, aes(X1, X2))+theme_bw()+
+   stat_density_2d(aes(fill = ..level..),
+         geom = "polygon", colour="white")

same coding strategy and commands as for component densities
almost the same codes as for component densities

Marginal density: contours

A bivariate function

Function \[ z = f(x,y) = 10 \times \frac{\sin \left(\sqrt{x^2+y^2}\right)}{\sqrt{x^2+y^2}} \] on \[ D = \{(x,y): -10 \le x \le 10, -10 \le y \le10\} \]

Note: \[f(0,0):=\lim_{x \to 0,y \to 0}f(x,y)=10\] since $\lim_{t \to 0} (\sin t)/t=1$

3D plot of bivariate function

> library(plot3D)
> # range of values for x and y
> # range is discretized 
> y <- x <- seq(-10, 10, length=60)
> # create function f(x,y)
> f <- function(x,y) { r <- sqrt(x^2+y^2); 10 * sin(r)/r }
> # evaluate f on grid x \otimes y
> z <- outer(x, y, f)
> # create plot
> persp3D(x, y, z, theta = 30, phi = 30)

outer is a very useful command; its basic syntax is outer(X, Y, FUN = "*", ...)
z[i,j]$= f(\tilde{x}_i,\tilde{y}_j)$ for $(\tilde{x}_i,\tilde{y}_j)$ in the 60-by-60 grid

3D plot of bivariate function

Discriminant analysis: software implementation

R software and commands

R commands and libraries needed:

R library MASS, command lda{MASS} for LDA, and command qda{MASS} for QDA
R command predict{MASS} to classify observations using results from lda or qda
R library ROCR to create ROC curve

Command `lda{MASS}`

lda{MASS} implements linear discriminant analysis. Its basic syntax is

lda(formula, data, ..., subset, na.action)

formula: of the form groups ~ x1 + x2 + ..., i.e., the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.
data: data frame from which variables specified in formula are preferentially to be taken.
subset: an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

Command `lda{MASS}`

Basic syntax:

lda(formula, data, ..., subset, na.action)

na.action: a function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

lda{MASS} returns an object of class lda containing the following components:

prior: the prior probabilities used.
means: the group means.

Command `qda{MASS}`

qda{MASS} implements quadratic discriminant analysis. Its basic syntax is

qda(formula, data, ..., subset, na.action)

all arguments are the same as those for lda

qda{MASS} returns an object of class qda containing the same components as the return of lda{MASS}

Command `predict{MASS}`

predict{MASS} has 2 versions: MASS::predict.lda for classification using lda and MASS::predict.qda for classification using qda. Its basic syntax is

predict(object, newdata, prior = object$prior, dimen,
        method = c("plug-in", "predictive", "debiased"), ...)

object: object of class lda or qda
newdata: data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used. A vector will be interpreted as a row vector. If newdata is missing, an attempt will be made to retrieve the data used to fit the lda or qda object.

Command `predict{MASS}`

Basic syntax:

predict(object, newdata, prior = object$prior, dimen,
        method = c("plug-in", "predictive", "debiased"), ...)

prior: prior probabilities of the classes, which by default are the proportions in the training set or what was set in the call to lda.
method: this determines how the parameter estimation is handled. With “plug-in” (the default) the usual unbiased parameter estimates are used and assumed to be correct. With “debiased” an unbiased estimator of the log posterior probabilities is used, and with “predictive” the parameter estimates are integrated out using a vague prior.

Command `predict{MASS}`

Basic syntax:

predict(object, newdata, prior = object$prior, dimen,
        method = c("plug-in", "predictive", "debiased"), ...)

predict{MASS} returns a list with components:

class: the MAP (i.e., “Maximum a posteriori”) classification (a factor). Namely, an observation $x$ is assigned to class $g$ if $\Pr(Y=g|X=x)$ is the largest among $\Pr(Y=k|X=x),k\in \mathcal{G}$
posterior: posterior probabilities for classes.

Command `predict{MASS}`

Basic syntax:

predict(object, newdata, prior = object$prior, dimen,
        method = c("plug-in", "predictive", "debiased"), ...)

The returned posterior is an n-by-$K$ matrix, where $K$ is the number of classes and n is the number of observations in newdata; column $k$ of posterior contains the posterior probability of an observation belonging to Class $k$, where ordering of class labels is determined by R

Cmd `prediction{ROCR}`

prediction{ROCR} creates a prediction object. This function is used to transform the input data (which can be in vector, matrix, data.frame, or list form) into a standardized format. Its basic syntax is

prediction(predictions, labels)

predictions: a vector, matrix, data.frame, or list containing the predictions.
labels: a vector, matrix, data.frame, or list containing the true class labels; it must have the same dimensions as predictions.

Cmd `performance{ROCR}`

performance{ROCR} creates performance objects that contain several kinds of predictor evaluations. Its basic syntax is

performance(prediction.obj, measure, x.measure="cutoff")

prediction.obj: an object of class prediction.
measure: performance measure to use for the evaluation. A complete list of the performance measures that are available for measure and x.measure can be obtained from the help file for this command.

Cmd `performance{ROCR}`

Basic syntax:

performance(prediction.obj, measure, x.measure="cutoff")

x.measure: a second performance measure. If it is different from the default “cutoff”, then a two-dimensional curve, with x.measure taken to be the unit in direction of the x axis, and measure to be the unit in direction of the y axis, is created. This curve is parametrized with the cutoff.

Cmd `performance{ROCR}`

Here is how to call performance to create some standard evaluation plots:

ROC curves: measure=“tpr”, x.measure=“fpr”, i.e., via
```
performance(prediction.obj,measure="tpr",x.measure="fpr")
```
Sensitivity/specificity plots: measure=“sens”, x.measure=“spec”, i.e., via
```
performance(prediction.obj,measure="sens",x.measure="spec")
```

Command `plot{ROCR}`

plot{ROCR} executes plot.performance{ROCR} and plots all objects of class performance. Its basic syntax is

plot(x, avg="none", spread.estimate="none",
   colorize=FALSE, colorkey=colorize, ...)

x: an object of class performance
avg: if the performance object describes several curves (from cross-validation runs or bootstrap evaluations of one particular method), the curves from each of the runs can be averaged. Allowed values for avg are none (plot all curves separately), horizontal (horizontal averaging), vertical (vertical averaging), and threshold (threshold (=cutoff) averaging).

Command `plot{ROCR}`

Basic syntax:

plot(x, avg="none", spread.estimate="none",
   colorize=FALSE, colorkey=colorize, ...)

avg: (cont’d) Note that while threshold averaging is always feasible, vertical and horizontal averaging are not well-defined if the graph cannot be represented as a function x->y and y->x, respectively.
spread.estimate: when curve averaging is enabled, the variation around the average curve can be visualized as standard error bars (stderror), standard deviation bars (stddev), or by using box plots (boxplot).

Command `plot{ROCR}`

Basic syntax:

plot(x, avg="none", spread.estimate="none",
   colorize=FALSE, colorkey=colorize, ...)

spread.estimate: (cont’d) Note that the function plotCI, which is used internally by ROCR to draw error bars, might raise a warning if the spread of the curves at certain positions is 0.
colorize: logical; it determines whether the curve(s) should be colorized according to cutoff.
colorkey: if “TRUE”, a color key is drawn into the 4% border region of the plot. The color key visualizes the mapping from cutoffs to colors.

Integrated way to get ROC curve

Three commands combined into a user defined function rocplot to get ROC curve:

library(ROCR)
rocplot=function(pred, truth, ...){
  predob = prediction(pred, truth)
  perf = performance(predob, "tpr", "fpr")
  plot(perf,colorize=TRUE,...)}

Pipe to get AUC

To get the AUC of the MAP classification rule, use

library(dplyr)
ROCR::prediction(pred$posterior[,2], labels) %>%
  ROCR::performance(measure = "auc") %>% .@y.values

pred: object created by MASS::predict.lda or MASS::predict.qda.
labels: a vector, matrix, data.frame, or list containing the true class labels. It must have the same dimensions as pred$posterior[,2].

Caution: this command does not apply when there are more than 2 classes

Discriminant analysis: Example 1

Example 1: model

Gaussian mixture for $K=3$ classes and $p=2$ features with

Class 1, 2 and 3 mixing proportions: $\pi_1=0.4$, $\pi_2=0.3$, $\pi_3=0.3$
Class 1, 2 and 3 components: \[f_1(x)\sim \text{Gaussian}(\mu_1=(0,0)^T,\Sigma_1=\mathbf{I}),\] \[f_2(x)\sim \text{Gaussian}\left(\mu_2=(-1,-1)^T,\Sigma_2=\left(\begin{array}{cc} 1 & 0.7\\ 0.7 & 1 \end{array}\right)\right)\] and \[f_3(x)\sim Gaussian\left(\mu_3=(1,1)^T,\Sigma_3=0.25\mathbf{I}\right)\]
marginal: $f(x)= \sum_{k=1}^3 \pi_k f_k(x)$ (not Gaussian)

Example 1a: data

Generate $n=200$ observations from the Gaussian mixture:

> set.seed(2); n=200; cpb=c(0.4,0.3,0.3)
> clbs = sample(1:3,n,replace = T, prob=cpb)
> n1=length(clbs[clbs==1]); n2=length(clbs[clbs==2]); 
> n3=length(clbs[clbs==3])
> library(mvtnorm)
> x1=rmvnorm(n1, mean=c(0,0), sigma=diag(2));
> Sg2= matrix(c(1,0.7,0.7,1),nrow=2);
> x2=rmvnorm(n2, mean=c(-1,-1), sigma=Sg2); 
> x3=rmvnorm(n3, mean=c(1,1), sigma=0.25*diag(2))
> proportion=c(rep(cpb[1],n1),rep(cpb[2],n2),rep(cpb[3],n3))
> component=c(rep(1,n1),rep(2,n2),rep(3,n3))
> observation = rbind(x1,x2,x3)
> colnames(observation)=c("X1","X2")
> df2=data.frame(cbind(observation,proportion,component))
> cpStg=c(expression(f[1]),expression(f[2]),expression(f[3]))
> df2$component = factor(df2$component,labels=cpStg)

codes have been explained earlier

Example 1a: scatter plot

Scatter plot of the observations:

> library(ggplot2)
> ggplot(df2, aes(X1, X2, color=component,shape=component))+
+  geom_point()+theme_bw() +
+  theme(legend.position="top",legend.direction="horizontal")+
+  scale_color_discrete(labels=cpStg)+
+  scale_shape_discrete(labels =cpStg)

note scale_*_discrete for manually specifying color and shape
cpStg: vector of math expressions $f_1,f_2,f_3$

Example 1a: scatter plot

Much overlap among the 200 observations

Example 1a: LDA

> library(MASS)
> lda.fit = lda(component~X1+X2,data=df2)
> lda.pred = predict(lda.fit,df2)
> TrueClassLabel=df2$component
> LDAEstimtedClassLabel=lda.pred$class
> table(LDAEstimtedClassLabel,TrueClassLabel)
                     TrueClassLabel
LDAEstimtedClassLabel f[1] f[2] f[3]
                 f[1]   69   29   24
                 f[2]   14   32    0
                 f[3]   10    1   21

Large error rates: Class 1, 2, 3 error rates respectively are

[1] 0.2580645 0.4838710 0.5333333

LDA not suitable since component covariance matrices are different

Example 1b: LDA

LDA on $n=10^5$ observations from the mixture

> library(MASS)
> lda.fit = lda(component~X1+X2,data=df1)
> lda.pred = predict(lda.fit,df1)
> TrueClassLabel=df1$component
> LDAEstimtedClassLabel=lda.pred$class
> table(LDAEstimtedClassLabel,TrueClassLabel)
                     TrueClassLabel
LDAEstimtedClassLabel  f[1]  f[2]  f[3]
                 f[1] 25341  9397  4702
                 f[2]  7277 19776     0
                 f[3]  7220  1110 25177

Large error rates: Class 1, 2, 3 error rates respectively are

[1] 0.3638988 0.3469603 0.1573681

LDA not suitable; larger sample size does not necessarily help when model is mis-applied

Example 1c: QDA

QDA on the 200 observations:

> library(MASS)
> qda.fit = qda(component~X1+X2,data=df2)
> qda.pred = predict(qda.fit,df2)
> TrueClassLabel=df2$component
> QDAEstimtedClassLabel=qda.pred$class
> table(QDAEstimtedClassLabel,TrueClassLabel)
                     TrueClassLabel
QDAEstimtedClassLabel f[1] f[2] f[3]
                 f[1]   63   26    5
                 f[2]   15   33    0
                 f[3]   15    3   40

Large error rates: Class 1, 2, 3 error rates respectively are

[1] 0.3225806 0.4677419 0.1111111

QDA suitable but considerable overlap among observations

Example 1d

QDA on $n=10^5$ observations from the mixture:

> library(MASS)
> qda.fit = qda(component~X1+X2,data=df1)
> qda.pred = predict(qda.fit,df1)
> TrueClassLabel=df1$component
> QDAEstimtedClassLabel=qda.pred$class
> table(QDAEstimtedClassLabel,TrueClassLabel)
                     TrueClassLabel
QDAEstimtedClassLabel  f[1]  f[2]  f[3]
                 f[1] 27336  8683  3105
                 f[2]  5828 20084     0
                 f[3]  6674  1516 26774

Large error rates: Class 1, 2, 3 error rates respectively are

[1] 0.3138210 0.3367896 0.1039191

QDA suitable; considerable overlap among observations; larger sample size improves classification accuracy

Example 1e: model

Gaussian mixture for $K=2$ classes and $p=2$ features with

Class 1 and 2 mixing proportions: $\pi_1=\pi_2=0.5$
Class 1 and 2 components: \[f_1(x)\sim \text{Gaussian}(\mu_1=(0,0)^T,\Sigma_1=\mathbf{I}),\] \[f_2(x)\sim \text{Gaussian}\left(\mu_2=(-2,-2)^T,\Sigma_2=\left(\begin{array}{cc} 1 & 0.7\\ 0.7 & 1 \end{array}\right)\right)\]
marginal: $f(x)= \sum_{k=1}^2 \pi_k f_k(x)$ (not Gaussian)

Example 1e: generate data

Randomly generate $1000$ observations:

> set.seed(2)
> n=1000; cpb=c(0.5,0.5)
> clbs = sample(1:2,n,replace = T, prob=cpb)
> n1=length(clbs[clbs==1]); n2=length(clbs[clbs==2]); 
> library(mvtnorm)
> x1=rmvnorm(n1, mean=c(0,0), sigma=diag(2));
> Sg2= matrix(c(1,0.7,0.7,1),nrow=2);
> x2=rmvnorm(n2, mean=c(-2,-2), sigma=Sg2); 
> proportion = c(rep(cpb[1],n1),rep(cpb[2],n2))
> component=c(rep(1,n1),rep(2,n2))
> observation = rbind(x1,x2)
> colnames(observation)=c("X1","X2")
> df3=data.frame(cbind(observation,proportion,component))
> cpStg=c(expression(f[1]),expression(f[2]))
> df3$component = factor(df3$component,labels=cpStg)

Example 1e: scallter plot

A bit overlap among observations ($n=1000$):

Example 1e: QDA

QDA results:

                     TrueClassLabel
QDAEstimtedClassLabel f[1] f[2]
                 f[1]  461   54
                 f[2]   36  449

Error rates: Class 1, 2 error rates respectively are

> c1er=1-sum(qda.pred$class[df3$component=="f[1]"] =="f[1]")/
+          sum(df3$component=="f[1]")
> c2er=1-sum(qda.pred$class[df3$component=="f[2]"] =="f[2]")/
+          sum(df3$component=="f[2]")
> c(c1er,c2er)
[1] 0.07243461 0.10735586

QDA suitable; a bit overlap among observations; larger sample size improves classification

Example 1e: QDA ROC

> rocplot(qda.pred$posterior[,2],TrueClassLabel)

Example 1e: QDA AUC

AUC:

> library(dplyr); library(ROCR)
> aucStdThr = 
+  ROCR::prediction(qda.pred$posterior[,2],TrueClassLabel) %>%
+   ROCR::performance(measure = "auc") %>%
+   .@y.values
> as.numeric(aucStdThr)
[1] 0.9642907

ROCR::performance(measure = "auc"): obtain performance via AUC

Discriminant analysis: Example 2

`Default` data

Default data set (in R library ISLR) on credit card users:

2 classes as statuses of default of a user on credit card payments, where default takes value Yes or No
3 features on each user: annual income, monthly credit card balance, user being student (with value Yes) or not (with value No)
$n=10,000$ training observations: 333 observations are for default=Yes, and the rest default=No

Target: apply DA to classify a user’s status of default using some of the features

Model assumption

Check Gaussian assumption: student hidden in contours

QDA: classification

Classification results of QDA with features balance and income on training set and “0.5-threshold rule” of default=Yes if $\widehat{\Pr}(\text{default}=\text{Yes}|X=x)>0.5$:

> library(MASS)
> qda.fit = qda(default~balance+income,data=Default)
> qda.pred = predict(qda.fit,Default)
> TrueClassLabel=Default$default
> QDAEstimtedClassLabel=qda.pred$class
> table(QDAEstimtedClassLabel,TrueClassLabel)
                     TrueClassLabel
QDAEstimtedClassLabel   No  Yes
                  No  9637  241
                  Yes   30   92

AUC:

[1] 0.9489247

QDA: ROC curve by CV

Average of 10 ROC curves obtained by 10-fold cross-validation:

> par(mfrow=c(1,1),mar=c(4.6,4.5,.0,.5),oma=c(1.3,1.3,0.3,1.3))
> predqda = prediction(qda.pred$posterior[,2], TrueClassLabel)
> perfroc = performance(predqda, "tpr", "fpr")
> plot(perfroc,avg="threshold",spread.estimate="boxplot",
+      colorize=TRUE)

avg="threshold": averaging via cutoff 10 ROC curves obtained by 10-fold cross-validation
spread.estimate="boxplot": provide boxplot as variation around the average curve
colorize=TRUE: show cutoff values via color key

QDA: ROC curve by CV

Average of 10 ROC curves obtained by 10-fold cross-validation; color key for threshold value

Modified QDA

Modified decision rule: $\widehat{\Pr}(\text{default}=\text{Yes}|X=x)>0.2$

> library(ISLR); library(MASS)
> qda.fit = qda(default~balance+income,data=Default)
> qda.pred = predict(qda.fit,Default)
> TrueClassLabel=Default$default
> # create estimated class labels vector of all "No"'s
> mQDAEstimtedClassLabel = rep("No",nrow(Default))
> # Pr(default=Yes|X) > 0.2 as threshold rule
> YesIdx=which(qda.pred$posterior[,2]>0.2)
> mQDAEstimtedClassLabel[YesIdx] = "Yes"
> table(mQDAEstimtedClassLabel,TrueClassLabel)
                      TrueClassLabel
mQDAEstimtedClassLabel   No  Yes
                   No  9348  123
                   Yes  319  210

YesIdx=...: R takes class default=Yes as the 2nd class and qda.pred$posterior[,2] contains posterior probabilities for the 2nd class

License and session Information

License

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] knitr_1.21

loaded via a namespace (and not attached):
 [1] compiler_3.5.0  magrittr_1.5    tools_3.5.0    
 [4] htmltools_0.3.6 revealjs_0.9    yaml_2.2.0     
 [7] Rcpp_1.0.0      stringi_1.2.4   rmarkdown_1.11 
[10] highr_0.7       stringr_1.3.1   xfun_0.4       
[13] digest_0.6.18   evaluate_0.12

Stat 437 Lecture Notes 5b

Xiongzhi Chen

Washington State University

2D density estimation and visualization, and plotting bivariate functions: software implementation

R software and commands

Command rmvnorm

Command kde2d{MASS}

Command kde2d{MASS}

Command persp

Command persp

Command persp

stat_density_2d{ggplot2}

Command persp3D{plot3D}

Command persp3D{plot3D}

2D density estimation and visualization, and plotting bivariate functions: illustrations

Gaussian mixture: K=3,p=2

Generate data

Generate data

Density estimation

Create color scheme

Create density plot

3D plot of density

Contours of 2D density

Contours of 2D density

Plot marginal density

Marginal density: 3D plot

Obtain contours of marginal

Marginal density: contours

A bivariate function

3D plot of bivariate function

3D plot of bivariate function

Discriminant analysis: software implementation

R software and commands

Command lda{MASS}

Command lda{MASS}

Command qda{MASS}

Command predict{MASS}

Command predict{MASS}

Command predict{MASS}

Command predict{MASS}

Cmd prediction{ROCR}

Cmd performance{ROCR}

Cmd performance{ROCR}

Cmd performance{ROCR}

Command plot{ROCR}

Command plot{ROCR}

Command plot{ROCR}

Integrated way to get ROC curve

Pipe to get AUC

Discriminant analysis: Example 1

Example 1: model

Example 1a: data

Example 1a: scatter plot

Example 1a: scatter plot

Example 1a: LDA

Example 1b: LDA

Example 1c: QDA

Example 1d

Example 1e: model

Example 1e: generate data

Example 1e: scallter plot

Example 1e: QDA

Example 1e: QDA ROC

Example 1e: QDA AUC

Discriminant analysis: Example 2

Default data

Model assumption

QDA: classification

QDA: ROC curve by CV

QDA: ROC curve by CV

Modified QDA

License and session Information

Command `rmvnorm`

Command `kde2d{MASS}`

Command `kde2d{MASS}`

Command `persp`

Command `persp`

Command `persp`

`stat_density_2d{ggplot2}`

Command `persp3D{plot3D}`

Command `persp3D{plot3D}`

Command `lda{MASS}`

Command `lda{MASS}`

Command `qda{MASS}`

Command `predict{MASS}`

Command `predict{MASS}`

Command `predict{MASS}`

Command `predict{MASS}`

Cmd `prediction{ROCR}`

Cmd `performance{ROCR}`

Cmd `performance{ROCR}`

Cmd `performance{ROCR}`

Command `plot{ROCR}`

Command `plot{ROCR}`

Command `plot{ROCR}`

`Default` data