Logistic regression: diagnostics
Recap on model and assumptions
- \(Y\) is a Bernoulli random variable and \(E\left( Y=1|X\right) =p\left( X\right)\)
- Population model:
\[\begin{equation}
\operatorname{logit}\left( p\left( X\right) \right) =\log\left(
\frac{p\left( X\right) }{1-p\left( X\right) }\right) =\beta_{0}+\beta_{1}X
\end{equation}\]
or equivalently
\[\begin{equation}
p\left( X\right) =\frac{\exp\left( \beta_{0}+\beta_{1}X\right) }
{1+\exp\left( \beta_{0}+\beta_{1}X\right) }=\frac{1}{1+\exp\left(
-\beta_{0}-\beta_{1}X\right) }
\end{equation}\]
- Sample version of the model: \[
\operatorname{logit}\left( p\left( x_{i}\right) \right) =\beta_{0}+\beta_{1}x_{i}
\] for the \(i\)th observation \(\left( x_{i},y_{i}\right)\) of \(\left(X,Y\right)\).
Recap on assumptions
The assumptions of the model are:
Each \(y_{i}\) is Bernoulli random variable with expectation \(\mu_{i}=p\left( x_{i}\right)\)
\(\operatorname{logit}\left( p\left( X\right) \right)\) is a linear function of \(X\), i.e., \(\operatorname{logit}\left( p\left( x_{i}\right)\right)\) is a linear function of \(x_{i}\) for all \(1\leq i\leq n\)
The pairs \(\left\{ \left( x_{i},y_{i}\right) \right\} _{i=1}^{n}\) are independent
Fitted values
Once an estimate \(\left( \hat{\beta}_{0},\hat{\beta}_{1}\right)\) of \(\left( \beta_{0},\beta_{1}\right)\) is obtained, we have the estimated model \[
\operatorname{logit}\left( \hat{p}\left( x_{i}\right) \right) =\hat{\beta}_{0}+\hat{\beta}_{1}x_{i},
\] where the mean or probability of success \(\mu_i\) for each \(y_i\) is estimated by \[
\hat{\mu}_{i}=\hat{p}\left( x_{i}\right) =\frac{1}{1+\exp\left( -\hat
{\beta}_{0}-\hat{\beta}_{1}x_{i}\right) }
\]
Ordinary residuals
Formula: \[
e_{i}=\left\{
\begin{array}
{lll}
1-\hat{\mu}_{i} & if & y_{i}=1\\
-\hat{\mu}_{i} & if & y_{i}=0
\end{array}
\right.
\]
Note: the distribution of \(\left\{ e_{i}\right\} _{i=1}^{n}\) is unknown even if the fitted model is correct
Pearson residuals
- Formula: \[
r_{i}=\frac{y_{i}-\hat{\mu}_{i}}{\sqrt{\hat{\mu}_{i}\left( 1-\hat{\mu}_{i}\right) }}.
\] Note that \(\sqrt{\hat{\mu}_{i}\left( 1-\hat{\mu}_{i}\right) }\) can be thought of as an estimate of the standard deviation of \(y_{i}-\hat{\mu}_{i}\)
- Studentized Pearson residuals: \[
\tilde{r}_{i}=\frac{r_{i}}{\sqrt{1-h_{ii}}},
\] where \(h_{ii}\) is the \(i\)-th diagonal entry of \[
H=W^{1/2}X\left( X^{\prime}WX\right) ^{-1}X^{\prime}W^{1/2}
\] and \(W\) is the \(n\times n\) diagonal matrix with entries \(\hat{\mu}_{i}\left(1-\hat{\mu}_{i}\right)\) and \(X\) is the \(n\times p\) design matrix
Deviance residuals
- The “probability mass function (PMF)” of \(y_{i}\), or the likelihood function of \(y_i\), is \[
f_i\left( y_{i}\right) =p_{i}^{y_{i}}\left( 1-p_{i}\right) ^{1-y_{i}}
\]
- Let the estimated log-likelihood function of \(y_i\) be denoted by \[\ln \hat{f}_i (y_i)= y_{i}\ln\hat{\mu}_{i}+\left( 1-y_{i}\right) \ln\left( 1-\hat{\mu}_{i}\right)\]
- The model deviance is
\[\begin{equation}
D=-2\sum_{i=1}^{n} \ln \hat{f}_i (y_i) =\sum_{i=1}^{n}d_{i}^{2},
\end{equation}\]
where the \(i\)th deviance residual is \[
d_{i}=\operatorname{sgn}\left( y_{i}-\hat{\mu}_{i}\right) \sqrt{-2 \ln \hat{f}_i (y_i) }
\]
Deviance residuals
The model deviance \(D\) is equal to \[
-2\times\left[ \ln L\left( \text{Reduced model}\right) -\ln L\left(
\text{Full model}\right) \right] ,
\] where the “Reduced model” is the model under consideration and the “Full model” is the full nonparametric model of \(E\left( y_{i}\right) =\mu_{i} = \Pr(Y=1)\)
Under appropriate conditions, \(D\) approximately has a chi-square distribution when the Reduced model is correct. \(D\) itself can be used to check if the postulated linear model is appropriate.
Checking model fit

Checking model fit
- If the logistic model is correct, then \(E\left( y_{i}\right) =\mu_{i}\) and asymptotically \[
E\left( y_{i}-\hat{\mu}_{i}\right) =0
\]
- Thus, if the model is correct, a lowess smooth fit of the plot of the residuals against the estimated probabilities (or against the linear predictor \(\hat{\eta}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{i}\)) should result approximately in a horizontal line with zero intercept
- Any significant departure from this suggests that the model may be inadequate. In practice, either the ordinary residuals, the Pearson residuals, or the studentized residuals can be used for this purpose
Checking model fit

Checking influential observations
- Recall Pearson chi-square goodness of fit statistic, in the setting of logistic regression, as \[
\xi^{2}=\sum_{i=1}^{n}\frac{\left( y_{i}-\hat{\mu}_{i}\right) ^{2}}{\hat
{\mu}_{i}\left( 1-\hat{\mu}_{i}\right) }=\sum_{i=1}^{n}r_{i}^{2}
\]
- Denote by \(\xi_{\left( i\right) }^{2}\) the value of the Pearson chi-square goodness of fit statistic obtained when observation \(i\) is deleted
- Denote by \(D_{\left( i\right) }\) the value of the model deviance obtained when observation \(i\) is deleted
Checking influential observations
- Define the “delta chi-square statistic” as \[
\Delta\xi_{\left( i\right) }^{2}=\xi^{2}-\xi_{\left( i\right) }^{2}
\] and “delta deviance statistic” as \[
\Delta D_{\left( i\right) }=D-D_{\left( i\right) }.
\] These quantities measure the influence of the \(i\)th observation on model fitting
- Usually, these statistics are plotted against the observation number \(i\), against \(\hat{\mu}_{i}\), or against \(\hat{\eta}_{i}=\hat{\beta}_{0} +\hat{\beta}_{1}x_{i}\). Extreme values appear as spikes when plotted against \(i\), or as outliers in the upper corners of the plot when plotted against \(\hat{\mu}_{i}\) or \(\hat{\eta}_{i}\)
Checking influential observations

Checking influential observations
Cook’s distance is defined as \[
\delta_{i}=\frac{r_{i}^{2}h_{ii}}{p\left( 1-h_{ii}\right) ^{2}}
\] where
- \(h_{ii}\) is the \(i\)-th diagonal entry of \[
H=W^{1/2}X\left( X^{\prime}WX\right) ^{-1}X^{\prime}W^{1/2}
\]
- \(W\) is the \(n\times n\) diagonal matrix with entries \(\hat{\mu}_{i}\left( 1-\hat{\mu}_{i}\right)\) and \(X\) is the \(n\times p\) design matrix.