Stat 203 Lecture 29
Recall our assumptions for linear regression models: linearity of the mean, independent observations, constant variance, and normally distributed errors.
Goals: define residuals for GLMs and use them to assess model adequacy.
The distances \(y_i - \hat{\mu}_i\) are called the response residuals and are the basis for residuals in linear regression.
These are insufficient for GLMs because in general the variance depends on the mean.
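To see the problem concretely, here is a short Python sketch (an illustration, not part of the lecture's R workflow): for Poisson data the variance equals the mean, so the spread of the raw response residuals \(y - \mu\) grows with the mean even when the model is exactly correct, while dividing by \(\sqrt{V(\mu)}\) stabilizes it.

```python
# Illustration: response residuals have non-constant spread for Poisson
# data, even under a correctly specified model, because var(y) = mu.
import numpy as np

rng = np.random.default_rng(3)
mu_small, mu_large = 2.0, 50.0
y_small = rng.poisson(mu_small, size=5000)
y_large = rng.poisson(mu_large, size=5000)

# Raw response residuals: spreads differ by roughly sqrt(50/2) = 5x.
print(np.std(y_small - mu_small), np.std(y_large - mu_large))

# Dividing by sqrt(V(mu)) = sqrt(mu) equalizes the spread (about 1 each).
print(np.std((y_small - mu_small) / np.sqrt(mu_small)),
      np.std((y_large - mu_large) / np.sqrt(mu_large)))
```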
Idea: handle the non-constant variance by dividing out its effect. This gives the Pearson residuals
\[ r_P = \frac{y - \hat{\mu}}{\sqrt{V(\hat{\mu})/w}}, \]
where \(V\) is the variance function and \(w\) is the prior weight.
Define the deviance residuals
\[ r_D = \text{sign}(y-\hat{\mu})\sqrt{w\, d(y,\hat{\mu})}, \]
where \(d(y,\hat{\mu})\) is the unit deviance.
An alternative to Pearson and deviance residuals is the quantile residual. Quantile residuals are exactly normally distributed, apart from the sampling variability in estimating \(\mu\) and \(\phi\), provided the correct EDM is used. They are computed in R with qresid() from the statmod package.
Let \(\mathcal{F}(y; \mu, \phi)\) be the CDF of a random variable \(y\). The quantile residuals are
\[ r_Q = \Phi^{-1}\left( \mathcal{F}(y; \hat{\mu}, \phi) \right), \]
where \(\Phi\) is the CDF of the standard normal distribution. If \(\phi\) is unknown, use the Pearson estimator of \(\phi\).