Assumptions for GLMs / Residuals

Stat 203 Lecture 29

Dr. Janssen

Setup

Linear regression case

Recall our assumptions for linear regression models:

  • Linearity
  • Normality
  • Lack of outliers
  • Independence

Assumptions for GLMs

Goals: assess each of the following:

  • Lack of outliers
  • Link function
  • Linearity
  • Variance function
  • Dispersion parameter
  • Independence
  • Distribution

Residuals for GLMs

Response residuals are insufficient

The distances \(y_i - \hat{\mu}_i\) are called the response residuals and are the basis for residuals in linear regression.

These are insufficient for GLMs because in general the variance depends on the mean, so the same distance can be large at one fitted value and unremarkable at another.
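
A small numerical sketch of this point, assuming a Poisson response (so that \(V(\mu) = \mu\)). The two means and the residual value are made up for illustration only.

```r
# Hypothetical illustration: the same response residual at two different means
# under a Poisson model, where var(y) = mu.
mu <- c(1, 100)          # two fitted means (made-up values)
resp_resid <- c(5, 5)    # identical response residuals y - mu
sd_y <- sqrt(mu)         # Poisson standard deviation at each mean
scaled <- resp_resid / sd_y
scaled                   # 5 SDs away at mu = 1, but only 0.5 SDs at mu = 100
```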

Pearson Residuals

Idea: handle the non-constant variance by dividing out its effects.

\[ r_P = \frac{y - \hat{\mu}}{\sqrt{V(\hat{\mu})/w}}, \]

where \(V\) is the variance function and \(w\) is the prior weight.

resid(fitted_glm, type="pearson")
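
As a check on the formula, the following sketch computes Pearson residuals by hand and compares them with R's built-in version. It assumes a Poisson GLM fitted to a small made-up data set; the names `x`, `y` and `fit` are hypothetical.

```r
# Toy count data (made up for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)
fit <- glm(y ~ x, family = poisson)

mu_hat <- fitted(fit)                 # fitted means
w <- weights(fit, type = "prior")     # prior weights (all 1 here)
V <- family(fit)$variance             # variance function; V(mu) = mu for Poisson

# Pearson residuals straight from the definition
r_P <- (y - mu_hat) / sqrt(V(mu_hat) / w)

# Should agree with R's built-in Pearson residuals
all.equal(unname(r_P), unname(resid(fit, type = "pearson")))
```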

Deviance residuals

Define

\[ r_D = \text{sign}(y-\hat{\mu})\sqrt{w\, d(y,\hat{\mu})}, \]

where \(d(y,\hat{\mu})\) is the unit deviance and \(w\) is the prior weight.

resid(fitted_glm)   # type = "deviance" is the default
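
The definition can be checked by hand in the same way. This sketch assumes a Poisson GLM, for which the unit deviance is \(d(y,\mu) = 2\{y\log(y/\mu) - (y-\mu)\}\); the data and the names `x`, `y` and `fit` are made up for illustration.

```r
# Toy count data (made up for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)
fit <- glm(y ~ x, family = poisson)

mu_hat <- fitted(fit)
w <- weights(fit, type = "prior")     # prior weights (all 1 here)

# Poisson unit deviance; the y * log(y / mu) term is taken as 0 when y = 0
d <- 2 * (ifelse(y == 0, 0, y * log(y / mu_hat)) - (y - mu_hat))

# Deviance residuals straight from the definition
r_D <- sign(y - mu_hat) * sqrt(w * d)

# Should agree with R's default residuals for a glm
all.equal(unname(r_D), unname(resid(fit, type = "deviance")))
```

The squared deviance residuals sum to the residual deviance, so `sum(r_D^2)` should match `deviance(fit)`.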

Quantile residuals

An alternative to Pearson and deviance residuals is the quantile residual. Quantile residuals are exactly normally distributed, apart from the sampling variability in estimating \(\mu\) and \(\phi\), assuming that the correct EDM is used.

Computed in R with qresid() from the statmod package.

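Example

Suppose \(y = 1.2\) is observed from an exponential distribution with fitted mean \(\hat{\mu} = 3\). Evaluate the CDF at \(y\), then convert that probability to a standard normal quantile:
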
y <- 1.2; mu <- 3
cprob <- pexp(y, rate=1/mu); cprob
[1] 0.32968
rq <- qnorm(cprob); rq
[1] -0.4407971

More formally

Let \(\mathcal{F}(y; \mu, \phi)\) be the CDF of a random variable \(y\). The quantile residuals are

\[ r_Q = \Phi^{-1}\left( \mathcal{F}(y; \hat{\mu}, \phi) \right), \]

where \(\Phi\) is the CDF of the standard normal distribution. If \(\phi\) is unknown, use the Pearson estimator of \(\phi\).
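
A minimal sketch of this definition for a continuous EDM, assuming a gamma GLM with a log link fitted to made-up data (the names `x`, `y` and `fit` are hypothetical). It is a hand computation for illustration, not a substitute for qresid(). For the gamma distribution with mean \(\mu\) and dispersion \(\phi\), the CDF is that of a gamma with shape \(1/\phi\) and scale \(\mu\phi\).

```r
# Toy positive continuous data (made up for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(1.1, 2.9, 2.2, 4.8, 3.9, 7.5, 5.6, 9.9, 8.1, 12.4)
fit <- glm(y ~ x, family = Gamma(link = "log"))

mu_hat <- fitted(fit)

# Pearson estimator of phi: Pearson statistic over residual degrees of freedom
phi_hat <- sum(resid(fit, type = "pearson")^2) / df.residual(fit)

# Gamma CDF in mean/dispersion form: shape = 1/phi, scale = mu * phi
F_hat <- pgamma(y, shape = 1 / phi_hat, scale = mu_hat * phi_hat)

# Quantile residuals: push the CDF values through the standard normal quantile function
r_Q <- qnorm(F_hat)
r_Q
```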

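The discrete case

For a discrete distribution the CDF jumps at each possible value of \(y\), so the CDF value is replaced by a uniform random draw between the CDF just below \(y\) and the CDF at \(y\). Here \(y = 1\) is observed from a Poisson distribution with fitted mean \(\hat{\mu} = 2.6\):
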
y <- 1; mu <- 2.6
a <- ppois(y-1, mu); b <- ppois(y, mu)
c(a, b)
[1] 0.07427358 0.26738488
u <- runif(1, a, b); u
[1] 0.2516115
rq <- qnorm( u ); rq
[1] -0.6694273
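
The same steps can be applied to every observation of a fitted model at once. This is a sketch assuming a Poisson GLM on made-up data; the names `x`, `y` and `fit` are hypothetical, and the seed is arbitrary and only there to make the random draws reproducible.

```r
set.seed(203)  # arbitrary seed so the random draws are reproducible

# Toy count data (made up for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)
fit <- glm(y ~ x, family = poisson)
mu_hat <- fitted(fit)

a <- ppois(y - 1, mu_hat)   # P(Y <= y - 1): lower end of the CDF jump
b <- ppois(y, mu_hat)       # P(Y <= y): upper end of the CDF jump
u <- runif(length(y), min = a, max = b)   # one uniform draw per observation

# Randomized quantile residuals
r_Q <- qnorm(u)
r_Q
```

Because of the random draw, the residuals change from run to run unless the seed is fixed, so it can be worth examining more than one realization before drawing conclusions.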

Checking model assumptions

Plots for model assumptions

  • Independence is best assessed by understanding how the data were collected
  • For data collected over time, independence can be assessed by plotting residuals against lagged residuals
  • The systematic component can be assessed by plotting residuals against the fitted values \(\hat{\mu}\), or against each \(x_j\)
  • Trends in these plots indicate that the systematic component can be improved by changing the link function, transforming the explanatory variables, or adding explanatory variables
  • Constant variance in these plots is consistent with a correctly specified random component
  • Q–Q plots can be used to assess the choice of random component
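
The plots listed above can be sketched in base R as follows. This assumes a Poisson GLM on made-up data recorded in time order (the names `x`, `y` and `fit` are hypothetical) and uses deviance residuals for simplicity; for discrete responses, quantile residuals may be the better choice, especially in the Q–Q plot.

```r
# Toy count data, assumed to be recorded in time order (made up for illustration)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)
fit <- glm(y ~ x, family = poisson)

r <- resid(fit, type = "deviance")   # another residual type could be swapped in
n <- length(r)

op <- par(mfrow = c(1, 3))           # three panels side by side

# Systematic component: look for trends or changing spread
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)

# Independence: each residual against the previous one (lag 1)
plot(r[-n], r[-1], xlab = "Residual at t - 1", ylab = "Residual at t")

# Random component: normal Q-Q plot of the residuals
qqnorm(r)
qqline(r)

par(op)                              # restore the previous plotting layout
```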