The Systematic Component of a GLM / GLMs Defined

Stat 203 Lecture 22

Dr. Janssen

Linking the linear predictor

Recall that the linear predictor is

\[ \eta = \beta_0 + \sum\limits_{j=1}^p \beta_j x_j. \]

The systematic component of a GLM connects \(\eta\) to the mean \(\mu\) through a link function \(g\), so that \(g(\mu) = \eta\).

The canonical link function is the special link function for which the linear predictor equals the canonical parameter \(\theta\):

\[ \eta = \theta = g(\mu). \]

Example: Normal distribution

For the normal distribution, \(\theta = \mu\), so the canonical link function is the identity, \(g(\mu) = \mu\). Thus, \(\eta = \mu\).
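This can be read off the density written in EDM form:

\[ f(y; \mu, \sigma^2) = \exp\left\{ \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\}, \]

which matches \(f(y; \theta, \phi) = a(y, \phi)\exp\{(y\theta - \kappa(\theta))/\phi\}\) with \(\theta = \mu\), \(\kappa(\theta) = \theta^2/2\), and \(\phi = \sigma^2\).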

Example: Poisson distribution

For the Poisson distribution, \(\theta = \log\mu\), so the canonical link is \(g(\mu) = \log \mu = \eta\).
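Again this follows from the density:

\[ f(y; \mu) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left\{ y\log\mu - \mu - \log y! \right\}, \]

which matches the EDM form with \(\theta = \log\mu\), \(\kappa(\theta) = e^\theta\), and \(\phi = 1\).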

Offsets

Definition and Example

Definition 1 An offset is a term \(\beta_j x_j\) in the linear predictor that requires no estimation. In this case, \(\beta_j\) is known a priori.

Example 1 Consider modelling the annual number of hospital births in various cities. The annual number of births is discrete, so a Poisson distribution may be appropriate. But what might \(\mu_i\) (in city \(i\)) depend on? A natural driver is the population \(P_i\) of city \(i\): if births occur at a rate of roughly \(\lambda_i\) per person, then \(\mu_i = \lambda_i P_i\), so \(\log\mu_i = \log P_i + \log\lambda_i\). The term \(\log P_i\) has a known coefficient of 1, and so enters the linear predictor as an offset.
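A minimal sketch of fitting such a model in Python with statsmodels (the data here are simulated and the variable names are illustrative): the `offset` argument carries \(\log P_i\), so no coefficient is estimated for it.

```python
import numpy as np
import statsmodels.api as sm

# Simulated city data (illustrative): population P_i and a covariate x_i.
rng = np.random.default_rng(1)
n = 30
pop = rng.uniform(5e4, 5e5, size=n)
x = rng.normal(size=n)
mu = pop * np.exp(-6.0 + 0.3 * x)   # births per city: per-person rate times P_i
y = rng.poisson(mu)

X = sm.add_constant(x)
# log(P_i) enters the linear predictor as an offset: its coefficient is fixed at 1.
fit = sm.GLM(y, X, family=sm.families.Poisson(), offset=np.log(pop)).fit()
print(fit.params)  # estimates of (beta_0, beta_1); roughly (-6, 0.3) here
```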

GLMs Defined

Two Components

Definition 2 A generalized linear model consists of two components:

  • Random component: The observations \(y_i\) come independently from a specified EDM such that \(y_i \sim \text{EDM}(\mu_i, \phi/w_i)\) for \(i = 1, 2, \ldots, n\). The \(w_i\) are known positive prior weights, which allow each observation to be weighted differently. Commonly, \(w_i = 1\).
  • Systematic component: A linear predictor \(\eta_i = o_i + \beta_0 + \sum_{j=1}^p \beta_j x_{ji}\), where the \(o_i\) are offsets that are often equal to zero, and \(g\) is a known, monotonic, differentiable link function with \(g(\mu_i) = \eta_i\).

The GLM is therefore

\[ \begin{cases} y_i \sim \text{EDM}(\mu_i, \phi/w_i) \\ g(\mu_i) = o_i + \beta_0 + \sum_{j=1}^p \beta_j x_{ji} \end{cases} \]
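To make the two components concrete, here is a minimal sketch (assuming simulated data; not from the notes) using Python's statsmodels, where the `family` argument specifies the random component and `var_weights` plays the role of the prior weights \(w_i\):

```python
import numpy as np
import statsmodels.api as sm

# Minimal sketch with simulated data (names and values are illustrative).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
w = rng.integers(1, 5, size=n).astype(float)   # known prior weights w_i
phi = 2.0                                      # true dispersion
# Random component: y_i ~ N(mu_i, phi / w_i), a Gaussian EDM
y = 1.0 + 0.5 * x + rng.normal(scale=np.sqrt(phi / w))

# Systematic component: identity link, eta_i = beta_0 + beta_1 * x_i
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Gaussian(), var_weights=w).fit()
print(fit.params)  # estimates of (beta_0, beta_1)
print(fit.scale)   # estimate of the dispersion phi
```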

Total Deviance

Unit to Total

Recall that the unit deviance is

\[ d(y,\mu) = 2\left(t(y,y) - t(y,\mu)\right). \]

The function

\[ D(y,\mu) = \sum\limits_{i=1}^n w_i d(y_i, \mu_i) \]

is called the deviance function. Its value is called the deviance or total deviance.
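To make the definitions concrete, here is a small Python sketch (not from the notes) for the Poisson case, where \(t(y,\mu) = y\log\mu - \mu\) gives \(d(y,\mu) = 2\{y\log(y/\mu) - (y - \mu)\}\), with the convention \(y\log y = 0\) at \(y = 0\):

```python
import numpy as np

def poisson_unit_deviance(y, mu):
    """Poisson unit deviance d(y, mu) = 2*{y*log(y/mu) - (y - mu)}."""
    y = np.asarray(y, dtype=float)
    # y*log(y/mu) -> 0 as y -> 0; guard the ratio so log(0) is never evaluated.
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * (y * np.log(ratio) - (y - mu))

def total_deviance(y, mu, w=None):
    """D(y, mu) = sum_i w_i * d(y_i, mu_i), with w_i = 1 by default."""
    w = np.ones_like(np.asarray(mu, dtype=float)) if w is None else w
    return np.sum(w * poisson_unit_deviance(y, mu))

# Example: total deviance of a constant fit mu_hat = mean(y).
y = np.array([0, 1, 3, 2, 5], dtype=float)
print(total_deviance(y, np.full_like(y, y.mean())))
```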

Scaled Deviance

We can scale \(D\) by the dispersion parameter:

\[ D^*(y,\mu) = D(y,\mu)/\phi. \]

Its value is called the scaled deviance or scaled total deviance.

Log-likelihood for GLMs

By using the dispersion model form of the EDM, the log-likelihood function is

\[ \begin{align*} \ell(\mu; y) &= \sum\limits_{i=1}^n \log b(y_i, \phi/w_i) - \frac{1}{2\phi} \sum\limits_{i=1}^n w_i d(y_i, \mu_i)\\ &= \sum\limits_{i=1}^n \log b(y_i, \phi/w_i) - \frac{D(y,\mu)}{2\phi} \end{align*} \]

Since the first sum does not involve \(\mu\), maximizing the log-likelihood over the regression coefficients is equivalent to minimizing the deviance \(D(y, \mu)\).
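As a check (a worked special case, not in the original notes), for the normal distribution \(d(y,\mu) = (y-\mu)^2\), \(\phi = \sigma^2\), and \(b(y, \phi) = (2\pi\phi)^{-1/2}\), so with \(w_i = 1\) this reduces to the familiar normal log-likelihood:

\[ \ell(\mu; y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum\limits_{i=1}^n (y_i - \mu_i)^2. \]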

Comparing to Transformations

Recall: Transforming the response

In Chapter 3, we explored transformations of the response, \(y\). What were our goals?

Using \(V(\mu)\)

When \(V(\mu)\) represents the true mean-variance relationship for the responses, a first-order Taylor expansion gives \(\text{var}[h(y)] \approx h'(\mu)^2\, V(\mu)\), so the transformation \(h\) with \(h'(\mu) \propto V(\mu)^{-1/2}\) approximately stabilizes the variance.

Thus, we can think of using linear regression after a transformation of \(y\) as roughly equivalent to a GLM with variance function \(V(\mu) = 1/h'(\mu)^2\) and link function \(g(\mu) = h(\mu)\). In this way, transforming the response is an approximation of fitting a GLM.
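For example, for Poisson-like counts with \(V(\mu) = \mu\),

\[ h'(\mu) = \mu^{-1/2} \implies h(\mu) = 2\sqrt{\mu}, \]

so regressing \(\sqrt{y}\) on the explanatory variables is roughly equivalent to a Poisson GLM with square-root link \(g(\mu) = \sqrt{\mu}\).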

Caveats

  • Using a transformation to simultaneously achieve linearity and constant variance assumes a relationship between variance and link function which is generally overly simplistic.
  • GLMs provide more flexibility: EDM family and link function can be chosen separately based on the data.
  • GLMs model data on the original scale rather than an artificially transformed scale.
  • GLMs give realistic probability statements when the data are actually non-normal.
  • GLMs more easily allow the impact of the explanatory variables on \(\mu\) to be interpreted.

Summary

What is a GLM?

Two components:

  • A random component describing the distribution of the \(y_i\); must be an EDM
  • A systematic component that relates the linear predictor to \(\mu\) via a link function \(g\)