Stat 203 Lecture 22
Recall that the linear predictor is
\[ \eta = \beta_0 + \sum\limits_{j=1}^p \beta_j x_j. \]
The systematic component of a GLM connects \(\eta\) to the mean \(\mu\) through a link function \(g\) so that \(g(\mu) = \eta\).
The canonical link function is a special link function such that
\[ \eta = \theta = g(\mu). \]
For the normal distribution, \(\theta = \mu\), so the canonical link function is the identity, \(g(\mu) = \mu\). Thus, \(\eta = \mu\).
For the Poisson distribution, \(\theta = \log\mu\), so the canonical link is \(g(\mu) = \log \mu = \eta\).
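As a small numeric sketch (the dictionary and names below are our own, not from the notes), the two canonical links above and their inverses can be checked as a round trip:

```python
import math

# Canonical links as (g, g_inverse) pairs: identity for the normal
# distribution, log for the Poisson (illustrative sketch only).
links = {
    "normal": (lambda mu: mu, lambda eta: eta),
    "poisson": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}

mu = 3.5
for family, (g, g_inv) in links.items():
    eta = g(mu)                           # eta = g(mu)
    assert abs(g_inv(eta) - mu) < 1e-12   # round trip: mu = g^{-1}(eta)
```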
Definition 1 An offset is a term \(\beta_j x_j\) in the linear predictor whose coefficient \(\beta_j\) is known a priori (often fixed at 1) and therefore requires no estimation.
Example 1 Consider modelling the annual number of hospital births in various cities. The annual number of births is discrete, so a Poisson distribution may be appropriate. But what might \(\mu_i\) (in city \(i\)) depend on? Larger cities have more people and hence more births, so \(\mu_i\) should scale with the population of city \(i\); modelling the birth rate per person suggests including \(\log(\text{population}_i)\) in the linear predictor as an offset with known coefficient 1.
Definition 2 A generalized linear model consists of two components:
1. Random component: the responses \(y_i\) are independent observations from the same EDM, with means \(\mu_i\) and known prior weights \(w_i\), so that \(y_i \sim \text{EDM}(\mu_i, \phi/w_i)\).
2. Systematic component: a linear predictor \(\eta_i = o_i + \beta_0 + \sum_{j=1}^p \beta_j x_{ji}\), where \(o_i\) is an offset (possibly zero), related to the mean through a link function: \(g(\mu_i) = \eta_i\).
The GLM is therefore
\[ \begin{cases} y_i \sim \text{EDM}(\mu_i, \phi/w_i) \\ g(\mu_i) = o_i + \beta_0 + \sum_{j=1}^p \beta_j x_{ji} \end{cases} \]
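A model of this form with a Poisson response and canonical log link can be fitted by iteratively reweighted least squares (IRLS). The following is a minimal sketch, assuming NumPy is available; the function name, the simulated birth data, and all variable names are our own illustration, not part of the notes:

```python
import numpy as np

def poisson_irls(X, y, offset=None, n_iter=50):
    """Fit a Poisson GLM with canonical log link by IRLS.

    offset -- known term o_i in the linear predictor (no coefficient).
    """
    n, p = X.shape
    o = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = o + X @ beta              # linear predictor, g(mu) = eta
        mu = np.exp(eta)                # inverse log link
        W = mu                          # working weights for the log link
        z = (eta - o) + (y - mu) / mu   # working response, offset removed
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Simulated example in the spirit of Example 1: counts whose mean scales
# with population, so log(population) enters as an offset.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
pop = rng.uniform(1e4, 1e6, size=n)
offset = np.log(pop)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-7.0, 0.3])
y = rng.poisson(np.exp(offset + X @ beta_true)).astype(float)
beta_hat = poisson_irls(X, y, offset=offset)
```

At convergence the estimate satisfies the score equations \(X^\top(y - \hat\mu) = 0\), which is the usual check that IRLS has found the MLE.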
Recall that the unit deviance is
\[ d(y,\mu) = 2\left(t(y,y) - t(y,\mu)\right). \]
The function
\[ D(y,\mu) = \sum\limits_{i=1}^n w_i d(y_i, \mu_i) \]
is called the deviance function. Its value is called the deviance or total deviance.
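For the Poisson distribution, \(t(y, \mu) = y\log\mu - \mu\), so the unit deviance works out to \(d(y,\mu) = 2\left[y\log(y/\mu) - (y - \mu)\right]\), with the convention \(y\log y = 0\) at \(y = 0\). A short sketch (function names are ours) computing the unit and total deviance:

```python
import math

def poisson_unit_deviance(y, mu):
    """d(y, mu) = 2*[y*log(y/mu) - (y - mu)], with y*log(y) = 0 at y = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))

def total_deviance(ys, mus, weights=None):
    """D(y, mu) = sum_i w_i * d(y_i, mu_i)."""
    ws = [1.0] * len(ys) if weights is None else weights
    return sum(w * poisson_unit_deviance(y, m)
               for w, y, m in zip(ws, ys, mus))
```

Note that \(d(y, y) = 0\), so the deviance is zero for a saturated fit and grows as each \(\mu_i\) moves away from \(y_i\).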
We can scale \(D\) by the dispersion parameter:
\[ D^*(y,\mu) = D(y,\mu)/\phi. \]
Its value is called the scaled deviance or scaled total deviance.
By using the dispersion model form of the EDM, the log-likelihood function is
\[ \begin{align*} \ell(\mu; y) &= \sum\limits_{i=1}^n \log b(y_i, \phi/w_i) - \frac{1}{2\phi} \sum\limits_{i=1}^n w_i d(y_i, \mu_i)\\ &= \sum\limits_{i=1}^n \log b(y_i, \phi/w_i) - \frac{D(y,\mu)}{2\phi} \end{align*} \]
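Because the first sum does not involve \(\mu\), differences in log-likelihood between two candidate mean vectors equal \(-\Delta D/(2\phi)\): maximizing the likelihood over \(\mu\) is the same as minimizing the deviance. This can be checked numerically for the Poisson (\(\phi = 1\), \(w_i = 1\)) using the exact log-pmf; the sketch and its names are ours:

```python
import math

def poisson_loglik(ys, mus):
    # sum of log Poisson pmf terms: y*log(mu) - mu - log(y!)
    return sum(y * math.log(m) - m - math.lgamma(y + 1)
               for y, m in zip(ys, mus))

def poisson_deviance(ys, mus):
    # D(y, mu) = sum 2*[y*log(y/mu) - (y - mu)], with y*log(y) = 0 at y = 0
    def d(y, m):
        term = y * math.log(y / m) if y > 0 else 0.0
        return 2.0 * (term - (y - m))
    return sum(d(y, m) for y, m in zip(ys, mus))

ys = [2.0, 0.0, 5.0]
mu1 = [1.5, 0.5, 4.0]
mu2 = [2.5, 1.0, 6.0]

lhs = poisson_loglik(ys, mu1) - poisson_loglik(ys, mu2)
rhs = -(poisson_deviance(ys, mu1) - poisson_deviance(ys, mu2)) / 2.0
# lhs and rhs agree: the b(y, phi/w) terms cancel in the difference
```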
In Chapter 3, we explored transformations of the response, \(y\). What were our goals? Broadly, to stabilize the variance and to make the relationship between the response and the predictors linear.
When \(V(\mu)\) represents the true mean-variance relationship for the responses, we can relate \(V(\mu)\) to a variance-stabilizing transformation: the transformation \(h\) with \(h'(\mu) \propto V(\mu)^{-1/2}\) makes \(\text{var}[h(y)] \approx h'(\mu)^2\, V(\mu)\, \phi\) approximately constant.
Thus, we can think of using linear regression after a transformation of \(y\) as roughly equivalent to a GLM with variance \(V(\mu) = 1/h'(\mu)^2\) and link function \(g(\mu) = h(\mu)\). In this way, transforming the response is an approximation of fitting a GLM.
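As a worked instance of this correspondence: for count data with \(V(\mu) = \mu\), as for the Poisson, the variance-stabilizing transformation satisfies
\[ h'(\mu) \propto \mu^{-1/2} \quad\Longrightarrow\quad h(\mu) = \sqrt{\mu} \]
up to multiplicative and additive constants. So regressing \(\sqrt{y}\) on the predictors roughly corresponds to a GLM with variance function \(V(\mu) = \mu\) and square-root link \(g(\mu) = \sqrt{\mu}\) — note this is not the canonical log link, which is one reason fitting the GLM directly is preferable to transforming.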
Two components: the random component (the EDM and its variance function) addresses the mean-variance relationship, while the systematic component (the link function and linear predictor) addresses linearity. A GLM therefore handles each goal separately, rather than asking a single transformation of \(y\) to achieve both at once.