Stat 203 Lecture 22
Recall that the linear predictor is
\[ \eta = \beta_0 + \sum\limits_{j=1}^p \beta_j x_j. \]
The systematic component of a GLM connects \(\eta\) to the mean \(\mu\) through a link function \(g\) so that \(g(\mu) = \eta\).
The canonical link function is a special link function such that
\[ \eta = \theta = g(\mu). \]
For the normal distribution, \(\theta = \mu\), so the canonical link function is the identity, \(g(\mu) = \mu\). Thus, \(\eta = \mu\).
For the Poisson distribution, \(\theta = \log\mu\), so the canonical link is \(g(\mu) = \log \mu = \eta\).
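As a small numeric sketch (the dictionary and names below are our own, not from the notes), the two canonical links above and their inverses can be checked as a round trip:

```python
import math

# Canonical links as (g, g_inverse) pairs: identity for the normal
# distribution, log for the Poisson (illustrative sketch only).
links = {
    "normal": (lambda mu: mu, lambda eta: eta),
    "poisson": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}

mu = 3.5
for family, (g, g_inv) in links.items():
    eta = g(mu)                           # eta = g(mu)
    assert abs(g_inv(eta) - mu) < 1e-12   # round trip: mu = g^{-1}(eta)
```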
Definition 1 An offset is a term \(\beta_j x_j\) in the linear predictor whose coefficient \(\beta_j\) is known a priori (often fixed at 1) and therefore requires no estimation.
Example 1 Consider modelling the annual number of hospital births in various cities. The annual number of births is discrete, so a Poisson distribution may be appropriate. But what might \(\mu_i\) (in city \(i\)) depend on? Larger cities have more people and hence more births, so \(\mu_i\) should scale with the population of city \(i\); modelling the birth rate per person suggests including \(\log(\text{population}_i)\) in the linear predictor as an offset with known coefficient 1.
Definition 2 A generalized linear model consists of two components:
1. Random component: the responses \(y_i\) are independent observations from the same EDM, with means \(\mu_i\) and known prior weights \(w_i\), so that \(y_i \sim \text{EDM}(\mu_i, \phi/w_i)\).
2. Systematic component: a linear predictor \(\eta_i = o_i + \beta_0 + \sum_{j=1}^p \beta_j x_{ji}\), where \(o_i\) is an offset (possibly zero), related to the mean through a link function: \(g(\mu_i) = \eta_i\).
The GLM is therefore
\[ \begin{cases} y_i \sim \text{EDM}(\mu_i, \phi/w_i) \\ g(\mu_i) = o_i + \beta_0 + \sum_{j=1}^p \beta_j x_{ji} \end{cases} \]
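A model of this form with a Poisson response and canonical log link can be fitted by iteratively reweighted least squares (IRLS). The following is a minimal sketch, assuming NumPy is available; the function name, the simulated birth data, and all variable names are our own illustration, not part of the notes:

```python
import numpy as np

def poisson_irls(X, y, offset=None, n_iter=50):
    """Fit a Poisson GLM with canonical log link by IRLS.

    offset -- known term o_i in the linear predictor (no coefficient).
    """
    n, p = X.shape
    o = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = o + X @ beta              # linear predictor, g(mu) = eta
        mu = np.exp(eta)                # inverse log link
        W = mu                          # working weights for the log link
        z = (eta - o) + (y - mu) / mu   # working response, offset removed
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Simulated example in the spirit of Example 1: counts whose mean scales
# with population, so log(population) enters as an offset.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
pop = rng.uniform(1e4, 1e6, size=n)
offset = np.log(pop)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-7.0, 0.3])
y = rng.poisson(np.exp(offset + X @ beta_true)).astype(float)
beta_hat = poisson_irls(X, y, offset=offset)
```

At convergence the estimate satisfies the score equations \(X^\top(y - \hat\mu) = 0\), which is the usual check that IRLS has found the MLE.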
Recall that the unit deviance is
\[ d(y,\mu) = 2\left(t(y,y) - t(y,\mu)\right). \]
The function
\[ D(y,\mu) = \sum\limits_{i=1}^n w_i d(y_i, \mu_i) \]
is called the deviance function. Its value is called the deviance or total deviance.
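For the Poisson distribution, \(t(y, \mu) = y\log\mu - \mu\), so the unit deviance works out to \(d(y,\mu) = 2\left[y\log(y/\mu) - (y - \mu)\right]\), with the convention \(y\log y = 0\) at \(y = 0\). A short sketch (function names are ours) computing the unit and total deviance:

```python
import math

def poisson_unit_deviance(y, mu):
    """d(y, mu) = 2*[y*log(y/mu) - (y - mu)], with y*log(y) = 0 at y = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))

def total_deviance(ys, mus, weights=None):
    """D(y, mu) = sum_i w_i * d(y_i, mu_i)."""
    ws = [1.0] * len(ys) if weights is None else weights
    return sum(w * poisson_unit_deviance(y, m)
               for w, y, m in zip(ws, ys, mus))
```

Note that \(d(y, y) = 0\), so the deviance is zero for a saturated fit and grows as each \(\mu_i\) moves away from \(y_i\).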
We can scale \(D\) by the dispersion parameter:
\[ D^*(y,\mu) = D(y,\mu)/\phi. \]
Its value is called the scaled deviance or scaled total deviance.
By using the dispersion model form of the EDM, the log-likelihood function is
\[ \begin{align*} \ell(\mu; y) &= \sum\limits_{i=1}^n \log b(y_i, \phi/w_i) - \frac{1}{2\phi} \sum\limits_{i=1}^n w_i d(y_i, \mu_i)\\ &= \sum\limits_{i=1}^n \log b(y_i, \phi/w_i) - \frac{D(y,\mu)}{2\phi} \end{align*} \]
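Because the first sum does not involve \(\mu\), differences in log-likelihood between two candidate mean vectors equal \(-\Delta D/(2\phi)\): maximizing the likelihood over \(\mu\) is the same as minimizing the deviance. This can be checked numerically for the Poisson (\(\phi = 1\), \(w_i = 1\)) using the exact log-pmf; the sketch and its names are ours:

```python
import math

def poisson_loglik(ys, mus):
    # sum of log Poisson pmf terms: y*log(mu) - mu - log(y!)
    return sum(y * math.log(m) - m - math.lgamma(y + 1)
               for y, m in zip(ys, mus))

def poisson_deviance(ys, mus):
    # D(y, mu) = sum 2*[y*log(y/mu) - (y - mu)], with y*log(y) = 0 at y = 0
    def d(y, m):
        term = y * math.log(y / m) if y > 0 else 0.0
        return 2.0 * (term - (y - m))
    return sum(d(y, m) for y, m in zip(ys, mus))

ys = [2.0, 0.0, 5.0]
mu1 = [1.5, 0.5, 4.0]
mu2 = [2.5, 1.0, 6.0]

lhs = poisson_loglik(ys, mu1) - poisson_loglik(ys, mu2)
rhs = -(poisson_deviance(ys, mu1) - poisson_deviance(ys, mu2)) / 2.0
# lhs and rhs agree: the b(y, phi/w) terms cancel in the difference
```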
In Chapter 3, we explored transformations of the response, \(y\). What were our goals? Broadly, to stabilize the variance and to make the relationship between the response and the predictors linear.
When \(V(\mu)\) represents the true mean-variance relationship for the responses, we can relate \(V(\mu)\) to a variance-stabilizing transformation: the transformation \(h\) with \(h'(\mu) \propto V(\mu)^{-1/2}\) makes \(\text{var}[h(y)] \approx h'(\mu)^2\, V(\mu)\, \phi\) approximately constant.
Thus, we can think of using linear regression after a transformation of \(y\) as roughly equivalent to a GLM with variance \(V(\mu) = 1/h'(\mu)^2\) and link function \(g(\mu) = h(\mu)\). In this way, transforming the response is an approximation of fitting a GLM.
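As a worked instance of this correspondence: for count data with \(V(\mu) = \mu\), as for the Poisson, the variance-stabilizing transformation satisfies
\[ h'(\mu) \propto \mu^{-1/2} \quad\Longrightarrow\quad h(\mu) = \sqrt{\mu} \]
up to multiplicative and additive constants. So regressing \(\sqrt{y}\) on the predictors roughly corresponds to a GLM with variance function \(V(\mu) = \mu\) and square-root link \(g(\mu) = \sqrt{\mu}\) — note this is not the canonical log link, which is one reason fitting the GLM directly is preferable to transforming.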
Two components: the random component (the EDM and its variance function) addresses the mean-variance relationship, while the systematic component (the link function and linear predictor) addresses linearity. A GLM therefore handles each goal separately, rather than asking a single transformation of \(y\) to achieve both at once.