Overdispersion and Goodness-of-Fit

Stat 203 Lecture 27

Dr. Janssen

Overdispersion

The idea

For a binomial proportion \(y\) based on \(m\) trials, \(\text{var}[y] = \mu (1-\mu)/m\). In practice, the observed variation can exceed this quantity, even for binomial-like data.

This is called overdispersion.
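A small simulation makes this concrete (illustrative only, not from the lecture; all names and numbers are made up): if the success probability itself varies between observations, the sample variance of the proportions exceeds the nominal binomial value.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 10_000         # trials per observation, number of observations
mu = 0.4                  # common mean success probability

# Pure binomial: the probability is exactly mu for every observation
y_bin = rng.binomial(m, mu, size=n) / m

# Binomial-like but overdispersed: the probability varies around mu
p = rng.beta(4, 6, size=n)              # Beta(4, 6) has mean 0.4
y_over = rng.binomial(m, p, size=n) / m

nominal = mu * (1 - mu) / m             # binomial variance of a proportion
print(nominal, y_bin.var(), y_over.var())
# y_over.var() comes out roughly 2-3 times the nominal binomial variance
```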

Estimating \(\phi\)

MLE?

Example: in normal linear regression, the MLE of \(\phi = \sigma^2\) is

\[ \hat{\sigma}^2 = \frac{1}{n} \sum\limits_{i=1}^n w_i (y_i - \hat{\mu}_i)^2, \]

which is biased and so is rarely used in practice.

Instead:

\[ s^2 = \frac{1}{n-p'}\sum\limits_{i=1}^n w_i (y_i - \hat{\mu}_i)^2, \]

where \(p'\) is the number of regression parameters, including the intercept.
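A quick numerical sketch of the two estimators (simulated data, unit prior weights assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_prime = 50, 2        # p' = number of regression parameters, incl. intercept
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=2.0, size=n)  # true sigma^2 = 4

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta_hat) ** 2)   # weighted RSS with w_i = 1

sigma2_mle = rss / n                    # MLE: biased downwards
s2 = rss / (n - p_prime)                # unbiased estimator s^2
print(sigma2_mle, s2)
```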

(Modified) Profile Log-Likelihood Estimator

The profile log-likelihood for \(\phi\) is

\[ \ell(\phi) = \ell(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p, \phi; y), \]

where the \(\hat{\beta}_j\) are the maximum likelihood estimates of the regression coefficients with \(\phi\) held fixed.

The modified profile log-likelihood is

\[ \ell^0(\phi) = \frac{p'}{2} \log(\phi) + \ell(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p, \phi; y). \]
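As a check (this step is not on the slide, but follows directly), consider the normal linear model, where \(\ell(\phi) = -\frac{n}{2}\log(2\pi\phi) - \frac{1}{2\phi}\sum_{i=1}^n w_i(y_i - \hat{\mu}_i)^2\). Setting \(d\ell^0/d\phi = 0\) gives

\[ \frac{p'}{2\phi} - \frac{n}{2\phi} + \frac{1}{2\phi^2}\sum\limits_{i=1}^n w_i (y_i - \hat{\mu}_i)^2 = 0 \quad\Longrightarrow\quad \hat{\phi}^0 = \frac{1}{n-p'}\sum\limits_{i=1}^n w_i (y_i - \hat{\mu}_i)^2 = s^2, \]

so the \(\frac{p'}{2}\log\phi\) term exactly removes the bias of the MLE in this case.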

Mean Deviance Estimator of \(\phi\)

Another approach is to use the mean deviance estimator of \(\phi\):

\[ \tilde{\phi} = \frac{D(y,\hat{\mu})}{n-p'}. \]
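Once a model is fitted, this estimator is one line of code. A minimal sketch in Python using statsmodels (the data are simulated here; `deviance` and `df_resid` are attributes of a fitted statsmodels GLM, with `df_resid` \(= n - p'\)):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.3 * x)
# Overdispersed counts: negative binomial draws, deliberately fitted as Poisson
y = rng.negative_binomial(2, 2 / (2 + mu))

res = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()

phi_tilde = res.deviance / res.df_resid   # mean deviance estimator of phi
print(phi_tilde)                          # values well above 1 indicate overdispersion
```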

Pearson Estimator of \(\phi\)

The Pearson statistic is the working RSS:

\[ X^2 = \sum\limits_{i=1}^n w_i (z_i - \hat{\eta}_i)^2 = \sum\limits_{i=1}^n \frac{w_i (y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)}. \]

The Pearson estimator of \(\phi\) is then

\[ \overline{\phi} = \frac{X^2}{n-p'}. \]
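Continuing the sketch above with the same fitted model `res`, the Pearson estimator can be computed by hand (for the Poisson family, \(V(\mu) = \mu\) and \(w_i = 1\)) or read off statsmodels directly:

```python
mu_hat = res.fittedvalues
X2 = np.sum((y - mu_hat) ** 2 / mu_hat)          # Pearson statistic
phi_bar = X2 / res.df_resid                      # Pearson estimator of phi
print(phi_bar, res.pearson_chi2 / res.df_resid)  # the two agree
```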

Which is best?

  • \(\hat{\phi}\) (the MLE) is biased and so is rarely used
  • \(\hat{\phi}^0\) has excellent theoretical properties but is harder to compute
  • The mean deviance and Pearson estimators are convenient to compute
  • The mean deviance estimator behaves well when the saddlepoint approximation holds
  • The Pearson estimator is almost universally applicable

Goodness-of-fit

The idea

Compare the current model (Model A) to an alternative Model B of a particular type, typically the largest model we can fit to the data (the saturated model).

If the goodness-of-fit test rejects, we have evidence that the current model does not describe the data adequately.

Pearson Goodness-of-Fit Test

The usual large-sample asymptotics do not apply, because the number of parameters in the saturated model increases with the number of observations.

Some rules of thumb for small-dispersion asymptotics are given on p. 277; when these hold, the Pearson goodness-of-fit statistic is approximately chi-square with \(n - p'\) degrees of freedom.
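When such a rule of thumb holds, the test itself is a one-liner. Reusing the fitted Poisson model `res` from the earlier sketch:

```python
from scipy import stats

# Compare the Pearson statistic to chi-square with n - p' degrees of freedom
p_value = stats.chi2.sf(res.pearson_chi2, res.df_resid)
print(res.pearson_chi2, res.df_resid, p_value)
# a small p-value is evidence of lack of fit (here, the overdispersion we built in)
```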

Back to binomial

Two causes

  1. The probabilities \(\mu_i\) vary between observations, even when all the explanatory variables are unchanged.

  2. Alternatively, the \(m_i\) cases, of which observation \(y_i\) is a proportion, are not independent.

Lack of independence

Example: positive cases arrive in clusters rather than as individual cases.

In this case, writing \(\rho\) for the correlation between the Bernoulli trials that make up an observation, we find \(\phi_i = 1 + (m_i-1)\rho\).
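To see where this comes from (a standard calculation, not shown on the slide): write \(y_i = m_i^{-1}\sum_{j=1}^{m_i} z_{ij}\), where the \(z_{ij}\) are Bernoulli(\(\mu_i\)) trials with pairwise correlation \(\rho\). Then

\[ \text{var}[y_i] = \frac{1}{m_i^2}\left\{ m_i\,\mu_i(1-\mu_i) + m_i(m_i-1)\,\rho\,\mu_i(1-\mu_i) \right\} = \frac{\mu_i(1-\mu_i)}{m_i}\left[1 + (m_i-1)\rho\right], \]

so positive within-observation correlation inflates the nominal binomial variance by exactly the factor \(\phi_i = 1 + (m_i-1)\rho\).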