Stat 203 Lecture 17
Exercise. Show that \(\mathcal{I}(\zeta) = E[U(\zeta)^2] = \text{var}[U(\zeta)]\) (use the fact that \(E[U(\zeta)] = 0\)).
A Taylor series expansion of the log-likelihood around \(\zeta = \hat\zeta\) shows that
\[ \text{var}[\hat\zeta] \approx 1/\mathcal{I}(\zeta). \]
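To sketch the argument (under the usual regularity conditions): since \(U(\hat\zeta) = 0\), a first-order expansion of the score about the true \(\zeta\) gives
\[ 0 = U(\hat\zeta) \approx U(\zeta) - (\hat\zeta - \zeta)\,\mathcal{I}(\zeta), \quad\text{so}\quad \hat\zeta - \zeta \approx \frac{U(\zeta)}{\mathcal{I}(\zeta)}, \]
after replacing \(-U'(\zeta)\) by its expectation \(\mathcal{I}(\zeta)\). Taking variances and using \(\text{var}[U(\zeta)] = \mathcal{I}(\zeta)\) then gives \(\text{var}[\hat\zeta] \approx \mathcal{I}(\zeta)/\mathcal{I}(\zeta)^2 = 1/\mathcal{I}(\zeta)\).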
For regression models, the log-likelihood function is
\[ \ell (\beta_0, \beta_1, \ldots, \beta_p; y) = \sum\limits_{i=1}^n \log\mathcal{P}(y_i; \mu_i, \phi). \]
The score functions have the form:
\[ U(\beta_j) = \frac{\partial \ell(\beta_0, \beta_1, \ldots, \beta_p ; y)}{\partial\beta_j} = \sum\limits_{i=1}^n \frac{\partial \log\mathcal{P}(y_i; \mu_i, \phi)}{\partial\mu_i} \frac{\partial\mu_i}{\partial\beta_j}. \]
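As a concrete case (the one used in the example below): for a Bernoulli response with \(\mathcal{P}(y; \mu) = \mu^y(1-\mu)^{1-y}\) and the logit link \(\eta = \log\{\mu/(1-\mu)\} = \beta_0 + \beta_1 x\),
\[ \frac{\partial \log\mathcal{P}(y_i; \mu_i)}{\partial\mu_i} = \frac{y_i - \mu_i}{\mu_i(1-\mu_i)} \quad\text{and}\quad \frac{\partial\mu_i}{\partial\beta_j} = \mu_i(1-\mu_i)x_{ij}, \]
where \(x_{i0} = 1\) and \(x_{i1} = x_i\), so the score functions reduce to \(U(\beta_j) = \sum_{i=1}^n (y_i - \mu_i)x_{ij}\).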
Example: Quilpie rainfall data
Recall that \(\mu = \text{Pr}(y=1)\) is the probability that the 10mm rain threshold is exceeded. A direct linear model would be
\[ \mu = \beta_0 + \beta_1 x, \]
but this is a bad choice: the right-hand side can take any real value, while \(\mu\) must lie between 0 and 1.
A better possibility is to model the log-odds of exceedance as a linear function of \(x\):
\[ \log \frac{\mu}{1-\mu} = \eta = \beta_0 + \beta_1 x. \]
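This model can be fitted by maximum likelihood in a few lines. A minimal sketch using statsmodels; the arrays here are made-up stand-ins, not the actual Quilpie data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-in arrays (NOT the actual Quilpie data):
# y_i = 1 if the 10mm threshold was exceeded in year i, x_i = the covariate.
x = np.array([-10.0, -5.0, 0.0, 3.0, 7.0, 12.0])
y = np.array([0, 0, 1, 0, 1, 1])

X = sm.add_constant(x)  # design matrix with an intercept column
model = sm.GLM(y, X, family=sm.families.Binomial())  # logit link is the default
fit = model.fit()
print(fit.params)  # estimates of beta_0 and beta_1
print(fit.bse)     # standard errors, from the information matrix
```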
When we have more than one parameter:
\[ \mathcal{J}_{jk}(\beta) = - \frac{\partial U(\beta_j)}{\partial \beta_k} = - \frac{dU(\beta_j)}{d\mu} \frac{\partial \mu}{\partial \beta_k}. \]
The expected information is \(\mathcal{I}_{jk}(\beta) = E[\mathcal{J}_{jk}(\beta)]\).
The expected information relating to \(\beta_j\) is \(\mathcal{I}_{jj}(\beta)\).
For the Quilpie logistic regression model, we find:
\[ \begin{align*} \mathcal{J}_{00}(\beta) &= - \frac{\partial U(\beta_0)}{\partial\beta_0} = - \frac{d U(\beta_0)}{d\mu} \frac{\partial \mu}{\partial \beta_0} = \sum\limits_{i=1}^n \mu_i (1-\mu_i); \\ \mathcal{J}_{11}(\beta) &= - \frac{\partial U(\beta_1)}{\partial\beta_1} = - \frac{d U(\beta_1)}{d\mu} \frac{\partial \mu}{\partial \beta_1} = \sum\limits_{i=1}^n \mu_i (1-\mu_i)x_i^2; \\ \mathcal{J}_{01}(\beta) = \mathcal{J}_{10}(\beta) &= - \frac{\partial U(\beta_1)}{\partial\beta_0} = - \frac{d U(\beta_1)}{d\mu} \frac{\partial \mu}{\partial \beta_0} = \sum\limits_{i=1}^n \mu_i (1-\mu_i)x_i. \end{align*} \]
Similarly:
\[ \text{var}[\hat{\beta}_j] \approx 1/\mathcal{I}_{jj}(\beta), \]
which means that \(\text{se}(\hat{\beta}_j) \approx \mathcal{I}_{jj}(\hat{\beta})^{-1/2}\).
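Putting the pieces together numerically; a small sketch in which the covariate values and the fitted coefficients are made up for illustration:

```python
import numpy as np

def observed_information(x, beta0, beta1):
    """J(beta) for the Bernoulli/logit model with a single covariate x."""
    mu = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))  # inverse logit
    w = mu * (1.0 - mu)                              # the weights mu_i(1 - mu_i)
    return np.array([[w.sum(),       (w * x).sum()],
                     [(w * x).sum(), (w * x**2).sum()]])

x = np.array([-10.0, -5.0, 0.0, 3.0, 7.0, 12.0])     # hypothetical covariate
J = observed_information(x, beta0=-0.1, beta1=0.15)  # hypothetical beta-hat
se = 1.0 / np.sqrt(np.diag(J))   # se(beta_j) ~ I_jj(beta-hat)^(-1/2)
print(se)
```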
The MLE of \(\zeta\), denoted \(\hat{\zeta}\), has the following properties: it is asymptotically unbiased, asymptotically normally distributed, and asymptotically efficient, with variance approaching \(1/\mathcal{I}(\zeta)\).
To illustrate, the Poisson distribution has the probability function
\[ \mathcal{P}(y; \mu) = \frac{\exp(-\mu)\mu^y}{y!} \]
for \(\mu > 0\) and where \(y\) is a nonnegative integer. Consider estimating the mean \(\mu\) for the Poisson distribution, based on a sample \(y_1, y_2, \ldots, y_n\).
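A short worked version of this estimation problem (standard results, stated here for completeness): the log-likelihood, score, and expected information are
\[ \ell(\mu; y) = \sum\limits_{i=1}^n \left( y_i \log\mu - \mu - \log y_i! \right), \qquad U(\mu) = \frac{1}{\mu}\sum\limits_{i=1}^n y_i - n, \qquad \mathcal{I}(\mu) = \frac{n}{\mu}, \]
so solving \(U(\hat\mu) = 0\) gives \(\hat\mu = \bar{y}\), with \(\text{var}[\hat\mu] \approx 1/\mathcal{I}(\mu) = \mu/n\).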