Stat 203 Lecture 3
Definition 1 A regression model is a statistical model that assumes the mean response \(\mu_i\) for observation \(i\) depends on the \(p\) explanatory variables \(x_{1i}, x_{2i}, \ldots, x_{pi}\) via some general function \(f\) through a number of regression parameters \(\beta_j\):
\[ E[y_i] = \mu_i = f(x_{1i}, \ldots,x_{pi}; \beta_0, \beta_1, \ldots, \beta_q). \]
More to the point (for this class):
\[ \mu_i = f(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) \qquad(1)\]
Regression models with the form given in Equation 1 are said to be linear in the parameters. All the models we discuss this semester have this form.
The component \(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}\) is called the linear predictor.
Definition 2 (Linear Regression) The systematic component of a linear regression model assumes the form
\[ E[y_i] = \mu_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}, \] while the randomness is assumed to have constant variance \(\sigma^2\) about \(\mu_i\).
Definition 3 (Generalized Linear Model) The systematic component of a generalized linear model assumes the form
\[ \begin{align*} \mu_i &= g^{-1}(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})\\ \text{or: } g(\mu_i) &= \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}, \end{align*} \] where \(g\), called the link function, is a monotonic, differentiable function.
The randomness is given via a specific family of probability distributions, chosen based on the situation.
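As a small sketch of what a link function looks like in practice, the log link (common for Poisson GLMs) and its inverse can be written directly; the parameter values below are made up for illustration:

```r
# Sketch: the log link g and its inverse, as used in a Poisson GLM.
# g is monotonic and differentiable, and g^{-1} maps the linear
# predictor back to the mean scale (mu > 0).
g     <- function(mu)  log(mu)   # link function
g_inv <- function(eta) exp(eta)  # inverse link

eta <- 0.5 + 0.3 * 4   # an example linear predictor (made-up betas)
mu  <- g_inv(eta)      # mean response implied by that predictor
```

Note that whatever the value of the linear predictor, `g_inv` returns a positive mean, which is one reason the log link pairs naturally with count responses.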
The following notational conventions apply.
lime
Recall: the lime data contains measurements on 385 small-leaved lime trees grown in Russia.

- Foliage: the foliage biomass, in kg (oven-dried matter)
- DBH: the tree diameter at breast height, in cm
- Age: the age of the tree, in years
- Origin: the origin of the tree; one of Coppice, Natural, or Planted
One potential linear regression model is: \[ \begin{cases} \text{var}[y_i] = \sigma^2 & \text{(random component)}\\ \mu_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 & \text{(systematic component)} \end{cases} \qquad(2)\]
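With Origin a three-level factor, the model in Equation 2 needs two dummy variables (here \(x_3\) and \(x_4\)) in addition to DBH (\(x_1\)) and Age (\(x_2\)). A minimal sketch of how R builds those dummy variables, using a small made-up data frame in place of the lime data:

```r
# Sketch: how a three-level factor (Origin) becomes two dummy
# variables in the linear predictor of Equation 2.
# The data frame here is invented purely for illustration.
trees <- data.frame(
  DBH    = c(10.2, 15.1, 8.7),
  Age    = c(20, 35, 12),
  Origin = factor(c("Coppice", "Natural", "Planted"))
)

# model.matrix() shows the columns of the linear predictor:
# an intercept, DBH, Age, and one dummy per non-baseline level.
X <- model.matrix(~ DBH + Age + Origin, data = trees)
colnames(X)
```

With the default treatment contrasts, Coppice (first alphabetically) is the baseline, so the two dummy columns are `OriginNatural` and `OriginPlanted`.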
Other possible systematic components?
\[ \begin{align} \mu &= \beta_0 &+ \beta_1 x_1 & & & \\ \mu &= \beta_0 & &+ \beta_2 x_2 &+ \beta_3 x_3 &+ \beta_4 x_4\\ \mu &= \beta_0 &+ \beta_1 x_1 &+ \beta_2 x_2 & \\ \mu &= & \beta_1 x_1 &+ \beta_2 x_2 x_3 &+ \beta_3 x_3\\ 1/\mu &= \beta_0 &+ \beta_1 x_1 &+ \beta_2 x_2 &\\ \log(\mu) &= \beta_0 &+ \beta_1 x_1 &+ \beta_2 x_2 &+ \beta_3 x_3\\ \mu &= \beta_0 &+ \beta_1 x_1^2 &+ \exp(\beta_2 x_2^4) &+ \beta_3 x_3 & + \beta_4 x_4 \end{align} \]
In a Poisson process, we count the number of events per unit of time or space. We assume:
Examples?
Definition 4 (Poisson Probability Function) Given the expected count \(\mu > 0\) for a Poisson process occurring over a unit of time/space, the probability that \(y\) events will occur in one of the units of time/space is given by the function
\[ \mathcal{P}(y; \mu) = \frac{\exp(-\mu) \mu^y}{y!}. \qquad(3)\]
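As a quick sanity check, the probability function in Equation 3 can be coded directly and compared against R's built-in `dpois()`:

```r
# The Poisson probability function from Equation 3.
pois_pf <- function(y, mu) exp(-mu) * mu^y / factorial(y)

# Compare against R's built-in Poisson density for mu = 2, y = 3.
pois_pf(3, 2)
dpois(3, lambda = 2)  # should match
```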
The noisy miner data contains the variables: Miners, Eucs, Area, Grazed, Shrubs, Bulokes, Timber, Minerab.
A possible Poisson GLM for this data is
\[ \begin{cases} y\sim \text{Pois}(\mu) & \text{(random component)}\\ \mu = \exp(\beta_0 + \beta_1 x) & \text{(systematic component)} \end{cases} \qquad(4)\]
where \(y\) is the number of noisy miners in a given hectare, and \(x\) is the number of eucalypt trees at the given location.
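A model of the form in Equation 4 can be fit in R with `glm()` and a log-linked Poisson family. The sketch below uses simulated counts (with made-up coefficients) rather than the actual miner data, so it is self-contained:

```r
# Sketch: fitting a Poisson GLM of the form in Equation 4.
# The data are simulated here; with the real data you would fit
# something like glm(Miners ~ Eucs, family = poisson, data = ...).
set.seed(203)
x <- runif(100, 0, 30)                       # eucalypt counts (made up)
y <- rpois(100, lambda = exp(-1 + 0.1 * x))  # simulated miner counts

fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)  # estimates of beta0 and beta1 on the log scale
```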
Question Given a one-unit change in \(x\), how should each of the two systematic components in Equation 5 be interpreted to describe the corresponding change in \(\mu\)?
\[ \begin{align} \mu &= \beta_0 + \beta_1 x\\ \log \mu &= \beta_0 + \beta_1 x \end{align} \qquad(5)\]
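The question above can be checked numerically. Under the identity link, a one-unit change in \(x\) adds \(\beta_1\) to \(\mu\); under the log link, it multiplies \(\mu\) by \(\exp(\beta_1)\). A sketch with made-up parameter values:

```r
# Numeric check of the two interpretations in Equation 5,
# using invented values beta0 = 1, beta1 = 0.2.
beta0 <- 1
beta1 <- 0.2

# Identity link: mu changes additively by beta1 per unit of x.
mu_id <- function(x) beta0 + beta1 * x
mu_id(6) - mu_id(5)     # equals beta1

# Log link: mu changes multiplicatively by exp(beta1) per unit of x.
mu_log <- function(x) exp(beta0 + beta1 * x)
mu_log(6) / mu_log(5)   # equals exp(beta1)
```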
Choose a dataset from the GLMsData
package and write a rough draft of a regression model. Justify your choice of systematic component.