Stat 203 Lecture 24
For the following research questions, describe the explanatory and response variables, and classify each type response as numerical, binomial, or a count.
Examples of responses that are proportions?
For these examples, a binomial distribution may be appropriate for modeling the responses:
The binomial probability function is
\[ \mathcal{P}(y; \mu, m) = \binom{m}{my}\mu^{my} (1-\mu)^{m(1-y)} \]
where:
For the binomial distribution, the unit deviance is
\[ d(y,\mu) = 2 \left( y\log \frac{y}{\mu} + (1-y) \log \frac{1-y}{1-\mu} \right) \]
When \(y = 0, 1\) the limit form of the unit deviance is used.
Interlude on estimating parameters (Ch. 6).
After computing the MLEs \(\hat{\beta}_j\) and corresponding fitted values \(\hat{\mu}\), the residual deviance is the minimized total deviance
\[ D(y,\hat{\mu}) = \sum\limits_{i=1}^n w_i d(y_i, \hat{\mu}_i). \]
For binomial GLMs:
\[ D(y,\hat{\mu}) = \sum_{i=1}^n m_i d(y_i,\hat{\mu}_i) \sim \chi_{n-p'}^2 \]
for a model with \(p'\) parameters in the linear predictor.
turbines
The logit (or logistic) link function is the canonical link for the binomial distribution and the default in R
:
\[ \eta = \log \frac{\mu}{1-\mu} = \text{logit}(\mu). \]
A binomial GLM with the logit link function is often called a logistic regression model. It is specified in R
using link ="logit"
.
The probit link function is
\[ \eta = \Phi^{-1}(\mu) = \text{probit}(\mu), \]
where \(\Phi\) is the CDF for the normal distribution. This link is specified in R
as link="probit"
.
The complementary log-log link function is
\[ \eta = \log\left( - \log(1-\mu)\right). \]
This is specified in R
as link="cloglog"
.
turbines
turbines
library(GLMsData); data(turbines)
tr.logit <- glm( Fissures/Turbines ~ Hours, data=turbines,
family=binomial, weights=Turbines)
tr.probit <- update( tr.logit, family=binomial(link="probit") )
tr.cll <- update( tr.logit, family=binomial(link="cloglog") )
tr.array <- rbind( coef(tr.logit), coef(tr.probit), coef(tr.cll))
tr.array <- cbind( tr.array, c(deviance(tr.logit),
deviance(tr.probit), deviance(tr.cll)) )
colnames(tr.array) <- c("Intercept", "Hours","Residual dev.")
rownames(tr.array) <- c("Logit","Probit","Comp log-log")
tr.array
Intercept Hours Residual dev.
Logit -3.923597 0.0009992372 10.331466
Probit -2.275807 0.0005783211 9.814837
Comp log-log -3.603280 0.0008104936 12.227914
turbines
newHrs <- seq( 0, 5000, length=100)
newdf <- data.frame(Hours=newHrs)
newP.logit <- predict( tr.logit, newdata=newdf, type="response")
newP.probit <- predict( tr.probit, newdata=newdf, type="response")
newP.cll <- predict( tr.cll, newdata=newdf, type="response")
plot( Fissures/Turbines ~ Hours, data=turbines, pch=19, las=1,
xlim=c(0, 5000), ylim=c(0, 0.7),
xlab="Hours run", ylab="Proportion with fissures")
lines( newP.logit ~ newHrs, lty=1, lwd=2)
lines( newP.probit ~ newHrs, lty=2, lwd=2)
lines( newP.cll ~ newHrs, lty=4, lwd=2)
legend("topleft", lwd=2, lty=c(1, 2, 4),
legend=c("Logit","Probit","Comp. log-log"))