Hands-on with GLMs I

Stat 203 Lecture 23

Dr. Janssen

Explorations

Exploring blocks

Children were asked to build towers as high as they could out of cubical and cylindrical blocks. The number of blocks used and time taken were recorded and are available in the blocks dataset in the GLMsData package. For now, we’ll consider only the number of blocks \(y\) and the age of the child \(x\).

Explore the data; plot the number of blocks used against the age of the child. Propose a GLM for the data, identifying the EDM, parameters, and systematic component.

Code
library(GLMsData); data(blocks);

par(mfrow=c(1, 2))
plot(jitter(Number)~Age, data=blocks)
plot( Number~cut(Age, 3), data=blocks)

Case Study: Household Size in the Philippines

Setting the Stage

How many other people live with you in your home? The number of people sharing a house differs from country to country and often from region to region. International agencies use household size when determining needs of populations, and the household sizes determine the magnitude of the household needs.

Regions of the Philippines.

Importing Data

fHH1 <- read.csv("https://prof.mkjanssen.org/glm/data/fHH1.csv")
head(fHH1)
  X     location age total numLT5                          roof
1 1 CentralLuzon  65     0      0 Predominantly Strong Material
2 2  MetroManila  75     3      0 Predominantly Strong Material
3 3  DavaoRegion  54     4      0 Predominantly Strong Material
4 4      Visayas  49     3      0 Predominantly Strong Material
5 5  MetroManila  74     3      0 Predominantly Strong Material
6 6      Visayas  59     6      0 Predominantly Strong Material
  • location = where the house is located (Central Luzon, Davao Region, Ilocos Region, Metro Manila, or Visayas)
  • age = the age of the head of household
  • total = the number of people in the household other than the head
  • numLT5 = the number in the household under 5 years of age
  • roof = the type of roof in the household (either Predominantly Light/Salvaged Material, or Predominantly Strong Material, where stronger material can sometimes be used as a proxy for greater wealth)

Questions

Exploration

In groups of 2-3, brainstorm some questions about this data.

Exploratory Data Analysis

Code
m <- mean(fHH1$total)
v <- var(fHH1$total)
c(Mean=m,Variance=v)
    Mean Variance 
3.684667 5.534254 
Code
#sd(fHH1$total)

ggplot(fHH1, aes(total)) + 
  geom_histogram(binwidth = .25, color = "black", 
                 fill = "white") + 
  xlab("Number in the house excluding head of household") +
  ylab("Count of households")

Code
cuts = cut(fHH1$age,
           breaks=c(15,20,25,30,35,40,45,50,55,60,65,70))
ageGrps <- data.frame(cuts,fHH1)

ggplot(data = ageGrps, aes(x = total)) +
  geom_histogram(binwidth = .25, color = "black", 
                 fill = "white") +
  facet_wrap(cuts) +
  xlab("Household size")

Fitting a GLM

modela = glm(total ~ age, family = poisson, data = fHH1)

Fitting a GLM

modela = glm(total ~ age, family = poisson, data = fHH1)
coef(summary(modela))
                Estimate   Std. Error   z value      Pr(>|z|)
(Intercept)  1.549942225 0.0502754106 30.829032 1.070156e-208
age         -0.004705881 0.0009363388 -5.025832  5.012548e-07

Question

How can we interpret these coefficients?

Deviance

modela_dev <- summary(modela)$deviance
modela_disp <- summary(modela)$dispersion

c(Deviance=modela_dev, Dispersion=modela_disp)
  Deviance Dispersion 
  2337.089      1.000 

Confidence Intervals

confint(modela)

Confidence Intervals

confint(modela)
                   2.5 %       97.5 %
(Intercept)  1.451170100  1.648249185
age         -0.006543163 -0.002872717
exp(confint(modela))
                2.5 %    97.5 %
(Intercept) 4.2681057 5.1978713
age         0.9934782 0.9971314

Question

How should these be interpreted? What does significance look like?

Your Turn

Exploration: Singapore Auto

Data: https://prof.mkjanssen.org/glm/data/SingaporeAuto.csv

Description: https://prof.mkjanssen.org/glm/data/SingaporeAutoDescriptions.pdf