GLM: framework that unifies many types of regression
Hands-on work in R
Interactive lecture-style class
Required text: Generalized Linear Models with Examples in R
Types of Work
Weekly homework, typed and submitted to Canvas as an R Markdown file (due Fridays)
Two midterm exams: October 4, November 10 (take-home possible)
Final exam: Monday, December 18, 1:15-3:15pm
Applied project
Canvas tour
Set up R
Open your computers and fire up R. Then install the following packages (instructions on p. 505 of the text):
GLMsData
MASS
statmod
tweedie
Then load each package, either by checking its box in the Packages pane of RStudio or by typing, e.g., library(GLMsData) in the console.
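The setup can also be scripted. The sketch below checks which of the four packages are already installed, installs only the missing ones, and attaches the rest. The names pkgs, installed, and to_install are illustrative, and the interactive() guard is an assumption made here so that nothing is downloaded while an R Markdown file is being knitted.

```r
# Packages used in this course (names taken from the list above)
pkgs <- c("GLMsData", "MASS", "statmod", "tweedie")

# TRUE for each package that is already installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
to_install <- pkgs[!installed]

# Install only in an interactive session
# (assumption: avoids downloads while knitting a document)
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Attach every package that was already installed
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```

After installing anything new, rerun the last loop (or call library() by hand) to attach the newly installed packages.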
Section 1.2: Describing Data
Example: Describing data
library(GLMsData) # Load GLMsData package (if not loaded already)
data(lungcap) # Make the dataset lungcap available for use
head(lungcap) # Show the first few lines of lungcap
  Age   FEV Ht Gender Smoke
1   3 1.072 46      F     0
2   4 0.839 48      F     0
3   4 1.102 48      F     0
4   4 1.389 48      F     0
5   4 1.577 49      F     0
6   4 1.418 49      F     0
head(lungcap$Age) # Show the first six values of Age
[1] 3 4 4 4 4 4
tail(lungcap$Gender) # Show the last six values of Gender
[1] M M M M M M
Levels: F M
length(lungcap$Age) # Show the number of values of Age
[1] 654
dim(lungcap) # Show the number of rows and columns
[1] 654 5
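To try these functions without the package, the sketch below re-types the six rows printed by head(lungcap) into a small data frame. The name lung6 is made up for this example, and the results describe these six rows only, not the full dataset.

```r
# First six rows of lungcap, re-typed from the head(lungcap) output above
lung6 <- data.frame(
  Age    = c(3, 4, 4, 4, 4, 4),
  FEV    = c(1.072, 0.839, 1.102, 1.389, 1.577, 1.418),
  Ht     = c(46, 48, 48, 48, 49, 49),
  Gender = factor(rep("F", 6), levels = c("F", "M")),
  Smoke  = c(0, 0, 0, 0, 0, 0)
)

dim(lung6)          # Rows, then columns: 6 5
nrow(lung6)         # Same as dim(lung6)[1]
ncol(lung6)         # Same as dim(lung6)[2]
length(lung6$Age)   # One value per row, so this equals nrow(lung6)
names(lung6)        # Column names

# Two equivalent ways to extract a column
identical(lung6$Age, lung6[["Age"]])   # TRUE
```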
Talking about data
\(n\) denotes the number of observations in the dataset; for lungcap, \(n = 654\)
We use \(y\) to denote the response; \(y_i\) refers to the \(i\)th value of the response
We typically use \(x\)’s to denote explanatory variables: \(x_1\) is the first explanatory variable, \(x_{1,1}\) the first value of the first explanatory variable, etc.
Factors are explanatory variables that are qualitative, like gender
Covariates are explanatory variables that are quantitative
summary(lungcap) # Summarize each variable in lungcap
      Age              FEV              Ht        Gender      Smoke
 Min.   : 3.000   Min.   :0.791   Min.   :46.00   F:318   Min.   :0.00000
 1st Qu.: 8.000   1st Qu.:1.981   1st Qu.:57.00   M:336   1st Qu.:0.00000
 Median :10.000   Median :2.547   Median :61.50           Median :0.00000
 Mean   : 9.931   Mean   :2.637   Mean   :61.14           Mean   :0.09939
 3rd Qu.:12.000   3rd Qu.:3.119   3rd Qu.:65.50           3rd Qu.:0.00000
 Max.   :19.000   Max.   :5.793   Max.   :74.00           Max.   :1.00000
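The notation can be tied to R objects. This is a sketch on the six rows printed by head(lungcap); treating FEV as the response and Age as the first explanatory variable is an assumption made for illustration, and the names lung6, explanatory, and kinds are made up here.

```r
# First six rows of lungcap, re-typed from the head(lungcap) output
lung6 <- data.frame(
  Age    = c(3, 4, 4, 4, 4, 4),
  FEV    = c(1.072, 0.839, 1.102, 1.389, 1.577, 1.418),
  Ht     = c(46, 48, 48, 48, 49, 49),
  Gender = factor(rep("F", 6), levels = c("F", "M"))
)

n  <- nrow(lung6)   # n: number of observations (6 here; 654 in the full dataset)
y  <- lung6$FEV     # y: the response (assumed here to be FEV)
x1 <- lung6$Age     # x_1: the first explanatory variable (assumed here to be Age)

y[2]    # y_2, the second value of the response: 0.839
x1[1]   # x_{1,1}, the first value of the first explanatory variable: 3

# Label each explanatory variable as a factor (qualitative)
# or a covariate (quantitative), based on how R stores it
explanatory <- lung6[, c("Age", "Ht", "Gender")]
kinds <- ifelse(sapply(explanatory, is.factor), "factor", "covariate")
kinds
```

This check only looks at how a variable is stored, so a qualitative variable coded as numbers would be mislabeled as a covariate until it is converted to a factor.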
On Smoke
The variable Smoke is qualitative, but stored as a 0 or 1. We can make explicit that Smoke is a factor as follows:
lungcap$Smoke <- factor(lungcap$Smoke,
                        levels = c(0, 1),                      # The values of Smoke
                        labels = c("Non-smoker", "Smoker"))    # The labels
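To see what the conversion changes, here is a minimal sketch on a short made-up vector. The values in smoke_raw are hypothetical and are not taken from lungcap.

```r
# Hypothetical 0/1 codes, invented for illustration (not from lungcap)
smoke_raw <- c(0, 1, 0, 0, 1)

smoke <- factor(smoke_raw,
                levels = c(0, 1),                      # The values in the data
                labels = c("Non-smoker", "Smoker"))    # The labels shown instead

levels(smoke)        # "Non-smoker" "Smoker"
table(smoke)         # Counts per level: 3 non-smokers, 2 smokers
summary(smoke_raw)   # Numeric summary: minimum, quartiles, mean, maximum
summary(smoke)       # Counts per level, which suits a qualitative variable
```

Before the conversion, summary() treats the codes as numbers; afterwards it reports how many observations fall in each category.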
Exploration
Explore the following datasets in the GLMsData package:
punting
lime
dyouth
punting
data(punting)
head(punting)
  Left Right   Punt
1  170   170 162.50
2  130   140 144.00
3  170   180 174.50
4  160   160 163.50
5  150   170 192.00
6  150   150 171.75
dim(punting)
[1] 13 3
str(punting)
'data.frame': 13 obs. of 3 variables:
$ Left : int 170 130 170 160 150 150 180 110 110 120 ...
$ Right: int 170 140 180 160 170 150 170 110 120 130 ...
$ Punt : num 162 144 174 164 192 ...
summary(punting)
      Left           Right            Punt
 Min.   :110.0   Min.   :110.0   Min.   :104.8
 1st Qu.:130.0   1st Qu.:130.0   1st Qu.:140.2
 Median :150.0   Median :150.0   Median :162.0
 Mean   :143.8   Mean   :147.7   Mean   :150.3
 3rd Qu.:160.0   3rd Qu.:170.0   3rd Qu.:165.2
 Max.   :180.0   Max.   :180.0   Max.   :192.0
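The same four steps (head, dim, str, summary) apply to any dataset, so they can be bundled into a helper. The function explore() is hypothetical, written only for this sketch; the demo data frame punt6 re-types the six rows printed by head(punting) so that the block runs without GLMsData.

```r
# Hypothetical helper: print a quick overview of a data frame
explore <- function(df) {
  print(head(df))      # First six rows
  print(dim(df))       # Number of rows and columns
  str(df)              # Type and first values of each variable
  print(summary(df))   # Per-variable summary
  invisible(dim(df))   # Return the dimensions without printing them again
}

# First six rows of punting, re-typed from the head(punting) output above
punt6 <- data.frame(
  Left  = c(170L, 130L, 170L, 160L, 150L, 150L),
  Right = c(170L, 140L, 180L, 160L, 170L, 150L),
  Punt  = c(162.50, 144.00, 174.50, 163.50, 192.00, 171.75)
)

d <- explore(punt6)

# With GLMsData installed, the same call works on the full datasets, e.g.:
# library(GLMsData); data(lime); explore(lime)
```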