Plotting Data/Two Components of a Statistical Model
Stat 203 Lecture 2
Dr. Janssen
Plotting Data
Getting Started
Let’s look at how to create some plots in R.
library(GLMsData)
data(lungcap)
plot( FEV ~ Age, data=lungcap,
      xlab="Age (in years)",   # The x-axis label
      ylab="FEV (in L)",       # The y-axis label
      main="FEV vs age",       # The main title
      xlim=c(0, 20),           # Explicitly set x-axis limits
      ylim=c(0, 6),            # Explicitly set y-axis limits
      las=1)                   # Makes axis labels horizontal
FEV vs Ht
Code
plot( FEV ~ Ht, data=lungcap,
      main="FEV vs height",
      xlab="Height (in inches)", ylab="FEV (in L)",
      las=1, ylim=c(0, 6) )
FEV vs Gender
Code
plot( FEV ~ Gender, data=lungcap,
      main="FEV vs gender", ylab="FEV (in L)",
      las=1, ylim=c(0, 6))
FEV vs Smoke
Code
lungcap$Smoke <- factor(lungcap$Smoke,
                        levels=c(0, 1),
                        labels=c("Non-smoker", "Smoker"))
plot( FEV ~ Smoke, data=lungcap,
      main="FEV vs Smoking status",
      ylab="FEV (in L)", xlab="Smoking status",
      las=1, ylim=c(0, 6))
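A small caution about the recoding step, sketched on a made-up 0/1 vector rather than on lungcap (the vector smoke below is hypothetical). Running the same factor() call a second time compares the new labels against the old levels 0 and 1, finds no match, and returns only NA. If the smoking plot comes out empty, reloading the data with data(lungcap) and recoding once should fix it.

```r
# Hypothetical 0/1 codes standing in for lungcap$Smoke
smoke <- c(0, 1, 0, 0, 1)

# First run: 0 -> "Non-smoker", 1 -> "Smoker"
smoke <- factor(smoke, levels = c(0, 1),
                labels = c("Non-smoker", "Smoker"))
table(smoke)

# Second run of the same call: the values are now "Non-smoker"/"Smoker",
# which match neither 0 nor 1, so every entry becomes NA
smoke_again <- factor(smoke, levels = c(0, 1),
                      labels = c("Non-smoker", "Smoker"))
table(smoke_again, useNA = "ifany")
```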
FEV vs Age and FEV vs Ht
FEV vs Age
Code
plot( FEV ~ Age,
      data=subset(lungcap, Smoke=="Smoker"),   # Only select smokers
      main="FEV vs age\nfor smokers",          # \n means 'new line'
      ylab="FEV (in L)", xlab="Age (in years)",
      ylim=c(0, 6), xlim=c(0, 20), las=1)
Code
plot( FEV ~ Age,
      data=subset(lungcap, Smoke=="Non-smoker"),   # Only select non-smokers
      main="FEV vs age\nfor non-smokers",
      ylab="FEV (in L)", xlab="Age (in years)",
      ylim=c(0, 6), xlim=c(0, 20), las=1)
FEV vs Height
Code
plot( FEV ~ Ht, data=subset(lungcap, Smoke=="Smoker"),
      main="FEV vs height\nfor smokers",
      ylab="FEV (in L)", xlab="Height (in inches)",
      xlim=c(45, 75), ylim=c(0, 6), las=1)
Code
plot( FEV ~ Ht, data=subset(lungcap, Smoke=="Non-smoker"),
      main="FEV vs height\nfor non-smokers",
      ylab="FEV (in L)", xlab="Height (in inches)",
      xlim=c(45, 75), ylim=c(0, 6), las=1)
Explore!
There are other plots in the text on pp. 8-9; take a look at them, and then explore one of the datasets from last time:
punting
lime
dyouth
lime
Code
data(lime)
plot(Foliage ~ DBH, data=subset(lime, Origin=="Planted"),
     xlab="Diameter (breast height, in cm)",
     ylab="Foliage (biomass, in kg)",
     main="Foliage vs Diameter")
Code
boxplot(lime$Foliage ~ lime$Origin, xlab="Origin", ylab="Foliage (biomass, in kg)", main="Foliage vs. Origin")
Code
plot(lime$Foliage ~ lime$Age,
     xlab="Age of the tree (in years)",
     ylab="Foliage (biomass, in kg)",
     main="Foliage vs Age")
Code
boxplot(lime$Foliage ~ lime$Origin,
        xlab="Origin",
        ylab="Foliage (biomass, in kg)",
        main="Foliage vs Origin",
        las=1)
Coding for Factors
Mathematizing Factors
How can we incorporate factors in a statistical model?
By coding them.
Example
head(lungcap$Gender)
[1] F F F F F F
Levels: F M
contrasts(lungcap$Gender)
  M
F 0
M 1
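To see how this coding enters a model, here is a minimal sketch using a made-up factor (the vector gender is hypothetical, not the lungcap data). model.matrix() shows the numeric columns R builds from a formula: an intercept plus one 0/1 column for each non-reference level. relevel() changes which level is the reference.

```r
# Hypothetical factor with the same levels as lungcap$Gender
gender <- factor(c("F", "M", "M", "F"))

# Default treatment coding: F is the reference level, M gets a 0/1 column
contrasts(gender)

# The design matrix R would build for a model with gender as a predictor
X <- model.matrix(~ gender)
X

# Making M the reference level flips the coding
gender_m <- relevel(gender, ref = "M")
contrasts(gender_m)
```

With the default coding, the coefficient on the M column is the difference in mean response between M and F, with the other explanatory variables held fixed.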
Two Parts of a Statistical Model
The Random Component
Definition 1 For a given combination of explanatory variables, a model for the distribution of a recorded response variable is called the random component.
Definition 2 The systematic component of a model is the mathematical relationship between the mean of the response and the explanatory variables.
Example
Consider the lime dataset. A potential systematic component is
where \(\mu_i = E[y_i]\) is the expected value of \(y_i\), and the \(\beta_j\)’s are unknown regression parameters.
The explanatory variables are the \(x_i\)’s: DBH, Age, Origin.
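As an illustration only (the linear form and the dummy-variable coding are assumptions made here, not taken from the lecture), one systematic component that uses these three explanatory variables is sketched below. It assumes Origin has three levels (Coppice, Natural, Planted), which can be checked with levels(lime$Origin), so Origin needs two 0/1 dummy variables.

```latex
\[
  \mu_i = \beta_0 + \beta_1\,\mathrm{DBH}_i + \beta_2\,\mathrm{Age}_i
        + \beta_3\,d_{\mathrm{Natural},i} + \beta_4\,d_{\mathrm{Planted},i}
\]
% d_{Natural,i} = 1 if tree i is of Natural origin, 0 otherwise (assumed coding)
% d_{Planted,i} = 1 if tree i was Planted, 0 otherwise (assumed coding)
% Coppice is the reference level, absorbed into \beta_0
```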
The random component can take many forms; if we assume \(y_i \sim N(\mu_i,\sigma^2)\), we are assuming that the \(y_i\)’s are normally distributed about \(\mu_i\) with constant variance \(\sigma^2\).
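The two components can be sketched together by simulation. The parameter values below (beta0, beta1, sigma) and the variable names are made up for illustration and are not estimates from any dataset. The systematic component fixes the mean of each observation, and the random component draws each response from a normal distribution centred at that mean.

```r
set.seed(203)                      # For reproducibility

n     <- 200
x     <- runif(n, 0, 20)           # A hypothetical explanatory variable
beta0 <- 1                         # Hypothetical intercept
beta1 <- 0.2                       # Hypothetical slope
sigma <- 0.5                       # Hypothetical standard deviation

mu <- beta0 + beta1 * x                  # Systematic component: mu_i = E[y_i]
y  <- rnorm(n, mean = mu, sd = sigma)    # Random component: y_i ~ N(mu_i, sigma^2)

fit <- lm(y ~ x)                   # Estimate beta0 and beta1 from the simulated data
coef(fit)
```

The estimates from lm() should land close to the values used in the simulation, since the fitted model matches both components of the model that generated the data.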