Plotting Data/Two Components of a Statistical Model

Stat 203 Lecture 2

Dr. Janssen

Plotting Data

Getting Started

Let’s look at how to create some plots in R.

library(GLMsData)
data(lungcap)
plot( FEV ~ Age, data=lungcap, 
       xlab="Age (in years)",  # The x-axis label
       ylab="FEV (in L)",      # The y-axis label
       main="FEV vs age",      # The main title
       xlim=c(0, 20),          # Explicitly set x-axis limits
       ylim=c(0, 6),           # Explicitly set y-axis limits
       las=1)                  # Makes axis labels horizontal

`FEV` vs `Ht`

plot( FEV ~ Ht, data=lungcap, main="FEV vs height",
       xlab="Height (in inches)", ylab="FEV (in L)",
       las=1, ylim=c(0, 6) )

`FEV` vs `Gender`

plot( FEV ~ Gender, data=lungcap,
       main="FEV vs gender", ylab="FEV (in L)",
       las=1, ylim=c(0, 6))

`FEV` vs `Smoke`

lungcap$Smoke <- factor(lungcap$Smoke,
                  levels=c(0, 1),                 
                  labels=c("Non-smoker","Smoker"))
plot( FEV ~ Smoke,  data=lungcap, main="FEV vs Smoking status",
       ylab="FEV (in L)", xlab="Smoking status",
       las=1, ylim=c(0, 6))
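
To see what the `levels` and `labels` arguments of `factor()` are doing, here is a minimal sketch on a small made-up vector of 0/1 codes. The vector `smoke_codes` is hypothetical and only stands in for a column like `lungcap$Smoke`.

```r
# Hypothetical 0/1 codes, standing in for a column like lungcap$Smoke
smoke_codes <- c(0, 1, 0, 0, 1)

# levels= lists the values that occur in the data;
# labels= gives the display name for each level, in the same order
smoke <- factor(smoke_codes,
                levels = c(0, 1),
                labels = c("Non-smoker", "Smoker"))

levels(smoke)   # the two labels, in the order given
table(smoke)    # number of observations at each level
```

One thing to watch for: once a column has been converted this way, running the same `factor()` call on it a second time will turn every value into `NA`, because the values are no longer 0 and 1. Reloading the data with `data(lungcap)` restores the original coding.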

`FEV` vs `Age` and `FEV` vs `Ht`

FEV vs Age

plot( FEV ~ Age,
    data=subset(lungcap, Smoke=="Smoker"),  # Only select smokers
    main="FEV vs age\nfor smokers",         # \n means `new line'
    ylab="FEV (in L)", xlab="Age (in years)",
    ylim=c(0, 6), xlim=c(0, 20), las=1)

plot( FEV ~ Age,
    data=subset(lungcap, Smoke=="Non-smoker"),  # Only select non-smokers
    main="FEV vs age\nfor non-smokers",
    ylab="FEV (in L)", xlab="Age (in years)",
    ylim=c(0, 6), xlim=c(0, 20), las=1)

FEV vs Height

plot( FEV ~ Ht, data=subset(lungcap, Smoke=="Smoker"),
    main="FEV vs height\nfor smokers",
    ylab="FEV (in L)", xlab="Height (in inches)",
    xlim=c(45, 75), ylim=c(0, 6), las=1)

plot( FEV ~ Ht, data=subset(lungcap, Smoke=="Non-smoker"),
    main="FEV vs height\nfor non-smokers",
    ylab="FEV (in L)",  xlab="Height (in inches)",
    xlim=c(45, 75), ylim=c(0, 6), las=1)
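
An alternative to separate panels is to put both groups on one plot and distinguish them by plotting symbol. The following is a sketch of that idea, not the only way to do it; it assumes the GLMsData package is installed, and the helper name `is_smoker` is introduced here only for illustration.

```r
library(GLMsData)   # assumes the GLMsData package is installed
data(lungcap)       # reload so Smoke is back to its original 0/1 coding

lungcap$Smoke <- factor(lungcap$Smoke,
                        levels=c(0, 1),
                        labels=c("Non-smoker", "Smoker"))

# TRUE for smokers, FALSE for non-smokers (illustrative helper)
is_smoker <- lungcap$Smoke == "Smoker"

plot( FEV ~ Age, data=lungcap,
      pch=ifelse(is_smoker, 19, 1),   # filled circles: smokers; open circles: non-smokers
      main="FEV vs age",
      ylab="FEV (in L)", xlab="Age (in years)",
      ylim=c(0, 6), xlim=c(0, 20), las=1)
legend("topleft", pch=c(1, 19),
       legend=c("Non-smokers", "Smokers"))
```

Putting both groups on the same axes makes it easier to compare them directly, at the cost of some overplotting where the groups overlap.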

Explore!

There are other plots in the text on pp. 8-9; take a look at them, and then explore one of the datasets from last time:

punting
lime
dyouth

`lime`

data(lime)
plot(Foliage ~ DBH, data=subset(lime,Origin=="Planted"),
     xlab="Diameter (breast height, in cm)",
     ylab="Foliage (biomass, in kg)",
     main="Foliage vs Diameter")

boxplot(Foliage ~ Origin, data=lime,
        xlab="Origin",
        ylab="Foliage (biomass, in kg)",
        main="Foliage vs Origin")

plot(lime$Foliage ~ lime$Age,
     xlab="Age of the tree (in years)",
     ylab="Foliage (biomass, in kg)",
     main="Foliage vs Age"
     )

boxplot(lime$Foliage ~ lime$Origin,
        xlab="Origin",
        ylab="Foliage (biomass, in kg)",
        main="Foliage vs Origin",
        las=1)

R documentation for lime {GLMsData}: Small-leaved lime trees

Description: Data from small-leaved lime trees grown in Russia.

Usage: data(lime)

Format: A data frame containing 385 observations with the following 4 variables.

Foliage: the foliage biomass, in kg (oven dried matter)
DBH: the tree diameter, at breast height, in cm
Age: the age of the tree, in years
Origin: the origin of the tree; one of Coppice, Natural, Planted

Details: The data give measurements from small-leaved lime trees (Tilia cordata) growing in Russia.

Source: Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir A; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael (2017): Biomass tree data base. doi:10.1594/PANGAEA.871491, In supplement to: Schepaschenko, D et al. (2017): A dataset of forest biomass structure for Eurasia. Scientific Data, 4, 170070, doi:10.1038/sdata.2017.70. Extracted from https://doi.pangaea.de/10.1594/PANGAEA.871491

Examples:

data(lime)
summary(lime)

Coding for Factors

Mathematizing Factors

How can we incorporate factors in a statistical model?
By coding them.

Example

head(lungcap$Gender)

[1] F F F F F F
Levels: F M

contrasts(lungcap$Gender)

  M
F 0
M 1
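
The same coding extends to factors with more than two levels: R uses one 0/1 column for each level other than the first (reference) level. As a sketch, the made-up factor `origin` below borrows the three level names listed for `Origin` in the `lime` documentation; it is not the actual data.

```r
# Hypothetical factor using the three Origin level names from the lime documentation
origin <- factor(c("Coppice", "Natural", "Planted", "Natural"))

levels(origin)           # the first level listed is the reference level
contrasts(origin)        # one 0/1 column for each non-reference level
model.matrix(~ origin)   # design matrix: an intercept plus two dummy variables
```

With three levels, two dummy variables are enough: an observation with zeros in both columns belongs to the reference level.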

Two Parts of a Statistical Model

The Random Component

Definition 1. For a given combination of explanatory variables, a model for the distribution of a recorded response variable is called the random component.

Definition 2. The systematic component of a model is the mathematical relationship between the mean of the response and the explanatory variables.

Example

Consider the lime dataset. A potential systematic component is

\[ \mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} +\beta_4 x_{4i}, \qquad(1)\]

where \(\mu_i = E[y_i]\) is the expected value of \(y_i\), and the \(\beta_j\)’s are unknown regression parameters.

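The explanatory variables are the \(x_{ji}\)’s, which are built from DBH, Age, and Origin. Because Origin is a factor with three levels (Coppice, Natural, Planted), it is coded using two dummy variables, which is why three explanatory variables lead to four \(x\)’s in the model.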

The random component can take many forms; if we assume \(y_i \sim N(\mu_i,\sigma^2)\), we are assuming that the \(y_i\)’s are normally distributed about \(\mu_i\) with constant variance \(\sigma^2\).
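
To make the two components concrete, here is a small simulation sketch. All of the numbers (`beta0`, `beta1`, `sigma`, the range of `x`, and the seed) are made up for illustration and have nothing to do with the lime data. The systematic component fixes the mean \(\mu_i\); the random component then generates \(y_i\) around that mean.

```r
set.seed(203)                  # arbitrary seed, for reproducibility only

n     <- 200
x     <- runif(n, 0, 10)       # a made-up explanatory variable
beta0 <- 1                     # made-up regression parameters
beta1 <- 0.5
sigma <- 0.8                   # made-up standard deviation

mu <- beta0 + beta1 * x                  # systematic component: mu_i = beta0 + beta1 * x_i
y  <- rnorm(n, mean = mu, sd = sigma)    # random component: y_i ~ N(mu_i, sigma^2)

fit <- lm(y ~ x)               # fit a linear regression to the simulated data
coef(fit)                      # estimates should be close to beta0 and beta1

plot(y ~ x, las=1, main="Simulated data")
abline(fit)                    # fitted line, estimating the systematic component
```

The scatter of the points about the line reflects the random component; the line itself is an estimate of the systematic component.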

Major Goal

In this class, we’ll explore:

Linear regression models
Generalized linear models
