Transforming the Response

Stat 203 Lecture 12

Dr. Janssen

Symmetry, Constraints, and the Ladder of Powers

The basic idea

  • Data is collected to measure a particular response \(y\)
  • We apply an invertible function \(h\) and fit a linear regression model to \(y^* = h(y)\).

Three main reasons to do so

  • Transforming the measurement scale can avoid difficulties with constraints on the linear regression coefficients.
  • Transforming the response can make its distribution more nearly normal.
  • Transforming the response can stabilize its variance, removing a relationship between the mean and the variance.

Variance-Stabilizing Transformations

A common situation

Again, assume our variable \(y\) can take on only positive values. Further assume that it can vary by orders of magnitude in the same dataset.

Question

Where is the variance likely to be small: when \(\mu\) is close to 0, or when \(\mu\) is large? Why?
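One way to explore this question empirically (a simulated sketch; the lognormal choice here is illustrative, not from the lecture): for a positive response, groups with larger means tend to have larger spread.

```r
# Positive responses whose group means span orders of magnitude:
# for lognormal data the standard deviation grows in proportion to the mean.
set.seed(1)
mu  <- c(0.5, 5, 50)
sds <- sapply(mu, function(m) sd(rlnorm(5000, meanlog = log(m), sdlog = 0.5)))
# sds increases with mu: variance is small when the mean is near 0,
# and large when the mean is large
```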

Removing mean-variance relationship

If \(y\) takes positive values, the ladder of powers may be useful:

  • If the mean-variance relationship is increasing, a power transformation with \(\lambda < 1\) will reduce or remove it.
  • If the mean-variance relationship is decreasing, a power transformation with \(\lambda > 1\) will reduce or remove it.
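A simulated sketch of the first bullet (assumptions: gamma responses with \(\text{var}[y] \propto \mu^2\), for which the log, i.e. \(\lambda = 0\) on the ladder, is the natural choice):

```r
# Increasing mean-variance relationship: var[y] = mu^2 / 4 here.
set.seed(1)
mu <- rep(c(2, 8, 32), each = 500)
y  <- rgamma(length(mu), shape = 4, rate = 4 / mu)

sd.raw <- tapply(y,      mu, sd)   # grows roughly in proportion to mu
sd.log <- tapply(log(y), mu, sd)   # roughly constant across groups

ratio.raw <- max(sd.raw) / min(sd.raw)   # roughly 16, matching 32/2
ratio.log <- max(sd.log) / min(sd.log)   # close to 1
```

After the log transformation the group standard deviations are nearly equal, so the mean-variance relationship has essentially been removed.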

More generally

Suppose \(y\) has a mean-variance relationship defined by the function \(V(\mu)\), with \(\text{var}[y] = \phi V(\mu)\), and consider a transformation \(y^* = h(y)\).

  • When \(V(\mu) = \mu^2\), the variance-stabilizing transformation is the logarithm, as then \(h'(\mu) = 1/\mu\).
  • When \(V(\mu) = \mu\), the variance-stabilizing transformation is the square root, as \(h'(\mu) = 1/\mu^{1/2}\).
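These choices follow from a first-order Taylor (delta-method) approximation, sketched here:

\[ \text{var}[h(y)] \approx h'(\mu)^2 \, \text{var}[y] = h'(\mu)^2 \, \phi V(\mu), \]

so choosing \(h\) with \(h'(\mu) \propto V(\mu)^{-1/2}\) makes the approximate variance of \(y^*\) constant in \(\mu\). For \(V(\mu) = \mu^2\) this gives \(h'(\mu) = 1/\mu\) and \(h(y) = \log(y)\); for \(V(\mu) = \mu\) it gives \(h'(\mu) = 1/\mu^{1/2}\) and \(h(y) = 2\sqrt{y}\), equivalent to the square root.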

Common situations

  • Most common transformations appear on a ladder of powers
  • Milder transformations are close to \(\lambda = 1\); more severe ones are further away
  • Start with a mild transformation!
  • The log transformation is the most common
  • If \(y\) is a proportion, an arcsine-square root transformation may be useful: \(y^* = \arcsin(\sqrt{y})\)
  • If \(y\) takes zero or negative values, we cannot use power transformations with \(\lambda \le 0\).
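A sketch of the arcsine-square-root transformation for binomial proportions (simulated; the sample size \(n = 50\) per proportion is an assumption for illustration): it makes the spread roughly constant, about \(1/(2\sqrt{n})\), across values of \(p\).

```r
# Sample proportions have var = p(1 - p)/n, largest near p = 0.5;
# arcsin(sqrt(y)) has approximately constant variance 1/(4n).
set.seed(1)
n <- 50
p <- rep(c(0.1, 0.5, 0.9), each = 2000)
y <- rbinom(length(p), size = n, prob = p) / n

sd.raw <- tapply(y,             p, sd)   # unequal: peaks at p = 0.5
sd.asr <- tapply(asin(sqrt(y)), p, sd)   # nearly equal across p
```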

Example: lungcap

library(GLMsData)   # provides the lungcap data
data(lungcap)
LC.lm <- lm( FEV ~ Ht + Gender + Smoke, data=lungcap)
scatter.smooth( rstandard( LC.lm ) ~ lungcap$Ht, col="grey",
    las=1, ylab="Standardized residuals", xlab="Height (inches)")

Example: lungcap

LC.lm <- lm( FEV ~ Ht + Gender + Smoke, data=lungcap)
LC.sqrt <- update( LC.lm, sqrt(FEV) ~ .)
scatter.smooth( rstandard(LC.sqrt)~fitted(LC.sqrt), las=1, col="grey",
    ylab="Standardized residuals", xlab="Fitted values",
    main="Square-root transformation")

Example: lungcap

LC.lm <- lm( FEV ~ Ht + Gender + Smoke, data=lungcap)
LC.log <- update( LC.lm, log(FEV) ~ .)
scatter.smooth( rstandard(LC.log)~fitted(LC.log), las=1, col="grey",
    ylab="Standardized residuals", xlab="Fitted values",
    main="Log transformation")

Box-Cox Transformations

\[ y^* = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{for } \lambda\ne 0\\ \log(y) & \text{for } \lambda = 0. \end{cases} \]

This is continuous in \(\lambda\): writing \(y^\lambda = e^{\lambda\log(y)}\) and expanding the exponential shows

\[ \lim\limits_{\lambda\to 0} \frac{y^\lambda -1}{\lambda} = \log(y). \]
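The definition above translates directly into code, and the limit can be checked numerically (a sketch; `bc_transform` is an illustrative name, not from the lecture):

```r
# Box-Cox transformation as defined above (requires y > 0).
bc_transform <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

bc_transform(4, 0.5)    # (sqrt(4) - 1) / 0.5 = 2
bc_transform(4, 0)      # log(4)
bc_transform(4, 1e-8)   # numerically close to log(4), illustrating continuity
```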

Example of Box-Cox

library(MASS)
library(GLMsData)   # provides the lungcap data
data(lungcap)
boxcox( FEV ~ Ht + Gender + Smoke,
         lambda=seq(-0.25, 0.25, length=11), data=lungcap)
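When called with `plotit=FALSE`, `boxcox()` returns the grid of \(\lambda\) values and their profile log-likelihoods, so the maximizing \(\lambda\) can be extracted programmatically. A sketch on simulated data (the model and variable names here are illustrative, not from the lecture):

```r
library(MASS)

# Simulated illustration: log(y) is linear in x, so the Box-Cox profile
# likelihood should peak near lambda = 0 (the log transformation).
set.seed(1)
x <- runif(200, 1, 10)
y <- exp(0.1 * x + rnorm(200, sd = 0.2))

bc <- boxcox(y ~ x, lambda = seq(-1, 1, 0.1), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]   # lambda maximizing the log-likelihood
```

In practice, rather than using the exact maximizer, one usually picks a convenient \(\lambda\) (such as \(0\), \(1/2\), or \(1\)) lying inside the 95% confidence interval drawn on the plot.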