Transforming the Response

Stat 203 Lecture 12

Dr. Janssen

Symmetry, Constraints, and the Ladder of Powers

The basic idea

  • Data is collected to measure a particular response \(y\)
  • We apply an invertible function \(h\) and fit a linear regression model to \(y^* = h(y)\).

Three main reasons to do so

  • Transforming the measurement scale can avoid difficulties with constraints on the linear regression coefficients.
  • Transforming the response can make its distribution more nearly normal.
  • Transforming the response can stabilize its variance, removing a relationship between the mean and the variance.

Variance-Stabilizing Transformations

A common situation

Again, assume our variable \(y\) can take on only positive values. Further assume that it can vary by orders of magnitude in the same dataset.

Question

Where is the variance likely to be small: when \(\mu\) is close to 0, or when \(\mu\) is large? Why?
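One way to explore this question empirically (a simulated sketch; the lognormal choice here is illustrative, not from the lecture): for a positive response, groups with larger means tend to have larger spread.

```r
# Positive responses whose group means span orders of magnitude:
# for lognormal data the standard deviation grows in proportion to the mean.
set.seed(1)
mu  <- c(0.5, 5, 50)
sds <- sapply(mu, function(m) sd(rlnorm(5000, meanlog = log(m), sdlog = 0.5)))
# sds increases with mu: variance is small when the mean is near 0,
# and large when the mean is large
```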

Removing mean-variance relationship

If \(y\) takes positive values, the ladder of powers may be useful:

  • If the mean-variance relationship is increasing, a power transformation with \(\lambda < 1\) will reduce or remove it.
  • If the mean-variance relationship is decreasing, a power transformation with \(\lambda > 1\) will reduce or remove it.
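A simulated sketch of the first bullet (assumptions: gamma responses with \(\text{var}[y] \propto \mu^2\), for which the log, i.e. \(\lambda = 0\) on the ladder, is the natural choice):

```r
# Increasing mean-variance relationship: var[y] = mu^2 / 4 here.
set.seed(1)
mu <- rep(c(2, 8, 32), each = 500)
y  <- rgamma(length(mu), shape = 4, rate = 4 / mu)

sd.raw <- tapply(y,      mu, sd)   # grows roughly in proportion to mu
sd.log <- tapply(log(y), mu, sd)   # roughly constant across groups

ratio.raw <- max(sd.raw) / min(sd.raw)   # roughly 16, matching 32/2
ratio.log <- max(sd.log) / min(sd.log)   # close to 1
```

After the log transformation the group standard deviations are nearly equal, so the mean-variance relationship has essentially been removed.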

More generally

Suppose \(y\) has a mean-variance relationship defined by the function \(V(\mu)\), with \(\text{var}[y] = \phi V(\mu)\), and consider a transformation \(y^* = h(y)\).

  • When \(V(\mu) = \mu^2\), the variance-stabilizing transformation is the logarithm, as then \(h'(\mu) = 1/\mu\).
  • When \(V(\mu) = \mu\), the variance-stabilizing transformation is the square root, as \(h'(\mu) = 1/\mu^{1/2}\).
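These choices follow from a first-order Taylor (delta-method) approximation, sketched here:

\[ \text{var}[h(y)] \approx h'(\mu)^2 \, \text{var}[y] = h'(\mu)^2 \, \phi V(\mu), \]

so choosing \(h\) with \(h'(\mu) \propto V(\mu)^{-1/2}\) makes the approximate variance of \(y^*\) constant in \(\mu\). For \(V(\mu) = \mu^2\) this gives \(h'(\mu) = 1/\mu\) and \(h(y) = \log(y)\); for \(V(\mu) = \mu\) it gives \(h'(\mu) = 1/\mu^{1/2}\) and \(h(y) = 2\sqrt{y}\), equivalent to the square root.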

Common situations

  • Most common transformations appear on a ladder of powers
  • Milder transformations are close to \(\lambda = 1\); more severe ones are further away
  • Start with a mild transformation!
  • The log transformation is the most common
  • If \(y\) is a proportion, an arcsine-square root transformation may be useful: \(y^* = \arcsin(\sqrt{y})\)
  • If \(y\) takes zero or negative values, we cannot use power transformations with \(\lambda \le 0\).
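A sketch of the arcsine-square-root transformation for binomial proportions (simulated; the sample size \(n = 50\) per proportion is an assumption for illustration): it makes the spread roughly constant, about \(1/(2\sqrt{n})\), across values of \(p\).

```r
# Sample proportions have var = p(1 - p)/n, largest near p = 0.5;
# arcsin(sqrt(y)) has approximately constant variance 1/(4n).
set.seed(1)
n <- 50
p <- rep(c(0.1, 0.5, 0.9), each = 2000)
y <- rbinom(length(p), size = n, prob = p) / n

sd.raw <- tapply(y,             p, sd)   # unequal: peaks at p = 0.5
sd.asr <- tapply(asin(sqrt(y)), p, sd)   # nearly equal across p
```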

Example: lungcap

library(GLMsData)   # provides the lungcap data
data(lungcap)
LC.lm <- lm( FEV ~ Ht + Gender + Smoke, data=lungcap)
scatter.smooth( rstandard( LC.lm ) ~ lungcap$Ht, col="grey",
    las=1, ylab="Standardized residuals", xlab="Height (inches)")

Example: lungcap

LC.lm <- lm( FEV ~ Ht + Gender + Smoke, data=lungcap)
LC.sqrt <- update( LC.lm, sqrt(FEV) ~ .)
scatter.smooth( rstandard(LC.sqrt)~fitted(LC.sqrt), las=1, col="grey",
    ylab="Standardized residuals", xlab="Fitted values",
    main="Square-root transformation")

Example: lungcap

LC.lm <- lm( FEV ~ Ht + Gender + Smoke, data=lungcap)
LC.log <- update( LC.lm, log(FEV) ~ .)
scatter.smooth( rstandard(LC.log)~fitted(LC.log), las=1, col="grey",
    ylab="Standardized residuals", xlab="Fitted values",
    main="Log transformation")

Box-Cox Transformations

\[ y^* = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{for } \lambda\ne 0\\ \log(y) & \text{for } \lambda = 0. \end{cases} \]

This is continuous in \(\lambda\): writing \(y^\lambda = e^{\lambda\log(y)}\) and expanding the exponential shows

\[ \lim\limits_{\lambda\to 0} \frac{y^\lambda -1}{\lambda} = \log(y). \]
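The definition above translates directly into code, and the limit can be checked numerically (a sketch; `bc_transform` is an illustrative name, not from the lecture):

```r
# Box-Cox transformation as defined above (requires y > 0).
bc_transform <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

bc_transform(4, 0.5)    # (sqrt(4) - 1) / 0.5 = 2
bc_transform(4, 0)      # log(4)
bc_transform(4, 1e-8)   # numerically close to log(4), illustrating continuity
```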

Example of Box-Cox

library(MASS)
library(GLMsData)   # provides the lungcap data
data(lungcap)
boxcox( FEV ~ Ht + Gender + Smoke,
         lambda=seq(-0.25, 0.25, length=11), data=lungcap)
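When called with `plotit=FALSE`, `boxcox()` returns the grid of \(\lambda\) values and their profile log-likelihoods, so the maximizing \(\lambda\) can be extracted programmatically. A sketch on simulated data (the model and variable names here are illustrative, not from the lecture):

```r
library(MASS)

# Simulated illustration: log(y) is linear in x, so the Box-Cox profile
# likelihood should peak near lambda = 0 (the log transformation).
set.seed(1)
x <- runif(200, 1, 10)
y <- exp(0.1 * x + rnorm(200, sd = 0.2))

bc <- boxcox(y ~ x, lambda = seq(-1, 1, 0.1), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]   # lambda maximizing the log-likelihood
```

In practice, rather than using the exact maximizer, one usually picks a convenient \(\lambda\) (such as \(0\), \(1/2\), or \(1\)) lying inside the 95% confidence interval drawn on the plot.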