Estimation for Multiple Regression

Stat 203 Lecture 6

Dr. Janssen

Coefficient Estimates

Goal

Minimize the sum of squared deviations:

\[ S = \sum\limits_{i=1}^n w_i (y_i - \mu_i)^2 \]

Definition 1 The least-squares estimators of the \(\beta_j\)’s are the values that minimize \(S\), and are denoted \(\hat\beta_0, \ldots, \hat\beta_p\).

By calculus, this value of \(S\) occurs when

\[ \frac{\partial S}{\partial \beta_j}= 0 \text{ for } j= 0, 1, \ldots, p. \]
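Explicitly, differentiating the weighted sum \(S\) (writing \(\mu_i = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}\), with \(x_{i0} = 1\) for the constant term) gives the normal equations

\[ \frac{\partial S}{\partial \beta_j} = -2 \sum_{i=1}^n w_i x_{ij} (y_i - \mu_i) = 0, \quad j = 0, 1, \ldots, p. \]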

The least-squares estimators

Solutions:

\[ \hat\beta_j = \frac{\sum_{i=1}^n w_i x_{ij}^* y_i}{\sum_{i=1}^n w_i (x_{ij}^*)^2} \]

where \(x_{ij}^*\) are the values of the \(j\)th explanatory variable \(x_j\) after being adjusted for all the other explanatory variables \(x_0, \ldots, x_p\) apart from \(x_j\).
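The adjusted-variable formula can be checked numerically. The sketch below uses simulated data with equal weights (\(w_i = 1\)); the variable names are illustrative. Adjusting \(x_2\) for the other explanatory variables (the constant term and \(x_1\)) amounts to taking the residuals from regressing \(x_2\) on them:

```r
# Numerical check of the adjusted-variable formula (simulated data, w_i = 1)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.5 * x1              # predictors deliberately correlated
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

# x2 adjusted for the other explanatory variables (constant and x1):
# the residuals from regressing x2 on them
x2.star <- resid( lm(x2 ~ x1) )

# Adjusted-variable formula for the coefficient of x2
beta2.hat <- sum(x2.star * y) / sum(x2.star^2)

# Agrees with the coefficient from the full least-squares fit
c( beta2.hat, coef( lm(y ~ x1 + x2) )["x2"] )
```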

Fitted values, residuals, and variance

The fitted values are

\[ \hat\mu_i = \hat\beta_0 + \sum\limits_{j=1}^p \hat\beta_j x_{ij}. \]

The residuals are the deviations of the responses from the fitted values:

\[ r_i = y_i - \hat\mu_i. \]

The residual sum-of-squares (RSS) is:

\[ \text{RSS} = \sum\limits_{i=1}^n w_i (y_i - \hat\mu_i)^2. \]

The d.f. associated with RSS is \(n - p'\), where \(p' = p + 1\) is the number of regression coefficients (including the intercept), so

\[ s^2 = \frac{\sum_{i=1}^n w_i (y_i - \hat\mu_i)^2}{n-p'} = \frac{\text{RSS}}{n-p'}. \]
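These quantities are easy to compute from a fitted model. A small check with simulated, equally weighted data (\(w_i = 1\)), comparing \(s\) with the "residual standard error" that R reports:

```r
# RSS and s^2 from a fitted model (simulated data, w_i = 1)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

RSS    <- sum( resid(fit)^2 )      # residual sum-of-squares
pprime <- length( coef(fit) )      # p' = p + 1 regression coefficients
s2     <- RSS / (n - pprime)       # estimate of sigma^2

c( sqrt(s2), summary(fit)$sigma )  # identical: the residual standard error
```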

Standard Errors

Write \(\mathcal{I}_j^* = \sum_{i=1}^n w_i \left(x_{ij}^*\right)^2\) for the sum of the squares of the \(j\)th explanatory variable adjusted for the other variables. Then:

\[ \var[\hat\beta_j] = \sigma^2/\mathcal{I}_j^*. \]

As before, we estimate \(\var[\hat\beta_j]\) by substituting \(s^2\) for \(\sigma^2\), giving \(\widehat\var[\hat\beta_j] = s^2/\mathcal{I}_j^*\); the standard error is then

\[ \text{se}(\hat\beta_j) = s/\sqrt{\mathcal{I}_j^*}. \]
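In the simplest case (one predictor, \(w_i = 1\)), adjusting \(x\) for the constant term just centres it, so \(\mathcal{I}^* = \sum_i (x_i - \bar{x})^2\). A quick check of the formula against the standard error reported by summary(), again with simulated data:

```r
# Standard error of a slope via s / sqrt(I*) (simulated data, w_i = 1)
set.seed(2)
n <- 60
x <- rnorm(n)
y <- 3 + 1.5 * x + rnorm(n)
fit <- lm(y ~ x)

s      <- summary(fit)$sigma        # estimate of sigma
I.star <- sum( (x - mean(x))^2 )    # x adjusted for the constant term

c( s / sqrt(I.star),                # se(beta1.hat) from the formula
   summary(fit)$coefficients["x", "Std. Error"] )
```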

Example: lungcap

library(GLMsData); data(lungcap)
scatter.smooth( lungcap$Ht, lungcap$FEV, las=1, col="grey",
    ylim=c(0, 6), xlim=c(45, 75), # Use similar scales for comparisons
    main="FEV", xlab="Height (in inches)", ylab="FEV (in L)" )

scatter.smooth( lungcap$Ht, log(lungcap$FEV), las=1, col="grey",
    ylim=c(-0.5, 2), xlim=c(45, 75), # Use similar scales for comparisons
    main="log of FEV", xlab="Height (in inches)", ylab="log of FEV (in L)")

Example: lungcap

A potential model for \(y = \log(\text{FEV})\) could thus be:

\[ \begin{cases} \var[y_i] = \sigma^2 \\ \mu_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}, \end{cases} \]

where \(x_1\) is height, \(x_2\) is age, \(x_3\) is the dummy variable for gender, and \(x_4\) is the dummy variable for smoking.

model1 = lm(FEV ~ Age + Ht + factor(Gender) + Smoke, data=lungcap)
summary(model1)
model2 = lm(log(FEV) ~ Age + Ht + factor(Gender) + Smoke, data=lungcap)
summary(model2)

Output

Code
model1 = lm(FEV ~ Age + Ht + factor(Gender) + Smoke, data=lungcap)
summary(model1)

Call:
lm(formula = FEV ~ Age + Ht + factor(Gender) + Smoke, data = lungcap)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.37656 -0.25033  0.00894  0.25588  1.92047 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -4.456974   0.222839 -20.001  < 2e-16 ***
Age              0.065509   0.009489   6.904 1.21e-11 ***
Ht               0.104199   0.004758  21.901  < 2e-16 ***
factor(Gender)M  0.157103   0.033207   4.731 2.74e-06 ***
Smoke           -0.087246   0.059254  -1.472    0.141    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4122 on 649 degrees of freedom
Multiple R-squared:  0.7754,    Adjusted R-squared:  0.774 
F-statistic:   560 on 4 and 649 DF,  p-value: < 2.2e-16
Code
model2 = lm(log(FEV) ~ Age + Ht + factor(Gender) + Smoke, data=lungcap)
summary(model2)

Call:
lm(formula = log(FEV) ~ Age + Ht + factor(Gender) + Smoke, data = lungcap)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.63278 -0.08657  0.01146  0.09540  0.40701 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -1.943998   0.078639 -24.721  < 2e-16 ***
Age              0.023387   0.003348   6.984  7.1e-12 ***
Ht               0.042796   0.001679  25.489  < 2e-16 ***
factor(Gender)M  0.029319   0.011719   2.502   0.0126 *  
Smoke           -0.046068   0.020910  -2.203   0.0279 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1455 on 649 degrees of freedom
Multiple R-squared:  0.8106,    Adjusted R-squared:  0.8095 
F-statistic: 694.6 on 4 and 649 DF,  p-value: < 2.2e-16
Example: lime

Code
data(lime); names(lime)
summary( lm( sqrt(Foliage) ~ DBH + Age + factor(Origin), data=lime) )

[1] "Foliage" "DBH"     "Age"     "Origin" 

Call:
lm(formula = sqrt(Foliage) ~ DBH + Age + factor(Origin), data = lime)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.63455 -0.24347 -0.00849  0.18550  1.89247 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            0.089966   0.056507   1.592   0.1122    
DBH                    0.076356   0.004704  16.234   <2e-16 ***
Age                   -0.002520   0.001622  -1.554   0.1209    
factor(Origin)Natural -0.120855   0.046911  -2.576   0.0104 *  
factor(Origin)Planted  0.135408   0.060717   2.230   0.0263 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4008 on 380 degrees of freedom
Multiple R-squared:  0.6733,    Adjusted R-squared:  0.6698 
F-statistic: 195.8 on 4 and 380 DF,  p-value: < 2.2e-16

Matrix Formulation

Notation

Let the \(n\times 1\) vector of responses be denoted by \(\y\), and the \(n\times p'\) matrix of explanatory variables, called the model matrix, by \(X = [\x_0, \x_1, \ldots, \x_p]\), where \(\x_j\) is the \(n\times 1\) vector of values for \(x_j\).

We write \(\x_0\) as the vector of ones (constant term) for convenience. Then the matrix form of the linear regression model is

\[ \begin{cases} \var[\y] = W^{-1} \sigma^2\\ \bmu = X \bbeta, \end{cases} \]

where \(E[\y] = \bmu\), and \(W\) is a known, positive-definite symmetric \(n\times n\) matrix of weights, so that \(\var[\y] = W^{-1}\sigma^2\) is the covariance matrix of the responses.

A special case occurs when \(W^{-1}\) is diagonal with entries \(1/w_i\) (all off-diagonal entries 0). Most commonly, the observations are weighted equally and \(W^{-1} = I_n\), the identity matrix.

Example: gestation

\[ \begin{align} \y &= \begin{bmatrix} 0.520 \\ 0.700 \\ 1.000 \\ \vdots \\ 3.612 \\ 3.390 \\ 3.740 \end{bmatrix};\\ X &= \begin{bmatrix} 1 & 22 \\ 1 & 23 \\ 1 & 25 \\ \vdots & \vdots \\ 1 & 42 \\ 1 & 43 \\ 1 & 44 \end{bmatrix}; \\ W^{-1} &= \begin{bmatrix} 1/1 & 0 & 0 &\cdots & 0 & 0 & 0 \\ 0 & 1/1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1/1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1/53 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1/9 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1/1 \end{bmatrix}. \end{align} \]

Coefficient estimates

Using matrix notation, the (weighted) sum-of-squares deviations can be written as

\[ S = (\y - \bmu)^T W (\y-\bmu), \]

where \(\bmu = X\bbeta\).

When we differentiate \(S\) with respect to \(\bbeta\) and set equal to zero, we find the minimum occurs when

\[ X^T W X \bbeta = X^T W \y \Rightarrow \hat\bbeta = (X^T W X)^{-1} X^T W \y. \]
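The lungcap computation on the following slides applies this with \(W = I_n\). A sketch with a non-identity \(W\), using simulated data and hypothetical weights, checks the formula against lm() with a weights argument:

```r
# Weighted least-squares estimate via the matrix formula (simulated data)
set.seed(3)
n <- 40
x <- runif(n)
w <- rpois(n, 5) + 1                    # hypothetical positive weights
y <- 2 + 4 * x + rnorm(n) / sqrt(w)     # var[y_i] = sigma^2 / w_i

X <- cbind(1, x)                        # model matrix: constant and x
W <- diag(w)                            # diagonal weight matrix

# Solve the normal equations X^T W X beta = X^T W y directly
beta.hat <- solve( t(X) %*% W %*% X, t(X) %*% W %*% y )
drop(beta.hat)

coef( lm(y ~ x, weights = w) )          # agrees with lm()
```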

Example: lungcap

data(lungcap)
lungcap$Smoke <- factor(lungcap$Smoke, levels=c(0, 1),
                     labels=c("Non-smoker","Smoker"))
Xmat <- model.matrix( ~ Age + Ht + factor(Gender) + factor(Smoke),
                      data=lungcap)
XtX <- t(Xmat) %*% Xmat  # t() is transpose; %*% is matrix multiply
y <- log(lungcap$FEV)
inv.XtX <- solve( XtX )  # solve() returns the matrix inverse
XtY <- t(Xmat) %*% y
beta <- inv.XtX %*% XtY; drop(beta)

Example: lungcap

Code
data(lungcap)
lungcap$Smoke <- factor(lungcap$Smoke, levels=c(0, 1),
                     labels=c("Non-smoker","Smoker"))
Xmat <- model.matrix( ~ Age + Ht + factor(Gender) + factor(Smoke),
                      data=lungcap)
XtX <- t(Xmat) %*% Xmat  # t() is transpose; %*% is matrix multiply
y <- log(lungcap$FEV)
inv.XtX <- solve( XtX )  # solve() returns the matrix inverse
XtY <- t(Xmat) %*% y
beta <- inv.XtX %*% XtY; drop(beta)
        (Intercept)                 Age                  Ht     factor(Gender)M 
        -1.94399818          0.02338721          0.04279579          0.02931936 
factor(Smoke)Smoker 
        -0.04606754 

Exploration

Head to https://mkjanssen.org/courses/glm/glm_f23.html, and choose Worksheet 7.