Example R programs and commands
16. Multiple linear regression

# All lines preceded by the "#" character are my comments.
# All other left-justified lines are my input.
# All other indented lines are the R program output.

# Read the data:
#   Y     X1    X2    X3    X4
# --------------------------------------------------
data<-scan()
51.4   0.2  17.8  24.6  18.9
72.0   1.9  29.4  20.7   8.0
53.2   0.2  17.0  18.5  22.6
83.2  10.7  30.2  10.6   7.1
57.4   6.8  15.3   8.9  27.3
66.5  10.6  17.6  11.1  20.8
98.3   9.6  35.6  10.6   5.6

# (a blank line typed after the last row terminates scan() input)

dim(data)<-c(5,7)    # coerce the data into a 5x7 matrix, TRANSPOSED!
y<-data[1,]          # first row of data == first column of table (Y)
x<-data[2:5,]        # rows 2,3,4,5 == columns X1,X2,X3,X4
x1<-x[1,]
x2<-x[2,]
x3<-x[3,]
x4<-x[4,]

# Regression coefficients:
lm(y~x1+x2+x3+x4)

    Call:
    lm(formula = y ~ x1 + x2 + x3 + x4)

    Coefficients:
    (Intercept)           x1           x2           x3           x4
       -67.3175       2.1318       3.4988       0.8889       1.8556

# Individual t-tests for H0:beta1=0, ..., H0:beta4=0, and H0:alpha=0:
summary(lm(y~x1+x2+x3+x4))

    Call:
    lm(formula = y ~ x1 + x2 + x3 + x4)

    Residuals:
          1       2       3       4       5       6       7
    -0.9232 -0.8408  2.2322 -0.5524 -1.8780  1.1796  0.7826

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -67.3175    55.4048  -1.215   0.3483
    x1            2.1318     0.8756   2.435   0.1353
    x2            3.4988     1.0229   3.420   0.0759 .
    x3            0.8889     0.8139   1.092   0.3888
    x4            1.8556     1.0431   1.779   0.2172
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.488 on 2 degrees of freedom
    Multiple R-Squared: 0.993,     Adjusted R-squared: 0.979
    F-statistic: 70.96 on 4 and 2 DF,  p-value: 0.01394

# NOTE: the F statistic is for the ANOVA test of
#   H0: beta1 = beta2 = beta3 = beta4 = 0.
# Result: reject H0 at the 0.05 level, since p = 0.01394 < 0.05.

# Get a predicted value Y_hat from the model:
predict(lm(y~x1+x2+x3+x4),
        newdata=data.frame(x1=5.2, x2=21.3, x3=19.7, x4=12.2))

# ...alternatively:
fit<-lm(y~x1+x2+x3+x4)
new<-data.frame(x1=5.2, x2=21.3, x3=19.7, x4=12.2)
predict(fit, new)

# ...which makes getting the 99% confidence interval clearer:
predict(fit, new, interval="confidence", level=0.99)

# Backward stepwise elimination of X's:
summary(lm(y~x1+x2+x3+x4))

    Call:
    lm(formula = y ~ x1 + x2 + x3 + x4)

    Residuals:
          1       2       3       4       5       6       7
    -0.9232 -0.8408  2.2322 -0.5524 -1.8780  1.1796  0.7826

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -67.3175    55.4048  -1.215   0.3483
    x1            2.1318     0.8756   2.435   0.1353
    x2            3.4988     1.0229   3.420   0.0759 .
    x3            0.8889     0.8139   1.092   0.3888    # biggest p value
    x4            1.8556     1.0431   1.779   0.2172
    ...

# ...remove the variable with the biggest p value, namely x3:
summary(lm(y~x1+x2+x4))

    Call:
    lm(formula = y ~ x1 + x2 + x4)

    Residuals:
          1       2       3       4       5       6       7
     0.4884 -1.2575  1.1854 -1.9929 -2.2712  1.8845  1.9632

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -10.6925    20.1510  -0.531   0.6325
    x1            1.2116     0.2458   4.929   0.0160 *
    x2            2.5462     0.5513   4.618   0.0191 *
    x4            0.8486     0.5032   1.686   0.1903    # biggest p value among the X's
    ...

# ...remove the variable with the biggest p value, namely x4:
summary(lm(y~x1+x2))

    Call:
    lm(formula = y ~ x1 + x2)

    Residuals:
          1       2       3       4       5       6       7
    -1.0421 -1.8181  2.0746 -3.7601  0.2034  0.4116  3.9306

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  22.8777     3.7890   6.038 0.003794 **
    x1            1.3438     0.2816   4.772 0.008826 **
    x2            1.6458     0.1663   9.898 0.000585 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 3.102 on 4 degrees of freedom
    Multiple R-Squared: 0.9782,    Adjusted R-squared: 0.9674
    F-statistic: 89.92 on 2 and 4 DF,  p-value: 0.0004734

# ...and stop: both remaining predictors, x1 and x2, are significant at
# the 0.05 level, so we have found significant dependence on x1 and x2.
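
# Aside (my addition, not part of the original session): the transpose
# trick used above can be avoided by letting matrix() fill row-wise, so
# that the matrix columns line up with the table columns directly:
data2<-matrix(scan(), ncol=5, byrow=TRUE)   # type the 7 rows, then a blank line
y<-data2[,1]
x1<-data2[,2]
x2<-data2[,3]
x3<-data2[,4]
x4<-data2[,5]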
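
# Aside (my addition): with the same fit and new data frame as above,
# interval="prediction" gives the 99% interval for a single NEW
# observation at those x values, which is wider than the confidence
# interval for the mean response:
predict(fit, new, interval="prediction", level=0.99)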
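
# Aside (my addition): the one-variable-at-a-time elimination above can
# be double-checked with a single partial F test of H0:beta3=beta4=0, by
# comparing the reduced model against the full model with anova():
anova(lm(y~x1+x2), lm(y~x1+x2+x3+x4))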
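
# Aside (my addition): R's built-in step() automates backward
# elimination, but it compares models by AIC rather than by individual
# t-test p values, so it need not stop at the same model as the hand
# elimination above:
step(lm(y~x1+x2+x3+x4), direction="backward")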