MULTIPLE REGRESSION FOR THREE VARIABLES - YOURNAME
LET'S USE PROC IML TO ANALYZE THE REGRESSION
21:30 Monday, November 17, 2008

            YY                           XX
Y is        10   and the matrix X is   1.00   1.30   1.80   0.40
            23                         1.00   1.00   1.60   1.10
            24                         1.00   1.10   0.70   2.00
            21                         1.00   1.90   2.30   0.50
            32                         1.00   2.30   2.10   2.40
            28                         1.00   1.60   0.90   1.30
            19                         1.00   1.40   2.50   1.50
            27                         1.00   2.40   2.60   1.40
            35                         1.00   2.10   2.90   2.20
            34                         1.00   1.80   1.40   1.90
            42                         1.00   1.60   0.70   2.20
            33                         1.00   2.40   2.80   1.70
            24                         1.00   1.40   1.20   1.30
            23                         1.00   2.30   3.40   1.40
            23                         1.00   2.30   7.40   0.70
            27                         1.00   2.40   5.50   1.60
            32                         1.00   2.20   4.50   1.20
            26                         1.00   2.00   3.40   1.20
            15                         1.00   1.20   3.00   1.50
            28                         1.00   2.30   1.90   1.50

X'X and (X'X)^{-1} are

NAMES   XPX            Mu         X1         X2         X3
Mu                  20.00      37.00      52.60      29.00
X1                  37.00      72.92     105.93      54.25
X2                  52.60     105.93     190.94      71.10
X3                  29.00      54.25      71.10      47.54

NAMES   XPXINV         Mu         X1         X2         X3
Mu                 1.0903    -0.3499    -0.0164    -0.2413
X1                -0.3499     0.3792    -0.0730    -0.1102
X2                -0.0164    -0.0730     0.0350     0.0409
X3                -0.2413    -0.1102     0.0409     0.2327

ORD, Y, X'Y, beta=(X'X)^{-1}X'Y, Yfit=X*beta, and R_i are

 ORD    YY       XPY       BETA     YFIT     RESID
   1    10     526.0     4.1197    14.98    -4.980
   2    23    1002.2     7.2723    18.38     4.620
   3    24    1357.2    -0.9318    26.88    -2.884
   4    21     814.2     7.7084    19.65     1.352
   5    32                         37.39    -5.389
   6    28                         24.94     3.062
   7    19                         23.53    -4.534
   8    27                         29.94    -2.942
   9    35                         33.65     1.352
  10    34                         30.55     3.449
  11    42                         32.06     9.938
  12    33                         32.07     0.932
  13    24                         23.20     0.796
  14    23                         28.47    -5.470
  15    23                         19.35     3.653
  16    27                         28.78    -1.782
  17    32                         25.18     6.824
  18    26                         24.75     1.254
  19    15                         21.61    -6.614
  20    28                         30.64    -2.638

                  YBAR    YFITBAR    RESIDBAR
Averages are      26.3     26.300     -0.0000

                                  MU         B2            B3            B4
The regr.equation is    YY =    4.12   +   7.27 X1   +   -0.93 X2   +   7.71 X3

MSE, Beta (again), and Cov(beta)=MSE*(X'X)^{-1} are

     MSE   NAMES      BETA    COVBETA
 23.9499   Mu       4.1197    26.1118    -8.3802    -0.3922    -5.7790
           X1       7.2723    -8.3802     9.0822    -1.7476    -2.6384
           X2      -0.9318    -0.3922    -1.7476     0.8380     0.9802
           X3       7.7084    -5.7790    -2.6384     0.9802     5.5739

                  RSQUARE
Model R^2 is      0.62291
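The matrix steps in the PROC IML output can be cross-checked outside SAS. The following is a minimal Python sketch, not the SAS program that produced this listing: it takes the printed X'X, X'Y, and MSE as inputs, solves the normal equations by Gauss-Jordan elimination, and takes standard errors from the diagonal of MSE*(X'X)^{-1}. The helper name solve and the variable names are illustrative assumptions.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Values copied from the listing (XPX, XPY, MSE).
xpx = [
    [20.00, 37.00, 52.60, 29.00],
    [37.00, 72.92, 105.93, 54.25],
    [52.60, 105.93, 190.94, 71.10],
    [29.00, 54.25, 71.10, 47.54],
]
xpy = [526.0, 1002.2, 1357.2, 814.2]
mse = 23.9499

# beta solves (X'X) beta = X'Y.
beta = solve(xpx, xpy)

# Diagonal of (X'X)^{-1}: solve against each unit vector and keep entry i.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
inv_diag = [solve(xpx, identity[i])[i] for i in range(4)]

# Standard errors are the square roots of the diagonal of MSE * (X'X)^{-1}.
se = [math.sqrt(mse * d) for d in inv_diag]

for name, b, s in zip(("Mu", "X1", "X2", "X3"), beta, se):
    print(f"{name:<3} beta = {b:8.4f}   std err = {s:6.3f}")
```

Solving the linear system directly avoids using the four-decimal XPXINV printed above, whose rounding would otherwise shift the estimates in the second decimal place.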
PARAMETER ESTIMATE TABLE WITH 95% CONFIDENCE INTERVALS:

NAMES        BETA    STDBETA     BETALO     BETAHI    TSTAT      PVAL
Mu        4.11972      5.110     -6.713     14.952     0.81    0.4319
X1        7.27227      3.014      0.884     13.661     2.41    0.0282
X2       -0.93180      0.915     -2.872      1.009    -1.02    0.3239
X3        7.70842      2.361      2.704     12.713     3.27    0.0049

ANOVA TABLE FOR FULL-MODEL TEST (NOTE: Df(Fstat) = (3,16)):

ANAMES        SUMSQ    DEGFREE     MSUMSQ    FSTAT    PFSTAT
SSMOD      633.0013          3    211.000    8.810    0.0011
SSE        383.1987         16     23.950
SUM       1016.2000         19
SSTOT     1016.2000         19

THE MATRIX OF CORRELATION COEFFICIENTS IS

NAMES2   CORR          X1         X2         X3
X1                      .     0.5622     0.1211
X2                 0.5622          .    -0.3042
X3                 0.1211    -0.3042          .

THE MATRIX OF PEARSON CORRELATION P-VALUES IS

NAMES2   PVALS         X1         X2         X3
X1                      .     0.0099     0.6110
X2                 0.0099          .     0.1922
X3                 0.6110     0.1922          .

NOW EXITING PROC IML

NOW USING SAS'S REGULAR ROUTINES

CHECK EXPORTED VALUES OF YY, YFIT, and THE RESIDUALS:

 Obs    ORD    YY       YFIT       RESID
   1      1    10    14.9798    -4.97980
   2      2    23    18.3804     4.61962
   3      3    24    26.8838    -2.88380
   4      4    21    19.6481     1.35190
   5      5    32    37.3894    -5.38938
   6      6    28    24.9377     3.06232
   7      7    19    23.5340    -4.53403
   8      8    27    29.9423    -2.94228
   9      9    35    33.6478     1.35220
  10     10    34    30.5513     3.44871
  11     11    42    32.0616     9.93838
  12     12    33    32.0684     0.93155
  13     13    24    23.2037     0.79631
  14     14    23    28.4696    -5.46962
  15     15    23    19.3465     3.65347
  16     16    27    28.7818    -1.78175
  17     17    32    25.1757     6.82427
  18     18    26    24.7463     1.25375
  19     19    15    21.6137    -6.61368
  20     20    28    30.6382    -2.63815

PLOT RESIDUALS AGAINST YFIT USING PROC PLOT
RESIDUALS SHOULD GIVE AN INDEPENDENT-LOOKING SCATTERPLOT

Plot of RESID*YFIT.  Symbol used is 'R'.
[Line-printer scatterplot not reproduced: RESID on the vertical axis (-10 to 10), YFIT on the horizontal axis (15 to 40), one 'R' per observation.]

OUTPUT FROM SAS'S REGRESSION ROUTINE (`PROC REG')
NOTE THAT MOST OUTPUT IS EXACTLY THE SAME!

The REG Procedure
Model: MODEL1
Dependent Variable: yy

Number of Observations Read    20
Number of Observations Used    20

Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                3      633.00128    211.00043       8.81    0.0011
Error               16      383.19872     23.94992
Corrected Total     19     1016.20000

Root MSE           4.89387    R-Square    0.6229
Dependent Mean    26.30000    Adj R-Sq    0.5522
Coeff Var         18.60785

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1       4.11972      5.10997       0.81      0.4319
x1            1       7.27227      3.01367       2.41      0.0282
x2            1      -0.93180      0.91544      -1.02      0.3239
x3            1       7.70842      2.36090       3.27      0.0049

INVESTIGATE DIFFERENT CHOICES OF COVARIATES
TRY WITH x1 x3 ONLY
DOES THIS DECREASE THE MODEL R^2 SIGNIFICANTLY?
The REG Procedure
Model: MODEL1
Dependent Variable: yy

Number of Observations Read    20
Number of Observations Used    20

Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                2      608.18784    304.09392      12.67    0.0004
Error               17      408.01216     24.00072
Corrected Total     19     1016.20000

Root MSE           4.89905    R-Square    0.5985
Dependent Mean    26.30000    Adj R-Sq    0.5513
Coeff Var         18.62758

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1       3.68368      5.09738       0.72      0.4797
x1            1       5.32909      2.33436       2.28      0.0356
x3            1       8.79828      2.10637       4.18      0.0006

TRY FORMAL MODEL-SELECTION PROCEDURES FOR COVARIATES
(BEST) R^2, ADJUSTED R^2, MALLOW CP
BACKWARDS (STEPWISE) SELECTION
SEE FarmsModSel.sas FOR EXPLANATIONS

The REG Procedure
Model: MODEL1
Dependent Variable: yy
R-Square Selection Method

Number of Observations Read    20
Number of Observations Used    20

Number in
  Model      R-Square    Variables in Model
      1        0.4754    x3
      1        0.1864    x1
      1        0.0128    x2
-------------------------------------------
      2        0.5985    x1 x3
      2        0.4857    x2 x3
      2        0.3717    x1 x2
-------------------------------------------
      3        0.6229    x1 x2 x3

The REG Procedure
Model: MODEL2
Dependent Variable: yy
Adjusted R-Square Selection Method

Number of Observations Read    20
Number of Observations Used    20

Number in    Adjusted
  Model      R-Square    R-Square    Variables in Model
      3        0.5522      0.6229    x1 x2 x3
      2        0.5513      0.5985    x1 x3
      1        0.4463      0.4754    x3
      2        0.4252      0.4857    x2 x3
      2        0.2977      0.3717    x1 x2
      1        0.1412      0.1864    x1
      1        -.0420      0.0128    x2

The REG Procedure
Model: MODEL3
Dependent Variable: yy
C(p) Selection Method

Number of Observations Read    20
Number of Observations Used    20

Number in
  Model          C(p)    R-Square    Variables in Model
      2        3.0361      0.5985    x1 x3
      3        4.0000      0.6229    x1 x2 x3
      1        6.2587      0.4754    x3
      2        7.8230      0.4857    x2 x3
      2       12.6604      0.3717    x1 x2
      1       18.5202      0.1864    x1
      1       25.8862      0.0128    x2

The REG Procedure
Model: MODEL4
Dependent Variable: yy

Number of Observations Read    20
Number of Observations Used    20

Backward Elimination: Step 0

All Variables Entered: R-Square = 0.6229 and C(p) = 4.0000

Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                3      633.00128    211.00043       8.81    0.0011
Error               16      383.19872     23.94992
Corrected Total     19     1016.20000

              Parameter     Standard
Variable       Estimate        Error    Type II SS    F Value    Pr > F
Intercept       4.11972      5.10997      15.56688       0.65    0.4319
x1              7.27227      3.01367     139.46056       5.82    0.0282
x2             -0.93180      0.91544      24.81344       1.04    0.3239
x3              7.70842      2.36090     255.31679      10.66    0.0049

Bounds on condition number: 1.8406, 14.44
---------------------------------------------------------------------------

Backward Elimination: Step 1

Variable x2 Removed: R-Square = 0.5985 and C(p) = 3.0361

Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                2      608.18784    304.09392      12.67    0.0004
Error               17      408.01216     24.00072
Corrected Total     19     1016.20000

              Parameter     Standard
Variable       Estimate        Error    Type II SS    F Value    Pr > F
Intercept       3.68368      5.09738      12.53412       0.52    0.4797
x1              5.32909      2.33436     125.08219       5.21    0.0356
x3              8.79828      2.10637     418.74489      17.45    0.0006

Bounds on condition number: 1.0149, 4.0596
---------------------------------------------------------------------------

All variables left in the model are significant at the 0.1000 level.

Summary of Backward Elimination

         Variable    Number      Partial       Model
Step     Removed     Vars In    R-Square    R-Square      C(p)    F Value    Pr > F
   1     x2                2      0.0244      0.5985    3.0361       1.04    0.3239
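
The model-selection figures for dropping x2 follow from the error sums of squares already printed in the listing. The sketch below recomputes them in Python as a check on the arithmetic; it assumes the textbook formulas for the partial F statistic, Mallows' C(p), and adjusted R^2, and the variable names are illustrative rather than taken from FarmsModSel.sas.

```python
# Sums of squares copied from the PROC REG output.
n = 20                              # observations
ss_tot = 1016.20000                 # corrected total sum of squares
sse_full, p_full = 383.19872, 4     # x1 x2 x3 plus intercept
sse_red, p_red = 408.01216, 3       # x1 x3 plus intercept

# Error mean square of the full model, the yardstick for F and C(p).
mse_full = sse_full / (n - p_full)

# Partial F for the one dropped coefficient: extra SSE per dropped parameter over MSE.
f_partial = (sse_red - sse_full) / (p_full - p_red) / mse_full

# Share of the total sum of squares lost by removing x2.
partial_r2 = (sse_red - sse_full) / ss_tot

# Mallows' C(p) for the reduced model, with p counting the intercept.
cp_red = sse_red / mse_full - (n - 2 * p_red)

def adj_r2(sse, p):
    """Adjusted R^2: one minus the ratio of error to total mean squares."""
    return 1 - (sse / (n - p)) / (ss_tot / (n - 1))

adj_full = adj_r2(sse_full, p_full)
adj_red = adj_r2(sse_red, p_red)

print(f"partial F for x2  = {f_partial:.2f}")
print(f"partial R-square  = {partial_r2:.4f}")
print(f"C(p), x1 x3       = {cp_red:.4f}")
print(f"adj R-sq full     = {adj_full:.4f}")
print(f"adj R-sq x1 x3    = {adj_red:.4f}")
```

The partial F of about 1.04 on (1, 16) degrees of freedom is the square of the t value for x2 in the full model (-1.02), which is why both carry the same p-value of 0.3239.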