MODEL SELECTION for ``apple taste'' with 5 predictors
PROC REG of taste on 5 variables
MODEL R-SQUARE=0.8793 (P=0.0016), but NO PREDICTOR IS SIGNIFICANT
in the Parameter Estimates table!
NOTE ALSO that the parameter estimates for Na and K
are of opposite sign, even though the two elements are
found together and have similar chemical effects.
This may be an example of unreliable parameter
estimates when predictors are highly correlated.

The REG Procedure
Model: MODEL1
Dependent Variable: yy AppleTaste

Number of Observations Read 14
Number of Observations Used 14

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5          3577641        715528     11.65   0.0016
Error               8           491317         61415
Corrected Total    13          4068957

Root MSE 247.81967 R-Square 0.8793
Dependent Mean 2195.42857 Adj R-Sq 0.8038
Coeff Var 11.28799    
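
The fit statistics above follow directly from the Analysis of Variance table. The short check below is only a sketch: it copies the printed sums of squares and the dependent mean by hand (so results agree with the listing only up to rounding), and the variable names are illustrative rather than anything SAS produces.

```python
import math

# Values copied from the Analysis of Variance table and fit statistics above.
ss_model = 3577641.0
ss_error = 491317.0
ss_total = 4068957.0
dependent_mean = 2195.42857
n_obs = 14    # observations used
n_pred = 5    # predictors, not counting the intercept

df_error = n_obs - n_pred - 1                                   # 8
r_square = 1.0 - ss_error / ss_total                            # R-Square
adj_r_square = 1.0 - (1.0 - r_square) * (n_obs - 1) / df_error  # Adj R-Sq
f_value = (ss_model / n_pred) / (ss_error / df_error)           # F Value
root_mse = math.sqrt(ss_error / df_error)                       # Root MSE
coeff_var = 100.0 * root_mse / dependent_mean                   # Coeff Var

print(round(r_square, 4), round(adj_r_square, 4),
      round(f_value, 2), round(root_mse, 2), round(coeff_var, 2))
```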

Parameter Estimates
Variable    Label        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept     1            299.64546        988.08291      0.30     0.7694
Nat         Sodium        1            -10.77226         76.56324     -0.14     0.8916
Kk          Potassium     1             43.61828         60.00005      0.73     0.4880
Pp          Phosphorus    1              0.34494          0.74392      0.46     0.6552
Shade       Shade         1           -179.09980       1800.82748     -0.10     0.9232
Water       Water         1              1.75238          3.52645      0.50     0.6326
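
Each t Value in the table is the estimate divided by its standard error, judged against a Student t distribution with 8 error degrees of freedom. The sketch below recomputes the ratios from the printed numbers; the cutoff 2.306 is the standard two-sided 5% table value for 8 df, hard-coded here as an assumption rather than taken from the SAS output.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "Intercept": (299.64546, 988.08291),
    "Nat": (-10.77226, 76.56324),
    "Kk": (43.61828, 60.00005),
    "Pp": (0.34494, 0.74392),
    "Shade": (-179.09980, 1800.82748),
    "Water": (1.75238, 3.52645),
}

# Two-sided 5% critical value of Student's t with 8 df (standard table value).
T_CRIT_8DF = 2.306

t_values = {name: est / se for name, (est, se) in estimates.items()}
significant = [name for name, t in t_values.items() if abs(t) > T_CRIT_8DF]

for name, t in t_values.items():
    print(f"{name:10s} t = {t:6.2f}")
print("significant at 5%:", significant)
```

Every |t| is below 1, so the list of significant terms is empty even though the overall F test is significant.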



MODEL SELECTION for ``apple taste'' with 5 predictors
GENERATING A CORRELATION MATRIX using proc corr
IN EACH CORRELATION TABLE ENTRY
The first entry is the estimated Pearson rho
The second entry is the P-value of the test of H_0:rho=0
THE FIRST ROW AND COLUMN are for Response vs. Predictors
The other entries (among Predictors) show a complex pattern
of high correlations between predictor variables,
such as Na vs K (they are bundled together in fertilizers)
P vs Water, and everything vs Shade.

The CORR Procedure

6 Variables: yy Nat Kk Pp Shade Water

Pearson Correlation Coefficients, N = 14
Prob > |r| under H0: Rho=0

                   yy        Nat         Kk         Pp      Shade      Water
yy            1.00000    0.41070    0.41574    0.75356    0.91704    0.73599
AppleTaste                0.1446     0.1393     0.0019     <.0001     0.0027

Nat           0.41070    1.00000    0.95312   -0.13605    0.59800   -0.14548
Sodium         0.1446                <.0001     0.6428     0.0239     0.6197

Kk            0.41574    0.95312    1.00000   -0.17195    0.57962   -0.19159
Potassium      0.1393     <.0001                0.5567     0.0298     0.5117

Pp            0.75356   -0.13605   -0.17195    1.00000    0.69683    0.97877
Phosphorus     0.0019     0.6428     0.5567                0.0056     <.0001

Shade         0.91704    0.59800    0.57962    0.69683    1.00000    0.67403
Shade          <.0001     0.0239     0.0298     0.0056                0.0082

Water         0.73599   -0.14548   -0.19159    0.97877    0.67403    1.00000
Water          0.0027     0.6197     0.5117     <.0001     0.0082
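
The second number in each cell can be recovered from the first: under H0: rho=0 the statistic t = r*sqrt((n-2)/(1-r^2)) has a Student t distribution with n-2 = 12 df. The sketch below is one way to check this with only the standard library; the helper names and the Simpson-rule integration are illustrative choices, not how PROC CORR computes its P-values.

```python
import math

def t_statistic(r, n):
    """t statistic for testing rho = 0 from a sample correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1.0 - 2.0 * (h / 3.0) * total

n = 14
for label, r in [("yy vs Nat", 0.41070), ("yy vs Pp", 0.75356), ("Nat vs Kk", 0.95312)]:
    t = t_statistic(r, n)
    print(f"{label:10s} r = {r:8.5f}  t = {t:7.3f}  p = {two_sided_p(t, n - 2):.4f}")
```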



MODEL SELECTION for ``apple taste'' with 5 predictors
A SECOND RUN OF PROC REG for VIF scores
ALL VIF scores for predictors are too large
(VIF > 10 is a common rule-of-thumb warning level.)
This suggests that the predictors are highly correlated
and that some, if not most, should be dropped.
Fortunately, the output also suggests that there are
no outliers.

The REG Procedure
Model: MODEL1
Dependent Variable: yy AppleTaste

Number of Observations Read 14
Number of Observations Used 14

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5          3577641        715528     11.65   0.0016
Error               8           491317         61415
Corrected Total    13          4068957

Root MSE 247.81967 R-Square 0.8793
Dependent Mean 2195.42857 Adj R-Sq 0.8038
Coeff Var 11.28799    

Parameter Estimates
Variable    Label        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept   Intercept     1            299.64546        988.08291      0.30     0.7694                    0
Nat         Sodium        1            -10.77226         76.56324     -0.14     0.8916             26.21534
Kk          Potassium     1             43.61828         60.00005      0.73     0.4880             80.37797
Pp          Phosphorus    1              0.34494          0.74392      0.46     0.6552            144.71123
Shade       Shade         1           -179.09980       1800.82748     -0.10     0.9232            274.06551
Water       Water         1              1.75238          3.52645      0.50     0.6326             31.40784
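
A VIF is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, so each printed VIF can be turned back into the share of that predictor explained by the rest. The sketch below does this for the values in the table; the threshold of 10 is the rule of thumb quoted in the title lines, and the variable names are illustrative.

```python
import math

# Variance Inflation column copied from the table above.
vif = {
    "Nat": 26.21534,
    "Kk": 80.37797,
    "Pp": 144.71123,
    "Shade": 274.06551,
    "Water": 31.40784,
}
VIF_RULE_OF_THUMB = 10.0

# R_j^2 implied by each VIF, and the factor by which the standard error grows.
implied_r2 = {name: 1.0 - 1.0 / v for name, v in vif.items()}
se_inflation = {name: math.sqrt(v) for name, v in vif.items()}
too_large = [name for name, v in vif.items() if v > VIF_RULE_OF_THUMB]

for name in vif:
    print(f"{name:6s} VIF = {vif[name]:9.2f}  implied R_j^2 = {implied_r2[name]:.4f}  "
          f"SE inflated by about {se_inflation[name]:.1f}x")
print("above the rule of thumb:", too_large)
```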




The REG Procedure
Model: MODEL1
Dependent Variable: yy AppleTaste

Output Statistics
Obs  Dependent  Predicted     Std Error   Residual  Std Error   Student    -2-1 0 1 2    Cook's
      Variable      Value  Mean Predict             Residual  Residual                        D
  1       2876       2545      162.2851   330.9954     187.3     1.767  |      |***   |   0.391
  2       2078       2054      134.5014    24.3789     208.1     0.117  |      |      |   0.001
  3       3052       2921      185.3258   131.4174     164.5     0.799  |      |*     |   0.135
  4       2265       1962      121.7668   303.4113     215.8     1.406  |      |**    |   0.105
  5   940.0000       1121      209.0136  -181.2443     133.1    -1.361  |    **|      |   0.761
  6       2815       2768      163.4178    47.4069     186.3     0.254  |      |      |   0.008
  7       2661       2735      148.6937   -73.6539     198.3    -0.372  |      |      |   0.013
  8       2181       2279      143.1761   -97.7067     202.3    -0.483  |      |      |   0.019
  9       2052       1952      151.6323    99.5374     196.0     0.508  |      |*     |   0.026
 10       2064       2314      136.7896  -250.1510     206.6    -1.211  |    **|      |   0.107
 11       1551       1348      137.5019   202.6941     206.2     0.983  |      |*     |   0.072
 12       2338       2587      135.9839  -248.8729     207.2    -1.201  |    **|      |   0.104
 13       1753       1848      219.8169   -95.4855     114.4    -0.834  |     *|      |   0.428
 14       2110       2303      185.6468  -192.7272     164.2    -1.174  |    **|      |   0.294

Sum of Residuals 0
Sum of Squared Residuals 491317
Predicted Residual SS (PRESS) 1782663
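
The "no outliers" reading can be made explicit by screening the Student Residual and Cook's D columns. The sketch below assumes two common rules of thumb that the listing itself does not state (|Student Residual| > 2 and Cook's D > 1) and also shows a stricter 4/n cutoff for comparison.

```python
# Student Residual and Cook's D columns copied from the Output Statistics table.
student = [1.767, 0.117, 0.799, 1.406, -1.361, 0.254, -0.372,
           -0.483, 0.508, -1.211, 0.983, -1.201, -0.834, -1.174]
cooks_d = [0.391, 0.001, 0.135, 0.105, 0.761, 0.008, 0.013,
           0.019, 0.026, 0.107, 0.072, 0.104, 0.428, 0.294]
n_obs = len(student)

# Assumed rules of thumb (not taken from the SAS output).
large_residual = [i + 1 for i, r in enumerate(student) if abs(r) > 2.0]
large_cooks = [i + 1 for i, d in enumerate(cooks_d) if d > 1.0]
# A stricter, also informal, cutoff: Cook's D > 4/n.
cooks_over_4n = [i + 1 for i, d in enumerate(cooks_d) if d > 4.0 / n_obs]

print("obs with |Student Residual| > 2:", large_residual)
print("obs with Cook's D > 1:", large_cooks)
print("obs with Cook's D > 4/n:", cooks_over_4n)
```

Under the first two rules nothing is flagged. The stricter 4/n cutoff would single out a few observations, with observation 5 (Cook's D = 0.761) the most influential.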



MODEL SELECTION for ``apple taste'' with 5 predictors
MODEL SELECTION of RESPONSE (YY) for 5 Predictors
Comparing Stepwise, Backward, and Mallows' C(p)
Stepwise selection finds Na (Sodium) and Shade
Backward elimination finds K (Potassium) and P (Phosphorus)
Mallows' C(p) gives a sorted list of models; K P is slightly best

The REG Procedure
Model: MODEL1
Dependent Variable: yy AppleTaste

Number of Observations Read 14
Number of Observations Used 14


 


Stepwise Selection: Step 1


Variable Shade Entered: R-Square = 0.8410 and C(p) = 0.5367
 
 
 
 

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          3421848       3421848     63.45   <.0001
Error              12           647110         53926
Corrected Total    13          4068957

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            645.72764        204.20305       539226     10.00   0.0082
Shade                811.96905        101.93130      3421848     63.45   <.0001


Bounds on condition number: 1, 1


 


Stepwise Selection: Step 2


Variable Nat Entered: R-Square = 0.8705 and C(p) = 0.5814
 
 
 
 

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          3541934       1770967     36.96   <.0001
Error              11           527023         47911
Corrected Total    13          4068957

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            798.46545        215.30332       658943     13.75   0.0034
Nat                  -26.08884         16.47880       120087      2.51   0.1417
Shade                925.46000        119.87479      2855594     59.60   <.0001


Bounds on condition number: 1.5567, 6.2267


 


All variables left in the model are significant at the 0.1500 level.


No other variable met the 0.1500 significance level for entry into the model.


 
 
 

Summary of Stepwise Selection
Step   Variable Entered   Variable Removed   Label    Number Vars In   Partial R-Square   Model R-Square     C(p)   F Value   Pr > F
   1   Shade                                 Shade                 1             0.8410           0.8410   0.5367     63.45   <.0001
   2   Nat                                   Sodium                2             0.0295           0.8705   0.5814      2.51   0.1417
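
The Step 2 row can be reproduced from the Step 2 tables: the F Value for entering Nat is its Type II SS divided by the error mean square of the two-variable model, and the Partial R-Square is the same Type II SS divided by the corrected total SS. The sketch below uses hand-copied values; the 0.15 entry level is the one quoted in the SAS messages above.

```python
# Values copied from the Stepwise Selection: Step 2 output.
type2_ss_nat = 120087.0      # Type II SS for Nat
mse_step2 = 47911.0          # Error mean square with Shade and Nat in the model
ss_total = 4068957.0         # Corrected Total SS
r2_step1, r2_step2 = 0.8410, 0.8705
p_value_nat = 0.1417         # Pr > F printed for Nat
ENTRY_LEVEL = 0.15           # significance level for entry quoted in the listing

f_to_enter = type2_ss_nat / mse_step2
partial_r2 = type2_ss_nat / ss_total
r2_gain = r2_step2 - r2_step1
enters = p_value_nat < ENTRY_LEVEL

print(round(f_to_enter, 2), round(partial_r2, 4), round(r2_gain, 4), enters)
```

Nat enters only because the entry level is as loose as 0.15; at a 0.05 level the stepwise run would have stopped with Shade alone.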




The REG Procedure
Model: MODEL2
Dependent Variable: yy AppleTaste

Number of Observations Read 14
Number of Observations Used 14


 


Backward Elimination: Step 0


All Variables Entered: R-Square = 0.8793 and C(p) = 6.0000
 
 
 
 

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5          3577641        715528     11.65   0.0016
Error               8           491317         61415
Corrected Total    13          4068957

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            299.64546        988.08291   5648.07120      0.09   0.7694
Nat                  -10.77226         76.56324   1215.75074      0.02   0.8916
Kk                    43.61828         60.00005        32457      0.53   0.4880
Pp                     0.34494          0.74392        13204      0.22   0.6552
Shade               -179.09980       1800.82748    607.45977      0.01   0.9232
Water                  1.75238          3.52645        15165      0.25   0.6326


Bounds on condition number: 274.07, 2783.9


 


Backward Elimination: Step 1


Variable Shade Removed: R-Square = 0.8791 and C(p) = 4.0099
 
 
 
 

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4          3577033        894258     16.36   0.0004
Error               9           491924         54658
Corrected Total    13          4068957

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            391.72312        325.63626        79095      1.45   0.2597
Nat                  -16.51251         47.45755   6617.15339      0.12   0.7359
Kk                    38.09669         21.46412       172188      3.15   0.1097
Pp                     0.27749          0.28833        50626      0.93   0.3610
Water                  1.59125          2.95493        15850      0.29   0.6033


Bounds on condition number: 24.778, 288.31


 


Backward Elimination: Step 2


Variable Nat Removed: R-Square = 0.8775 and C(p) = 2.1176
 
 
 
 

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3          3570416       1190139     23.87   <.0001
Error              10           498541         49854
Corrected Total    13          4068957

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            317.02761        233.84411        91631      1.84   0.2050
Kk                    30.97381          6.16202      1259634     25.27   0.0005
Pp                     0.29157          0.27264        57019      1.14   0.3100
Water                  1.42377          2.78440        13035      0.26   0.6202


Bounds on condition number: 24.121, 147.33


 


Backward Elimination: Step 3


Variable Water Removed: R-Square = 0.8743 and C(p) = 0.3299
 
 
 
 

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          3557381       1778690     38.25   <.0001
Error              11           511577         46507
Corrected Total    13          4068957

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            337.00028        222.68472       106512      2.29   0.1584
Kk                    30.61040          5.91185      1246835     26.81   0.0003
Pp                     0.42795          0.05463      2854105     61.37   <.0001


Bounds on condition number: 1.0305, 4.1219


 


All variables left in the model are significant at the 0.1000 level.


 
 
 

Summary of Backward Elimination
Step   Variable Removed   Label    Number Vars In   Partial R-Square   Model R-Square     C(p)   F Value   Pr > F
   1   Shade              Shade                 4             0.0001           0.8791   4.0099      0.01   0.9232
   2   Nat                Sodium                3             0.0016           0.8775   2.1176      0.12   0.7359
   3   Water              Water                 2             0.0032           0.8743   0.3299      0.26   0.6202
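
Each row of the backward summary comes from the step before it: the variable removed is the one with the smallest F Value, where F is its Type II SS divided by the error mean square of the model it is being removed from. The sketch below recomputes the three removal F values and partial R-squares from hand-copied numbers; the tuple layout is just an illustrative choice.

```python
ss_total = 4068957.0  # Corrected Total SS

# (variable removed, its Type II SS, error mean square of the model before removal)
removals = [
    ("Shade", 607.45977, 61415.0),   # from the 5-variable model (Step 0)
    ("Nat", 6617.15339, 54658.0),    # from the 4-variable model (Step 1)
    ("Water", 13035.0, 49854.0),     # from the 3-variable model (Step 2)
]

f_to_remove = {name: ss / mse for name, ss, mse in removals}
partial_r2 = {name: ss / ss_total for name, ss, _ in removals}

for name, _, _ in removals:
    print(f"{name:6s} F = {f_to_remove[name]:.2f}  partial R-square = {partial_r2[name]:.4f}")
```

Dropping all three variables costs well under one percentage point of R-square, which is why C(p) falls at every step.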




The REG Procedure
Model: MODEL3
Dependent Variable: yy
 
C(p) Selection Method

Number of Observations Read 14
Number of Observations Used 14


 

Number in Model   C(p)   R-Square   Variables in Model
2 0.3299 0.8743 Kk Pp
1 0.5367 0.8410 Shade
2 0.5814 0.8705 Nat Shade
2 0.8473 0.8665 Pp Shade
2 0.8496 0.8664 Shade Water
2 1.0461 0.8635 Kk Water
2 1.1990 0.8612 Kk Shade
3 2.1176 0.8775 Kk Pp Water
3 2.2680 0.8752 Nat Kk Pp
3 2.3222 0.8744 Kk Pp Shade
3 2.4351 0.8727 Nat Kk Shade
3 2.5548 0.8709 Nat Pp Shade
3 2.5813 0.8705 Nat Shade Water
3 2.6948 0.8688 Kk Shade Water
3 2.8147 0.8670 Pp Shade Water
3 2.8342 0.8667 Nat Kk Water
2 2.8513 0.8362 Nat Pp
4 4.0099 0.8791 Nat Kk Pp Water
4 4.0198 0.8790 Kk Pp Shade Water
4 4.2150 0.8760 Nat Kk Shade Water
2 4.2187 0.8156 Nat Water
4 4.2469 0.8755 Nat Kk Pp Shade
4 4.5285 0.8713 Nat Pp Shade Water
3 4.8136 0.8368 Nat Pp Water
5 6.0000 0.8793 Nat Kk Pp Shade Water
1 18.6318 0.5678 Pp
1 20.3651 0.5417 Water
2 20.6280 0.5679 Pp Water
1 44.8026 0.1728 Kk
1 45.0784 0.1687 Nat
2 46.6515 0.1751 Nat Kk
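
The C(p) column can be checked against the usual definition C(p) = SSE_p / MSE_full - (n - 2p), where p counts the intercept and MSE_full is the error mean square of the five-predictor model. The sketch below applies it to the submodels whose error sums of squares are printed elsewhere in this listing; the dictionary layout is illustrative only.

```python
n_obs = 14
sse_full = 491317.0
mse_full = sse_full / (n_obs - 6)   # 5 predictors + intercept, so 8 error df

# Error sums of squares copied from the stepwise and backward output above.
sse_by_model = {
    ("Shade",): 647110.0,
    ("Nat", "Shade"): 527023.0,
    ("Kk", "Pp"): 511577.0,
    ("Kk", "Pp", "Water"): 498541.0,
    ("Nat", "Kk", "Pp", "Water"): 491924.0,
    ("Nat", "Kk", "Pp", "Shade", "Water"): sse_full,
}

def mallows_cp(sse, n_predictors):
    """Mallows' C(p) with p = number of predictors + 1 for the intercept."""
    p = n_predictors + 1
    return sse / mse_full - (n_obs - 2 * p)

cp = {model: mallows_cp(sse, len(model)) for model, sse in sse_by_model.items()}
for model, value in sorted(cp.items(), key=lambda item: item[1]):
    print(f"{value:7.4f}  {' '.join(model)}")
```

The full model always gets C(p) = p = 6 by construction, so it is the smaller models with C(p) near or below their own p that are of interest.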

 



 



MODEL SELECTION for ``apple taste'' with 5 predictors
CHECKING THE BEST MODEL (K P, chosen by Backward and Mallows' C(p)):
VIF scores are much smaller (both about 1.03)
Slope estimates are both positive
Both slope estimates are significant (the intercept is not)
The output suggests that there are still no outliers.

The REG Procedure
Model: MODEL1
Dependent Variable: yy AppleTaste

Number of Observations Read 14
Number of Observations Used 14

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          3557381       1778690     38.25   <.0001
Error              11           511577         46507
Corrected Total    13          4068957

Root MSE 215.65474 R-Square 0.8743
Dependent Mean 2195.42857 Adj R-Sq 0.8514
Coeff Var 9.82290    

Parameter Estimates
Variable    Label        DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept   Intercept     1            337.00028        222.68472      1.51     0.1584                    0
Kk          Potassium     1             30.61040          5.91185      5.18     0.0003              1.03047
Pp          Phosphorus    1              0.42795          0.05463      7.83     <.0001              1.03047
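
With only two predictors, both VIFs equal 1/(1 - r^2), where r is the correlation between the two predictors. The sketch below plugs in the Kk vs Pp correlation from the PROC CORR output (-0.17195) and should reproduce the Variance Inflation column up to rounding; the variable names are illustrative.

```python
# Pearson correlation between Kk and Pp, copied from the PROC CORR table.
r_kk_pp = -0.17195

# In a two-predictor model, R_j^2 for either predictor is just r^2.
vif_two_predictors = 1.0 / (1.0 - r_kk_pp ** 2)

print(round(vif_two_predictors, 5))
```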




The REG Procedure
Model: MODEL1
Dependent Variable: yy AppleTaste

Output Statistics
Obs  Dependent  Predicted     Std Error   Residual  Std Error   Student    -2-1 0 1 2    Cook's
      Variable      Value  Mean Predict             Residual  Residual                        D
  1       2876       2565      112.7706   311.0664     183.8     1.692  |      |***   |   0.359
  2       2078       2018       75.4518    60.0722     202.0     0.297  |      |      |   0.004
  3       3052       2927      103.7734   124.8913     189.0     0.661  |      |*     |   0.044
  4       2265       1929       65.2153   336.4415     205.6     1.637  |      |***   |   0.090
  5   940.0000       1171      150.6759  -231.3620     154.3    -1.500  |    **|      |   0.715
  6       2815       2811       91.1217     3.7117     195.5    0.0190  |      |      |   0.000
  7       2661       2685      101.8988   -24.3510     190.1    -0.128  |      |      |   0.002
  8       2181       2341       60.1426  -160.3518     207.1    -0.774  |     *|      |   0.017
  9       2052       1963       75.6404    89.2776     202.0     0.442  |      |      |   0.009
 10       2064       2285       82.1294  -221.1845     199.4    -1.109  |    **|      |   0.070
 11       1551       1345      119.4113   205.6548     179.6     1.145  |      |**    |   0.193
 12       2338       2552      102.9772  -214.0711     189.5    -1.130  |    **|      |   0.126
 13       1753       1797       73.4737   -43.9520     202.8    -0.217  |      |      |   0.002
 14       2110       2346      135.4746  -235.8430     167.8    -1.406  |    **|      |   0.429

Sum of Residuals 0
Sum of Squared Residuals 511577
Predicted Residual SS (PRESS) 983500
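
PRESS gives a leave-one-out view of the two fits. One common summary, not printed by PROC REG here and introduced only as an illustration, is the "predicted R-square" 1 - PRESS/SST. The sketch below compares the five-predictor model with the K P model using the PRESS values and corrected total SS printed in this listing.

```python
ss_total = 4068957.0   # Corrected Total SS (same for both models)

# (Sum of Squared Residuals, PRESS) copied from the two Output Statistics pages.
fits = {
    "5 predictors": (491317.0, 1782663.0),
    "Kk Pp": (511577.0, 983500.0),
}

r_square = {name: 1.0 - sse / ss_total for name, (sse, _) in fits.items()}
predicted_r_square = {name: 1.0 - press / ss_total for name, (_, press) in fits.items()}

for name in fits:
    print(f"{name:13s} R-square = {r_square[name]:.4f}  "
          f"predicted R-square = {predicted_r_square[name]:.4f}")
```

The full model fits the sample slightly better but predicts held-out observations much worse, which is consistent with its inflated variances.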