SAS Output

MODEL SELECTION for ``apple taste'' with 5 predictors

PROC REG of taste on 5 variables

MODEL R-SQUARE=0.8793 (P=0.0016), but NO INDIVIDUAL PREDICTOR IS SIGNIFICANT

in the Parameter Estimates table !!

NOTE ALSO that the parameter estimates for Na and K

are large but of opposite sign, even though they are

found together and have similar chemical effects.

This may be an example of unreliable estimates of

parameters when predictors are highly correlated.

The REG Procedure

Model: MODEL1

Dependent Variable: yy AppleTaste

Number of Observations Read	14
Number of Observations Used	14

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3577641	715528	11.65	0.0016
Error	8	491317	61415
Corrected Total	13	4068957

Root MSE	247.81967	R-Square	0.8793
Dependent Mean	2195.42857	Adj R-Sq	0.8038
Coeff Var	11.28799
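
The summary statistics above follow from the three sums of squares by simple arithmetic. A minimal Python sketch (Python is used here only for illustration; it is not part of the SAS run) that recomputes them from the rounded values printed in the table:

```python
# Values copied from the Analysis of Variance table (rounded as printed).
ss_model, df_model = 3577641, 5
ss_error, df_error = 491317, 8
ss_total, df_total = 4068957, 13
dependent_mean = 2195.42857

ms_model = ss_model / df_model            # Mean Square, Model
ms_error = ss_error / df_error            # Mean Square, Error
f_value = ms_model / ms_error             # overall F statistic
r_square = ss_model / ss_total            # share of variation explained
adj_r_square = 1 - (1 - r_square) * df_total / df_error
root_mse = ms_error ** 0.5
coeff_var = 100 * root_mse / dependent_mean

print(f"F = {f_value:.2f}, R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_square:.4f}")
print(f"Root MSE = {root_mse:.3f}, Coeff Var = {coeff_var:.3f}")
```

Small differences in the last digits are expected because the inputs are rounded.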

Parameter Estimates
Variable	Label	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	Intercept	1	299.64546	988.08291	0.30	0.7694
Nat	Sodium	1	-10.77226	76.56324	-0.14	0.8916
Kk	Potassium	1	43.61828	60.00005	0.73	0.4880
Pp	Phosphorus	1	0.34494	0.74392	0.46	0.6552
Shade	Shade	1	-179.09980	1800.82748	-0.10	0.9232
Water	Water	1	1.75238	3.52645	0.50	0.6326
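
Each t Value in the table is the parameter estimate divided by its standard error. The sketch below (Python, for illustration only) recomputes them and compares against 2.306, the standard two-sided 5% critical value of Student's t with 8 error degrees of freedom; that cutoff is taken from a t table, not from the SAS output.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "Intercept": (299.64546, 988.08291),
    "Nat": (-10.77226, 76.56324),
    "Kk": (43.61828, 60.00005),
    "Pp": (0.34494, 0.74392),
    "Shade": (-179.09980, 1800.82748),
    "Water": (1.75238, 3.52645),
}

# Two-sided 5% critical value of t with 8 df (assumed from a standard t table).
T_CRIT_8DF = 2.306

t_values = {name: est / se for name, (est, se) in estimates.items()}
significant = [name for name, t in t_values.items() if abs(t) > T_CRIT_8DF]

for name, t in t_values.items():
    print(f"{name:10s} t = {t:6.2f}")
print("significant at 5%:", significant)
```

Every standard error is larger than its estimate, so no |t| reaches 1, let alone the critical value.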

GENERATING A CORRELATION MATRIX using proc corr

IN EACH CORRELATION TABLE ENTRY

The first entry is the estimated Pearson rho

The second entry is the P-value for the test of H_0:rho=0

THE FIRST ROW AND COLUMN are for Response vs. Predictors

The other entries (among Predictors) show a complex pattern

of high correlations between predictor variables,

such as Na vs K (they are bundled together in fertilizers),

P vs Water, and everything vs Shade.

The CORR Procedure

6 Variables:	yy Nat Kk Pp Shade Water

Pearson Correlation Coefficients, N = 14 Prob > |r| under H0: Rho=0
	yy	Nat	Kk	Pp	Shade	Water
yy AppleTaste	1.00000	0.41070 0.1446	0.41574 0.1393	0.75356 0.0019	0.91704 <.0001	0.73599 0.0027
Nat Sodium	0.41070 0.1446	1.00000	0.95312 <.0001	-0.13605 0.6428	0.59800 0.0239	-0.14548 0.6197
Kk Potassium	0.41574 0.1393	0.95312 <.0001	1.00000	-0.17195 0.5567	0.57962 0.0298	-0.19159 0.5117
Pp Phosphorus	0.75356 0.0019	-0.13605 0.6428	-0.17195 0.5567	1.00000	0.69683 0.0056	0.97877 <.0001
Shade Shade	0.91704 <.0001	0.59800 0.0239	0.57962 0.0298	0.69683 0.0056	1.00000	0.67403 0.0082
Water Water	0.73599 0.0027	-0.14548 0.6197	-0.19159 0.5117	0.97877 <.0001	0.67403 0.0082	1.00000
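
The P-values in the table come from a t statistic with N-2 = 12 degrees of freedom, t = r*sqrt((N-2)/(1-r^2)). A short Python sketch (illustration only) for two entries of the first row; the cutoff 2.179 is the standard two-sided 5% critical value for 12 df, taken from a t table rather than from the output:

```python
import math

N = 14  # number of observations used by PROC CORR


def t_from_r(r, n=N):
    """t statistic for H0: rho = 0, given a sample Pearson correlation r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


t_shade = t_from_r(0.91704)  # yy vs Shade
t_nat = t_from_r(0.41070)    # yy vs Nat

T_CRIT_12DF = 2.179  # assumed from a standard t table (two-sided, 5%)

print(f"Shade: t = {t_shade:.3f}, t^2 = {t_shade ** 2:.2f}")
print(f"Nat:   t = {t_nat:.3f}")
print("Shade significant:", abs(t_shade) > T_CRIT_12DF)
print("Nat significant:  ", abs(t_nat) > T_CRIT_12DF)
```

The squared t for Shade agrees with the F Value of 63.45 reported when Shade is the only predictor in the model.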

A SECOND RUN OF PROC REG for VIF scores

ALL VIF scores for predictors are too large

(VIF > 10 is a common rule-of-thumb cutoff.)

This suggests that the predictors are highly correlated

and that some if not most should be dropped.

Fortunately, the output also suggests that there are

no outliers.

The REG Procedure

Model: MODEL1

Dependent Variable: yy AppleTaste

Number of Observations Read	14
Number of Observations Used	14

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3577641	715528	11.65	0.0016
Error	8	491317	61415
Corrected Total	13	4068957

Root MSE	247.81967	R-Square	0.8793
Dependent Mean	2195.42857	Adj R-Sq	0.8038
Coeff Var	11.28799

Parameter Estimates
Variable	Label	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|	Variance Inflation
Intercept	Intercept	1	299.64546	988.08291	0.30	0.7694	0
Nat	Sodium	1	-10.77226	76.56324	-0.14	0.8916	26.21534
Kk	Potassium	1	43.61828	60.00005	0.73	0.4880	80.37797
Pp	Phosphorus	1	0.34494	0.74392	0.46	0.6552	144.71123
Shade	Shade	1	-179.09980	1800.82748	-0.10	0.9232	274.06551
Water	Water	1	1.75238	3.52645	0.50	0.6326	31.40784
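
A VIF is 1/(1 - R_j^2), where R_j^2 is the R-square from regressing predictor j on the other predictors. Inverting that definition shows how nearly each predictor is a linear combination of the rest. A minimal Python sketch (illustration only), using the VIF column above and the rule-of-thumb cutoff of 10:

```python
# Variance Inflation column copied from the Parameter Estimates table.
vif = {
    "Nat": 26.21534,
    "Kk": 80.37797,
    "Pp": 144.71123,
    "Shade": 274.06551,
    "Water": 31.40784,
}

VIF_CUTOFF = 10  # rule-of-thumb threshold, not a formal test

# R-square of each predictor regressed on the other four, implied by its VIF.
implied_r2 = {name: 1 - 1 / v for name, v in vif.items()}
over_cutoff = [name for name, v in vif.items() if v > VIF_CUTOFF]

for name in vif:
    print(f"{name:6s} VIF = {vif[name]:9.3f}  implied R-square = {implied_r2[name]:.4f}")
print("above cutoff:", over_cutoff)
```

Every implied R-square exceeds 0.96, so each predictor is almost fully explained by the other four.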

The REG Procedure

Model: MODEL1

Dependent Variable: yy AppleTaste

Output Statistics
Obs	Dependent Variable	Predicted Value	Std Error Mean Predict	Residual	Std Error Residual	Student Residual	-2-1 0 1 2	Cook's D
1	2876	2545	162.2851	330.9954	187.3	1.767	| |*** |	0.391
2	2078	2054	134.5014	24.3789	208.1	0.117	| | |	0.001
3	3052	2921	185.3258	131.4174	164.5	0.799	| |* |	0.135
4	2265	1962	121.7668	303.4113	215.8	1.406	| |** |	0.105
5	940.0000	1121	209.0136	-181.2443	133.1	-1.361	| **| |	0.761
6	2815	2768	163.4178	47.4069	186.3	0.254	| | |	0.008
7	2661	2735	148.6937	-73.6539	198.3	-0.372	| | |	0.013
8	2181	2279	143.1761	-97.7067	202.3	-0.483	| | |	0.019
9	2052	1952	151.6323	99.5374	196.0	0.508	| |* |	0.026
10	2064	2314	136.7896	-250.1510	206.6	-1.211	| **| |	0.107
11	1551	1348	137.5019	202.6941	206.2	0.983	| |* |	0.072
12	2338	2587	135.9839	-248.8729	207.2	-1.201	| **| |	0.104
13	1753	1848	219.8169	-95.4855	114.4	-0.834	| *| |	0.428
14	2110	2303	185.6468	-192.7272	164.2	-1.174	| **| |	0.294

Sum of Residuals	0
Sum of Squared Residuals	491317
Predicted Residual SS (PRESS)	1782663
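
The Student Residual and Cook's D columns can be rebuilt from the other columns: the studentized residual is Residual / (Std Error Residual), and Cook's D = (student residual^2 / p) * (Std Error Mean Predict / Std Error Residual)^2, with p = 6 parameters in this model. A Python sketch (illustration only) for the three observations with the largest Cook's D:

```python
P_PARAMS = 6  # intercept plus 5 predictors

# obs: (residual, std error mean predict, std error residual, printed Cook's D)
rows = {
    1: (330.9954, 162.2851, 187.3, 0.391),
    5: (-181.2443, 209.0136, 133.1, 0.761),
    13: (-95.4855, 219.8169, 114.4, 0.428),
}


def student_residual(residual, se_resid):
    return residual / se_resid


def cooks_d(residual, se_pred, se_resid, p=P_PARAMS):
    # (se_pred / se_resid)^2 equals h / (1 - h), the leverage ratio.
    r = student_residual(residual, se_resid)
    return (r * r / p) * (se_pred / se_resid) ** 2


computed = {obs: cooks_d(res, sp, sr) for obs, (res, sp, sr, _) in rows.items()}
for obs, (res, sp, sr, printed) in rows.items():
    print(f"obs {obs:2d}: student = {student_residual(res, sr):6.3f}, "
          f"Cook's D = {computed[obs]:.3f} (printed {printed})")
```

Agreement is only to about three decimals because Std Error Residual is printed with four significant digits.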

MODEL SELECTION of RESPONSE (YY) for 5 Predictors

Comparing Stepwise, Backward, and Mallows' C(p)

Stepwise selection finds Na (Sodium) and Shade

Backward elimination finds K (Potassium) and P (Phosphorus)

Mallows' C(p) gives a sorted list of models; K and P is slightly best

The REG Procedure

Model: MODEL1

Dependent Variable: yy AppleTaste

Number of Observations Read	14
Number of Observations Used	14

Stepwise Selection: Step 1

Variable Shade Entered: R-Square = 0.8410 and C(p) = 0.5367

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	3421848	3421848	63.45	<.0001
Error	12	647110	53926
Corrected Total	13	4068957

Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	645.72764	204.20305	539226	10.00	0.0082
Shade	811.96905	101.93130	3421848	63.45	<.0001

Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable Nat Entered: R-Square = 0.8705 and C(p) = 0.5814

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	3541934	1770967	36.96	<.0001
Error	11	527023	47911
Corrected Total	13	4068957

Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	798.46545	215.30332	658943	13.75	0.0034
Nat	-26.08884	16.47880	120087	2.51	0.1417
Shade	925.46000	119.87479	2855594	59.60	<.0001

Bounds on condition number: 1.5567, 6.2267

All variables left in the model are significant at the 0.1500 level.

No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	Shade		Shade	1	0.8410	0.8410	0.5367	63.45	<.0001
2	Nat		Sodium	2	0.0295	0.8705	0.5814	2.51	0.1417
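
The Step 2 entry statistics for Nat are a partial F test: the drop in error sum of squares from adding Nat, divided by the mean square error of the larger model. A Python sketch (illustration only) using the sums of squares printed in Steps 1 and 2:

```python
# Error sums of squares copied from the two stepwise ANOVA tables.
sse_shade = 647110       # model: Shade
sse_nat_shade = 527023   # model: Nat Shade
df_error_larger = 11     # error df of the Nat Shade model
ss_total = 4068957

extra_ss = sse_shade - sse_nat_shade             # Type II SS for Nat
partial_f = extra_ss / (sse_nat_shade / df_error_larger)
partial_r2 = extra_ss / ss_total                 # gain in model R-Square

print(f"extra SS = {extra_ss}, partial F = {partial_f:.2f}, "
      f"partial R-Square = {partial_r2:.4f}")
```

The printed Pr > F of 0.1417 is below the 0.15 entry level, so Nat enters the model even though it would not pass a 0.05 test.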

The REG Procedure

Model: MODEL2

Dependent Variable: yy AppleTaste

Number of Observations Read	14
Number of Observations Used	14

Backward Elimination: Step 0

All Variables Entered: R-Square = 0.8793 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3577641	715528	11.65	0.0016
Error	8	491317	61415
Corrected Total	13	4068957

Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	299.64546	988.08291	5648.07120	0.09	0.7694
Nat	-10.77226	76.56324	1215.75074	0.02	0.8916
Kk	43.61828	60.00005	32457	0.53	0.4880
Pp	0.34494	0.74392	13204	0.22	0.6552
Shade	-179.09980	1800.82748	607.45977	0.01	0.9232
Water	1.75238	3.52645	15165	0.25	0.6326

Bounds on condition number: 274.07, 2783.9

Backward Elimination: Step 1

Variable Shade Removed: R-Square = 0.8791 and C(p) = 4.0099

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3577033	894258	16.36	0.0004
Error	9	491924	54658
Corrected Total	13	4068957

Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	391.72312	325.63626	79095	1.45	0.2597
Nat	-16.51251	47.45755	6617.15339	0.12	0.7359
Kk	38.09669	21.46412	172188	3.15	0.1097
Pp	0.27749	0.28833	50626	0.93	0.3610
Water	1.59125	2.95493	15850	0.29	0.6033

Bounds on condition number: 24.778, 288.31

Backward Elimination: Step 2

Variable Nat Removed: R-Square = 0.8775 and C(p) = 2.1176

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3570416	1190139	23.87	<.0001
Error	10	498541	49854
Corrected Total	13	4068957

Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	317.02761	233.84411	91631	1.84	0.2050
Kk	30.97381	6.16202	1259634	25.27	0.0005
Pp	0.29157	0.27264	57019	1.14	0.3100
Water	1.42377	2.78440	13035	0.26	0.6202

Bounds on condition number: 24.121, 147.33

Backward Elimination: Step 3

Variable Water Removed: R-Square = 0.8743 and C(p) = 0.3299

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	3557381	1778690	38.25	<.0001
Error	11	511577	46507
Corrected Total	13	4068957

Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	337.00028	222.68472	106512	2.29	0.1584
Kk	30.61040	5.91185	1246835	26.81	0.0003
Pp	0.42795	0.05463	2854105	61.37	<.0001

Bounds on condition number: 1.0305, 4.1219

All variables left in the model are significant at the 0.1000 level.

Summary of Backward Elimination
Step	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	Shade	Shade	4	0.0001	0.8791	4.0099	0.01	0.9232
2	Nat	Sodium	3	0.0016	0.8775	2.1176	0.12	0.7359
3	Water	Water	2	0.0032	0.8743	0.3299	0.26	0.6202

The REG Procedure

Model: MODEL3

Dependent Variable: yy

C(p) Selection Method

Number of Observations Read	14
Number of Observations Used	14

Number in Model	C(p)	R-Square	Variables in Model
2	0.3299	0.8743	Kk Pp
1	0.5367	0.8410	Shade
2	0.5814	0.8705	Nat Shade
2	0.8473	0.8665	Pp Shade
2	0.8496	0.8664	Shade Water
2	1.0461	0.8635	Kk Water
2	1.1990	0.8612	Kk Shade
3	2.1176	0.8775	Kk Pp Water
3	2.2680	0.8752	Nat Kk Pp
3	2.3222	0.8744	Kk Pp Shade
3	2.4351	0.8727	Nat Kk Shade
3	2.5548	0.8709	Nat Pp Shade
3	2.5813	0.8705	Nat Shade Water
3	2.6948	0.8688	Kk Shade Water
3	2.8147	0.8670	Pp Shade Water
3	2.8342	0.8667	Nat Kk Water
2	2.8513	0.8362	Nat Pp
4	4.0099	0.8791	Nat Kk Pp Water
4	4.0198	0.8790	Kk Pp Shade Water
4	4.2150	0.8760	Nat Kk Shade Water
2	4.2187	0.8156	Nat Water
4	4.2469	0.8755	Nat Kk Pp Shade
4	4.5285	0.8713	Nat Pp Shade Water
3	4.8136	0.8368	Nat Pp Water
5	6.0000	0.8793	Nat Kk Pp Shade Water
1	18.6318	0.5678	Pp
1	20.3651	0.5417	Water
2	20.6280	0.5679	Pp Water
1	44.8026	0.1728	Kk
1	45.0784	0.1687	Nat
2	46.6515	0.1751	Nat Kk
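
Mallows' C(p) for a submodel with p parameters (intercept included) is SSE_p / MSE_full - (n - 2p), where MSE_full is the error mean square of the five-predictor model. A Python sketch (illustration only) that reproduces a few rows of the list from error sums of squares printed earlier in this output:

```python
N_OBS = 14
MSE_FULL = 491317 / 8  # error mean square of the full five-predictor model

# model -> (error sum of squares, number of predictors), from the ANOVA tables
models = {
    "Kk Pp": (511577, 2),
    "Shade": (647110, 1),
    "Nat Shade": (527023, 2),
    "Nat Kk Pp Shade Water": (491317, 5),
}


def mallows_cp(sse, n_predictors, n=N_OBS, mse_full=MSE_FULL):
    p = n_predictors + 1  # parameters, counting the intercept
    return sse / mse_full - (n - 2 * p)


cp = {name: mallows_cp(sse, k) for name, (sse, k) in models.items()}
for name, value in sorted(cp.items(), key=lambda item: item[1]):
    print(f"{name:24s} C(p) = {value:.4f}")
```

The full model always has C(p) equal to its parameter count (6 here), so it carries no information for ranking; the comparison is among the submodels.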

CHECKING THE CONSENSUS BEST MODEL (K and P):

VIF scores are much smaller

Parameter estimates are all positive

Individual parameter estimates are all significant

The output suggests that there are still no outliers.

The REG Procedure

Model: MODEL1

Dependent Variable: yy AppleTaste

Number of Observations Read	14
Number of Observations Used	14

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	3557381	1778690	38.25	<.0001
Error	11	511577	46507
Corrected Total	13	4068957

Root MSE	215.65474	R-Square	0.8743
Dependent Mean	2195.42857	Adj R-Sq	0.8514
Coeff Var	9.82290

Parameter Estimates
Variable	Label	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|	Variance Inflation
Intercept	Intercept	1	337.00028	222.68472	1.51	0.1584	0
Kk	Potassium	1	30.61040	5.91185	5.18	0.0003	1.03047
Pp	Phosphorus	1	0.42795	0.05463	7.83	<.0001	1.03047
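
With only two predictors, both VIFs equal 1/(1 - r^2), where r is the correlation between the two predictors. A Python sketch (illustration only) using the Kk vs Pp entry of the correlation matrix:

```python
r_kk_pp = -0.17195  # Pearson correlation of Kk and Pp, from PROC CORR

# In a two-predictor model, R_j^2 for either predictor is just r^2.
vif_two_predictors = 1 / (1 - r_kk_pp ** 2)

print(f"VIF = {vif_two_predictors:.5f}")
```

This agrees with the 1.03047 shown for both Kk and Pp, and explains why the two values are identical.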

The REG Procedure

Model: MODEL1

Dependent Variable: yy AppleTaste

Output Statistics
Obs	Dependent Variable	Predicted Value	Std Error Mean Predict	Residual	Std Error Residual	Student Residual	-2-1 0 1 2	Cook's D
1	2876	2565	112.7706	311.0664	183.8	1.692	| |*** |	0.359
2	2078	2018	75.4518	60.0722	202.0	0.297	| | |	0.004
3	3052	2927	103.7734	124.8913	189.0	0.661	| |* |	0.044
4	2265	1929	65.2153	336.4415	205.6	1.637	| |*** |	0.090
5	940.0000	1171	150.6759	-231.3620	154.3	-1.500	| **| |	0.715
6	2815	2811	91.1217	3.7117	195.5	0.0190	| | |	0.000
7	2661	2685	101.8988	-24.3510	190.1	-0.128	| | |	0.002
8	2181	2341	60.1426	-160.3518	207.1	-0.774	| *| |	0.017
9	2052	1963	75.6404	89.2776	202.0	0.442	| | |	0.009
10	2064	2285	82.1294	-221.1845	199.4	-1.109	| **| |	0.070
11	1551	1345	119.4113	205.6548	179.6	1.145	| |** |	0.193
12	2338	2552	102.9772	-214.0711	189.5	-1.130	| **| |	0.126
13	1753	1797	73.4737	-43.9520	202.8	-0.217	| | |	0.002
14	2110	2346	135.4746	-235.8430	167.8	-1.406	| **| |	0.429

Sum of Residuals	0
Sum of Squared Residuals	511577
Predicted Residual SS (PRESS)	983500
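
PRESS is the sum of squared leave-one-out prediction errors, so comparing it with the ordinary residual sum of squares shows how much a model's fit degrades on points it has not seen. The Python sketch below (illustration only) compares the five-predictor model with the K and P model; the "predicted R-square" 1 - PRESS/SST is a derived quantity that does not appear in the SAS output.

```python
SS_TOTAL = 4068957

# model -> (Sum of Squared Residuals, PRESS), from the two Output Statistics pages
fits = {
    "full (5 predictors)": (491317, 1782663),
    "Kk Pp": (511577, 983500),
}

summary = {}
for name, (sse, press) in fits.items():
    summary[name] = {
        "r_square": 1 - sse / SS_TOTAL,
        "predicted_r_square": 1 - press / SS_TOTAL,  # derived, not printed by SAS here
        "press_over_sse": press / sse,
    }

for name, s in summary.items():
    print(f"{name:20s} R-Square = {s['r_square']:.4f}  "
          f"predicted R-Square = {s['predicted_r_square']:.4f}  "
          f"PRESS/SSE = {s['press_over_sse']:.2f}")
```

The two models fit the observed data almost equally well, but the PRESS of the two-predictor model is a little over half that of the full model.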