*******************************************************************;
* A three-factor ANOVA with one observation per cell:
*
* For any ANOVA model, the ``Corrected Total Sum of Squares'' sum
*    SSCtot = Sum_(All Ys) (Y - Ybar)^2   can be written
*    SSCtot = SSMod + SSE,   where
*
*    SSMod = Sum_over_effects SS(Effect)  (for effects being modeled)
*
* and SSE = SSCtot - SSMod.  A full-factorial model has
*
*    SSE = SSE(cells) = Sum_(cells) (Y - Cell mean)^2
*
* which is the same as the SSE for a one-way ANOVA on cells.
*
* This means that, in a non-full-factorial model, the SS(Effect)
* terms for effects that are not modeled are added to SSE, so that
*
*    SSE = SSE(cells) + Sum_(effects not in model) SS(Effect).
*
* Thus a full-factorial ANOVA model with one observation per cell
* has SSE=0.
*
* A two-factor ANOVA with factors A and B has
*
*    SSCtot = SS(A) + SS(B) + SS(A*B) + SSE(cells)
*
* where SS(A) and SS(B) are the main effects and SS(A*B) is the
* interaction.
*
* If SSE(cells)=0, then the sum of squares of one or more effects
* must be moved into SSE to obtain P-values for the other effects.
* One way of doing this is to set SSE=SS(A*B).  This is equivalent
* to analyzing a model with main effects A and B only, in the
* hope that the interaction term is not significant.
*
* This approach will at least be conservative, since SS(A*B) will
* at worst overestimate the true error variance.  (That is, the
* significance of the main effects may be underreported, but this
* shouldn't lead you to claim that SS(A) or SS(B) is significant
* when it isn't, unless you are unlucky due to small-sample
* effects.)
*
* With more than two factors, ways to handle ANOVAs with one
* observation per cell generally assume that certain interactions
* are not statistically significant (or at least are relatively
* small) and then combine them to form an SSE for testing the
* other effects.  This can be viewed as `cannibalizing' effects
* (in a full-factorial model) in order to estimate the error.
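*
* As a sketch of the degrees-of-freedom bookkeeping behind this
* (the symbols a, b, c, and n are introduced here for illustration):
* with a, b, and c levels of factors A, B, and C and n observations
* per cell,
*
*    df(A) = a-1      df(B) = b-1      df(C) = c-1
*    df(A*B) = (a-1)(b-1)     df(A*C) = (a-1)(c-1)
*    df(B*C) = (b-1)(c-1)     df(A*B*C) = (a-1)(b-1)(c-1)
*    df(SSE(cells)) = abc(n-1)
*
* The seven effect terms add up to abc-1, so that with n=1 they
* use up all abc-1 degrees of freedom of SSCtot and SSE(cells) has
* none.  Pooling effects into SSE transfers their degrees of
* freedom to the error term along with their sums of squares.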
*
* A SPLIT-PLOT analysis is one way of doing this for a three-factor
* ANOVA with one observation per cell.  The term ``split-plot''
* goes back to the origins of ANOVA analyses in plant and animal
* breeding.
*
* By definition, a split-plot design has three factors (A, B, and C)
* with one observation per cell.  It is assumed that factor A is
* the ``strong'' factor and that factors B and C will be less
* significant (or significant but smaller).  Generally speaking,
* interactions with a highly significant factor (here perhaps A)
* may be significant and informative.  We assume here that while
* B and C may have significant but uninteresting main effects,
* the interaction B*C and the interaction of B*C with A (A*B*C)
* will be less significant.  Specifically, we will use
*
*    B*C  and  A*B*C                                      (1)
*
* to estimate the error.  (That is, SSE=SS(B*C)+SS(A*B*C).)
*
* Since a full-factorial model with 3 factors has 7 effects, this
* leaves 5 effects that can be tested.  These are
*
*    The 3 main effects:       A, B, C    and
*    Two 2-way interactions:   A*B and A*C
*
* In particular, we use the effects in (1) to test for all three
* main effects and the two interactions with the ``strong''
* factor A.
*
* In agricultural experiments, the three factors might be
*
*    Factor A:  Strain or type or variety of plant
*    Factor B:  Amount of water
*    Factor C:  Location of test field (including soil quality)
*
* Here the main effect A is the variability of the plant varieties.
* The two interactions A*B and A*C measure nonadditive effects
* between plant variety and water level (A*B) and between plant
* variety and field or location or soil quality (A*C).  The term
* `split-plot' comes from the fact that each of the fields used
* for factor C is split up into subplots corresponding to
* different plant varieties and different amounts of water.
*
* As an example, consider the following apocryphal situation in an
* apocryphal West African country.
* A new agricultural agent (Western or native) wants to compare
* two different varieties of spring yam under three different
* levels of irrigation.
*
* Accordingly, the agent plants spring yams in five different fields
* in five different tribal areas.  Each field is split into six
* subplots corresponding to two yam varieties and three levels of
* irrigation, for a total of 30 subplots for the five fields.
*
* The output of the 30 subplots after a season of growing yams is
*
*   Water      New Yam Variety      Old Yam Variety
*   Level:     Dry   Med   Wet      Dry   Med   Wet
*   -----------------------------------------------
*   Field1:    120   147   140      131   141   136
*   Field2:    122   145   157      140   129   139
*   Field3:    125   160   135      139   134   138
*   Field4:    127   144   150      136   140   136
*   Field5:    134   155   165      132   133   137
*
* The agent wants to know:
*
* (i) Is there a significant main effect for Yam Type?  Do Field and
* Water Level have significant main effects?  If any main effect is
* significant, which levels are associated with greater output?
*
* (ii) Is there an interaction between Yam Type and Water Level?
* Between Yam Type and Field?  What do the interaction plots look
* like?  What do the interactions mean?
*
* (iii) Conventional wisdom in the tribal areas is that while the
* `New Improved' (but more expensive) yam can have impressive yields,
* it does not do as well under drought conditions.  Is this
* conventional wisdom reasonable?
*
*******************************************************************;

title 'SPLIT-PLOT ANALYSIS OF YAM TYPE - YOURNAME';
options ls=75 ps=60 pageno=1 nocenter;

* Attach descriptive names to levels of two of the factors by the ;
*   use of `user-defined formats' ;
* Note the initial $ for translations of a text-valued code as ;
*   opposed to a numerical code;
proc format;
   value $yamfmt 'New'='New_Improved_Yam' 'Old'='Standard_Yam'
                 Other='???';
   value wlevfmt 1='Dry' 2='Medium' 3='Wet' Other='???';
run;

data yams;
   input Yamtype$ Field$ z1-z3;
   array zz(3) z1-z3;
   * Yam output will be yy.
     Write records with Waterlev=1,2,3
   * for the three waterlevel conditions:;
   do Waterlev=1 to 3;
      yy=zz(Waterlev);
      output;
   end;
   drop z1-z3;
   * Tell SAS to display `Waterlev' as `Dry Medium Wet' ;
   *   instead of 1,2,3 ;
   format Waterlev wlevfmt.;
   * Tell SAS to use the full descriptive names of the two ;
   *   Yam types rather than just `New' and `Old';
   format Yamtype yamfmt.;
datalines;
New Field1 120 147 140
New Field2 122 145 157
New Field3 125 160 135
New Field4 127 144 150
New Field5 134 155 165
Old Field1 131 141 136
Old Field2 140 129 139
Old Field3 139 134 138
Old Field4 136 140 136
Old Field5 132 133 137
;

proc print;
   title2 'THE DATA AS SAS SEES IT';
   title3 "NOTE THAT SAS USES THE `PROC FORMAT' NAMES";
   title4 "   AND NOT THE NAMES AS CODED.";
run;

*******************************************************************;
* To do a split-plot analysis, list the 5 effects (A, B, C, A*B, A*C)
* to be tested in the model statement.  With one observation per
* cell, the remaining two effects (B*C and A*B*C) will be pooled
* into the SSE used to estimate the error variance.
*******************************************************************;
proc glm;
   title2 'SPLIT-PLOT ANALYSIS OF YAM TYPE';
   title3 'THE YAM-TYPE MAIN EFFECT IS SIGNIFICANT, BUT THIS IS DWARFED';
   title4 '   BY THE YAMTYPE*WATERLEVEL INTERACTION';
   title5 'THIS SUGGESTS THAT THE YAM-TYPE MAIN EFFECT MAY TURN OUT TO BE';
   title6 '   AN ARTIFACT OF THE EXPERIMENTAL DESIGN';
   classes Yamtype Waterlev Field;
   model yy= Yamtype Waterlev Field Yamtype*Waterlev Yamtype*Field;
   * Display and compare levels of the three main effects;
   means Yamtype Waterlev Field / duncan;
run;

*******************************************************************;
* Display interaction plots for the two interactions being tested,
* namely Yamtype*Waterlev and Yamtype*Field.
* In general, always put the factor with more numerous levels on the
* X axis.
* Display the Yamtype*Waterlev interaction first.
* Note the use of `vpos' to limit the vertical range of lineprinter
* plots.
*******************************************************************;
proc means nway noprint data=yams;
   classes Yamtype Waterlev;
   var yy;
   output out=pmeans mean=MeanYield;
run;

proc plot data=pmeans;
   plot MeanYield*Waterlev=Yamtype / vpos=30;
run;

*******************************************************************;
* Now the Yamtype*Field interaction.  This may be of greatest interest
* to members of these tribes.
*******************************************************************;
proc means nway noprint data=yams;
   classes Yamtype Field;
   var yy;
   output out=pmeans mean=MeanYield;
run;

proc plot data=pmeans;
   plot MeanYield*Field=Yamtype / vpos=25;
run;
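
*******************************************************************;
* A possible follow-up for question (iii), offered as a sketch and
* not as part of the analysis above:  test the Yamtype effect
* separately at each water level, using the same pooled error term.
* With 2 yam types, 3 water levels, and 5 fields, that error term
* has df(B*C) + df(A*B*C) = 8 + 8 = 16 degrees of freedom.
* This assumes that the SLICE= option of the LSMEANS statement in
* PROC GLM is available in the SAS version being used.
*******************************************************************;
proc glm data=yams;
   title2 'YAM TYPE COMPARED WITHIN EACH WATER LEVEL (SKETCH)';
   class Yamtype Waterlev Field;
   model yy= Yamtype Waterlev Field Yamtype*Waterlev Yamtype*Field;
   * Display the Yamtype by Waterlev cell means, with an F test ;
   *   of Yamtype within each level of Waterlev ;
   lsmeans Yamtype*Waterlev / slice=Waterlev;
run;
quit;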