**********************************************************; * Here we consider a two-way layout with data on activity levels * of school children. * We want to test the effect of the drug ritalin on kinds with * hyperactive disorder (HD) and on non-hyperactive (NonHD) kids. * * Ritalin is chemically similar to the amphetamines. It appears to * calm down many kids with HD, apparently because it helps them * to concentrate. In contrast, ritalin causes many normal kids * to become hyperactive (in the everyday sense of the word), * again apparently because it is similar to the amphetamines. * * Thus ritalin should have a positive effect on activity level on * non-HD children but a negative effect on HD children. In * particular, it cannot have the same additive effect on activity * level in the two classes of children. This defines a statistical * interaction between the two factors HD/NonHD and * ritalin/no ritalin. * * The model below is a two-factor ANOVA with factors and levels * * Factor 1: Group Levels: HD and NonHD * Factor 2: Drug Levels: Drug (ritalin) and NoDrug (placebo) * * Asking whether ritalin has a different effect on HD kids is * equivalent to asking if the interaction term (CxD)_{ij} * in the full factorial model * * Yij = mu + C(Group_i) + D(Drug_j) + (CxD)_ij + error(ij) ? * * is significantly from zero. This is equivalent to asking if * there is a Group x Drug interaction. * * Note that we are using the same word (Drug) to denote both the * factor Drug and the level of that factor with Ritalin, but SAS * will be able to tell the difference from the context. Here the * `NonDrug' kids were also given pills (a pill with no effect, or * a PLACEBO) to make sure that being given pills did not have an * effect in itself. * * (A similar placebo-like effect is called the Hawthorne Effect, * which is an effect on performance due to being in a study. The * name comes from an industrial engineering study in Hawthorne, * Illinois, in which it was noted that the `placebo' workers in * the study did better on the average than workers who were not * in the study, apparently because of the psychological effects * of being in a formal study. This is not a factor here since * both group of kids knew that they were in the study.) * * NOTE: A full-factorial ANOVA with two levels for each factor and d * factors is called a 2^d FACTORIAL DESIGN. Thus this model is a * 2^2 factorial design. In a 2^d design, all factors have two * levels, and all effects have one degree of freedom. The total * model degrees of freedom is 2^d-1, or 3 if d=2. * * Since r=s=2 for a 2^2 design, r-1 = s-1 = (r-1)*(s-1) = 1, so that * both the main effects and the interaction have one degree of * freedom. Thus the F-test for all three effects have one degree * of freedom in the numerator. In most cases, an ANOVA SS with * one degree of freedom is the square of a statistic that is * normally distributed. In general, for a 2^2 design with * factors A and B and factor levels A1,A2, B1,B2, one can * consider the dot products of the 4 cell means with the four * matrices * * I A B A*B * B1 B2 B1 B2 B1 B2 B1 B2 * -------------------------------------------------- * 1 1 -1 -1 -1 1 1 -1 * 1 1 1 1 -1 1 -1 1 * -------------------------------------------------- * * The overall mean (which estimates mu) is a constant times the * dot product of I with the cell means. The dot products of * A, B, and A*B with cell means give statistics whose squares * are the same constant times the numerator SSs for the two main * effects and the interaction. * * The data in the example is from the text ``Applied Statistics and * the SAS programming language'', R. Cody and J. Smith, 2004, * 5th edition, Prentice Hall, New Jersey, page 215. * **********************************************************; title 'RITALIN DATA: LOOKING FOR AN INTERACTION - YOURNAME'; options ls=75 ps=60 pageno=1 nocenter; data ritalin; retain Group Drug; input xx$ @@; if xx='HD' or xx='NonHD' then Group=xx; else if xx='Drug' or xx='NoDrug' then Drug=xx; else do; yy=input(xx,12.0); output; end; drop xx; * Give SAS some more detail for output; label Group='HD or NonDH', Drug='Ritalin or NoRitalin'; datalines; HD Drug 51 57 48 55 NoDrug 70 72 68 75 NonHD Drug 67 60 58 65 NoDrug 50 45 55 52 ; proc print; title2 'THE DATA AS SAS SEES IT'; run; ********************************************************; * Display an interaction plot to see what is going on: * We create a new dataset with records for the cell means, then * plot the cell means. ********************************************************; proc means nway noprint; class Group Drug; var yy; output out=cmeans mean=; * The cell means are now also called yy; run; proc print data=cmeans; title2 "PROC MEANS OUTPUT"; run; options ps=30; title2 'INTERACTION PLOT of Drug vs. Group'; title3 'NOTE THAT THE INTERACTION IS STRONGLY VISIBLE'; title4 ' IN THE CHANGE OF SIGN OF H-N (HD minus NonHD)'; proc plot data=cmeans; plot yy*Drug = Group; run; options ps=60; ********************************************************; * Now run the full model to test main and interaction effects. Since * the default dataset is now `cmeans', we must indicate the * original dataset explicitly * * The notation | in `Group | Drug' means the MAIN EFFECTS of * `Group' and `Drug' as well as ALL INTERACTIONS between them, * which is the full-factorial model in this case * * In this case, * * Group | Drug = Group Drug Group*Drug * * which we could also have entered. *********************************************************; proc glm data=ritalin; title2 'FULL FACTORIAL MODEL for two factors'; classes Group Drug; model yy = Group | Drug; run; ********************************************************; * What happens if we forget the interaction and run `proc glm' * anyway? Recall that this throws the interaction term into * the error, which decreases all F-statistics and hence * reduces the significance of all P-values. ********************************************************; proc glm data=ritalin; title2 'MAIN EFFECTS only'; title3 '(THIS IS A BAD IDEA HERE since it includes the INTERACTIONS'; title4 ' with the error)'; title5 "NOTE THAT nothing is significance now due to the"; title6 " inflated SSE term."; classes Group Drug; model yy=Group Drug; run; ********************************************************; * To compare cell means, we have to run a one-way ANOVA on cells. * Since the full-factorial model is equivalent to a one-way ANOVA * on cells, the Model F-test and Model R^2 will be the same. ********************************************************; data ritalin; set ritalin; * Re-read records from the current dataset; cell=Group || Drug; * Add a new field; run; proc glm data=ritalin; title2 'ONE-WAY ANOVA on cells for two factors'; title3 'Which COMBINATIONS of factor levels are different?'; class cell; model yy=cell; means cell / duncan; run;