***********************************************;
* An ANCOVA or ANOCOVA (`Analysis of Covariance') model is a model of
*   the form
*
*      yy = mu0 + mu(treatment) + beta*xx + error
*
*   where ``treatment'' corresponds to treatment groups as in a one-way
*   ANOVA and xx is a numerical covariate. This has aspects of both
*   an ANOVA and a linear regression on a continuous covariate.
*
* An ANCOVA could be viewed as a regression on the residuals of a
*   one-way ANOVA, or, alternatively, as a one-way ANOVA on the
*   residuals of a regression.
*
* An extension of the model would allow the slope `beta' to depend on
*   the treatment group. This is called an ANOCOVA with an `interaction'.
*   In this case, both mu and beta can vary with the treatment group,
*   so that this amounts to separate simple regressions
*
*      Y_ij = mu_i + beta_i*X_ij + e_ij
*
*   within each treatment group.
*
* As an example, suppose that each of three groups of lambs is given a
*   different brand of lamb chow. We test the three lamb chows by
*   weighing the lambs when they are one year old.
*
* We are told that the lambs were randomly assigned to the three
*   treatment groups, but are not genetically uniform. In
*   particular, there might be genetic reasons why some lambs are
*   heavier when they are one year old. If the treatment effect (lamb
*   chow) turns out to be significant, it might be because of
*   accidental correlations between these genetic effects and the
*   assignment to treatment groups.
*
* Alternatively, we can attempt to correct for this genetic effect by
*   using the maternal weight at birth of the ewes that gave birth to
*   the lambs. This would give two factors in the model, the lamb-chow
*   treatment group and maternal weight, leading to an ANCOVA model
*   of the type above.
*
* For historical reasons, a linear model that has treatment groups
*   as well as linear regression terms is called an ANCOVA or ANOCOVA,
*   for ``Analysis of Covariance''.
*
* The idea of this example was adapted from D.C.
*   Montgomery,
*   ``Design and Analysis of Experiments'', John Wiley & Sons 1976
*
*
* Note the use of `proc format' below to assign descriptive names to
*   the VALUES of a coded variable, here `chow'. In contrast, a `label'
*   statement assigns descriptive names to the NAMES of variables.
*
* In the datalines block, feed codes are entered as 1,2,3. The
*   statement
*
*      format chow feedfmt.;
*
*   in the data step tells SAS to expand codes 1,2,3 in the `chow'
*   column to the three brand names. SAS views this as analogous to
*   determining the format in which numbers or text strings are
*   displayed, and describes using `proc format' in this way as
*   creating a ``user-defined format''.
*
* The name of a user-defined format in a format statement as above
*   must always end with a period, as is the case here, so that SAS
*   will know what is a variable name and what is a format name.
*
*
* In this case, the codes are numeric (1,2,3) and are expanded to text.
*   If the codes were text values (like 'A','B','C'), precede the format
*   name by $, as in (for example)
*
*      proc format;
*         value $feedfmt 'A'='AZenith' 'B'='BXQ11' 'C'='Clover7';
*      run;
*
***********************************************;

title 'LAMB WEIGHT FOR 3 LAMB CHOWS - YOUR NAME';
options nodate ls=75 ps=60 pageno=1 nocenter;

* Use `proc format' to create a user-defined format to expand
*   codes 1,2,3 to brand names;
proc format;
    value feedfmt 1='AZenith' 2='BXQ11' 3='Clover7';
run;

* Values are read in pairs from the datalines block, since the
*   main data values are pairs (Y,X) for lamb yearling and
*   maternal ewe weights.
;
data lambs;
    retain chow feednum;
    input xx$ yy @@;
    if xx= 'Feed' then do;
        chow=yy; feednum=yy;
    end;
    else do;
        lambwt=input(xx,12.0); matwt=yy; output;
    end;
    * Tells SAS to expand chow=1,2,3 to brand names;
    format chow feedfmt.;
    * Drop xx yy for neatness, since only (lambwt,chow,matwt) are used;
    drop xx yy;
datalines;
Feed 1
45 98   47 93   46 89   45 101  56 127
56 99   43 82   38 81   44 91   46 95
47 103  46 87   48 84   50 106  48 89
Feed 2
52 92   48 99   54 111  45 102  50 91
46 105  47 82   56 103  44 81   46 96
53 109  49 103  54 120  53 120  53 95
Feed 3
45 116  55 99   56 111  58 85   49 110
55 94   50 105  54 110  51 104  51 99
55 91   43 105  45 99   50 104  55 122
;

proc print;
    title2 'THE DATA AS SAS SEES IT';
run;

proc glm;
    title2 'ONE-WAY ANOVA FOR WEIGHT ON FEED BRAND';
    title3 'THIS IS (BORDERLINE) SIGNIFICANT, BUT IS IT THE ENTIRE STORY?';
    title4 'NOTE THAT THE MODEL RSQUARE IS ONLY 0.166';
    class chow;
    model lambwt=chow;
run;

proc glm;
    title2 'TRY AGAIN WITH MATERNAL WEIGHT INCLUDED (AN ANCOVA MODEL)';
    title3 'NOW LAMB WEIGHT SEEMS TO DEPEND MOSTLY ON MATERNAL WEIGHT.';
    class chow;
    model lambwt=chow matwt;
run;

proc means n mean std;
    title2 "WILL MEANS AND STANDARD DEVIATIONS GIVE ANY CLUES?";
    title3 "NOTE THAT THE CLASS MEANS OF LAMB AND MATERNAL WEIGHTS";
    title4 "   SEEM TO BE CORRELATED ACROSS FEED TYPE.";
    class chow;
    var lambwt matwt;
run;

proc glm;
    title2 "FINALLY, ALLOW FOR A FEED*MATWT 'INTERACTION'";
    title3 "THIS FITS A SIMPLE REGRESSION WITHIN EACH TREATMENT GROUP";
    title4 "THE FEED BRANDS NOW SEEM TO BE TOO HETEROGENEOUS TO COMPARE";
    title5 "CAN YOU SEE WHY?";
    class chow;
    model lambwt=chow matwt chow*matwt;
run;

proc plot;
    title2 "FINALLY, LET'S SEE WHAT THE DATA LOOKS LIKE";
    title3 "(WE SHOULD HAVE DONE THIS FIRST)";
    plot matwt*lambwt=chow / vpos=30;
run;

* Let's look at simple regressions within each feed brand:;
proc sort;
    by chow lambwt matwt;   * To be safe;
run;

* The `by' command tells SAS to generate output for each lamb chow, ;
*    here as three separate outputs:;
* To have high-resolution graphics
*    instead of text plots, ;
*    omit the `lineprinter' option;
options ps=50;
proc reg lineprinter;
    title2 "LET'S LOOK AT SIMPLE REGRESSIONS WITHIN EACH FEED BRAND";
    by chow;
    model lambwt=matwt;
    * This gives three separate plots of observed versus fitted values;
    *   within each treatment group;
    plot lambwt*matwt=chow predicted.*matwt=feednum / overlay;
run;

title2 "WE COULD SHOW FITTED AND OBSERVED VALUES FOR EACH LAMB CHOW, BUT A PLOT";
title3 "   WITH THE THREE FITTED REGRESSION LINES TOGETHER GIVES A CLEARER";
title4 "   PICTURE OF THE CHOW*MATWT INTERACTION:";
proc glm noprint;
    by chow;
    model lambwt=matwt;
    output p=predicted;
run;

options ps=40;
proc plot;
    plot predicted*matwt=chow;
run;

options ps=60;
proc corr nosimple;
    title2 "FINALLY, LET'S LOOK AT CORRELATIONS WITHIN EACH CHOW:";
    by chow;
    var lambwt matwt;
run;
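
* The header comment mentions that text codes need a format name;
*   that starts with $. What follows is a minimal sketch of that;
*   case and is not part of the lamb analysis above. The data set;
*   name demo, the variables brand and wt, and the three data rows;
*   are made up purely for illustration.;
* A character format $feedfmt is stored separately from the numeric;
*   format feedfmt, so defining it here does not replace the one;
*   used earlier.;
proc format;
    value $feedfmt 'A'='AZenith' 'B'='BXQ11' 'C'='Clover7';
run;

data demo;
    * brand holds the text code A, B or C;
    input brand $ wt;
    * Note the $ prefix and the trailing period in the format name;
    format brand $feedfmt.;
datalines;
A 45
B 52
C 55
;

* The brand column should print as brand names rather than codes;
proc print data=demo;
    title2 'ILLUSTRATION ONLY: TEXT CODES EXPANDED BY A $ FORMAT';
run;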