HOMEWORK #3 due Wednesday 4-21
NOTE: Organize your homework in the following order:
Use a title statement so that your name will appear at the top of each output page, and a title2 statement to make it clearer which output pages belong to which problem.
Problem 1. A gasoline refinery manager wants to compare the efficiency of two different formulations of gasoline under three different seasonal conditions and for four different types of automobile. The resulting efficiencies are recorded in Table 1.
Table 1 --- Gasoline Efficiencies for Two Blends for Different Seasons and Automobiles

               Blend1                    Blend2
         Fall  Winter  Summer     Fall  Winter  Summer
Auto1     191     206     183      156     172     179
Auto2     192     209     202      179     187     190
Auto3     159     197     188      215     204     243
Auto4     187     188     203      248     245     231

The manager is primarily interested in the overall efficiency of the two blends (factor A=Blend) for different levels of two blocking effects, B=Season and C=Automobile. He considers using a split-plot design for the principal factor A and two blocking factors B and C.
(ii) Using the effects B*C and A*B*C to estimate the error (whether they
are reasonable or not), do the two gasoline blends differ
significantly, averaged over Season and Automobile? Are there significant
main effects for Season or Automobile? What are the P-values of the
significant effects? For any main effect that is significant, which levels
are associated with the smallest and largest estimated values of
efficiency? (Hint: See SplitPlot.sas on the Math420 Web site. In the
proc glm model statement, either list all three main effects first before
listing the two interactions with A=Blend, or else ignore the Type I table
in the proc glm output and use the Type III table for P-values.)
(iii) Are there significant interactions between Blend and either Season
or Automobile? What are the P-values of the significant interactions? For
any interaction that is significant, what does the interaction plot look
like? What does the interaction mean?
(Hint: If you say proc plot; plot MeanYeff*XX=Blend; run; for an
interaction plot, then SAS will use the first letter of each level of Blend
as the plotting symbol. Make sure that the names you define for the two
levels of Blend have distinct first letters. Otherwise, either the
interaction plot with Blend as the plotting symbol will be uninterpretable,
or else you will have to go to the trouble of defining a separate
plotting-symbol variable for Blend that does have distinct first letters.)
Problem 2. An experimenter wants to test the effect of 8 factors, which he calls A B C D E F G H, on a response variable YY associated with an industrial process. He can afford to do 16 runs and decides to use a 2_{IV}^{8-4} fractional factorial design. The High/Low settings of the eight factors and the output that he measures are in the following table.
Table 2. Output of an industrial experiment with High/Low settings of 8 factors

Obs    A   B   C   D   E   F   G   H      YY
---------------------------------------------
  1.  -1  -1  -1  -1  -1  -1  -1  -1    69.8
  2.   1  -1  -1  -1   1   1   1  -1    89.8
  3.  -1   1  -1  -1   1   1  -1   1    62.6
  4.   1   1  -1  -1  -1  -1   1   1    83.0
  5.  -1  -1   1  -1   1  -1   1   1    73.1
  6.   1  -1   1  -1  -1   1  -1   1    79.5
  7.  -1   1   1  -1  -1   1   1  -1   123.5
  8.   1   1   1  -1   1  -1  -1  -1    89.3
  9.  -1  -1  -1   1  -1   1   1   1    41.7
 10.   1  -1  -1   1   1  -1  -1   1    33.4
 11.  -1   1  -1   1   1  -1   1  -1    72.7
 12.   1   1  -1   1  -1   1  -1  -1    50.2
 13.  -1  -1   1   1   1   1  -1  -1    44.8
 14.   1  -1   1   1  -1  -1   1  -1    87.8
 15.  -1   1   1   1  -1  -1  -1   1    32.1
 16.   1   1   1   1   1   1   1   1    83.8

Note that the columns for A B C D are in Yates order. Since this is a 2_{IV}^{8-4} design, the settings under E F G H are the same as E=ABC, F=ABD, G=ACD, and H=BCD. (Hint: See
FracFac84.sas on the Math420 Web site for a similar analysis of a
2_{IV}^{8-4} design. Recall that this design confounds the 2^8=256
possible effects involving the 8 factors into 16 groups with 16 effects
each. Of these, the main effects A B C D E F G H are confounded only with
3-way or higher interactions, and the 8*7/2=28 two-way interactions are
confounded together into 7 groups with 4 two-way and 12 higher-order
interactions each. A complete independent (that is, unconfounded) set of
variables is A B C D E F G H AB AC AD BC BD CD ABCD.)
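The confounding structure described in the hint can be checked numerically. The following is a minimal Python sketch (not part of the SAS programs on the course Web site) that rebuilds the 16 runs from the generators E=ABC, F=ABD, G=ACD, H=BCD and lists the two-way interactions whose contrast column coincides with that of a given effect. The function names build_design, column, and aliases are illustrative choices, not names from FracFac84.sas.

```python
from itertools import combinations, product

# Generators stated after Table 2.
GENERATORS = {"E": "ABC", "F": "ABD", "G": "ACD", "H": "BCD"}
FACTORS = "ABCDEFGH"


def build_design():
    """Return the 16 runs as dicts factor -> -1/+1, with A B C D in Yates order."""
    runs = []
    # product() varies its last position fastest, so unpacking as (d, c, b, a)
    # makes A change fastest and D slowest, which is Yates order.
    for d, c, b, a in product((-1, 1), repeat=4):
        run = {"A": a, "B": b, "C": c, "D": d}
        for name, word in GENERATORS.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[name] = value
        runs.append(run)
    return runs


def column(runs, effect):
    """Contrast column of an effect such as 'AB': the product of its factor columns."""
    col = []
    for run in runs:
        value = 1
        for letter in effect:
            value *= run[letter]
        col.append(value)
    return tuple(col)


def aliases(runs, effect, order=2):
    """Interactions of the given order whose column equals the column of `effect`."""
    target = column(runs, effect)
    found = []
    for combo in combinations(FACTORS, order):
        name = "".join(combo)
        if name != effect and column(runs, name) == target:
            found.append(name)
    return found


runs = build_design()
print(aliases(runs, "AB"))  # ['CE', 'DF', 'GH']
print(aliases(runs, "A"))   # []  (no main effect is aliased with a two-way interaction)
```

The same aliases call can be applied to any other two-way interaction to see which group of four it belongs to.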
(i) Find the parameter estimates for all 15 effects in the design (other than the intercept) and sort them in decreasing order. Do any of the estimates appear to stick out on the high or low side?
(ii) Do the two-way interactions AB AC AD BC BD CD and four-way
interaction ABCD appear to be relatively small in comparison with the
largest estimates in absolute value? Do a regression analysis of the 8
main effects A B C D E F G H using the 7 two-way and four-way interactions
to estimate the error. Which of the main effects are significant? What are
the P-values of the significant effects? (Hint: In proc reg and proc glm
in SAS, any effects that you do not list in the model statement are used
for error.)
(iii) Construct normal probability plots and P-P plots of the 15 effect estimates in part (i). Do any appear to be outliers? Which effects do they correspond to?
(iv) Assume that the three outliers that you identified in part (iii) are active and the 5 other factors are inert. Do a 2^3 factorial design (with 2 observations per cell) analysis of the data in Table 2 using the three active factors. Which of the main effects in this analysis are significant? Which of the interactions? Find the P-values of the significant effects. Is this consistent with your answers in parts (ii) and (iii)? Which analysis would you trust more?
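For reference, the two computations behind parts (i) and (iii) can be sketched in a few lines of Python. With orthogonal -1/+1 columns, the least-squares coefficient of an effect is sum(x*y)/n, which is half the difference between the mean response at the high and low settings. A normal probability plot then pairs the sorted estimates with normal scores. The sketch below uses a hypothetical 2^2 design with made-up responses (not the data of Table 2) and Blom's plotting positions (i - 0.375)/(n + 0.25), which is one common convention and may differ from what SAS uses by default.

```python
from statistics import NormalDist


def effect_estimates(columns, y):
    """Least-squares coefficients for orthogonal -1/+1 columns: sum(x*y)/n."""
    n = len(y)
    return {name: sum(x * yi for x, yi in zip(col, y)) / n
            for name, col in columns.items()}


def normal_scores(n):
    """Normal quantiles at Blom's plotting positions (an assumed convention)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Hypothetical 2^2 example: the columns and responses are made up for illustration.
columns = {
    "A": [-1, 1, -1, 1],
    "B": [-1, -1, 1, 1],
    "AB": [1, -1, -1, 1],
}
y = [10.0, 14.0, 20.0, 30.0]

estimates = effect_estimates(columns, y)

# Sort in decreasing order, as in part (i).
ranked = sorted(estimates.items(), key=lambda item: item[1], reverse=True)

# Pair the smallest estimate with the smallest normal score, and so on upward.
scores = normal_scores(len(ranked))
for (name, est), z in zip(reversed(ranked), scores):
    print(f"{name:>3}  estimate={est:6.2f}  normal score={z:6.3f}")
```

Estimates that fall far from the straight line through the bulk of the points in such a plot are the candidates for active effects.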
Problem 3. The same experimenter does a second experiment under different conditions with a similar set of 8 factors, which he also calls A B C D E F G H, on a response variable ZZ for a second industrial process. Again, he can afford to do 16 runs and decides to use a 2_{IV}^{8-4} fractional factorial design. The High/Low settings of the eight factors and the output that he measures are in the following table.
Table 3. Output of an industrial experiment with High/Low settings of 8 factors

Obs    A   B   C   D   E   F   G   H      ZZ
---------------------------------------------
  1.  -1  -1  -1  -1  -1  -1  -1  -1    74.8
  2.   1  -1  -1  -1   1   1   1  -1    94.8
  3.  -1   1  -1  -1   1   1  -1   1    85.6
  4.   1   1  -1  -1  -1  -1   1   1   102.0
  5.  -1  -1   1  -1   1  -1   1   1    86.1
  6.   1  -1   1  -1  -1   1  -1   1    88.5
  7.  -1   1   1  -1  -1   1   1  -1   106.5
  8.   1   1   1  -1   1  -1  -1  -1    80.3
  9.  -1  -1  -1   1  -1   1   1   1    50.7
 10.   1  -1  -1   1   1  -1  -1   1    50.4
 11.  -1   1  -1   1   1  -1   1  -1    63.7
 12.   1   1  -1   1  -1   1  -1  -1    37.2
 13.  -1  -1   1   1   1   1  -1  -1    65.8
 14.   1  -1   1   1  -1  -1   1  -1   104.8
 15.  -1   1   1   1  -1  -1  -1   1    67.1
 16.   1   1   1   1   1   1   1   1   118.8

See the comments after Table 2 in Problem 2.
(i) Find the parameter estimates for all 15 effects in the design (other than the intercept) and sort them in decreasing order. Do any of the estimates appear to stick out on the high or low side?
(ii) Construct normal probability plots and P-P plots of the 15 effect estimates in part (i). Do any appear to be outliers? Which effects do they correspond to?
(iii) The outliers that you identified in part (ii) should be consistent with 3 active factors with the remaining factors inert. Analyze the corresponding 2^3 factorial design (with 2 observations per cell) for the data in Table 3 using the three active factors. Which of the main effects in this analysis are significant? Which of the interactions? Find the P-values of the significant effects. Is this consistent with your answers in part (ii)?
Problem 4. (i) Use the Box-Meyer program BoxMeyer.exe on the Math420 Web site to analyze the data in Table 2. What 2^1, 2^2, or 2^3 submodels have the highest F-statistics? the highest Bayesian posterior probabilities? What factors have the highest Bayesian posterior probabilities of being active? Are these results consistent with your answers to Problem 2?
(ii) Use the Box-Meyer program to analyze the data in Table 3. What submodels have the highest F-statistics? the highest Bayesian posterior probabilities? What factors have the highest Bayesian posterior probabilities of being active? Are these results similar to your answers to Problem 3?
(iii) In a 2_{IV}^{8-4} design, what are the other three two-way interactions that are confounded with (for example) CD? If, for example, CD is significant, is it easy to tell that this is due to CD and not to one of the other three interactions? (Hint: From two of the defining relations G=ACD and H=BCD, one concludes CD=BH=AG, so that you just have to find one more.)
(iv) In part (ii), why do four different models have similarly high values for the F-statistic and for the Bayesian posterior probability, even though three of the four models have a factor that does not show up as significant in Problem 2? (Hint: In a 2_{IV}^{8-4} design, show that the relation (for example) G=ACD implies that the seven effects A,C,D,AC,AD,CD,ACD are identical with the seven effects A,C,G,AC,AG,CG,ACG after a permutation. This implies that the two models ACD and ACG would have the same F-statistics and the same Bayesian posterior probability.)
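The permutation argument in the hint to part (iv) can be checked numerically. The Python sketch below assumes the generator G=ACD stated after Table 2. It builds the 16 runs in Yates order and compares, as sets of contrast columns, the seven effects of the 2^3 model in A, C, D with the seven effects of the 2^3 model in A, C, G. The helper names effect_column and column_set are illustrative only.

```python
from itertools import product


def effect_column(effect, run):
    """Value of an effect such as 'AG' in one run: the product of its factor settings."""
    value = 1
    for letter in effect:
        value *= run[letter]
    return value


# The 16 runs in Yates order (A fastest), with G defined by the generator G=ACD.
runs = []
for d, c, b, a in product((-1, 1), repeat=4):
    runs.append({"A": a, "B": b, "C": c, "D": d, "G": a * c * d})


def column_set(effects):
    """Set of contrast columns (one 16-tuple per effect) for a list of effects."""
    return {tuple(effect_column(e, run) for run in runs) for e in effects}


acd_model = ["A", "C", "D", "AC", "AD", "CD", "ACD"]
acg_model = ["A", "C", "G", "AC", "AG", "CG", "ACG"]

same_columns = column_set(acd_model) == column_set(acg_model)
print(same_columns)  # True: the two models span the same seven columns
```

Since the two models use exactly the same seven columns, any statistic that depends only on the fitted values, such as the overall F-statistic, cannot distinguish between them.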
Problem 5. A different experimenter wants to test the effect of 4 factors, which she calls A B C D, on a response variable Yield associated with an industrial process. She can afford to do 12 runs with different High/Low settings of A B C D and decides to use a Plackett-Burman design. The results are in Table 4. Recall that only the first four factors (A B C D) were used for High/Low settings.
Table 4. Output of an industrial experiment with High/Low settings of 4 factors

Row    A   B   C   D   e   f   g   h   j   k   l    Yield
----------------------------------------------------------
  1    1  -1   1  -1  -1  -1   1   1   1  -1   1     88.5
  2    1   1  -1   1  -1  -1  -1   1   1   1  -1     37.2
  3   -1   1   1  -1   1  -1  -1  -1   1   1   1    106.5
  4    1  -1   1   1  -1   1  -1  -1  -1   1   1    104.8
  5    1   1  -1   1   1  -1   1  -1  -1  -1   1     37.2
  6    1   1   1  -1   1   1  -1   1  -1  -1  -1     80.3
  7   -1   1   1   1  -1   1   1  -1   1  -1  -1     67.1
  8   -1  -1   1   1   1  -1   1   1  -1   1  -1     65.8
  9   -1  -1  -1   1   1   1  -1   1   1  -1   1     50.7
 10    1  -1  -1  -1   1   1   1  -1   1   1  -1     94.8
 11   -1   1  -1  -1  -1   1   1   1  -1   1   1     85.6
 12   -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1     74.8
(i) Find the parameter estimates for all 11 effects in the design (other than the intercept) and sort them in decreasing order. (We would expect at least 7 of them to be random.) Construct normal probability plots and P-P plots of the 11 effect estimates. Are any outliers noticeable? Can extreme effects be easily picked out?
(ii) Run the Box-Meyer program on the data in Table 4, using all 11 columns as a control for the first 4 columns. What are the highest-ranking 2^1, 2^2, 2^3 submodels of the 11 factors in terms of the highest F-statistic? in terms of the highest Bayesian posterior probability? Do any factors stick out as having a noticeably larger posterior probability of being active?
(iii) Analyze a 2^3 design for the data in Table 4 using the three factors in part (ii) that have the highest posterior probabilities of being active. What effects in that design are significant? What are the P-values of the significant effects? Is this consistent with what you observed from the normal plots?
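The structure of the design matrix in Table 4 can be reproduced with a short Python sketch. It assumes, as the rows of Table 4 suggest, that rows 2 through 11 are successive cyclic shifts of row 1 and that row 12 is all -1. It then checks that every column is balanced and that all 11 columns are pairwise orthogonal, which is what allows the 11 effect estimates in part (i) to be computed independently of one another. The function name plackett_burman_12 is an illustrative choice.

```python
# Row 1 of Table 4 (columns A B C D e f g h j k l).
FIRST_ROW = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]


def plackett_burman_12(first_row):
    """Build 12 runs: 11 cyclic shifts of first_row, then a final row of all -1."""
    rows = [list(first_row)]
    for _ in range(10):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])  # shift right by one position
    rows.append([-1] * len(first_row))
    return rows


design = plackett_burman_12(FIRST_ROW)


def dot(i, j):
    """Inner product of columns i and j of the design."""
    return sum(row[i] * row[j] for row in design)


# Every column has six +1 and six -1 entries.
balanced = all(sum(row[j] for row in design) == 0 for j in range(11))

# Every pair of distinct columns has inner product zero.
orthogonal = all(dot(i, j) == 0 for i in range(11) for j in range(i + 1, 11))

print(balanced, orthogonal)
```

Unlike the 2_{IV}^{8-4} design, the columns of a 12-run Plackett-Burman design are not products of one another, so two-way interactions are partially rather than completely confounded with the main-effect columns.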