Math 420 Homework 2 - Spring 2008

  • Click here for Prof. Sawyer's home page

  • Text references are to
    Statistics for Experimenters: Design, Innovation, and Discovery, 2nd edition,
    G. Box, J. S. Hunter, and W. G. Hunter, John Wiley and Sons, 2005, ISBN 978-0471-71813-0

    HOMEWORK #2 due Wednesday 3-26

    NOTE: Organize your homework in the following order:

    (i) your answers to all questions (written answers, not SAS output),
    (ii) all of your SAS programs, and
    (iii) all of the SAS output that you got.
    Add page numbers to your homework so that you can refer from part (i) to part (iii); for example, you can write, "The answer in part (a) is 17. The scatterplot for part (b) is on page #Y below." Include your SAS output even if you don't refer to it explicitly. Except for forward references like these, a grader should not have to look beyond part (i) of your homework unless he or she thinks that you have done something wrong.
    Include your name in a title statement so that your name will appear at the top of each output page.
    Include the Problem Number in a title2 statement to make it clearer what output pages belong to what problem.
    If a problem asks you to do a statistical test, EXPLAIN CLEARLY what the null hypothesis H_0 is, what test was used, what the P-value is, and whether the data is significant, highly significant, or neither. Include this as part of your answer in part (i).
    
    

    Problem 1. (See also Table 4.14 p164 in the text)

    A chemical manufacturer wants to see whether conversion rates of a chemical process differ significantly among five different pressure levels. Unfortunately, only three runs can be done on any one day, and it is known that the response of the apparatus varies from day to day. Experimental runs were carried out over 10 days, with experiments at three of the five pressure levels on each day, in randomized order within each day. The pressures and conversion rates for each day are given in Table 1. This layout is called a balanced incomplete block design (BIBD).
        Table 1. Conversion rates at different pressures over 10 days
         Day     Pressures  (Dashes indicate missing observations)
        ------------------------------------------------
                 250     325     400     475     550    
        ------------------------------------------------
          1       16      18      --      32      --
          2       19      --      --      46      45  
          3       --      26      39      --      61  
          4       --      --      21      35      55  
          5       --      19      --      47      48  
          6       20      --      33      31      --
          7       13      13      34      --      --
          8       21      --      30      --      52  
          9       24      10      --      --      50  
         10       --      24      31      37      -- 
    Note that each Pressure occurs in 6 days, and each pair of Pressures occurs together on 3 days. Answer the following questions, using SAS if convenient:
    (i) How do the average conversion rates vary as a function of the five different pressure levels? Does it look like conversion rates might vary significantly with pressure? How do the average conversion rates vary by day? Does it look like conversion rates might also vary significantly as a function of day? (Hint: If you are using SAS, use proc means.)
    (ii) Is there significant variation in conversion rates as a function of pressure, IGNORING the effects of Day? What is the value of the F statistic? This may not be the complete answer, since it ignores possible confounding effects with Day.
    (iii) Is there significant variation in conversion rates as a function of pressure, AFTER ALLOWING FOR the day effect? What is the value of the F statistic for this test? Is there a significant day effect, after allowing for pressure? (Hint: Look at the Type III table in a two-way ANOVA analysis using proc glm. Since BIBDs are not orthogonal designs, the Type I and Type III tables may be different.)
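    For cross-checking SAS output, here is a minimal sketch in Python (an assumption; the course itself uses SAS). It checks the balance claims stated after Table 1, computes the marginal means for part (i), the one-way F ignoring Day for part (ii), and the day-adjusted (intrablock) F for Pressure for part (iii). The adjusted sum of squares uses the standard BIBD intrablock formula SS_adj = k * sum(Q_p^2) / (lambda * t) with k = 3 runs per day, lambda = 3, and t = 5; in an additive two-way model this should agree with the Type III sum of squares for Pressure from proc glm. All function names are hypothetical, and no P-values are computed because the Python standard library has no F distribution.

```python
from itertools import combinations

# Table 1: day -> {pressure: conversion rate}; missing cells are simply absent.
DATA = {
    1: {250: 16, 325: 18, 475: 32},
    2: {250: 19, 475: 46, 550: 45},
    3: {325: 26, 400: 39, 550: 61},
    4: {400: 21, 475: 35, 550: 55},
    5: {325: 19, 475: 47, 550: 48},
    6: {250: 20, 400: 33, 475: 31},
    7: {250: 13, 325: 13, 400: 34},
    8: {250: 21, 400: 30, 550: 52},
    9: {250: 24, 325: 10, 550: 50},
    10: {325: 24, 400: 31, 475: 37},
}
PRESSURES = (250, 325, 400, 475, 550)


def balance_counts(data):
    """r[p] = number of days with pressure p; lam[(p, q)] = days with both p and q."""
    r = {p: sum(p in day for day in data.values()) for p in PRESSURES}
    lam = {(p, q): sum(p in day and q in day for day in data.values())
           for p, q in combinations(PRESSURES, 2)}
    return r, lam


def marginal_means(data):
    """Average conversion rate by pressure and by day (what proc means reports)."""
    by_pressure = {}
    for p in PRESSURES:
        ys = [day[p] for day in data.values() if p in day]
        by_pressure[p] = sum(ys) / len(ys)
    by_day = {d: sum(day.values()) / len(day) for d, day in data.items()}
    return by_pressure, by_day


def oneway_f(data):
    """F statistic for Pressure ignoring Day (one-way ANOVA)."""
    groups = [[day[p] for day in data.values() if p in day] for p in PRESSURES]
    ys = [y for g in groups for y in g]
    grand_mean = sum(ys) / len(ys)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_between, df_within = len(groups) - 1, len(ys) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def intrablock_f(data, k=3, lam=3):
    """F for Pressure adjusted for Day in an additive BIBD model.

    Q_p = (total for p) - (sum of totals of the days containing p) / k
    SS_adj = k * sum(Q_p^2) / (lam * t); SS_error is what remains after Day and SS_adj.
    """
    t, b = len(PRESSURES), len(data)
    ys = [y for day in data.values() for y in day.values()]
    n, grand = len(ys), sum(ys)
    correction = grand ** 2 / n
    ss_total = sum(y * y for y in ys) - correction
    day_total = {d: sum(day.values()) for d, day in data.items()}
    ss_days = sum(v ** 2 for v in day_total.values()) / k - correction
    q = {p: sum(day[p] for day in data.values() if p in day)
            - sum(day_total[d] for d, day in data.items() if p in day) / k
         for p in PRESSURES}
    ss_adj = k * sum(v ** 2 for v in q.values()) / (lam * t)
    ss_error = ss_total - ss_days - ss_adj
    df_error = n - t - b + 1
    f = (ss_adj / (t - 1)) / (ss_error / df_error)
    return q, ss_adj, ss_error, df_error, f


if __name__ == "__main__":
    print(balance_counts(DATA))
    print(marginal_means(DATA))
    print("F ignoring Day:", oneway_f(DATA))
    print("F adjusted for Day:", intrablock_f(DATA)[-1])
```

    The Q_p values always sum to zero, which is a quick sanity check on hand calculations.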
    
    

    Problem 2. (See Problem 19 page 232 in the text.) In order to reduce the amount of a pollutant, the waste stream that a small factory discharges into a previously pristine mountain stream must be treated. State law requires that the average amount of this pollutant per day cannot exceed 10 pounds. Eleven (11) runs were made with various settings of three factors, Brand (of a pre-treatment chemical), Temperature, and Stirring rate. As the table in the problem indicates, the 11 runs consisted of a 2^3 design at the High and Low settings of the three factors plus 3 additional runs at an intermediate setting of all three factors. (The intermediate setting of the pre-treatment chemical was a 50-50 mix of both brands.)

    (i) Which main effects of the factors are significant in a 2^3 analysis of the three factors? Which interactions? (If you give a list, state explicitly that the list is exhaustive, that is, that there are NO OTHER significant effects.) What are the P-values of the significant effects?
    (ii) Create an interaction plot for all significant two-way interactions. What can you conclude about the interaction from the interaction plot?
    (Hints: Try using proc reg in SAS with Low,High coded as -1,+1 and the intermediate setting as 0. That gives you 11 observations for 8 parameters, leaving 3 degrees of freedom for error, so you should be able to get P-values for all effects. WARNING: The interaction plots will involve three levels of each factor: Low, Intermediate, and High. Use L, M, H as plotting symbols, but along the X axis use values for Low, Medium, High that SAS will not reorder when it sorts them alphabetically, such as -1 0 +1 or A B C.)
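    The coding in the Hint can be sketched in Python (an assumption; the assignment uses proc reg in SAS) by building the 11 x 8 model matrix for the 2^3 runs plus 3 intermediate runs coded 0, and then computing X'X. The zero off-diagonal entries show that every column is orthogonal to every other, so each effect is estimated independently of the rest. Function and column names are hypothetical, and no response values are included because the data are in the text, not on this page.

```python
from itertools import product

COLUMN_NAMES = ("Intercept", "Brand", "Temp", "Stir",
                "Brand*Temp", "Brand*Stir", "Temp*Stir", "Brand*Temp*Stir")


def model_matrix(n_center=3):
    """Rows of X: the 2^3 runs at -1/+1, then n_center runs at the intermediate setting 0."""
    runs = [list(r) for r in product((-1, 1), repeat=3)]
    runs += [[0, 0, 0] for _ in range(n_center)]
    return [[1, a, b, c, a * b, a * c, b * c, a * b * c] for a, b, c in runs]


def cross_products(x):
    """X'X; zeros off the diagonal mean the columns are orthogonal."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    x = model_matrix()
    for name, row in zip(COLUMN_NAMES, cross_products(x)):
        print(f"{name:>16}", row)
    print("error df:", len(x) - len(x[0]))
```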
    
    

    Problem 3. Consider the data in Table 2 on the amount of unburned carbon in engine exhaust.

        Table 2. Unburned Carbon in 8 runs with 4 factors (see text p275)
        ------------------------------------------------
             Run    A   B   C   D   Yield
        ------------------------------------------------
               1   -1   1   1   1    8.2
               2   -1  -1   1  -1    1.7
               3   -1  -1  -1   1    6.2
               4    1  -1  -1  -1    3.0
               5    1  -1   1   1    6.8
               6    1   1   1  -1    5.0
               7   -1   1  -1  -1    3.8
               8    1   1  -1   1    9.3 
    where -1, 1 represent the Low and High levels of that factor.
    Note that ABCD=-I for the runs in the table. Thus Table 2 is a 2^{4-1} design with defining relation ABCD=-I or, equivalently, with confounding relation D=-ABC. Note that this is D=-ABC and not D=ABC.
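    The row-by-row check of ABCD=-I is easy to automate. This Python sketch (an assumption, meant as a cross-check rather than part of the required SAS work; names are hypothetical) multiplies the signs in each row of Table 2 and confirms that the A, B, C columns form a full 2^3 design with D = -ABC.

```python
import math

# (A, B, C, D) sign columns from Table 2, one tuple per run.
SIGNS = [
    (-1, 1, 1, 1), (-1, -1, 1, -1), (-1, -1, -1, 1), (1, -1, -1, -1),
    (1, -1, 1, 1), (1, 1, 1, -1), (-1, 1, -1, -1), (1, 1, -1, 1),
]


def abcd_products(rows):
    """Product A*B*C*D for each run; ABCD = -I means every entry is -1."""
    return [math.prod(row) for row in rows]


def is_full_factorial_in_abc(rows):
    """True if the (A, B, C) settings are the 8 distinct corners of a 2^3."""
    return len({row[:3] for row in rows}) == 8


if __name__ == "__main__":
    print(abcd_products(SIGNS))
    print(all(d == -a * b * c for a, b, c, d in SIGNS), is_full_factorial_in_abc(SIGNS))
```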
    (i) Find the differences in means between the Low and High levels for each of the four factors A, B, C, D. Which two factors have the largest differences in means? (Hint: Try something like proc means; class A B C D; ways 1; var Yield; run;   to get everything on one page.)
    (ii) Find the parameter estimates for a full factorial 2^3 model for A, B, and C with D=-ABC substituted for ABC. Which of the parameter estimates are largest in absolute value? Is this consistent with your answer to part (i)?
    (iii) Construct normal probability plots and P-P plots of the seven effect estimates in part (ii). Note that either the three highest values or the three lowest values can appear to be outliers. Which do you think is more likely?
    (iv) Do an ANOVA of the data in Table 2 using the four main effects in the model and pooling all other effects as the error. Which of the main effects are significant? What are the P-values of the significant factors? Is this consistent with your answers in parts (ii) and (iii)?
    (v) For each of the main effects that are significant in part (iv) or have large parameter estimates in part (ii), is the higher level of that factor consistent with more unburned carbon in the exhaust or less unburned carbon?
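    Parts (i) and (ii) can be cross-checked with a short Python sketch (an assumption; it does not replace the SAS route in the Hint). Because the +-1 columns of Table 2 are orthogonal, each least-squares coefficient in the saturated 2^3 model is simply sum(x*y)/8, which is half the High-minus-Low difference in means; the column labelled ABC estimates -D because D = -ABC. Function names are hypothetical.

```python
import math
from itertools import combinations

# (A, B, C, D, Yield) from Table 2.
RUNS = [
    (-1, 1, 1, 1, 8.2), (-1, -1, 1, -1, 1.7), (-1, -1, -1, 1, 6.2),
    (1, -1, -1, -1, 3.0), (1, -1, 1, 1, 6.8), (1, 1, 1, -1, 5.0),
    (-1, 1, -1, -1, 3.8), (1, 1, -1, 1, 9.3),
]
NAMES = "ABCD"


def high_minus_low(runs):
    """Mean yield at +1 minus mean yield at -1, for each of A, B, C, D."""
    diffs = {}
    for j, name in enumerate(NAMES):
        high = [r[4] for r in runs if r[j] == 1]
        low = [r[4] for r in runs if r[j] == -1]
        diffs[name] = sum(high) / len(high) - sum(low) / len(low)
    return diffs


def saturated_estimates(runs):
    """Coefficients of the full 2^3 model in A, B, C (here the ABC column equals -D)."""
    n = len(runs)
    est = {"Intercept": sum(r[4] for r in runs) / n}
    for size in (1, 2, 3):
        for combo in combinations(range(3), size):
            label = "".join(NAMES[i] for i in combo)
            est[label] = sum(math.prod(r[i] for i in combo) * r[4] for r in runs) / n
    return est


if __name__ == "__main__":
    print(high_minus_low(RUNS))
    for label, value in sorted(saturated_estimates(RUNS).items(), key=lambda kv: -abs(kv[1])):
        print(f"{label:>9} {value:8.4f}")
```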
    
    

    Problem 4. Consider the data in Table 3:

        Table 3. Data from a 2^{5-1} design with 5 factors (see text p276)
        ------------------------------------------------
         Run    A     B     C     D    E   Yield
        ------------------------------------------------
          1    -1    -1    -1    -1    1    14.8
          2     1    -1    -1    -1   -1    14.5
          3    -1     1    -1    -1   -1    18.1
          4     1     1    -1    -1    1    19.4
          5    -1    -1     1    -1   -1    18.4
          6     1    -1     1    -1    1    15.7
          7    -1     1     1    -1    1    27.3
          8     1     1     1    -1   -1    28.2
          9    -1    -1    -1     1   -1    16.0
         10     1    -1    -1     1    1    15.1
         11    -1     1    -1     1    1    18.9
         12     1     1    -1     1   -1    22.0
         13    -1    -1     1     1    1    19.8
         14     1    -1     1     1   -1    18.9
         15    -1     1     1     1   -1    29.9
         16     1     1     1     1    1    27.4 
    (i) Verify that this is a 2^{5-1} design with generating relation ABCDE=I. (Hint: The most direct, but also the most error-prone, way is to verify the relation ABCDE=1 for all 16 rows. A safer way is to use Table 6.14a p258 in the text, which has the 16x16 table for a full-factorial 2^4 design. Verify that the signs above are the same as for the columns a,b,c,d, and abcd. This implies E=ABCD and hence ABCDE=(ABCD)(ABCD)=I.)
    (ii) Find the parameter estimates for the 15 effects involving A,B,C, and D in Table 3 with E in place of ABCD. Sort them in decreasing order of the parameter estimates and display them. Which effects have the largest parameter estimates in absolute value? (Hints: Since ABCDE=I, main effects are confounded with 4-way interactions and two-way interactions with 3-way interactions. Thus a complete set of effects consists of the 5 main effects, the 5*4/2=10 two-way interactions, and the intercept, so that it is sufficient to consider the 5 main effects and the 10 two-way interactions.)
    (iii) Construct normal probability and P-P plots of the 15 effect estimates in part (ii). Do 4 of the 15 effects appear to be outliers? Are these the 4 largest effects that you found in part (ii)?
    (iv) Consider the 3 factors in the top 4 effects as active and the remaining two factors as inert. Analyze the resulting 2^3 design on the three active factors with two observations per cell. Which of the 7 effects involving these three factors are significant? What are the P-values of the significant effects? Are these the same as the 4 effects in part (iii)?
    (v) The smallest (largest negative) effect and the 5th largest effect in the normal plots also look like they might be suspicious. If the 4th largest effect is dropped and these other two effects are included, a different set of three factors appears to be active, with the other two inert. (This set of 3 factors shares two factors with the set in part (iv).) Do the same analysis as in part (iv). Which of the 7 effects involving these three factors are significant? What are the P-values of the significant effects? Are the results of this analysis consistent with part (iv)? Did you find any new significant effects?
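    A Python sketch (an assumption; SAS remains the intended tool, and all names are hypothetical) that checks E = ABCD row by row for part (i), computes the 15 coefficients sum(x*y)/16 for the 5 main effects and 10 two-factor interactions named in the Hint to part (ii), and pairs the sorted estimates with standard normal quantiles for the normal plot in part (iii). The plotting positions (i - 0.5)/m are an assumed convention; SAS and other references may use slightly different ones.

```python
import math
from itertools import combinations
from statistics import NormalDist

NAMES = "ABCDE"
# (A, B, C, D, E, Yield) from Table 3.
RUNS = [
    (-1, -1, -1, -1, 1, 14.8), (1, -1, -1, -1, -1, 14.5),
    (-1, 1, -1, -1, -1, 18.1), (1, 1, -1, -1, 1, 19.4),
    (-1, -1, 1, -1, -1, 18.4), (1, -1, 1, -1, 1, 15.7),
    (-1, 1, 1, -1, 1, 27.3), (1, 1, 1, -1, -1, 28.2),
    (-1, -1, -1, 1, -1, 16.0), (1, -1, -1, 1, 1, 15.1),
    (-1, 1, -1, 1, 1, 18.9), (1, 1, -1, 1, -1, 22.0),
    (-1, -1, 1, 1, 1, 19.8), (1, -1, 1, 1, -1, 18.9),
    (-1, 1, 1, 1, -1, 29.9), (1, 1, 1, 1, 1, 27.4),
]


def generator_holds(runs):
    """True if E = ABCD in every row, i.e. the product ABCDE is +1 row by row."""
    return all(math.prod(r[:5]) == 1 for r in runs)


def effect_estimates(runs):
    """sum(x*y)/n for the 5 main effects and 10 two-factor interactions."""
    n = len(runs)
    est = {}
    for size in (1, 2):
        for combo in combinations(range(5), size):
            label = "".join(NAMES[i] for i in combo)
            est[label] = sum(math.prod(r[i] for i in combo) * r[5] for r in runs) / n
    return est


def normal_scores(values):
    """Sorted values paired with N(0,1) quantiles at plotting positions (i - 0.5)/m."""
    m = len(values)
    nd = NormalDist()
    return [(v, nd.inv_cdf((i - 0.5) / m)) for i, v in enumerate(sorted(values), start=1)]


if __name__ == "__main__":
    print("E = ABCD in every run:", generator_holds(RUNS))
    est = effect_estimates(RUNS)
    for label, value in sorted(est.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label:>3} {value:8.4f}")
    for value, z in normal_scores(est.values()):
        print(f"{z:7.3f} {value:8.4f}")
```

    Plotting the (z, value) pairs gives the normal probability plot; points far off the line through the bulk of the estimates are the candidate active effects.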
    
    

    Problem 5. (Taken from Problem 16 p278 in text) An experimenter runs a 2^{5-2} design with 5 factors A B C D E with confounding relations D=ABC and E=AC. After analyzing the results from this design, she decides to carry out a second 2^{5-2} design with exactly the same design matrix as the first design but with the signs of the C column reversed. (That is, C=Low is replaced with C=High in each run and vice versa.)

    (i) How many runs does the first design take?
    (ii) What is the complete set of defining relations (the words confounded with I) for the first design? (The number N of relations, including I=I, must be a power of 2 such that 2^5/N is the number of runs in part (i). The confounding relations D=ABC and E=AC imply that three of these relations are I=I, ABCD=I, and ACE=I. Thus, for example, the main effect of A is confounded with A=A(ABCD)=BCD, A=A(ACE)=CE, and A times one more word of the defining relation, and with no other effects.)
    (iii) What is the resolution of the first design? Why?
    (iv) What are the generating relations of the second design? What is its resolution?
    (v) Find the single generating relation of the combined design. What is the resolution of the combined design?
    (vi) Show that, in the combined design, the main effects of A and C are confounded only with 4-way interactions, and all two-way interactions involving A or C are confounded only with 3-way and higher interactions. Could this be a reason why the second design was chosen?
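    The word algebra used in this problem (letters that appear twice cancel, signs multiply) can be mechanized. The Python sketch below (an assumption; all names are hypothetical) generates every product of a set of signed generator words, reads off the resolution as the length of the shortest word other than I, and lists the aliases of an effect. It is set up for the first design's generators D=ABC and E=AC, that is, I=ABCD and I=ACE; a generator with a minus sign, such as I=-ABCD, can be entered as word("ABCD", -1).

```python
# A signed word is (sign, frozenset of letters); the identity I is (1, frozenset()).


def word(letters, sign=1):
    return (sign, frozenset(letters))


def multiply(w1, w2):
    """Product of two signed words: repeated letters cancel (X*X = I), signs multiply."""
    (s1, l1), (s2, l2) = w1, w2
    return (s1 * s2, l1 ^ l2)


def defining_relation(generators):
    """All products of subsets of the generators, including I."""
    group = {word("")}
    for g in generators:
        group |= {multiply(w, g) for w in group}
    return group


def resolution(group):
    """Length of the shortest word other than I."""
    return min(len(letters) for _, letters in group if letters)


def aliases(effect, group):
    """Everything confounded with `effect`: the effect times each defining word."""
    return {multiply(word(effect), w) for w in group}


def show(w):
    sign, letters = w
    return ("-" if sign < 0 else "+") + ("".join(sorted(letters)) or "I")


if __name__ == "__main__":
    first = defining_relation([word("ABCD"), word("ACE")])
    print(sorted(show(w) for w in first), "resolution", resolution(first))
    print(sorted(show(w) for w in aliases("A", first)))
```

    For a fold-over like the one in this problem, the words common to both designs' defining relations (with the same sign) form the defining relation of the combined design.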
    
    

    Problem 6. (Taken from Problem 19 p279 in text) An experimenter wants to investigate High and Low levels of five factors, temperature (T), concentration (C), pH (P), agitation rate (R), and catalyst type (K, for catalysts K1 or K2), using 8 runs. She is concerned about possible T*C and T*K interactions, but believes that any other interactions will be small. Find a 2^{5-2} design such that

    (i) None of the main effects T C P R K are confounded with any other main effect,
    (ii) None of the main effects T C P R K are confounded with either T*C or T*K, and
    (iii) T*C and T*K are not confounded with one another.
    (Hint: There are only a limited number of 2^{5-2} fractional factorial designs. Given dummy names a b c ab ac bc abc for the seven columns, assign 3 of the factors to a b c and decide how to alias the other two.)
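    Following the Hint, the search can also be done exhaustively. In this Python sketch (an assumption; all names are hypothetical), each of the seven columns of an 8-run 2^3 design is represented by its set of dummy letters, so the interaction column of two factors is the symmetric difference of their letter sets (each +-1 column squared is all +1). The code tries every assignment of T, C, P, R, K to five distinct columns and keeps those that satisfy conditions (i) to (iii).

```python
from itertools import permutations

# The seven non-identity columns of a 2^3 design, as sets of dummy letters.
COLUMNS = [frozenset(s) for s in ("a", "b", "c", "ab", "ac", "bc", "abc")]
FACTORS = ("T", "C", "P", "R", "K")


def interaction_column(x, y):
    """Column of x*y: letters shared by x and y cancel."""
    return x ^ y


def acceptable(assign):
    mains = set(assign.values())
    tc = interaction_column(assign["T"], assign["C"])
    tk = interaction_column(assign["T"], assign["K"])
    return (len(mains) == len(FACTORS)              # (i) no two main effects aliased
            and tc not in mains and tk not in mains  # (ii)
            and tc != tk)                            # (iii)


def search():
    """Yield every factor-to-column assignment meeting (i)-(iii)."""
    for cols in permutations(COLUMNS, len(FACTORS)):
        assign = dict(zip(FACTORS, cols))
        if acceptable(assign):
            yield assign


if __name__ == "__main__":
    designs = list(search())
    print(len(designs), "acceptable assignments; first one:")
    print({f: "".join(sorted(col)) for f, col in designs[0].items()})
```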
    
    
