TAKEHOME FINAL due before Wed 5-7 at 4:00 pm
(Return to Prof. Sawyer or
to the math receptionist in Cupples I, Room 100.)
NOTE: There should be NO COLLABORATION on the takehome final,
other than for the mechanics of using the computer.
Open textbook and notes (including course handouts).
In general where the results of a statistical test are asked for,
(i) EXPLAIN CLEARLY what the null hypothesis H0 is and what
alternative you are testing against,
(ii) find the P-value for the test indicated (and state what test
you used), and
(iii) state whether the results are significant (P<0.05), highly
significant (P<0.01), or not significant (P >= 0.05). If the P-value
is based on a Student's t or Chi-square or F distribution, also give the
degrees of freedom. (WARNING: An F distribution has TWO degrees of
freedom, one for the numerator and one for the denominator.)
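As a small illustration of rule (iii), the sketch below turns a P-value into the three labels used here and formats a one-line report that carries the degrees of freedom along, so that an F test is never reported with only one of its two degrees of freedom. The function names and the example values (an F test with P = 0.0123 on (3, 17) degrees of freedom) are hypothetical; the P-values themselves still come from the SAS output.

```python
def significance(p: float) -> str:
    """Label a P-value with the cutoffs given in rule (iii)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie in [0, 1]")
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"  # P >= 0.05


def report(test: str, p: float, df: tuple = ()) -> str:
    """Format a test result; for an F test pass df=(numerator_df, denominator_df)."""
    df_text = f" with df = {df}" if df else ""
    return f"{test}{df_text}: P = {p:.4g} ({significance(p)})"


# Hypothetical example values, not taken from any problem below.
print(report("F test", 0.0123, (3, 17)))
# F test with df = (3, 17): P = 0.0123 (significant)
```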
ORGANIZE YOUR WORK in the following manner:
(i) your answers to all questions,
(ii) all your SAS programs, and
(iii) all your SAS output.
ADD CONSECUTIVE PAGE NUMBERS to part (iii) of your work so that you can refer from part (i) to part (iii). For example, you should be able to say things like, ``The answer in part (a) is 57.75. The scatterplot for part (b) is on page #Y below.'' It may be clearest to write the page numbers on the SAS output by hand.
Different parts of problems may not be equally weighted.
There are 4 problems.
Problem 1. An international baseball organization conducts a survey to compare the throwing expertise of catchers in a sample of Little League teams distributed among 4 Leagues. Proficiency scores for making an accurate throw from home to second base were recorded for 3 catchers on each team. The international organization wants to know where most of the variation in catcher throwing skill is located: between leagues, among teams within leagues, or a combination of both. The survey data are in Table 1.
Table 1 --- Catcher throwing proficiencies by Team and League

  League    Team    Scores
  League1   Team1   71  68  75
            Team2   52  57  63
            Team3   74  67  78
            Team4   76  91  71
  League2   Team1   56  54  57
            Team2   70  66  64
            Team3   71  62  62
  League3   Team1   70  50  64
            Team2   59  61  74
            Team3   53  65  57
            Team4   62  59  72
            Team5   69  80  65
            Team6   56  76  74
            Team7   64  62  49
            Team8   61  73  48
            Team9   47  57  51
  League4   Team1   74  78  62
            Team2   78  76  73
            Team3   64  54  50
            Team4   70  68  66
            Team5   65  72  73

Note that ``Team1'' does not refer to the same team in different leagues, which might be in different parts of the world, but only to the first team in that league that happened to send its catcher scores in to the international organization. Treat the three observations for each team as an independent sample for that team.
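The data in Table 1 are nested: catchers within teams, teams within leagues. As a cross-check on the SAS analysis, the sketch below assumes that nested random-effects structure and computes the sums of squares, degrees of freedom, mean squares and F ratios for leagues and for teams within leagues, followed by the usual ANOVA (method-of-moments) estimates of the variance components. All names are hypothetical. Because every team contributes exactly three scores, the League F ratio uses MS(Team(League)) as its denominator; the coefficient c accounts for the unequal numbers of teams per league. P-values would still have to be read from an F distribution with the printed degrees of freedom.

```python
# Table 1: each league maps to its teams; each team is a list of 3 scores.
DATA = {
    "League1": [[71, 68, 75], [52, 57, 63], [74, 67, 78], [76, 91, 71]],
    "League2": [[56, 54, 57], [70, 66, 64], [71, 62, 62]],
    "League3": [[70, 50, 64], [59, 61, 74], [53, 65, 57],
                [62, 59, 72], [69, 80, 65], [56, 76, 74],
                [64, 62, 49], [61, 73, 48], [47, 57, 51]],
    "League4": [[74, 78, 62], [78, 76, 73], [64, 54, 50],
                [70, 68, 66], [65, 72, 73]],
}


def mean(xs):
    return sum(xs) / len(xs)


def nested_anova(data):
    """Return {source: (SS, df, MS, F)} for League, Team(League), Error, Total."""
    all_y = [y for teams in data.values() for team in teams for y in team]
    grand = mean(all_y)
    ss_league = ss_team = ss_error = 0.0
    n_teams = 0
    for teams in data.values():
        league_mean = mean([y for team in teams for y in team])
        for team in teams:
            n_teams += 1
            team_mean = mean(team)
            ss_league += len(team) * (league_mean - grand) ** 2
            ss_team += len(team) * (team_mean - league_mean) ** 2
            ss_error += sum((y - team_mean) ** 2 for y in team)
    df_league, df_team = len(data) - 1, n_teams - len(data)
    df_error = len(all_y) - n_teams
    ms_league = ss_league / df_league
    ms_team = ss_team / df_team
    ms_error = ss_error / df_error
    return {
        "League": (ss_league, df_league, ms_league, ms_league / ms_team),
        "Team(League)": (ss_team, df_team, ms_team, ms_team / ms_error),
        "Error": (ss_error, df_error, ms_error, None),
        "Total": (sum((y - grand) ** 2 for y in all_y), len(all_y) - 1, None, None),
    }


def variance_components(data):
    """ANOVA estimates of the League, Team(League) and Error variances (may be negative)."""
    sizes = {len(team) for teams in data.values() for team in teams}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes every team has the same number of scores")
    n = sizes.pop()
    league_n = [n * len(teams) for teams in data.values()]
    total = sum(league_n)
    # Coefficient of the League variance in E[MS_League] for unequal league sizes.
    c = (total - sum(m * m for m in league_n) / total) / (len(data) - 1)
    table = nested_anova(data)
    ms_l, ms_t, ms_e = (table[k][2] for k in ("League", "Team(League)", "Error"))
    return {"League": (ms_l - ms_t) / c, "Team(League)": (ms_t - ms_e) / n, "Error": ms_e}


for source, (ss, df, ms, f) in nested_anova(DATA).items():
    print(f"{source:13s} df={df:3d} SS={ss:9.2f}", "" if f is None else f"F={f:.3f}")
print(variance_components(DATA))
```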
Problem 2. An engineer is interested in the resonant frequency of a mechanical device as a function of three variables: Pressure, with three levels (Press1,Press2,Press3), Drubness, with two levels (Drub1,Drub2), and Abrasiveness, with three levels (Abr1,Abr2,Abr3). The resonant frequencies of two devices are measured for each set of levels of the three variables. The resulting frequencies are listed in Table 2.
Table 2. Resonant frequencies of a Device

          Press1        Press2        Press3
          Drub1 Drub2   Drub1 Drub2   Drub1 Drub2
  Abr1    3839 3202 326 117 5950 1254 357 1550 484 227 1915 2924
  Abr2    1313 3202 276 368 1574 8814 530 538 1046 1128 1373 2795
  Abr3    2097 6417 374 429 3614 1293 238 2476 201 886 1803 1647
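Problem 2 is a balanced full factorial (3 Pressure x 2 Drubness x 3 Abrasiveness levels, two devices per combination). For balanced data the sum of squares of any main effect or interaction can be recovered from the marginal tables by inclusion-exclusion: the SS of the margin over a set of factors equals the sum of the effect SS for all nonempty subsets of that set. The sketch below assumes balanced data and uses hypothetical names; it is meant for checking SAS output, not replacing it. For Problem 2 the observations would be ((pressure, drubness, abrasiveness), frequency) pairs, and the usage example instead uses small made-up data.

```python
from collections import defaultdict
from itertools import combinations


def margin_ss(obs, factors):
    """SS of the cell means of the margin over `factors` about the grand mean."""
    if not factors:
        return 0.0
    ys = [y for _, y in obs]
    grand = sum(ys) / len(ys)
    cells = defaultdict(list)
    for levels, y in obs:
        cells[tuple(levels[f] for f in factors)].append(y)
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in cells.values())


def factorial_ss(obs, n_factors):
    """Effect SS for a BALANCED full factorial; keys are tuples of factor indices, plus 'error'."""
    subsets = [s for r in range(n_factors + 1) for s in combinations(range(n_factors), r)]
    margin = {s: margin_ss(obs, s) for s in subsets}
    effects = {}
    for s in subsets[1:]:  # subsets[0] is the empty set
        effects[s] = sum((-1) ** (len(s) - len(t)) * margin[t]
                         for r in range(len(s) + 1) for t in combinations(s, r))
    ys = [y for _, y in obs]
    grand = sum(ys) / len(ys)
    total = sum((y - grand) ** 2 for y in ys)
    effects["error"] = total - margin[tuple(range(n_factors))]  # within-cell SS
    return effects


def effect_df(n_levels, s):
    """Degrees of freedom of an effect: product of (levels - 1) over its factors."""
    df = 1
    for f in s:
        df *= n_levels[f] - 1
    return df


# Made-up 2 x 3 example with two replicates per cell (not Table 2 data).
demo = [((i, j), 10 * i + 3 * j + e) for i in range(2) for j in range(3) for e in (-0.5, 0.5)]
for effect, ss in factorial_ss(demo, 2).items():
    print(effect, round(ss, 6))
```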
Problem 3. An experimenter wants to test the effect of 6 factors, which she calls A B C D E F, on a response variable YY associated with an industrial process. She can afford to do 12 runs and decides to use the Plackett-Burman PB_12 design, which includes 5 additional columns G H J K L that she does not use. The High/Low settings of the design and the output that she measures are in the following table.
Table 3. Output of an experiment with High/Low settings for 6 factors

 Row    A   B   C   D   E   F   G   H   J   K   L   Response
  1.    1  -1   1  -1  -1  -1   1   1   1  -1   1       80.8
  2.    1   1  -1   1  -1  -1  -1   1   1   1  -1       38.8
  3.   -1   1   1  -1   1  -1  -1  -1   1   1   1       31.6
  4.    1  -1   1   1  -1   1  -1  -1  -1   1   1      116.0
  5.    1   1  -1   1   1  -1   1  -1  -1  -1   1       38.1
  6.    1   1   1  -1   1   1  -1   1  -1  -1  -1      102.5
  7.   -1   1   1   1  -1   1   1  -1   1  -1  -1       62.5
  8.   -1  -1   1   1   1  -1   1   1  -1   1  -1       60.3
  9.   -1  -1  -1   1   1   1  -1   1   1  -1   1       68.7
 10.    1  -1  -1  -1   1   1   1  -1   1   1  -1      120.4
 11.   -1   1  -1  -1  -1   1   1   1  -1   1   1       49.7
 12.   -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1       47.2
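The columns of Table 3 can be generated compactly: rows 1 to 11 are successive one-place right cyclic shifts of row 1, and row 12 is all -1 (this can be checked directly against the table). The sketch below, with hypothetical names, rebuilds the design this way and checks that every column is balanced (six +1 and six -1) and that every pair of columns is orthogonal. That orthogonality is what allows all 11 main effects to be estimated separately from only 12 runs.

```python
# Row 1 of Table 3 (columns A B C D E F G H J K L).
ROW1 = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]


def pb12_design(row1=ROW1):
    """Rows 1-11: successive right cyclic shifts of row 1; row 12: all -1 (as in Table 3)."""
    rows, row = [], list(row1)
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]  # move the last entry to the front
    rows.append([-1] * len(row1))
    return rows


def is_orthogonal(rows):
    """True if every column sums to zero and every pair of columns has zero dot product."""
    cols = list(zip(*rows))
    balanced = all(sum(c) == 0 for c in cols)
    pairwise = all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    return balanced and pairwise


design = pb12_design()
print(design[1])              # row 2 of Table 3
print(is_orthogonal(design))  # True
```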
(i) Use the Box-Meyer program on the Math420 Web site to find the most likely choice of active factors for the data in Table 3. Analyze all 11 factors in Table 3 together, as a control for the 6 factors whose values were actually varied in the data. Recall that one of the advantages of the screens based on the F-statistic and on the Bayesian model posterior probabilities is that they are valid across different model sizes. (That is, 2^1, 2^2, and 2^3 models can be sorted together.) Which submodel has the highest F-statistic? Is it unique? Which submodel has the highest Bayesian posterior probability? Is it unique? Which individual factors (not submodels) have the highest Bayesian posterior probabilities of being active?
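For a rough cross-check outside the Box-Meyer program, the sketch below ranks submodels by an F statistic. It is an assumption about how such a screen might work, not a description of the course program: for every submodel of one, two or three of the 11 columns it fits a full-factorial (cell-means) model and computes the usual overall F with its two degrees of freedom. The rows are rebuilt from the cyclic structure of Table 3 (rows 1 to 11 are right shifts of row 1; row 12 is all -1), and all names are hypothetical. This naive ranking does not adjust for the differing degrees of freedom, so treat it only as indicative.

```python
from itertools import combinations

NAMES = "ABCDEFGHJKL"
ROW1 = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]
Y = [80.8, 38.8, 31.6, 116.0, 38.1, 102.5, 62.5, 60.3, 68.7, 120.4, 49.7, 47.2]


def design():
    """Table 3 rows: right cyclic shifts of row 1, then a row of all -1."""
    rows, row = [], list(ROW1)
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]
    rows.append([-1] * 11)
    return rows


def cell_means_f(rows, y, cols):
    """Overall F (and its two df) for a full factorial in the columns `cols`."""
    cells = {}
    for r, yi in zip(rows, y):
        cells.setdefault(tuple(r[c] for c in cols), []).append(yi)
    n, k = len(y), len(cells)
    grand = sum(y) / n
    means = {key: sum(v) / len(v) for key, v in cells.items()}
    ss_model = sum(len(v) * (means[key] - grand) ** 2 for key, v in cells.items())
    ss_resid = sum((yi - means[key]) ** 2 for key, v in cells.items() for yi in v)
    df_model, df_resid = k - 1, n - k
    if df_resid == 0 or ss_resid == 0:
        return float("inf"), df_model, df_resid
    return (ss_model / df_model) / (ss_resid / df_resid), df_model, df_resid


def screen(max_size=3):
    """All submodels with 1..max_size columns, sorted by decreasing F."""
    rows = design()
    results = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(len(NAMES)), size):
            f, df1, df2 = cell_means_f(rows, Y, cols)
            results.append((f, "".join(NAMES[c] for c in cols), df1, df2))
    return sorted(results, reverse=True)


for f, model, df1, df2 in screen()[:10]:
    print(f"{model:4s} F({df1},{df2}) = {f:.2f}")
```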
(ii) The Box-Meyer program uses model prior probabilities that depend on two parameters: pi, which is the prior probability that any particular factor is active independently of the other factors, and gamma, which is a function of the estimated selection coefficient of active factors to inert factors. The default settings of the program are pi=0.25 and gamma=2.50. Are your conclusions from part (i) robust with respect to these parameters? Re-run the program with the settings pi=0.10, pi=0.80, gamma=1.0, and gamma=11.0 (four different choices of settings). Are your conclusions similar to those in part (i)? Are the factors that are most likely to be active the same?
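The role of pi follows directly from the assumption stated above that each factor is active independently with probability pi: a particular submodel with f of the 11 factors active then has prior probability pi^f (1 - pi)^(11 - f), and the number of active factors has a binomial distribution. The sketch below (hypothetical names; it does not model gamma and is not the Box-Meyer program) shows how the prior on the number of active factors shifts across the values of pi used in part (ii).

```python
from math import comb

K = 11  # number of design columns screened in Table 3


def prior_specific_model(f, pi, k=K):
    """Prior probability that a given set of f factors is active and the other k - f are inert."""
    return pi ** f * (1 - pi) ** (k - f)


def prior_model_size(f, pi, k=K):
    """Prior probability that exactly f of the k factors are active (binomial)."""
    return comb(k, f) * prior_specific_model(f, pi, k)


for pi in (0.10, 0.25, 0.80):
    probs = [prior_model_size(f, pi) for f in range(K + 1)]
    print(f"pi={pi:.2f}: expected number active = {K * pi:.2f}, "
          f"P(1 to 3 active) = {sum(probs[1:4]):.3f}")
```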
(iii) Assume that the submodel that you identified in part (i) contains the active factors and that all of the other factors are inert. Do a 2^3-like factorial design analysis of the data in Table 3 using these active factors. Which of the main effects in this analysis are significant? Which of the interactions? Find the P-values of the significant effects. Is this consistent with your answers in part (i)? (Warning: Since these factors and their interactions are not orthogonal in the 12-run Plackett-Burman design, do not do a linear regression on the High/Low settings. Instead, treat the three factors as categorical or class variables and do a 3-factor full factorial analysis.)
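The warning above can be seen by counting how the 12 runs fall into the 8 cells of a 2^3 factorial for three chosen columns. The sketch below rebuilds the Table 3 design from its cyclic structure and tallies the cells. For columns A, B and C every cell occurs at least once and four cells occur twice, so the three-factor table is unbalanced, which is why the interactions are not orthogonal to the main effects. With 12 runs in 8 cells, a full three-factor analysis leaves 12 - 8 = 4 residual degrees of freedom. Names are hypothetical; replace (0, 1, 2) by the columns of the submodel you found.

```python
from collections import Counter

NAMES = "ABCDEFGHJKL"
ROW1 = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]


def design():
    """Table 3 rows: right cyclic shifts of row 1, then a row of all -1."""
    rows, row = [], list(ROW1)
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]
    rows.append([-1] * 11)
    return rows


def cell_counts(rows, cols):
    """Number of runs in each cell of the 2^len(cols) table for the chosen columns."""
    return Counter(tuple(r[c] for c in cols) for r in rows)


cols = (0, 1, 2)  # A, B, C: a hypothetical choice
counts = cell_counts(design(), cols)
for cell in sorted(counts):
    print(dict(zip((NAMES[c] for c in cols), cell)), counts[cell])
print("cells:", len(counts), "residual df:", len(design()) - len(counts))
```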
Problem 4. Assume that the Low/High settings for the data in Table 3 were given by
Table 4. Factors and High/Low settings for the data in Table 3

  Factor   Low       High
  A        2.0       4.0
  B        11.0      13.0
  C        15.0      18.0
  D        Light     Dark
  E        Stirred   Not stirred
  F        0.01      0.03

The experimenter wants to use the information in Tables 3 and 4 to improve the value of the response variable by changing the settings of the three active factors that were found in Problem 3. The other (inert) factors are fixed at their Low setting.
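Problem 4 requires translating between the coded -1/+1 levels of Table 3 and the natural units of Table 4. For a numeric factor the usual linear coding is u = (x - center) / half-range, with center = (Low + High)/2 and half-range = (High - Low)/2. The sketch below, with hypothetical names, applies this to A, B, C and F. D and E are qualitative, so only their two labelled levels have coded values.

```python
# Numeric factors from Table 4: (Low, High) in natural units.
NUMERIC = {"A": (2.0, 4.0), "B": (11.0, 13.0), "C": (15.0, 18.0), "F": (0.01, 0.03)}
# Qualitative factors from Table 4: (Low label, High label).
QUALITATIVE = {"D": ("Light", "Dark"), "E": ("Stirred", "Not stirred")}


def to_coded(factor, x):
    """Natural setting -> coded value (-1 at Low, +1 at High, 0 at the center)."""
    if factor in QUALITATIVE:
        low, high = QUALITATIVE[factor]
        if x not in (low, high):
            raise ValueError(f"{factor} has only the levels {low!r} and {high!r}")
        return -1 if x == low else 1
    low, high = NUMERIC[factor]
    return (x - (low + high) / 2) / ((high - low) / 2)


def to_natural(factor, u):
    """Coded value -> natural setting (numeric factors only)."""
    low, high = NUMERIC[factor]
    return (low + high) / 2 + u * (high - low) / 2


print(to_coded("C", 16.5), to_coded("E", "Stirred"), to_natural("A", 2.0))  # 0.0 -1 5.0
```

Note that a coded value outside [-1, 1], such as u = 2.0 for A above, corresponds to a setting outside the range actually tested.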
(i) Find the parameter estimates for all 11 factors in the design and sort them in decreasing order. Do any of the estimates appear to stick out on the high or low side? Are these the active factors that were predicted in Problem 3?
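Because the columns of Table 3 are balanced and mutually orthogonal, least squares on the coded -1/+1 columns needs no matrix inversion: the intercept is the mean response and each coefficient is sum(x_j * y) / 12, half of the usual high-minus-low effect. The sketch below, with hypothetical names, computes all 11 coefficients this way and sorts them in decreasing order; it should agree with the parameter estimates from a SAS regression on the same coded columns, which remains the analysis to hand in.

```python
NAMES = "ABCDEFGHJKL"
ROW1 = [1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1]
Y = [80.8, 38.8, 31.6, 116.0, 38.1, 102.5, 62.5, 60.3, 68.7, 120.4, 49.7, 47.2]


def design():
    """Table 3 rows: right cyclic shifts of row 1, then a row of all -1."""
    rows, row = [], list(ROW1)
    for _ in range(11):
        rows.append(row)
        row = [row[-1]] + row[:-1]
    rows.append([-1] * 11)
    return rows


def coded_coefficients(rows, y):
    """Intercept and per-column least-squares coefficients for an orthogonal +/-1 design."""
    n = len(y)
    coefs = {name: sum(r[j] * yi for r, yi in zip(rows, y)) / n
             for j, name in enumerate(NAMES)}
    return sum(y) / n, coefs


intercept, coefs = coded_coefficients(design(), Y)
print(f"intercept: {intercept:.3f}")
for name, b in sorted(coefs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {b:8.3f}")
```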
(ii) The experimenter wants to carry out runs at 5 additional settings in order to increase the value of the response variable. Her predicted values at the new settings will be given by the estimated regression equation from part (i), with the intercept and the regression coefficients of the active factors rounded to two significant figures and the regression coefficients of the inert factors replaced by zero. What is this prediction equation for the response variable, written in terms of the factor settings listed in Table 4? (See Section 12.1 in the text for a similar analysis.)
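Rewriting the coded equation in Table 4 units is a substitution: if y = b0 + sum b_j u_j with u_j = (x_j - c_j)/h_j, then y = (b0 - sum b_j c_j / h_j) + sum (b_j / h_j) x_j. The sketch below assumes the coefficients are rounded to two significant figures before the substitution; the coefficient values in the example are made up and are not the answer to this problem, and the names are hypothetical. Qualitative factors such as D and E would stay in coded form.

```python
from math import floor, log10


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))


def to_natural_equation(b0, coded, levels):
    """Convert y = b0 + sum b*u (coded units) to y = a0 + sum a*x (natural units)."""
    a0, slopes = b0, {}
    for name, b in coded.items():
        low, high = levels[name]
        center, half = (low + high) / 2, (high - low) / 2
        slopes[name] = b / half
        a0 -= b * center / half
    return a0, slopes


# Hypothetical coded coefficients for two numeric factors from Table 4.
levels = {"A": (2.0, 4.0), "C": (15.0, 18.0)}
b0, coded = round_sig(50.3), {"A": round_sig(9.96), "C": round_sig(-6.04)}
print(to_natural_equation(b0, coded, levels))  # (86.0, {'A': 10.0, 'C': -4.0})
```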
(iii) In general, the direction of fastest ascent of a function w=f(x,y,z) at a point (x,y,z) is given by the gradient of f at that point. For example, if w=10+2*x+3*y+4*z, the direction of fastest ascent is the vector (2,3,4). What is the direction of fastest ascent for the linear function in part (ii)? If the value of the first active variable is increased by one, by how much should the second and third active variables be changed to stay on the line of fastest ascent?
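For a linear function the gradient is the same at every point, so the direction of fastest ascent is simply the vector of slope coefficients, and moving along it means changing each variable in proportion to its coefficient. The sketch below (hypothetical names) uses the example w = 10 + 2x + 3y + 4z from the text: increasing x by 1 requires changing y by 3/2 and z by 4/2 = 2.

```python
def ascent_direction(coefs):
    """Gradient of w = b0 + sum(b_i * x_i): the vector of slopes (b_1, ..., b_k)."""
    return tuple(coefs)


def step_along_ascent(coefs, first_change=1.0):
    """Changes in every variable that stay on the ascent line when the first changes by first_change."""
    b1 = coefs[0]
    if b1 == 0:
        raise ValueError("first coefficient is zero; scale by a different variable")
    return tuple(first_change * b / b1 for b in coefs)


# Example from the text: w = 10 + 2x + 3y + 4z.
print(ascent_direction((2, 3, 4)))   # (2, 3, 4)
print(step_along_ascent((2, 3, 4)))  # (1.0, 1.5, 2.0)
```

The gradient depends on the units in which the variables are measured, so an equation written in coded units and the same equation written in Table 4 units generally give different directions.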