TAKEHOME FINAL due Wednesday May 9 by 5:30 PM
Text references are to Hollander and Wolfe, ``Nonparametric Statistical
Methods'', 2nd edn.
IN THE FOLLOWING: Do Problems 2 and 3 by hand. Problems 1, 4,
and 5 can be done either by hand or by using one or more computer
programs.
NOTES: Hand in your homework in the following order:
(a) Your written answers to all problems, with references as needed to
part (c) below,
(b) The computer source for any computer programs that you used,
(c) All output from the programs in part (b).
This will put the emphasis on what you think the answers
should be and on your evidence for this. If a reader thinks that your
answers are reasonable, then he or she may or may not want to look at your
actual output and computer programs.
Five problems. Not all parts of problems are of equal weight.
1. Widgets (an industrial product) were manufactured using
widget coats from five different suppliers. Quality scores for 42 widgets
along with the supplier's product name are listed in Table 1.
Table 1: Widget quality by widget coat brand
-----------------------------------------------------------
Acme (n=9): 48 62 46 56 48 68 50 38 62
Zenith (n=8): 74 30 52 50 56 38 54 24
QQ21 (n=8): 52 66 74 78 98 58 52 72
T37 (n=9): 88 90 108 62 88 66 90 72 98
Nadir (n=8): 66 56 56 46 96 42 58 86
(a) Is there a significant variation in widget quality as a
function of the widget coat supplier? Carry out the Kruskal-Wallis test to
find out. Use the large-sample approximation.
(b) Which PAIRS of treatments (i.e., pairs of widget coat
brands) are significantly different, allowing for multiple comparisons?
For each significantly different pair, what is the
(multiple-comparison-corrected) P-value? Use the large-sample
approximation for the multiple-comparison method based on pairwise
Wilcoxon rank-sum scores discussed in Section 6.5. What can you say
about the pairwise differences that are not significant by this test?
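For those who use a computer on Problem 1(a), the following is a minimal sketch of the Kruskal-Wallis statistic with midranks, the usual tie correction, and the large-sample chi-square P-value. It is one possible implementation, not a required method. The function names (midranks, chi2_sf, kruskal_wallis) and the toy data are made up for illustration; the data are not from Table 1.

```python
import math
from collections import Counter

def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def kruskal_wallis(groups):
    """Return (H, df, P) using the large-sample chi-square approximation."""
    pooled = [v for g in groups for v in g]
    big_n = len(pooled)
    ranks = midranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (big_n ** 3 - big_n)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)

toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # made-up data, NOT Table 1
h, df, p = kruskal_wallis(toy)
print(f"H = {h:.3f}, df = {df}, P = {p:.4f}")
```

For the toy data the rank sums are 6, 15, and 24, so H works out to 7.2 on 2 degrees of freedom, which is easy to confirm by hand.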
2. The responses Y to an input X in 20 trials are recorded
in the following table.
Table 2: Responses Y to an input X
----------------------------------------------------
Trial    X     Y        Trial    X     Y
------------------      ------------------
  1.    21     7         11.    24    26
  2.    74     7         12.    70    77
  3.    84   716         13.    30     7
  4.    56     9         14.    67    29
  5.    48     7         15.    92   337
  6.    29   116         16.    99   513
  7.    61    34         17.    45   632
  8.    79    21         18.    81   128
  9.    96   153         19.    37    30
 10.    93    95         20.    71   550
(i) What is the Pearson correlation coefficient rho between X
and Y for the data in the table? Are the variables Y and X significantly
correlated as measured by rho? What is the (two-sided) P-value?
(ii) What is the Spearman correlation coefficient R? Are the
variables Y and X significantly correlated as measured by R? What is the
(two-sided) P-value? Find two-sided P-values using both the appropriate
table in the text and the large-sample approximation (see
Section 8.5 in the text). Don't forget the appropriate tie
corrections. Bracket P-values from the table if need be.
(iii) What is the Kendall correlation coefficient tau? Are the
variables Y and X significantly correlated as measured by the Kendall
test? What is the (two-sided) P-value? Find two-sided P-values using both
the appropriate table in the text and the large-sample approximation.
Don't forget appropriate tie corrections. Bracket P-values from the table
if need be.
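Problem 2 is to be done by hand, but a short program can be used to check the arithmetic afterwards. The sketch below assumes that the Spearman coefficient with ties is computed as the Pearson correlation of the midranks, with large-sample statistic sqrt(n-1)*R, and that the Kendall statistic K is standardized by the usual tie-corrected null variance. The tau reported here is 2K/(n(n-1)), with no tie adjustment in the denominator. Function names and data are made up for illustration; the data are not from Table 2.

```python
import math
from collections import Counter

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def two_sided_p(z):
    """Two-sided standard normal P-value."""
    return math.erfc(abs(z) / math.sqrt(2))

def spearman(x, y):
    """Return (R, z, P): Pearson correlation of midranks, z = sqrt(n-1)*R."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    r = sxy / math.sqrt(sxx * syy)
    z = r * math.sqrt(n - 1)
    return r, z, two_sided_p(z)

def kendall(x, y):
    """Return (tau, K, z, P) with a tie-corrected null variance for K."""
    n = len(x)
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            k += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    tx = list(Counter(x).values())
    ty = list(Counter(y).values())
    var = (n * (n - 1) * (2 * n + 5)
           - sum(t * (t - 1) * (2 * t + 5) for t in tx)
           - sum(u * (u - 1) * (2 * u + 5) for u in ty)) / 18
    var += (sum(t * (t - 1) * (t - 2) for t in tx)
            * sum(u * (u - 1) * (u - 2) for u in ty)) / (9 * n * (n - 1) * (n - 2))
    var += (sum(t * (t - 1) for t in tx)
            * sum(u * (u - 1) for u in ty)) / (2 * n * (n - 1))
    tau = 2 * k / (n * (n - 1))
    z = k / math.sqrt(var)
    return tau, k, z, two_sided_p(z)

x_toy = [1, 2, 3, 4, 5]   # made-up data, NOT Table 2
y_toy = [2, 1, 4, 3, 5]
print(spearman(x_toy, y_toy))
print(kendall(x_toy, y_toy))
```

With only five untied toy points the normal approximation is crude; the point of the example is the bookkeeping, which is the same for n=20 with ties.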
3. Responses to stress were measured for four
different brands of product under 5 different conditions of stress. Two
different measurements were made for each combination of brand and level of
stress (see Table 3).
Table 3: Responses under Stress for Four Brands of Products
--------------------------------------------------------------------
Stress    Brand1        Brand2        Brand3        Brand4
  0#    3.01, 3.04    3.47, 3.10    3.85, 3.87    3.41, 3.11
  1#    2.85, 2.51    3.49, 3.45    3.64, 3.19    3.02, 3.33
  2#    2.62, 2.60    3.11, 2.88    3.52, 3.49    3.08, 3.11
  3#    2.63, 2.64    2.83, 3.15    3.21, 3.65    2.96, 2.97
  4#    2.58, 2.60    3.12, 2.71    3.28, 3.25    2.67, 3.12
(i) Using the brand as blocks, is there a significant effect due to
the amount of stress? Use the large-sample approximation for the
nonparametric test described in Section 7.9 to find out. If there is
a significant effect due to stress, what does it appear to be due to?
Which particular levels of stress appear to be associated with unusually
small or large responses?
(ii) Using the level of stress as blocks, is there a significant
variation in the response effect over the four brands? Use the same
procedure to find out. If there is significant variation with brand, which
brands appear to be associated with unusually large or small responses?
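Problem 3 is to be done by hand, but the arithmetic can be checked with a short program. The sketch below ASSUMES that the test in Section 7.9 is the Mack-Skillings test for a two-way layout with c replications per cell; check this against the text before relying on it. Within each block all k*c observations are ranked together, S_j is the sum over blocks of the average rank of treatment j, and MS = 12/(k(N+n)) * Sum_j (S_j - (N+n)/2)^2 with N = n*k*c is referred to chi-square with k-1 degrees of freedom. Function names and data are made up; the data are not from Table 3.

```python
import math

def midranks(values):
    """Ranks 1..m, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def mack_skillings(data):
    """data[i][j] is the list of c replicates for block i, treatment j.

    Returns (MS, df, P, S) where S[j] is the summed average rank of
    treatment j. No tie adjustment is applied beyond using midranks.
    """
    n, k, c = len(data), len(data[0]), len(data[0][0])
    big_n = n * k * c
    s = [0.0] * k
    for block in data:
        ranks = midranks([v for cell in block for v in cell])
        for j in range(k):
            s[j] += sum(ranks[j * c:(j + 1) * c]) / c
    center = (big_n + n) / 2
    ms = 12 / (k * (big_n + n)) * sum((sj - center) ** 2 for sj in s)
    return ms, k - 1, chi2_sf(ms, k - 1), s

# Made-up data, NOT Table 3: 2 blocks, 2 treatments, 2 replicates per cell.
toy = [[[1, 2], [3, 4]],
       [[1, 2], [3, 4]]]
ms, df, p, s = mack_skillings(toy)
print(f"MS = {ms:.3f}, df = {df}, P = {p:.4f}, S = {s}")
```

The returned S_j values are also useful for the follow-up questions: treatments whose S_j lies far from (N+n)/2 are the ones associated with unusually large or small responses.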
4. Consider the paired (X,Y) data in Table 2.
(i) Find the coefficients beta and mu in the least-squares
regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0,
assuming that the data (X_i,Y_i) are normal, using Student-t methods?
(ii) Find the coefficients beta and mu in the regression line
Y_i=beta*X_i+mu using Theil's nonparametric procedure (see
Section 9.2 in the text). Given beta from Theil's method, estimate
the intercept mu as either the median of the n=20 residuals Y_i-beta*X_i
or else as the median of the n(n+1)/2=210 Walsh averages of the 20 residuals. Find
the P-value for H_0:beta=0 using the large-sample approximation described
in Section 9.1.
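As a hedged illustration of part (ii), the sketch below computes Theil's slope as the median of the pairwise slopes (Y_j-Y_i)/(X_j-X_i), i<j, and then the two intercept estimates: the median of the residuals R_i=Y_i-beta*X_i and the median of the Walsh averages (R_i+R_j)/2, i<=j. It assumes distinct X values. Function names and data are made up for illustration; the data are not from Table 2.

```python
from statistics import median

def theil_slope(x, y):
    """Median of all pairwise slopes; assumes the x values are distinct."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)]
    return median(slopes)

def intercept_median(x, y, beta):
    """Median of the n residuals y_i - beta * x_i."""
    return median(yi - beta * xi for xi, yi in zip(x, y))

def intercept_walsh(x, y, beta):
    """Median of the n(n+1)/2 Walsh averages of the residuals."""
    r = [yi - beta * xi for xi, yi in zip(x, y)]
    n = len(r)
    walsh = [(r[i] + r[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)

# Made-up data, NOT Table 2: the line y = 2x + 1 with one gross outlier.
x_toy = [1, 2, 3, 4, 5]
y_toy = [3, 5, 7, 9, 100]
beta = theil_slope(x_toy, y_toy)
print(beta, intercept_median(x_toy, y_toy, beta),
      intercept_walsh(x_toy, y_toy, beta))
```

In the toy data six of the ten pairwise slopes equal 2, so the median slope ignores the outlier; both intercept estimates come out as 1.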
(iii) Compare the two regression lines in parts (i)
and (ii) by computing (A) the sum of the absolute value of the
errors S_1=Sum(i=1,n) |Y_i-beta*X_i-mu| and (B) the sum of the squares
of the errors S_2=Sum(i=1,n) (Y_i-beta*X_i-mu)^2 in both cases. Which of
the two regression lines does better under criterion (A)? under
criterion (B)? By what amounts?
5. Consider the paired (X,Y) data in Table 2.
(i) Find the coefficients beta and mu using the rank regression
method discussed in Section 9.6 in the text. Given the slope beta,
estimate the intercept mu as either the median of the n=20 residuals
Y_i-beta*X_i or else as the median of the n(n+1)/2=210 Walsh averages of
the 20 residuals.
(ii) Compare the regression line in part (i) with the two
regression lines in Problem 4. How does it compare using
criterion (A)? Using criterion (B)?
(Hint: See the program RankRegress on the Math408 Web site.)
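The following sketch is NOT the RankRegress program from the course site, whose contents are not reproduced here. It ASSUMES that the rank regression slope of Section 9.6 is the value of beta that minimizes Jaeckel's dispersion with Wilcoxon scores, which for a single predictor is proportional to Sum(i<j) |e_i-e_j| with e_i=Y_i-beta*X_i. Under that assumption the minimizer is a weighted median of the pairwise slopes, with weights |X_j-X_i|. Function names and data are made up for illustration; the data are not from Table 2.

```python
from statistics import median

def dispersion(x, y, beta):
    """Sum over i<j of |e_i - e_j| for residuals e_i = y_i - beta*x_i."""
    e = [yi - beta * xi for xi, yi in zip(x, y)]
    n = len(e)
    return sum(abs(e[i] - e[j]) for i in range(n) for j in range(i + 1, n))

def rank_slope(x, y):
    """Weighted median of pairwise slopes, weights |x_j - x_i|.

    Returns one minimizer of dispersion(x, y, beta); when the cumulative
    weight hits exactly half the total, the minimizer is not unique.
    """
    n = len(x)
    pairs = sorted(((y[j] - y[i]) / (x[j] - x[i]), abs(x[j] - x[i]))
                   for i in range(n) for j in range(i + 1, n)
                   if x[j] != x[i])
    half = sum(w for _, w in pairs) / 2
    running = 0.0
    for slope, w in pairs:
        running += w
        if running >= half:
            return slope

# Made-up data, NOT Table 2: the line y = 2x + 1 with one outlier at x = 3.
x_toy = [1, 2, 3, 4, 5]
y_toy = [3, 5, 50, 9, 11]
beta = rank_slope(x_toy, y_toy)
mu = median(yi - beta * xi for xi, yi in zip(x_toy, y_toy))
print(beta, mu, dispersion(x_toy, y_toy, beta))
```

A quick sanity check on any computed slope is to evaluate the dispersion at nearby values of beta and confirm that none is smaller.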