TAKEHOME FINAL due Wednesday May 11 by 5:30 PM
Text references are to Hollander and Wolfe,
``Nonparametric Statistical Methods'', 2nd ed.
NOTE: In the following, ^ means superscript, _ (underscore) means
subscript, and Sum(i=1,9) means the sum for i=1 to 9.
Seven problems. Not all parts of problems are of equal weight.
1. The responses Y to an input X in 20 trials are recorded
in the following table.
Table 1: Responses Y to an input X
----------------------------------------------------
 Trial    X     Y         Trial    X     Y
 ------------------       ------------------
   1.    21     7          11.    24    26
   2.    74     7          12.    70    77
   3.    84   716          13.    30     7
   4.    56     9          14.    67    29
   5.    48     7          15.    92   337
   6.    29   116          16.    99   513
   7.    61    34          17.    45   632
   8.    79    21          18.    81   128
   9.    96   153          19.    37    30
  10.    93    95          20.    71   550
----------------------------------------------------
(i) What is the Pearson correlation coefficient rho between X
and Y for the data in the table? Are the variables Y and X significantly
correlated as measured by rho? What is the (two-sided) P-value?
(ii) What is the Kendall correlation coefficient tau? Are the
variables Y and X significantly correlated as measured by the Kendall
test? What is the (two-sided) P-value? Find two-sided P-values using both
the appropriate table in the text and the large-sample approximation.
Don't forget appropriate tie corrections. Bracket P-values from the table
if need be.
(iii) What is the Spearman correlation coefficient R? Are the
variables Y and X significantly correlated as measured by R? What is the
(two-sided) P-value? Find two-sided P-values using both the appropriate
table in the text and the large-sample approximation (see
Section 8.5 in the text). Don't forget the appropriate tie
corrections. Bracket P-values from the table if need be.
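As a rough illustration of the large-sample Kendall test with tie corrections, the sketch below computes Kendall's K (concordant minus discordant pairs), the estimate tau = 2K/(n(n-1)), and the usual tie-corrected null variance of K. The function names and the toy data are hypothetical (the data are NOT Table 1), and the tie-corrected variance is the standard general formula, which is assumed to agree with the one in the text.

```python
import math
from collections import Counter

def kendall_k(x, y):
    """Kendall's K: concordant minus discordant pairs; tied pairs count 0."""
    n = len(x)
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[j] - x[i]) * (y[j] - y[i])
            k += (s > 0) - (s < 0)
    return k

def kendall_null_variance(x, y):
    """Null variance of K, corrected for tied groups in x and in y (n >= 3)."""
    n = len(x)
    t = list(Counter(x).values())  # sizes of tied groups among the x values
    u = list(Counter(y).values())  # sizes of tied groups among the y values
    base = (n * (n - 1) * (2 * n + 5)
            - sum(c * (c - 1) * (2 * c + 5) for c in t)
            - sum(c * (c - 1) * (2 * c + 5) for c in u))
    pair_term = (sum(c * (c - 1) for c in t)
                 * sum(c * (c - 1) for c in u)) / (2 * n * (n - 1))
    triple_term = (sum(c * (c - 1) * (c - 2) for c in t)
                   * sum(c * (c - 1) * (c - 2) for c in u)) / (9 * n * (n - 1) * (n - 2))
    return base / 18 + pair_term + triple_term

def two_sided_normal_p(z):
    """Two-sided standard normal P-value, 2*(1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

# Toy data for illustration only (hypothetical, not Table 1); y has one tied pair.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 4, 6, 5]
k = kendall_k(x, y)
var_k = kendall_null_variance(x, y)
tau = 2 * k / (len(x) * (len(x) - 1))
print("K =", k, " tau =", tau, " P =", two_sided_normal_p(k / math.sqrt(var_k)))
```

With no ties in either variable the variance reduces to n(n-1)(2n+5)/18, so the same function covers both cases.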
2. An agricultural company is interested in selling one or
more of four new types of lamb chow (food) that were developed in the
company's research division. Weight gains for yearling lambs on the four
new lamb chows (labeled E1,E2,E3,E4) and on a standard lamb chow (Estd)
are given in Table 2.
Table 2: Lamb weight gains for five lamb chows
-------------------------------------------------------
Estd: 58 68 28 14 150 98 138 78 124 84
-------------------------------------------------------
E1: 148 176 90 52 132 32 128 32
E2: 168 218 158 238 72 100 192
E3: 44 206 132 12 108 148 156 182 68 70
E4: 92 150 124 136 180 128 132 216 168 220
-------------------------------------------------------
(i) Is there a significant variation in weight gains for the
five different lamb chows in Table 2? Use the Kruskal-Wallis test
to find out. Find the P-value using the large-sample approximation,
taking ties into account.
(ii) Using the standard diet (Estd) as a control, find
multiple-comparison-corrected P-values that weight gains for E1,E2,E3,E4
are significantly greater than those for Estd (with one-sided P-values).
Use the pairwise-Wilcoxon-rank-sum method of Steel (1959) discussed at
the end of Section 6.7 in the text.
Specifically, if Z_{1i} is the normalized Wilcoxon rank-sum score
between Ei and Estd in Table 2 (that is,
Z_{1i}=W_{1i}^*/sqrt(2) for W_{ij}^* in Section 6.5), the theory in
Section 6.5 says that approximate multiple-comparison-corrected
one-sided P-values can be found using the statistic
Xmax=max(i=1,k-1)X_i, where X_i are k-1 standard normal random variables
with a common correlation coefficient of 0.50 (k=5). (That is,
P(Z_{1i}>=A_{obs}) = P(Xmax>=A_{obs}).) Use Table A21 with
rho=0.50 and (ell)=k-1 to find multiple-comparison-corrected one-sided
P-values for each of the new lamb chows.
On the basis of these multiple-comparison-corrected P-values, which
(if any) of the lamb chows are significantly better than the
standard Estd? What are their P-values?
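For the Kruskal-Wallis part of this problem, the following is a minimal sketch of the tie-corrected statistic H' = H / (1 - Sum (t^3 - t)/(N^3 - N)) computed from midranks, together with a chi-square upper tail that is valid only for an even number of degrees of freedom (which covers k-1=4 here). All function names and the toy data are hypothetical; the data are NOT Table 2.

```python
import math
from collections import Counter

def midranks(values):
    """Return ({value: midrank}, Counter of tie-group sizes) for a pooled sample."""
    counts = Counter(values)
    rank_of, below = {}, 0
    for v in sorted(counts):
        c = counts[v]
        rank_of[v] = below + (c + 1) / 2  # average of ranks below+1 .. below+c
        below += c
    return rank_of, counts

def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis statistic H'."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    rank_of, counts = midranks(pooled)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    tie_factor = 1 - sum(c ** 3 - c for c in counts.values()) / (n_total ** 3 - n_total)
    return h / tie_factor

def chi2_sf_even_df(x, df):
    """P(chi-square_df >= x) in closed form; only valid for EVEN df."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Toy data for illustration only (hypothetical, not Table 2): 3 groups, df = 2.
groups = [[3, 5, 5, 8], [6, 9, 9, 12], [10, 11, 14, 15]]
h_prime = kruskal_wallis(groups)
print("H' =", h_prime, " P =", chi2_sf_even_df(h_prime, len(groups) - 1))
```

For an odd number of degrees of freedom a general incomplete-gamma routine or a chi-square table would be needed instead of the closed form.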
3. Consider the data on the percent conversion of methyl
glucoside to monovinyl isomers in Table 7.14 (page 316) of the
text.
(i) Is this a balanced incomplete block design? Why? If so,
what are the parameters k, n, s, p, and lambda?
(ii) Test the hypothesis that the pressure (which defines the
treatment groups) has an effect on the percent conversion to monovinyl
isomers. Find P-values using both the appropriate table and the
large-sample approximation. (Note that both the tabled values and the
large-sample approximation require you to know k,n,s,p, and lambda. See
Problem 63 on page 316 of the text for more details about the
data.)
4. Consider the data in Table 7.25 of page 340 of
the text on the amount of luteinizing hormone (LH) in rats that live
either in constant light or else with 14hrs of light alternating with
10hrs of darkness. Five different levels of a luteinizing release factor
(LRF) were also considered. Six rats were studied for each of 5 levels of
LRF and each of two light regimes (constant or alternating), for a total
of 60 rats, in a two-way experimental layout with six observations per
cell.
(i) Use the large-sample approximation for the Mack-Skillings
procedure (Section 7.9) to test whether the light regime has a
significant effect on LH level, controlling for the amount of LRF.
(See Problem 103 on page 339 of the text for more details
about the data.)
(ii) Suppose that you (incorrectly) ignored the blocking due
to LRF levels and considered the data as a one-way layout with 30 rats
in each treatment group and applied either the Kruskal-Wallis or the
Wilcoxon rank-sum test. Compute the corresponding P-value, also using
the large-sample approximation.
(iii) Compare the P-values in parts (i) and (ii). What is
the effect of the blocking? Which P-value would you have more confidence
in? Why?
5. Consider the paired (X,Y) data in Table 1.
(i) Find the coefficients beta and mu in the least-squares
regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0,
assuming that the data (X_i,Y_i) are normal, using Student-t methods?
(ii) Find the coefficients beta and mu in the regression line
Y_i=beta*X_i+mu using Theil's nonparametric procedure (see
Section 9.2 in the text). Given beta from Theil's method, estimate
the intercept mu as the median of the n=20 residuals Y_i-beta*X_i.
Find the P-value for H_0:beta=0 using the method of
Section 9.1. As in Problem 1, find two-sided P-values using
both the appropriate table in the text and the large-sample
approximation. Don't forget appropriate tie corrections. Bracket
P-values from the table if need be. Note that there are no X-X ties in
the data. (This is not difficult to do by hand, but see also the program
NonparmRegr.c on the Math408 Web site.)
(iii) Compare the two regression lines in parts (i)
and (ii) by computing (A) the sum of the absolute values of the
errors S_1=Sum(i=1,n) |Y_i-beta*X_i-mu| and (B) the sum of the squares
of the errors S_2=Sum(i=1,n) (Y_i-beta*X_i-mu)^2 in both cases. Which of
the two regression lines does better under criterion (A)? under
criterion (B)? By what amounts?
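The comparison in this problem can be sketched as follows, assuming (as stated above) that there are no X-X ties: Theil's slope is taken as the median of the pairwise slopes (Y_j-Y_i)/(X_j-X_i), the intercept as the median of the residuals Y_i-beta*X_i, and S_1 and S_2 are computed for both lines. The function names and the toy data (which contain one gross outlier and are NOT Table 1) are hypothetical.

```python
from statistics import median

def theil_line(x, y):
    """Theil slope (median of pairwise slopes) and median-residual intercept.
    Assumes all x values are distinct."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)]
    beta = median(slopes)
    mu = median([yi - beta * xi for xi, yi in zip(x, y)])
    return beta, mu

def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return beta, ybar - beta * xbar

def error_sums(x, y, beta, mu):
    """Return (S_1, S_2): sum of absolute errors and sum of squared errors."""
    residuals = [yi - beta * xi - mu for xi, yi in zip(x, y)]
    return sum(abs(r) for r in residuals), sum(r * r for r in residuals)

# Toy data for illustration only (hypothetical, not Table 1).
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]
for name, fit in (("Theil", theil_line), ("least squares", least_squares_line)):
    beta, mu = fit(x, y)
    s1, s2 = error_sums(x, y, beta, mu)
    print(name, "beta =", beta, "mu =", mu, "S_1 =", s1, "S_2 =", s2)
```

On this toy data the Theil line has the smaller S_1 and the least-squares line has the smaller S_2, which is the kind of trade-off the two criteria are meant to expose.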
6. For the data in Table 1, as in the previous problem,
(i) For the regression line in part (i) of the previous
problem, find the corresponding 95% confidence interval for beta, again
using Student-t methods.
(ii) For the regression line in part (ii) of the previous
problem, use the Hodges-Lehmann-like method of Section 9.3 to find
an exact nonparametric confidence interval for the true value of beta.
Choose a coverage probability as close as possible to 95% (and state
it). Use either the appropriate table in the text or the large-sample
approximation.
7. The method that we used in Problem 2 for multiple
comparisons with a control is not the method that is stressed in
Section 6.7 of the book. In fact, the approximation used in
Problem 2 is not very accurate if treatment group sizes are unequal
and 5 or less.
Write a computer program to estimate the true
multiple-comparison-corrected one-sided P-values with a control that
were approximated in Problem 2. Specifically, estimate the
probability that the random value max(j=2,5) Z_{1j} under
permutations is greater than or equal to each of the observed values
of Z_{1i}, using N=100,000 random permutations of the 45 values in
Table 2 while preserving the treatment-group sizes.
Find 95% confidence intervals for each of the true one-sided
P-values. For the P-values that are significant, do any of these
confidence intervals contain the approximate P-values? Do the
approximate P-values in Problem 2 seem like a good approximation of
the true P-values?
(Hints: (i) See the sample program
OneWay3.c on the Math408 Web site. (ii) You may want
to set N=100 in your program until you are sure that it is running
correctly with output in the form that you want.)
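One possible shape for such a program is sketched below in Python rather than C. It is not the course's OneWay3.c; every name in it is hypothetical, and the toy data are NOT Table 2. It assumes that the first group is the control, that Z is the rank-sum statistic standardized with midranks and a tie-corrected variance, and that a normal-approximation (Wald) interval is acceptable for the 95% confidence interval of each estimated P-value.

```python
import math
import random
from collections import Counter

def z_rank_sum(control, treatment):
    """Standardized Wilcoxon rank sum of `treatment` against `control`,
    using midranks and the tie-corrected null variance."""
    pooled = control + treatment
    m, n = len(control), len(treatment)
    total = m + n
    counts = Counter(pooled)
    rank_of, below = {}, 0
    for v in sorted(counts):
        rank_of[v] = below + (counts[v] + 1) / 2
        below += counts[v]
    w = sum(rank_of[v] for v in treatment)
    mean = n * (total + 1) / 2
    ties = sum(c ** 3 - c for c in counts.values())
    var = m * n / 12 * ((total + 1) - ties / (total * (total - 1)))
    return (w - mean) / math.sqrt(var)

def permutation_p_values(groups, n_perm, seed=0):
    """Estimate P(max_j Z_j >= observed Z_i) for each treatment i by randomly
    permuting the pooled values while keeping the group sizes fixed.
    Returns (observed Z list, list of (p_hat, ci_low, ci_high))."""
    rng = random.Random(seed)
    observed = [z_rank_sum(groups[0], g) for g in groups[1:]]
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    hits = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)
        parts, start = [], 0
        for s in sizes:
            parts.append(pooled[start:start + s])
            start += s
        z_max = max(z_rank_sum(parts[0], g) for g in parts[1:])
        for i, z_obs in enumerate(observed):
            if z_max >= z_obs - 1e-12:  # small tolerance for float round-off
                hits[i] += 1
    results = []
    for h in hits:
        p = h / n_perm
        half = 1.96 * math.sqrt(p * (1 - p) / n_perm)  # Wald 95% half-width
        results.append((p, max(0.0, p - half), min(1.0, p + half)))
    return observed, results

# Toy data for illustration only (hypothetical, not Table 2); group 0 is the control.
groups = [[10, 12, 15, 11, 14], [13, 18, 20, 17, 19], [9, 16, 12, 21, 15]]
observed, results = permutation_p_values(groups, n_perm=2000, seed=1)
for z, (p, lo, hi) in zip(observed, results):
    print("Z = %.3f  P = %.4f  95%% CI = (%.4f, %.4f)" % (z, p, lo, hi))
```

Pure Python is slow for N=100,000 permutations of 45 values, so a small n_perm is useful while checking the output format, in the spirit of hint (ii).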