HOMEWORK #3 due Tuesday March 27
Text references are to Hollander and Wolfe, "Nonparametric Statistical
Methods", 2nd ed.
IN THE FOLLOWING: Do Problems 1, 5, and 6 by hand. Problems 2
and 4 require you to write a computer program. Problem 3 can be
done either entirely by hand or partially by using a computer program.
NOTES: Hand in your homework in the order
(a) Your written answers to all problems, with references as needed to
part (c) below,
(b) The computer source for any computer programs that you used,
(c) All output from the programs in part (b).
This will put the emphasis on what you think the answers
should be and on your evidence for this. If a reader thinks that your
answers are reasonable, then he or she may or may not want to look at your
actual output and computer programs.
1. (Like Problem 33 on page 226 of the text.) Three
replications each of bacterial platings for each of six concentrations of
a mutagen led to the data in Table 1 (see Table 6.10 in the
text) where mcg stands for micrograms per milliliter. The values at 0mcg
correspond to the natural state of the organism. The low values at high
concentrations of the mutagen are believed to be due to the toxic effects
of the mutagen.
Table 1: Number of Mutant TA98 Salmonella Colonies
under Exposure to Various Levels of Acid Red 114
Dose:    0mcg   100mcg   333mcg   1000mcg   3333mcg   10000mcg
--------------------------------------------------------------------
           22       60       98        60        22         23
           23       59       78        82        44         21
           35       54       50        59        33         25
--------------------------------------------------------------------
(i) Find one-sided P-values for the hypothesis that the number
of colonies increases monotonically up to 333mcg and then decreases
monotonically at higher concentrations. Use both the exact critical values in
Table A.14 to bracket the P-values and also the large-sample normal
approximation.
(ii) Carry out the same procedures for a maximum at 1000mcg.
(Hints: First find the 6x6 table of Mann-Whitney differences
among the 6 samples and calculate A_3 and A_4 from the table. Note that
the critical values in Table A.14 are the same for p and
k+1-p. For example, p=1 and p=k have the same critical values as
the Jonckheere-Terpstra test.)
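The hint above can be checked with a short script. The following is a minimal Python sketch, not the course's UmbrellaTests.c. It assumes the Mack-Wolfe definition A_p = (sum of U_uv over u<v<=p) + (sum of U_vu over p<=u<v), with U_uv counting pairs in which the sample-u value is below the sample-v value and ties counting 1/2. The null variance used here is the no-ties formula, so with tied data the normal approximation is only approximate. All function names are illustrative.

```python
from math import erf, sqrt

# Table 1, one list per dose (0, 100, 333, 1000, 3333, 10000 mcg).
TABLE1 = [[22, 23, 35], [60, 59, 54], [98, 78, 50],
          [60, 82, 59], [22, 44, 33], [23, 21, 25]]

def mann_whitney_count(x, y):
    """Number of pairs (a in x, b in y) with a < b; a tie counts 1/2."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)

def umbrella_statistic(samples, p):
    """A_p for a hypothesized peak at group p (1-based)."""
    k = len(samples)
    # Increasing part: groups 1..p.
    up = sum(mann_whitney_count(samples[u], samples[v])
             for u in range(p) for v in range(u + 1, p))
    # Decreasing part: groups p..k, counting later-group values below earlier ones.
    down = sum(mann_whitney_count(samples[v], samples[u])
               for u in range(p - 1, k) for v in range(u + 1, k))
    return up + down

def umbrella_null_moments(sizes, p):
    """Null mean and (no-ties) variance of A_p for the given group sizes."""
    n_p = sizes[p - 1]
    N = sum(sizes)
    N1 = sum(sizes[:p])        # n_1 + ... + n_p
    N2 = sum(sizes[p - 1:])    # n_p + ... + n_k
    mean = (N1**2 + N2**2 - sum(n * n for n in sizes) - n_p**2) / 4.0
    var = (2 * (N1**3 + N2**3) + 3 * (N1**2 + N2**2)
           - sum(n * n * (2 * n + 3) for n in sizes)
           - n_p**2 * (2 * n_p + 3)
           + 12 * n_p * N1 * N2 - 12 * n_p**2 * N) / 72.0
    return mean, var

def upper_tail_normal(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

if __name__ == "__main__":
    sizes = [len(s) for s in TABLE1]
    for p in (3, 4):
        a_p = umbrella_statistic(TABLE1, p)
        mean, var = umbrella_null_moments(sizes, p)
        z = (a_p - mean) / sqrt(var)
        print(p, a_p, mean, var, z, upper_tail_normal(z))
```

Setting p = k reduces both moment formulas to the Jonckheere-Terpstra ones, which is a quick sanity check on the arithmetic.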
2. (Similar to Problem 41 on page 234.) Find the one-sided
P-value for the data in Table 1 for the hypothesis of a maximum at an
unknown concentration. Use the Chen-Wolfe procedure discussed in
Comment 45 on page 233 of the text. Write a computer program to
estimate the exact P-value for the hypothesis along with a 95% confidence
interval for the true P-value. Is the P-value comparable to the P-values
that you obtained in Problem 1? (Hint: See UmbrellaTests.c and
UmbrellaTests.txt on the Math408 Web site.)
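One way to organize the simulation is sketched below in Python. It is not a translation of UmbrellaTests.c. It assumes that the peak-unknown statistic is the maximum over p of the standardized A_p (check this against Comment 45 before relying on it), uses the no-ties null variance, and uses the simple interval p_hat +/- 1.96*sqrt(p_hat(1-p_hat)/n_perm), which degenerates when p_hat is 0 or 1. The function names, the default n_perm, and the seed are illustrative choices.

```python
import random
from math import sqrt

TABLE1 = [[22, 23, 35], [60, 59, 54], [98, 78, 50],
          [60, 82, 59], [22, 44, 33], [23, 21, 25]]

def u_count(x, y):
    """Pairs (a in x, b in y) with a < b; ties count 1/2."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)

def max_standardized_umbrella(samples):
    """Assumed peak-unknown statistic: max over p of (A_p - E0) / sqrt(Var0)."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    N = sum(sizes)
    sq = sum(n * n for n in sizes)
    cube = sum(n * n * (2 * n + 3) for n in sizes)
    u = [[u_count(samples[i], samples[j]) for j in range(k)] for i in range(k)]
    best = float("-inf")
    for p in range(1, k + 1):
        a_p = (sum(u[i][j] for i in range(p) for j in range(i + 1, p))
               + sum(u[j][i] for i in range(p - 1, k) for j in range(i + 1, k)))
        n_p, n1, n2 = sizes[p - 1], sum(sizes[:p]), sum(sizes[p - 1:])
        mean = (n1**2 + n2**2 - sq - n_p**2) / 4.0
        var = (2 * (n1**3 + n2**3) + 3 * (n1**2 + n2**2) - cube
               - n_p**2 * (2 * n_p + 3)
               + 12 * n_p * n1 * n2 - 12 * n_p**2 * N) / 72.0
        best = max(best, (a_p - mean) / sqrt(var))
    return best

def monte_carlo_pvalue(samples, statistic, n_perm=10000, seed=0):
    """Estimate P(statistic >= observed) by random relabeling of the pooled data.

    Returns (p_hat, lower, upper), where lower/upper is a normal-approximation
    95% confidence interval for the true permutation P-value.
    """
    rng = random.Random(seed)
    observed = statistic(samples)
    pooled = [x for s in samples for x in s]
    sizes = [len(s) for s in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        groups, start = [], 0
        for n in sizes:
            groups.append(pooled[start:start + n])
            start += n
        # Small tolerance so that floating-point noise does not drop exact ties.
        if statistic(groups) >= observed - 1e-12:
            hits += 1
    p_hat = hits / n_perm
    half = 1.96 * sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

if __name__ == "__main__":
    print(monte_carlo_pvalue(TABLE1, max_standardized_umbrella))
```

Passing the statistic as a function keeps the relabeling loop reusable for other permutation tests.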
3. A local agricultural company is interested in selling one
or more of four new types of lamb chow (food) that were developed in the
company's research division. Weight gains for yearling lambs on the four
new lamb chows (labeled Ch01,Ch02,Ch03,Ch04) and on a standard lamb chow
(Chstd) are given in Table 2.
Table 2: Lamb weight gains for five lamb chows
-------------------------------------------------------
Chstd:   58   68   28   14  150   98  138   78  124   84
-------------------------------------------------------
Ch01:   148  176   90   52  132   32  128   32
Ch02:   168  218  158  238   72  100  192
Ch03:    44  206  132   12  108  148  156  182   68   70
Ch04:    92  150  124  136  180  128  132  216  168  220
-------------------------------------------------------
The sample medians of the weights of the lambs in the five groups in
Table 2 are significantly different (Kruskal-Wallis test, P=0.018,
large-sample approximation, 4 degrees of freedom).
(i) Which PAIRS of chows are significantly different in
Table 2, NOT allowing for multiple comparisons? Use the Wilcoxon rank
sum test to compare each pair of chows. Which pairs of chows are
significantly different? Which pairs are highly significantly different?
What are the (two-sided) P-values for the pairs that are significantly
different?
(ii) Which pairs of chows are significantly different in
Table 2, ALLOWING FOR multiple comparisons? Which are highly
significantly different? Use the multiple-comparison corrected procedure
based on Wilcoxon rank sum scores discussed in Section 6.5 in the
text. Find P-values using the large-sample approximation based on the
normal range statistic that is discussed in the text (and whose critical
values are in Table A.17).
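The pairwise scores needed in parts (i) and (ii) can be computed as in the following Python sketch. It assumes the usual large-sample rank-sum standardization with midranks and the tie-corrected variance mn/12 * [N + 1 - sum(t^3 - t)/(N(N-1))], and takes W*_uv = sqrt(2) * Z_uv, which matches the relation Z = W*/sqrt(2) quoted in part (iii). The two-sided P-value returned is NOT corrected for multiple comparisons; the corrected one still has to come from Table A.17. Function names are illustrative.

```python
from itertools import combinations
from math import erfc, sqrt

CHOWS = {
    "Chstd": [58, 68, 28, 14, 150, 98, 138, 78, 124, 84],
    "Ch01": [148, 176, 90, 52, 132, 32, 128, 32],
    "Ch02": [168, 218, 158, 238, 72, 100, 192],
    "Ch03": [44, 206, 132, 12, 108, 148, 156, 182, 68, 70],
    "Ch04": [92, 150, 124, 136, 180, 128, 132, 216, 168, 220],
}

def midranks(values):
    """Ranks 1..N in the original order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks

def rank_sum_z(x, y):
    """Standardized rank sum of y in the pooled sample; positive if y tends to be larger."""
    m, n = len(x), len(y)
    N = m + n
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[m:])
    mean = n * (N + 1) / 2.0
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var = m * n / 12.0 * (N + 1 - tie_term / (N * (N - 1)))
    return (w - mean) / sqrt(var)

def pairwise_rank_sum(groups):
    """Map (name_u, name_v) -> (Z, uncorrected two-sided P, W* = sqrt(2) * Z)."""
    results = {}
    for (name_u, x), (name_v, y) in combinations(groups.items(), 2):
        z = rank_sum_z(x, y)
        results[(name_u, name_v)] = (z, erfc(abs(z) / sqrt(2.0)), sqrt(2.0) * z)
    return results

if __name__ == "__main__":
    for pair, (z, p, w_star) in pairwise_rank_sum(CHOWS).items():
        print(pair, round(z, 3), round(p, 4), round(w_star, 3))
```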
(iii) Now using the standard diet (Chstd) as a control, find
multiple-comparison-corrected P-values that weight gains for
Ch01,Ch02,Ch03,Ch04 are significantly GREATER than those for Chstd (with
one-sided P-values). Which are significantly greater, correcting for
multiple comparisons with a control? Which are highly significantly
greater? Use the pairwise-Wilcoxon-rank-sum method of Steel (1959)
discussed in Comment 76 at the end of Section 6.7 in the text.
Specifically, if Z_{1i} is the normalized Wilcoxon rank-sum score
between Ch0i and Chstd in Table 2 (that is,
Z_{1i}=W_{1i}^*/sqrt(2) for W_{ij}^* in Section 6.5), the theory in
Section 6.5 says that approximate multiple-comparison-corrected
one-sided P-values can be found using the statistic
Xmax=max(X_1,...,X_{k-1}), where the X_i are k-1 standard normal random
variables with a common correlation coefficient of 0.50 (k=5). (That is,
the P-value for Z_{1i}=A_{obs} is P(Xmax>=A_{obs}).) Use Table A.21
with rho=0.50 and (ell)=k-1 to find multiple-comparison-corrected
one-sided P-values for each of the new lamb chows.
On the basis of these multiple-comparison-corrected P-values, which
(if any) of the lamb chows are significantly better than the
standard Chstd? What are their P-values?
(Hint: You can write a computer program to calculate and
manipulate the pairwise Wilcoxon scores, but you will have to either
compare numbers in the computer output with Tables A.17 and A.21 by
hand or else include constants from those tables in your computer program.
See OneWayMultComp.c
on the Math408 Web site.)
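As a cross-check on the table lookup for part (iii), the tail probability P(Xmax >= a) can also be computed numerically. The Python sketch below assumes the standard representation X_i = sqrt(rho)*Z_0 + sqrt(1-rho)*Z_i with independent standard normal Z_0, Z_1, ..., Z_m, so that P(Xmax <= a) is the integral of phi(z) * Phi((a - sqrt(rho) z)/sqrt(1-rho))^m over z. The integral is evaluated with Simpson's rule on [-8, 8]; the function name, step count, and integration range are illustrative choices, not values from the text.

```python
from math import erf, exp, pi, sqrt

def norm_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def max_equicorrelated_upper_tail(a, m, rho=0.5, steps=4000, span=8.0):
    """P(max(X_1..X_m) >= a) for standard normals with common correlation rho.

    Requires 0 <= rho < 1 and an even number of steps (Simpson's rule).
    """
    h = 2.0 * span / steps
    total = 0.0
    for i in range(steps + 1):
        z = -span + i * h
        if i == 0 or i == steps:
            weight = 1.0
        elif i % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        # Given Z_0 = z, the X_i are independent with this conditional CDF at a.
        conditional = norm_cdf((a - sqrt(rho) * z) / sqrt(1.0 - rho))
        total += weight * norm_pdf(z) * conditional**m
    return 1.0 - total * h / 3.0

if __name__ == "__main__":
    # m = k - 1 = 4 new chows compared with the control; a = 2.0 is only a placeholder.
    print(max_equicorrelated_upper_tail(2.0, 4))
```

With m = 1 the function reduces to the ordinary one-sided normal tail, which gives a simple way to confirm that the integration is behaving.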
4. Consider the data in Table 3:
Table 3: Two samples of numbers
-----------------------------------
Sample 1
  3.40   3.94   6.30   5.85   3.75   9.19   9.20   7.02
Sample 2
  5.83  10.55   9.30   7.04   6.13  11.73   6.47  15.47
 11.49  13.69   8.27   5.02  10.20  13.08   9.13   7.39
-----------------------------------
Sample 1 has 8 observations, sample mean Xbar=6.081, and sample standard
deviation s(X)=2.317. Sample 2 has 16 observations, Ybar=9.624, and
s(Y)=3.085.
(i) (1/8) Is there a significant difference in location or
sample median between the two samples? Use the Wilcoxon rank-sum test to
find out. What is the (two-sided) P-value? Use either the tables in the
back of the book or the large-sample approximation, as you prefer.
(ii) (3/8) Do the two samples in Table 3 come from the same
probability distribution? Apply the Kolmogorov-Smirnov test to find out.
What is the (two-sided) P-value? Find the P-value using both the exact
tables in the back of the book and using the large-sample approximation.
Are these two P-values similar? How do these P-values compare with the
P-value that you obtained in part (i)?
(Hint: The results in parts (i,ii) are not unusual for
samples that differ principally by a sample mean or median.)
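For part (ii), the two-sample Kolmogorov-Smirnov distance and its large-sample P-value can be sketched in Python as follows. The sketch assumes the statistic D = max over t of |F_m(t) - G_n(t)| and the limiting tail Q(s) = 2 * sum over k>=1 of (-1)^(k-1) exp(-2 k^2 s^2) evaluated at s = sqrt(mn/(m+n)) * D. The text's tables may be indexed by a rescaled version of D, so check the scaling before comparing numbers. Function names are illustrative.

```python
from math import exp, sqrt

SAMPLE1 = [3.40, 3.94, 6.30, 5.85, 3.75, 9.19, 9.20, 7.02]
SAMPLE2 = [5.83, 10.55, 9.30, 7.04, 6.13, 11.73, 6.47, 15.47,
           11.49, 13.69, 8.27, 5.02, 10.20, 13.08, 9.13, 7.39]

def ks_two_sample(x, y):
    """D = largest absolute gap between the two empirical distribution functions."""
    m, n = len(x), len(y)
    d = 0.0
    # Both step functions only change at data values, so checking those suffices.
    for t in sorted(set(x) | set(y)):
        f_x = sum(1 for v in x if v <= t) / m
        g_y = sum(1 for v in y if v <= t) / n
        d = max(d, abs(f_x - g_y))
    return d

def kolmogorov_tail(s, terms=100):
    """Limiting P(sqrt(mn/(m+n)) * D >= s); the series is truncated at `terms`."""
    if s <= 0.0:
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * s * s)
                  for k in range(1, terms + 1))
    # Clamp, because the truncated series can stray slightly outside [0, 1]
    # for very small s.
    return min(1.0, max(0.0, q))

if __name__ == "__main__":
    m, n = len(SAMPLE1), len(SAMPLE2)
    d = ks_two_sample(SAMPLE1, SAMPLE2)
    print(d, kolmogorov_tail(sqrt(m * n / (m + n)) * d))
```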
(iii) (1/2) Write a computer program to estimate the exact
two-sided Kolmogorov-Smirnov P-value for the data in Table 3 using
100,000 permutations. Also, find the 95% confidence interval for the true
P-value. Does this 95% confidence interval contain the exact-table
P-value that you found in part (ii)? Does it contain the
large-sample approximate P-value?
(Hint: See the program KolmgSmirnv.c on the Math408 Web site for
sample C code. If you want, you can include C code that does part (i)
in your computer program.)
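A possible layout for the permutation program in part (iii) is sketched below in Python. It is not a translation of KolmgSmirnv.c. It sorts the pooled data once, shuffles the sample labels, and tracks m*n*D as an exact integer so that ties between permuted and observed statistics are not lost to rounding. The confidence interval is the simple normal-approximation interval for a binomial proportion; the function names and the seed are illustrative.

```python
import random
from math import sqrt

SAMPLE1 = [3.40, 3.94, 6.30, 5.85, 3.75, 9.19, 9.20, 7.02]
SAMPLE2 = [5.83, 10.55, 9.30, 7.04, 6.13, 11.73, 6.47, 15.47,
           11.49, 13.69, 8.27, 5.02, 10.20, 13.08, 9.13, 7.39]

def ks_scaled(sorted_values, in_first, m, n):
    """Return m*n*D as an integer, given labels for the sorted pooled values."""
    best = diff = 0
    last = len(sorted_values) - 1
    for i, flag in enumerate(in_first):
        # m*n*(F_m - G_n) rises by n at a sample-1 point, falls by m at a sample-2 point.
        diff += n if flag else -m
        # Only compare after the last member of a run of tied values.
        if i == last or sorted_values[i + 1] != sorted_values[i]:
            best = max(best, abs(diff))
    return best

def ks_permutation_pvalue(x, y, n_perm=100000, seed=0):
    """Return (D, p_hat, lower, upper) for the two-sided permutation KS test."""
    m, n = len(x), len(y)
    pairs = sorted([(v, True) for v in x] + [(v, False) for v in y],
                   key=lambda pair: pair[0])
    values = [v for v, _ in pairs]
    labels = [flag for _, flag in pairs]
    observed = ks_scaled(values, labels, m, n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)          # random reassignment of sample membership
        if ks_scaled(values, labels, m, n) >= observed:
            hits += 1
    p_hat = hits / n_perm
    half = 1.96 * sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return observed / (m * n), p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

if __name__ == "__main__":
    print(ks_permutation_pvalue(SAMPLE1, SAMPLE2))
```

Each permutation costs one pass over the 24 pooled values, so 100,000 permutations remain practical even in an interpreted language.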
5. Use the data from Sample 2 in Table 3 to find a 95%
confidence band for the true distribution function F_Y(t)=P(Y<=t) for
the second sample in Table 3. That is, find increasing functions
F_1(t),F_2(t) such that
0 <= F_1(t) <= Fhat_Y(t) <= F_2(t) <= 1
where Fhat_Y(t) is the empirical distribution function of Y determined by
the Y-values (Sample 2) in Table 3 and
Prob(F_1(t) < F_Y(t) < F_2(t) for every t) >= 0.95
Draw a sketch of the three functions F_1(t),Fhat_Y(t),F_2(t) on the same
graph.
(Hint: See Section 11.5, pp. 526-528, in the text.)
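The band has the form F_1(t) = max(Fhat_Y(t) - d, 0) and F_2(t) = min(Fhat_Y(t) + d, 1) for a suitable half-width d. The Python sketch below tabulates the three step functions at the data points. It takes d as a parameter so that the exact critical value from the text's table can be plugged in; as a stand-in it also offers the Dvoretzky-Kiefer-Wolfowitz bound d = sqrt(ln(2/alpha)/(2n)), which is an assumption of this sketch rather than the method of Section 11.5 and may be slightly wider than the exact value. Function names are illustrative.

```python
from math import log, sqrt

SAMPLE2 = [5.83, 10.55, 9.30, 7.04, 6.13, 11.73, 6.47, 15.47,
           11.49, 13.69, 8.27, 5.02, 10.20, 13.08, 9.13, 7.39]

def dkw_half_width(n, alpha=0.05):
    """Conservative half-width from the Dvoretzky-Kiefer-Wolfowitz inequality."""
    return sqrt(log(2.0 / alpha) / (2.0 * n))

def confidence_band(sample, half_width):
    """Rows (t, F1, Fhat, F2) at each distinct data value, in increasing t.

    All three functions are right-continuous steps that stay constant until
    the next data value. To the left of the smallest value, Fhat = 0, F1 = 0,
    and F2 = min(1, half_width).
    """
    n = len(sample)
    rows = []
    for t in sorted(set(sample)):
        f_hat = sum(1 for v in sample if v <= t) / n
        rows.append((t,
                     max(0.0, f_hat - half_width),
                     f_hat,
                     min(1.0, f_hat + half_width)))
    return rows

if __name__ == "__main__":
    d = dkw_half_width(len(SAMPLE2))   # replace with the table value if preferred
    for t, f1, f_hat, f2 in confidence_band(SAMPLE2, d):
        print(f"{t:6.2f}  {f1:.4f}  {f_hat:.4f}  {f2:.4f}")
```

The printed rows give the jump points needed for the sketch of the three functions.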
6. A series of tests emphasizing dexterity in high places
was carried out on cats, rats, and rabbits. The times taken in seconds
to complete each of 14 different tasks are given in Table 4.
Table 4: Times taken to do 14 different tasks for animals from three species
Task:      1    2    3    4    5    6    7    8    9   10   11   12   13   14
----------------------------------------------------------------------------
Cats     0.3  1.0  3.6  0.1  0.6  5.5  1.0  3.7  3.1  1.1  2.0  1.6  4.3  1.0
Rats     1.5  1.1  1.8  1.3  4.3  2.0  8.4  3.7  6.6  1.1  4.0  6.5  2.6  6.5
Rabbits  1.7  1.5  8.1  1.3  4.3  4.6  4.0  3.7  5.1  2.5  6.0  6.9  2.5  6.8
----------------------------------------------------------------------------
(a) Is there a significant variation in task times among the
three species? Carry out the Friedman test to find out. Find the P-value
using the large-sample chi-square approximation for the
Friedman statistic. Don't forget to include the tie correction.
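The arithmetic for part (a) can be checked with the Python sketch below. It assumes the tie-corrected Friedman statistic S' = 12 * sum_j (R_j - n(k+1)/2)^2 / [n k (k+1) - (1/(k-1)) * sum over blocks of sum(t^3 - t)], where R_j is the sum of within-block midranks for treatment j and t runs over the sizes of the tied groups in a block. The chi-square tail uses the closed-form finite sums for integer degrees of freedom. Input is one list per treatment (species), each of length n (tasks); the function names are illustrative.

```python
from math import erfc, exp, pi, sqrt

def block_midranks(values):
    """Within-block ranks 1..k, with tied values sharing their average rank."""
    return [sum(1 for w in values if w < v)
            + (sum(1 for w in values if w == v) + 1) / 2.0
            for v in values]

def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= x), from the closed-form finite sums."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return exp(-x / 2.0) * total
    tail = erfc(sqrt(x / 2.0))
    term = sqrt(2.0 * x / pi) * exp(-x / 2.0)
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)
    return tail

def friedman_test(treatments):
    """Return (S', df, large-sample P-value) for k treatments over n blocks."""
    k, n = len(treatments), len(treatments[0])
    rank_sums = [0.0] * k
    tie_total = 0
    for i in range(n):
        block = [t[i] for t in treatments]
        for j, r in enumerate(block_midranks(block)):
            rank_sums[j] += r
        counts = {}
        for v in block:
            counts[v] = counts.get(v, 0) + 1
        tie_total += sum(c**3 - c for c in counts.values())
    center = n * (k + 1) / 2.0
    numerator = 12.0 * sum((r - center) ** 2 for r in rank_sums)
    # The denominator is zero only if every block is completely tied.
    denominator = n * k * (k + 1) - tie_total / (k - 1)
    s = numerator / denominator
    return s, k - 1, chi2_upper_tail(s, k - 1)

if __name__ == "__main__":
    cats = [0.3, 1.0, 3.6, 0.1, 0.6, 5.5, 1.0, 3.7, 3.1, 1.1, 2.0, 1.6, 4.3, 1.0]
    rats = [1.5, 1.1, 1.8, 1.3, 4.3, 2.0, 8.4, 3.7, 6.6, 1.1, 4.0, 6.5, 2.6, 6.5]
    rabbits = [1.7, 1.5, 8.1, 1.3, 4.3, 4.6, 4.0, 3.7, 5.1, 2.5, 6.0, 6.9, 2.5, 6.8]
    print(friedman_test([cats, rats, rabbits]))
```

A block in which all values are tied contributes nothing to either the numerator or the denominator, so the corrected statistic treats it as if the block were absent.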
(b) Is there a significant trend for increasing task times in
the order cats, rats, rabbits, which some people would say is their order
in terms of increasing clumsiness? Carry out the Page test to find
out. Find one-sided P-values using both (i) Table A.23 in the
text and (ii) the large-sample approximation for the P-value based on
the Page statistic. How do the two P-values compare? If you cannot find
the exact P-value using the table, then find upper and lower bounds.
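The large-sample half of part (b) can be checked with the Python sketch below. It assumes Page's statistic L = sum_j j * R_j, with treatments listed in the hypothesized increasing order, and the no-ties null moments E0(L) = n k (k+1)^2 / 4 and Var0(L) = n k^2 (k+1)(k^2 - 1) / 144. Midranks are used within blocks, but no tie correction is applied to the variance, so with tied blocks the result is only approximate. Function names are illustrative.

```python
from math import erfc, sqrt

def block_midranks(values):
    """Within-block ranks 1..k, with tied values sharing their average rank."""
    return [sum(1 for w in values if w < v)
            + (sum(1 for w in values if w == v) + 1) / 2.0
            for v in values]

def page_test(treatments):
    """Return (L, z, one-sided P-value) for an increasing trend.

    `treatments` holds one list per treatment, in the hypothesized order of
    increasing response, each of length n (one entry per block).
    """
    k, n = len(treatments), len(treatments[0])
    rank_sums = [0.0] * k
    for i in range(n):
        block = [t[i] for t in treatments]
        for j, r in enumerate(block_midranks(block)):
            rank_sums[j] += r
    page_l = sum((j + 1) * r for j, r in enumerate(rank_sums))
    mean = n * k * (k + 1) ** 2 / 4.0
    var = n * k**2 * (k + 1) * (k**2 - 1) / 144.0
    z = (page_l - mean) / sqrt(var)
    # Upper tail only: large L supports the hypothesized ordering.
    return page_l, z, 0.5 * erfc(z / sqrt(2.0))

if __name__ == "__main__":
    cats = [0.3, 1.0, 3.6, 0.1, 0.6, 5.5, 1.0, 3.7, 3.1, 1.1, 2.0, 1.6, 4.3, 1.0]
    rats = [1.5, 1.1, 1.8, 1.3, 4.3, 2.0, 8.4, 3.7, 6.6, 1.1, 4.0, 6.5, 2.6, 6.5]
    rabbits = [1.7, 1.5, 8.1, 1.3, 4.3, 4.6, 4.0, 3.7, 5.1, 2.5, 6.0, 6.9, 2.5, 6.8]
    print(page_test([cats, rats, rabbits]))
```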