Math 408 Homework 3 - Spring 2007

HOMEWORK #3 due Tuesday March 27

Text references are to Hollander and Wolfe, "Nonparametric Statistical Methods", 2nd ed.

IN THE FOLLOWING: Do Problems 1, 5, and 6 by hand. Problems 2 and 4 require you to write a computer program. Problem 3 can be done either entirely by hand or partially by using a computer program.

NOTES: Hand in your homework in the order
(a) Your written answers to all problems, with references as needed to part (c) below,
(b) The computer source for any computer programs that you used
(c) All output from the programs in part (b)
This will put the emphasis on what you think the answers should be and on your evidence for this. If a reader thinks that your answers are reasonable, then he or she may or may not want to look at your actual output and computer programs.

1. (Like Problem 33 on page 226 of the text.) Three replicate bacterial platings at each of six concentrations of a mutagen led to the data in Table 1 (see Table 6.10 in the text), where mcg stands for micrograms per milliliter. The values at 0mcg correspond to the natural state of the organism. The low values at high concentrations of the mutagen are believed to be due to the toxic effects of the mutagen.

      Table 1: Number of Mutant TA98 Salmonella Colonies
       under Exposure to Various Levels of Acid Red 114

Dose:    0mcg     100mcg    333mcg    1000mcg    3333mcg    10000mcg
--------------------------------------------------------------------
          22       60        98        60         22         23
          23       59        78        82         44         21
          35       54        50        59         33         25
--------------------------------------------------------------------

(i) Find one-sided P-values for the hypothesis that the number of colonies increases monotonically up to 333mcg and then decreases monotonically at higher concentrations. Use both the exact critical values in Table A.14 to bracket the P-values and the large-sample normal approximation.

(ii) Carry out the same procedures for a maximum at 1000mcg.

(Hints: First find the 6x6 table of Mann-Whitney counts among the 6 samples and calculate A_3 and A_4 from the table. Note that the critical values in Table A.14 are the same for p and k+1-p. For example, p=1 and p=k have the same critical values as the Jonckheere-Terpstra test.)

2. (Similar to Problem 41 on page 234.) Find the one-sided P-value for the data in Table 1 for the hypothesis of a maximum at an unknown concentration. Use the Chen-Wolfe procedure discussed in Comment 45 on page 233 of the text. Write a computer program to estimate the exact P-value for the hypothesis along with a 95% confidence interval for the true P-value. Is the P-value comparable to the P-values that you obtained in Problem 1? (Hint: See UmbrellaTests.c and UmbrellaTests.txt on the Ma408 Web site.)

3. A local agricultural company is interested in selling one or more of four new types of lamb chow (food) that were developed in the company's research division. Weight gains for yearling lambs on the four new lamb chows (labeled Ch01,Ch02,Ch03,Ch04) and on a standard lamb chow (Chstd) are given in Table 2.

     Table 2: Lamb weight gains for five lamb chows
    -------------------------------------------------------
    Chstd:   58   68   28   14  150   98  138   78  124   84 
    -------------------------------------------------------
    Ch01:   148  176   90   52  132   32  128   32           
    Ch02:   168  218  158  238   72  100  192                
    Ch03:    44  206  132   12  108  148  156  182   68   70 
    Ch04:    92  150  124  136  180  128  132  216  168  220 
    -------------------------------------------------------

The sample medians of the weight gains in the five groups in Table 2 are significantly different (Kruskal-Wallis test, P=0.018, large-sample approximation, 4 degrees of freedom).

(i) Which PAIRS of chows are significantly different in Table 2, NOT allowing for multiple comparisons? Use the Wilcoxon rank-sum test to compare each pair of chows. Which pairs are significantly different, and which are highly significantly different? What are the (two-sided) P-values for the pairs that are significantly different?

(ii) Which pairs of chows are significantly different in Table 2, ALLOWING FOR multiple comparisons? Which are highly significantly different? Use the multiple-comparison corrected procedure based on Wilcoxon rank sum scores discussed in Section 6.5 in the text. Find P-values using the large-sample approximation based on the normal range statistic that is discussed in the text (and whose critical values are in Table A.17).
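For reference, the pairwise statistic can be written as below, assuming W_{uv} is the rank sum of sample v in the joint ranking of samples u and v, and the bracket holds the usual tie correction (t runs over tie-group sizes in that joint ranking). This form is chosen to agree with the relation Z_{1i} = W_{1i}^*/sqrt(2) stated in part (iii); the exact notation should be checked against Section 6.5.

```latex
W^*_{uv} \;=\; \sqrt{2}\,
  \frac{W_{uv} - \tfrac{1}{2}\,n_v\,(n_u + n_v + 1)}
       {\sqrt{\tfrac{n_u n_v}{12}\Bigl[(n_u + n_v + 1)
          - \tfrac{\sum (t^3 - t)}{(n_u + n_v)(n_u + n_v - 1)}\Bigr]}},
\qquad
P_{uv} \;\approx\; \Pr\bigl(R_k \ge |W^*_{uv}|\bigr)
```

Here R_k is the range of k independent standard normal variables (k = 5 chows), whose upper critical values are the ones tabulated in Table A.17.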

(iii) Now, using the standard diet (Chstd) as a control, find multiple-comparison-corrected one-sided P-values for whether the weight gains for Ch01, Ch02, Ch03, and Ch04 are significantly GREATER than those for Chstd. Which are significantly greater, correcting for multiple comparisons with a control? Which are highly significantly greater? Use the pairwise Wilcoxon rank-sum method of Steel (1959) discussed in Comment 76 at the end of Section 6.7 in the text.

Specifically, let Z_{1i} be the normalized Wilcoxon rank-sum score between Ch0i and Chstd in Table 2 (that is, Z_{1i}=W_{1i}^*/sqrt(2) for W_{ij}^* in Section 6.5). The theory in Section 6.5 says that approximate multiple-comparison-corrected one-sided P-values can be found using the statistic Xmax = max_{1<=i<=k-1} X_i, where X_1,...,X_{k-1} are standard normal random variables with a common correlation coefficient of 0.50 (here k=5). (That is, the P-value for an observed Z_{1i}=A_{obs} is P(Xmax>=A_{obs}).) Use Table A.21 with rho=0.50 and (ell)=k-1 to find multiple-comparison-corrected one-sided P-values for each of the new lamb chows.

On the basis of these multiple-comparison-corrected P-values, which (if any) of the lamb chows are significantly better than the standard Chstd? What are their P-values?

(Hint: You can write a computer program to calculate and manipulate the pairwise Wilcoxon scores, but you will have to either compare numbers in the computer output with Tables A.17 and A.22 by hand or else include constants from those tables in your computer program. See OneWayMultComp.c on the Math408 Web site.)

4. Consider the data in Table 3:

    Table 3: Two samples of numbers
  -----------------------------------
  Sample 1
    3.40   3.94  6.30  5.85   3.75   9.19  9.20   7.02
  Sample 2
    5.83  10.55  9.30  7.04   6.13  11.73  6.47  15.47
   11.49  13.69  8.27  5.02  10.20  13.08  9.13   7.39

Sample 1 has 8 observations, sample mean Xbar=6.081, and sample standard deviation s(X)=2.317. Sample 2 has 16 observations, Ybar=9.424, and s(Y)=3.085.

(i) (1/8) Is there a significant difference in location or sample median between the two samples? Use the Wilcoxon rank-sum test to find out. What is the (two-sided) P-value? Use either the tables in the back of the book or the large-sample approximation, as you prefer.

(ii) (3/8) Do the two samples in Table 3 come from the same probability distribution? Apply the Kolmogorov-Smirnov test to find out. What is the (two-sided) P-value? Find the P-value using both the exact tables in the back of the book and using the large-sample approximation. Are these two P-values similar? How do these P-values compare with the P-value that you obtained in part (i)?

(Hint: The results in parts (i,ii) are not unusual for samples that differ principally by a sample mean or median.)

(iii) (1/2) Write a computer program to estimate the exact two-sided Kolmogorov-Smirnov P-value for the data in Table 3 using 100,000 random permutations. Also find a 95% confidence interval for the true P-value. Does this 95% confidence interval contain the exact P-value that you found from the tables in part (ii)? Does it contain the large-sample approximate P-value?

(Hint: See the program KolmgSmirnv.c on the Math408 Web site for sample C code. If you want, you can include C code that does part (i) in your computer program.)

5. Use the data from Sample 2 in Table 3 to find a 95% confidence band for the true distribution function F_Y(t)=P(Y<=t) of the second sample. That is, find increasing functions F_1(t),F_2(t) such that

0 <= F_1(t) <= Fhat_Y(t) <= F_2(t) <= 1

where Fhat_Y(t) is the empirical distribution function of Y determined by the Y-values in Table 3 and

Prob(F_1(t) < F_Y(t) < F_2(t) for every t) >= 0.95

Draw a sketch of the three functions F_1(t),Fhat_Y(t),F_2(t) on the same graph.

(Hint: See Section 11.5, pp. 526-528, in the text.)
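As a sketch of the usual construction (to be checked against Section 11.5), the band has constant vertical width around the empirical distribution function and is truncated to [0,1]:

```latex
F_1(t) = \max\bigl\{\hat F_Y(t) - d_{\alpha},\, 0\bigr\}, \qquad
F_2(t) = \min\bigl\{\hat F_Y(t) + d_{\alpha},\, 1\bigr\},
\qquad
\Pr\Bigl(\sup_t \bigl|\hat F_Y(t) - F_Y(t)\bigr| \le d_{\alpha}\Bigr) \ge 1 - \alpha
```

Here d_α is the upper critical value of the one-sample Kolmogorov statistic for n = 16 and α = 0.05. In large samples d_{0.05} ≈ 1.36/sqrt(n), about 0.34 for n = 16; the exact small-sample value from the text's table should be preferred. Both F_1 and F_2 are step functions that jump at the same Y-values as Fhat_Y.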

6. A series of tests emphasizing dexterity in high places was carried out on cats, rats, and rabbits. The times taken in seconds to complete each of 14 different tasks are given in Table 4.

    Table 4: Times taken to do 14 different tasks for animals from three species

 Task:   1    2    3    4    5    6    7    8    9    10   11   12   13   14
 ----------------------------------------------------------------------------
 Cats    0.3  1.0  3.6  0.1  0.6  5.5  1.0  3.7  3.1  1.1  2.0  1.6  4.3  1.0
 Rats    1.5  1.1  1.8  1.3  4.3  2.0  8.4  3.7  6.6  1.1  4.0  6.5  2.6  6.5
 Rabbits 1.7  1.5  8.1  1.3  4.3  4.6  4.0  3.7  5.1  2.5  6.0  6.9  2.5  6.8

(a) Is there significant variation in task times among the three species? Carry out the Friedman test to find out. Find the P-value using the large-sample chi-square approximation to the distribution of the Friedman statistic. Don't forget to include the tie correction.

(b) Is there a significant trend for increasing task times in the order cats, rats, rabbits, which some people would say is their order in terms of increasing clumsiness? Carry out the Page test to find out. Find one-sided P-values using both (i) Table A.23 in the text and (ii) the large-sample approximation for the P-value based on the Page statistic. How do the two P-values compare? If you cannot find the exact P-value using the table, then find upper and lower bounds.
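For the large-sample part of (b), the Page statistic and its usual null moments (no-ties form) are sketched below, with the species ordered cats (j=1), rats (j=2), rabbits (j=3) and R_j the rank sum of species j over the n tasks:

```latex
L = \sum_{j=1}^{k} j\,R_j, \qquad
E_0(L) = \frac{n k (k+1)^2}{4}, \qquad
\operatorname{Var}_0(L) = \frac{n k^2 (k+1)(k^2-1)}{144}, \qquad
L^* = \frac{L - E_0(L)}{\sqrt{\operatorname{Var}_0(L)}}
```

The one-sided P-value is then approximately 1 - Φ(L*). For n = 14 tasks and k = 3 species, E_0(L) = 14·3·16/4 = 168 and Var_0(L) = 14·9·4·8/144 = 28. This variance ignores ties, and Table 4 has within-task ties (tasks 4, 5, 8, and 10), so any tie adjustment in the text should be checked.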
