Math 408 Homework 4 - Spring 2009

HOMEWORK #4 due Tuesday April 7

Text references are to Hollander and Wolfe,
``Nonparametric Statistical Methods'', 2nd ed.

NOTES:
(1) Whenever you are asked to test a hypothesis, state the P-value, whether the P-value is for a one-sided or two-sided test if appropriate (that is, if the statistic has a large-sample normal approximation), and whether you accept or reject H_0.

(2) If you use MATLAB to do a problem, include a hard copy of your MATLAB output AND your MATLAB program in an APPENDIX to your homework. That is, do not mix the answers to the questions with your computer output. That way, for problems in which you used MATLAB, your answers become an ``executive summary'' that gives your conclusions, and interested readers can consult your MATLAB code and output for more detail, or to see what went wrong if you get a wrong answer.

(3) In the following, ^ means superscript, _ (underscore) means subscript, and Sum(i=1,9) means the sum for i=1 to 9.

1. A local agricultural company is interested in selling one or more of four new types of lamb chow (food) that were developed in the company's research division. Weight gains for yearling lambs on the four new lamb chows (labeled Ch01, Ch02, Ch03, Ch04) and on a standard lamb chow (Chstd) are given in Table 1.

     Table 1: Lamb weight gains for five lamb chows
    -------------------------------------------------------
    Chstd:   58   68   28   14  150   98  138   78  124   84 
    Ch01:   148  176   90   52  132   32  128   32           
    Ch02:   168  218  158  238   72  100  192                
    Ch03:    44  206  132   12  108  148  156  182   68   70 
    Ch04:    92  150  124  136  180  128  132  216  168  220 
    -------------------------------------------------------

The distributions of weight gains in the five groups in Table 1 differ significantly (Kruskal-Wallis test, P=0.018, large-sample chi-square approximation, 4 degrees of freedom).

(i) NOT allowing for multiple comparisons, use the Wilcoxon rank-sum test to compare each PAIR of chows in Table 1. Which pairs of chows are significantly different? Which pairs are highly significantly different? What are the (two-sided) P-values for the pairs that are significantly different?
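A minimal Python sketch of the pairwise comparisons in part (i), assuming the large-sample normal approximation with midranks and the usual tie-corrected null variance of the rank-sum statistic W, Var_0(W) = (mn/12)[(N+1) - Sum(t^3 - t)/(N(N-1))]. The function names are hypothetical, only the standard library is used, and exact small-sample P-values are not computed.

```python
import math
from itertools import combinations


def rank_sum_z(x, y):
    """Standardized Wilcoxon rank-sum statistic: W is the sum of the midranks of
    the y values in the joint ranking of x and y, centered at its null mean and
    divided by its tie-corrected null standard deviation."""
    pooled = sorted(list(x) + list(y))
    m, n, N = len(x), len(y), len(pooled)
    # first and last 1-based positions of each distinct value in the pooled sample
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    W = sum((first[v] + last[v]) / 2 for v in y)  # midrank of each y value
    # sum of (t^3 - t) over groups of tied values (t = group size)
    ties = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    mean = n * (N + 1) / 2
    var = m * n / 12 * ((N + 1) - ties / (N * (N - 1)))
    return (W - mean) / math.sqrt(var)


def two_sided_p(z):
    """Large-sample two-sided P-value 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def all_pairs(groups):
    """Unadjusted two-sided P-values for every pair in a dict {name: observations}."""
    return {(a, b): two_sided_p(rank_sum_z(groups[a], groups[b]))
            for a, b in combinations(groups, 2)}
```

To apply it, pass a dict whose keys are the chow labels and whose values are the rows of Table 1 as lists (Ch01 and Ch02 have fewer observations than the other groups).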

(ii) Which pairs of chows are significantly different in Table 1, ALLOWING FOR multiple comparisons? Which are highly significantly different? Use the all-pairs multiple-comparison procedure based on pairwise Wilcoxon rank-sum statistics that is discussed in OneWayMultComp.m on the Math408 Web site.
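One standard all-pairs procedure built on pairwise rank sums is the Dwass-Steel-Critchlow-Fligner method: rank each pair of samples jointly, multiply the standardized rank sum by sqrt(2), and refer its absolute value to the range of k independent standard normals (the studentized range with infinite degrees of freedom). The following Python sketch assumes this is the procedure intended in OneWayMultComp.m, which we have not checked; the function names are hypothetical, and the range distribution is computed by Simpson's rule rather than by a library call.

```python
import math
from itertools import combinations


def rank_sum_z(x, y):
    """Standardized (tie-corrected) Wilcoxon rank-sum statistic for y vs x."""
    pooled = sorted(list(x) + list(y))
    m, n, N = len(x), len(y), len(pooled)
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    W = sum((first[v] + last[v]) / 2 for v in y)  # midranks of the y values
    ties = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    var = m * n / 12 * ((N + 1) - ties / (N * (N - 1)))
    return (W - n * (N + 1) / 2) / math.sqrt(var)


def range_sf(q, k, lo=-8.0, hi=8.0, steps=2000):
    """P(R >= q), where R is the range of k iid N(0,1) variables, using
    P(R <= q) = k * integral of phi(z) * [Phi(z+q) - Phi(z)]^(k-1) dz (Simpson's rule)."""
    def phi(z):
        return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

    def Phi(z):
        return 0.5 * math.erfc(-z / math.sqrt(2))

    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * phi(z) * (Phi(z + q) - Phi(z)) ** (k - 1)
    return min(1.0, max(0.0, 1.0 - k * total * h / 3))


def dscf_pvalues(groups):
    """Multiplicity-adjusted P-values for all pairs in a dict {name: observations}."""
    k = len(groups)
    out = {}
    for a, b in combinations(groups, 2):
        w_star = math.sqrt(2) * abs(rank_sum_z(groups[a], groups[b]))
        out[(a, b)] = range_sf(w_star, k)
    return out
```

With k = 2 groups the adjusted P-value reduces to the ordinary two-sided rank-sum P-value, which is a useful check; for Table 1, k = 5.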

(iii) Now, using the standard diet (Chstd) as a control, find multiple-comparison-corrected one-sided P-values for testing whether weight gains for Ch01, Ch02, Ch03, and Ch04 are GREATER than those for Chstd. Which are significantly greater, correcting for multiple comparisons with a control? Which are highly significantly greater? Use the pairwise-Wilcoxon-rank-sum method of Steel (1959) discussed in Comment 76 at the end of Section 6.7 in the text. (See also the discussion in OneWayMCCtrl.m on the Math408 Web site.)

On the basis of these multiple-comparison-corrected P-values, which (if any) of the lamb chows are significantly better than the standard Chstd? What are their P-values?
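A large-sample Python sketch in the spirit of Steel's many-one procedure: each new chow is ranked jointly with the control only, and the adjusted one-sided P-value for chow j is P(max_i Z_i >= z_j), where the Z_i are correlated standard normals. The correlation used here, sqrt(n_i n_j / ((n_0+n_i+1)(n_0+n_j+1))) between the rank sums for treatments i and j, is our own derivation for untied data (it tends to 1/2 for equal sample sizes). It is an assumption and may not match the exact or tabled values in the text or in OneWayMCCtrl.m. All function names are hypothetical.

```python
import math


def rank_sum_z(x, y):
    """Standardized (tie-corrected) rank-sum statistic; positive when y ranks above x."""
    pooled = sorted(list(x) + list(y))
    m, n, N = len(x), len(y), len(pooled)
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    W = sum((first[v] + last[v]) / 2 for v in y)
    ties = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    var = m * n / 12 * ((N + 1) - ties / (N * (N - 1)))
    return (W - n * (N + 1) / 2) / math.sqrt(var)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))


def max_normal_cdf(z, lams, lo=-8.0, hi=8.0, steps=2000):
    """P(max_j Z_j < z) for Z_j = lam_j*V + sqrt(1 - lam_j^2)*E_j with V, E_j iid N(0,1),
    so corr(Z_i, Z_j) = lam_i*lam_j; integrates over V by Simpson's rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        prod = 1.0
        for lam in lams:
            prod *= Phi((z - lam * v) / math.sqrt(1 - lam * lam))
        total += weight * math.exp(-v * v / 2) / math.sqrt(2 * math.pi) * prod
    return total * h / 3


def steel_many_one(control, treatments):
    """{name: (unadjusted one-sided P, adjusted one-sided P)} for 'treatment > control'."""
    n0 = len(control)
    # assumed (untied) null correlation structure: lam_j = sqrt(n_j / (n0 + n_j + 1))
    lams = [math.sqrt(len(t) / (n0 + len(t) + 1)) for t in treatments.values()]
    out = {}
    for name, t in treatments.items():
        z = rank_sum_z(control, t)
        out[name] = (0.5 * math.erfc(z / math.sqrt(2)), 1.0 - max_normal_cdf(z, lams))
    return out
```

With a single treatment the adjusted and unadjusted P-values coincide; adding treatments can only increase the adjusted ones.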

2. Table 8.6 in the text (page 380) has the following data:

    Table 2: Spending per High-School Senior by State in 1987-1988
  -----------------------------------
    4462   4164   3093   4789   3919   4457   5201   4369   2718   5329
    3368   5051   3249   4149   3623   4989   3068   4246   7151   3786
    6230   5207   4386   4747   3138   5017   3943   3434   2548   4092
    2454   3744   3408   4692   3011   4246   3608   3998   3519   4076
    2667   3858   6564   7971   5471   4124   3691   3794   2989   3840

Use the data from Table 2 to find a 95% confidence band for the true distribution function F_Y(t)=P(Y<=t). That is, find nondecreasing functions F_1(t),F_2(t) such that

0 <= F_1(t) <= Fhat_Y(t) <= F_2(t) <= 1

where Fhat_Y(t) is the empirical distribution function of Y determined by the values in Table 2 and

Prob(F_1(t) < F_Y(t) < F_2(t) for every t) >= 0.95

Draw a sketch of the three functions F_1(t),Fhat_Y(t),F_2(t) on the same graph.

(Hint: See Section 11.5, pp. 526-528, in the text.)
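A minimal Python sketch of a Kolmogorov-type band, which is what we assume Section 11.5 describes: F_1(t) = max(Fhat_Y(t) - d, 0) and F_2(t) = min(Fhat_Y(t) + d, 1), where d is the upper 5% point of the Kolmogorov statistic D_n. For simplicity d comes from the large-sample approximation P(sqrt(n) D_n > x) -> 2 Sum(k=1,infinity) (-1)^(k-1) exp(-2k^2 x^2); for n = 50 the exact tabled critical value can differ slightly, so treat this as an approximation. The function names are hypothetical.

```python
import math


def kolmogorov_sf(x, terms=100):
    """Asymptotic P(sqrt(n) * D_n > x) = 2 * Sum(k>=1) (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x <= 0:
        return 1.0
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x) for k in range(1, terms + 1))


def kolmogorov_crit(alpha, lo=0.5, hi=3.0):
    """Solve kolmogorov_sf(x) = alpha by bisection (kolmogorov_sf decreases in x)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if kolmogorov_sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_band(data, alpha=0.05):
    """Rows (t, F_1(t), Fhat(t), F_2(t)) at each distinct data value t.
    All three are step functions, constant between data values; for t below the
    smallest value, F_1 = Fhat = 0 and F_2 = min(d, 1)."""
    n = len(data)
    d = kolmogorov_crit(alpha) / math.sqrt(n)
    band = []
    for t in sorted(set(data)):
        fhat = sum(v <= t for v in data) / n
        band.append((t, max(fhat - d, 0.0), fhat, min(fhat + d, 1.0)))
    return band
```

Passing the 50 values of Table 2 as `data` gives the heights of the three step functions at each jump, which is all that is needed for the sketch.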

3. A series of tests emphasizing dexterity in high places was carried out on cats, rats, and rabbits. The times taken in seconds to complete each of 14 different tasks are given in Table 3.

    Table 3: Times taken to do 14 different tasks for animals from three species

 Task:   1    2    3    4    5    6    7    8    9    10   11   12   13   14
 ----------------------------------------------------------------------------
 Cats    0.3  1.0  3.6  0.1  0.6  5.5  1.0  3.7  3.1  1.1  2.0  1.6  4.3  1.0
 Rats    1.5  1.1  1.8  1.3  4.3  2.0  8.4  3.7  6.6  1.1  4.0  6.5  2.6  6.5
 Rabbits 1.7  1.5  8.1  1.3  4.3  4.6  4.0  3.7  5.1  2.5  6.0  6.9  2.5  6.8

(a) Is there significant variation in task times among the three species? Carry out the Friedman test to find out. Find the P-value using the large-sample chi-square approximation to the distribution of the Friedman statistic. Don't forget to include the tie correction.
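A Python sketch of the Friedman statistic with midranks within each block and the tie-corrected denominator, S' = 12 Sum(j=1,k) (R_j - n(k+1)/2)^2 / [nk(k+1) - (1/(k-1)) Sum(i=1,n) Sum(t^3 - t)], where the inner sum runs over groups of tied values in block i. We believe this matches the text's S', but check it against Chapter 7. The chi-square tail is computed with a standard incomplete-gamma recursion so that only the Python standard library is needed; the function names are hypothetical.

```python
import math


def row_ranks(row):
    """Midranks within one block, plus the block's Sum(t^3 - t) over tie groups."""
    s = sorted(row)
    first, last = {}, {}
    for pos, v in enumerate(s, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    ranks = [(first[v] + last[v]) / 2 for v in row]
    ties = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    return ranks, ties


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1, using
    Q(a+1, y) = Q(a, y) + y^a e^(-y) / Gamma(a+1) with y = x/2."""
    y = x / 2
    a, q = (0.5, math.erfc(math.sqrt(y))) if df % 2 else (1.0, math.exp(-y))
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def friedman(blocks, tie_correction=True):
    """blocks[i][j] = response to treatment j in block i.
    Returns (S' if tie_correction else S, chi-square P-value with k-1 df)."""
    n, k = len(blocks), len(blocks[0])
    R = [0.0] * k
    tie_total = 0
    for row in blocks:
        ranks, ties = row_ranks(row)
        R = [r + rk for r, rk in zip(R, ranks)]
        tie_total += ties
    num = 12 * sum((r - n * (k + 1) / 2) ** 2 for r in R)
    den = n * k * (k + 1) - (tie_total / (k - 1) if tie_correction else 0)
    stat = num / den
    return stat, chi2_sf(stat, k - 1)
```

For Table 3 the tasks are the blocks, so pass `list(zip(cats, rats, rabbits))` with the three rows of the table as lists; k = 3 gives 2 degrees of freedom.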

(b) Is there a significant trend toward increasing task times in the order cats, rats, rabbits, which some people would say is their order of increasing clumsiness? Carry out the Page test to find out. Find one-sided P-values using both (i) Table A.23 in the text and (ii) the large-sample approximation for the P-value based on the Page statistic. How do the two P-values compare? If you cannot find the exact P-value using the table, then find upper and lower bounds.
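A Python sketch of the large-sample Page test, assuming L = Sum(j=1,k) j R_j with the columns in the hypothesized order (cats, rats, rabbits), E_0(L) = nk(k+1)^2/4 and Var_0(L) = nk^2(k+1)^2(k-1)/144 for untied data. The optional tie correction (the permutation variance given the midranks in each block) is our own addition and may not be what the text uses; the function names are hypothetical.

```python
import math


def row_ranks(row):
    """Midranks within one block, plus the block's Sum(t^3 - t) over tie groups."""
    s = sorted(row)
    first, last = {}, {}
    for pos, v in enumerate(s, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    ranks = [(first[v] + last[v]) / 2 for v in row]
    ties = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    return ranks, ties


def page_test(blocks, tie_correction=False):
    """blocks[i][j] = response to treatment j in block i, columns in the hypothesized
    increasing order. Returns (L, standardized L*, one-sided upper-tail P-value)."""
    n, k = len(blocks), len(blocks[0])
    R = [0.0] * k
    tie_total = 0
    for row in blocks:
        ranks, ties = row_ranks(row)
        R = [r + rk for r, rk in zip(R, ranks)]
        tie_total += ties
    L = sum((j + 1) * r for j, r in enumerate(R))
    mean = n * k * (k + 1) ** 2 / 4
    var = n * k ** 2 * (k + 1) ** 2 * (k - 1) / 144
    if tie_correction:
        # assumption: each block's variance is scaled by 1 - Sum(t^3 - t)/(k^3 - k)
        var -= k * (k + 1) * tie_total / 144
    z = (L - mean) / math.sqrt(var)
    return L, z, 0.5 * math.erfc(z / math.sqrt(2))
```

For Table 3, pass `list(zip(cats, rats, rabbits))` so that the columns follow the hypothesized order.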

4. In a test of perceptions of color, a picture with ambiguous colors was shown to 12 subjects, who were asked whether they saw various colors in the picture. None of the 12 subjects showed evidence of red-green or blue color blindness. The results were scored as 1 for Yes and 0 for No. The picture was designed to have attributes of all six colors. The results are in Table 4 below.

    Table 4: Perceptions of colors by 12 subjects

  Subject:   1   2   3   4   5   6   7   8   9  10  11  12
  --------------------------------------------------------
  Red:       1   1   1   0   0   1   1   1   1   0   1   1
  Green:     1   0   0   0   1   0   1   0   0   0   0   1
  Blue:      1   1   1   1   0   1   1   0   1   0   1   1
  Yellow:    0   0   0   1   0   0   1   0   0   0   0   1
  Pink:      1   1   1   1   0   1   1   1   1   0   0   0
  Orange:    1   0   1   0   0   1   1   0   1   0   1   1

Do some colors tend to stand out more to these subjects than other colors, controlling for subject effects?

(i) Carry out the Cochran test to find out. (See the comments in Cochran.m on the Math408 Web site for a discussion of Cochran's test statistic Q. Use the large-sample approximation for Q.) Recall that Cochran's test statistic is exactly the same as Friedman's test statistic S' with tie correction for 0,1 data.

(ii) Carry out Friedman's test WITHOUT the tie correction, i.e., using the statistic S in equation (7.5) on page 273 in the text instead of S'. Recall that S'=Q in part (i). How do the P-values compare? If the two approximate P-values differ substantially, which do you think is more reliable?
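A Python sketch comparing the two statistics on 0-1 data, using Q = (k-1)[k Sum(j) C_j^2 - N^2] / [kN - Sum(i) R_i^2], where C_j are the color (treatment) totals, R_i the subject (block) totals and N the grand total. Subjects are blocks and colors are treatments, so Table 4 must be transposed first. The Friedman function uses midranks within each subject; all function names are hypothetical and only the standard library is used.

```python
import math


def row_ranks(row):
    """Midranks within one block, plus the block's Sum(t^3 - t) over tie groups."""
    s = sorted(row)
    first, last = {}, {}
    for pos, v in enumerate(s, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    ranks = [(first[v] + last[v]) / 2 for v in row]
    ties = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    return ranks, ties


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (incomplete-gamma recursion)."""
    y = x / 2
    a, q = (0.5, math.erfc(math.sqrt(y))) if df % 2 else (1.0, math.exp(-y))
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def cochran_q(table):
    """Cochran's Q for table[i][j] in {0, 1}: subject i, treatment j."""
    k = len(table[0])
    C = [sum(row[j] for row in table) for j in range(k)]  # treatment totals
    R = [sum(row) for row in table]                        # subject totals
    N = sum(R)
    return (k - 1) * (k * sum(c * c for c in C) - N * N) / (k * N - sum(r * r for r in R))


def friedman_stat(blocks, tie_correction=True):
    """Friedman S' (tie-corrected) or S (uncorrected) with blocks as rows."""
    n, k = len(blocks), len(blocks[0])
    R = [0.0] * k
    tie_total = 0
    for row in blocks:
        ranks, ties = row_ranks(row)
        R = [r + rk for r, rk in zip(R, ranks)]
        tie_total += ties
    num = 12 * sum((r - n * (k + 1) / 2) ** 2 for r in R)
    den = n * k * (k + 1) - (tie_total / (k - 1) if tie_correction else 0)
    return num / den
```

For Table 4, build `table = [list(r) for r in zip(red, green, blue, yellow, pink, orange)]` and compare `chi2_sf(cochran_q(table), 5)` with `chi2_sf(friedman_stat(table, tie_correction=False), 5)`. Because 0-1 data always contain ties when k >= 3, S is smaller than S', so its P-value is larger.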

5. In a study of pollution in Lake Michigan, the number of ``odor periods'' was recorded for each of the years 1950-1964. The counts are given in Table 5.

    Table 5: Numbers of bad periods in Lake Michigan (1950-1964)
  ----------------------------------------------------------------
  (1950, 10)  (1951, 20)  (1952, 17)  (1953, 16)  (1954, 12)
  (1955, 15)  (1956, 13)  (1957, 18)  (1958, 17)  (1959, 19)
  (1960, 21)  (1961, 23)  (1962, 23)  (1963, 28)  (1964, 28)

(a) Find the Kendall correlation coefficient tau for year versus the number of bad periods. Is it larger or smaller than the Pearson correlation coefficient rho? What is the value of rho? Show your calculations (or write a MATLAB program).
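A short Python sketch for part (a), assuming the text's definition tau_hat = 2K/(n(n-1)), where K is the number of concordant pairs minus the number of discordant pairs and pairs tied in either variable contribute 0 (with ties this is smaller in magnitude than the tau-b variant that some software reports). The Pearson coefficient is the usual sample correlation. The function names are hypothetical.

```python
import math
from itertools import combinations


def kendall_K(x, y):
    """K = Sum over pairs i<j of sign((x_j - x_i) * (y_j - y_i)); tied pairs add 0."""
    K = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[j] - x[i]) * (y[j] - y[i])
        K += (prod > 0) - (prod < 0)
    return K


def kendall_tau(x, y):
    n = len(x)
    return 2 * kendall_K(x, y) / (n * (n - 1))


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Call both functions with `years = list(range(1950, 1965))` and the 15 counts from Table 5 listed in year order.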

(b) Is the number of bad periods increasing over time? Carry out the Kendall test (Section 8.1, p. 363 in the text) for an increasing relation between year and number of bad periods to find out. Find two-sided P-values using both (i) Table A.30 in the text and (ii) the large-sample approximation with tie corrections. How do the two P-values compare? (Note: You may only be able to bracket the P-value using Table A.30, for example 0.01<P<0.05 or P<0.001. Do the best that you can.)

(Remark: The data are from Table 8.8 on page 381 in the text. See Problem 19 on page 381 for more background on this data set.)
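A Python sketch of the large-sample Kendall test with tie corrections for problem 5(b), assuming the standard null variance of K, Var_0(K) = [n(n-1)(2n+5) - Sum t(t-1)(2t+5) - Sum u(u-1)(2u+5)]/18 + [Sum t(t-1)(t-2)][Sum u(u-1)(u-2)]/[9n(n-1)(n-2)] + [Sum t(t-1)][Sum u(u-1)]/[2n(n-1)], where t and u run over the sizes of tied groups in X and Y. We assume this is the tie correction the text intends; the function names are hypothetical.

```python
import math
from itertools import combinations


def kendall_K(x, y):
    """Concordant minus discordant pairs; pairs tied in either variable add 0."""
    K = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[j] - x[i]) * (y[j] - y[i])
        K += (prod > 0) - (prod < 0)
    return K


def tie_sizes(values):
    """Sizes of the groups of tied values (groups of size 1 are omitted)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [t for t in counts.values() if t > 1]


def kendall_test(x, y):
    """Returns (K, K*, two-sided P, one-sided upper-tail P); requires n >= 3."""
    n = len(x)
    K = kendall_K(x, y)
    t, u = tie_sizes(x), tie_sizes(y)
    var = (n * (n - 1) * (2 * n + 5)
           - sum(a * (a - 1) * (2 * a + 5) for a in t)
           - sum(b * (b - 1) * (2 * b + 5) for b in u)) / 18
    var += (sum(a * (a - 1) * (a - 2) for a in t) * sum(b * (b - 1) * (b - 2) for b in u)
            / (9 * n * (n - 1) * (n - 2)))
    var += sum(a * (a - 1) for a in t) * sum(b * (b - 1) for b in u) / (2 * n * (n - 1))
    z = K / math.sqrt(var)
    return K, z, math.erfc(abs(z) / math.sqrt(2)), 0.5 * math.erfc(z / math.sqrt(2))
```

The years in Table 5 have no ties, so only the tied counts (the repeated 17, 23 and 28) change the variance, and the last two cross-product terms vanish.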
