Math 408 Homework 3

Text references are to Hollander and Wolfe, ``Nonparametric Statistical Methods'', 2nd ed.

NOTES:
    (1)  Whenever you are asked to test a hypothesis, state the P-value, whether the P-value is for a one-sided or two-sided test if appropriate (that is, if the statistic has a large-sample normal approximation), and whether you accept or reject H_0.

    (2)  If you use MATLAB to do a problem, include (hard copy of) your MATLAB output AND your MATLAB program in an APPENDIX to your homework. That is, do not mix together the answers to the questions and your computer output. In that way, for problems in which you used MATLAB, your answers become an ``executive summary'' that gives your conclusions, and interested parties can then look or not look at your actual MATLAB code and output to get more information or to see what happened if you get a wrong answer.

    (3)  In the following, ^ means superscript, _ (underscore) means subscript, and Sum(i=1,9) means the sum for i=1 to 9.

 

1.  Soybean plants were grown in 32 pots located on 4 different heavy laboratory tables. Each table (group) of soybean plants was given a different amount of a particular nutrient. The weights of the soybean plants in grams in the four groups after two weeks are given in Table 1.

    TABLE 1 -- Weights of Soybean Plants after Two Weeks
 ----------------------------------------------------------
 LabTable #1 -   136   96  122   60   40   42   52   20
 LabTable #2 -    74   52  152   76   12  170  128   82
 LabTable #3 -   126  106   94  120   82   84   94  124
 LabTable #4 -   102  168  220  126  196   84  166  140
  

(i)  Is there a significant variation in the sample medians of the soybean weights in the table? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation with tie correction.

(ii)  The experimenter recalls that the soybean plant treatment groups were, in fact, arranged by distance from the window, so that treatment groups with higher LabTable numbers might have received more light. Do the soybean weights in Table 1 either increase monotonically or decrease monotonically with the lab-table number? (That is, test with a different alternative hypothesis than in part (i).) Carry out the Jonckheere-Terpstra test to find out. Use the large-sample approximation, either with or without tie correction as you prefer. Is the test significant?

(iii)  How do the two P-values in parts (i) and (ii) compare? If the P-value in part (i) is significant, should the experimenter rethink his conclusion that different amounts of the nutrient have significantly different effects on the growth of the soybean plants? Why?
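
The tie-corrected Kruskal-Wallis statistic asked for in part (i) can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, using only the standard library. The function names (midranks, chi2_sf, kruskal_wallis) are assumptions made for this sketch, and the data in the last lines are made-up toy values, not Table 1. The statistic is H = 12/(N(N+1)) Sum_j R_j^2/n_j - 3(N+1), divided by 1 - Sum(t^3-t)/(N^3-N) to correct for ties, and compared with a chi-square distribution with k-1 degrees of freedom.

```python
import math

def midranks(values):
    """Ranks 1..N of the pooled values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0  # average of positions i+1..j+1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper tail P(chi-square with df degrees of freedom > x), integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        # Recurrence Q(a+1, z) = Q(a, z) + z^a e^(-z) / Gamma(a+1)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def kruskal_wallis(groups):
    """Return (H, tie-corrected H, large-sample P-value) for a list of samples."""
    pooled = [x for g in groups for x in g]
    N = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum R_j of this sample
        total += r * r / len(g)
        start += len(g)
    H = 12.0 / (N * (N + 1)) * total - 3.0 * (N + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    H_corrected = H / (1.0 - ties / float(N ** 3 - N))
    return H, H_corrected, chi2_sf(H_corrected, len(groups) - 1)

# Made-up toy data (NOT Table 1): two samples of size 2 with ties.
H, H_corrected, p_value = kruskal_wallis([[1, 1], [2, 2]])
print(H, H_corrected, p_value)
```

The sketch assumes that not all observations are equal, since otherwise the tie-correction denominator is zero.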

2.  The following table contains data about the amount of drying during storage of 14 similar items that were prepared for storage using 5 different methods:

   TABLE 2 -- Percentage of Drying After Storage
 -------------------------------------------------
 Method #1 -   7.8   8.3   7.6   8.4   8.3
 Method #2 -   5.4   7.4   7.1
 Method #3 -   8.1   6.4
 Method #4 -   7.9   9.5  10.0
 Method #5 -   7.1
  

(i)  Is there a significant difference among the storage methods with regard to the percentage of drying? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation with tie correction.

(ii)  A curmudgeon might argue that a data set with 14 observations distributed over 5 treatment groups, with only one observation in one of the treatment groups, might not be a good candidate for a large-sample approximation that assumes that all sample sizes are arbitrarily large.

Use n=10,000 random permutations of the data among the 14 places in the 5 treatment groups to estimate the exact Kruskal-Wallis P-value, and give a 95% confidence interval for your estimate of the exact P-value. Does this procedure find that the differences are significant? Are your conclusions different from those in part (i)? (Hint: See the program OneWayLayout.m with output OneWayLayout.txt on the Math408 Web site.)
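
One way the random-permutation estimate could be organized is sketched below in Python (standard library only). It is not the course program OneWayLayout.m, whose details are not reproduced here; the function names and the toy data are assumptions made for this illustration. The sketch uses the fact that, once the pooled midranks are fixed, the Kruskal-Wallis statistic is an increasing function of Sum_j R_j^2/n_j, so permuting the ranks and comparing that sum gives the same P-value as comparing H itself. The confidence interval is the usual normal-approximation interval for a binomial proportion.

```python
import math
import random

def midranks(values):
    """Ranks 1..N of the pooled values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def rank_sum_stat(ranks, sizes):
    """Sum over groups of (rank sum)^2 / n_j, an increasing function of H."""
    total, start = 0.0, 0
    for n in sizes:
        r = sum(ranks[start:start + n])
        total += r * r / n
        start += n
    return total

def permutation_pvalue(groups, n_perm=10000, seed=0):
    """Return (estimated P-value, lower 95% limit, upper 95% limit)."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    ranks = midranks([x for g in groups for x in g])
    observed = rank_sum_stat(ranks, sizes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)  # random reassignment of ranks to the N places
        if rank_sum_stat(ranks, sizes) >= observed - 1e-9:
            hits += 1
    p_hat = hits / n_perm
    half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

# Made-up toy data (NOT Table 2): three samples of size 2.
print(permutation_pvalue([[1, 2], [3, 4], [5, 6]]))
```

For this toy layout the exact P-value is 6/90 = 1/15, since only the 6 assignments that keep {1,2}, {3,4}, {5,6} together reach the observed statistic, so the estimate can be checked by hand.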

3.  (See Table 6.10 p226 in the text, and Problem 32 p225 for more biological detail.) Salmonella colonies were grown under six different concentrations of AcidRed114. For each concentration, the number of mutant clones was counted in each of three colonies (see Table 3). In Table 3, mug stands for micrograms per milliliter and Mg for milligrams per milliliter, so that 1Mg=1000mug. The values at 0mug correspond to the natural state of the organism. The low values at high concentrations of the mutagen may be due to the toxic effects of AcidRed114, so that fewer colonies survive to be mutant or not.

      TABLE 3: Number of Mutant TA98 Salmonella Colonies
       under Exposure to Various Levels of Acid Red 114
 
Dose:    0mug    100mug    333mug     1Mg       3.3Mg      10Mg
--------------------------------------------------------------------
          22       60        98        60         22         23
          23       59        78        82         44         21
          35       54        50        59         33         25
--------------------------------------------------------------------

(i) Find one-sided P-values for the hypothesis that the number of mutant colonies increases monotonically up to 333mug and then decreases monotonically for higher concentrations. Use both the exact critical values in Table A.14 (to bracket the P-value) and the large-sample normal approximation.

(ii) Carry out the same procedures for a maximum at 1Mg (per milliliter).

(Hints: First find the 6x6 table of Mann-Whitney differences among the 6 samples and calculate A_3 and A_4 from the table. Note that the critical values in Table A.14 are the same for p and k+1-p. For example, p=1 and p=k have the same critical values as the Jonckheere-Terpstra test.)
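
The hint above can be sketched in Python (standard library only) as follows. The function names and the toy data are assumptions made for this illustration. Here the Mann-Whitney count for samples u and v is taken to be the number of pairs with the observation in sample u smaller than the observation in sample v (ties counted as 1/2), and A_p adds the counts for u<v up to the peak p and the reversed counts beyond it. The null mean and variance are the no-ties expressions for a known peak, written out from memory rather than copied from the text, so they should be checked against the text before being relied on. For p=k they reduce to the Jonckheere-Terpstra mean and variance.

```python
import math

def mann_whitney_count(x, y):
    """Number of pairs (a in x, b in y) with a < b, counting ties as 1/2."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)

def umbrella_statistic(groups, p):
    """A_p for a known peak at group p (numbered from 1)."""
    k = len(groups)
    # Increasing part: groups 1..p, count pairs where the earlier group is smaller.
    up = sum(mann_whitney_count(groups[u], groups[v])
             for u in range(p) for v in range(u + 1, p))
    # Decreasing part: groups p..k, count pairs where the later group is smaller.
    down = sum(mann_whitney_count(groups[v], groups[u])
               for u in range(p - 1, k) for v in range(u + 1, k))
    return up + down

def umbrella_z(groups, p):
    """Return (A_p, standardized A_p, one-sided upper-tail normal P-value)."""
    n = [len(g) for g in groups]
    N = sum(n)
    n_p = n[p - 1]
    N1, N2 = sum(n[:p]), sum(n[p - 1:])
    # No-ties null mean and variance (assumed forms; verify against the text).
    mean = (N1 ** 2 + N2 ** 2 - sum(m * m for m in n) - n_p ** 2) / 4.0
    var = (2 * (N1 ** 3 + N2 ** 3) + 3 * (N1 ** 2 + N2 ** 2)
           - sum(m * m * (2 * m + 3) for m in n) - n_p ** 2 * (2 * n_p + 3)
           + 12 * n_p * N1 * N2 - 12 * n_p ** 2 * N) / 72.0
    a = umbrella_statistic(groups, p)
    z = (a - mean) / math.sqrt(var)
    return a, z, 0.5 * math.erfc(z / math.sqrt(2.0))

# Made-up toy data (NOT Table 3): three single observations with a peak in the middle.
print(umbrella_z([[1], [3], [2]], 2))
```

In the toy example A_2 = 2 by direct counting, and the null mean 1 and variance 2/3 can be confirmed by listing the three equally likely positions of the middle observation.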

4.  In a study of pollution in Lake Michigan, the number of ``odor periods'' was observed for each of the years 1950-1964. The numbers of odor periods are in Table 4.

    Table 4: Numbers of bad periods in Lake Michigan (1950-1964)
  ----------------------------------------------------------------
  (1950, 10)  (1951, 20)  (1952, 17)  (1953, 16)  (1954, 12)
  (1955, 15)  (1956, 13)  (1957, 18)  (1958, 17)  (1959, 19)
  (1960, 21)  (1961, 23)  (1962, 23)  (1963, 28)  (1964, 28)  

(a)  Find the Kendall correlation coefficient tau for year versus the number of bad periods. Is it larger or smaller than the Pearson correlation coefficient? Show your calculations (or write a MATLAB program).

(b)  Is the number of bad periods increasing over time? Carry out the Kendall test (Section 8.1, p363 in the text) for an increasing relation between year and numbers of bad periods to find out. Find two-sided P-values using both (i) Table A.30 in the text and (ii) the large-sample approximation with tie corrections. How do the two P-values compare? (Note: You may be only able to bracket the P-value using Table A.30, for example 0.01<P<0.05 or P<0.001. Do the best that you can.)

(Remark: The data is from Table 8.8 on page 381 in the text. See Problem 19 on page 381 for more background on this data set.)
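
A minimal Python sketch (standard library only) of the Kendall statistic and its tie-corrected large-sample standardization is given below. The function names and the toy data are assumptions made for this illustration. Here K = Sum over i<j of sign((x_j-x_i)(y_j-y_i)) and tau = 2K/(n(n-1)), and the variance is the usual tie-corrected null variance of K, which reduces to n(n-1)(2n+5)/18 when there are no ties.

```python
import math

def tie_sizes(values):
    """Sizes of the groups of tied values (only groups with more than one member)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [c for c in counts.values() if c > 1]

def kendall(x, y):
    """Return (K, tau, z) for paired samples x and y of the same length."""
    n = len(x)
    K = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            K += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    tau = 2.0 * K / (n * (n - 1))
    t, u = tie_sizes(x), tie_sizes(y)
    var = (n * (n - 1) * (2 * n + 5)
           - sum(s * (s - 1) * (2 * s + 5) for s in t)
           - sum(s * (s - 1) * (2 * s + 5) for s in u)) / 18.0
    if n > 2:
        var += (sum(s * (s - 1) * (s - 2) for s in t)
                * sum(s * (s - 1) * (s - 2) for s in u)) / (9.0 * n * (n - 1) * (n - 2))
    var += (sum(s * (s - 1) for s in t)
            * sum(s * (s - 1) for s in u)) / (2.0 * n * (n - 1))
    return K, tau, K / math.sqrt(var)

def normal_sf(z):
    """Upper tail of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Made-up toy data (NOT Table 4): three years with one tie in the counts.
K, tau, z = kendall([1, 2, 3], [1, 1, 2])
print(K, tau, z, 2.0 * normal_sf(abs(z)))  # last value is the two-sided P-value
```

In the toy example K = 2, and the null variance 8/3 can be confirmed by listing the three equally likely arrangements of the values 1, 1, 2.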

5.  In a study concerned with the anatomical and pathological status of the corticospinal and somatosensory tracts and parietal lobes of patients who had cerebral palsy, the investigator was interested in the relationship between brain weights and large fiber (>7.5mu in diameter) counts in the medullary pyramid. The following table gives the mean brain weights (in grams) and medullary pyramid large fiber counts for 11 cerebral palsy subjects.

    Table 5: Mean Brain Weights and Medullary Pyramid Large Fiber Counts for Cerebral Palsy Subjects
  ---------------------------------------------------------------------------------------------------
  Subject Number     Brain Weight (g)     Pyramidal Large Fiber Count
  ---------------------------------------------------------------------------------------------------
         1                515                    32,500
         2                286                    26,800
         3                469                    11,410
         4                410                    14,850
         5                461                    23,640
         6                436                    23,820
         7                479                    29,840
         8                198                    21,830
         9                389                    24,650
        10                262                    22,500
        11                536                    26,000
  ---------------------------------------------------------------------------------------------------

(a)  Test the hypothesis of independence versus the general alternative that brain weight and large fiber count in the medullary pyramid are correlated in subjects who have had cerebral palsy.

(b)  Use the bootstrap method to find a confidence interval for Kendall’s tau with approximate confidence level 0.9. Does it agree with your findings in (a)? (Refer to the sample program NonParmCorr.m on the Math408 Web site for MATLAB code for the bootstrap.)
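
A percentile bootstrap interval for Kendall's tau could be sketched in Python (standard library only) as follows. This is an illustration under stated assumptions: the percentile method used here may differ from what NonParmCorr.m does, the function names are hypothetical, and the data are made-up toy pairs rather than Table 5. Each bootstrap replicate resamples the n (x, y) pairs with replacement and recomputes tau; the interval runs between the empirical 5% and 95% points of the replicates.

```python
import random

def kendall_tau(pairs):
    """tau = 2K/(n(n-1)) with K = sum over i<j of sign((x_j-x_i)*(y_j-y_i))."""
    n = len(pairs)
    K = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[j][0] - pairs[i][0]) * (pairs[j][1] - pairs[i][1])
            K += (prod > 0) - (prod < 0)
    return 2.0 * K / (n * (n - 1))

def bootstrap_tau_interval(pairs, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap interval for tau at the given confidence level."""
    rng = random.Random(seed)
    n = len(pairs)
    taus = []
    for _ in range(n_boot):
        # Resample whole (x, y) pairs with replacement.
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        taus.append(kendall_tau(resample))
    taus.sort()
    alpha = (1.0 - level) / 2.0
    lo = taus[round(alpha * n_boot)]
    hi = taus[round((1.0 - alpha) * n_boot) - 1]
    return lo, hi

# Made-up toy data (NOT Table 5): ten perfectly ordered pairs.
toy_pairs = [(i, i) for i in range(10)]
print(kendall_tau(toy_pairs), bootstrap_tau_interval(toy_pairs))
```

Because a resample usually contains repeated pairs, which contribute 0 to K, the bootstrap values of tau tend to be somewhat smaller in absolute value than the tau of the original sample; this is a property of the resampling scheme to keep in mind when comparing the interval with part (a).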