Math 408 Homework 3 - Spring 2009


    HOMEWORK #3 due Thursday March 19

    Text references are to Hollander and Wolfe,
      ``Nonparametric Statistical Methods'', 2nd ed.

    NOTES:
        (1)  Whenever you are asked to test a hypothesis, state the P-value, whether the P-value is for a one-sided or two-sided test if appropriate (that is, if the statistic has a large-sample normal approximation), and whether you accept or reject H_0.

        (2)  If you use MATLAB to do a problem, include (hard copy of) your MATLAB output AND your MATLAB program in an APPENDIX to your homework. That is, do not mix together the answers to the questions and your computer output. In that way, for problems in which you used MATLAB, your answers become an ``executive summary'' that gives your conclusions, and interested parties can then look or not look at your actual MATLAB code and output to get more information or to see what happened if you get a wrong answer.

        (3)  In the following, ^ means superscript, _ (underscore) means subscript, and Sum(i=1,9) means the sum for i=1 to 9.

    
    

    1.  Consider the data in Table 1:

        Table 1: Two samples of numbers
      -----------------------------------
      Sample 1 (m=8)
        3.40   3.94  6.30  5.85   3.75   9.19  9.20   6.99
      Sample 2 (n=16)
        5.83  10.55  9.30  7.07   6.13  11.73  6.47  15.47
       11.49  13.69  8.27  5.02  10.20  13.08  9.13   7.39   
       
    Do parts (i) and (ii) by hand (or using a spreadsheet)
    (i)  (1/8) Is there a significant difference in location or sample median between the two samples? Use the Wilcoxon rank-sum test to find out. What is the (two-sided) P-value? Use either the tables in the back of the book or the large-sample approximation, as you prefer.
    (ii)  (3/8) Do the two samples in Table 1 come from the same probability distribution? Apply the Kolmogorov-Smirnov test to find out. What is the (two-sided) P-value? Find the P-value using both the exact tables in the back of the book and using the large-sample approximation. Are these two P-values similar? How do these P-values compare with the P-value that you obtained in part (i)?
    (Hint: The results in parts (i,ii) are not unusual for samples that differ principally by a sample mean or median.)
    (iii)  (1/2) Write a computer program to estimate the exact two-sided Kolmogorov-Smirnov P-value for the data in Table 1 using 10,000 random permutations. Also, find the 95% confidence interval for the true P-value. Does the 95% confidence interval contain the exact P-value that you found in part (ii)? Does it contain the large-sample approximate P-value?
    (Hint: See the program KolmSmir2.m with output KolmSmir2.txt on the Math408 Web site. If you like, you can also do parts (i) and (ii) in your MATLAB program as a check on what you did by hand in parts (i) and (ii).)
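
    The permutation procedure in part (iii) can be sketched as follows. This is a minimal Python illustration, not the course program KolmSmir2.m: the function names (ks_statistic, kolmogorov_tail, ks_permutation_pvalue) and the fixed random seed are assumptions made for the example, and the 95% confidence interval uses the usual normal approximation for a binomial proportion.

```python
import math
import random

def ks_statistic(x, y):
    """Two-sample KS statistic D = max |F_m(t) - G_n(t)| over the pooled data."""
    m, n = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    cx = cy = 0
    d = 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        # Step past every observation equal to t before comparing the two EDFs.
        while i < len(pooled) and pooled[i][0] == t:
            if pooled[i][1] == 0:
                cx += 1
            else:
                cy += 1
            i += 1
        d = max(d, abs(cx / m - cy / n))
    return d

def kolmogorov_tail(lam, terms=100):
    """Large-sample two-sided tail 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam <= 0:
        return 1.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

def ks_permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Estimate the exact P-value by randomly reassigning the pooled data."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    d_obs = ks_statistic(x, y)
    pooled = list(x) + list(y)
    m = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point noise in the comparison.
        if ks_statistic(pooled[:m], pooled[m:]) >= d_obs - 1e-12:
            hits += 1
    p_hat = hits / n_perm
    half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

# Data from Table 1.
sample1 = [3.40, 3.94, 6.30, 5.85, 3.75, 9.19, 9.20, 6.99]
sample2 = [5.83, 10.55, 9.30, 7.07, 6.13, 11.73, 6.47, 15.47,
           11.49, 13.69, 8.27, 5.02, 10.20, 13.08, 9.13, 7.39]

d_obs = ks_statistic(sample1, sample2)
m, n = len(sample1), len(sample2)
lam = d_obs * math.sqrt(m * n / (m + n))
print("D =", d_obs)
print("large-sample two-sided P (approx):", kolmogorov_tail(lam))
print("permutation estimate, 95% CI:", ks_permutation_pvalue(sample1, sample2))
```

    The permutation estimate changes slightly with the seed, which is why the confidence interval is reported alongside it.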
    
    

    2.  Soybean plants were grown in 32 pots located on 4 different heavy laboratory tables. Each table (group) of soybean plants was given a different amount of a particular nutrient. The weights of the soybean plants in grams in the four groups after two weeks are given in Table 2.

        TABLE 2 -- Weights of Soybean Plants after Two Weeks
     ----------------------------------------------------------
     LabTable #1 -   136   96  122   60   40   42   52   20
     LabTable #2 -    74   52  152   76   12  170  128   82
     LabTable #3 -   126  106   94  120   82   84   94  124
     LabTable #4 -   102  168  220  126  196   84  166  140
      
    (i)  Is there a significant variation in the sample medians of the soybean weights in the table? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation with tie correction.
    (ii)  The experimenter recalls that the soybean plant treatment groups were, in fact, arranged by distance from the window, so that treatment groups with higher LabTable numbers might have received more light. Do the soybean weights in Table 2 either increase monotonically or decrease monotonically with the lab-table number? (That is, test with a different alternative hypothesis than in part (i).) Carry out the Jonckheere-Terpstra test to find out. Use the large-sample approximation, either with or without tie correction as you prefer. Is the test significant?
    (iii)  How do the two P-values in parts (i) and (ii) compare? If the P-value in part (i) is significant, should the experimenter rethink his conclusion that different amounts of the nutrient have significantly different effects on the growth of the soybean plants? Why?
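
    The two large-sample tests in Problem 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference solution: the function names are made up for the example, the chi-square tail is computed from the closed-form finite sum for integer degrees of freedom, and the Jonckheere-Terpstra variance is the version without tie correction (ties are counted as 1/2 in the statistic itself).

```python
import math

def midranks(values):
    """Ranks 1..N of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Kruskal-Wallis statistic divided by the tie-correction factor."""
    pooled = [v for g in groups for v in g]
    N = len(pooled)
    ranks = midranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        r = sum(ranks[pos:pos + len(g)])
        total += r * r / len(g)
        pos += len(g)
    h = 12.0 / (N * (N + 1)) * total - 3.0 * (N + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (N ** 3 - N)
    return h / correction

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                               for j in range(1, (df - 1) // 2 + 1))
    return tail

def jonckheere_terpstra(groups):
    """Return (J, z): J sums Mann-Whitney counts over pairs u < v; z is standardized."""
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    J = 0.0
    for u in range(len(groups)):
        for v in range(u + 1, len(groups)):
            J += sum(1.0 if a < b else 0.5 if a == b else 0.0
                     for a in groups[u] for b in groups[v])
    mean = (N * N - sum(n * n for n in sizes)) / 4.0
    var = (N * N * (2 * N + 3) - sum(n * n * (2 * n + 3) for n in sizes)) / 72.0
    return J, (J - mean) / math.sqrt(var)

# Data from Table 2, in LabTable order.
table2 = [
    [136, 96, 122, 60, 40, 42, 52, 20],
    [74, 52, 152, 76, 12, 170, 128, 82],
    [126, 106, 94, 120, 82, 84, 94, 124],
    [102, 168, 220, 126, 196, 84, 166, 140],
]

H = kruskal_wallis(table2)
print("Kruskal-Wallis H (tie-corrected):", H, " P ~", chi2_sf(H, len(table2) - 1))
J, z = jonckheere_terpstra(table2)
print("Jonckheere-Terpstra J:", J, " z:", z,
      " two-sided P ~", math.erfc(abs(z) / math.sqrt(2.0)))
```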
    
    

    3.  A previous edition of the textbook had data about the amount of drying during storage of 14 similar items that were prepared for storage using 5 different methods:

       TABLE 3 -- Percentage of Drying After Storage
     -------------------------------------------------
     Method #1 -   7.8   8.3   7.6   8.4   8.3
     Method #2 -   5.4   7.4   7.1
     Method #3 -   8.1   6.4
     Method #4 -   7.9   9.5  10.0
     Method #5 -   7.1
      

    (i)  Is there a significant difference among the storage methods with regard to the percentage of drying? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation with tie correction.

    (ii)  A curmudgeon might argue that a data set with 14 observations distributed over 5 treatment groups, with only one observation in one of the treatment groups, might not be a good candidate for a large-sample approximation that assumes that all sample sizes are arbitrarily large.
    Use n=10,000 random permutations of the data among the 14 places in the 5 treatment groups to estimate the exact Kruskal-Wallis P-value, and give a 95% confidence interval for your estimate of the exact P-value. Does this procedure find that the differences are significant? Are your conclusions different than in part (i)? (Hint: See the program OneWayLayout.m with output OneWayLayout.txt on the Math408 Web site.)
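
    A minimal Python sketch of the permutation procedure in part (ii) follows. It is not the course program OneWayLayout.m, and the function names and fixed seed are assumptions of the example. One design choice is worth noting: the Kruskal-Wallis statistic is an increasing function of the sum of R_j^2/n_j (R_j = rank sum of group j), and the tie-correction factor is the same for every permutation, so the sketch permutes the midranks and compares that simpler quantity, which gives the same permutation P-value.

```python
import math
import random

def midranks(values):
    """Ranks 1..N of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_stat(ranks, sizes):
    """Sum over groups of (rank sum)^2 / (group size); monotone in Kruskal-Wallis H."""
    total, pos = 0.0, 0
    for n in sizes:
        r = sum(ranks[pos:pos + n])
        total += r * r / n
        pos += n
    return total

def kw_permutation_pvalue(groups, n_perm=10000, seed=0):
    """Estimate the exact Kruskal-Wallis P-value with a 95% confidence interval."""
    sizes = [len(g) for g in groups]
    ranks = midranks([v for g in groups for v in g])
    s_obs = rank_sum_stat(ranks, sizes)
    rng = random.Random(seed)          # fixed seed only for reproducibility
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)             # reassign ranks to the 14 places at random
        if rank_sum_stat(ranks, sizes) >= s_obs - 1e-9:
            hits += 1
    p_hat = hits / n_perm
    half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

# Data from Table 3, in Method order.
table3 = [
    [7.8, 8.3, 7.6, 8.4, 8.3],
    [5.4, 7.4, 7.1],
    [8.1, 6.4],
    [7.9, 9.5, 10.0],
    [7.1],
]

print("permutation estimate, 95% CI:", kw_permutation_pvalue(table3))
```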
    
    

    4.  (See Table 6.10 p226 in the text, and Problem 32 p225 for more biological detail.) Salmonella colonies were grown under six different concentrations of AcidRed114. For each concentration, three colonies were used and the number of mutant clones was counted (see Table 4). In Table 4, mug stands for micrograms per milliliter and Mg for milligrams per milliliter, so that 1Mg=1000mug. The values at 0mug correspond to the natural state of the organism. The low values at high concentrations of the mutagen may be due to the toxic effects of AcidRed114, so that fewer colonies survive to be mutant or not.

          TABLE 4: Number of Mutant TA98 Salmonella Colonies
           under Exposure to Various Levels of Acid Red 114
    
    Dose:    0mug    100mug    333mug     1Mg       3.3Mg      10Mg
    --------------------------------------------------------------------
              22       60        98        60         22         23
              23       59        78        82         44         21
              35       54        50        59         33         25
    --------------------------------------------------------------------
    (i) Find one-sided P-values for the hypothesis that the number of mutant colonies increases monotonically up to 333mug and then decreases monotonically for higher concentrations. Use both the exact critical values in Table A.14 to bracket the P-values and also the large-sample normal approximation.
    (ii) Carry out the same procedures for a maximum at 1Mg (per milliliter).
    (Hints: First find the 6x6 table of Mann-Whitney differences among the 6 samples and calculate A_3 and A_4 from the table. Note that the critical values in Table A.14 are the same for p and k+1-p. For example, p=1 and p=k have the same critical values as the Jonckheere-Terpstra test.)
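
    The table of Mann-Whitney counts and the statistic A_p for a known peak p can be sketched in Python as follows. The function names are illustrative assumptions, ties are counted as 1/2, and the null mean and variance are the Mack-Wolfe large-sample formulas without tie correction as recalled here; check them against the text before relying on them. For p=k they reduce to the Jonckheere-Terpstra mean and variance.

```python
import math

def mw_count(x, y):
    """Number of pairs (a in x, b in y) with a < b, counting ties as 1/2."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)

def umbrella_stat(groups, p):
    """A_p for a peak at group p (1-based): increasing up to p, decreasing after."""
    k = len(groups)
    up = sum(mw_count(groups[u], groups[v])
             for u in range(p) for v in range(u + 1, p))
    down = sum(mw_count(groups[v], groups[u])
               for u in range(p - 1, k) for v in range(u + 1, k))
    return up + down

def umbrella_null_moments(sizes, p):
    """Null mean and variance of A_p (no tie correction)."""
    N = sum(sizes)
    n1 = sum(sizes[:p])        # observations in groups 1..p
    n2 = sum(sizes[p - 1:])    # observations in groups p..k
    n_p = sizes[p - 1]
    mean = (n1 ** 2 + n2 ** 2 - sum(n * n for n in sizes) - n_p ** 2) / 4.0
    var = (2 * (n1 ** 3 + n2 ** 3) + 3 * (n1 ** 2 + n2 ** 2)
           - sum(n * n * (2 * n + 3) for n in sizes)
           - n_p ** 2 * (2 * n_p + 3)
           + 12 * n_p * n1 * n2 - 12 * n_p ** 2 * N) / 72.0
    return mean, var

def umbrella_test(groups, p):
    """Return (A_p, z, one-sided upper-tail normal P-value)."""
    a = umbrella_stat(groups, p)
    mean, var = umbrella_null_moments([len(g) for g in groups], p)
    z = (a - mean) / math.sqrt(var)
    return a, z, 0.5 * math.erfc(z / math.sqrt(2.0))

# Data from Table 4, one list per dose in increasing order of dose.
table4 = [
    [22, 23, 35],   # 0mug
    [60, 59, 54],   # 100mug
    [98, 78, 50],   # 333mug
    [60, 82, 59],   # 1Mg
    [22, 44, 33],   # 3.3Mg
    [23, 21, 25],   # 10Mg
]

# Entry (u, v) counts pairs with (value at dose u) < (value at dose v).
for u in range(len(table4)):
    print(" ".join("%5.1f" % (mw_count(table4[u], table4[v]) if u != v else 0.0)
                   for v in range(len(table4))))

for peak in (3, 4):
    print("peak", peak, "-> (A_p, z, one-sided P):", umbrella_test(table4, peak))
```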
    
    

    5.  Find the one-sided P-value for the data in Table 4 for the hypothesis of a maximum at an unknown concentration. Use the Chen-Wolfe procedure discussed in Comment 45 on page 233 of the text. Write a computer program to carry out a permutation procedure with 10,000 permutations to estimate the exact P-value for the test along with a 95% confidence interval for the true P-value. Is the P-value comparable to the P-values that you obtained in Problem 4? (Hint: See Umbrellas.m and Umbrellas.txt on the Math408 Web site.)
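
    A permutation procedure of this kind can be sketched in Python as follows. The sketch assumes that the Chen-Wolfe statistic is the maximum, over all candidate peaks p, of the standardized known-peak statistic (A_p - E_0(A_p)) / sqrt(Var_0(A_p)); confirm this against Comment 45 before using it. It is not the course program Umbrellas.m, and the function names, the fixed seed, and the null-moment formulas (Mack-Wolfe, no tie correction) are assumptions of the example.

```python
import math
import random

def u_table(groups):
    """U[u][v] = number of pairs with (value in group u) < (value in group v); ties 1/2."""
    k = len(groups)
    U = [[0.0] * k for _ in range(k)]
    for u in range(k):
        for v in range(u + 1, k):
            c = sum(1.0 if a < b else 0.5 if a == b else 0.0
                    for a in groups[u] for b in groups[v])
            U[u][v] = c
            U[v][u] = len(groups[u]) * len(groups[v]) - c
    return U

def null_moments(sizes, p):
    """Null mean and variance of A_p for a peak at group p (1-based)."""
    N = sum(sizes)
    n1, n2, n_p = sum(sizes[:p]), sum(sizes[p - 1:]), sizes[p - 1]
    mean = (n1 ** 2 + n2 ** 2 - sum(n * n for n in sizes) - n_p ** 2) / 4.0
    var = (2 * (n1 ** 3 + n2 ** 3) + 3 * (n1 ** 2 + n2 ** 2)
           - sum(n * n * (2 * n + 3) for n in sizes)
           - n_p ** 2 * (2 * n_p + 3)
           + 12 * n_p * n1 * n2 - 12 * n_p ** 2 * N) / 72.0
    return mean, var

def a_star_max(groups):
    """Maximum over candidate peaks of the standardized umbrella statistic."""
    sizes = [len(g) for g in groups]
    U = u_table(groups)
    k = len(groups)
    best = -math.inf
    for p in range(1, k + 1):
        a = (sum(U[u][v] for u in range(p) for v in range(u + 1, p))
             + sum(U[v][u] for u in range(p - 1, k) for v in range(u + 1, k)))
        mean, var = null_moments(sizes, p)
        best = max(best, (a - mean) / math.sqrt(var))
    return best

def umbrella_permutation_pvalue(groups, n_perm=10000, seed=0):
    """Estimate the one-sided permutation P-value with a 95% confidence interval."""
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    s_obs = a_star_max(groups)
    rng = random.Random(seed)          # fixed seed only for reproducibility
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        parts, pos = [], 0
        for n in sizes:
            parts.append(pooled[pos:pos + n])
            pos += n
        if a_star_max(parts) >= s_obs - 1e-9:
            hits += 1
    p_hat = hits / n_perm
    half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

# Data from Table 4, one list per dose in increasing order of dose.
table4 = [[22, 23, 35], [60, 59, 54], [98, 78, 50],
          [60, 82, 59], [22, 44, 33], [23, 21, 25]]

print("observed statistic:", a_star_max(table4))
print("permutation estimate, 95% CI:", umbrella_permutation_pvalue(table4))
```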

    
    
