Math 408 Homework 2 - Spring 2007

HOMEWORK #2 due Tuesday Feb 27

Text references are to Hollander and Wolfe,
``Nonparametric Statistical Methods'', 2nd ed.

IN THE FOLLOWING: Do Problem 1 by hand. Problems 3 and 6 require you to write a computer program. Problems 2, 4, and 5 can be done either by hand or by a computer program or programs. If you choose, you could do Problems 2-5 in one computer program.

NOTES: Hand in your homework in the order
(a) Your written answers to all problems, with references as needed to part (c) below,
(b) The computer source for any computer programs that you used,
(c) All output from the programs in part (b).
This will put the emphasis on what you think the answers should be and on your evidence for this. If a reader thinks that your answers are reasonable, then he or she may or may not want to look at your actual output and computer programs.

1. Grades on a college board test are collected for 10 students before and after a college board cram course. The grades are

  TABLE 1: Scores before and after a college board prep course
  -------------------------------------------------------------------
  Student:   #1    #2    #3    #4    #5    #6    #7    #8    #9   #10
  Before:    15    19    35    43    47    50    57    37    56    41
  After:     10    33    43    57    58    48    61    49    58    55

(i) Is there a significant improvement in the students' scores, controlling for student-to-student variation? That is, use the Wilcoxon signed rank test for the after-minus-before differences for each student to test the hypothesis that there is no difference in the before and after scores. Give two-sided P-values using EITHER the normal approximation, using the tie correction if appropriate, OR the exact distribution in Table A4 for the Wilcoxon signed rank statistic, using interpolation in the table if appropriate.
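Problem 1 is to be done by hand, but the arithmetic of the signed rank test can be checked with a short program. The following is a minimal Python sketch, not one of the course programs: the function names `midranks` and `signed_rank_normal` are made up for illustration. It assumes that zero differences are discarded and that ties among the absolute differences receive midranks, in which case the null variance of T+ is one quarter of the sum of the squared midranks.

```python
import math

def midranks(values):
    """Midranks (tied values share the average of their ranks), in input order."""
    srt = sorted(values)
    # first 0-based position of v plus (count + 1) / 2 is the average 1-based rank
    return [srt.index(v) + (srt.count(v) + 1) / 2.0 for v in values]

def signed_rank_normal(before, after):
    """Return (T+, z, two-sided normal-approximation P-value)."""
    # Assumption: zero differences are dropped and n is reduced accordingly.
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    ranks = midranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    # Tie-corrected null variance: sum of squared midranks divided by 4.
    var = sum(r * r for r in ranks) / 4.0
    z = (t_plus - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return t_plus, z, p_two_sided
```

Calling `signed_rank_normal(before, after)` with the two rows of a paired table returns the statistic, its standardized value, and the two-sided P-value. With no ties the variance reduces to the usual n(n+1)(2n+1)/24.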

Suppose instead that the data in Table 1 were from two different sets of 10 students (that is, from 20 students, 10 of whom attended the cram course and 10 of whom did not) rather than from 10 individual students.

(ii) Is there a significant improvement in the students' scores, NOT controlling for student-to-student variation? That is, use the Wilcoxon RANK SUM (or Mann-Whitney) test on the two sets of scores to test the hypothesis that there is no difference in the before and after scores. Give two-sided P-values using BOTH the normal approximation and the exact distribution of the Wilcoxon rank sum statistic in Table A6. Use the tie correction for the normal approximation if appropriate.

(iii) Why do the P-values in parts (i) and (ii) differ? Could this be related to the fact that the before-and-after scores are correlated over the students? Calculate the Pearson correlation coefficient for the before and after scores for the 10 original students. Is this correlation coefficient statistically significant, using the classical t-test for the Pearson correlation coefficient, assuming normal distributions for the samples?
(Note: See the text (equation (8.78), page 398) for a formula for the Pearson correlation coefficient rho=rho(X,Y). The ``classical t-test for rho'' is based on the fact that if X_i,Y_j are independent normal, then T = rho*Sqrt((n-2)/(1-rho^2)) has a t distribution with n-2 degrees of freedom, where * means multiplication.)
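The statistic in the note can be computed as in the following minimal Python sketch. The function name `pearson_t` is hypothetical, and the sketch returns only the sample correlation and the T statistic; the P-value would still be read from a t table with n-2 degrees of freedom. It assumes the sample correlation is not exactly +1 or -1, since the formula divides by 1-rho^2.

```python
import math

def pearson_t(x, y):
    """Return (sample Pearson correlation r, T = r * sqrt((n-2)/(1-r^2)))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Assumption: |r| < 1, otherwise the denominator below is zero.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t
```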

2. Consider the ``Karate Kid'' data in Table 4.4 on page 124 of the text. These data give the lengths of time that kids who were supposedly baby-sitting two younger children spent before calling an adult after their two younger charges supposedly became violent. A control group of 21 kids (baby-sitters) had watched non-violent excerpts from the 1984 Summer Olympics while a test group of 21 kids (baby-sitters) had watched a violent TV program. The experimenters' hypothesis was that the baby-sitters who had watched the violent TV program would take longer to call an adult.

(i) Is there a significant difference in location (or time) between the two samples? Use the Wilcoxon rank-sum test to find out. Use the normal approximation with tie correction to find a two-sided P-value.
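One way the tie-corrected normal approximation for the rank-sum statistic might be coded is sketched below in Python. This is an illustration only, not the course program Twosamps_ranks.c, and the names `midranks` and `rank_sum_normal` are made up. It takes W to be the sum of the midranks of the second sample and uses the usual tie-corrected variance, in which each group of t tied observations contributes t(t^2-1).

```python
import math
from collections import Counter

def midranks(values):
    """Midranks (tied values share the average of their ranks), in input order."""
    srt = sorted(values)
    return [srt.index(v) + (srt.count(v) + 1) / 2.0 for v in values]

def rank_sum_normal(x, y):
    """Return (W, z, two-sided P-value); W is the rank sum of the y sample."""
    m, n = len(x), len(y)
    total = m + n
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[m:])                      # ranks of the y observations
    mean = n * (total + 1) / 2.0
    # Tie term: sum of t(t^2 - 1) over groups of t tied values (0 if no ties).
    tie_term = sum(t * (t * t - 1) for t in Counter(pooled).values())
    var = m * n / 12.0 * ((total + 1) - tie_term / (total * (total - 1.0)))
    z = (w - mean) / math.sqrt(var)
    return w, z, math.erfc(abs(z) / math.sqrt(2.0))
```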

(ii) Find the Hodges-Lehmann Wilcoxon-rank-sum-like estimate of the difference in medians. How does this compare with the difference in sample means? Does the Hodges-Lehmann procedure appear to control better for outliers? (This problem is possible to do by hand, but could also be done using a computer program. See for example the program Twosamps_ranks.c on the Math408 Web site.)
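The Hodges-Lehmann estimate associated with the rank-sum test is the median of all m*n pairwise differences between the two samples. A minimal Python sketch follows; the function name `hodges_lehmann_shift` is hypothetical, and the example data are invented to show the effect of one outlier and are not from the text.

```python
from statistics import median

def hodges_lehmann_shift(x, y):
    """Median of all differences y_j - x_i over the m*n pairs."""
    return median(yj - xi for xi in x for yj in y)

# Invented illustration: one large outlier in the second sample.
x_demo = [1, 2, 3]
y_demo = [4, 5, 100]
hl = hodges_lehmann_shift(x_demo, y_demo)
mean_diff = sum(y_demo) / len(y_demo) - sum(x_demo) / len(x_demo)
```

In this invented example the difference in sample means is pulled far upward by the single value 100, while the median of the pairwise differences is not.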

3. How accurate is the normal approximation in Problem 2, part (i)? Write a computer program to simulate the true 2-sided Wilcoxon P-value by using N=100,000 random permutations of the m+n=21+21=42 midranks. Find a symmetric 95% confidence interval for the true 2-sided Wilcoxon P-value. Does this contain the normal-approximation 2-sided P-value from Problem 2, part (i)?
(Hint: You can use the method used in the program RanksSims.c on the Math408 Web site to find the midranks and also to simulate the true P-value. If you adapt the program RanksSims.c, don't forget to delete the parts of the program that relate only to the Wilcoxon signed-rank test. It may be more convenient to do problems 2 and 3 together in the same computer program.)
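The simulation can be outlined as in the Python sketch below. It is not an adaptation of RanksSims.c, and the name `permutation_p_value` and its parameters are made up for illustration. It assumes that "two-sided" means counting permutations whose rank sum is at least as far from its null mean as the observed one, and that the confidence interval is the usual normal-theory interval for a binomial proportion.

```python
import math
import random

def midranks(values):
    """Midranks (tied values share the average of their ranks), in input order."""
    srt = sorted(values)
    return [srt.index(v) + (srt.count(v) + 1) / 2.0 for v in values]

def permutation_p_value(x, y, n_perm=100_000, seed=0):
    """Estimate the two-sided rank-sum P-value and a symmetric 95% interval."""
    rng = random.Random(seed)
    m = len(x)
    ranks = midranks(list(x) + list(y))
    center = len(y) * (len(ranks) + 1) / 2.0      # null mean of the rank sum
    observed = abs(sum(ranks[m:]) - center)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)                        # random relabeling of the midranks
        # Small tolerance guards against floating-point rounding in comparisons.
        if abs(sum(ranks[m:]) - center) >= observed - 1e-9:
            hits += 1
    p_hat = hits / n_perm
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))
```

Fixing the seed makes a run reproducible; changing it gives an independent estimate that should agree to within the width of the interval.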

4. Soybean plants were grown in 32 pots located on 4 different heavy laboratory tables. Plants with higher lab-table numbers were exposed to somewhat more light. The weights of the soybean plants in grams in the four groups after two weeks are given in Table 2.

    TABLE 2 -- Weights of Soybean Plants after Two Weeks
 ----------------------------------------------------------
 LabTable #1 -   136   96  122   60   40   42   52   20
 LabTable #2 -    74   52  152   76   12  170  128   82
 LabTable #3 -   126  106   94  120   82   84   94  124
 LabTable #4 -   102  168  220  126  196   84  166  140

Is there a significant variation in the sample medians of the soybean weights in the table? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation with tie correction.
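A minimal Python sketch of the Kruskal-Wallis computation with tie correction is given below. The function names are hypothetical. Since the Python standard library has no chi-square distribution, the sketch includes a small helper, `chi2_sf`, that computes the upper tail for integer degrees of freedom from the incomplete-gamma recurrence; this helper is a convenience chosen here, not something taken from the text.

```python
import math
from collections import Counter

def midranks(values):
    """Midranks (tied values share the average of their ranks), in input order."""
    srt = sorted(values)
    return [srt.index(v) + (srt.count(v) + 1) / 2.0 for v in values]

def chi2_sf(x, df):
    """P(chi-square with integer df > x), via Q(a+1,t) = Q(a,t) + t^a e^-t / Gamma(a+1)."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)                  # df = 2 base case
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))       # df = 1 base case
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q

def kruskal_wallis(groups):
    """Return (tie-corrected H, degrees of freedom k-1)."""
    pooled = [v for g in groups for v in g]
    total = len(pooled)
    ranks = midranks(pooled)
    s, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        s += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (total * (total + 1)) * s - 3.0 * (total + 1)
    # Tie correction: divide by 1 - sum t(t^2-1) / (N^3 - N).
    tie_term = sum(t * (t * t - 1) for t in Counter(pooled).values())
    return h / (1.0 - tie_term / float(total ** 3 - total)), len(groups) - 1
```

The large-sample P-value is then `chi2_sf(h, df)` for the returned pair.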

5. Do the soybean weights in Table 2 vary by sample, either increasing monotonically or decreasing monotonically with the lab-table number? (That is, test with a different alternative hypothesis than in Problem 4.) Carry out the Jonckheere-Terpstra test to find out. Use the large-sample approximation with tie correction.
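The Jonckheere-Terpstra statistic J is the sum, over all pairs of groups u < v, of the Mann-Whitney counts of pairs with the group-u value below the group-v value, with tied pairs counted as 1/2. The Python sketch below is an illustration under stated assumptions: the function name is made up, and the tie-corrected variance is written from the standard three-term formula, which should be checked against the text before being relied on. It needs at least 3 observations in all.

```python
import math
from collections import Counter

def jonckheere_terpstra(groups):
    """Return (J, z, two-sided normal-approximation P-value)."""
    j = 0.0
    for u in range(len(groups)):
        for v in range(u + 1, len(groups)):
            for a in groups[u]:
                for b in groups[v]:
                    if a < b:
                        j += 1.0
                    elif a == b:
                        j += 0.5                  # tied pair counts one half
    sizes = [len(g) for g in groups]
    ties = list(Counter(x for g in groups for x in g).values())
    total = sum(sizes)
    mean = (total * total - sum(n * n for n in sizes)) / 4.0

    def s1(c): return sum(k * (k - 1) * (2 * k + 5) for k in c)
    def s2(c): return sum(k * (k - 1) * (k - 2) for k in c)
    def s3(c): return sum(k * (k - 1) for k in c)

    # Assumed tie-corrected variance; with no ties only the first term remains.
    var = (total * (total - 1) * (2 * total + 5) - s1(sizes) - s1(ties)) / 72.0
    var += s2(sizes) * s2(ties) / (36.0 * total * (total - 1) * (total - 2))
    var += s3(sizes) * s3(ties) / (8.0 * total * (total - 1))
    z = (j - mean) / math.sqrt(var)
    return j, z, math.erfc(abs(z) / math.sqrt(2.0))
```

Groups must be passed in the hypothesized order (here, by lab-table number). A large positive z points to an increasing trend and a large negative z to a decreasing one.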

6. A previous edition of the textbook had data about the amount of drying during storage of 14 similar items that were prepared for storage using 5 different methods:

   TABLE 3 -- Percentage of Drying After Storage
 -------------------------------------------------
 Method #1 -   7.8   8.3   7.6   8.4   8.3
 Method #2 -   5.4   7.4   7.1
 Method #3 -   8.1   6.4
 Method #4 -   7.9   9.5  10.0
 Method #5 -   7.1

(a) Is there a significant difference among the storage methods with regard to the percentage of drying? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation with tie correction.

(b) Some might argue that a data set with 14 observations distributed over 5 treatment groups, with only one observation in one of the treatment groups, might not be a good candidate for a large-sample approximation that assumes that all sample sizes are arbitrarily large.

Use n=100,000 random permutations of the data among the 14 places in the 5 treatment groups to estimate the exact Kruskal-Wallis P-value, and give a 95% confidence interval for your estimate of the exact P-value. Does this procedure find that the differences are significant? Are your conclusions different than in part (a)? (Hint: See the program OneWayLayout.c on the Math408 Web site.)
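A permutation estimate of this kind might be organized as in the Python sketch below. It is not the course program OneWayLayout.c, and the names are hypothetical. For a fixed set of midranks and fixed group sizes, the tie-corrected Kruskal-Wallis statistic is an increasing function of the sum of R_j^2/n_j over groups, so the sketch compares that simpler quantity across permutations. The 95% interval is assumed to be the usual normal-theory interval for a proportion.

```python
import math
import random

def midranks(values):
    """Midranks (tied values share the average of their ranks), in input order."""
    srt = sorted(values)
    return [srt.index(v) + (srt.count(v) + 1) / 2.0 for v in values]

def rank_sum_squares(ranks, sizes):
    """Sum over groups of (rank sum)^2 / (group size); monotone in H."""
    s, start = 0.0, 0
    for n in sizes:
        rank_sum = sum(ranks[start:start + n])
        s += rank_sum * rank_sum / n
        start += n
    return s

def kw_permutation_p(groups, n_perm=100_000, seed=0):
    """Estimate the exact Kruskal-Wallis P-value and a symmetric 95% interval."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    ranks = midranks([v for g in groups for v in g])
    observed = rank_sum_squares(ranks, sizes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)            # reassign midranks to the places at random
        if rank_sum_squares(ranks, sizes) >= observed - 1e-9:
            hits += 1
    p_hat = hits / n_perm
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
    return p_hat, (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))
```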
