Math 408 Homework 4 - Spring 2007


    HOMEWORK #4 due Tuesday April 17, 2007

    Text references are to Hollander and Wolfe, ``Nonparametric Statistical Methods'', 2nd edn.

    IN THE FOLLOWING:   Do Problems 1, 2, and 5 by hand. Problem 3 asks you to write a computer program. Problems 4 and 6 can be done either entirely by hand or else by using a computer program.

    NOTES:  Hand in your homework in the order
            (a) Your written answers to all problems, with references as needed to part (c) below,
            (b) The computer source for any computer programs that you used,
            (c) All output from the programs in part (b).
        This will put the emphasis on what you think the answers should be and on your evidence for this. If a reader thinks that your answers are reasonable, then he or she may or may not want to look at your actual output and computer programs.

    
    

    1.  Two chemists studied the efficiency of conversion of the compound methyl glucoside to monovinyl isomers as a function of pressure. Specifically, they were interested in the percentage of methyl glucoside converted at 5 different pressures in the presence of acetylene. Due to limited laboratory space, they were only able to measure the conversion percentage at 3 different pressures in any one experimental run, but were able to carry out 10 different experimental runs under different choices of pressures. Their data are in Table 7.14 (page 316). See Problem 63 on page 316 for more information about the background of these data.

    (i)  Is their experimental design (in Table 7.14) a balanced incomplete block design? Why? If so, what are the parameters k, n, s, p, and lambda?
    (ii)  Test the hypothesis that the pressure (which defines the treatment groups) has an effect on the percent conversion to monovinyl isomers by using the BIBD test described in Section 7.6 in the text. Find P-values using both the appropriate table and the large-sample approximation. (Note: You may only be able to bracket the P-value using Table A.26, for example 0.01<P<0.05 or P<0.001. Do the best that you can.)
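    A minimal Python sketch of the bookkeeping behind parts (i) and (ii), offered as an illustration and not as the text's own procedure. It assumes the usual reading of the symbols (k treatments, n blocks, s treatments per block, p blocks per treatment, lambda blocks shared by each pair of treatments) and the large-sample form D = 12/(lambda k (s+1)) * Sum (R_j - p(s+1)/2)^2; check both against Section 7.6. The layout `design` is hypothetical (all triples of 5 treatments) and is NOT claimed to be the layout of Table 7.14. The function names are made up for this sketch.

```python
from collections import Counter
from itertools import combinations


def bibd_parameters(blocks):
    """Return (k, n, s, p, lam) if `blocks` is a balanced incomplete
    block design, otherwise None. Each block is a list of treatment labels."""
    treatments = sorted({t for b in blocks for t in b})
    k, n = len(treatments), len(blocks)
    sizes = {len(set(b)) for b in blocks}
    # Every block must hold the same number of distinct treatments.
    if len(sizes) != 1 or any(len(set(b)) != len(b) for b in blocks):
        return None
    s = sizes.pop()
    if s >= k:
        return None  # complete blocks, so not an *incomplete* design
    reps = Counter(t for b in blocks for t in b)
    pairs = Counter(pr for b in blocks for pr in combinations(sorted(b), 2))
    # Balance: equal replication and every pair together equally often.
    if len(set(reps.values())) != 1:
        return None
    if len(pairs) != k * (k - 1) // 2 or len(set(pairs.values())) != 1:
        return None
    return k, n, s, reps[treatments[0]], next(iter(pairs.values()))


def durbin_statistic(rank_sums, s, p, lam):
    """Large-sample BIBD statistic from the treatment rank sums R_j
    (assumed form; compare with the formula in Section 7.6)."""
    k = len(rank_sums)
    center = p * (s + 1) / 2.0
    return 12.0 / (lam * k * (s + 1)) * sum((r - center) ** 2 for r in rank_sums)


# Hypothetical layout: every set of 3 pressures out of 5 used once.
design = [list(c) for c in combinations(range(1, 6), 3)]
print(bibd_parameters(design))
```

    Under the null hypothesis D is referred to a chi-square distribution with k-1 degrees of freedom; the rank sums themselves have to come from the within-run ranks of the real data.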
    
    

    2.  In a test of perceptions of color, a picture with ambiguous colors was shown to 12 subjects, who were asked if they saw various colors in the picture. None of the 12 subjects showed evidence of red-green or blue color blindness. The results were scored as 1 for Yes and 0 for No. The picture was designed to have attributes of all six colors. The results are in Table 1 below.

        Table 1: Perceptions of colors by 12 subjects
    
      Subject:   1   2   3   4   5   6   7   8   9  10  11  12
      --------------------------------------------------------
      Red:       1   1   1   0   0   1   1   1   1   0   1   1
      Green:     1   0   0   0   1   0   1   0   0   0   0   1
      Blue:      1   1   1   1   0   1   1   0   1   0   1   1
      Yellow:    0   0   0   1   0   0   1   0   0   0   0   1
      Pink:      1   1   1   1   0   1   1   1   1   0   0   0
      Orange:    1   0   1   0   0   1   1   0   1   0   1   1 
    Do some colors tend to stand out more to these subjects than other colors, controlling for subject effects?
    (i) Carry out the Cochran test to find out. (See the comments in CochranTest.c for a discussion of Cochran's test statistic Q. Use the large-sample approximation for Q.) Recall that Cochran's test statistic is exactly the same as Friedman's test statistic S' with tie correction for 0,1 data.
    (ii) Carry out Friedman's test WITHOUT the tie correction; i.e. using the statistic S in equation (7.5) on page 273 in the text instead of S'. Recall that S' is equivalent to the statistic Q in part (i). How do the P-values compare? If the two approximate P-values are significantly different, which do you think is more reliable?
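    A short Python sketch of the two statistics in Problem 2, using the data of Table 1. It is one possible implementation, not the code in CochranTest.c. The helper `chi2_sf` is a hand-rolled series for the chi-square upper tail, written here only so that the sketch needs no external library; all function names are made up for this illustration.

```python
import math

# Table 1: one row per color, one column per subject (1 = Yes, 0 = No).
table1 = {
    "Red":    [1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1],
    "Green":  [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1],
    "Blue":   [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1],
    "Yellow": [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
    "Pink":   [1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0],
    "Orange": [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1],
}
rows = list(table1.values())


def cochran_q(rows):
    """Cochran's Q for 0/1 data; rows are treatments, columns are blocks."""
    k, n = len(rows), len(rows[0])
    trt = [sum(r) for r in rows]                            # treatment totals
    blk = [sum(r[i] for r in rows) for i in range(n)]       # block totals
    total = sum(trt)
    num = (k - 1) * (k * sum(c * c for c in trt) - total ** 2)
    return num / (k * total - sum(b * b for b in blk))


def friedman_s(rows):
    """Friedman's S from within-block midranks, WITHOUT tie correction."""
    k, n = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for i in range(n):
        ones = sum(r[i] for r in rows)
        zero_rank = (k - ones + 1) / 2.0    # midrank shared by the 0's
        one_rank = k - (ones - 1) / 2.0     # midrank shared by the 1's
        for j, r in enumerate(rows):
            rank_sums[j] += one_rank if r[i] else zero_rank
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    m = 0
    while term > 1e-16 * total:
        m += 1
        term *= h / (a + m)
        total += term
    return 1.0 - total * math.exp(-h + a * math.log(h) - math.lgamma(a))


q = cochran_q(rows)
s_plain = friedman_s(rows)
df = len(rows) - 1
print("Q =", q, " approx P =", chi2_sf(q, df))
print("S =", s_plain, " approx P =", chi2_sf(s_plain, df))
```

    Both statistics are compared with a chi-square distribution on k-1 = 5 degrees of freedom, so any difference in the P-values comes entirely from the tie correction.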
    
    

    3. (i) Write a computer program to estimate the exact P-value for the balanced-incomplete-block design (BIBD) test described in Section 7.6 in the text for the data in Problem 1. What is the estimated P-value? What is a 95% confidence interval for your estimated P-value? How does the P-value compare with the large-sample approximate P-value that you found in Problem 1?

    (ii) Since the pressures in the data in Problem 1 are increasing, it is natural to test the hypothesis H0 of no effect of pressure versus the alternative H1 of a monotonic relationship between conversion percentage and pressure. Define a statistic L that is the analogue of Page's test for complete data by first defining, for each pressure level i (these are the ``treatments''), the sum R_i over those experimental runs that have an observation for pressure i of the within-experimental-run rank or midrank for that observation. Then define L=Sum(i=1,k) iR_i as in Page's test.
    Using the same permutations as in the BIBD test (that is, permutations of within-experimental-run midranks among the observations that were made in that run), is the observed value of L significant? highly significant? What is a 95% confidence interval for the P-value? Use 100,000 permutations.
    (iii) Is the estimated P-value of L more significant or less significant than that of the BIBD test discussed in the text?
    (Hint: See TwoWayBibd.c on the Math408 Web site. Add code to the program to find a P-value for L as well as for the BIBD statistic ``Dscore'', for example by either defining two ``success'' counts nbig1,nbig2 instead of one count nbig within a single loop, or else by using two permutation loops, one for Dscore and one for L.)
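    The single-loop approach in the hint can be sketched in Python as follows. This is not TwoWayBibd.c; it is an independent illustration under stated assumptions: treatments are labeled 1..k, the design parameters are passed in by hand, the L test is one-sided (large L supports an increasing trend), and the confidence interval is the normal approximation phat +/- 1.96 sqrt(phat(1-phat)/nperm). The data are SYNTHETIC placeholders, not Table 7.14, and the demo uses 10,000 permutations only to run quickly; the problem asks for 100,000.

```python
import math
import random
from itertools import combinations


def midranks(values):
    """Ranks 1..m of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def statistics(block_treatments, block_ranks, k, s, p, lam):
    """Return (D, L): the BIBD statistic and the Page-type sum of j * R_j."""
    R = [0.0] * (k + 1)                      # R[0] is unused
    for trts, rks in zip(block_treatments, block_ranks):
        for t, r in zip(trts, rks):
            R[t] += r
    center = p * (s + 1) / 2.0
    d = 12.0 / (lam * k * (s + 1)) * sum((R[t] - center) ** 2 for t in range(1, k + 1))
    l = sum(t * R[t] for t in range(1, k + 1))
    return d, l


def permutation_pvalues(block_treatments, block_values, k, s, p, lam,
                        nperm=100000, seed=408):
    rng = random.Random(seed)
    ranks = [midranks(v) for v in block_values]
    d_obs, l_obs = statistics(block_treatments, ranks, k, s, p, lam)
    nbig1 = nbig2 = 0                        # "success" counts for D and L
    for _ in range(nperm):
        for r in ranks:
            rng.shuffle(r)                   # permute midranks within each run
        d, l = statistics(block_treatments, ranks, k, s, p, lam)
        if d >= d_obs - 1e-9:
            nbig1 += 1
        if l >= l_obs - 1e-9:
            nbig2 += 1
    out = {}
    for name, obs, nbig in (("D", d_obs, nbig1), ("L", l_obs, nbig2)):
        phat = nbig / nperm
        half = 1.96 * math.sqrt(phat * (1.0 - phat) / nperm)
        out[name] = (obs, phat, max(0.0, phat - half), min(1.0, phat + half))
    return out


# SYNTHETIC placeholder data on a hypothetical layout (all triples of 5).
demo_rng = random.Random(2007)
design = [list(c) for c in combinations(range(1, 6), 3)]
values = [[20 + 5 * t + demo_rng.gauss(0, 4) for t in blk] for blk in design]
result = permutation_pvalues(design, values, k=5, s=3, p=6, lam=3, nperm=10000)
for name, (obs, phat, lo, hi) in result.items():
    print(name, "observed:", obs, "P-hat:", phat, "95% CI:", (lo, hi))
```

    Counting both statistics inside one loop means D and L are judged against the same set of permutations, which makes the comparison in part (iii) a little cleaner than two separate loops would.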
    
    

    4.  In a study to determine the effect of light on the release of a hormone (luteinizing hormone, LH), rats were observed both under constant light and with 14 hours of light alternating with 10 hours of darkness. The rats were given one of five different levels of a luteinizing release factor (LRF) as a control for variable LRF. Six rats were studied for each of 5 levels of LRF and for each of two light regimes (constant or alternating), for a total of 60 rats, in a two-way experimental layout with six observations per cell.

    (i)  Is there a significant difference between constant light and alternating light on the level of LH in these rats, controlling (blocking) for the level of LRF? Use the large-sample approximation for the Mack-Skillings procedure (Section 7.9) to find out. (See Problem 103 on page 339 of the text for more details about the data.)
    (ii)  Suppose that you (incorrectly) ignored the blocking due to LRF levels and considered the data as a one-way layout with 30 rats in each treatment group and applied either the Kruskal-Wallis or the Wilcoxon rank-sum test. Compute the corresponding P-value, also using the large-sample approximation. If you used the Wilcoxon rank-sum test, take the square of the resulting Z-score so that you have a statistic with an asymptotic chi-square distribution as in the Kruskal-Wallis test.
    (iii)  Compare both the observed test statistics with asymptotic chi-square distributions and the resulting P-values for the two tests in parts (i) and (ii). Which test is more significant? What is the effect of the blocking on the P-values and/or chi-square-statistic scores? Which P-value would you have more confidence in? Why?
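    For part (i), a hedged Python sketch of the Mack-Skillings statistic with equal replication. It assumes the form MS = 12/(k(N+n)) * Sum_j (S_j - (N+n)/2)^2, where ranks are taken within each block over all k*c observations, S_j sums the cell-average ranks of treatment j over the n blocks, and N = n*k*c; verify this against Section 7.9. The data are SYNTHETIC placeholders, not the values of Problem 103, and the function names are made up for this sketch.

```python
import math
import random


def midranks(values):
    """Ranks 1..m of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mack_skillings(data):
    """data[i][j] is the list of c observations for block i, treatment j."""
    n, k, c = len(data), len(data[0]), len(data[0][0])
    N = n * k * c
    S = [0.0] * k
    for block in data:
        pooled = [v for cell in block for v in cell]
        ranks = midranks(pooled)             # rank within the block only
        for j in range(k):
            S[j] += sum(ranks[j * c:(j + 1) * c]) / c
    return 12.0 / (k * (N + n)) * sum((sj - (N + n) / 2.0) ** 2 for sj in S)


# SYNTHETIC placeholder data: 5 LRF levels (blocks), 2 light regimes, 6 rats each.
rng = random.Random(103)
data = [[[rng.gauss(50 + 10 * i + 8 * j, 10) for _ in range(6)]
         for j in range(2)] for i in range(5)]

ms = mack_skillings(data)
# With k = 2 treatments the reference distribution is chi-square with 1 df,
# whose upper tail equals erfc(sqrt(x / 2)). This shortcut holds for df = 1 only.
p_value = math.erfc(math.sqrt(ms / 2.0))
print("MS =", ms, " approx P =", p_value)
```

    With one observation per cell (c = 1) the same formula reduces to Friedman's statistic, which gives a quick sanity check on any implementation.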
    
    

    5.  Table 5.4 on page 171 in the text has blood platelet counts for 16 newborn infants, 10 of whom were born to mothers who were given the steroid drug prednisone during their pregnancy. The mothers of the remaining 6 infants were not given prednisone. Besides the question of whether the mean platelet counts of the two samples differ, it is also of interest whether the variances of the two samples are the same. (See page 171 in the text for more background about these data.)

    Use the Miller jackknife procedure (based on log sample variances) to find a 95% confidence interval for the ratio of the variances of the two samples. Do the two samples have significantly different variances, using the Miller procedure? What is the P-value?
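    Although Problem 5 is to be done by hand, a small Python sketch can be used to check the arithmetic. It follows the usual description of the Miller jackknife: pseudo-values A_i = m*S_0 - (m-1)*S_(-i), where S_0 is the log of the full-sample variance and S_(-i) the log of the variance with observation i left out, and similarly B_j for the second sample. The ratio is taken here as var(x)/var(y) and the normal approximation is used for Q; confirm both conventions against the text. The two samples below are MADE-UP placeholders, not the values in Table 5.4.

```python
import math


def sample_variance(xs):
    m = len(xs)
    mean = sum(xs) / m
    return sum((x - mean) ** 2 for x in xs) / (m - 1)


def pseudo_values(xs):
    """Jackknife pseudo-values of log(sample variance)."""
    m = len(xs)
    s0 = math.log(sample_variance(xs))
    return [m * s0 - (m - 1) * math.log(sample_variance(xs[:i] + xs[i + 1:]))
            for i in range(m)]


def miller_jackknife(xs, ys, z=1.959964):
    """Return (Q, two-sided P, (lower, upper)) for the ratio var(x)/var(y)."""
    a, b = pseudo_values(xs), pseudo_values(ys)
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    v1 = sum((ai - abar) ** 2 for ai in a) / (len(a) * (len(a) - 1))
    v2 = sum((bj - bbar) ** 2 for bj in b) / (len(b) * (len(b) - 1))
    se = math.sqrt(v1 + v2)
    q = (abar - bbar) / se
    p_two_sided = math.erfc(abs(q) / math.sqrt(2.0))   # 2 * (1 - Phi(|Q|))
    # Interval on the log scale, then exponentiated to the ratio scale.
    ci = (math.exp(abar - bbar - z * se), math.exp(abar - bbar + z * se))
    return q, p_two_sided, ci


# MADE-UP placeholder samples with sizes 10 and 6 (NOT the Table 5.4 data).
xs = [110.0, 95.0, 180.0, 130.0, 75.0, 210.0, 150.0, 125.0, 90.0, 300.0]
ys = [25.0, 40.0, 95.0, 30.0, 60.0, 45.0]

q_stat, p_val, interval = miller_jackknife(xs, ys)
print("Q =", q_stat, " P =", p_val, " 95% CI for variance ratio:", interval)
```

    The variances differ significantly at the 5% level exactly when the 95% interval for the ratio excludes 1.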
    
    

    6.  In a study of pollution in Lake Michigan, the number of ``odor periods'' was observed for each of the years 1950-1964. The yearly counts (called ``bad periods'' below) are in Table 2.

        Table 2: Numbers of bad periods in Lake Michigan (1950-1964)
      ----------------------------------------------------------------
      (1950, 10)  (1951, 20)  (1952, 17)  (1953, 16)  (1954, 12)
      (1955, 15)  (1956, 13)  (1957, 18)  (1958, 17)  (1959, 19)
      (1960, 21)  (1961, 23)  (1962, 23)  (1963, 28)  (1964, 28)  
    (a)  Find the Kendall correlation coefficient tau for year versus the number of bad periods. Is it larger or smaller than the Pearson correlation coefficient rho? What is the value of rho? Show your calculations (or write a computer program).
    (b)  Is the number of bad periods increasing over time? Carry out the Kendall test (Section 8.1, page 363 in the text) for an increasing relation between year and numbers of bad periods to find out. Find two-sided P-values using both (i) Table A.30 in the text and (ii) the large-sample approximation with tie corrections. How do the two P-values compare? (Note: You may only be able to bracket the P-value using Table A.30, for example 0.01<P<0.05 or P<0.001. Do the best that you can.)
    (Remark: The data are from Table 8.8 on page 381 in the text. See Problem 19 on page 381 for more background on this data set.)
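    If Problem 6 is done by computer, the calculation might look like the Python sketch below, which uses the data of Table 2. It assumes that tau is taken as 2K/(n(n-1)), where K is the sum of the signs of (x_j - x_i)(y_j - y_i) over all pairs i<j, and it uses the standard tie-corrected null variance of K for the large-sample test; a tie-adjusted coefficient (often called tau-b) would differ slightly. The function names are made up for this sketch.

```python
import math
from collections import Counter

years = list(range(1950, 1965))
counts = [10, 20, 17, 16, 12, 15, 13, 18, 17, 19, 21, 23, 23, 28, 28]


def sign(v):
    return (v > 0) - (v < 0)


def kendall_k(x, y):
    """K = number of concordant pairs minus number of discordant pairs."""
    n = len(x)
    return sum(sign((x[j] - x[i]) * (y[j] - y[i]))
               for i in range(n) for j in range(i + 1, n))


def null_variance_k(x, y):
    """Null variance of K, corrected for tied groups in x and in y."""
    n = len(x)
    t, u = list(Counter(x).values()), list(Counter(y).values())
    a = lambda g: sum(m * (m - 1) * (2 * m + 5) for m in g)
    b = lambda g: sum(m * (m - 1) * (m - 2) for m in g)
    c = lambda g: sum(m * (m - 1) for m in g)
    return ((n * (n - 1) * (2 * n + 5) - a(t) - a(u)) / 18.0
            + b(t) * b(u) / (9.0 * n * (n - 1) * (n - 2))
            + c(t) * c(u) / (2.0 * n * (n - 1)))


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


n = len(years)
k_stat = kendall_k(years, counts)
tau = 2.0 * k_stat / (n * (n - 1))
var_k = null_variance_k(years, counts)
z = k_stat / math.sqrt(var_k)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
rho = pearson_r(years, counts)
print("K =", k_stat, " tau =", tau, " rho =", rho)
print("z =", z, " two-sided P =", p_two_sided)
```

    Only the counts contain ties (three tied pairs), so the tie correction changes the variance of K modestly here.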
    
    
