Math 408 Take-Home Final - Spring 2005

  • Click here for Prof. Sawyer's home page

    TAKEHOME FINAL due Wednesday May 11 by 5:30 PM

    Text references are to Hollander and Wolfe,
      ``Nonparametric Statistical Methods'', 2nd ed.

    NOTE: In the following, ^ means superscript, _ (underscore) means subscript, and Sum(i=1,9) means the sum for i=1 to 9.

    Seven problems. Not all parts of problems are of equal weight.

    
    

    1.  The responses Y to an input X in 20 trials are recorded in the following table.

        Table 1: Responses Y to an input X
      ----------------------------------------------------
               X      Y                X      Y
             -----------             -----------
         1.   21      7         11.   24     26
         2.   74      7         12.   70     77
         3.   84    716         13.   30      7
         4.   56      9         14.   67     29
         5.   48      7         15.   92    337
         6.   29    116         16.   99    513
         7.   61     34         17.   45    632
         8.   79     21         18.   81    128
         9.   96    153         19.   37     30
        10.   93     95         20.   71    550
    (i)  What is the Pearson correlation coefficient rho between X and Y for the data in the table? Are the variables Y and X significantly correlated as measured by rho? What is the (two-sided) P-value?
    (ii)  What is the Kendall correlation coefficient tau? Are the variables Y and X significantly correlated as measured by the Kendall test? What is the (two-sided) P-value? Find two-sided P-values using both the appropriate table in the text and the large-sample approximation. Don't forget appropriate tie corrections. Bracket P-values from the table if need be.
    (iii)  What is the Spearman correlation coefficient R? Are the variables Y and X significantly correlated as measured by R? What is the (two-sided) P-value? Find two-sided P-values using both the appropriate table in the text and the large-sample approximation (see Section 8.5 in the text). Don't forget the appropriate tie corrections. Bracket P-values from the table if need be.
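
    The tie-corrected large-sample approximation asked for in part (ii) can be sketched as follows. This is only an illustrative Python sketch, not a program from the text or the course site: the function names and the toy data are hypothetical, and the variance expression is the usual tie-corrected null variance of Kendall's K, which should be checked against the text before relying on it.

```python
import math

def sign(v):
    # Returns -1, 0, or +1.
    return (v > 0) - (v < 0)

def kendall_k(x, y):
    # K = (number of concordant pairs) - (number of discordant pairs);
    # pairs tied in X or in Y contribute 0.
    n = len(x)
    return sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
               for i in range(n) for j in range(i + 1, n))

def tie_group_sizes(values):
    # Sizes of the groups of equal values (only groups of size > 1).
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [c for c in counts.values() if c > 1]

def kendall_null_variance(x, y):
    # Tie-corrected null variance of K (assumes n >= 3).
    n = len(x)
    t = tie_group_sizes(x)
    u = tie_group_sizes(y)
    base = (n * (n - 1) * (2 * n + 5)
            - sum(c * (c - 1) * (2 * c + 5) for c in t)
            - sum(c * (c - 1) * (2 * c + 5) for c in u)) / 18.0
    term2 = (sum(c * (c - 1) * (c - 2) for c in t)
             * sum(c * (c - 1) * (c - 2) for c in u)
             / (9.0 * n * (n - 1) * (n - 2)))
    term3 = (sum(c * (c - 1) for c in t)
             * sum(c * (c - 1) for c in u)
             / (2.0 * n * (n - 1)))
    return base + term2 + term3

def two_sided_normal_p(z):
    # Two-sided P-value for a standard normal deviate z.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical toy data with ties in Y (NOT the data of Table 1).
x_toy = [1, 2, 3, 4]
y_toy = [1, 1, 2, 2]
k_stat = kendall_k(x_toy, y_toy)
tau_hat = 2.0 * k_stat / (len(x_toy) * (len(x_toy) - 1))
z_stat = k_stat / math.sqrt(kendall_null_variance(x_toy, y_toy))
p_two_sided = two_sided_normal_p(z_stat)
```

    With no ties the variance reduces to n(n-1)(2n+5)/18, which gives a quick sanity check on the tie terms.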
    
    

    2.  An agricultural company is interested in selling one or more of four new types of lamb chow (food) that were developed in the company's research division. Weight gains for yearling lambs on the four new lamb chows (labeled E1,E2,E3,E4) and on a standard lamb chow (Estd) are given in Table 2.

         Table 2: Lamb weight gains for five lamb chows
        -------------------------------------------------------
        Estd:  58   68   28   14  150   98  138   78  124   84 
        -------------------------------------------------------
        E1:   148  176   90   52  132   32  128   32           
        E2:   168  218  158  238   72  100  192                
        E3:    44  206  132   12  108  148  156  182   68   70 
        E4:    92  150  124  136  180  128  132  216  168  220 
        -------------------------------------------------------
    (i)  Is there a significant variation in weight gains for the five different lamb chows in Table 2? Use the Kruskal-Wallis test to find out. Find the P-value using the large-sample approximation, taking ties into account.
    (ii)  Using the standard diet (Estd) as a control, find multiple-comparison-corrected P-values that weight gains for E1,E2,E3,E4 are significantly greater than those for Estd (with one-sided P-values). Use the pairwise-Wilcoxon-rank-sum method of Steel (1959) discussed at the end of Section 6.7 in the text.
    Specifically, if Z_{1i} is the normalized Wilcoxon rank-sum score between Ei and Estd in Table 2 (that is, Z_{1i}=W_{1i}^*/sqrt(2) for W_{ij}^* in Section 6.5), the theory in Section 6.5 says that approximate multiple-comparison-corrected one-sided P-values can be found using the statistic Xmax=max(i=1,k-1)X_i, where X_i are k-1 standard normal random variables with a common correlation coefficient of 0.50 (k=5). (That is, P(Z_{1i}>=A_{obs}) = P(Xmax>=A_{obs}).) Use Table A21 with rho=0.50 and (ell)=k-1 to find multiple-comparison-corrected one-sided P-values for each of the new lamb chows.
    On the basis of these multiple-comparison-corrected P-values, which (if any) of the lamb chows are significantly better than the standard Estd? What are their P-values?
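
    As a cross-check on table lookups, the tail probability P(Xmax>=A) can also be evaluated numerically. The sketch below is an illustration under stated assumptions, not the method of the text: it uses the representation X_i=(Z_0+Z_i)/sqrt(2) with independent standard normal Z_0,Z_1,...,Z_m, which gives a common correlation of 0.50, so that P(Xmax<A) is the integral of phi(z)*Phi(A*sqrt(2)-z)^m over z. The function names, the integration limits, and the step count are hypothetical choices.

```python
import math

def norm_pdf(z):
    # Standard normal density phi(z).
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Standard normal distribution function Phi(z).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_tail_max_equicorr(a, m, lo=-8.0, hi=8.0, steps=4000):
    # P(max(X_1,...,X_m) >= a) for standard normals with common
    # correlation 0.50, by the trapezoid rule on [lo, hi].
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        z = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * norm_pdf(z) * norm_cdf(a * math.sqrt(2.0) - z) ** m
    return 1.0 - h * total

# Example call with a hypothetical observed value A=2.0 and m=k-1=4.
p_corrected = upper_tail_max_equicorr(2.0, 4)
p_uncorrected = upper_tail_max_equicorr(2.0, 1)  # equals 1 - Phi(2.0)
```

    For m=1 the result reduces to the ordinary one-sided normal tail, so the difference between the two calls shows the size of the multiple-comparison correction.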
    
    

    3.  Consider the data on the percent conversion of methyl glucoside to monovinyl isomers in Table 7.14 (page 316) of the text.

    (i)  Is this a balanced incomplete block design? Why? If so, what are the parameters k, n, s, p, and lambda?
    (ii)  Test the hypothesis that the pressure (which defines the treatment groups) has an effect on the percent conversion to monovinyl isomers. Find P-values using both the appropriate table and the large-sample approximation. (Note that both the tabled values and the large-sample approximation require you to know k,n,s,p, and lambda. See Problem 63 on page 316 of the text for more details about the data.)
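
    The balance conditions in part (i) can be checked mechanically from the list of treatments that appear in each block. The following is a minimal sketch with a hypothetical function name and a hypothetical toy design (not Table 7.14). It assumes the notation k = number of treatments, n = number of blocks, s = treatments per block, p = blocks per treatment, and lambda = number of blocks in which each pair of treatments appears together; confirm these meanings against the text.

```python
from itertools import combinations

def bibd_parameters(blocks):
    # blocks: list of collections of treatment labels, one per block.
    # Returns a dict of (k, n, s, p, lambda) if the design is a balanced
    # incomplete block design, otherwise None.
    treatments = sorted({t for b in blocks for t in b})
    k, n = len(treatments), len(blocks)
    block_sizes = {len(set(b)) for b in blocks}
    reps = {t: sum(t in b for b in blocks) for t in treatments}
    pair_counts = {pair: 0 for pair in combinations(treatments, 2)}
    for b in blocks:
        for pair in combinations(sorted(set(b)), 2):
            pair_counts[pair] += 1
    if (len(block_sizes) != 1 or len(set(reps.values())) != 1
            or len(set(pair_counts.values())) != 1):
        return None  # unequal block sizes, replications, or pair counts
    s = next(iter(block_sizes))
    if s >= k:
        return None  # every block is complete, so not "incomplete"
    p = next(iter(reps.values()))
    lam = next(iter(pair_counts.values()))
    return {"k": k, "n": n, "s": s, "p": p, "lambda": lam}

# Hypothetical toy design: 3 treatments, 3 blocks of size 2.
toy_design = [(1, 2), (1, 3), (2, 3)]
toy_params = bibd_parameters(toy_design)
```

    Any valid result should also satisfy the counting identities n*s = k*p and lambda*(k-1) = p*(s-1), which are useful as a hand check.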
    
    

    4.  Consider the data in Table 7.25 of page 340 of the text on the amount of luteinizing hormone (LH) in rats that live either in constant light or else with 14 hours of light alternating with 10 hours of darkness. Five different levels of a luteinizing release factor (LRF) were also considered. Six rats were studied for each of 5 levels of LRF and each of two light regimes (constant or alternating), for a total of 60 rats, in a two-way experimental layout with six observations per cell.

    (i)  Use the large-sample approximation for the Mack-Skillings procedure (Section 7.9) to test whether the light regime has a significant effect on LH level, controlling for the amount of LRF. (See Problem 103 on page 339 of the text for more details about the data.)
    (ii)  Suppose that you (incorrectly) ignored the blocking due to LRF levels and considered the data as a one-way layout with 30 rats in each treatment group and applied either the Kruskal-Wallis or the Wilcoxon rank-sum test. Compute the corresponding P-value, also using the large-sample approximation.
    (iii)  Compare the P-values in parts (i) and (ii). What is the effect of the blocking? Which P-value would you have more confidence in? Why?
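
    For orientation, a Mack-Skillings-type statistic for a two-way layout with the same number c of observations in every cell can be sketched as below. This is a hedged illustration, not the text's procedure verbatim: the function names and toy data are hypothetical, ranks are taken within each block using midranks, no tie correction is applied to the statistic, and the chi-square P-value helper covers only one degree of freedom (the case of two treatments). Check the formula against Section 7.9 before use.

```python
import math

def midranks(values):
    # Ranks 1..len(values), with tied values sharing their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def mack_skillings(data):
    # data[i][j] = list of c observations for block i, treatment j.
    n, k, c = len(data), len(data[0]), len(data[0][0])
    big_n = n * k * c
    s = [0.0] * k
    for block in data:
        pooled = [v for cell in block for v in cell]
        ranks = midranks(pooled)  # ranks within this block only
        for j in range(k):
            s[j] += sum(ranks[j * c:(j + 1) * c]) / c
    center = (big_n + n) / 2.0  # null expectation of each S_j
    return 12.0 / (k * (big_n + n)) * sum((sj - center) ** 2 for sj in s)

def chi2_df1_upper_tail(x):
    # P(chi-square with 1 degree of freedom >= x).
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical toy layout: 2 blocks, 2 treatments, 2 observations per cell.
toy = [[[1, 2], [3, 4]],
       [[1, 2], [3, 4]]]
ms_stat = mack_skillings(toy)
ms_p = chi2_df1_upper_tail(ms_stat)
```

    Ranking within blocks is what removes the block (LRF) effect; ranking all observations together, as in part (ii), leaves that variation in the comparison.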
    
    

    5.  Consider the paired (X,Y) data in Table 1.

    (i)  Find the coefficients beta and mu in the least-squares regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0, assuming that the data (X_i,Y_i) are normal, using Student-t methods?
    (ii)  Find the coefficients beta and mu in the regression line Y_i=beta*X_i+mu using Theil's nonparametric procedure (see Section 9.2 in the text). Given beta from Theil's method, estimate the intercept mu as the median of the n=20 residuals Y_i-beta*X_i.
    Find the P-value for H_0:beta=0 using the method of Section 9.1. As in Problem 1, find two-sided P-values using both the appropriate table in the text and the large-sample approximation. Don't forget appropriate tie corrections. Bracket P-values from the table if need be. Note that there are no X-X ties in the data. (This is not difficult to do by hand, but see also the program NonparmRegr.c on the Math408 Web site.)
    (iii)  Compare the two regression lines in parts (i) and (ii) by computing (A) the sum of the absolute value of the errors S_1=Sum(i=1,n) |Y_i-beta*X_i-mu| and (B) the sum of the squares of the errors S_2=Sum(i=1,n) (Y_i-beta*X_i-mu)^2 in both cases. Which of the two regression lines does better under criterion (A)? under criterion (B)? By what amounts?
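
    The two fits and the two criteria in part (iii) can be sketched as follows. This is a minimal illustration with hypothetical function names and hypothetical toy data (not Table 1): the Theil slope is taken as the median of all pairwise slopes, and the intercept as the median of the residuals Y_i-beta*X_i, as described in part (ii).

```python
from statistics import median

def theil_line(x, y):
    # Slope: median of the pairwise slopes (pairs with equal X are skipped).
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    beta = median(slopes)
    # Intercept: median of the residuals Y_i - beta*X_i.
    mu = median([y[i] - beta * x[i] for i in range(n)])
    return beta, mu

def least_squares_line(x, y):
    # Ordinary least-squares slope and intercept.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx
    return beta, ybar - beta * xbar

def error_sums(x, y, beta, mu):
    # Returns (S_1, S_2): sum of absolute errors and sum of squared errors.
    res = [yi - beta * xi - mu for xi, yi in zip(x, y)]
    return sum(abs(r) for r in res), sum(r * r for r in res)

# Hypothetical toy data with one outlying response.
x_toy = [1, 2, 3, 4, 5]
y_toy = [1, 3, 5, 7, 100]
theil_beta, theil_mu = theil_line(x_toy, y_toy)
ls_beta, ls_mu = least_squares_line(x_toy, y_toy)
s1_theil, s2_theil = error_sums(x_toy, y_toy, theil_beta, theil_mu)
s1_ls, s2_ls = error_sums(x_toy, y_toy, ls_beta, ls_mu)
```

    On this toy data the Theil line follows the four collinear points and ignores the outlier, so it has the smaller S_1, while the least-squares line necessarily has the smaller S_2.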
    
    

    6.  For the data in Table 1, as in the previous problem,

    (i)  For the regression line in part (i) of the previous problem, find the corresponding 95% confidence interval for beta, again using Student-t methods.
    (ii)  For the regression line in part (ii) of the previous problem, use the Hodges-Lehmann-like method of Section 9.3 to find an exact nonparametric confidence interval for the true value of beta. Choose a coverage probability as close as possible to 95% (and state it). Use either the appropriate table in the text or the large-sample approximation.
    
    

    7.  The method that we used in Problem 2 for multiple comparisons with a control is not the method that is stressed in Section 6.7 of the book. In fact, the approximation used in Problem 2 is not very accurate if treatment group sizes are unequal and 5 or fewer.

    Write a computer program to estimate the true multiple-comparison-corrected one-sided P-values with a control that were approximated in Problem 2. Specifically, estimate the probability that the random value max(j=2,5) Z_{1j} under permutations is greater than or equal to each of the observed values of Z_{1i}, using N=100,000 random permutations of the 45 values in Table 2 while preserving the treatment-group sizes.
    Find 95% confidence intervals for each of the true one-sided P-values. For the P-values that are significant, do any of these confidence intervals contain the approximate P-values? Do the approximate P-values in Problem 2 seem like a good approximation of the true P-values?
    (Hints: (i) See the sample program OneWay3.c on the Math408 Web site. (ii) You may want to set N=100 in your program until you are sure that it is running correctly with output in the form that you want.)
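
    The overall shape of such a program can be sketched as follows. This is an illustrative Python sketch, not the course's OneWay3.c and not a complete solution: the function names and toy data are hypothetical, the rank-sum score uses midranks with the no-ties variance (an assumption), and the 95% confidence interval is the simple normal-approximation interval for a binomial proportion (also an assumption).

```python
import math
import random

def midranks(values):
    # Ranks 1..len(values), with tied values sharing their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def rank_sum_z(control, treated):
    # Normalized Wilcoxon rank-sum score of `treated` against `control`.
    m, n = len(control), len(treated)
    ranks = midranks(list(control) + list(treated))
    w = sum(ranks[m:])
    mean = n * (m + n + 1) / 2.0
    var = m * n * (m + n + 1) / 12.0  # no-ties variance (assumption)
    return (w - mean) / math.sqrt(var)

def max_z_permutation_pvalues(groups, n_perm, seed=0):
    # groups[0] is the control; returns (z_obs, p_hat, ci_low, ci_high)
    # for each treatment group, where p_hat estimates
    # P(max_j Z_1j >= z_obs) under random relabeling of the pooled data.
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    observed = [rank_sum_z(groups[0], g) for g in groups[1:]]
    counts = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(pooled)
        parts, start = [], 0
        for s in sizes:  # cut the shuffled data into the original sizes
            parts.append(pooled[start:start + s])
            start += s
        zmax = max(rank_sum_z(parts[0], part) for part in parts[1:])
        for idx, z_obs in enumerate(observed):
            if zmax >= z_obs - 1e-12:  # small tolerance for rounding
                counts[idx] += 1
    results = []
    for z_obs, count in zip(observed, counts):
        p_hat = count / n_perm
        half = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
        results.append((z_obs, p_hat,
                        max(0.0, p_hat - half), min(1.0, p_hat + half)))
    return results

# Hypothetical toy data (NOT Table 2) and a small number of permutations.
toy_groups = [[1, 2, 3], [4, 5, 6], [2.5, 3.5, 0.5]]
toy_results = max_z_permutation_pvalues(toy_groups, n_perm=200, seed=1)
```

    Because every treatment group is compared against the same permutation maximum, a larger observed score can never receive a larger estimated P-value than a smaller one.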
    
    
