Math 408 Take-Home Final - Spring 2007

TAKEHOME FINAL due Wednesday May 9 by 5:30 PM

Text references are to Hollander and Wolfe, "Nonparametric Statistical Methods", 2nd edn.

IN THE FOLLOWING: Do Problems 2 and 3 by hand. Problems 1, 4, and 5 can be done either by hand or by using one or more computer programs.

NOTES: Hand in your homework in the following order:
(a) Your written answers to all problems, with references as needed to part (c) below,
(b) The computer source for any computer programs that you used,
(c) All output from the programs in part (b).
This will put the emphasis on what you think the answers should be and on your evidence for this. If a reader thinks that your answers are reasonable, then he or she may or may not want to look at your actual output and computer programs.

Five problems. Not all parts of problems are of equal weight.

1. Widgets (an industrial product) were manufactured using widget coats from five different suppliers. Quality scores for 42 widgets along with the supplier's product name are listed in Table 1.

      Table 1: Widget quality by widget coat brand
  -----------------------------------------------------------
  Acme (n=9):     48   62   46   56   48   68   50   38   62
  Zenith (n=8):   74   30   52   50   56   38   54   24
  QQ21 (n=8):     52   66   74   78   98   58   52   72
  T37 (n=9):      88   90  108   62   88   66   90   72   98
  Nadir (n=8):    66   56   56   46   96   42   58   86

(a) Is there a significant variation in widget quality as a function of the widget coat supplier? Carry out the Kruskal-Wallis test to find out. Use the large-sample approximation.
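As an illustration only, the large-sample Kruskal-Wallis computation can be sketched in plain Python as follows. The function names (midranks, chi2_sf, kruskal_wallis) and the toy data are assumptions made for this sketch; they are not from the text and the data are not those of Table 1. The tie correction divides H by 1 - Sum(t^3-t)/(N^3-N), and the P-value comes from the chi-square distribution with k-1 degrees of freedom.

```python
import math
from collections import Counter

def midranks(values):
    """Return the midrank (average rank) of each value, in input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (finite series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

def kruskal_wallis(groups):
    """Return (tie-corrected H, large-sample P-value) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum of this sample
        h += r_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)

# Toy data (hypothetical): three samples of size 3
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, p_value)
```

For real data each inner list would hold the scores for one treatment.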

(b) Which PAIRS of treatments (i.e., pairs of widget coat brands) are significantly different, allowing for multiple comparisons? For each significantly different pair, what is the (multiple-comparison-corrected) P-value? Use the large-sample approximation for the multiple-comparison method based on pairwise Wilcoxon rank-sum scores discussed in Section 6.5. What can you say about the pairwise differences that are not significant by this test?
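A rough sketch of one way to organize such pairwise comparisons is given below. It assumes that the method in question standardizes each pairwise rank sum, multiplies by sqrt(2), and refers the absolute value to the distribution of the range of k independent standard normal variables; check this against the text before relying on it. All function names and the toy data are made up for the sketch, and the range probability is found by simple numerical integration.

```python
import math
from collections import Counter

def midranks(values):
    """Return the midrank (average rank) of each value, in input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def range_sf(q, k, steps=4000, lim=8.0):
    """P(range of k independent N(0,1) variables > q), by Simpson's rule."""
    if q <= 0:
        return 1.0
    h = 2 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        z = -lim + i * h  # z plays the role of the largest of the k values
        dens = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        f = dens * (norm_cdf(z) - norm_cdf(z - q)) ** (k - 1)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return 1 - k * total * h / 3

def pairwise_wstar(x, y):
    """sqrt(2) times the standardized rank sum of y in the pooled (x, y) sample."""
    m, n = len(x), len(y)
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[m:])
    mean = n * (m + n + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = m * n / 12 * ((m + n + 1) - ties / ((m + n) * (m + n - 1)))
    return math.sqrt(2) * (w - mean) / math.sqrt(var)

def all_pairs(samples):
    """Map each pair of sample names to (W*, multiple-comparison P-value)."""
    names, k, out = list(samples), len(samples), {}
    for a in range(k):
        for b in range(a + 1, k):
            ws = pairwise_wstar(samples[names[a]], samples[names[b]])
            out[(names[a], names[b])] = (ws, range_sf(abs(ws), k))
    return out

# Toy data (hypothetical), three treatments
result = all_pairs({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
print(result)
```

A pair would be declared different at level alpha when its P-value is below alpha.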

2. The responses Y to an input X in 20 trials are recorded in the following table.

    Table 2: Responses Y to an input X
  ----------------------------------------------------
           X      Y                X      Y
         -----------             -----------
     1.   21      7         11.   24     26
     2.   74      7         12.   70     77
     3.   84    716         13.   30      7
     4.   56      9         14.   67     29
     5.   48      7         15.   92    337
     6.   29    116         16.   99    513
     7.   61     34         17.   45    632
     8.   79     21         18.   81    128
     9.   96    153         19.   37     30
    10.   93     95         20.   71    550

(i) What is the Pearson correlation coefficient rho between X and Y for the data in the table? Are the variables Y and X significantly correlated as measured by rho? What is the (two-sided) P-value?
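For checking hand arithmetic, the Pearson coefficient and its Student-t statistic can be sketched as below. The function names and the toy data are assumptions for illustration (the data are not those of Table 2). The statistic t = r*sqrt((n-2)/(1-r^2)) is referred to a Student-t distribution with n-2 degrees of freedom; the tail probability itself is left to a table.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_t(x, y):
    """Return (r, t statistic, degrees of freedom) for H_0: rho = 0."""
    r = pearson_r(x, y)
    n = len(x)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t, n - 2

# Toy data (hypothetical)
r, t, df = pearson_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r, t, df)
```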

(ii) What is the Spearman correlation coefficient R? Are the variables Y and X significantly correlated as measured by R? What is the (two-sided) P-value? Find two-sided P-values using both the appropriate table in the text and the large-sample approximation (see Section 8.5 in the text). Don't forget the appropriate tie corrections. Bracket P-values from the table if need be.
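A minimal sketch of the large-sample Spearman computation follows, on the assumption that the tie-corrected coefficient is the Pearson correlation of the midranks and that sqrt(n-1)*R is treated as approximately standard normal. The helper names and toy data are hypothetical.

```python
import math

def midranks(values):
    """Return the midrank (average rank) of each value, in input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_large_sample(x, y):
    """Return (R, z, two-sided normal-approximation P-value)."""
    r_s = pearson_r(midranks(x), midranks(y))  # handles ties via midranks
    z = r_s * math.sqrt(len(x) - 1)
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal tail
    return r_s, z, p

# Toy data (hypothetical); y contains two tied pairs
r_s, z, p = spearman_large_sample([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r_s, z, p)
```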

(iii) What is the Kendall correlation coefficient tau? Are the variables Y and X significantly correlated as measured by the Kendall test? What is the (two-sided) P-value? Find two-sided P-values using both the appropriate table in the text and the large-sample approximation. Don't forget appropriate tie corrections. Bracket P-values from the table if need be.
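The Kendall computation can be sketched in the same spirit. The code below assumes the usual definitions: K is the sum of sign((X_j-X_i)(Y_j-Y_i)) over pairs i<j, tau is estimated by 2K/(n(n-1)), and the null variance of K carries the standard correction for tied groups in X and in Y. Names and toy data are hypothetical.

```python
import math
from collections import Counter

def kendall_large_sample(x, y):
    """Return (K, tau estimate, tie-corrected Var(K), z, two-sided P-value)."""
    n = len(x)
    k_stat = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            k_stat += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    tau = 2 * k_stat / (n * (n - 1))

    tx = list(Counter(x).values())  # sizes of tied groups among the X's
    uy = list(Counter(y).values())  # sizes of tied groups among the Y's

    def total(counts, f):
        return sum(f(t) for t in counts)

    var = (n * (n - 1) * (2 * n + 5)
           - total(tx, lambda t: t * (t - 1) * (2 * t + 5))
           - total(uy, lambda t: t * (t - 1) * (2 * t + 5))) / 18
    var += (total(tx, lambda t: t * (t - 1) * (t - 2))
            * total(uy, lambda t: t * (t - 1) * (t - 2))
            / (9 * n * (n - 1) * (n - 2)))
    var += (total(tx, lambda t: t * (t - 1))
            * total(uy, lambda t: t * (t - 1))
            / (2 * n * (n - 1)))

    z = k_stat / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return k_stat, tau, var, z, p

# Toy data (hypothetical); y contains two tied pairs
k_stat, tau, var, z, p = kendall_large_sample([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(k_stat, tau, var, z, p)
```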

3. Responses to stress were measured for four different brands of product under 5 different conditions of stress. Two different measurements were made for each combination of brand and level of stress (see Table 3).

    Table 3: Responses under Stress for Four Brands of Products
  --------------------------------------------------------------------
   Stress      Brand1         Brand2          Brand3         Brand4
     0#      3.01, 3.04      3.47, 3.10      3.85, 3.87     3.41, 3.11        
     1#      2.85, 2.51      3.49, 3.45      3.64, 3.19     3.02, 3.33    
     2#      2.62, 2.60      3.11, 2.88      3.52, 3.49     3.08, 3.11    
     3#      2.63, 2.64      2.83, 3.15      3.21, 3.65     2.96, 2.97    
     4#      2.58, 2.60      3.12, 2.71      3.28, 3.25     2.67, 3.12

(i) Using the brand as blocks, is there a significant effect due to the amount of stress? Use the large-sample approximation for the nonparametric test described in Section 7.9 to find out. If there is a significant effect due to stress, what does it appear to be due to? Which particular levels of stress appear to be associated with unusually small or large responses?

(ii) Using the level of stress as blocks, is there a significant variation in the response effect over the four brands? Use the same procedure to find out. If there is significant variation with brand, which brands appear to be associated with unusually large or small responses?
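The following sketch assumes that the test in question is the Mack-Skillings statistic for a two-way layout with c replications per cell: observations are ranked within each block, S_j is the sum over blocks of the cell rank sums divided by c, and MS = [12/(k(N+n))] * Sum_j (S_j - (N+n)/2)^2 is referred to chi-square with k-1 degrees of freedom, where N = n*k*c. Verify this against the text; the function names, data layout, and toy data are made up for illustration.

```python
import math

def midranks(values):
    """Return the midrank (average rank) of each value, in input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (finite series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.sqrt(2 / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail

def mack_skillings(data):
    """data[i][j] is the list of c replicates for block i, treatment j.

    Returns (MS statistic, list of S_j, large-sample P-value)."""
    n, k, c = len(data), len(data[0]), len(data[0][0])
    big_n = n * k * c
    s = [0.0] * k
    for block in data:
        pooled = [v for cell in block for v in cell]
        ranks = midranks(pooled)  # ranks within this block only
        for j in range(k):
            s[j] += sum(ranks[j * c:(j + 1) * c]) / c
    center = (big_n + n) / 2
    ms = 12 / (k * (big_n + n)) * sum((sj - center) ** 2 for sj in s)
    return ms, s, chi2_sf(ms, k - 1)

# Toy data (hypothetical): 2 blocks, 2 treatments, 2 replicates per cell
ms, s, p = mack_skillings([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])
print(ms, s, p)
```

The S_j values that lie far from (N+n)/2 indicate which treatments contribute most to a large MS.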

4. Consider the paired (X,Y) data in Table 2.

(i) Find the coefficients beta and mu in the least-squares regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0, assuming that the data (X_i,Y_i) are normal, using Student-t methods?
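A minimal sketch of the least-squares fit and the t statistic for the slope is shown below. The function name and toy data are assumptions for illustration. The slope is Sxy/Sxx, the intercept is Ybar - beta*Xbar, and t = beta/SE(beta) with n-2 degrees of freedom, where SE(beta)^2 = [SSE/(n-2)]/Sxx.

```python
import math

def least_squares(x, y):
    """Return (beta, mu, t statistic for H_0: beta=0, degrees of freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    mu = my - beta * mx
    sse = sum((b - beta * a - mu) ** 2 for a, b in zip(x, y))
    se_beta = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return beta, mu, beta / se_beta, n - 2

# Toy data (hypothetical)
beta, mu, t, df = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(beta, mu, t, df)
```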

(ii) Find the coefficients beta and mu in the regression line Y_i=beta*X_i+mu using Theil's nonparametric procedure (see Section 9.2 in the text). Given beta from Theil's method, estimate the intercept mu as either the median of the n=20 residuals Y_i-beta*X_i or else as the median of the 210 Walsh averages of the 20 residuals. Find the P-value for H_0:beta=0 using the large-sample approximation described in Section 9.1.
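Theil's slope estimate is the median of the pairwise slopes (Y_j-Y_i)/(X_j-X_i), i<j. A short sketch is given below with both intercept options; for the second option it takes the median of the pairwise averages (A_i+A_j)/2, i<=j, of the residuals, so that the result is on the same scale as the residuals. The function names and toy data are hypothetical.

```python
import statistics

def theil_slope(x, y):
    """Median of the pairwise slopes over pairs with distinct X values."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    return statistics.median(slopes)

def walsh_median(values):
    """Median of the n(n+1)/2 pairwise averages, each value paired with itself too."""
    n = len(values)
    averages = [(values[i] + values[j]) / 2
                for i in range(n) for j in range(i, n)]
    return statistics.median(averages)

def theil_line(x, y, use_walsh=False):
    """Return (beta, mu) with mu from the residuals Y_i - beta*X_i."""
    beta = theil_slope(x, y)
    residuals = [b - beta * a for a, b in zip(x, y)]
    mu = walsh_median(residuals) if use_walsh else statistics.median(residuals)
    return beta, mu

# Toy data (hypothetical)
beta, mu = theil_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(beta, mu)
```

With n=20 residuals, walsh_median works with 20*21/2 = 210 pairwise averages.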

(iii) Compare the two regression lines in parts (i) and (ii) by computing (A) the sum of the absolute values of the errors S_1=Sum(i=1,n) |Y_i-beta*X_i-mu| and (B) the sum of the squares of the errors S_2=Sum(i=1,n) (Y_i-beta*X_i-mu)^2 in both cases. Which of the two regression lines does better under criterion (A)? Under criterion (B)? By what amounts?
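The two criteria translate directly into code. The sketch below uses a hypothetical function name and toy data; it returns S_1 and S_2 for a given line so that any two fitted lines can be compared on the same data.

```python
def fit_criteria(x, y, beta, mu):
    """Return (S_1, S_2): sums of absolute and of squared errors for the line."""
    errors = [b - beta * a - mu for a, b in zip(x, y)]
    s1 = sum(abs(e) for e in errors)
    s2 = sum(e * e for e in errors)
    return s1, s2

# Toy data (hypothetical) with the line Y = 0.6*X + 2.2
s1, s2 = fit_criteria([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2)
print(s1, s2)
```

By construction the least-squares line has the smallest possible S_2, so the interesting comparison is usually how much it gives up under S_1.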

5. Consider the paired (X,Y) data in Table 2.

(i) Find the coefficients beta and mu using the rank regression method discussed in Section 9.6 in the text. Given the slope beta, estimate the intercept mu as either the median of the n=20 residuals Y_i-beta*X_i or else as the median of the 210 Walsh averages of the 20 residuals.
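One possible sketch of a rank-based slope is given below. It assumes that the method minimizes the dispersion D(b) = Sum_i (rank(e_i) - (n+1)/2) * e_i of the residuals e_i = Y_i - b*X_i (Wilcoxon scores). For a single predictor, D(b) equals half the sum over pairs of |X_j-X_i| * |s_ij - b|, where s_ij is the pairwise slope, so a minimizer is a weighted median of the pairwise slopes with weights |X_j-X_i|. This is not the course's RankRegress program, which may proceed differently; all names and the toy data are hypothetical.

```python
import statistics

def dispersion(b, x, y):
    """D(b) = sum over sorted residuals of (rank - (n+1)/2) * residual."""
    e = sorted(yi - b * xi for xi, yi in zip(x, y))
    n = len(e)
    return sum((k - (n + 1) / 2) * ek for k, ek in enumerate(e, start=1))

def rank_slope(x, y):
    """Weighted median of pairwise slopes, weights |X_j - X_i|; minimizes D(b)."""
    n = len(x)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] != x[j]:
                slope = (y[j] - y[i]) / (x[j] - x[i])
                pairs.append((slope, abs(x[j] - x[i])))
    pairs.sort()
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for slope, w in pairs:
        cumulative += w
        if cumulative >= half:  # first slope reaching half the total weight
            return slope

def rank_line(x, y):
    """Return (beta, mu), with mu the median of the residuals Y_i - beta*X_i."""
    beta = rank_slope(x, y)
    mu = statistics.median(yi - beta * xi for xi, yi in zip(x, y))
    return beta, mu

# Toy data (hypothetical)
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
beta, mu = rank_line(xs, ys)
print(beta, mu, dispersion(beta, xs, ys))
```

Because D(b) is convex and piecewise linear in b, evaluating dispersion at nearby slopes is a quick check that the returned value is a minimizer.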

(ii) Compare the regression line in part (i) with the two regression lines in Problem 4. How does it compare using criterion (A)? Using criterion (B)?

(Hint: See the program RankRegress on the Math408 Web site.)