Math 408 Final Exam

Due at 4:30pm, May 9, in Room 100, Cupples I

1. Below are data gathered from a trial of a new drug, looking at time until remission (in weeks) in patients with leukemia.

Control: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12

Drug: 6, 6, 6, 6*, 7, 9*, 10*, 11, 11*, 16, 17*, 19*, 20*, 25, 32*, 32*

Remark: censored observations are indicated by the symbol ‘*’.

Test the null hypothesis that the distributions of survival times are the same in the two groups. Also provide the Kaplan-Meier curves for both groups.
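One possible way to organize the computation for Problem 1 is sketched below in Python. It is only a sketch under stated assumptions: the problem does not prescribe a particular test, and the log-rank test used here is one common choice among several. The function names (kaplan_meier, logrank) are illustrative and are not from the course programs. The data are copied from the lists above, with event flag 0 for the starred (censored) times.

```python
from math import erfc, sqrt

# Data copied from Problem 1 as (time in weeks, event flag).
# Assumption: flag 1 = observed event, flag 0 = censored (starred) time.
control = [(t, 1) for t in [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12]]
drug = [(6, 1), (6, 1), (6, 1), (6, 0), (7, 1), (9, 0), (10, 0), (11, 1),
        (11, 0), (16, 1), (17, 0), (19, 0), (20, 0), (25, 1), (32, 0), (32, 0)]

def kaplan_meier(sample):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in sample if e}):
        at_risk = sum(1 for u, _ in sample if u >= t)
        events = sum(1 for u, e in sample if u == t and e)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve

def logrank(a, b):
    """Two-sample log-rank chi-square statistic (1 df) and its P-value."""
    pooled = a + b
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in pooled if e}):
        n = sum(1 for u, _ in pooled if u >= t)      # at risk, both groups
        n1 = sum(1 for u, _ in a if u >= t)          # at risk, group a
        d = sum(1 for u, e in pooled if u == t and e)
        d1 = sum(1 for u, e in a if u == t and e)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 given the margins
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

chi2, p = logrank(control, drug)
```

The step function returned by kaplan_meier can be passed to any plotting tool; the curve stays at 1 before the first event time and is constant between event times.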

2. The data set here contains two columns x and y.

(i) What is the Pearson correlation coefficient rho between X and Y? Are the variables Y and X significantly correlated as measured by rho (under the assumption of normally distributed data)? What is the (two-sided) P-value?
(ii) What is the Kendall correlation coefficient tau? Are the variables Y and X significantly correlated as measured by the Kendall test? What is the (two-sided) P-value? Find the two-sided P-value using the large-sample approximation, both with and without the appropriate tie correction. By how much does the tie correction change the two-sided large-sample P-value?
(iii) What is the Spearman correlation coefficient R? Are the variables Y and X significantly correlated as measured by R? What is the (two-sided) P-value? Use the large-sample approximation, ignoring the tie correction if you prefer. (See Section 8.5 in the text.)
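For part (ii), the large-sample Kendall test with and without the tie correction can be sketched as follows. This is a hedged illustration, not the course program NonParmCorr: the function name kendall_large_sample is made up here, the variance formula is the standard tie-corrected null variance of the Kendall statistic K, and the small sample at the end is hypothetical and is not the Problem 2 data set.

```python
from collections import Counter
from math import erfc, sqrt

def kendall_large_sample(x, y):
    """Return (K, tau, P without tie correction, P with tie correction).

    K is the sum of sign(x_j - x_i) * sign(y_j - y_i) over pairs i < j,
    tau = 2K / (n(n-1)), and both P-values are two-sided normal
    approximations. Requires n >= 3.
    """
    n = len(x)
    sign = lambda v: (v > 0) - (v < 0)
    k = sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
            for i in range(n) for j in range(i + 1, n))
    tau = 2 * k / (n * (n - 1))
    t = Counter(x).values()  # sizes of tied groups among the x values
    u = Counter(y).values()  # sizes of tied groups among the y values
    v_plain = n * (n - 1) * (2 * n + 5) / 18
    v_tied = ((n * (n - 1) * (2 * n + 5)
               - sum(a * (a - 1) * (2 * a + 5) for a in t)
               - sum(b * (b - 1) * (2 * b + 5) for b in u)) / 18
              + sum(a * (a - 1) * (a - 2) for a in t)
              * sum(b * (b - 1) * (b - 2) for b in u)
              / (9 * n * (n - 1) * (n - 2))
              + sum(a * (a - 1) for a in t) * sum(b * (b - 1) for b in u)
              / (2 * n * (n - 1)))
    p_value = lambda v: erfc(abs(k) / sqrt(2 * v))  # two-sided, z = K / sqrt(v)
    return k, tau, p_value(v_plain), p_value(v_tied)

# Hypothetical sample with one tie in each variable (not the Problem 2 data).
k, tau, p_plain, p_tied = kendall_large_sample([1, 1, 2, 3, 4, 5],
                                               [1, 2, 2, 3, 4, 5])
```

The difference p_plain - p_tied is the quantity asked for at the end of part (ii); when neither variable has ties the two P-values coincide.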

3. For the data set in Problem 2, perform a nonparametric regression of y on x. You may choose which method to use, but cross-validation must be used to choose the tuning parameter. Provide a scatter plot with the nonparametric regression curve superimposed. Then comment on the validity of the tests based on the different correlation coefficients in Problem 2.
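As one illustration of choosing a tuning parameter by cross-validation, the sketch below uses a Nadaraya-Watson kernel smoother with a Gaussian kernel and picks the bandwidth h by leave-one-out cross-validation. The choice of method, the function names, the bandwidth grid, and the synthetic data are all assumptions made for the example; any other smoother (local linear regression, smoothing splines, and so on) could be tuned the same way.

```python
from math import exp, sin

def nw_predict(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0 with Gaussian kernel and bandwidth h.

    If skip is an index, that observation is left out (used for cross-validation).
    """
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = exp(-0.5 * ((x0 - xi) / h) ** 2)
        num += w * yi
        den += w
    return num / den

def loocv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    try:
        return sum((ys[i] - nw_predict(xs[i], xs, ys, h, skip=i)) ** 2
                   for i in range(len(xs))) / len(xs)
    except ZeroDivisionError:
        # All kernel weights underflowed to zero: h is too small to be usable.
        return float("inf")

def choose_bandwidth(xs, ys, grid):
    """Return the bandwidth in grid with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loocv_score(xs, ys, h))

# Synthetic illustration only (not the Problem 2 data): a sine curve with
# a small deterministic perturbation standing in for noise.
xs = [i / 10 for i in range(30)]
ys = [sin(x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best_h = choose_bandwidth(xs, ys, grid)
curve = [(x, nw_predict(x, xs, ys, best_h)) for x in xs]
```

The list curve holds the fitted values to draw over the scatter plot. A finer grid around best_h is usually worth a second pass.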

4. A psychologist conducted an experiment to compare the effects of two stimulants. Thirteen randomly selected subjects received the first stimulant, and six randomly selected subjects received the second stimulant. The reaction times (in minutes) were measured while the subjects were under the influence of the stimulants. Test whether there is a difference between the effects of the two stimulants using both parametric and nonparametric approaches. As a statistical analyst, you also need to explain to the psychologist how to use the results of your analysis. Provide such a report.

Stimulant	Reaction Time
1	1.94
1	3.27
2	3.27
1	1.94
1	3.27
2	3.27
1	2.92
1	3.27
2	3.27
1	2.92
1	3.70
2	3.70
1	2.92
1	3.70
2	3.70
1	2.92
1	3.74
2	3.74
1	3.27
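A possible starting point for Problem 4 is sketched below: a Welch two-sample t statistic for the parametric side and a Wilcoxon rank-sum test with midranks and the tie-corrected normal approximation for the nonparametric side. These are choices made for the illustration (a pooled-variance t test or an exact permutation test would also be reasonable), and the function names are hypothetical. With samples of size 13 and 6 and many ties, the normal approximation should be treated as rough. The data are copied from the table above.

```python
from collections import Counter
from math import erfc, sqrt
from statistics import mean, variance

# Reaction times in minutes, copied from the table in Problem 4.
stim1 = [1.94, 3.27, 1.94, 3.27, 2.92, 3.27, 2.92,
         3.70, 2.92, 3.70, 2.92, 3.74, 3.27]
stim2 = [3.27, 3.27, 3.27, 3.70, 3.70, 3.74]

def welch_t(a, b):
    """Welch t statistic for mean(b) - mean(a) and its approximate df.

    The P-value is left to a t table or a statistics package.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def rank_sum(a, b):
    """Wilcoxon rank-sum W for sample b, with z and two-sided P (tie-corrected)."""
    pooled = sorted(a + b)
    first = {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)          # first 1-based position of each value
    counts = Counter(pooled)
    # Midrank = average of the positions occupied by a tied group.
    midrank = {v: first[v] + (counts[v] - 1) / 2 for v in counts}
    m, n, total = len(b), len(a), len(pooled)
    w = sum(midrank[v] for v in b)
    mean_w = m * (total + 1) / 2
    tie_term = sum(c ** 3 - c for c in counts.values()) / (total * (total - 1))
    var_w = m * n / 12 * (total + 1 - tie_term)
    z = (w - mean_w) / sqrt(var_w)
    return w, z, erfc(abs(z) / sqrt(2))

t_stat, df = welch_t(stim1, stim2)
w, z, p = rank_sum(stim1, stim2)
```

In the report to the psychologist, the two analyses should be presented side by side, together with a note on which assumptions (normality, equal variances, continuity of the data) each one relies on.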

5. Consider the paired (X,Y) data in Table 1.

    Table 1: Failure times Y and a predictor X

  ----------------------------------------------------
           X      Y                X      Y
         -----------             -----------
     1.   30      3         21.   40    168
     2.   21      5         22.   54    170
     3.   41      5         23.   27    180
     4.   83      6         24.   77    197
     5.   76      9         25.   90    217
     6.   89     17         26.   97    235
     7.   35     22         27.   93    250
     8.   78     23         28.   80    354
     9.   39     27         29.   73    368
    10.   38     31         30.   67    441
    11.   57     31         31.   72    486
    12.   98     38         32.   88    622
    13.   64     40         33.   94    642
    14.   34     42         34.   84    659
    15.   55     56         35.   86    773
    16.   44     62         36.   60    850
    17.   22     64         37.   99    902
    18.   56     66         38.   46   1090
    19.   74    142         39.   66   1658
    20.   43    159         40.   75   4032

(i) Find the coefficients beta and mu in the least-squares regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0, assuming that the data (X_i,Y_i) are normal?
(ii) Find the coefficients beta and mu in the regression line Y_i=beta*X_i+mu using Theil's nonparametric procedure. Given beta from Theil's method, estimate the intercept mu as the median of the 820 (=40*41/2) Walsh averages of the 40 residuals. Find the P-value for H_0:beta=0 using the large-sample approximation described in Section 9.1.
(iii) Compare the two regression lines in parts (i) and (ii) by computing
(A) the average absolute error, which is S_1/n for S_1=Sum(i=1,n) |Y_i-beta*X_i-mu| and
(B) the RMS error, which is the square root of S_2/n for S_2=Sum(i=1,n) (Y_i-beta*X_i-mu)^2
Which of the two regression lines does better under criterion (A)? Under criterion (B)?
(iv) Find the coefficients beta and mu using the rank regression method discussed in Section 9.6 in the text. Given the slope beta, estimate the intercept mu as the median of the 820 (=40*41/2) Walsh averages of the 40 residuals.
(v) Compare the regression line in part (iv) with the two regression lines in part (iii). How does it compare using criterion (A)? Using criterion (B)?
(Hint: See the programs RankRegression, TheilRegression, and NonParmCorr on the Math408 Web site.)
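For readers without access to the course programs, parts (i) through (iii) can be sketched in Python as below. This is an independent illustration, not the code of RankRegression or TheilRegression: the function names are made up, the Theil slope is taken as the median of the pairwise slopes over pairs with distinct X values, and the P-values and the rank regression of part (iv) are not covered. The data are copied from Table 1.

```python
from math import sqrt
from statistics import fmean, median

# Data copied from Table 1 (observations 1 through 40).
X = [30, 21, 41, 83, 76, 89, 35, 78, 39, 38, 57, 98, 64, 34, 55, 44, 22, 56,
     74, 43, 40, 54, 27, 77, 90, 97, 93, 80, 73, 67, 72, 88, 94, 84, 86, 60,
     99, 46, 66, 75]
Y = [3, 5, 5, 6, 9, 17, 22, 23, 27, 31, 31, 38, 40, 42, 56, 62, 64, 66, 142,
     159, 168, 170, 180, 197, 217, 235, 250, 354, 368, 441, 486, 622, 642,
     659, 773, 850, 902, 1090, 1658, 4032]

def least_squares(x, y):
    """Least-squares slope beta and intercept mu for y = beta*x + mu."""
    xb, yb = fmean(x), fmean(y)
    beta = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
            / sum((a - xb) ** 2 for a in x))
    return beta, yb - beta * xb

def walsh_intercept(x, y, beta):
    """Median of the n(n+1)/2 Walsh averages of the residuals y - beta*x."""
    r = [b - beta * a for a, b in zip(x, y)]
    n = len(r)
    return median((r[i] + r[j]) / 2 for i in range(n) for j in range(i, n))

def theil(x, y):
    """Theil slope (median of pairwise slopes) with Walsh-average intercept."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[i] != x[j]]
    beta = median(slopes)
    return beta, walsh_intercept(x, y, beta)

def errors(x, y, beta, mu):
    """Return (average absolute error S_1/n, RMS error sqrt(S_2/n))."""
    res = [b - beta * a - mu for a, b in zip(x, y)]
    return fmean(abs(r) for r in res), sqrt(fmean(r * r for r in res))

ls_line = least_squares(X, Y)
theil_line = theil(X, Y)
ls_err = errors(X, Y, *ls_line)
theil_err = errors(X, Y, *theil_line)
```

Comparing ls_err with theil_err component by component gives the answer to part (iii). The least-squares line cannot lose under criterion (B), since it minimizes S_2 by construction; criterion (A) is the one worth inspecting.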