Due at 4:30pm, May 9, in Room 100, Cupples I
1. Below are data gathered from a trial of a new drug, looking at time until remission (in weeks) in patients with leukemia.
Control: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12
Drug: 6, 6, 6, 6*, 7, 9*, 10*, 11, 11*, 16, 17*, 19*, 20*, 25, 32*, 32*
Remark: censored observations are indicated by the symbol ‘*’.
Test the null hypothesis that the distributions of survival times are the same in the two groups. Also provide the Kaplan-Meier curves for both groups.
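As one possible starting point for Problem 1, the sketch below computes the Kaplan-Meier product-limit estimate and a two-sample log-rank statistic using only the Python standard library. The function names `kaplan_meier` and `logrank` are illustrative and do not come from the course materials, and the log-rank test is only one of several tests that could be used here. The data are entered exactly as listed above, with censored times flagged as `False`.

```python
import math

# Data from Problem 1 as (time in weeks, event); event=False marks a censored ('*') time.
control = [(t, True) for t in
           [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12]]
drug = [(6, True), (6, True), (6, True), (6, False), (7, True), (9, False),
        (10, False), (11, True), (11, False), (16, True), (17, False),
        (19, False), (20, False), (25, True), (32, False), (32, False)]


def kaplan_meier(sample):
    """Return [(t, S(t))] at each distinct uncensored time t."""
    curve, surv = [], 1.0
    for t in sorted({u for u, event in sample if event}):
        at_risk = sum(1 for u, _ in sample if u >= t)
        events = sum(1 for u, e in sample if u == t and e)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


def logrank(a, b):
    """Log-rank chi-square statistic (1 df) and its upper-tail p-value."""
    pooled = a + b
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({u for u, event in pooled if event}):
        n1 = sum(1 for u, _ in a if u >= t)        # at risk in sample a
        n = sum(1 for u, _ in pooled if u >= t)    # at risk overall
        d1 = sum(1 for u, e in a if u == t and e)  # events in sample a
        d = sum(1 for u, e in pooled if u == t and e)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # For 1 df, P(chi-square > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


km_control = kaplan_meier(control)
km_drug = kaplan_meier(drug)
chi2, p_value = logrank(control, drug)

for label, curve in (("Control", km_control), ("Drug", km_drug)):
    print(label)
    for t, s in curve:
        print(f"  t = {t:2d}  S(t) = {s:.4f}")
print(f"log-rank chi-square = {chi2:.3f}, p = {p_value:.5f}")
```

The printed (t, S(t)) pairs are the jump points of the step functions, so they can be passed to any plotting tool to draw the two curves.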
2. The data set here contains two columns x and y.
(i) What is the Pearson correlation coefficient rho between X and Y? Are the variables Y and X significantly correlated as measured by rho (under the assumption of normally distributed data)? What is the (two-sided) P-value?
(ii) What is the
(iii) What is the Spearman correlation coefficient R? Are the variables Y and X significantly correlated as measured by R? What is the (two-sided) P-value? Use the large-sample approximation, ignoring the tie correction if you prefer. (See Section 8.5 in the text.)
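A minimal sketch of the Pearson and Spearman calculations follows, using only the Python standard library. It assumes that the large-sample approximation for Spearman's R is z = R*sqrt(n-1) referred to a standard normal, which should be checked against Section 8.5, and it ignores the tie correction. The Problem 2 data set is not reproduced in this document, so `x_demo` and `y_demo` are hypothetical toy values, and all function names are illustrative.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_t(r, n):
    """t statistic (n - 2 df) for H0: rho = 0 under normality; needs |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def spearman_large_sample(xs, ys):
    """Spearman R, z = R * sqrt(n - 1), and a two-sided normal p-value."""
    r_s = pearson_r(midranks(xs), midranks(ys))
    z = r_s * math.sqrt(len(xs) - 1)
    return r_s, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy data, NOT the Problem 2 data set.
x_demo = [1, 2, 3, 4, 5, 6]
y_demo = [1, 4, 9, 16, 25, 36]

r = pearson_r(x_demo, y_demo)
t_stat = pearson_t(r, len(x_demo))
r_s, z, p_spearman = spearman_large_sample(x_demo, y_demo)
print(f"Pearson r = {r:.4f}, t = {t_stat:.3f} on {len(x_demo) - 2} df")
print(f"Spearman R = {r_s:.4f}, z = {z:.3f}, two-sided p = {p_spearman:.4f}")
```

The Pearson P-value is left as a t statistic here because the standard library has no t distribution; it can be read from a t table with n - 2 degrees of freedom.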
3. For the data set in Problem 2, perform a nonparametric regression of y on x. You are free to choose the method, but cross-validation must be used to choose the tuning parameter. Provide a scatter plot with the nonparametric regression curve superimposed. Then comment on the validity of the tests based on the different correlation coefficients in Problem 2.
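One way to meet the cross-validation requirement is sketched below: a Nadaraya-Watson kernel regression with a Gaussian kernel, where the bandwidth h is the tuning parameter and is chosen by leave-one-out cross-validation over a grid. This is only one of many admissible methods, the function names are illustrative, and the data are simulated (hypothetical) because the Problem 2 data set is not reproduced here.

```python
import math
import random


def nw_predict(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0 (Gaussian kernel, bandwidth h).

    If `skip` is an index, that observation is left out of the fit.
    """
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x0 - x) / h) ** 2)
        num += w * y
        den += w
    return num / den if den > 0 else float("nan")


def loocv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        pred = nw_predict(x, xs, ys, h, skip=i)
        if math.isnan(pred):  # all kernel weights underflowed
            return float("inf")
        total += (y - pred) ** 2
    return total / len(xs)


def choose_bandwidth(xs, ys, grid):
    """Return the grid value with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loocv_score(xs, ys, h))


# Simulated toy data (hypothetical): a sine curve plus Gaussian noise.
rng = random.Random(0)
xs = [i / 20 for i in range(41)]
ys = [math.sin(3 * x) + rng.gauss(0, 0.2) for x in xs]

grid = [0.05, 0.1, 0.2, 0.4, 0.8]
h_best = choose_bandwidth(xs, ys, grid)
fitted = [nw_predict(x, xs, ys, h_best) for x in xs]

print("chosen bandwidth:", h_best)
for x, f in list(zip(xs, fitted))[::10]:
    print(f"  x = {x:.2f}  fitted = {f:.3f}")
```

The `fitted` values evaluated on a fine grid of x give the curve to draw over the scatter plot.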
4. A psychologist conducted an experiment to compare the effects of two stimulants. Thirteen randomly selected subjects received the first stimulant, and six randomly selected subjects received the second stimulant. The reaction times (in minutes) were measured while the subjects were under the influence of the stimulants. Test whether there is a difference between the effects of the two stimulants using both parametric and nonparametric approaches. As the statistical analyst, you also need to explain to the psychologist how to interpret and use the results of your analysis. Provide such a report.
Stimulant | Reaction Time
1 | 1.94
1 | 3.27
2 | 3.27
1 | 1.94
1 | 3.27
2 | 3.27
1 | 2.92
1 | 3.27
2 | 3.27
1 | 2.92
1 | 3.70
2 | 3.70
1 | 2.92
1 | 3.70
2 | 3.70
1 | 2.92
1 | 3.74
2 | 3.74
1 | 3.27
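The two approaches asked for in Problem 4 could be sketched as follows, using only the Python standard library. The parametric part assumes equal variances (a pooled two-sample t statistic), and the nonparametric part is the Wilcoxon rank-sum test with midranks and a tie-corrected normal approximation. Both choices are assumptions rather than the required method, and the function names are illustrative. The data are the 13 and 6 reaction times from the table above.

```python
import math
from statistics import mean, variance

# Reaction times (minutes) from the Problem 4 table, grouped by stimulant.
stim1 = [1.94, 1.94, 2.92, 2.92, 2.92, 2.92, 3.27,
         3.27, 3.27, 3.27, 3.70, 3.70, 3.74]
stim2 = [3.27, 3.27, 3.27, 3.70, 3.70, 3.74]


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def rank_sum(a, b):
    """Rank sum W of sample b, tie-corrected z, and two-sided normal p-value."""
    pooled = sorted(a + b)
    n = len(pooled)
    first, count = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)          # first position of each tied block
        count[v] = count.get(v, 0) + 1    # size of each tied block
    midrank = {v: first[v] + (count[v] - 1) / 2 for v in count}
    w = sum(midrank[v] for v in b)
    ties = sum(c ** 3 - c for c in count.values())
    var = len(a) * len(b) / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (w - len(b) * (n + 1) / 2) / math.sqrt(var)
    return w, z, math.erfc(abs(z) / math.sqrt(2))


t_stat, df = pooled_t(stim1, stim2)
w, z, p_rank = rank_sum(stim1, stim2)
print(f"pooled t = {t_stat:.3f} on {df} df")
print(f"rank sum W = {w}, z = {z:.3f}, two-sided p = {p_rank:.4f}")
```

With samples this small and this heavily tied, an exact or permutation version of the rank-sum test may be preferable to the normal approximation; the t statistic can be referred to a t table with the printed degrees of freedom.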
5. Consider the paired (X,Y) data in Table 1.
Table 1: Failure times Y and a predictor X
----------------------------------------------------
X Y X Y
----------- -----------
1. 30 3 21. 40 168
2. 21 5 22. 54 170
3. 41 5 23. 27 180
4. 83 6 24. 77 197
5. 76 9 25. 90 217
6. 89 17 26. 97 235
7. 35 22 27. 93 250
8. 78 23 28. 80 354
9. 39 27 29. 73 368
10. 38 31 30. 67 441
11. 57 31 31. 72 486
12. 98 38 32. 88 622
13. 64 40 33. 94 642
14. 34 42 34. 84 659
15. 55 56 35. 86 773
16. 44 62 36. 60 850
17. 22 64 37. 99 902
18. 56 66 38. 46 1090
19. 74 142 39. 66 1658
20. 43 159 40. 75 4032
(i) Find the coefficients beta and mu in the least-squares regression line Y_i=beta*X_i+mu. What is the P-value for H_0:beta=0, assuming that the data (X_i,Y_i) are normal?
(ii) Find the coefficients beta and mu in the regression line Y_i=beta*X_i+mu using Theil's nonparametric procedure. Given beta from Theil's method, estimate the intercept mu as the median of the 820 (=40*41/2) Walsh averages of the 40 residuals. Find the P-value for H_0:beta=0 using the large-sample approximation described in Section 9.1.
(iii) Compare the two regression lines in parts (i) and (ii) by computing
(A) the average absolute error, which is
S_1/n for S_1=Sum(i=1,n) |Y_i-beta*X_i-mu| and
(B) the RMS error, which is the square root
of S_2/n for S_2=Sum(i=1,n) (Y_i-beta*X_i-mu)^2
Which of the two regression lines does better under
criterion (A)? under criterion (B)?
(iv) Find the coefficients beta and mu using the rank regression method discussed in Section 9.6 in the text. Given the slope beta, estimate the intercept mu as the median of the 820 (=40*41/2) Walsh averages of the 40 residuals.
(v) Compare the regression line in part (iv) with the two regression lines in part (iii). How
does it compare using criterion (A)? Using criterion (B)?
(Hint: See the programs RankRegression, TheilRegression, and NonParmCorr on the Math408 Web site.)
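The pieces of Problem 5 that are fully specified in the text can be sketched in Python as follows: the least-squares line, Theil's slope as the median of the pairwise slopes, the intercept as the median of the Walsh averages of the residuals, and the two error criteria (A) and (B). This is not a reproduction of the course programs named in the hint, the function names are illustrative, and the P-values and the rank regression of part (iv) are not covered.

```python
import math
from statistics import median

# Table 1: predictor X and failure time Y for the 40 observations.
X = [30, 21, 41, 83, 76, 89, 35, 78, 39, 38, 57, 98, 64, 34, 55, 44, 22, 56,
     74, 43, 40, 54, 27, 77, 90, 97, 93, 80, 73, 67, 72, 88, 94, 84, 86, 60,
     99, 46, 66, 75]
Y = [3, 5, 5, 6, 9, 17, 22, 23, 27, 31, 31, 38, 40, 42, 56, 62, 64, 66, 142,
     159, 168, 170, 180, 197, 217, 235, 250, 354, 368, 441, 486, 622, 642,
     659, 773, 850, 902, 1090, 1658, 4032]


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    return beta, my - beta * mx


def theil_slope(xs, ys):
    """Median of the slopes over all pairs i < j with distinct x values."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    return median(slopes)


def walsh_intercept(xs, ys, beta):
    """Median of the n(n+1)/2 Walsh averages of the residuals y - beta*x."""
    res = [y - beta * x for x, y in zip(xs, ys)]
    walsh = [(res[i] + res[j]) / 2
             for i in range(len(res)) for j in range(i, len(res))]
    return median(walsh)


def errors(xs, ys, beta, mu):
    """Return (average absolute error S_1/n, RMS error sqrt(S_2/n))."""
    res = [y - beta * x - mu for x, y in zip(xs, ys)]
    n = len(res)
    return sum(abs(r) for r in res) / n, math.sqrt(sum(r * r for r in res) / n)


ls_beta, ls_mu = least_squares(X, Y)
th_beta = theil_slope(X, Y)
th_mu = walsh_intercept(X, Y, th_beta)
ls_err = errors(X, Y, ls_beta, ls_mu)
th_err = errors(X, Y, th_beta, th_mu)

print(f"least squares: beta = {ls_beta:.4f}, mu = {ls_mu:.4f}, (A, B) = {ls_err}")
print(f"Theil:         beta = {th_beta:.4f}, mu = {th_mu:.4f}, (A, B) = {th_err}")
```

Because least squares minimizes S_2 by construction, it cannot do worse than any other line under criterion (B); criterion (A) has no such guarantee, which is what makes the comparison in parts (iii) and (v) informative.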