## Math 434 Homework 3 - Fall 2005

• Text references are to Statistical Methods for Survival Data Analysis, 3rd edition, by Lee and Wang.

HOMEWORK #3 due 10-25

NOTES:   (THIS IS ALSO on the Math434 Web site.)

1. ORGANIZE YOUR HOMEWORK in the following manner:
(i) Your answers to all problems
(ii) Your SAS programs for any problems that require SAS
(iii) The SAS output that you got.
For problems involving SAS, add page numbers to your homework so that you can make references from part (i) to part (iii). (For example, in part (i), you can say things like, ``The answer to part (a) is 7. This answer is highly significant (P=0.007). The plot for part (b) is on page #Y in the SAS output.'')
Include your name in title statements in your SAS programs so that your name will appear at the top of each SAS output page.
2. If a problem asks you to do a statistical test, EXPLAIN CLEARLY what the null hypothesis H_0 is, what the alternative H_1 is, what test you used, what the P-value is, and whether the data is significant, highly significant, or neither. Include this as part of your answer in part (i).

• Problem 1. -- Survival times for a sample of individuals are given by
```
27  168  190  207  264  284  370  453  641
668  711  740  849  857  861  1277  1609  1804
2359  2500  2796  58+  438+  441+  503+  599+  793+
1326+  1434+  1444+
```
Censored values are indicated by trailing plus signs. Assume that the times are exponentially distributed with some unknown exponential rate.
Find the maximum likelihood estimate of the exponential rate and the MLE of the mean of the distribution. Do this by hand, but you can use SAS to check your answer. (Hint: This is discussed in Section 7.2 in the text.)
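For checking a hand calculation, here is a minimal sketch in Python (not SAS) of the usual estimate for right-censored exponential data: the number of observed failures divided by the total of all times, censored or not. The function name `exponential_mle` and the toy data are made up for illustration and are not the data of Problem 1.

```python
def exponential_mle(times, observed):
    """Return (rate, mean) MLEs for right-censored exponential data.

    times    -- all recorded times, censored or not
    observed -- 1 if the time is an observed failure, 0 if right-censored
    """
    failures = sum(observed)      # number of uncensored observations
    total_time = sum(times)       # total time on test, including censored times
    rate = failures / total_time  # MLE of the exponential rate
    return rate, 1.0 / rate       # the MLE of the mean is the reciprocal


# Hypothetical toy data: 5, 8, 12 are failures; 20+ and 15+ are censored.
toy_times = [5, 8, 12, 20, 15]
toy_observed = [1, 1, 1, 0, 0]

rate_hat, mean_hat = exponential_mle(toy_times, toy_observed)
print(rate_hat, mean_hat)
```

Note that censored times contribute to the denominator but not to the failure count.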

• Problem 2 -- (a) Let X_1, X_2, ..., X_n be n observations of nonnegative-integer-valued random variables that have the distribution

Pr(X=k|c) = f(k,c) = c^k/(1+c)^(k+1)     for   k=0,1,2,3,...,     some c>0

(i) As a consistency check, show that Sum(k=0 to infinity) f(k,c)=1 for all c>0.
(ii) Assuming that all X_1, X_2, ..., X_n are observed, derive a formula for the maximum likelihood estimator (MLE) (c-hat) of c.
(iii) Suppose that n=8 and that the observed values are 12, 8, 3, 1, 10, 9, 4, 5. Find c-hat.
(iv) Now suppose that some of the observations are censored. How does the formula that you derived in part (ii) change?
(v) Suppose that n=8 and that the observed values are 12, 8, 3, 1, 10, 9, 4+, 5+, where a trailing + sign indicates a right-censored value. Find c-hat.
(Hints: See Section 7.1 in the text. Also, recall the formula Sum(k=0 to infinity) r^k = 1/(1-r) for |r|<1 for the geometric series.)
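A numerical sanity check is not a proof, but it can catch an algebra slip in part (i). The sketch below (Python, with made-up names `f` and `partial_sum`) adds up the first 500 terms of f(k,c) for a few arbitrary values of c. It writes f(k,c) as r^k/(1+c) with r = c/(1+c), which is the same quantity, only to avoid floating-point overflow for large k.

```python
def f(k, c):
    """Pr(X = k | c) = c^k / (1+c)^(k+1), computed as r^k / (1+c) with r = c/(1+c)."""
    r = c / (1.0 + c)
    return r ** k / (1.0 + c)


def partial_sum(c, terms=500):
    """Sum of f(k, c) for k = 0, 1, ..., terms-1."""
    return sum(f(k, c) for k in range(terms))


# Arbitrary illustrative values of c; each partial sum should be very close to 1.
for c in (0.5, 2.0, 10.0):
    print(c, partial_sum(c))
```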

• Problem 3. -- Survival times in weeks for subjects with and without treatment are given by
```
Sample I (Xs, n=20)
1  7  12  15  15  19  23  31  36  60  61  65  67  106  115
140  156  164  231  365

Sample II (Ys, n=10)
7  7  11  16  17  24  27  27  89  105
```
None of the values were censored. Assume that both samples are exponentially distributed, although not necessarily with the same means.
(a) Are the exponential rates for the two samples the same? Use the likelihood ratio test to find an approximate P-value. Use either `proc lifetest` in SAS or else carry out the likelihood ratio test by hand. (Hints: This is discussed in Section 10.2 in the text. In this case, `proc lifereg` does a test that is similar to the likelihood ratio test but is not the LR test.)
(b) Are the exponential rates for the two samples the same? Use the Cox F-test to find the exact P-value. Carry out the test by hand. What are the degrees of freedom of the F-distribution involved? (Warning: An F-distribution has two degrees of freedom, one for the numerator and one for the denominator.) How does the P-value in this case compare with the P-value that you found in part (a)? (Hint: This is discussed in Section 10.2.2 in the text.)
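If you carry out the likelihood ratio test in part (a) by hand, the arithmetic can be organized as in the following Python sketch. It assumes uncensored exponential samples and uses the large-sample chi-square approximation with 1 degree of freedom. The function names and the toy samples are hypothetical and are not the data above.

```python
import math


def exp_loglik_at_mle(times):
    """Exponential log likelihood evaluated at its MLE, for uncensored data."""
    n = len(times)
    total = sum(times)
    rate = n / total                         # MLE of the rate
    return n * math.log(rate) - rate * total


def lr_test_two_exponentials(xs, ys):
    """Return (statistic, approximate P-value) for H_0: equal exponential rates."""
    l_separate = exp_loglik_at_mle(xs) + exp_loglik_at_mle(ys)  # one rate per sample
    l_common = exp_loglik_at_mle(list(xs) + list(ys))           # one shared rate
    stat = max(2.0 * (l_separate - l_common), 0.0)              # guard against rounding
    # For 1 degree of freedom, Pr(chi-square > x) = erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical toy samples.
toy_xs = [2, 4, 6, 8]
toy_ys = [10, 20, 30]
lr_stat, lr_p = lr_test_two_exponentials(toy_xs, toy_ys)
print(lr_stat, lr_p)
```

The P-value here is only approximate, which is why part (b) asks for a comparison with the exact Cox F-test.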

• Problem 4. -- For the data in the preceding problem,
(a) Find an exact 95% confidence interval for the mean of Sample I. (Hint: See Section 7.2.)
(b) Find an exact 95% confidence interval for the ratio of the means of Sample I over Sample II. (Hint: See Section 10.2.)

• Problem 5 -- An experimenter gathers a sample of values measured in hours:
```
149  261  366  390  395    407  450  477  503  523
526  533  586  602  620    634  642  687  692  693
716  731  740  754  797    817  824  832  883  956
1028  1071  1201  1260    364+  409+  413+  455+  459+
828+  840+  1022+  1414+
```
where the trailing plus signs indicate right-censored values.
(a) Use SAS to find the Kaplan-Meier estimate of the survival function S(t), and generate three plots: S(t) against t, -log S(t) against t (ls), and log(-log S(t)) against log t (lls). Does the ls plot look linear, which would suggest that the data are exponential? Does the lls plot look linear, which would suggest a Weibull distribution?
(b) Assuming that the data are exponentially distributed, what is the estimated rate parameter lambda? Assuming that the data are Weibull distributed, what are the estimated shape parameter alpha and rate parameter lambda?
(c) Can you reject the hypothesis that alpha=1 (that is, that the data are exponential)?
(Hint: Sometimes `proc lifereg` output has missing values for the ``Lagrange Multiplier Chi-Square Test'' for alpha=1, apparently because the numerical algorithm that it uses did not converge. If that happens, do a likelihood-ratio test by hand by comparing the log likelihoods at the MLEs for both the Weibull and exponential models. Recall that, for nested models in which the smaller model has d parameters and the larger model has d+r parameters, under the hypothesis H_0 that the smaller model is true, twice the difference in log likelihoods at the respective MLEs has approximately a chi-square distribution with r degrees of freedom. The ``log likelihood'' values listed in `proc lifereg` output are the log likelihood values evaluated at the MLEs for that model.)
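To see what SAS is computing in part (a), the Kaplan-Meier (product-limit) estimate and the coordinates for the ls and lls plots can be sketched in a few lines of Python. This is an illustration only, on made-up toy data. The function name `kaplan_meier` is hypothetical, and the sketch assumes the usual convention that an observation censored at time t is still at risk for a failure at the same time t.

```python
import math


def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct failure time.

    observed[i] is 1 for an observed failure and 0 for a right-censored time.
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving                  # failures and censored both leave the risk set
    return curve


# Hypothetical toy data: 3+ and 8+ are censored.
toy_times = [2, 3, 3, 5, 6, 8]
toy_observed = [1, 0, 1, 1, 1, 0]
km_curve = kaplan_meier(toy_times, toy_observed)

# Plot coordinates: (t, -log S) for the ls plot, (log t, log(-log S)) for the lls plot.
for t, s in km_curve:
    if 0.0 < s < 1.0:  # both transforms are undefined at S = 0 or S = 1
        ls = -math.log(s)
        lls = math.log(ls)
        print(t, round(s, 4), round(ls, 4), round(math.log(t), 4), round(lls, 4))
```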