## Math 434 Homework 4 - Fall 2005

• Text references are to Statistical Methods for Survival Data Analysis, 3rd edition, by Lee and Wang.

HOMEWORK #4 due 11-15

NOTES:   (THIS IS ALSO on the Math434 Web site.)

1. ORGANIZE YOUR HOMEWORK in the following manner:
(i) Your written answers to all problems
(ii) Your SAS programs for any problems that require SAS
(iii) The SAS output that you got.
For problems involving SAS, add page numbers to your homework so that you can make references from part (i) to part (iii). (For example, in part (i), you can say things like, "The answer to part (a) is 7. This answer is highly significant (P=0.007). The plot for part (b) is on page #Y in the SAS output.")
Include your name in title statements in your SAS programs so that your name will appear at the top of each SAS output page.

2. NOTE: If a problem asks you to do a statistical test, EXPLAIN CLEARLY what the null hypothesis H_0 is, what the alternative H_1 is, what test you used, what the P-value is, and whether the data is significant, highly significant, or neither. Include this as part of your answer in part (i).

• Problem 1. -- An experimenter measures failure times (YY) in days for 80 experimental subjects along with a group property (Green or Blue) and values for two additional covariates that the experimenter calls DVAL and FVAL. The observations are given in Table 1. Note the description of the variables in the table heading.
```
                 Table 1 - Data for 80 subjects
Values for each subject are (a) group (Green or Blue),
(b) failure time or time last seen, (c) status (0 for observed
failure, 1 for censored event), and (d) the values of two
covariates (DVAL and FVAL).

1.  Green    19  0   43  85     41.  Blue     88  0   33  63
2.  Green    23  0   45  77     42.  Blue     89  0   38  41
3.  Blue     24  0   45  51     43.  Green    91  0   49  77
4.  Blue     26  0   40  66     44.  Blue     93  0   20  50
5.  Green    30  0   26  83     45.  Green    93  0   32  54
6.  Blue     36  0   34  68     46.  Green    95  0   27  76
7.  Green    36  0   32  62     47.  Green    96  0   48  53
8.  Blue     40  0   35  48     48.  Green    97  0   49  57
9.  Green    49  0   17  47     49.  Green    99  0   26  52
10.  Green    50  0   17  48     50.  Green   104  0   36  40
11.  Green    54  0   48  47     51.  Green   104  0   20  42
12.  Blue     55  0   40  46     52.  Green   106  0   20  44
13.  Blue     56  0   44  53     53.  Green   107  0   33  68
14.  Blue     60  0   14  63     54.  Green   107  0   13  56
15.  Blue     60  0   33  62     55.  Green   108  0   36  66
16.  Blue     62  0   44  45     56.  Green   108  0   40  57
17.  Green    62  0   33  52     57.  Green   111  0   43  38
18.  Green    67  0   28  59     58.  Green   113  0   32  65
19.  Blue     68  0   41  51     59.  Green   116  0   30  57
20.  Green    69  0   35  47     60.  Green   119  0   36  71
21.  Blue     69  0   24  52     61.  Green   122  0   30  72
22.  Green    69  0   43  58     62.  Green   132  0   31  60
23.  Green    70  0   22  66     63.  Blue    142  0   16  35
24.  Blue     70  0   29  45     64.  Green   150  0   38  42
25.  Green    71  0   41  58     65.  Green   153  0   26  35
26.  Blue     71  0   17  79     66.  Blue     23  1   32  76
27.  Blue     72  0   32  53     67.  Blue     30  1   40  72
28.  Blue     73  0   43  47     68.  Green    33  1   38  65
29.  Green    79  0   40  60     69.  Blue     34  1   43  68
30.  Blue     80  0   30  48     70.  Green    59  1   44  67
31.  Blue     80  0   41  75     71.  Green    68  1   42  51
32.  Blue     83  0   31  62     72.  Blue     72  1   35  67
33.  Green    83  0   37  73     73.  Green    78  1   49  47
34.  Blue     83  0   43  55     74.  Green    86  1   14  74
35.  Green    84  0   40  77     75.  Green    87  1   42  54
36.  Green    85  0   43  70     76.  Blue     89  1   24  62
37.  Blue     85  0   22  62     77.  Green   100  1   41  49
38.  Green    85  0   46  64     78.  Green   115  1   19  48
39.  Green    85  0   17  74     79.  Blue    115  1   39  44
40.  Green    88  0   16  42     80.  Green   131  1   37  63
```

(i) Analyze the data in Table 1 using an AFT Weibull regression on Group (as a class variable), DVAL, and FVAL.
Do the failure times depend significantly on the group? On DVAL? On FVAL? Find the P-values for the variables that are significant.
If Group is significant, which group (Green or Blue) has the longer expected survival time? How can you tell from the output? If DVAL or FVAL is significant, do larger values of that variable lead to longer survival times or shorter survival times? How can you tell from the output?
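
As a rough starting point for the regression in part (i), and not a required form, a PROC LIFEREG call might look like the sketch below. It assumes the Table 1 data have already been read into a dataset named `surv` with variables `group`, `yy`, `status`, `dval`, and `fval`; the dataset name and the lowercase variable names are assumptions made for illustration. Because Table 1 codes censored observations as status=1, the censoring value is written as `status(1)`.

```sas
/* Sketch only: the dataset name "surv" and its variable names are assumed. */
title 'Math 434 HW4 Problem 1 - (your name here)';
proc lifereg data=surv;
   class group;                                          /* Green/Blue as a class variable */
   model yy*status(1) = group dval fval / dist=weibull;  /* status=1 marks a censored time */
run;
```

The same call with `dist=exponential` in place of `dist=weibull` fits the corresponding model with exponential errors.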

(ii) What value of the Weibull parameter $\alpha$ does SAS estimate, if the Weibull distribution is written as $S_X(t) = \exp(-(\lambda_X t)^{\alpha})$? Does the confidence interval for $\alpha$ include $\alpha = 1$?
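
For orientation, PROC LIFEREG reports a Scale parameter for the log-linear form of the model rather than the Weibull shape directly. The sketch below shows how the two parameterizations line up. The symbols $\sigma$ (the SAS scale), $\beta$ (the regression coefficients), $x$ (the covariate vector), and $W$ (a standard extreme-value error with $\Pr(W > w) = \exp(-e^{w})$) are notation introduced here and do not appear in the problem statement.

```latex
\log T = x^{\top}\beta + \sigma W
\quad\Longrightarrow\quad
S_X(t) = \exp\!\left(-(\lambda_X t)^{\alpha}\right),
\qquad
\alpha = \frac{1}{\sigma},
\qquad
\lambda_X = \exp\!\left(-x^{\top}\beta\right).
```

Under this reading, an interval for $\sigma$ translates into an interval for $\alpha$ by taking reciprocals of its endpoints, and $\sigma = 1$ corresponds to $\alpha = 1$.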

(iii) Answer the questions in part (i) for an AFT model with exponential errors. Are any of your conclusions different? In particular, does assuming exponential instead of Weibull-distributed errors increase the significance of the covariates for the data in Table 1, decrease the significances, or leave them about the same?

(iv) Does an AFT model with exponentially-distributed errors fit these data as well as an AFT model with Weibull errors? Find a P-value for the hypothesis that the failure times are consistent with an exponential model within the alternative of a Weibull model. Do you conclude that a Weibull model would be more consistent with the data, would be less consistent with the data, or would be about the same?
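
One common way to carry out the comparison in part (iv) is a likelihood ratio test, since the exponential model is the Weibull model with the shape fixed at 1. In the sketch below, $\ell_{\mathrm{Weibull}}$ and $\ell_{\mathrm{exp}}$ denote the maximized log likelihoods of the two fits (notation assumed here). Other tests of the same hypothesis may also appear in the SAS output.

```latex
\Lambda = 2\left(\ell_{\mathrm{Weibull}} - \ell_{\mathrm{exp}}\right),
\qquad
\Lambda \approx \chi^2_{1} \ \text{ under } H_0:\ \alpha = 1,
\qquad
P = \Pr\!\left(\chi^2_{1} \ge \Lambda\right).
```

The statistic has one degree of freedom because the two models differ by exactly one free parameter.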


• Problem 2. -- Use an AFT Weibull regression model for the survival data in Table 1 using only the group variable. That is, ignore the covariates DVAL and FVAL.
(i) Is there a significant difference in survival times between individuals in the two groups? What is the P-value? Is it more significant than the P-value for Group in the last problem, less significant, or about the same?
(ii) In general, a Weibull distribution with survival function $S_X(t) = \exp(-(\lambda_X t)^{\alpha})$ has hazard function

$$h_X(t) = \alpha\,\lambda_X^{\alpha}\,t^{\alpha-1}$$

What is the ratio of the estimated hazard rates between individuals in the two groups? Are Green individuals at a greater or smaller hazard? (Hint: Be careful. Recall that a longer lifetime means a smaller hazard rate, and vice versa, and keep track of confounding with alpha.)
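
The hint about confounding with alpha can be made concrete by writing out the ratio of the two hazard functions. The first line below follows directly from the Weibull hazard formula in the problem. The second line additionally assumes the AFT fit is read as $\log T = \mu_g + \sigma W$ for group $g$, with $\alpha = 1/\sigma$ and $\lambda_g = \exp(-\mu_g)$; the symbols $\mu_g$, $\sigma$, and $W$ are notation introduced here.

```latex
\begin{aligned}
\frac{h_{\mathrm{Green}}(t)}{h_{\mathrm{Blue}}(t)}
  &= \frac{\alpha\,\lambda_{\mathrm{Green}}^{\alpha}\,t^{\alpha-1}}
          {\alpha\,\lambda_{\mathrm{Blue}}^{\alpha}\,t^{\alpha-1}}
   = \left(\frac{\lambda_{\mathrm{Green}}}{\lambda_{\mathrm{Blue}}}\right)^{\alpha} \\
  &= \exp\!\left(-\alpha\,(\mu_{\mathrm{Green}} - \mu_{\mathrm{Blue}})\right).
\end{aligned}
```

The ratio does not depend on $t$, and the difference in the log-time coefficients enters with a minus sign and a factor of $\alpha$.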


• Problem 3. -- Variables AA, BB, and CC were measured for 32 subjects. Of these subjects, 12 later developed Condition X while the remaining 20 did not develop Condition X.
An experimenter is interested in finding which of the variables AA, BB, and CC are significantly related to developing Condition X. The experimenter is also interested in finding a rule that, given the values of AA, BB, and CC for a subject, predicts the probability that the subject will later develop Condition X. The data are listed in Table 2.
```
          Table 2 - Covariates AA BB CC for 32 subjects
     that later either developed or did not develop Condition X

  Developed Condition X         Did NOT develop Condition X
 Subj   AA     BB     CC        Subj    AA      BB      CC
   1    69     83     51         13     36      55      39
   2    51     74     32         14     50      69      44
   3    27     68     33         15     36      59      28
   4    55     85     46         16     31      26      44
   5    27     99     34         17     31      49      47
   6    44     68     38         18     32      45      50
   7    49     88     57         19     40      59      33
   8    28     64     66         20     49      51      42
   9    32     58     46         21     38      70      47
  10    47     81     39         22     46      63      26
  11    35     77     31         23     46      64      47
  12    30     69     62         24     67      94      43
                                 25     47      60      56
                                 26     56      62      45
                                 27     39      64      27
                                 28     52      71      24
                                 29     33      62      52
                                 30     57      63      48
                                 31     39      78      23
                                 32     48      70      55
```
(i) Use a logistic regression to predict the probability of developing Condition X given values of AA, BB, and CC.
Is there an overall statistically significant effect of the three covariates together on whether or not a subject develops Condition X? What is the P-value? (If more than one test is available, pick one of them.) What is the number of degrees of freedom of the chi-square statistic?
(ii) Which of the three variables AA, BB, and CC individually have a significant effect on the probability of developing Condition X in the logistic regression? Which have a highly significant effect? For the variables that have significant effects, what is the P-value for each? For each variable with a significant effect, does increasing the value of that variable make Condition X more likely to occur, or less likely? How can you tell from the output?
(iii) Are your answers to part (ii) consistent with the means of the variables in the two groups? That is, if increasing a covariate also increases the probability of Condition X, is this consistent with the mean of that covariate being higher among the records with Condition X?
(HINT: A fast way to compare the means of AA, BB, CC for Conditions X and NotX is `proc means n mean; class xx; var aa bb cc; run;` where xx takes on the values 'X' or 'NotX'. WARNING: Make sure that your regression is predicting Prob(X) given the covariates and NOT Prob(NotX). See the discussion of the `descending` option in the comments in the file `LGexamps.sas` on the Math434 Web site.)
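
As a hedged illustration of the warning above, a logistic regression call might be set up as in the sketch below. It assumes the Table 2 data have been read into a dataset named `condx` (a name chosen here for illustration) in which `xx` takes the values 'X' or 'NotX', as in the hint. Since 'NotX' sorts before 'X', the `descending` option is what makes the procedure model Prob(X) rather than Prob(NotX).

```sas
/* Sketch only: the dataset name "condx" is assumed; xx holds 'X' or 'NotX'. */
title 'Math 434 HW4 Problem 3 - (your name here)';
proc logistic data=condx descending;   /* descending: model Prob(xx='X') */
   model xx = aa bb cc;
run;
```

It is worth checking in the output which response level is actually being modeled before interpreting the signs of the coefficients.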

• Problem 4. -- Consider the logistic regression from the previous problem.

(i) Suppose that you are counseling someone whose medical tests reveal AA=50, BB=70, and CC=40. What probability does your logistic regression predict that this person will later develop Condition X?

(ii) Suppose that an individual who thought he knew his values of AA, BB, and CC found out that, in fact, his value of CC was larger by one. (That is, his value of CC went from CC to CC+1.) Quantitatively, how does that affect the odds that he will develop Condition X? Is the odds ratio larger or smaller than one? (Hint: The odds of an event A with p=Prob(A) is p/(1-p). See the discussion in Section 14.2, p. 385ff, in the text, and in particular the discussion of ORs (odds ratios).)
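
Both parts of this problem rest on the form of the fitted logistic model, which can be sketched as follows. Here $b_0, b_1, b_2, b_3$ stand for the estimated intercept and the coefficients of AA, BB, and CC; these symbols are notation assumed here and are to be read off your own output.

```latex
\begin{aligned}
\log\frac{p}{1-p} &= b_0 + b_1\,AA + b_2\,BB + b_3\,CC, \\
p &= \frac{1}{1 + \exp\!\left(-(b_0 + b_1\,AA + b_2\,BB + b_3\,CC)\right)}, \\
\frac{\mathrm{odds}(CC+1)}{\mathrm{odds}(CC)} &= e^{b_3}.
\end{aligned}
```

The last line holds with AA and BB held fixed, so the odds ratio for a one-unit increase in CC is above one exactly when $b_3 > 0$.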