**********************************************************; * AFT survival regression with three covariates: * Weibull, log normal, and generalized gamma regression models * * Survival times in weeks were recorded after diagnosis of AML, * which is a form of leukemia that exhibits an excess of distorted * white blood cells (Wbcs). * * One is interested in how the survival times depend on * (i) Presence or absence of `Auer rods' in microscopic * examination of the distorted Wbcs and * (ii) The Wbc count when AML was diagnosed. Since the Wbc count * can be extremely elevated, logwbc=log(Wbc) is used as a covariate * instead of Wbc itself. * * A complicating factor is that the dependence of survival on initial * Wbc count might be different in patients with Auer rods than in * patients without. Thus, we introduce a third covariate (l2wbc) * that corrects for a difference in slopes between patients with * and without Auer rods. The survival plots below suggest that * the coefficient of l2wbc might be significantly positive. * * In general, which of these covariates affect survival time? * If so, by how much? * If the initial WBC has an effect, is it the same for both groups * of patients? * Do the results differ if we assume that survival times have a * log normal or generalized gamma distribution instead of a Weibull * or exponential distribution? * * Data is entered for each patient as (Weeks,Wbc(count)). The Auer rod * state is set by a `retain'ed variable `rodtype'. Data for rodtype * is entered as `Rodtype -1' so that the data step can read pairs * of values in all cases. * * The variable haverods=1 means that patient's Wbcs have Auer rods * and haverods=0 means that they do not. SAS would specify a Zero-One * variable of this type even if we did not. However, if we define * this variable ourselves, then we can be sure which group the * `haverods' coefficient regards as the zero state. * * Note that there are no censored values in this example. * * Data from Table 10.10, p271, of the 2nd edition of the text * (E.T.Lee, ``Statistical Methods for Survival Data Analysis'') * The 3rd edition discusses a somewhat similar example for ALL * (a different form of Leukemia) in Example 11.1, p266-269. * **********************************************************; title 'PARAMETRIC DISTRIBUTIONS WITH COVARIATES -- YOURNAME'; options ls=75 ps=60 pageno=1 nocenter; title2 'SURVIVAL TIMES by initial white blood-cell count and rod type'; data lrwbc; retain rodtype haverods; input zz$ wbc @@; if wbc=-1 then do; rodtype=zz; haverods=(rodtype='WithRods'); end; else do; weeks=input(zz,12.0); logwbc=log(wbc); l2wbc=haverods*logwbc; output; end; drop zz wbc; * For neatness; datalines; WithRods -1 65 2300 156 750 100 4300 134 2600 16 6000 108 10500 121 10000 4 17000 39 5400 143 7000 56 9400 26 32000 22 35000 1 100000 1 100000 5 5200 65 100000 NoRods -1 56 4400 65 3000 17 4000 7 1500 16 9000 22 5300 3 10000 4 19000 2 27000 3 28000 8 31000 4 26000 3 21000 30 79000 4 100000 43 100000 ; proc sort; * Sort by rodtype, then survival time, then logwbc.; * Note that this sort puts `NoRods' before `WithRods'; by rodtype weeks logwbc; run; proc print; title3 'The data as SAS sees it'; run; **********************************************************; * Construct s, ls, and lls plots and do nonparametric tests for the * effect of the presence of rods on survival. Both the S(t) plots * and the printout suggest that patients with Auer rods have better * survival. This might be because having Auer rods is the normal * condition. * Note that the data appears to fit an exponential distribution within * each rod-type group, reasonably if not perfectly. * However, these plots might be misleading since they ignore the * effects of the Wbc count. **********************************************************; options ps=40; * For nicer lineprinter plots; proc lifetest plots=(s,ls,lls) notable lineprinter; title3 'COMPARING THE TWO ROD TYPES'; strata rodtype; time weeks; run; options ps=60; **********************************************************; * Carry out an AFT (Accelerated Failure Time) model with Weibull * survival times. See the discussion about AFT models in comments * in the handout `l2rats.sas'. As before, the Weibull AFT model * is equivalent to * * log Y_X = log(1/lambda) + Sum beta_i X_i + (1/alpha)log(Exp(1)) * * where the sum is now over more than one covariate and Exp(1) means a * rate-one exponential distribution. This is the same as a classical * multiple regression except that the ``error distribution'' * log(Exp(1)) has an extreme-value rather than a normal distribution. * * We first try an AFT model with exponential errors (that is, * with alpha=1 fixed), since the ls and lls plots suggest that * an Exponential model may fit as well as a Weibull model, and the * Exponential-model output contains a test for H_0:alpha=1 * (i.e, Weibull instead of Exponential). * * We carry out regressions both with and without the `interaction' * variable l2wbc. The two-variable model assumes that the log survival * times have the same slopes with respect to log(Wbc) in both * Auer-rod-type groups, although the rates (or intercepts) may differ. **********************************************************; proc lifereg; title3 'EXPONENTIAL REGRESSION with three parameters'; model weeks = haverods logwbc l2wbc / dist=Exponential; run; proc lifereg; title3 'EXPONENTIAL REGRESSION with two parameters'; model weeks = haverods logwbc / dist=Exponential; run; **********************************************************; * None of the parameters in the three-parameter model are significant, * while both parameters in the two-parameter model are significant. * This is paradoxical because, in the three-parameter model, SAS * could just have said that the `correction term' for l2wbc (which * measures an additional slope in the WithRods group) was zero. * * It is typical of maximum likelihood models in that they sometimes * need a lot of data to resolve multiple parameters. However, it may * also suggest that, in fact, logwbc has different slopes for the * two rod types, which the three-parameter model could not resolve. * We find a different model to address this below. * * The output says that it is perfectly happy with an Exponential AFT * model in comparison with a general Weibull model, and that * (in effect) a full Weibull regression is not worth the extra effort. * However, let's see what the Weibull output looks like anyway: **********************************************************; proc lifereg; title3 'WEIBULL REGRESSION with three parameters'; model weeks = haverods logwbc l2wbc / dist=Weibull; run; proc lifereg; title3 'WEIBULL REGRESSION with two parameters'; model weeks = haverods logwbc / dist=Weibull; run; **********************************************************; * The Weibull output is virtually identical to the Exponential model * output except for SCALE (or 1/Weibull alpha), which is variable * but not distinguishable from one. * * We next try a model with different slope parameters for each rod * type. This approach might have more power for detecting a * different dependence on log(Wbc) within the two Auer-rod groups, * since it amounts to two different single-variable regressions, * one in each group. * * In the new dataset (also called lrwbc), both `lognorods' and * `logwrods' (log-with-rods) are values of logwbc=log(Wbc). **********************************************************; data lrwbc; set lrwbc; if haverods=1 then do; lognorods=0; logwrods=logwbc; end; else do; lognorods=logwbc; logwrods=0; end; run; proc print; title3 'The new data as SAS sees it:'; title4 'A different slope within each Auer-rod group'; run; proc lifereg; title3 'EXPONENTIAL REGRESSION with slopes for each rod type'; title4 'Note that this is the same as having two separate regressions,'; title5 ' one within each rod type.'; model weeks = haverods logwrods lognorods / dist=Exponential; run; proc lifereg; title3 'STRATIFIED EXPONENTIAL REGRESSION for each rod type:'; title4 'Note the results are identical to the previous model.'; classes rodtype; by rodtype; model weeks = logwbc / dist=Exponential; run; **********************************************************; * Both outputs clearly show that there is an interaction: * log(Wbc) is predictive of survival time only for the patients that * still have Auer rods. Otherwise, Wbc has no predictive power. * This might be because many of the no-Auer-rod patients are so sick * that a relatively low Wbc count is a sign of terminal disease * as opposed to a sign of healthy Wbcs. * * * How do the conclusions change with other AFT models? * * The log-normal AFT model has log(T) modeled as a normal distribution, * which, if there are no censored observations, is exactly the same * as a classical multiple regression of log(T) with normal errors. * * The hazard rate for the log-normal model has h(t) approx C t * as t-->infinity, which is a reasonable approximation in many cases. * **********************************************************; proc lifereg; title3 'LOG-NORMAL REGRESSION with two slopes for logwbc'; model weeks = haverods logwrods lognorods / dist=LNormal; run; **********************************************************; * Let's compare the log-normal output with a standard normal regression: **********************************************************; data lrwbc; set lrwbc; logweeks=log(weeks); run; proc reg lineprinter; title3 'CLASSICAL REGRESSION of log(weeks) on three covariates'; model logweeks = haverods logwrods lognorods; * Add columns for predicted and residual plots; plot logweeks*logwrods='Y' predicted.*logwrods='P' /overlay; plot logweeks*lognorods='Y' predicted.*lognorods='P' /overlay; plot residual.*predicted.; run; **********************************************************; * The residuals show a tendency to be more diverse for smaller * predicted values than for larger predicted values, which suggests * that the log normal model is not a perfect fit. * * * Now let's try a generalized gamma model. This has the density * * f(t) = C t^{ac-1} exp(-(bt)^a) * * This density can be viewed as either a three-parameter generalization * of the Weibull distribution (c=1) or of a gamma distribution (a=1). * However, it is re-parametrized in SAS so that, instead, it forms an * interpolation between the Weibull and log normal distributions. * See Section 11.6, p277, of the text (L+W, 3rd edn) for a * discussion of this distribution and its re-parametrization. * * The hazard rate for the generalized gamma distribution has three * parameters instead of two. It is the only model that we have * seen so far that has `bathtub-shaped' hazard functions for some * parameter values: That is, h(t) that is infinite both as t->0 * and as t->infinity, as is likely to be the case in most examples. **********************************************************; proc lifereg; title3 'GENERALIZED GAMMA REGRESSION with two slopes for logwbc'; title4 "Note the two different `shape' parameters in the output"; model weeks = haverods logwrods lognorods / dist=Gamma; run;