************************************************************; * Compare two samples using the Cox PH survival model. * Both samples have values in the range 10-89 or 10-99, * so that they should be comparable. * * What is the relative risk of not being treated? Is the * proportional-hazards model reasonable? * * In the data step, `treat'=0 is the control, so that the relative * risk for `treat' is the relative risk for being treated. ************************************************************; title 'COMPARE TWO SAMPLES -- COX PH MODEL - YOURNAME'; options ls=75 ps=60 pageno=1 nocenter; data twosamp; retain group treat; input xx$ @@; if xx='Treated' or xx='Control' then do; group=xx; treat=(xx='Treated'); end; else do; time=input(xx,12.0); status=(time<0); time=abs(time); output; end; drop xx; datalines; Treated 10 19 47 48 51 52 53 56 59 59 61 62 62 64 65 68 72 72 79 89 Control 10 12 18 25 26 35 35 40 58 78 83 91 94 97 99 ; ************************************************************; * Construct plots and do nonparametric tests for a difference between the * two samples. * Recall that the `ls' plot (-log S(t) vs. t) is a plot of the cumulative * hazard function. ************************************************************; options ps=30; proc lifetest plots=(s,ls) notable lineprinter; strata group; time time*status(1); run; options ps=60; ************************************************************; * It looks as if the Control group is under higher hazard up to time * approx 50, and the Treatment group is under higher hazard for * time>50. Specifically, the relative risk for being treated * should be < 1 at times<50 (which means their death rate is lower) * and > 1 for times>50, which means a higher death rate. * * First, let's ignore the time-dependent behavior of the relative * risk, apply a Cox regression, and see what happens. * There are no censored individuals, but status=0 for all records. ************************************************************; proc phreg data=twosamp; title2 'PHREG FOR TWO SAMPLES'; model time*status(1)=treat; run; ************************************************************; * This implies that there is no difference in hazard rate between * the Treated and Control groups, so that they have exactly the * same hazard rates. This, of course, is because SAS is estimating * a time-averaged relative hazard rate for Treated of some kind, * which apparently averages out to close to one. * * We can extend the Cox PH model to include time-dependent hazards * like those apparently exhibited in the Kaplan-Meier curves, but * this must be done by acting on the likelihood directly and not by * applying the usual Cox regression with new dataset variables. * * The reason for this is that, with time-dependent hazards, all * individuals at risk at a time t_i compete on the basis of their * relative hazards at time t_i. When data is entered in a SAS dataset * for the k-th individual, the only time that is available for that * record is its death or censoring time Y_k, with no direct way * to enter information about its hazards at earlier observed death * times t_i < Y_k. * * However, the Cox likelihood has factors for how all individuals in * the risk set at time t_i compete at time t_i. In principle, * these factors could be changed to reflect any sort of behavior * at time t_i that is consistent with a conditional logistic * regression term at time t_i. * * `Proc phreg' has ``programming statements'' that you can use to * define a wide variety of time-dependent hazard behavior at the * distinct observed death times t_i. * * This not only provides a way to test for the presence of * non-time-proportional-hazard effects, but also a way to model * them. It also provides the ability to model a wide variety of * time-dependent hazard behavior, for example prognostic variables * that can be updated during the course of a study. * * The present data suggests a time-dependent treatment effect, with * an decreased relative hazard for the Treated group at early times * but an increased hazard at later times. We first model this by * assuming that the hazard rate covariate term increases * (or decreases) linearly in time, so that the hazard rate * increases or decreases multiplicatively in time. If the hazard * rate does change suddenly at time=50, then the slope of a linear * function of time should be significantly different from zero. * ************************************************************; proc phreg data=twosamp; title2 'PROC PHREG WITH A TIME-DEPENDENT COORDINATE'; model time*status(1)=treat trttime; * Modeling statement for Cox likelihoods; * This is NOT the same as adding `treat*time' to all data records; trttime=treat*time; run; ************************************************************; * The slope parameter (as well as the intercept) of the hazard * covariate term are significantly different from zero. This implies * that the hazard rate varies significantly in time, as we suspected. * * As mentioned earlier, the Kaplan-Meier curves suggest that the * Treated group has lower relative hazard at times<50, higher * hazards at times>50, with hazard rates for the two groups about * the same after time=80. * * One way to test this is by artificially declaring records at certain * times to be censored. This does not change the risk sets in the * Cox likelihood at any time, but does remove those factors in the * Cox likelihood at observed death times that we have artificially * declared to be censored. The effect is to estimate relative hazard * rates only for the observed death times that remain. * * We use this to estimate the relative hazard rates of the Treated * group both for times<50 (only) and for times>50 (only) by * artificially setting observed death times in other time ranges * to censored. * * We then show how to get similar results by using `proc phreg' * ``programming statements'' to define appropriate time-dependent * covariates that accormplish the same thing in one regression. ************************************************************; data censafter50; * Same records as in `twosamp', but censored after time 50; set twosamp; if time > 50 then status=1; run; proc phreg data=censafter50; title2 'RATES FOR TIME<=50 ONLY: ALL TIMES>50 ARE CENSORED'; model time*status(1)=treat; run; data censbefore50; set twosamp; * Same records as in `twosamp', but censored BEFORE time 50; if time < 50 then status=1; run; proc phreg data=censbefore50; title2 'RATES FOR TIMES>50 ONLY: ALL TIMES<50 ARE CENSORED'; model time*status(1)=treat; run; ************************************************************; * This shows that the relative risk for Treater is 0.290 < 1 for * times<50 but 6.396 for times>50, with both of those values * significant. This is what we had guessed from the Kaplan-Meier * curves. * * We now use `programming statements' to split `treat' into two * covariates, one that is active only for early times (<=50) * and one that is only active for late times (>50). * * When SAS maximizes the Cox likelihood, the factors for time<=50 * will have parameters only for `early' (with early=treat) and the * factors for time>50 will have parameters only for `late' * (with late=treat). This will give us separate `early' and `late' * estimates for the treatment-group relative risk, and should also * give us exactly the same estimates as above. ************************************************************; proc phreg data=twosamp; title2 'TWO HAZARD-RATE TIERS BY PROGRAMMING STATEMENTS'; model time*status(1)=early late; * early=treat if time<=50, otherwise early=0; * late=treat if time>50, otherwise late=0; early=treat*(time<=50); late=treat*(time>50); run;