**********************************************************;
* Two groups of rats are given two different protective treatments and
*   then exposed to a carcinogen, DMBA.  Survival times (actually, times
*   until cancer diagnosis) are given in days. There were no observed
*   failures (cancer diagnoses) until 142 days after exposure, then
*   about half of the rats were diagnosed in the next 83 days. A lag
*   or threshold time of 100 days was assumed before fitting a Weibull
*   distribution.
*
* The main question is whether the two groups differ significantly in
*   their survival times. A second question is the Weibull shape
*   parameter describing the survival times.
*
* It will turn out that the survival times differ significantly
*   (P=0.0455) assuming a Weibull distribution but are not significantly
*   different (P=0.5935 or P=0.5926) if exponential survival times are
*   assumed. In general, guessing the shape of the survival time
*   distribution incorrectly can have a very strong effect on the
*   P-values for significance of covariates.
*
* The two groups are called `Agroup' `Bgroup' rather than Group 1 and
*   Group 2 to make the surival plots clearer.
*
* Data from Example 6.2, p142, in E.T.Lee & J.W.Wang, ``Statistical
*   Methods for Survival Data Analysis'', 3rd edn.
**********************************************************;

title 'TWO GROUPS OF RATS EXPOSED TO A CARCINOGEN';
options ls=75 ps=60 pageno=1 nocenter;

data rats;
     retain group;
     input zz$ @@;  lagtime=100;   * Lag time;
     if substr(zz,2,6)='group' then  group=zz;
     else do;  days=input(zz,12.0);  status=(days<0);
               days=abs(days) - lagtime;
               output;  end;
datalines;
  Agroup
   143  164  188  188  190  192  206  209  213  216
   220  227  230  234  246  265  304  -216  -244
  Bgroup
   142  156  173  198  205  232  232  233  233  233  233
   239  240  261  280  280  296  296  323  -204  -344
   run;

proc print;
     title2 'The data as SAS sees it';
     run;


**********************************************************;
* Draw `ls' and `lls' plots for the two groups to check visually for
*   fit to Exponential and Weibull distributions, and to check if the
*   LogRank and Wilcoxon tests are significant.
*
* Note that the `-2log(LR)' P-value is very different from the LogRank
*   and Wilcoxon P-values. This is consistent with the survival-time
*   distribution being far from exponential.
**********************************************************;

options ps=35;	* For neater-looking lineprinter plots;

proc lifetest plots=(s,ls,lls) notable lineprinter;
     title2 'S, LS, and LLS PLOTS AND NONPARAMETRIC TESTS';
     strata group;
     time days*status(1);
     run;

options ps=60;	* Restore page size;


**********************************************************;
* Carry out a Weibull AFT model with lagtime=100. The Weibull
*   distribution is
*
*     Prob(Y>=t) = S(t) = exp(-(lambda t)^alpha )
*
* which is equivalent to (see the comments in lr1samp.sas)
*
*     log Y = log(1/lambda) + (1/alpha)log(Exp(1))
*
* where P(log(Exp(1))>=y) = exp(-exp(y))  has what is called an
*  ``extreme-value distribution''.
*
* Given a covariate or covariates X, the AFT (for ``Accelerated Failure
*   Time'') model for the Weibull distribution is
*
* (*)   log Y_X = log(1/lambda) + beta X + (1/alpha)log(Exp(1))
*
* exactly as in a classical regression, except that here the ``error
* distribution'' log(Exp(1)) has an extreme-value rather than a normal
* distribution.  If beta>0 and X is increased, then Y_X (the failure
* time) also increases.
*
* The effect of X in (*) is to change  lambda-->lambda exp(-beta X),  or
*
*   S_X(t) = Prob(Y_X>=t) = exp(-[lambda exp(-beta X)t]^alpha )
*
* The hazard rate for S_X(t) is
*
* (**)  h_X(t) = alpha[lambda exp(-beta X)]^alpha t^{alpha-1}
*
* If beta>0, then increasing X decreases the hazard rate and thus
*   increases the lifetime, which is consistent with (*).
*
* The SAS AFT procedure `proc lifereg' with / dist=Weibull  uses maximum
*   likelihood to estimate the parameters in the regression
*
* (***)   log(Y) = INTERCEPT + BETA X + SCALE log(Exp(1))
*
* As in lr1samp.sas, the fitted parameters INTERCEPT and SCALE can be
*   translated back to the Weibull parameters alpha and lambda by
*
*     alpha=1/SCALE,   lambda=exp(-INTERCEPT)
*
* SAS's `proc lifereg' output has `Weibull Scale' = 1/lambda
*   = exp(INTERCEPT) (as opposed to `Scale') instead of lambda. Lambda
*   has the units of rate, whereas `Weibull scale' has the units of time.
*
* The covariates X in (***) take on two values for the two different
*   samples. Thus the two-sample Weibull AFT model treats `log(days)'
*   like a two-sample t-test with `group' as a class variable, except
*   that (i) errors are extreme-value distributed rather than normally
*   distributed and (ii) parameters are estimated by maximum likelihood.
*
* The two groups of rats are assumed to have the same Weibull shape
*   parameter alpha, so that testing   H_0:beta=0   is equivalent to
*   testing whether the two groups have the same Survival-time
*   distribution.
*
* The `lls' plots in proc lifetest suggest that the survival times of
*   both groups of rats have the same Weibull shape parameter alpha,
*   so that the Weibull AFT model is reasonable. We also carry out
*   a likelihood ratio test for the hypothesis that the two groups
*   have the same Weibull shape parameter. This test is nowhere near
*   being significant, which is also consistent with the same alpha
*   for the two groups.
*
* The default in `proc lifereg' is / dist=Weibull, so that this option
*   could have been left out.
*
**********************************************************;


proc lifereg;
     title2 'FITTING A WEIBULL AFT MODEL TO BOTH GROUPS TOGETHER';
     class group;
     model days*status(1) = group / dist=Weibull;
     run;

**********************************************************;
* Are the Weibull shape parameters the same between the two groups?
*   This can be determined by fitting different Weibull distributions to
*   the two groups and then carrying out a likelihood ratio test by hand
*   by comparing the maximum log likelihoods.
*
* Specifically, this can be done by (i) fitting independent Weibull models
*   within each group (`by group' below), (ii) adding the two fitted
*   maximum likelihood values for the two groups (from the lines
*   `Log Likelihood for WEIBULL' in the output), (iii) subtracting the
*   fitted maximum likelihood for the AFT Weibull model above, and
*   (iv) comparing the result with a chi-square distribution with one
*   degree of freedom.
*
* Note that the stratified model (i.e., independent Weibull distributions
*   in each group) has 4 parameters (two intercepts, two alphas) while
*   the Weibull AFT model has 3 parameters (intercept, beta, one alpha).
*   Thus the stratified model has one more parameter, so that the test
*   statistic is asymptotically chi-square with one degree of freedom.
*
* In this case, the difference (ii)-(iii) is 0.161, so that twice the log
*   likelihood difference is 0.322, which leads to P=0.5704 for a
*   chi-square distribution with one degree of freedom. In other words,
*   there is no evidence that the two groups of rats have different
*   Weibull shape parameters.
**********************************************************;

proc lifereg;
     title2 'FITTING WEIBULL DISTRIBUTIONS WITH TWO DIFFERENT SHAPE';
     title3 ' PARAMETERS TO THE TWO GROUPS SEPARATELY';
     by group;
     model days*status(1) = / dist=Weibull;
     run;


**********************************************************;
* Note that the two groups are (barely) significantly different using the
*   Weibull AFT model, whereas they were not significantly different for
*   the log rank and Wilcoxon tests (again, barely).
*
* For comparison, let's also try an exponential distribution.  Comparing
*   log likelihoods will show that a Weibull distribution (with the same
*   shape parameter for the two groups) fits the data significantly better
*   than the exponential. Note that the effect of `group' is not significant
*   in the exponential model, and that the P-value in the exponential model
*   for GROUP is almost exactly the same as that for the `-2log(LR)' test
*   in `proc lifetest'.
**********************************************************;

proc lifereg;
     class group;
     title2 'FITTING AN EXPONENTIALLY DISTRIBUTED AFT MODEL';
     model days*status(1) = group / dist=exponential;
     run;