***********************************************************; * Given a sample, does it fit a Weibull better than an exponential * distribution? If so, how can you estimate the Weibull rate and * shape parameters? * * The following is a sample of 40 values, of which 11 are censored, with * the censored values entered as negative values. Note that the number * of large values seems excessive for an exponential distribution. * ***********************************************************; title 'TEST FOR EXPONENTIAL within Weibulls - YOURNAME'; options ls=75 ps=60 pageno=1 nocenter; data lr1samp; input tt @@; * State=1 if censored, status=0 if observed; days=abs(tt); status=(tt<0); drop tt; datalines; 6 11 17 42 43 52 75 82 144 148 168 207 212 279 -388 416 443 552 600 600 629 -655 708 743 864 -873 1061 -1225 1257 -1659 -2137 -2188 -2441 -2591 2842 2867 -2880 3509 -4864 5090 ; proc print; title2 'The data as SAS sees it'; run; ***********************************************************; * Draw some plots: Recall * * The distribution is exponential if and only if the `ls' plot is a * straight line through the origin. The slope of the line is the rate * parameter. This is because the survival function S(t) = exp(-la t) * if and only if -log S(t) = la t , where la=lambda. * * The distribution is Weibull if and only if the `lls' plot is a straight * line that does not necessarily go through the origin. This is because * S(t) = exp(-(la t)^alpha) holds iff -log S(t) = (la t)^alpha, or iff * * log(-log S(t)) = alpha log(la) + alpha log t. * * The slope of the line in the LLS plot is the Weibull shape parameter * alpha and the intercept is alpha log(la). * * The option `notable' suppresses the spreadsheet used for calculating the * Kaplan-Meier survival curve. ***********************************************************; * Change the page height for nicer lineprinter plots (on ArtSci SAS); options ps=35; proc lifetest plots=(s,ls,lls) notable lineprinter; title2 'S, LS, AND LLS PLOTS'; time days*status(1); run; * Restore the page height; options ps=60; ***********************************************************; * Note that the `ls' plot is NOT linear but the `lls' plot looks * approximately linear. This suggests graphically that the data is * Weibull but not exponential. The slope of the graph gives an estimate * of the Weibull shape parameter alpha. * * You can use `proc lifereg' to estimate the actual Weibull rate and shape * parameters (lambda and alpha) using maximum likelihood. `Proc lifereg' * also carries out a statistical test for H_0:Data is exponential versus * H_1: Data is Weibull but not exponential. (That is, H_0:alpha=1 versus * H_0:alpha ne 1 .) * * The survival function for a Weibull distribution is * * Prob(Y>=t) = S(t) = exp(-(la t)^alpha ) * * This is equivalent to * * Prob( (la Y)^alpha >= (la t)^alpha) = exp(-(la t)^alpha ) * * or if y = (la t)^alpha * * Prob( (la Y)^alpha >= y) = exp(-y) * * This means that that Y is Weibull with parameters la=lambda, alpha if and * only if (la Y)^alpha has a mean-one exponential distribution. This can * be abbreviated as (la Y)^alpha = Exp(1), which implies * * (*) log Y = log(1/la) + (1/alpha)log(Exp(1)) * * Here * * P(log(Exp(1))>=y) = Prob(Exp(1)>=e^y) = exp(-e^y) * * which is called an ``extreme-value distribution''. In particular, if Y is * Weibull with parameters la=lambda and alpha, then (*) holds where * log(Exp(1)) has a distribution that is independent of la and alpha. * * SAS estimates Weibull parameters as if the displayed equation (*) was a * regression of the logarithms of the observed values on `extreme-value' * noise. Instead of (*), SAS uses the parametrization * * (**) log(Weibull) = INTERCEPT + SCALE log(Exp(1)) * * and finds the maximum likelihood estimators of INTERCEPT and SCALE . * To convert SAS's output back to standard Weibull parameters, use * * alpha=1/SCALE and lambda=exp(-INTERCEPT) * * If Y has a Weibull distribution with parameters la=lambda and alpha, then * alpha=1 is equivalent to Y being exponential. Since the Weibull * parameters are estimated by maximum likelihood, you can use * likelihood-ratio (LR) tests. There are two LR tests that can be done: * * (1) `Proc lifreg' output with option / dist=exponential carries out an * LR test with a quadratic approximation, which SAS calls the * `Lagrange Multiplier Chisquare Test for Scale' (LMCTS). This is in the * SAS output for `proc lifereg'. This is an approximate chi-square test * for H_0:alpha=1 with one degree of freedom. * * (2) You can run `proc lifereg' with both /dist=exponential and * /dist=Weibull. Both outputs list the maximum likelihood value at the * fitted parameter values. If H_0:alpha=1 is true, then twice the * difference in log likelihoods will have approximately have a chi-square * distribution with one degree of freedom. SAS doesn't do this test, but * you can carry it out yourself. * * The fact that SAS has the LMCTS in its output rather than the usual LR * test is an indication that SAS prefers the LMCTS test in this case. * ***********************************************************; proc lifereg; title2 'USE PROC LIFEREG TO FIT A WEIBULL DISTRIBUTION'; model days*status(1) = / dist=Weibull; run; proc lifereg; title2 'NOW FIT AN EXPONENTIAL DISTRIBUTION'; title3 'Not only is there a P-value for H_0:Exponential, but you can'; title4 ' use the difference in log likelihoods to compute the P-value'; title5 ' of the standard nested LR hypothesis test'; model days*status(1) = / dist=exponential; run;