*************************************************************; * A case-control study with R=3 controls per case ; * * We assume that we want to find out covariates that are * associated with a particular condition and how these * variables affect the risk of having this condition. With * some effort we locate N=50 individuals who have this * condition (the `Cases'), but these individuals are very * heterogeneous with respect to age, income, and general * health, none of which are associated with the condition. * * We are interested in three covariates, Hist (family history), * Gender, and Wunk (the level of a blood protein). Since it * is easier to find nonaffected than affected individuals, we * locate R=3 nonaffected individuals (the `Controls') for each * Case that are matched with the Case by age, income, and * general health. Each matched group of R+1 individuals forms * a STRATUM (plural STRATA), which is the Latin word for layer, * in the terminology of the handouts PHstrata and PHstrata2. * * This model is exactly the same as the model in LGCaseCtrl, * except that we have R=3 Controls per Case instead of one * Control for each Case. * * However, the basic statistical ideas are the same. We assume * a logistic regression model * * Prob(Affected | X) = g(mu+beta*X), g(y)=e^y/(1+e^y) (1) * * for being affected. In this case, we assume that mu can * depend on the variables that we are controlling for, namely * age, income, and general helath. We assume a priori that, for * each group of R+1 Case-Control individuals, exactly one of the * R+1 individuals are affected. We then write * * L(beta) = Prod_Strata Prob(Case is Affected | * Exactly one of R+1 is Affected) * * Given a stratum of R+1 individuals, the product of the * probabilities (1) that the j-th is affected (0 <= j <= R) * and that the remaining R are not affected is * * g(mu+beta*X_j) Prod(k ne j) (1-g(mu+beta*X_k) * * = exp(mu+beta*X_j)/(Prod_k=0^R (1+exp(mu+beta*X_k))) * * The probability that EXACTLY ONE is affected is the sum of * these R+1 probabilities, which is * * Sum(j=0,R) exp(mu+beta*X_j)/(Prod_k=0^R (1+exp(mu+beta*X_k))) * * As in LGCaseCtrl, it follows that the CONDITIONAL PROBABILITY * that the Case (j=0) is affected, given that EXACTLY ONE of * the group of R+1 is affected, is * * exp(mu+beta*X_0)/(Sum(j=0,R) exp(mu+beta*X_j)) * * = exp(beta*X_0)/(Sum(j=0,R) exp(beta*X_j)) (2) * * The expression (2) is the same as a typical factor in a Cox * regression likelihood, which should not be surprising since * the Cox likelihood is derived from a conditional logistic * regression. Note that the constant mu in (1) that depended * on the strata has canceled out. * * It follows from (2) that the likelihood as a function of beta * for N different Case-Control (R+1)-tuples is * * Prod(i=1,N) exp(beta*X_i0)/(Sum(j=0,R) exp(beta*X_ij)) (3) * * where i refers to the i-th Case-Control group. * * In order to estimate the coefficients beta in (3) and test * whether they are significant, we need to define a Cox * regression model that has (3) as its likelihood. * * The first step is to notice that the individual factors in (3) * involve nonoverlapping groups of individuals. That is, the * risk-set sums in (3) for different i never contain the same * individuals. This suggests a stratified Cox regression, * with each (R+1)-tuple as a stratum. * * The second step is to notice that each within-stratum Cox * likelihood has exactly one factor. That means that there * is only one failure within each stratum, and that all the * remaining individuals in that stratum are censored either * at or after that failure time. * * We set up a Cox regression model as follows. We set a time * Time=1 for failure or censoring times for all individuals. * Cases will have an observed failure at Time=1, while * Controls will be censored. Finally, each Case-Control group * of R+1 individuals will have its own stratum. * * Under these assumptions, the (stratified) Cox likelihood is * exactly (3), and analyzing the data as a Cox regression will * give us the appropriate statistical analysis. *************************************************************; title 'A CASE-CONTROL STUDY with R=3 CONTROLS PER CASE'; title2 'Each row has one case (Case=1) followed by 3 controls (Case=0)'; options ls=75 ps=60 pageno=1 nocenter; data cases; Time=1; * For all individuals; * group=1 to 50 distinguishes the Case-Control 4-tuples ; input group hist gender wunk @@; case=1; output; input hist gender wunk @@; case=0; output; input hist gender wunk @@; case=0; output; input hist gender wunk; case=0; output; datalines; 1 1 1 77 0 0 79 0 0 80 1 0 78 2 1 0 81 1 0 81 0 1 78 0 0 77 3 1 1 70 0 1 69 0 0 88 1 1 78 4 1 0 86 0 0 80 0 0 79 0 0 74 5 0 0 80 1 1 91 0 0 81 1 0 73 6 1 0 77 0 0 77 0 1 79 0 0 82 7 0 1 82 1 1 84 1 0 83 1 0 86 8 1 1 87 1 0 99 0 0 77 0 0 83 9 1 1 85 0 1 72 1 0 86 1 0 80 10 1 1 78 0 1 76 0 1 76 1 0 78 11 1 1 82 1 1 89 1 1 81 0 0 83 12 0 1 80 0 0 84 0 1 82 1 1 85 13 1 0 80 1 0 75 0 0 77 1 0 81 14 0 1 77 0 0 91 0 0 80 0 1 81 15 0 1 82 1 1 79 0 0 81 1 0 76 16 1 0 78 1 0 75 0 1 85 0 0 81 17 1 1 87 0 1 81 1 1 82 1 0 77 18 0 0 78 0 0 77 1 0 74 0 0 81 19 1 1 75 0 0 65 0 1 80 0 1 79 20 1 0 85 0 0 84 1 0 80 0 1 86 21 0 1 83 1 1 86 1 0 78 0 1 81 22 1 0 73 0 0 80 1 0 91 0 0 90 23 0 1 76 0 0 78 1 0 85 0 0 81 24 0 1 82 1 0 76 1 0 84 1 0 81 25 0 1 85 0 1 89 0 0 88 1 1 80 26 1 1 82 1 0 77 1 1 77 0 1 75 27 1 1 79 1 0 84 0 1 83 0 0 83 28 1 1 82 1 1 81 0 0 83 1 0 68 29 0 1 69 0 0 86 1 0 71 1 0 81 30 1 1 82 0 0 74 0 0 80 0 0 70 31 0 1 77 1 1 84 0 1 75 1 1 81 32 1 1 76 0 0 72 1 1 86 0 0 72 33 1 1 78 0 0 83 1 0 80 1 1 71 34 0 0 82 0 1 77 0 0 82 0 0 77 35 1 0 71 1 1 75 0 1 77 0 0 73 36 1 1 79 0 0 70 1 0 76 0 1 77 37 1 0 70 0 0 80 0 1 77 0 0 87 38 0 1 70 0 1 80 0 0 68 0 0 78 39 1 1 83 1 1 82 1 1 74 1 1 88 40 1 1 82 1 1 75 1 1 78 1 1 85 41 0 0 76 1 0 83 1 0 81 0 0 76 42 0 1 80 0 0 76 1 1 78 1 0 75 43 1 0 83 0 0 84 1 1 77 0 0 84 44 0 1 70 0 1 74 0 0 68 1 1 75 45 0 0 78 1 0 68 1 0 83 0 0 80 46 0 0 77 1 0 72 1 0 79 0 1 79 47 1 1 71 0 0 76 0 0 74 0 0 74 48 1 0 81 1 1 80 0 1 81 1 1 81 49 1 1 79 1 0 78 1 1 72 0 0 82 50 0 1 82 0 0 75 0 1 78 1 1 85 run; proc means mean stddev; title3 'COVARIATE AVERAGES BY CASE STATUS'; class case; var hist gender wunk; run; proc phreg nosummary; title3 'ANALYSIS BY PROC PHREG'; title4 "`OBSERVED DEATHS' AT CASES, STRATIFIED BY SUBJECT"; strata group; model Time*case(0) = hist gender wunk; run; proc phreg nosummary; title4 "GENDER AND HISTORY ONLY"; strata group; model Time*case(0) = hist gender; run;