Math 434 Takehome Final - Fall 2005


    TAKEHOME FINAL due on or before Thu 12-22 by 5:30 P.M.
    (Return to Prof. Sawyer or to math receptionist in Cupples I Room 100.)

    NOTE: There should be NO COLLABORATION on the takehome final,
       other than for the mechanics of using the computer.

    Open textbook and notes (including course handouts).

    In general, where the results of a statistical test are asked for, (i) EXPLAIN CLEARLY what the hypothesis H0 is and what alternative you are testing against, (ii) find the P-value for the test indicated (and state what test you used), and (iii) state whether the results are significant (P<0.05), highly significant (P<0.01), or not significant (P >= 0.05). If the P-value is based on a Student's t or Chi-square or F distribution, also give the degrees of freedom.
    ORGANIZE YOUR WORK in the following manner: (i) your answers to all questions, (ii) all your SAS programs, and (iii) your SAS output. ADD CONSECUTIVE PAGE NUMBERS to your homework so that you can make references from part (i) to part (iii). (For example, so that you can say things like, ``The answer in part (a) is 57.75. The scatterplot for part (b) is on page #Y below.'') If you do different SAS problems at different times, it may be easiest to write page numbers yourself on the SAS output.
    Different parts of problems may not be equally weighted.

    5 problems.

    Problem 1. Walloopia is a small, apocryphal country that is famous for its pure water and mild climate. A total of 1391 Walloopians died during the previous year, amounting to a crude death rate of 1.77 per thousand. The elders of the country feel that this death rate is too high given the relatively young Walloopian population and are concerned about what this says about the Walloopian health infrastructure.

    Census data for Walloopia in the previous year, along with death rates (per individual per year) for a climatically comparable U.S. population, are given in Table 1.
         Table 1. Census Data for Walloopia
    
           Age           U.S.         Walloopian
          Range       death rate        census
         0 to 15        0.0016         212000
        15 to 30        0.0011         188000
        30 to 45        0.0013         162000
        45 to 60        0.0029         143000
        60 to 75        0.0057          83000
        ----------------------------------------
          Total:        0.0032         788,000   
    The crude death rate in the climatically matched U.S. population was 0.0032, or 3.2 per thousand, which was nearly twice the Walloopian crude death rate of 1391/788000=0.00177.

    (i) The U.S. crude death rate (3.2 per thousand) times the total Walloopian population yields 2521.6 deaths, which is considerably higher than the 1391 deaths observed in Walloopia. Why is this an inappropriate comparison of the public-health institutions in the two countries?

    (ii) Using the Walloopian population distribution as the standard, what was the (Walloopian-population-standardized) death rate in the U.S. population? Was it higher or lower than the observed (crude) U.S. death rate?

    (iii) Using the climatically comparable U.S. population as the standard, what was the (US-population-standardized) death rate in Walloopia during that year? Was it higher or lower than the observed crude death rate of 1.77 per thousand? Assuming that the health infrastructure is comparable, do the pure water and mild climate appear to help or hurt the Walloopians? Why?

    (iv) Which population-standardization method, direct or indirect standardization, did you use in part (ii)?  in part (iii)?
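
    The two standardization formulas referred to in Problem 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are invented here, the course itself uses SAS, and the indirect rate is taken to be the standardized mortality ratio (observed deaths over expected deaths) times the crude rate of the reference population. Deciding which formula belongs to which part of the problem is still up to you.

```python
# Table 1: (age range, U.S. death rate per individual per year, Walloopian census)
TABLE_1 = [
    ("0 to 15", 0.0016, 212000),
    ("15 to 30", 0.0011, 188000),
    ("30 to 45", 0.0013, 162000),
    ("45 to 60", 0.0029, 143000),
    ("60 to 75", 0.0057, 83000),
]
WALLOOPIAN_DEATHS = 1391   # total deaths stated in Problem 1
US_CRUDE_RATE = 0.0032     # "Total" row of Table 1


def direct_standardized_rate(stratum_rates, standard_counts):
    """Average the age-specific rates, weighted by the standard population."""
    weighted = sum(r * n for r, n in zip(stratum_rates, standard_counts))
    return weighted / sum(standard_counts)


def indirect_standardized_rate(observed_deaths, reference_rates,
                               study_counts, reference_crude_rate):
    """(Observed deaths / expected deaths) times the reference crude rate."""
    expected = sum(r * n for r, n in zip(reference_rates, study_counts))
    return observed_deaths / expected * reference_crude_rate


us_rates = [rate for _, rate, _ in TABLE_1]
census = [count for _, _, count in TABLE_1]

direct = direct_standardized_rate(us_rates, census)
indirect = indirect_standardized_rate(WALLOOPIAN_DEATHS, us_rates,
                                      census, US_CRUDE_RATE)
print(f"direct: {direct:.5f}  indirect: {indirect:.5f}")
```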
    
    

    Problem 2. Disease remission times for 40 patients, some of whom were treated and some of whom were not treated, are given in Table 2. A trailing + in Table 2 means a right-censored value. (For example, a value is right-censored if the patient withdrew from the study at that time or died due to unrelated causes.)

      Table 2. Remission times for two groups.
    
      Control Group (not treated)
        14  43  45  52  67  83  111  145  169  175  196  225  103+
        108+  113+  158+  164+
      Treated Group
        20  24  25  25  30  31  41  42  42  45  45  68  70  75  75
        91  107  131  9+  50+  62+  63+  148+
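
    As a data-entry aid, the trailing + convention of Table 2 can be turned into (time, censoring status) records before any analysis. The sketch below is a Python illustration only, since the course uses SAS. The field names and the coding (censored=1 if censored, treat=1 if treated) are assumptions chosen to match the conventions of Tables 3 and 5.

```python
# Table 2 as printed; a trailing "+" marks a right-censored value.
CONTROL = ("14 43 45 52 67 83 111 145 169 175 196 225 103+ "
           "108+ 113+ 158+ 164+")
TREATED = ("20 24 25 25 30 31 41 42 42 45 45 68 70 75 75 "
           "91 107 131 9+ 50+ 62+ 63+ 148+")


def parse_times(text, treat):
    """Turn tokens such as '103+' into one record per patient."""
    records = []
    for token in text.split():
        censored = 1 if token.endswith("+") else 0
        records.append({
            "time": int(token.rstrip("+")),
            "censored": censored,   # 1 if censored, 0 if observed
            "treat": treat,         # 1 if treated, 0 if control
        })
    return records


records = parse_times(CONTROL, 0) + parse_times(TREATED, 1)
print(len(records), "patients,",
      sum(r["censored"] for r in records), "censored")
```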
        

    (i) Is there a significant difference in survival time between the two groups, as determined by a Cox regression? A highly significant difference? What is the P-value?

    (ii) What is the relative risk of the Control group in comparison with the Treatment group? Is it greater than one or smaller than one?

    (iii) In general, if the relative risk of one group with respect to a second group is greater than one, does this mean that a typical subject from the first group will tend to live longer than a typical subject from the second group, or that the subject will tend to die sooner? (Be careful!)
    
    

    Problem 3. Survival times in days are given in Table 3 below for patients who had been diagnosed with a particular disease and had either been given a particular treatment (Treat=1) or no treatment (Treat=0). Measurements for morphness (Morph), spatility (Spat), and hypochronicity (Hypo) were also recorded at the time of diagnosis and are given in Table 3.

    The columns in Table 3 are (i) a subject number,  (ii) survival time in days,  (iii) censoring status,  (iv) treatment state (1 if Treated, 0 if Control),  and values for (v) Morphness,  (vi) Spatility, and  (vii) Hypochronicity.
       Table 3:  Survival times in days in terms of treatment status,
            morphness, and two other variables.
       (Status: 1 if censored, 0 if observed.)
       (Treatment: Treat=1 if treated, Treat=0 if not treated.)
    
     Subj  Time Status  Treat  Morph  Spat  Hypo
        1.   35  0        1     496    62   279
        2.   60  0        1     838    24   179
        3.   96  0        1     740    72   252
        4.  114  0        1     511   106   165
        5.  165  0        1     982   112   160
        6.  173  0        1     607   127   257
        7.  178  0        1    1021   115   226
        8.  182  0        0     745    21   239
        9.  185  0        1     531    76   148
       10.  220  0        0     569    47   192
       11.  240  0        1     368    93   117
       12.  254  0        0    1013    54   145
       13.  262  0        1     588    63   210
       14.  275  0        0     881    52   144
       15.  314  0        0     902    86   236
       16.  339  0        1     842    56   201
       17.  385  0        1     947    28    51
       18.  394  0        0     994    85   221
       19.  425  0        0     822    77   194
       20.  474  0        1     926    23   104
       21.  484  0        1    1238    31   181
       22.  595  0        0    1469    48   169
       23.  605  0        1    1239    67   146
       24.  638  0        1    1321    40   226
       25.  732  0        0    1025    89   220
       26.  782  0        1    1168    99   155
       27.  884  0        0     650    99   114
       28.   38  1        0    1171    49   235
       29.   75  1        0     436    74   176
       30.  165  1        1     543   100   141
       31.  179  1        0     522    68   179
       32.  219  1        1     893   103    90
       33.  321  1        0     906   112   269
       34.  493  1        0    1197    48   182
       35.  539  1        1    1011    75   173
           

    (i) Analyze the data in the table using the Cox PH regression model. Is there an overall significant effect of the four covariates together? What is the P-value? Which version of the model test did you use?

    (ii) Which of the four variables (Treatment status, Morphness, Spatility, and Hypochronicity) individually have a significant effect on survival time? Which have a highly significant effect? For those with significant P-values, what are the P-values? For each variable with a significant effect, does increasing the value of that variable imply a higher death rate or a lower death rate?

    (iii) Suppose that it is known that the average morphness level in the general population is 1000. Suppose that a given patient has a Morphness level of 1500. What is her estimated relative risk, compared with a patient at the population average, due to her increased morphness level? Is she at increased or decreased risk due to her increased morphness?
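
    Under the Cox PH model, the hazard ratio between two values of a single covariate, with all other covariates held fixed, is exp(beta * (x - x_ref)). The Python sketch below shows only that arithmetic. The coefficient used is a made-up placeholder and NOT an estimate from Table 3; substitute the coefficient from your own fitted model.

```python
import math


def relative_hazard(beta, x, x_ref):
    """Hazard ratio at covariate value x relative to x_ref in a Cox PH model."""
    return math.exp(beta * (x - x_ref))


# Hypothetical coefficient, for illustration only (not fitted to Table 3).
HYPOTHETICAL_BETA = 0.001

ratio = relative_hazard(HYPOTHETICAL_BETA, x=1500, x_ref=1000)
print(f"hazard ratio at 1500 versus 1000: {ratio:.3f}")
```

    A ratio above one corresponds to a positive coefficient and a ratio below one to a negative coefficient.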
    
    

    Problem 4. Samples of two groups were followed over 17 years. The numbers of deaths and censoring events (that is, individuals who were last seen at that time) over the 17 years are recorded in Table 4. All individuals in Groups O and X are accounted for in Table 4, so that the last 8 individuals in the combined dataset were recorded as censored in Year 17.

       Table 4:  Survival times in years for two groups.
    
                   Group O               Group X
       Year    deaths    censored    deaths     censored
         1        21        0         114         0     
         2         8        2          57        10     
         3         7        2          38         6     
         4         6        2          43         6     
         5         7        2          34         6     
         6         6        7          31        27     
         7         5        8          21        33     
         8         3        8          19        26     
         9         4        6          13        17     
        10         3        7          11        16     
        11         1        6          11        11     
        12         1        6           9        13     
        13         1        5           5         8     
        14         1        2           2         7     
        15         1        2           2         6     
        16         1        3           0         0     
        17         0        0           0         8     
     

    (i) Using a Cox regression model, is there a significant difference between the lifetimes of the two groups? What is the P-value? What is the estimated relative hazard rate of Group X with respect to Group O? Is Group X at greater hazard than Group O, or vice versa? Use the default Breslow tie-handling method for the ties in the data.
    (Hints: See ltangina.sas on the Math434 Web site for clues about how to read tabled data of this form into a useful SAS dataset. If num is the name of your variable for the counts in Table 4, DON'T FORGET to include freq num in SAS procedures that need to know that your data set is describing groups of individuals and not individual records. See Section 12.1 in the text for a discussion of tie-correction methods. (See also phresid.sas on the Math434 Web site.))
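
    The reshaping described in the hint (one record per year, group, and status, with a count variable num) can be sketched as follows. This is a Python illustration and not the contents of ltangina.sas. The record layout and the status coding (1 if censored, 0 if died, as in Tables 3 and 5) are assumptions made for the example.

```python
# Table 4 rows: (year, O deaths, O censored, X deaths, X censored)
TABLE_4 = [
    (1, 21, 0, 114, 0), (2, 8, 2, 57, 10), (3, 7, 2, 38, 6),
    (4, 6, 2, 43, 6), (5, 7, 2, 34, 6), (6, 6, 7, 31, 27),
    (7, 5, 8, 21, 33), (8, 3, 8, 19, 26), (9, 4, 6, 13, 17),
    (10, 3, 7, 11, 16), (11, 1, 6, 11, 11), (12, 1, 6, 9, 13),
    (13, 1, 5, 5, 8), (14, 1, 2, 2, 7), (15, 1, 2, 2, 6),
    (16, 1, 3, 0, 0), (17, 0, 0, 0, 8),
]


def to_frequency_records(table):
    """One record per (year, group, status) cell that has a nonzero count."""
    records = []
    for year, o_dead, o_cens, x_dead, x_cens in table:
        cells = [("O", 0, o_dead), ("O", 1, o_cens),
                 ("X", 0, x_dead), ("X", 1, x_cens)]
        for group, status, num in cells:
            if num > 0:   # empty cells describe no individuals
                records.append({"year": year, "group": group,
                                "status": status, "num": num})
    return records


records = to_frequency_records(TABLE_4)
total = sum(r["num"] for r in records)
print(len(records), "records describing", total, "individuals")
```

    Each record stands for num individuals, which is why a frequency variable is needed when the records are analyzed.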

    (ii) Since the ties in Table 4 result from individuals dying at different times of the year and then being grouped by year, the ``exact'' tie-correction method should be more accurate than the Breslow method in this case. Redo the analysis in part (i) using the exact tie-correction method instead of the Breslow method. How do your results change? What is the estimated relative hazard rate using the more accurate ``exact'' method? (See the hints in part (i).)

    (iii) Does Group (that is, Group X or Group O) have a time-dependent effect on mortality in Table 4? Test for a time-dependent effect of Group using either the Breslow or the exact tie-correction method. Recall that by ``time-dependent'' we mean an effect that is the same for both individuals and their risk sets at any particular time and not the result of a covariate that can be attached to records. (Hint: See comments about time-dependent variables in ph2samp.sas and in other example SAS datasets on the Math434 Web site.)
    
    

    Problem 5. Forty (40) subjects were recruited for a study of the effectiveness of a particular treatment. Remission times for the subjects were recorded over a period of 90 days with all surviving subjects recorded as censored on day 91. It is known that remission is also strongly affected by a variable called X that can vary over time.

    The value of X was recorded for each subject initially (X=X0), at day 30 (X=X30), and also at day 60 (X=X60). It is assumed that X0 is a good approximation for X for days 0 to 29, that X30 can be used for days 30 to 59, and that X60 is a good approximation for days 60 to 90. The data from the study are given in Table 5.
     Table 5. Remission times in terms of Sex, Treatment status, and
        values of X initially (X0), at 30 days (X30), and at 60 days (X60).
     (Status: 1 if censored, 0 if observed.)
     (Treatment status: Treat=1 if treated, Treat=0 if not treated.)
    
     Subj Time/Status  Sex Treat    X0   X30   X60
       1.     1  0      1   0       41    33    12
       2.     2  0      1   0       42    37    30
       3.     4  0      0   1       10    44    42
       4.     5  0      1   0       24    29    19
       5.    10  0      1   1       17    37    36
       6.    24  0      0   0       33    31     7
       7.    26  0      1   0       26    18    32
       8.    29  0      1   1       28    32     9
       9.    31  0      1   1       13    42     7
      10.    32  0      1   1        8    40    20
      11.    32  0      1   0       22    35    36
      12.    36  0      1   0       14    11    45
      13.    38  0      1   0       40    38    32
      14.    44  0      0   0       31    20    44
      15.    50  0      1   0       21    40    29
      16.    54  0      0   1       33    43    28
      17.    59  0      1   1       11    43    39
      18.    61  0      1   1       15    31    45
      19.    66  0      1   0       10    35    23
      20.    67  0      0   0        6     6    40
      21.    67  0      0   0       21    24    34
      22.    68  0      1   1        7    25    45
      23.    68  0      0   1       19    32    42
      24.    69  0      0   1       21    23    40
      25.    70  0      0   0        5    26     8
      26.    74  0      0   1       37    29    20
      27.    91  1      1   0        9    16    11
      28.    91  1      0   1       10    33    27
      29.    91  1      0   1       11    21    32
      30.    91  1      0   0       25    25    20
      31.    91  1      0   1       27    21    18
      32.    91  1      0   0       38     7    12
      33.    91  1      1   1       43    16    42
      34.    91  1      0   1       44    24    35
      35.    91  1      0   1       44    42     6 
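
    The rule stated above for X (X0 for days 0 to 29, X30 for days 30 to 59, X60 from day 60 on) can be made concrete by splitting each subject's follow-up into intervals, each carrying the value of X in force during that interval. The sketch below is a Python illustration of that bookkeeping only. It is not the method of ph2samp.sas or phresid.sas, and the (start, stop] interval convention, the event indicator (1 if remission observed), and the function name are assumptions made for the example.

```python
def split_follow_up(time, status, x0, x30, x60):
    """Split one subject into (start, stop, event, x) rows.

    status follows Table 5 (1 if censored, 0 if observed); event is the
    reverse (1 if remission was observed at `stop`). Intervals are
    (start, stop], so day 29 still uses X0 and day 30 already uses X30.
    """
    event = 1 - status
    rows = []
    start = 0
    for stop, x in ((29, x0), (59, x30), (float("inf"), x60)):
        if time <= stop:
            rows.append((start, time, event, x))   # final interval
            break
        rows.append((start, stop, 0, x))           # still at risk at `stop`
        start = stop
    return rows


# Subjects 9 and 27 from Table 5.
print(split_follow_up(31, 0, 13, 42, 7))
print(split_follow_up(91, 1, 9, 16, 11))
```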

    (i) Using the appropriate model to analyze the data, is there a significant effect for Sex, Treatment, and X together? Which of the three covariates have a significant effect? Which have a highly significant effect? For each variable that has a significant effect, do larger values of this variable tend to increase or decrease the time to remission?
    (Hint: See comments in the sample programs ph2samp.sas and phresid.sas on the Math434 Web site for remarks about modeling time-dependent variables.)

    (ii) Does Treatment have a significant effect? If so, what is the relative risk of NOT being treated?
    
    
