Math 475 Homework 2 - Fall 2005


    HOMEWORK #2 due 10-4

    Text references are to Cody & Smith, ``Applied Statistics and the SAS Programming Language''.

    Organize your homework in the following manner:
    (i) your answers to all questions,
    (ii) all of your SAS programs, and
    (iii) all of the SAS output that you got.
    Add page numbers to your homework so that you can make references from part (i) to part (iii): for example, so that you can say things like, ``The answer in part (a) is 17. The scatterplot for part (b) is on page #Y below.'' Include your SAS output even if you don't refer to it explicitly. Except for forward references like these, a grader should not have to look beyond part (i) of your homework unless he or she thinks that you have done something wrong.
    Include your name in a title statement so that your name will appear at the top of each output page.
    If a problem asks you to do a statistical test, EXPLAIN CLEARLY what the null hypothesis H_0 is, what test you used, what the P-value is, and whether the data is significant, highly significant, or neither. Include this as part of your answer in part (i).
    (See also the main page on the Math475 Web page.)

    1.   Twenty five (25) individuals volunteered for a study. Confidential identifiers for the 25 individuals are given in the following table.
             Table 1. Confidential identifiers for 25 volunteers
                A11     B33     C22     D61     E88
                F07     G21     H91     I37     J19
                K90     L30     M98     N48     O11
                P77     Q07     R54     S18     T31
                U45     V11     W71     X76     Y32
     
    (i) Write a SAS program to randomly assign these 25 individuals to a treatment group with m=17 individuals and a control group with n=8 individuals. In your SAS program, format the 25 identifiers in a datalines; block exactly as they appear in the table above (WITHOUT the ``Table 1'' line). Write the (SAS) data step so that these identifiers appear in a single column in the output data set. (Hint: See randtwosamp.sas.)

    (ii) Which 8 individuals did you (or your program) assign to the control group? List their confidential identifiers in alphabetical order.
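
    One possible starting point for part (i), offered as a sketch rather than as the method used in randtwosamp.sas: attach a uniform random number to each identifier, sort by that number, and take the first 17 records as the treatment group. The data set names (volunteers, assigned), the variable names, and the seed 475 are arbitrary choices made for this illustration.

```sas
title 'Your Name - Math 475 Homework 2, Problem 1';

/* The trailing @@ holds the input line, so the five identifiers on  */
/* each row are read one at a time into a single column (id).        */
data volunteers;
    input id $ @@;
    u = ranuni(475);      /* assumed seed; any positive seed is reproducible */
    datalines;
                A11     B33     C22     D61     E88
                F07     G21     H91     I37     J19
                K90     L30     M98     N48     O11
                P77     Q07     R54     S18     T31
                U45     V11     W71     X76     Y32
;
run;

/* Sorting by the random number puts the 25 identifiers in random order */
proc sort data=volunteers;
    by u;
run;

/* The first 17 records form the treatment group, the last 8 the control group */
data assigned;
    set volunteers;
    length group $ 9;
    if _n_ <= 17 then group = 'Treatment';
    else group = 'Control';
    drop u;
run;

/* List each group with its identifiers in alphabetical order */
proc sort data=assigned;
    by group id;
run;

proc print data=assigned;
run;
```

    Changing the seed changes the assignment, so record the seed that produced the groups you report.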
    
    

    2.   (Similar to Problem #3-7 in the text) Some summary statistics for the occurrence of asthma and SES (socioeconomic class) are
                   Asthma       Yes       No
                   -------------------------
                   LowSES        39      101
                   HighSES       29      137
     
    Create a SAS data set from these data and test the hypothesis of independence of rows and columns. Make sure that the 2x2 table appears in your output with the same row and column order as above. For this table, what is the Pearson chi-square P-value? The P-value for the two-sided Fisher exact test? On the basis of these data, do you accept or reject the hypothesis that there is no association between SES and Asthma?
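
    A minimal sketch of one way to enter summary counts like these, assuming the hypothetical names asthma_ses, ses, asthma, and count: each cell of the table becomes one record, and a weight statement tells PROC FREQ to treat count as the cell frequency.

```sas
title 'Your Name - Math 475 Homework 2, Problem 2';

/* One record per cell of the 2x2 table */
data asthma_ses;
    input ses $ asthma $ count;
    datalines;
LowSES   Yes   39
LowSES   No   101
HighSES  Yes   29
HighSES  No   137
;
run;

/* order=data keeps rows and columns in the order in which they were */
/* entered (LowSES before HighSES, Yes before No) instead of sorting */
/* them alphabetically.                                              */
proc freq data=asthma_ses order=data;
    weight count;
    tables ses*asthma / chisq;
run;
```

    For a 2x2 table, the chisq option prints Fisher's exact test along with the Pearson chi-square.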
    
    

    3.   A total of 2000 observations are made of individuals that can have any of three different levels of Zubricity (A,B,C) and any of four different levels of Income. The counts are
                               Income
                               1     2     3     4
                        A     66    98   127   180
         Zubricity      B    111   136   170   228
                        C    168   193   240   283
     
    (i) Is there an association between Zubricity and Income in this table? Have SAS do the Pearson chi-square test on the 3 by 4 table to find out. What is the number of degrees of freedom? What is the P-value?

    (ii) Have SAS also compute the Mantel-Haenszel chi-square test for a trend. What is its number of degrees of freedom? Why is its P-value different from the Pearson P-value in part (i)? What is this test designed to detect? That is, what alternative H_1 should one conclude if the result is significant?
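
    As a sketch under assumed names (zub, zubricity, income, count), the 3 by 4 table can be entered with several cells per input line. In PROC FREQ, the chisq option reports the Pearson chi-square and the Mantel-Haenszel chi-square in the same output table, so one tables statement covers both parts.

```sas
title 'Your Name - Math 475 Homework 2, Problem 3';

/* Three values (level, income, count) per cell; @@ reads four cells per line */
data zub;
    input zubricity $ income count @@;
    datalines;
A 1  66   A 2  98   A 3 127   A 4 180
B 1 111   B 2 136   B 3 170   B 4 228
C 1 168   C 2 193   C 3 240   C 4 283
;
run;

proc freq data=zub;
    weight count;
    tables zubricity*income / chisq;
run;
```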
    
    

    4.   Suppose that the same treatment as in Problem 1 is given to patients suffering from four different but related diseases, which are labeled as Dis#A, Dis#B, Dis#C, and Dis#D. The numbers of individuals surviving for or dying within six months were collected in the following table.
        Table 2. Mortality results for four diseases
                     Dis#A          Dis#B          Dis#C          Dis#D
                   Surv  Die      Surv  Die      Surv  Die      Surv  Die
        Treated     250  107       390  702       218  141       317  757
        Control     454  240       173  390       488  436       113  348
     
    Note that Dis#B and Dis#D appear to be more severe than the others, although all four diseases have high mortality rates in both treatment groups.
    (i) Does the treatment have an overall positive or negative effect on mortality over the four strata? Carry out a test that gives you a single P-value and that is not subject to Simpson's Paradox. (For example, the Mantel-Haenszel (strata) test.) Do you accept or reject the hypothesis that treatment has no effect on survival? Do you get the same results for each of the diseases separately?
    (ii) Is the effect of the treatment positive or negative? That is, do relatively more treated individuals survive than control individuals? (Hint: Consider the phi coefficient for each disease.) Would you recommend that this treatment be given for individuals with these conditions, assuming that no other treatment was available? Would your recommendation depend on which of the four conditions?
    (iii) What is the P-value for the Breslow-Day test in the output? Does this suggest that an instance of Simpson's Paradox might ensue if the counts for the four diseases are combined into one table?
    (iv) Combine the diseases into one 2x2 table. What is the Pearson Chi-Square P-value for this possibly-incorrect table? Is this consistent with your answer to part (i)? What is the phi coefficient for the combined table? Is it consistent with your results in part (ii)? In the combined table, do relatively more treated individuals survive than control individuals?
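
    One way to set up the stratified analysis, sketched with the hypothetical names survival, disease, treat, outcome, and count: in a three-way tables request, PROC FREQ treats the first variable as the stratum, and the cmh option prints the Cochran-Mantel-Haenszel statistics together with the Breslow-Day test for 2x2 strata. The chisq option adds the Pearson chi-square and the phi coefficient for each table.

```sas
title 'Your Name - Math 475 Homework 2, Problem 4';

/* One record per cell of Table 2, two cells per input line */
data survival;
    input disease $ treat $ outcome $ count @@;
    datalines;
A Treated Surv 250   A Treated Die 107
A Control Surv 454   A Control Die 240
B Treated Surv 390   B Treated Die 702
B Control Surv 173   B Control Die 390
C Treated Surv 218   C Treated Die 141
C Control Surv 488   C Control Die 436
D Treated Surv 317   D Treated Die 757
D Control Surv 113   D Control Die 348
;
run;

proc freq data=survival order=data;
    weight count;
    /* Stratified by disease: one 2x2 table per disease, plus CMH and Breslow-Day */
    tables disease*treat*outcome / chisq cmh;
    /* Diseases collapsed into a single 2x2 table, as in part (iv) */
    tables treat*outcome / chisq;
run;
```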
    
    

    5.   A test is made of the effects of a new drug on people who are occasional sufferers from a newly discovered allergy that affects people only during the winter. Eighty (80) people are enrolled in the study. Forty (40) subjects are first asked whether they had allergic symptoms during a particular year, are then given the drug, and are asked again whether they had allergic symptoms during the following year. The other forty (40) are given the drug the first year but not the second year and, again, are asked whether they had allergic symptoms with and without the drug. Thus, there are two Yes-or-No responses from each enrollee, and, in particular, 8 individuals had no symptoms with the drug but did have symptoms without the drug. The experimenters state that this experimental design helps to control for variable severity of the allergy among the subjects. The results were
               Numbers of individuals with allergic symptoms
                 with and without a drug over two seasons
               
                               Without Drug
                                Yes     No     Totals
                          Yes    11     22       33
               With Drug       
                          No      8     39       47
               ------------------------------------
               Totals            19     61       80
     
    (i) On the basis of these data, does the drug tend to change significantly the incidence of allergy in vulnerable individuals?
    (ii) If the drug has an effect, would you recommend the drug to someone who suffers from this allergy? That is, does the drug help or hurt?
    (Warning: Although the data are in the form of a 2x2 contingency table, the Pearson chi-square test may not be appropriate. For example, a large number of (Yes,Yes) counts may simply mean that these particular individuals would have allergic symptoms no matter what. Similarly, a large number of (No,No) counts might be due to a subset of the sample who are almost never affected. Thus all of the usable information in the table is in the (Yes,No) and (No,Yes) counts. Before using either the Pearson or Fisher exact tests, read about McNemar's test in the text.)
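
    To make the warning concrete: McNemar's chi-square uses only the two discordant counts b and c and equals (b-c)^2/(b+c) with one degree of freedom, which for this table is (22-8)^2/(22+8) = 196/30, or about 6.53. A minimal sketch of how SAS can be asked for this test follows. The names allergy, with_drug, without_drug, and count are assumptions for illustration; the agree option of PROC FREQ is what requests McNemar's test for a 2x2 table.

```sas
title 'Your Name - Math 475 Homework 2, Problem 5';

/* One record per cell: response with the drug, response without it, count */
data allergy;
    input with_drug $ without_drug $ count;
    datalines;
Yes  Yes  11
Yes  No   22
No   Yes   8
No   No   39
;
run;

/* order=data keeps Yes before No in both rows and columns */
proc freq data=allergy order=data;
    weight count;
    tables with_drug*without_drug / agree;
run;
```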
