HOMEWORK #2 due Tuesday 10-12
Six problems.
Text references are to the textbook, Cody & Smith, ``Applied Statistics and the SAS Programming Language''.
NOTE: See the main Math475 Web page for how to organize a homework
assignment using SAS. In particular,
ALWAYS INCLUDE YOUR NAME in a title statement in your SAS
programs, so that your name will appear at the top of each output page.
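For example, here is a minimal sketch. The name is a placeholder, so put your own in its place. A TITLE statement is global: it stays on every later page of output until another TITLE statement replaces or clears it, so one statement near the top of the program is enough.

```sas
/* Placeholder name: replace Jane Doe with your own name. */
title 'Math475 Homework #2: Jane Doe';

/* Optional second title line, e.g. to label the problem that follows. */
title2 'Table 2 problems';
```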
ALL HOMEWORKS MUST BE ORGANIZED in the following order:
(Part 1) First, your answers to all the problems in the homework,
whether you use SAS for that problem or not. If the problem asks you to
generate a graph or table, refer to the graph or table by page number in
the SAS output (see below). (Xeroxing a page or two from the SAS output or
cutting and pasting into a Word file or TeX source file is also OK.)
(Part 2) Second, all SAS programs that you used to obtain the output for
any of the problems. If possible, similar problems should be done with the
same SAS program. (In other words, write one SAS program for several
problems if that makes things easier, using
title or title2 statements to separate the problems in
your output.)
(Part 3) Third, all output for all the SAS programs in the previous
step.
If an answer in Part 1 requires a table or a scatterplot that you need to
refer to, make sure that your SAS output has overall increasing (unique)
page numbers and make references to Part 3 by page number, such as
``The scatterplot for Problem 2 part (b) is on page #X in
the SAS output below.'' DO NOT say, ``see Page 3 in the SAS output''
if Part 3 has output from several SAS runs, each of which has its own
Page 3. In that case, either write your own (increasing) page numbers
on the SAS output, or else (for example) refer to ``Page 2-7 in the
SAS output'' (for page 7 in the second set of SAS output) and write
page numbers in the format ``2-7'' at the top of pages in your output.
Table 1. Numbers of individuals with allergic symptoms
         with and without a drug over two seasons

                          Without Drug
                        Yes     No   Totals
   With Drug   Yes       11     22     33
               No         8     39     47
   --------------------------------------
               Totals    19     61     80
Table 2. Morbidity results for four diseases
              Dis#A        Dis#B        Dis#C        Dis#D
            Surv  Die    Surv  Die    Surv  Die    Surv  Die
Treated      250  107     390  702     218  141     317  757
Control      454  240     173  390     488  436     113  348
Note that Dis#B and Dis#D appear to be more severe than the others,
although all four diseases have high mortality rates in both treatment
groups.
(i) Does the treatment have a significant overall positive or negative effect on mortality over the four strata? Carry out a test that gives you a single P-value for all four tables and that is not subject to Simpson's Paradox. Do you accept or reject the hypothesis that treatment has no effect on survival? Do you get the same results for each of the diseases separately?
(ii) Is the effect of the treatment positive or negative? That is, do relatively more treated individuals survive than control individuals? (Hint: Consider the phi coefficient for each disease.)
(iii) Combine the diseases into one 2x2 table. What is the Pearson Chi-Square P-value for this possibly incorrect table? Is this consistent with your answer to part (i)? What is the phi coefficient for the combined table? Is it consistent with your results in part (ii)? In the combined table, do relatively more treated individuals survive than control individuals, or vice versa?
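Below is one possible way to set this up in SAS. It is a sketch, not the required approach. Table 2 is entered as one row per cell together with its count, and proc freq weights by that count. The data set and variable names (MORBID, DISEASE, TREAT, OUTCOME, COUNT) are made up for this example. In a three-way tables request, the first variable defines the strata. The CMH option adds Cochran-Mantel-Haenszel statistics that pool the four 2x2 tables. The CHISQ option gives the chi-square tests and phi coefficient separately for each disease. The second proc freq step collapses over DISEASE, which gives the combined table of part (iii).

```sas
/* Table 2, one row per cell; COUNT holds the cell frequency. */
data morbid;
  input disease $ treat $ outcome $ count;
  datalines;
A Treated Surv 250
A Treated Die  107
A Control Surv 454
A Control Die  240
B Treated Surv 390
B Treated Die  702
B Control Surv 173
B Control Die  390
C Treated Surv 218
C Treated Die  141
C Control Surv 488
C Control Die  436
D Treated Surv 317
D Treated Die  757
D Control Surv 113
D Control Die  348
;
run;

/* ORDER=DATA keeps Treated before Control and Surv before Die, as in */
/* Table 2, so the sign of phi compares survival in the two rows.      */
title2 'Table 2: four diseases as strata';
proc freq data=morbid order=data;
  weight count;
  tables disease*treat*outcome / chisq cmh;  /* per-disease tests plus CMH */
run;

title2 'Table 2: diseases combined into one 2x2 table';
proc freq data=morbid order=data;
  weight count;
  tables treat*outcome / chisq;
run;
```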
Table 3. Output in tons per acre in test plots
for five different levels of an insecticide
Level1 79 79 95 109 118 150
Level2 84 95 100 105 119 135
Level3 109 114 121 123 124 145
Level4 91 106 119 150 151 151
Level5 110 113 129 131 145 165
yy and
stress) for each of 16 gnus under various conditions of
stress are given in the following table. (In each of the 16 pairs of data
in Table 4, yy is the first variable and
stress the second variable.)
Table 4. Blood pressure and stress for 16 gnus
    yy  stress     yy  stress     yy  stress     yy  stress
    47    3.0      50    1.8     110    7.9    1655   15.7
   179    9.1      55    5.2    1310   12.9    2773   15.1
    56    3.6      62    2.9    3052   16.8     126    7.2
   866   12.6     175    8.6    2731   16.7     249    9.0
yy on
stress with this data? What P-value does SAS report? What
is the model R^2?
yy versus
stress. Include the predicted values on the same plot with
plot symbol P as a comparison. Does the plot of
yy versus stress look linear? How well does it
follow the predicted values? (Hint: It might look slightly bowed
down in the middle.)
yy on stress against stress. Do
the residuals look consistent with the assumptions of a linear
regression? Do their signs and absolute values appear to be randomly
distributed with respect to stress? (Hint: The
negative residuals may be bunched together in the center.)
yy on both stress and
stress*stress. (Hint: Introduce a new SAS variable
stress2 for stress*stress.) What is the new
model R^2? In a plot of yy on
stress, do the predicted values appear to match
yy more closely? Do the residuals have a more
random-looking plot on stress? (Hint: Observations
with higher values of stress may also have larger
residuals.)
logyy=log(yy) on
stress and stress*stress. What is the new
model R^2? Do the predicted values of
logyy appear to match the observed values more closely?
Does the residual plot show less dependence on stress?
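Here is a sketch of SAS steps for these parts. It assumes one data step followed by a separate proc reg run for each of the linear, quadratic, and log models. The names GNUS, FIT1 to FIT3, PRED, and RESID are hypothetical. The double trailing @@ lets each input line hold all four (yy, stress) pairs of Table 4. Proc plot then overlays the data (*) and the predicted values (P) on one plot and draws the residuals against stress.

```sas
/* Table 4: each line holds four (yy, stress) pairs; the trailing @@ */
/* keeps reading pairs from the same line until it is used up.       */
data gnus;
  input yy stress @@;
  stress2 = stress*stress;   /* quadratic term for the second model */
  logyy   = log(yy);         /* natural log for the third model */
  datalines;
47 3.0 50 1.8 110 7.9 1655 15.7
179 9.1 55 5.2 1310 12.9 2773 15.1
56 3.6 62 2.9 3052 16.8 126 7.2
866 12.6 175 8.6 2731 16.7 249 9.0
;
run;

/* Model 1: yy on stress */
title2 'Table 4: yy on stress';
proc reg data=gnus;
  model yy = stress;
  output out=fit1 p=pred r=resid;
run;
quit;

proc plot data=fit1;
  plot yy*stress='*' pred*stress='P' / overlay;  /* data vs. predicted values */
  plot resid*stress='*' / vref=0;                /* residuals vs. stress */
run;
quit;

/* Model 2: yy on stress and stress*stress */
title2 'Table 4: yy on stress and stress2';
proc reg data=gnus;
  model yy = stress stress2;
  output out=fit2 p=pred r=resid;
run;
quit;

proc plot data=fit2;
  plot yy*stress='*' pred*stress='P' / overlay;
  plot resid*stress='*' / vref=0;
run;
quit;

/* Model 3: log(yy) on stress and stress*stress */
title2 'Table 4: logyy on stress and stress2';
proc reg data=gnus;
  model logyy = stress stress2;
  output out=fit3 p=pred r=resid;
run;
quit;

proc plot data=fit3;
  plot logyy*stress='*' pred*stress='P' / overlay;
  plot resid*stress='*' / vref=0;
run;
quit;
```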
Table 5: Zubricity and Covariates
-----------------------------------
OBS Zubric Drubn Visc Speed
-----------------------------------
1 310 16 27 12
2 210 17 36 10
3 450 24 40 20
4 390 24 44 15
5 780 26 44 8
6 330 28 53 18
7 580 39 55 19
8 330 22 56 24
9 400 29 57 16
10 230 28 58 17
11 470 34 60 24
12 510 35 61 17
13 490 37 66 20
14 450 36 68 11
15 630 46 73 21
16 400 38 78 6
17 760 34 80 22
18 590 47 83 17
19 520 43 84 12
20 540 44 89 17
proc reg or proc glm) to find out.
What is the model P-value? What is the model R^2? What is the
value of the F-statistic that led to the model P-value? How many degrees
of freedom does it have in its numerator and denominator? How did SAS
arrive at these numbers?
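The problem statement above is cut off, so this is only a sketch. It assumes the model regresses Zubric on all three covariates, and the data set name ZUB is hypothetical. In the Analysis of Variance table that proc reg prints, the F Value is the model mean square divided by the error mean square. Its numerator and denominator degrees of freedom are the Model and Error DF in that same table.

```sas
/* Table 5; the OBS column is kept as a variable so that points can */
/* be labeled by row number later.                                   */
data zub;
  input obs zubric drubn visc speed;
  datalines;
1 310 16 27 12
2 210 17 36 10
3 450 24 40 20
4 390 24 44 15
5 780 26 44 8
6 330 28 53 18
7 580 39 55 19
8 330 22 56 24
9 400 29 57 16
10 230 28 58 17
11 470 34 60 24
12 510 35 61 17
13 490 37 66 20
14 450 36 68 11
15 630 46 73 21
16 400 38 78 6
17 760 34 80 22
18 590 47 83 17
19 520 43 84 12
20 540 44 89 17
;
run;

title2 'Table 5: Zubric on Drubn, Visc and Speed';
proc reg data=zub;
  model zubric = drubn visc speed;   /* ANOVA table shows F, its DF, and P */
run;
quit;
```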
proc reg or else one run of proc
glm plus a proc print for associated variables.)
plot statements within a
proc reg procedure, enter a ``paint'' command like (for
example) paint obs=17 / symbol='X'; BEFORE the
plot statement, where obs stands for the ordinal
value of the point (that is, the row number or OBS value in
the data set), or
proc plot, enter the
plot statement as (for example) plot Y*X $ obs;
or plot Y*X='*' $ obs;. The $
obs option causes the ordinal value to be displayed next to each
plotted point.
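Here is an illustration of the proc plot route. It is a sketch that assumes Table 5 has already been read into a data set ZUB (hypothetical name), with the OBS column kept as a numeric variable obs. Plotting residuals against predicted values is only one example of a plot you might label this way.

```sas
/* Assumes data set ZUB (hypothetical) holds Table 5 with variable OBS. */
proc reg data=zub;
  model zubric = drubn visc speed;
  output out=zubfit p=pred r=resid;   /* keep fitted values and residuals */
run;
quit;

proc plot data=zubfit;
  plot resid*pred='*' $ obs;   /* print each point's row number beside it */
run;
quit;
```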
proc iml. (Hint: See
ThreeRegIml.sas on the Math475 Web site.)
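For comparison, here is a hedged sketch of the matrix computation in proc iml. It is not ThreeRegIml.sas, whose contents are not reproduced here. It assumes the task is to redo the Table 5 regression of Zubric on Drubn, Visc, and Speed with matrix algebra, and that the data are in a data set ZUB (hypothetical name). It computes b = (X'X)^{-1} X'y, then R^2 = 1 - SSE/SST and F = ((SST - SSE)/p) / (SSE/(n - p - 1)), where p is the number of covariates and n is the number of observations.

```sas
/* Least squares by matrix algebra; assumes data set ZUB holds Table 5. */
proc iml;
  use zub;
  read all var {drubn visc speed} into x0;
  read all var {zubric} into y;
  close zub;

  n = nrow(y);
  x = j(n, 1, 1) || x0;          /* prepend a column of 1s for the intercept */
  p = ncol(x0);                  /* number of covariates */

  b    = inv(x`*x) * x`*y;       /* b = (X'X)^{-1} X'y */
  yhat = x*b;
  sse  = ssq(y - yhat);          /* error sum of squares */
  sst  = ssq(y - y[:]);          /* total sum of squares about the mean */
  dfe  = n - p - 1;              /* error degrees of freedom */
  f    = ((sst - sse)/p) / (sse/dfe);
  r2   = 1 - sse/sst;
  pval = 1 - probf(f, p, dfe);   /* upper-tail F probability */

  print b, r2 f p dfe pval;
quit;
```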