Homework #12, Math 320, Spring 2001
Name:____________________________
Section:____
Math 320 Homework #12 --- Due 4/23
Include your name, section number, and homework number on every page that
you hand in. Enter ``Section 1'' for the morning class (10-11AM) and
``Section 2'' for Professor Sawyer's class (12-1PM).
Begin the exposition of your work on this page. If more room is needed,
continue on sheets of paper of exactly the same size (8.5 x 11 inches),
lined or not as you wish, but not torn from a spiral notebook. You should
do your initial work and calculations on a separate sheet of paper before
you write up the results to hand in.
Output from Excel must have your name and the homework number in
cell A1.
1. (15 points) Do exercise 12.6 on page 538, parts (a)-(e).
Most of the parts of this question can be answered from the
computer output on page 539, which is output from the statistical
program ``Minitab''. Note that Minitab displays the regression
line explicitly. The regression coefficients can also be found in the
``Coeff'' column in the ``Coefficient Table'', which is where you
usually have to find them.
``s = xxxx'' in the Minitab output is the estimate of
the standard deviation of the residuals or regression errors. Note that
it is the square root of the MSError (or MSResidual) entry in the ANOVA
table.
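The relation between s and the ANOVA table can be checked with a few lines of Python. This is only a sketch: the function name residual_std_dev and the two numbers passed to it are hypothetical and are not taken from exercise 12.6 or from the Minitab output.

```python
import math

def residual_std_dev(ss_error, df_error):
    """Return s = sqrt(MSError), where MSError = SSError / df_error."""
    ms_error = ss_error / df_error
    return math.sqrt(ms_error)

# Hypothetical ANOVA entries, for illustration only (not from the textbook).
s = residual_std_dev(ss_error=50.0, df_error=8)
print(f"s = {s:.4f}")
```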
2. (15 points) Course grades at Midwestern Normal University are
determined by 11 weekly exams. A student had good grades in 10 of the 11
exams in one of his courses, but missed the 8th exam for
health reasons. His grades and the class averages for the exams are
Exam:       1    2    3    4    5    6    7    8    9   10   11
---------------------------------------------------------------
Student:   35   69   92   72   94   64   59   ..   87   75   80
ClassAv:   32   43  100   54   99   43   57   88   71   65   65
---------------------------------------------------------------
The student's average for the 10 exams that he took is 73. The course
instructor does not feel that it would be fair to give the student that
grade for Exam 8, since it is below the class average for that exam
and most (but not all) of the student's other grades were above the
class average. The class average of 88 on that exam would be unfair for
the same reason. Instead, the instructor finds the regression line
Y = mu + (beta)X for the student's exam grades (Yi) on the class
average grades (Xi) for the 10 exams that the student took. The
instructor then assigns the grade Y = Y(88) = mu + (beta)88 for the
missing examination.
(i) Find the regression coefficients mu and beta and find the grade
Y(88) that was assigned. Is it more or less than the student's average
score of 73 on the other exams?
(ii) How accurate is the value Y(88) as an estimate of the expected
grade that the student should achieve when the class average is 88? Find
a 95% confidence interval for this mean value.
(Hints: The easiest way to do this problem is to enter the 10
pairs of values into a TI-83 and use either LinReg(ax+b) on the
STAT, then CALC, menu or else LinRegTTest on the STAT, then TESTS,
menu. You can also use Excel. Alternatively, all of the formulas
that you need can be evaluated directly from the sums Sum(Yi)=727,
Sum(Yi^2)=55641, Sum(Xi)=629, Sum(Xi^2)=44179, and Sum(XiYi)=48848
for the 10 exams that the student took.)
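The calculation from the sums can be sketched in Python as follows. This is one possible arrangement of the standard simple-regression formulas, not the only one. The variable names and the function mean_response_interval are made up for illustration, and the critical value 2.306 is the two-sided 95% point of the t distribution with 8 degrees of freedom, copied from a t table rather than computed.

```python
import math

# Sums quoted in the hint for the 10 exams that the student took.
n = 10
sum_x, sum_y = 629.0, 727.0
sum_xx, sum_yy, sum_xy = 44179.0, 55641.0, 48848.0

# Corrected sums of squares and cross-products.
sxx = sum_xx - sum_x ** 2 / n
syy = sum_yy - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

beta = sxy / sxx                     # slope
mu = sum_y / n - beta * sum_x / n    # intercept

# Residual standard deviation s on n - 2 degrees of freedom.
ss_error = syy - beta * sxy
s = math.sqrt(ss_error / (n - 2))

def mean_response_interval(x0, t_crit=2.306):
    """Fitted value and 95% confidence limits for the mean of Y at X = x0.

    t_crit is an assumed table value for 8 degrees of freedom.
    """
    x_bar = sum_x / n
    y_hat = mu + beta * x0
    half_width = t_crit * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat, y_hat + half_width

low, fit, high = mean_response_interval(88.0)
print(f"beta = {beta:.4f}, mu = {mu:.2f}")
print(f"Y(88) = {fit:.2f}, 95% CI for the mean = ({low:.2f}, {high:.2f})")
```

The results should agree with LinReg(ax+b) on the TI-83 or with Excel up to rounding.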
Use Excel to do the following two problems:
3. (15 points) A study is done to test air quality as a function of
three possible pollutants CO2, NOx, and
SOx. Data gathered for an air quality index on a typical
Summer day in 20 cities are
Air Qual   CO2   NOx   SOx
     197    61    22    17
     191    33    16    27
     224    65    18    18
     183    55    18    32
     236    50    26    23
     200    60    24    26
     226    59    16    25
     164    54    22    24
     100    83    10    17
     285    50    21    29
     207    73    22    21
     336    61    26    35
     299    42    36    21
     192    44    25     4
     264    54    20    29
     244    24    17    27
     227    53    22    42
     216    41    18    30
     263    57    29    24
     265    87    23    25
(i) Use Excel to run a regression of Air Quality (Y) on the three
pollution covariates (CO2, NOx, SOx). What is the model R^2
for the regression? Is it greater than 0.50? (Hint: Try
Tools | Data Analysis... | Regression in Excel.)
(ii) Do the three pollutant variables provide a fit to the Air
Quality measurements that is significantly better than no fit at all?
(This information can be found in the ANOVA part of the Excel output.)
What are the numbers of degrees of freedom (both numerator and
denominator) of the F-test that answers this question? What is the
P-value?
(iii) What is the regression function that Excel finds? Suppose
that the statistical study is done in a city with pollutant levels
CO2=90, NOx=20, and SOx=12. What air quality value would the regression
function predict for that city?
(iv) Which of the three pollutant variables have coefficients in
the regression function that are significantly different from zero? What
are their P-values? Those pollutants whose regression slope is not
significantly different from zero may not have a significant effect on
Air Quality.
(v) What are the correlation coefficients between Air Quality and
each of the three pollutant variables? Are these correlation
coefficients consistent with your answer to part (iv)?
(Hint: You can use the CORRELATION function under
Tools | Data Analysis... to create a 4x4 table of correlation
coefficients for Air Quality and the three pollutants.)
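If you want to cross-check the Excel output for problem 3, the same quantities can be computed in plain Python. The sketch below is not how Excel does it internally. It fits the regression by solving the normal equations, then reports R^2, the F statistic with its degrees of freedom, a prediction, and the correlations with Air Quality. The helper names solve, ols, predict, and pearson are made up for illustration. P-values are not computed here.

```python
import math

# Columns: Air Quality, CO2, NOx, SOx for the 20 cities in the table.
data = [
    (197, 61, 22, 17), (191, 33, 16, 27), (224, 65, 18, 18), (183, 55, 18, 32),
    (236, 50, 26, 23), (200, 60, 24, 26), (226, 59, 16, 25), (164, 54, 22, 24),
    (100, 83, 10, 17), (285, 50, 21, 29), (207, 73, 22, 21), (336, 61, 26, 35),
    (299, 42, 36, 21), (192, 44, 25, 4), (264, 54, 20, 29), (244, 24, 17, 27),
    (227, 53, 22, 42), (216, 41, 18, 30), (263, 57, 29, 24), (265, 87, 23, 25),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    """Value of the fitted regression function at one row of covariates."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))

def pearson(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

y = [float(r[0]) for r in data]
x = [[float(v) for v in r[1:]] for r in data]
n, k = len(y), 3

coef = ols(x, y)
fitted = [predict(coef, row) for row in x]
y_bar = sum(y) / n
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r_squared = 1.0 - ss_error / ss_total

# F-test of the three covariates against no covariates at all.
df_model, df_error = k, n - k - 1
f_stat = ((ss_total - ss_error) / df_model) / (ss_error / df_error)

print("coefficients (intercept, CO2, NOx, SOx):", [round(c, 4) for c in coef])
print(f"R^2 = {r_squared:.4f}, F = {f_stat:.3f} on ({df_model}, {df_error}) df")
print(f"prediction at CO2=90, NOx=20, SOx=12: {predict(coef, [90.0, 20.0, 12.0]):.1f}")
for name, j in (("CO2", 0), ("NOx", 1), ("SOx", 2)):
    print(f"corr(Air Qual, {name}) = {pearson(y, [row[j] for row in x]):.4f}")
```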
4. (15 points) A physician is interested in how the time to recover from
a newly-discovered viral illness depends on the patient's initial
antibody levels. The physician also wants to know if the recovery time
depends on the sex of the patient, both overall and also after the
initial antibody level is allowed for. Recovery times in days (Y) and
antibody levels (X) are recorded below for 10 female and 10 male
patients. Note that the results for the 10 female subjects are listed
first.
Recovery Sex Init.Antibody
32 F 103
36 F 113
24 F 100
31 F 86
25 F 99
31 F 107
24 F 103
30 F 117
27 F 102
33 F 117
27 M 91
37 M 111
31 M 97
38 M 116
43 M 111
38 M 105
34 M 113
32 M 117
36 M 109
34 M 112
(i) Is there a significant difference in recovery time between the
two sexes, ignoring Antibody for the moment? Carry out a t-test to find
out. (Use either the classical or the Satterthwaite t-test. You can use
Data Analysis ToolPak functions to do either.) What is the P-value?
(ii) Is there a significant linear relationship between Recovery
Time and initial Antibody level, ignoring Sex for the moment? Do a
linear regression for Recovery Time (Y) on Antibody level (X),
ignoring sex. What is the P-value? What is the model R^2?
(iii) What is the linear regression for Recovery Time (Y) on
both Sex (X1) and initial Antibody level (X2)? What recovery
time would this linear regression predict for Sex=Female and
Antibody=100?
(Hint: Assign numerical codes to Sex, for example M=1 and F=2
(this stands for the number of X chromosomes in a typical
individual, which might make it easier to remember) or M=1, F=0.
Since the linear regression is Yhat = beta0 +
beta1*Sex + beta2*Antibody, this allows you to include both a mean
difference for the two sexes and a linear term depending on Antibody in
the same regression. The fitted values and the P-values for the
regression coefficients will be the same no matter what codes you assign
to the two sexes, as long as you don't assign the same code to both
sexes.)
(iv) Does this regression fit Recovery times better than no
variables at all? What is the P-value of the F-test? What are the
numbers of degrees of freedom of the F-test (both numerator and
denominator)? What is the model R^2? How much has it improved
over part (ii)?
(v) Which of the variables in the regression (Sex and Antibody) have
regression coefficients that are significantly different from zero?
What are their P-values?
(vi) Find a 95% confidence interval for the slope parameter for
Antibody in the regression on two variables. (Hint: If you
use Excel, this is part of the output.)
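The remark in the hint to part (iii) about coding Sex can be checked numerically. The Python sketch below fits the two-variable regression twice, once with M=1, F=2 and once with M=1, F=0, and compares the fitted values. It solves the normal equations directly, which is not necessarily what Excel does, and the helper names solve, ols, and fit are made up for illustration. P-values and the confidence interval of part (vi) are not computed.

```python
# Data from the table: the 10 female patients first, then the 10 male patients.
recovery = [32, 36, 24, 31, 25, 31, 24, 30, 27, 33,
            27, 37, 31, 38, 43, 38, 34, 32, 36, 34]
sex = ["F"] * 10 + ["M"] * 10
antibody = [103, 113, 100, 86, 99, 107, 103, 117, 102, 117,
            91, 111, 97, 116, 111, 105, 113, 117, 109, 112]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients [b0, b1, b2] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def fit(codes):
    """Fit Yhat = beta0 + beta1*Sex + beta2*Antibody for a given Sex coding."""
    rows = [[codes[s], float(a)] for s, a in zip(sex, antibody)]
    coef = ols(rows, [float(r) for r in recovery])
    fitted = [coef[0] + coef[1] * r[0] + coef[2] * r[1] for r in rows]
    return coef, fitted

coef_a, fitted_a = fit({"M": 1.0, "F": 2.0})
coef_b, fitted_b = fit({"M": 1.0, "F": 0.0})

# Largest disagreement between the two sets of fitted values.
max_gap = max(abs(a - b) for a, b in zip(fitted_a, fitted_b))

print("coding M=1, F=2:", [round(c, 4) for c in coef_a])
print("coding M=1, F=0:", [round(c, 4) for c in coef_b])
print(f"largest difference in fitted values: {max_gap:.2e}")
```

With these two codings the Antibody slope is unchanged and the Sex coefficient only changes sign, because F minus M is +1 in the first coding and -1 in the second. The intercept absorbs the rest of the difference.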