Homework #12, Math 320, Spring 2001

Name:____________________________      Section:____

Math 320 Homework #12 --- Due 4/23

Include your name, section number, and homework number on every page that you hand in. Enter ``Section 1'' for the morning class (10-11AM) and ``Section 2'' for Professor Sawyer's class (12-1PM).

Begin the exposition of your work on this page. If more room is needed, continue on sheets of paper of exactly the same size (8.5 x 11 inches), lined or not as you wish, but not torn from a spiral notebook. You should do your initial work and calculations on a separate sheet of paper before you write up the results to hand in.

Output from Excel must have your name and the homework number in cell A1.

1. (15 points) Do exercise 12.6 on page 538, parts (a)-(e).

Most parts of this question can be answered from the computer output on page 539, which comes from the statistical program ``Minitab''. Note that Minitab displays the regression line explicitly. The regression coefficients also appear in the ``Coeff'' column of the ``Coefficient Table'', which is where you will usually have to look for them.
The value ``s = xxxx'' in the Minitab output is the estimate of the standard deviation of the regression errors, computed from the residuals. It is the square root of the MSError (or MSResidual) entry in the ANOVA table.
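To see concretely why s is the square root of MSError, here is a minimal Python sketch for a regression on a single predictor. The function name residual_sd is illustrative, and the sketch assumes the usual least-squares fit with n - 2 residual degrees of freedom. It is not how Minitab itself is implemented.

```python
import math


def residual_sd(x, y):
    """Fit y = b0 + b1*x by least squares and return (b0, b1, s).

    s = sqrt(MSError), where MSError = SSE / (n - 2) and SSE is the
    sum of squared residuals.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = ybar - b1 * xbar          # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_error = sse / (n - 2)       # the MSError entry in the ANOVA table
    return b0, b1, math.sqrt(ms_error)
```

Any small data set works as input; for points that lie exactly on a line, s is 0.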

2. (15 points) Course grades at Midwestern Normal University are determined by 11 weekly exams. A student had good grades in 10 of the 11 exams in one of his courses, but missed the 8th exam for health reasons. His grades and the class averages for the exams are:

Exam:       1    2    3    4    5    6    7    8    9   10   11
---------------------------------------------------------------
Student:   35   69   92   72   94   64   59   ..   87   75   80
ClassAv:   32   43  100   54   99   43   57   88   71   65   65
---------------------------------------------------------------
The student's average for the 10 exams that he took is 73. The course instructor does not feel that it would be fair to give the student that grade for Exam 8, since it is below the class average for that exam and most (but not all) of the student's other grades were above the class average. Giving him the class average of 88 would be unfair for the same reason. Instead, the instructor finds the regression line Y = mu + beta*X of the student's exam grades (Yi) on the class averages (Xi) for the 10 exams that the student took, and then assigns the grade Y(88) = mu + 88*beta for the missing exam.
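A minimal Python sketch of the instructor's procedure follows, assuming an ordinary least-squares fit (the kind of fit that LinReg(ax+b) on a TI-83 and Excel's regression tools compute). The names fit_line, student, and class_av are illustrative.

```python
# Student's grades (Y) and class averages (X) for the 10 exams he took,
# copied from the table above; exam 8 is left out because it was missed.
student = [35, 69, 92, 72, 94, 64, 59, 87, 75, 80]
class_av = [32, 43, 100, 54, 99, 43, 57, 71, 65, 65]


def fit_line(x, y):
    """Least-squares intercept (mu) and slope (beta) for y = mu + beta*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    mu = ybar - beta * xbar
    return mu, beta


mu, beta = fit_line(class_av, student)
y88 = mu + beta * 88                      # grade assigned for the missed exam 8
avg_other = sum(student) / len(student)   # 727 / 10 = 72.7, quoted as 73 above
print(f"mu = {mu:.3f}, beta = {beta:.4f}, Y(88) = {y88:.2f}, average = {avg_other:.1f}")
```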
(i) Find the regression coefficients mu and beta and find the grade Y(88) that was assigned. Is it more or less than the student's average score of 73 on the other exams?
(ii) How accurate is the value Y(88) as an estimate of the expected grade that the student should achieve when the class average is 88? Find a 95% confidence interval for this mean value.
(Hints: The easiest way to do this problem is to enter the 10 pairs of values into a TI-83 and use LinReg(ax+b) on the STAT then CALC menu, or LinRegTTest on the STAT then TESTS menu. You can also use Excel. Alternatively, all of the formulas that you need can be evaluated directly from the sums Sum(Yi)=727, Sum(Yi^2)=55641, Sum(Xi)=629, Sum(Xi^2)=44179, and Sum(XiYi)=48848 for the 10 exams that the student took.)
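The computation from the sums in the hint can be sketched in Python as follows. This is a minimal sketch assuming the standard confidence interval for a mean response, Y(x0) +/- t* s sqrt(1/n + (x0 - xbar)^2/Sxx), with s = sqrt(SSE/(n - 2)). The critical value t* = 2.306 is hard-coded from a t table (8 degrees of freedom) because the Python standard library has no t distribution; Excel's TINV(0.05, 8) should return the same value.

```python
import math

# Sums from the hint, over the n = 10 exams the student took.
n = 10
sum_x, sum_x2 = 629, 44179      # class averages X
sum_y, sum_y2 = 727, 55641      # student's grades Y
sum_xy = 48848

# Centered sums of squares and cross-products.
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

xbar, ybar = sum_x / n, sum_y / n
beta = sxy / sxx                   # slope
mu = ybar - beta * xbar            # intercept

x0 = 88
y0 = mu + beta * x0                # Y(88)

# Residual SD: s = sqrt(SSE / (n - 2)), with SSE = Syy - Sxy^2 / Sxx.
sse = syy - sxy ** 2 / sxx
s = math.sqrt(sse / (n - 2))

# Standard error of the estimated mean response at x0.
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

T_CRIT = 2.306  # table value: 0.975 quantile of t with n - 2 = 8 df
lower, upper = y0 - T_CRIT * se_mean, y0 + T_CRIT * se_mean
print(f"Y(88) = {y0:.2f}, 95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```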

Use Excel to do the following two problems:

3. (15 points) A study is done to test air quality as a function of three possible pollutants: CO2, NOx, and SOx. Data gathered for an air quality index on a typical summer day in 20 cities are:

 Air Qual   CO2   NOx   SOx
   197       61    22    17
   191       33    16    27
   224       65    18    18
   183       55    18    32
   236       50    26    23
   200       60    24    26
   226       59    16    25
   164       54    22    24
   100       83    10    17
   285       50    21    29
   207       73    22    21
   336       61    26    35
   299       42    36    21
   192       44    25     4
   264       54    20    29
   244       24    17    27
   227       53    22    42
   216       41    18    30
   263       57    29    24
   265       87    23    25 
(i) Use Excel to run a regression of Air Quality (Y) on the three pollution covariates (CO2, NOx, SOx). What is the model R^2 for the regression? Is it greater than 0.50? (Hint: Try Tools | Data Analysis... | Regression in Excel.)
(ii) Do the three pollutant variables provide a fit to the Air Quality measurements that is significantly better than no fit at all? (This information can be found in the ANOVA part of the Excel output.) What are the numbers of degrees of freedom (both numerator and denominator) of the F-test that answers this question? What is the P-value?
(iii) What is the regression function that Excel finds? Suppose that the statistical study is done in a city with pollutant levels CO2=90, NOx=20, and SOx=12. What air quality value would the regression function predict for that city?
(iv) Which of the three pollutant variables have coefficients in the regression function that are significantly different from zero? What are their P-values? Those pollutants whose regression slope is not significantly different from zero may not have a significant effect on Air Quality.
(v) What are the correlation coefficients between Air Quality and each of the three pollutant variables? Are these correlation coefficients consistent with your answer to part (iv)? (Hint: You can use the Correlation tool under Tools | Data Analysis... to create a 4x4 table of correlation coefficients for Air Quality and the three pollutants.)
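As a cross-check on Excel's Regression and Correlation output, the same quantities can be computed in plain Python. This is a minimal sketch that solves the normal equations (X'X)b = X'y by Gaussian elimination, which is adequate for a problem this small; it is not a claim about how Excel computes its results. The helper names (solve, ols, pearson) are illustrative. P-values need the F and t distributions, which the Python standard library lacks; in Excel, FDIST(F, 3, 16) gives the P-value of the overall F-test.

```python
import math

# (Air quality, CO2, NOx, SOx) for the 20 cities, copied from the table above.
DATA = [
    (197, 61, 22, 17), (191, 33, 16, 27), (224, 65, 18, 18), (183, 55, 18, 32),
    (236, 50, 26, 23), (200, 60, 24, 26), (226, 59, 16, 25), (164, 54, 22, 24),
    (100, 83, 10, 17), (285, 50, 21, 29), (207, 73, 22, 21), (336, 61, 26, 35),
    (299, 42, 36, 21), (192, 44, 25, 4), (264, 54, 20, 29), (244, 24, 17, 27),
    (227, 53, 22, 42), (216, 41, 18, 30), (263, 57, 29, 24), (265, 87, 23, 25),
]


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting; a is not modified."""
    n = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def pearson(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu_u) * (c - mu_v) for a, c in zip(u, v))
    suu = sum((a - mu_u) ** 2 for a in u)
    svv = sum((c - mu_v) ** 2 for c in v)
    return suv / math.sqrt(suu * svv)


y = [float(row[0]) for row in DATA]
X = [[1.0, float(row[1]), float(row[2]), float(row[3])] for row in DATA]  # 1.0 = intercept
b = ols(X, y)  # [intercept, CO2, NOx, SOx]

fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
ybar = sum(y) / len(y)
ss_total = sum((yi - ybar) ** 2 for yi in y)
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r2 = 1 - ss_resid / ss_total

n, k = len(y), 3
df_num, df_den = k, n - k - 1          # regression df and residual df
f_stat = ((ss_total - ss_resid) / df_num) / (ss_resid / df_den)

pred = b[0] + b[1] * 90 + b[2] * 20 + b[3] * 12   # part (iii): CO2=90, NOx=20, SOx=12

# Part (v): correlation of Air Quality with each pollutant.
corr = {name: pearson(y, [row[i] for row in DATA])
        for i, name in ((1, "CO2"), (2, "NOx"), (3, "SOx"))}

print(f"b = {[round(v, 3) for v in b]}, R^2 = {r2:.3f}, "
      f"F = {f_stat:.2f} on ({df_num}, {df_den}) df, prediction = {pred:.1f}")
print({name: round(c, 3) for name, c in corr.items()})
```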

4. (15 points) A physician is interested in how the time to recover from a newly discovered viral illness depends on the patient's initial antibody level. The physician also wants to know whether the recovery time depends on the sex of the patient, both overall and after allowing for the initial antibody level. Recovery times in days (Y) and initial antibody levels (X) are recorded below for 10 female and 10 male patients. Note that the results for the 10 female patients are listed first.

 Recovery  Sex  Init.Antibody
   32       F      103
   36       F      113
   24       F      100
   31       F       86
   25       F       99
   31       F      107
   24       F      103
   30       F      117
   27       F      102
   33       F      117
   27       M       91
   37       M      111
   31       M       97
   38       M      116
   43       M      111
   38       M      105
   34       M      113
   32       M      117
   36       M      109
   34       M      112 
(i) Is there a significant difference in recovery time between the two sexes, ignoring Antibody for the moment? Carry out a t-test to find out. (Use either the classical or the Satterthwaite t-test. Both are available as tools in Excel's Analysis ToolPak.) What is the P-value?
(ii) Is there a significant linear relationship between Recovery Time and initial Antibody level, ignoring Sex for the moment? Do a linear regression of Recovery Time (Y) on Antibody level (X), ignoring Sex. What is the P-value? What is the model R^2?
(iii) What is the linear regression for Recovery Time (Y) on both Sex (X1) and initial Antibody level (X2)? What recovery time would this linear regression predict for Sex=Female and Antibody=100?
(Hint: Assign numerical codes to Sex, for example M=1 and F=2 (the number of X chromosomes in a typical individual, which may make the coding easier to remember) or M=1 and F=0. Since the linear regression is Yhat = beta0 + beta1*Sex + beta2*Antibody, this lets you include both a mean difference between the two sexes and a linear term in Antibody in the same regression. As long as you do not assign the same code to both sexes, the fitted values and the P-values for the Sex and Antibody coefficients will be the same no matter which codes you use; only the intercept, and the sign and size of the Sex coefficient, depend on the coding.)
(iv) Does this regression fit Recovery times better than no variables at all? What is the P-value of the F-test? What are the numbers of degrees of freedom of the F-test (both numerator and denominator)? What is the model R^2? How much has it improved over part (ii)?
(v) Which of the variables in the regression (Sex and Antibody) have regression coefficients that are significantly different from zero? What are their P-values?
(vi) Find a 95% confidence interval for the slope parameter for Antibody in the regression on two variables. (Hint:  If you use Excel, this is part of the output.)
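The sketch below, again in plain Python with illustrative names, carries out the computations behind parts (i), (iii), and (vi), and checks the claim in the hint to part (iii): fitting with M=1, F=2 and with M=1, F=0 should give identical fitted values and identical t-statistics for Sex and Antibody, with the Sex coefficient only changing sign. It solves the normal equations directly and hard-codes the table value 2.110 for the 0.975 quantile of t with 17 df (Excel's TINV(0.05, 17) should agree). P-values would still have to come from a t table or Excel's TDIST.

```python
import math
from statistics import mean, variance

# (recovery days, initial antibody) for each patient, copied from the table above.
FEMALE = [(32, 103), (36, 113), (24, 100), (31, 86), (25, 99),
          (31, 107), (24, 103), (30, 117), (27, 102), (33, 117)]
MALE = [(27, 91), (37, 111), (31, 97), (38, 116), (43, 111),
        (38, 105), (34, 113), (32, 117), (36, 109), (34, 112)]


def welch_t(a, b):
    """Satterthwaite (Welch) t statistic for mean(a) - mean(b) and its approximate df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting; a is not modified."""
    n = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, y):
    """OLS via the normal equations: coefficients, fitted values, standard errors."""
    n, p = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)   # MSError
    # Diagonal of (X'X)^-1: entry j of the solution of (X'X) c = e_j.
    diag = [solve(xtx, [1.0 if i == j else 0.0 for i in range(p)])[j] for j in range(p)]
    return b, fitted, [math.sqrt(s2 * d) for d in diag]


def design(code_f, code_m):
    """Rows [1, Sex code, Antibody], females first to match y."""
    return ([[1.0, code_f, float(ab)] for _, ab in FEMALE]
            + [[1.0, code_m, float(ab)] for _, ab in MALE])


y = [float(r) for r, _ in FEMALE + MALE]

# (i) Satterthwaite t-test on recovery time, ignoring Antibody.
t_sex, df_sex = welch_t([r for r, _ in FEMALE], [r for r, _ in MALE])

# (iii) Two codings of Sex: coding a is M=1, F=2; coding b is M=1, F=0.
b_a, fit_a, se_a = fit(design(2.0, 1.0), y)
b_b, fit_b, se_b = fit(design(0.0, 1.0), y)

# Prediction for Sex=Female, Antibody=100 (female is coded 0 in coding b).
pred_f100 = b_b[0] + b_b[1] * 0.0 + b_b[2] * 100

# (vi) 95% CI for the Antibody slope, with 20 - 3 = 17 residual df.
T_CRIT_17 = 2.110  # table value: 0.975 quantile of t with 17 df
ci_antibody = (b_b[2] - T_CRIT_17 * se_b[2], b_b[2] + T_CRIT_17 * se_b[2])
print(f"Welch t = {t_sex:.3f} on {df_sex:.1f} df; prediction = {pred_f100:.2f}; "
      f"Antibody slope CI = ({ci_antibody[0]:.4f}, {ci_antibody[1]:.4f})")
```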