## Math 320 Homework #12 (Due 4/23)

1. (15 points) Do exercise 12.6 on page 538, parts (a)-(e).

Most of the parts of this question can be answered from the computer output on page 539, which is output from the statistical program "Minitab". Note that Minitab displays the regression line explicitly. The regression coefficients can also be found in the "Coeff" column in the "Coefficient Table", which is where you usually have to find them.
The line `s = xxxx` in the Minitab output is the estimate of the standard deviation of the residuals (the regression errors). Note that it is the square root of the MSError (or MSResidual) entry in the ANOVA table.
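To see why `s` is the square root of MSError, here is a minimal Python sketch. It fits a least-squares line to four made-up points (hypothetical data, not taken from exercise 12.6), divides the residual sum of squares by its n - 2 degrees of freedom to get the mean square error, and takes the square root. The function name `fit_line` is chosen for illustration only.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical illustration data (not from the textbook exercise).
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)  # SSError in the ANOVA table
mse = sse / (len(x) - 2)              # MSError: SSE divided by n - 2 df
s = math.sqrt(mse)                    # the quantity Minitab reports as "s ="
print(f"a = {a:.3f}, b = {b:.3f}, MSE = {mse:.4f}, s = {s:.4f}")
```

With real Minitab output, the same relationship can be checked by dividing the SS entry for Error by its DF entry and taking the square root.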

2. (15 points) Course grades at Midwestern Normal University are determined by 11 weekly exams. A student had good grades in 10 of the 11 exams in one of his courses, but missed the 8th exam for health reasons. His grades and the class averages for the exams are

```
Exam:       1    2    3    4    5    6    7    8    9   10   11
---------------------------------------------------------------
Student:   35   69   92   72   94   64   59   ..   87   75   80
ClassAv:   32   43  100   54   99   43   57   88   71   65   65
---------------------------------------------------------------
```
The student's average for the 10 exams that he took is 73. The course instructor does not feel that it would be fair to give the student that grade for Exam 8, since it is below the class average for that exam and most (but not all) of the student's other grades were above the class average. The class average of 88 on that exam would be unfair for the same reason. Instead, the instructor finds the regression line `Y = mu + (beta)X` of the student's exam grades (Yi) on the class averages (Xi) for the 10 exams that the student took, and then assigns the grade `Y(88) = mu + (beta)*88` for the missing exam.
(i) Find the regression coefficients mu and beta and find the grade Y(88) that was assigned. Is it more or less than the student's average score of 73 on the other exams?
(ii) How accurate is the value Y(88) as an estimate of the expected grade that the student should achieve when the class average is 88? Find a 95% confidence interval for this mean value.
(Hints: The easiest way to do this problem is to enter the 10 pairs of values into a TI-83 and use `LinReg(ax+b)` on the `STAT` then `CALC` menu, or `LinRegTTest` on the `STAT` then `TESTS` menu. Watch the notation: `LinReg(ax+b)` reports the line as y = ax + b, so its a is the slope beta, while `LinRegTTest` reports y = a + bx, so its a is the intercept mu. You can also use Excel. Alternatively, all of the formulas that you need can be evaluated directly from the sums `Sum(Yi) = 727`, `Sum(Yi^2) = 55641`, `Sum(Xi) = 629`, `Sum(Xi^2) = 44179`, and `Sum(XiYi) = 48848` for the 10 exams that the student took.)
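As a check on calculator output, the hand computation from the sums in the hint can be sketched in Python. This sketch assumes the standard simple-regression formulas: centered sums SXX, SXY, SYY; residual standard deviation s on n - 2 = 8 degrees of freedom; and the standard error of the fitted mean at X = 88, `s*sqrt(1/n + (88 - Xbar)^2/SXX)`. The critical value 2.306 for a two-sided 95% interval with 8 df is hard-coded from a t table, an assumption made to avoid needing a statistics library.

```python
import math

# Summary sums for the 10 exams the student took (from the hint).
n = 10
sum_x, sum_y = 629, 727
sum_xx, sum_yy, sum_xy = 44179, 55641, 48848

# Centered sums of squares and cross-products.
sxx = sum_xx - sum_x ** 2 / n
syy = sum_yy - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept for Y = mu + beta*X.
beta = sxy / sxx
mu = sum_y / n - beta * sum_x / n

# Grade assigned for the missing exam (class average 88).
x0 = 88
y_hat = mu + beta * x0

# Residual SD on n - 2 df, then the SE of the fitted mean at x0.
sse = syy - beta * sxy
s = math.sqrt(sse / (n - 2))
se_mean = s * math.sqrt(1 / n + (x0 - sum_x / n) ** 2 / sxx)

t_crit = 2.306  # t quantile 0.975 with 8 df, taken from a t table
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)

print(f"beta = {beta:.4f}, mu = {mu:.4f}, Y(88) = {y_hat:.2f}")
print(f"95% CI for the mean grade at X = 88: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The slope, intercept, and `s` should agree with what `LinRegTTest` reports; the interval itself still has to be assembled by hand on the calculator.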

Use Excel to do the following two problems:

3. (15 points) A study is done to test air quality as a function of three possible pollutants: CO2, NOx, and SOx. Data gathered for an air quality index on a typical summer day in 20 cities are:

```
Air Qual   CO2   NOx   SOx
     197    61    22    17
     191    33    16    27
     224    65    18    18
     183    55    18    32
     236    50    26    23
     200    60    24    26
     226    59    16    25
     164    54    22    24
     100    83    10    17
     285    50    21    29
     207    73    22    21
     336    61    26    35
     299    42    36    21
     192    44    25     4
     264    54    20    29
     244    24    17    27
     227    53    22    42
     216    41    18    30
     263    57    29    24
     265    87    23    25
```
(i) Use Excel to run a regression of Air Quality (Y) on the three pollution covariates (CO2, NOx, SOx). What is the model R^2 for the regression? Is it greater than 0.50? (Hint: Try `Tools | Data Analysis... | Regression` in Excel.)
(ii) Do the three pollutant variables provide a fit to the Air Quality measurements that is significantly better than no fit at all? (This information can be found in the ANOVA part of the Excel output.) What are the numbers of degrees of freedom (both numerator and denominator) of the F-test that answers this question? What is the P-value?
(iii) What is the regression function that Excel finds? Suppose that the statistical study is done in a city with pollutant levels CO2=90, NOx=20, and SOx=12. What air quality value would the regression function predict for that city?
(iv) Which of the three pollutant variables have coefficients in the regression function that are significantly different from zero? What are their P-values? Those pollutants whose regression slope is not significantly different from zero may not have a significant effect on Air Quality.
(v) What are the correlation coefficients between Air Quality and each of the three pollutant variables? Are these correlation coefficients consistent with your answer to part (iv)? (Hint: You can use the Correlation tool under `Tools | Data Analysis...` to create a 4x4 table of correlation coefficients for Air Quality and the three pollutants.)
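Excel is the intended tool for this problem. As an optional cross-check, the following pure-Python sketch fits the three-covariate regression, computes R^2 and the overall F statistic on (3, 16) degrees of freedom, evaluates the fitted function at CO2=90, NOx=20, SOx=12, and computes the three correlations for part (v). It assumes no third-party libraries, so it solves the normal equations X'X b = X'y directly by Gaussian elimination; a QR-based routine such as `numpy.linalg.lstsq` would be the more robust choice for larger or ill-conditioned problems. It does not compute P-values, which need the F and t distributions (for example Excel's `FDIST` and `TDIST` functions). The helper names `solve`, `ols`, and `corr` are hypothetical, chosen for this sketch.

```python
import math

# Data from the table above: (air quality, CO2, NOx, SOx).
data = [
    (197, 61, 22, 17), (191, 33, 16, 27), (224, 65, 18, 18), (183, 55, 18, 32),
    (236, 50, 26, 23), (200, 60, 24, 26), (226, 59, 16, 25), (164, 54, 22, 24),
    (100, 83, 10, 17), (285, 50, 21, 29), (207, 73, 22, 21), (336, 61, 26, 35),
    (299, 42, 36, 21), (192, 44, 25, 4), (264, 54, 20, 29), (244, 24, 17, 27),
    (227, 53, 22, 42), (216, 41, 18, 30), (263, 57, 29, 24), (265, 87, 23, 25),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, xi)) for xi in X]
    return b, fitted


def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


y = [row[0] for row in data]
coef, fitted = ols([row[1:] for row in data], y)

n, k = len(y), 3
ybar = sum(y) / n
sst = sum((yi - ybar) ** 2 for yi in y)                 # total SS
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual SS
r2 = 1 - sse / sst
f_stat = ((sst - sse) / k) / (sse / (n - k - 1))        # F on (k, n - k - 1) df

pred = coef[0] + coef[1] * 90 + coef[2] * 20 + coef[3] * 12
correlations = {name: corr(y, [row[j] for row in data])
                for j, name in ((1, "CO2"), (2, "NOx"), (3, "SOx"))}

print("coefficients (b0, CO2, NOx, SOx):", [round(c, 4) for c in coef])
print(f"R^2 = {r2:.4f}, F = {f_stat:.3f} on ({k}, {n - k - 1}) df")
print(f"prediction at CO2=90, NOx=20, SOx=12: {pred:.1f}")
print("correlations with air quality:", {nm: round(c, 4) for nm, c in correlations.items()})
```

The coefficients and R^2 should match the Excel Regression output up to rounding.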

4. (15 points) A physician is interested in how the time to recover from a newly discovered viral illness depends on the patient's initial antibody level. The physician also wants to know whether the recovery time depends on the sex of the patient, both overall and after allowing for the initial antibody level. Recovery times in days (Y) and antibody levels (X) are recorded below for 10 female and 10 male patients. Note that the results for the 10 female patients are listed first.

```
Recovery   Sex   Init.Antibody
      32     F             103
      36     F             113
      24     F             100
      31     F              86
      25     F              99
      31     F             107
      24     F             103
      30     F             117
      27     F             102
      33     F             117
      27     M              91
      37     M             111
      31     M              97
      38     M             116
      43     M             111
      38     M             105
      34     M             113
      32     M             117
      36     M             109
      34     M             112
```
(i) Is there a significant difference in recovery time between the two sexes, ignoring Antibody for the moment? Carry out a t-test to find out. (Use either the classical or the Satterthwaite t-test; the Data Analysis ToolPak can do either.) What is the P-value?
(ii) Is there a significant linear relationship between Recovery Time and initial Antibody level, ignoring Sex for the moment? Do a linear regression for Recovery Time (Y) on Antibody level (X), ignoring sex. What is the P-value? What is the model R^2?
(iii) What is the linear regression for Recovery Time (Y) on both Sex (X1) and initial Antibody level (X2)? What recovery time would this linear regression predict for Sex=Female and Antibody=100?
(Hint: Assign numerical codes to Sex, for example M=1 and F=2 (the number of X chromosomes in a typical individual, which may make the codes easier to remember), or M=1 and F=0. Since the linear regression is `Yhat = beta0 + beta1*Sex + beta2*Antibody`, this lets you include both a mean difference between the two sexes and a linear term in Antibody in the same regression. The fitted values, and the P-values for the Sex and Antibody coefficients, will be the same no matter which codes you assign to the two sexes, as long as you don't assign the same code to both.)
(iv) Does this regression fit Recovery times better than no variables at all? What is the P-value of the F-test? What are the numbers of degrees of freedom of the F-test (both numerator and denominator)? What is the model R^2? How much has it improved over part (ii)?
(v) Which of the variables in the regression (Sex and Antibody) have regression coefficients that are significantly different from zero? What are their P-values?
(vi) Find a 95% confidence interval for the slope parameter for Antibody in the regression on two variables. (Hint: If you use Excel, this is part of the output.)
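The Data Analysis ToolPak is the intended tool here. As an optional pure-Python cross-check, the sketch below does two things. It computes the Satterthwaite (Welch) t statistic and its approximate degrees of freedom for part (i); the P-value still needs a t table or Excel's `TDIST`. It also fits the two-variable regression of part (iii) under both sex codings from the hint, illustrating that the fitted values and the Antibody slope do not change, while the Sex slope only flips sign (since F=2 coding equals 2 minus the F=0 coding). It solves the normal equations directly, which is adequate for a problem this small; the helper names `welch_t`, `solve`, `ols`, and `fit_with_codes` are hypothetical.

```python
import math

# Data from the table above: (recovery days, sex, initial antibody level).
data = [
    (32, "F", 103), (36, "F", 113), (24, "F", 100), (31, "F", 86), (25, "F", 99),
    (31, "F", 107), (24, "F", 103), (30, "F", 117), (27, "F", 102), (33, "F", 117),
    (27, "M", 91), (37, "M", 111), (31, "M", 97), (38, "M", 116), (43, "M", 111),
    (38, "M", 105), (34, "M", 113), (32, "M", 117), (36, "M", 109), (34, "M", 112),
]


def welch_t(a, b):
    """Satterthwaite (Welch) two-sample t statistic and its approximate df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)  # sample variances
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    qa, qb = va / na, vb / nb                      # squared standard errors
    t = (ma - mb) / math.sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t, df


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, xi)) for xi in X]
    return b, fitted


y = [rec for rec, _, _ in data]
female = [rec for rec, sex, _ in data if sex == "F"]
male = [rec for rec, sex, _ in data if sex == "M"]
t_stat, t_df = welch_t(female, male)


def fit_with_codes(codes):
    """Regress recovery on (sex code, antibody) using the given numeric sex codes."""
    rows = [(codes[sex], ab) for _, sex, ab in data]
    return ols(rows, y)


coef_a, fit_a = fit_with_codes({"M": 1, "F": 0})
coef_b, fit_b = fit_with_codes({"M": 1, "F": 2})

# Predicted recovery for Sex=Female (code 0 under the first coding), Antibody=100.
pred_f100 = coef_a[0] + coef_a[1] * 0 + coef_a[2] * 100

print(f"Welch t = {t_stat:.3f} on about {t_df:.1f} df")
print("M=1, F=0 -> (b0, Sex, Antibody):", [round(c, 4) for c in coef_a])
print("M=1, F=2 -> (b0, Sex, Antibody):", [round(c, 4) for c in coef_b])
print("max difference in fitted values:", max(abs(p - q) for p, q in zip(fit_a, fit_b)))
print(f"predicted recovery, Female with Antibody=100: {pred_f100:.2f} days")
```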