Homework #13, Math 320, Spring 2001

Name:____________________________      Section:____

## Math 320 Homework #13 --- Due 4/27

Include your name, section number, and homework number on every page that you hand in. Enter "Section 1" for the morning class (10-11AM) and "Section 2" for Professor Sawyer's class (12-1PM).

Begin the exposition of your work on this page. If more room is needed, continue on sheets of paper of exactly the same size (8.5 x 11 inches), lined or not as you wish, but not torn from a spiral notebook. You should do your initial work and calculations on a separate sheet of paper before you write up the results to hand in.

Output from Excel must have your name and the homework number in cell A1.

All problems count 15 points.

1. Twenty employees in a factory are rated according to their job satisfaction (Y) and years of service (X). Their job type --- A for hourly employees and B for managerial --- was also recorded. The data are

```
(Y) (JobType) (X)
25   A   10.3
28   A   11.3
22   A   10.0
26   A    8.6
23   A    9.9
26   A   10.7
22   A   10.3
25   A   11.7
24   A   10.2
27   A   11.7
15   B    9.1
28   B   11.1
25   B    9.7
29   B   11.6
32   B   11.1
29   B   10.5
29   B   11.3
32   B   11.7
31   B   10.9
31   B   11.2
```
(i) Use Excel to construct scatterplots for years of service (X) versus job satisfaction (Y) for each job type (one scatterplot for A and one for B). Do the slopes of the best lines through the plots appear to be the same?
(ii) Introduce a dummy variable for job type and use Excel to analyze a regression of Y on job type and X. What is the model R^2 for this regression? What is the P-value of the model test? How many degrees of freedom does the F-statistic have for the model test (both numerator and denominator)? Which of the two variables have coefficients that are statistically significant? What are their P-values?
(iii) Analyze the same regression with an interaction term added. That is, use Excel to analyze the regression of Y on job type, X, and job type * X. What is the model R^2 for this regression? What is the P-value of the model test? How many degrees of freedom does the F-statistic have for the model test (both numerator and denominator)? Which of the three variables have coefficients that are statistically significant? What are their P-values?
(iv) Write down the estimated regression line for Y versus X within group A and the estimated regression line within group B. (Each line will be of the form Y = C0 + C1*X for particular numbers C0 and C1.)
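
The dummy-variable coding in parts (ii)-(iv) can be sketched in Python as a cross-check on the Excel output. This is only a sketch under stated assumptions: the coding D = 0 for job type A and D = 1 for job type B is one possible choice (the assignment does not fix it), and the helper names `solve` and `ols` are hypothetical. It fits the interaction model Y = b0 + b1*D + b2*X + b3*D*X by least squares and reads off the two within-group lines; it does not compute P-values.

```python
# Data from problem 1 as (Y, job type, X).
rows = [
    (25, "A", 10.3), (28, "A", 11.3), (22, "A", 10.0), (26, "A", 8.6),
    (23, "A", 9.9), (26, "A", 10.7), (22, "A", 10.3), (25, "A", 11.7),
    (24, "A", 10.2), (27, "A", 11.7),
    (15, "B", 9.1), (28, "B", 11.1), (25, "B", 9.7), (29, "B", 11.6),
    (32, "B", 11.1), (29, "B", 10.5), (29, "B", 11.3), (32, "B", 11.7),
    (31, "B", 10.9), (31, "B", 11.2),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


y = [r[0] for r in rows]
# Assumed coding: D = 0 for job type A, D = 1 for job type B.
# Design columns: intercept, D, X, D*X (the interaction term).
design = [[1.0, float(t == "B"), x, float(t == "B") * x] for _, t, x in rows]
b0, b1, b2, b3 = ols(design, y)

line_a = (b0, b2)            # D = 0: Y = b0 + b2*X
line_b = (b0 + b1, b2 + b3)  # D = 1: Y = (b0 + b1) + (b2 + b3)*X
print("Group A: Y = %.3f + %.3f*X" % line_a)
print("Group B: Y = %.3f + %.3f*X" % line_b)
```

With this coding, b1 is the difference in intercepts between the two groups and b3 is the difference in slopes, so the interaction model reproduces the two lines that separate regressions within each group would give.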

2. This refers to the data about grandfather clocks on page 613 of the text:

(i) For the regression in problem 13.35, what is the model R^2? What is the estimate of the standard deviation of the error terms? Which variables in the model have coefficients that are statistically significantly different from zero? What are their P-values? (Hint: This can be answered from the Minitab output on page 614.)
(ii) Answer the questions in problem 13.36 on page 613.
(iii) For the regression in problem 13.37, what is the model R^2? What is the estimate of the standard deviation of the error terms? Is it smaller than before? Which variables in the model have coefficients that are statistically significantly different from zero? What are their P-values? (Hint: This can be answered from the Minitab output on page 615.)
(iv) Answer the questions in problem 13.38 on page 613.
(v) For the regression in problem 13.39, what is the model R^2? What is the estimate of the standard deviation of the error terms? Is it smaller than before? Which variables in the model have coefficients that are statistically significant? What are their P-values? (Hint: This can be answered from the Minitab output on page 616.)
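
For reference, the quantities asked about above are related by the standard multiple-regression definitions below. The notation is an assumption and may differ from the text: n is the number of observations, k the number of predictors, and SST = SSR + SSE is the usual split of the total sum of squares into regression and error parts.

```latex
R^2 = 1 - \frac{SSE}{SST}, \qquad
s = \sqrt{\frac{SSE}{n-k-1}}, \qquad
F = \frac{SSR/k}{SSE/(n-k-1)}
```

Here s is the estimate of the standard deviation of the error terms, and the model-test F-statistic has k numerator and n-k-1 denominator degrees of freedom.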

3. Consider a regression of a variable Y on four covariates X1, X2, X3, and X4:

```
  Y     X1    X2    X3    X4
291     43   167   279    39
354     40   173   228    29
333     53   167   214    29
301     44   166   210    29
100     15   100   169    19
192     39   171   156    19
138     17   111   217    29
280     42   179   216    27
201     45   160   217    29
392     70   231   221    31
184     44   179   100    10
221     38   168   151    20
297     43   172   211    29
300     40   171   213    28
166     42   167   162    19
355     70   240   219    29
503    100   288   215    29
318     42   169   221    29
185     10   114   216    29
269     43   153   222    29
```
(i) Use Excel to analyze the regression of Y on the four covariates X1, X2, X3, X4 together. What is the model R^2 for the regression on four variables? What is the P-value of the model test? How many degrees of freedom does the F-statistic have for the model test?
What are the P-values of the coefficients in the regression function corresponding to X1, X2, X3, and X4? Do these results seem paradoxical to you?
(ii) Use the CORRELATION function in the Excel Data Analysis Toolpak to find the correlation matrix for the five variables Y, X1, X2, X3, and X4. Which pairs of the four variables X1, X2, X3, X4 are highly correlated with one another? (For definiteness, say that two variables X, Y are 'highly correlated' if |r|>0.85 for their sample correlation coefficient r, 'nearly uncorrelated' if |r|<0.20, and 'moderately correlated' otherwise.)
(iii) Now use Excel to analyze the regression of Y on the two covariates X1 and X4. What is the correlation coefficient between X1 and X4? What is the model R^2 for this regression on two variables? What is the P-value of the model test? How many degrees of freedom does the F-statistic have for the model test?
What are the P-values of the coefficients in the regression function corresponding to X1 and X4? Are these results more consistent with the model-test P-value than was the case for the regression on four variables?
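
The classification rule in part (ii) can be sketched in Python as a check on the Excel correlation matrix. This is a minimal sketch, not a replacement for the Toolpak output: the helper names `pearson` and `label` are hypothetical, and the thresholds 0.85 and 0.20 are the ones stated in part (ii).

```python
from itertools import combinations
from math import sqrt

# Rows of (Y, X1, X2, X3, X4) from problem 3.
data = [
    (291, 43, 167, 279, 39), (354, 40, 173, 228, 29), (333, 53, 167, 214, 29),
    (301, 44, 166, 210, 29), (100, 15, 100, 169, 19), (192, 39, 171, 156, 19),
    (138, 17, 111, 217, 29), (280, 42, 179, 216, 27), (201, 45, 160, 217, 29),
    (392, 70, 231, 221, 31), (184, 44, 179, 100, 10), (221, 38, 168, 151, 20),
    (297, 43, 172, 211, 29), (300, 40, 171, 213, 28), (166, 42, 167, 162, 19),
    (355, 70, 240, 219, 29), (503, 100, 288, 215, 29), (318, 42, 169, 221, 29),
    (185, 10, 114, 216, 29), (269, 43, 153, 222, 29),
]
names = ["Y", "X1", "X2", "X3", "X4"]
columns = dict(zip(names, zip(*data)))  # variable name -> tuple of 20 values


def pearson(u, v):
    """Sample correlation coefficient r between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def label(r):
    """Classify r using the thresholds given in part (ii)."""
    if abs(r) > 0.85:
        return "highly correlated"
    if abs(r) < 0.20:
        return "nearly uncorrelated"
    return "moderately correlated"


# Pairs among the four covariates only (Y is excluded here).
for a, b in combinations(names[1:], 2):
    r = pearson(columns[a], columns[b])
    print(f"{a}, {b}: r = {r:+.3f} ({label(r)})")
```

Pairs of covariates that come out highly correlated are the natural place to look when the individual coefficient P-values in part (i) seem at odds with the model-test P-value.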

4. The lengths of trout caught in four mountain lakes were recorded in millimeters as:

```
Blue Lake     139  149  157  159  162  182  206
Clear Lake    146  168  175  217  224
Crystal Lake  175  197  203  215  215  224  232
Black Lake    193  205  208  228  253
```
(i) Use Excel or a TI-83 to test the hypothesis H0 that the mean lengths of the trout caught in the four lakes are the same. (That is, H0: mu1=mu2=mu3=mu4.) What is the P-value? Do you accept or reject H0 at alpha=0.05? How many degrees of freedom does the F-statistic have that this test is based upon?
(ii) Which pairs of lakes out of the six possible pairs differ significantly in the lengths of the trout caught? What are the corresponding P-values? Use the test statistic

T = (Ximean - Xjmean)/(root(MSError)*root(1/ni + 1/nj))

to test the differences between the ith and jth sample means instead of pairwise Student t-tests. (The statistic T uses the data for all four lakes to estimate the standard deviation, not just the data in the two samples. If mui=muj, then T has a Student's t distribution with the same number of degrees of freedom as in MSError.)
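
The one-way ANOVA quantities and the pooled statistic T can be sketched in Python as a check on the Excel or TI-83 output. This is a sketch under stated assumptions: the variable and function names (`ms_error`, `pairwise_t`, and so on) are hypothetical, and the code stops at the test statistics. P-values would still come from the F and Student's t distributions with the degrees of freedom computed here.

```python
from itertools import combinations
from math import sqrt

# Trout lengths in millimeters from problem 4.
lakes = {
    "Blue": [139, 149, 157, 159, 162, 182, 206],
    "Clear": [146, 168, 175, 217, 224],
    "Crystal": [175, 197, 203, 215, 215, 224, 232],
    "Black": [193, 205, 208, 228, 253],
}


def mean(values):
    return sum(values) / len(values)


n_total = sum(len(v) for v in lakes.values())
grand_mean = sum(sum(v) for v in lakes.values()) / n_total

# Between-lake and within-lake (error) sums of squares.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in lakes.values())
ss_error = sum((x - mean(v)) ** 2 for v in lakes.values() for x in v)

df_between = len(lakes) - 1      # numerator degrees of freedom
df_error = n_total - len(lakes)  # denominator degrees of freedom
ms_error = ss_error / df_error   # MSError, pooled over all four lakes
f_stat = (ss_between / df_between) / ms_error


def pairwise_t(a, b):
    """T = (mean_a - mean_b) / (root(MSError) * root(1/n_a + 1/n_b))."""
    na, nb = len(lakes[a]), len(lakes[b])
    return (mean(lakes[a]) - mean(lakes[b])) / (sqrt(ms_error) * sqrt(1 / na + 1 / nb))


print(f"F = {f_stat:.3f} on {df_between} and {df_error} degrees of freedom")
for a, b in combinations(lakes, 2):
    print(f"{a} vs {b}: T = {pairwise_t(a, b):+.3f} (df = {df_error})")
```

Because every pairwise T shares the same MSError, each comparison uses the error degrees of freedom from the full ANOVA rather than n_i + n_j - 2 from the two samples alone.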