Math 408 Homework 4

Text references are to Hollander and Wolfe, ``Nonparametric Statistical Methods'', 2nd ed.

NOTES:
    (1)  Whenever you are asked to test a hypothesis, state the P-value, whether the P-value is for a one-sided or two-sided test if appropriate (that is, if the statistic has a large-sample normal approximation), and whether you accept or reject H_0.

    (2)  If you use MATLAB to do a problem, include (hard copy of) your MATLAB output AND your MATLAB program in an APPENDIX to your homework. That is, do not mix together the answers to the questions and your computer output. In that way, for problems in which you used MATLAB, your answers become an ``executive summary'' that gives your conclusions, and interested parties can then look or not look at your actual MATLAB code and output to get more information or to see what happened if you get a wrong answer.

    (3)  In the following, ^ means superscript, _ (underscore) means subscript, and Sum(i=1,9) means the sum for i=1 to 9.

1.  Gerstein (1965) studied the long-term pollution of Lake Michigan and its effect on the water supply for the city of Chicago. One of the measurements considered by Gerstein was the annual number of “odor periods” over the period of years 1950-1964. The following table contains this information for Lake Michigan for each of these 15 years.

Year    Number of Odor Periods
1950    10
1951    20
1952    17
1953    16
1954    12
1955    15
1956    13
1957    18
1958    17
1959    19
1960    21
1961    23
1962    23
1963    28
1964    28

(i)  Using Spearman’s method in section 8.5, test the hypothesis that the degree of pollution (as measured by the number of odor periods) had not changed with time against the alternative that there was a general increasing trend in the pollution of Lake Michigan over the period of 1950-1964. (See Comment 48 on page 405 of the textbook.)

(ii)  Suppose we take 1950 as the baseline, recording it as year 0 and every other year as (year-1950). For example, 1959 is recorded as 9. Will Spearman’s correlation change?
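As a rough illustration of the computation behind Spearman's method, the following Python sketch ranks each variable with midranks (tied values share the average of their positions) and takes the ordinary correlation of the ranks. The function names and the five-point data set are hypothetical and are not taken from the textbook or from the table above. The one-sided P-value uses the large-sample approximation in which sqrt(n-1) times the rank correlation is treated as standard normal under H_0, with no adjustment for ties.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(u, v):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))


def upper_tail_p(rho, n):
    """One-sided (increasing trend) P-value from the normal approximation."""
    z = rho * sqrt(n - 1)
    return 0.5 * erfc(z / sqrt(2))


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)
print(rho, upper_tail_p(rho, len(x)))
```

For the toy data the squared rank differences sum to 4, so the usual no-ties formula gives 1 - 6*4/(5*24) = 0.8. With a sample this small, the exact null distribution tabulated in the textbook would be preferred over the normal approximation.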

2. Suppose that an investigator is studying the association between sulfur dioxide (SO2) concentrations in a national park and the rate of emission of SO2 by a coal-burning power plant 25 miles away. To assess the power plant's SO2 contribution to the national park, recordings were made of X, the SO2 output of the plant in tons/hour, and Y, the SO2 concentration at the national park in micrograms/cubic meter. The investigator would like a straight-line regression equation relating Y to X.

x        y
5.21     1.92
7.36     3.92
16.26    6.80
10.10    6.32
5.80     2.00
8.06     4.32
4.76     2.40
6.93     2.96
9.36     3.52
10.90    4.24
12.48    5.12
11.70    5.84
7.44     3.60
6.99     2.80

(i)  Use ordinary least squares to fit a regression line.

(ii)  Use Theil’s method to test whether the slope is equal to 0.44.

(iii) Estimate the regression line using Theil’s method, and compare with the OLS regression line.
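The pieces of Theil's method mentioned above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the four-point data set are hypothetical, pairs with equal x values are simply skipped, and the intercept is taken as the median of y_i - b*x_i, which is one common choice. The statistic C is the Kendall-type sum of signs of D_j - D_i, where D_i = y_i - beta0*x_i and the points are ordered by x.

```python
from itertools import combinations
from statistics import median


def theil_slope(x, y):
    """Median of the pairwise slopes (y_j - y_i) / (x_j - x_i)."""
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi  # assumption: pairs with tied x values are skipped
    ]
    return median(slopes)


def theil_intercept(x, y, slope):
    """Median of y_i - slope * x_i (one common intercept choice)."""
    return median(yi - slope * xi for xi, yi in zip(x, y))


def theil_c_statistic(x, y, beta0):
    """C = sum over i < j (in x order) of sign(D_j - D_i)."""
    points = sorted(zip(x, y))  # order the pairs by x
    d = [yi - beta0 * xi for xi, yi in points]
    return sum((dj > di) - (dj < di) for di, dj in combinations(d, 2))


# Hypothetical toy data with one outlying response, for illustration only.
x = [1, 2, 3, 4]
y = [1, 3, 2, 10]
b = theil_slope(x, y)
a = theil_intercept(x, y, b)
print(b, a, theil_c_statistic(x, y, beta0=0.0))
```

A value of C far from 0 is evidence against the hypothesized slope beta0; the P-value comes from the null distribution of Kendall's statistic or its normal approximation, as described in the textbook.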

 

3.  Fifteen children were given a visual discrimination test during their first week of kindergarten and a reading-achievement test at the end of first grade. The scores on the two tests are given below.

 

(i) Calculate Spearman’s and Kendall’s correlation coefficients between the two test scores.

(ii) Using either approach in (i), would we reject the null hypothesis that there is no association between the two scores?

(iii) Would you recommend using either of these nonparametric statistics instead of Pearson’s correlation coefficient? Why?
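For reference, Kendall's coefficient can be computed directly from its definition, as in the Python sketch below. The function names and the toy data are hypothetical. The sketch assumes no ties: K counts concordant minus discordant pairs, tau = 2K/(n(n-1)), and the two-sided P-value uses the large-sample normal approximation with null variance n(n-1)(2n+5)/18.

```python
from itertools import combinations
from math import erfc, sqrt


def kendall_k(x, y):
    """K = number of concordant pairs minus number of discordant pairs."""
    k = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        prod = (xj - xi) * (yj - yi)
        k += (prod > 0) - (prod < 0)  # sign of the product
    return k


def kendall_tau(x, y):
    """Kendall's tau without a tie correction: 2K / (n(n-1))."""
    n = len(x)
    return 2 * kendall_k(x, y) / (n * (n - 1))


def two_sided_p(k, n):
    """Two-sided P-value from the no-ties normal approximation for K."""
    var_k = n * (n - 1) * (2 * n + 5) / 18
    z = abs(k) / sqrt(var_k)
    return erfc(z / sqrt(2))


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
k = kendall_k(x, y)
print(k, kendall_tau(x, y), two_sided_p(k, len(x)))
```

In the toy data 8 pairs are concordant and 2 are discordant, so K = 6 and tau = 0.6. If the actual scores contain ties, the tie-corrected variance from the textbook should be used instead.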

4.  The following data are from a study relating the survival of three species of Drosophila to increasing levels of insecticide. Four batches of medium, identical except for the levels of insecticide they contained, were prepared. One hundred eggs from each of the three Drosophila species were deposited on each of the four medium preparations, and the numbers of flies that survived to adulthood were recorded. Using the method in section 9.5, test the hypothesis that the three species of Drosophila exhibit the same response to increasing levels of insecticide in the medium studied. State the assumption required to perform this analysis. (For computation, refer to the sample MATLAB program ParallelSlopes.m or Parallel2.m.)

   

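Species                     Level of insecticide (ppm)    Number survived to adulthood
Drosophila melanogaster     0                             91
Drosophila melanogaster     0.3                           71
Drosophila melanogaster     0.6                           23
Drosophila melanogaster     0.9                           5
Drosophila pseudoobscura    0                             89
Drosophila pseudoobscura    0.3                           77
Drosophila pseudoobscura    0.6                           12
Drosophila pseudoobscura    0.9                           2
Drosophila serrata          0                             87
Drosophila serrata          0.3                           43
Drosophila serrata          0.6                           22
Drosophila serrata          0.9                           8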

 

5. The crime.csv data set appears in Statistical Methods for the Social Sciences, Third Edition, by Alan Agresti and Barbara Finlay (Prentice Hall, 1997). The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder), the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), the percent of the population with a high school education or above (pcths), the percent of the population living under the poverty line (poverty), and the percent of the population that are single parents (single). Before you perform any analysis, drop the observation for Washington, D.C. (sid=51) because it is not a state. The goal of the analysis is to fit a multiple linear regression model by regressing the crime rate (crime) on the percent of the population living under the poverty line (poverty) and the percent of the population that are single parents (single).

(i) Use both ordinary least squares (use the MATLAB function regress()) and rank regression (use the program written by Shapour Mohammadi at University of Tehran) to fit the regression model.

(ii) Remove the observations for Florida and Mississippi and redo the analyses in (i). Compare the change in the estimated regression model under the two methods. Do you observe that rank regression is more robust than OLS?