Math 475 Homework 1 - Fall 2005

  • Click here for Prof. Sawyer's home page

    HOMEWORK #1 due Tuesday 9-20

    Text references are to Cody & Smith, ``Applied Statistics and the SAS Programming Language'', 5th edition.
    (Recall that the answers to odd-numbered problems are in the back of the book.)

    ORGANIZE YOUR HOMEWORK in the following manner:

    (i) Your answers to all questions (for problems #1 and #2, in which you are only asked to create output, just say that the output is in part (iii)),
    (ii) All of your SAS programs, and
    (iii) The SAS output that you got.
    Add page numbers to your homework so that you can make references from part (i) to part (iii). (For example, so that you can say things like, ``The answer in part (a) is 5. The scatterplot for part (b) is on page #Y below.'')
    Include your name in a title statement so that your name will appear at the top of each output page.
    If a problem or a part of a problem asks you to do a statistical test, EXPLAIN CLEARLY what the null hypothesis H_0 is, what test you used, what the P-value is, and whether the data is significant, highly significant, or neither. Include this as part of your answer in part (i).
    (The reason for the (i,ii,iii) order is to have your conclusions first, then your SAS programs, then the SAS output on which you based your conclusions. This will make the organization of your homework much clearer in later assignments, which will have a larger number of more complex problems.)

    The problems:

    1. Text page 19, problem #1-1.

    
    

    2. Text page 19, problem #1-3, and page 64, problem #2-3.
    (Do as one problem.)

    
    

    Problems 3-5 depend on the following data for the 47 current employees of Vaporlock Computer Services:

    Table 1.   Height (inches),  weight (pounds),  and sex for 47 employees:
         67  123  F       67  143  M       69  174  M       64  127  F
         61  116  F       70  159  M       71  142  M       66  146  F
         61  128  F       59  139  F       65  127  F       69  172  M
         64  166  M       63  120  F       69  166  M       67  152  F
         62  153  F       60  152  F       66  168  M       66  155  M
         71  145  M       64  164  M       72  168  M       64  123  F
         64  135  F       68  158  M       63  159  M       71  177  M
         65  158  M       63  169  M       60  139  F       71  177  M
         65  150  F       63  145  M       62  141  F       64  118  F
         64  168  M       66  151  F       68  171  M       63  158  M
         63  146  M       68  149  M       66  162  M       68  144  F
         61  131  F       72  179  M       62  142  F
     

    3. (i) Enter the data in Table 1 into a SAS program in a data step with variables for height, weight, and sex. Construct a scatter plot of heights (Y-variable) by weights (X-variable) using sex as the plotting symbol.

    Do the heights and weights appear to be correlated? (That is, do taller individuals appear also to be heavier?) Do heights and weights appear to be correlated within each sex; that is, for Fs only and for Ms only?
    (Hint: It may be easier (and safer) to copy and paste the data from the Math475 Web site into your program than to enter it by hand.)
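
    As a rough illustration of the data step and plot that part (i) describes, here is a minimal sketch that reads only the first row of Table 1 (four employees). The dataset name employees and the title text are placeholders chosen for this example, not names taken from the text; your own program would include all 47 employees.

```sas
/* Sketch only: first row of Table 1, not the full data set. */
title 'Your Name - Math 475 Homework 1';   /* placeholder title text */

data employees;                            /* dataset name is an assumption */
    /* The trailing @@ holds the input line, so several         */
    /* height-weight-sex triples can be read from one data line */
    input height weight sex $ @@;
    datalines;
67  123  F       67  143  M       69  174  M       64  127  F
;
run;

proc plot data=employees;
    /* vertical*horizontal=variable: the first character of sex */
    /* (F or M) is used as the plotting symbol for each point   */
    plot height*weight=sex;
run;
```

    Because the double trailing @@ reads several observations from each line, the four-across layout of Table 1 can be pasted in unchanged.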

    (ii) The company's insurance company is interested in the distribution of the employees over various height and weight categories. The state insurance commission requires that they use the following codes for height and weight ranges:
         Height:  1:  le 63     Weight:  1:  le 119
                  2:  ge 64              2:  120 to 137
                                         3:  138 to 170
                                         4:  ge 171
     
    where le means `less than or equal to' and ge means `greater than or equal to'.
    Use SAS's proc freq to construct tables for (a) heights, (b) weights, and (c) height by weight (a 2 by 4 table), all in terms of these height and weight codes. (Hint: Define new variables htcode=1,2 and wtcode=1,2,3,4 by if-then-else statements in the data step. See the first program in Section 1C of the text for an example.)
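
    The coding step in the hint might look something like the following sketch, again using only the first row of Table 1. The dataset name coded is a placeholder for this example; htcode and wtcode are the names suggested in the hint. This is one possible arrangement of the if-then-else statements, not necessarily the one used in the text.

```sas
/* Sketch only: first row of Table 1, not the full data set. */
data coded;                                /* dataset name is an assumption */
    input height weight sex $ @@;

    /* Height code: 1 for le 63, 2 for ge 64 */
    if height le 63 then htcode = 1;
    else htcode = 2;

    /* Weight code: each else branch is reached only when all */
    /* earlier conditions have failed                         */
    if weight le 119 then wtcode = 1;
    else if weight le 137 then wtcode = 2;
    else if weight le 170 then wtcode = 3;
    else wtcode = 4;
    datalines;
67  123  F       67  143  M       69  174  M       64  127  F
;
run;

proc freq data=coded;
    /* Two one-way tables, then the htcode by wtcode crosstab */
    tables htcode wtcode htcode*wtcode;
run;
```

    Note that SAS treats a missing numeric value as smaller than any number, so a chain like this would put missing heights or weights in code 1. Table 1 has no missing values, so that does not arise here.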
    
    

    4. For the data in Table 1,

    (i) Are the males in this sample significantly taller than the females? Have SAS conduct a Student's t-test to find out. What is the P-value?
    (ii) In part (i), did you use the classical t-test or the Satterthwaite t-test? Why? (That is, pick one of the two methods and justify it.)
    (iii) What does Prob>F' = .... mean in the output? What hypothesis H_0 is SAS testing here? Does SAS accept it or reject it? What is the P-value?
    (iv) Analyze the same data using the Wilcoxon rank-sum test. What is the P-value? (Use the ``chi-square'' P-value, not the continuity-corrected P-value.)
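
    One way the procedure calls for problem 4 might be set up is sketched below, using only the first two rows of Table 1 (four Fs and four Ms) so that the example stays short. The dataset name employees is a placeholder. The numbers this sketch produces are not the answers for the full 47-employee data set.

```sas
/* Sketch only: first two rows of Table 1, not the full data set. */
data employees;                            /* dataset name is an assumption */
    input height weight sex $ @@;
    datalines;
67  123  F       67  143  M       69  174  M       64  127  F
61  116  F       70  159  M       71  142  M       66  146  F
;
run;

/* Two-sample t-test of height between the two sexes.  The output */
/* lists both t-test variants mentioned in part (ii), along with  */
/* the F statistic that part (iii) asks about.                    */
proc ttest data=employees;
    class sex;
    var height;
run;

/* Wilcoxon rank-sum analysis of the same comparison */
proc npar1way data=employees wilcoxon;
    class sex;
    var height;
run;
```

    The exact labels in the output (for example, how the P-value for the F statistic is printed) can differ between SAS releases, so match them against what your own output shows.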
    
    

    5. For the data in Table 1,

    (i) Are the height and weight of these employees significantly correlated, as measured by a Pearson correlation coefficient? What is the correlation coefficient? (Hint: See the description of proc corr in Chapter 5 in the text.)
    (ii) Are the height and weight of employees significantly correlated within each sex? Use a Pearson correlation coefficient within each sex. What are the two correlation coefficients? (Hint: If you add the option by sex; to SAS's proc corr, then SAS will stratify by sex and run proc corr within each sex.)
    (iii) Why are your answers different in parts (i) and (ii)? Can you deduce anything from the scatterplot in problem 3?
    (iv) What statistical test did SAS perform to find the P-value of the correlations? What standard statistical distribution is this test based on? Does this test assume that the data is normally distributed?
    (NOTE for part (ii): In general, if you say ``by var;'' in SAS, then SAS stratifies by contiguous groups with the same value of var. Thus, to compute within-sex correlations, you must first sort the data by sex, so that all Fs occur together and all Ms occur together. In contrast, ``class var;'' in SAS will usually let you stratify by values of var without sorting, but proc corr does not currently support ``class var;''. Make sure that you get valid output.)
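
    The sort-then-stratify pattern described in the note can be sketched as follows, using only the first two rows of Table 1 (four Fs and four Ms). The dataset name employees is a placeholder, and the correlations this sketch prints are not the answers for the full data set.

```sas
/* Sketch only: first two rows of Table 1, not the full data set. */
data employees;                            /* dataset name is an assumption */
    input height weight sex $ @@;
    datalines;
67  123  F       67  143  M       69  174  M       64  127  F
61  116  F       70  159  M       71  142  M       66  146  F
;
run;

/* Overall Pearson correlation of height and weight */
proc corr data=employees;
    var height weight;
run;

/* Sort first so that all Fs are contiguous and all Ms are contiguous */
proc sort data=employees;
    by sex;
run;

/* Within-sex correlations: one block of output per value of sex */
proc corr data=employees;
    by sex;
    var height weight;
run;
```

    If the proc sort step is left out, SAS would normally stop with an error saying that the data are not sorted in the order the by statement requires. Check the log as well as the output.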
