Math 439 Takehome Final - Fall 2010


    TAKEHOME FINAL due Wednesday 12-22 by 5:30 P.M.
    Hand in either to Professor Sawyer or to the receptionist in the Mathematics Office.

    NOTE: There should be NO COLLABORATION on the takehome final,
       other than for the mechanics of using the computer.

    Open textbook and notes (including course handouts). References to the text are to Rencher, Methods of Multivariate Analysis, 2nd edn., 2002, Wiley Series in Probability and Statistics.

    ORGANIZE YOUR WORK in the following manner:

    (i) your answers to all questions written out separately,
    (ii) all SAS programs that you use, if you use SAS for any problems, followed by
    (iii) all SAS output.
    ADD CONSECUTIVE PAGE NUMBERS to the output so that you can make references from part (i) to part (iii). For example, so that you can say things like, ``The answer to part (a) is 7.71. The scree plot for part (b) is on page #Y below.''

    NOTE: In the following, _ means subscript, ^ means superscript, le means `less than or equal', and ge means `greater than or equal'.

    Whole problems are equally weighted, but different parts of problems may be weighted differently.
    Four (4) problems.

    
    

    Problem 1. Let Z = (Y' X')' be a random 5x1 column vector written in partitioned form for a 3x1 column vector Y and a 2x1 column vector X. Suppose that Z has a five-dimensional joint normal distribution with covariance and mean given in block matrix form by

                 (  3   1   4  |  -2  -3 )              (  1  )
                 (  1   8   1  |   0   1 )              (  3  )
      Cov(Z) =   (  4   1   9  |  -4  -5 )      E(Z) =  (  9  )
                 ( --------------------- )              ( ----)
                 ( -2   0  -4  |   9   5 )              (  3  )
                 ( -3   1  -5  |   5   5 )              (  1  )  

    (i)   Find the (3x3) conditional covariance matrix Cov(Y | X=(5 3)') and the (3x1) conditional mean vector E(Y | X=(5 3)'). Do this either by hand or else by using SAS's proc iml or a comparable matrix language. (Hint: See the text, Section 4.2, page 88, for the formulas stated without proof, and Corollary 10.1 in the Multivariate Linear Models handout on the Math439 Web site for the formulas with a proof.)
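
    As a rough illustration of the partitioned-normal formulas that the hint refers to, here is a minimal sketch in plain Python (rather than proc iml). Every number in it is made up for a 2x1 Y and a 2x1 X and is unrelated to the matrices of Problem 1; the helper names matmul, transpose, and inv2 are hypothetical.

```python
# Conditional mean and covariance of Y given X = x for joint normal (Y' X')':
#   E(Y | X=x)   = mu_y + S_yx S_xx^{-1} (x - mu_x)
#   Cov(Y | X=x) = S_yy - S_yx S_xx^{-1} S_xy
# All numbers below are invented for illustration only.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

s_yy = [[4.0, 1.0], [1.0, 3.0]]   # Cov(Y)
s_yx = [[1.0, 0.0], [0.0, 1.0]]   # Cov(Y, X)
s_xx = [[2.0, 0.0], [0.0, 1.0]]   # Cov(X)
mu_y = [1.0, 2.0]
mu_x = [0.0, 0.0]
x = [2.0, 1.0]                    # observed value of X

gain = matmul(s_yx, inv2(s_xx))                     # S_yx S_xx^{-1}
dev = [[xi - mi] for xi, mi in zip(x, mu_x)]        # column vector x - mu_x
shift = matmul(gain, dev)
cond_mean = [m + s[0] for m, s in zip(mu_y, shift)]
reduction = matmul(gain, transpose(s_yx))           # S_yx S_xx^{-1} S_xy
cond_cov = [[s_yy[i][j] - reduction[i][j] for j in range(2)]
            for i in range(2)]

print(cond_mean)
print(cond_cov)
```

    Note that the conditional covariance does not involve the observed value x; only the conditional mean does.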

    (ii)   Find the eigenvalues of the 3x3 covariance matrices Cov(Y) and Cov(Y | X=(5 3)'). Are they similar or very different?
    (Hints: You can create matrices with given values in proc iml by setting (for example) yy = { 1 2 3, 4 5 6, 7 8 9 }; (note curly braces), where spaces mean ``same row'' and commas mean ``start of new row''. (Thus yy is a 3x3 matrix.) Similarly, you can define submatrices of another matrix by, for example, xx = yy[2:3,1:2], which is the same as xx = { 4 5, 7 8 };. You can also define data sets using SAS datasteps and import columns to a matrix in proc iml. (See examples on the Math439 Web site.) You can find eigenvalues and eigenvectors of symmetric matrices in proc iml by using the function call eigen. See for example ThreeRegIml.sas on the Math439 Web site.)
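
    For readers working outside SAS, the following is a minimal sketch of one way to get the eigenvalues of a small symmetric matrix in plain Python, using cyclic Jacobi rotations. It is not what proc iml's eigen call does internally (that is not documented here); the function name jacobi_eigenvalues and the 3x3 matrix are made up for illustration.

```python
import math

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues (largest first) of a symmetric matrix, by Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]          # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                     # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle (at most pi/4 in size) that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # A <- J' A J: rotate columns p and q, then rows p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Invented symmetric matrix, not one of the matrices in Problem 1.
cov = [[2.0, 0.0, 0.0],
       [0.0, 3.0, 4.0],
       [0.0, 4.0, 9.0]]
eig = jacobi_eigenvalues(cov)
print(eig)
print(sum(eig), sum(cov[i][i] for i in range(3)))
```

    The last line prints the sum of the eigenvalues next to the trace; for a symmetric matrix the two agree up to rounding.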

    (iii) Find trace[Cov(Y)] and trace[Cov(Y|X=x)] and show trace[Cov(Y)] ge trace[Cov(Y|X=x)].

    (iv) If A and B are symmetric dxd matrices, we say A ge B if  y'Ay  ge  y'By  for all y in R^d. In general for joint normal Z = (Y' X')' in R^{a+b} where Y is ax1 and X is bx1, show that Cov(Y) ge Cov(Y|X=x). (Hint: Use the formula that you used in part (i).) Note: It is easy to show that, for symmetric matrices, A ge B implies trace(A) ge trace(B). Both are generalizations of Var(Y) ge Var(Y|X=x) for the bivariate normal distribution (that is, a=b=1).
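
    The Note's claim that A ge B implies trace(A) ge trace(B) can be seen by taking y to be each standard basis vector e_i of R^d in turn. A short sketch in LaTeX notation (the symbol e_i is introduced here and is not used elsewhere in the problem):

```latex
\[
  A \ge B
  \;\Longrightarrow\;
  A_{ii} = e_i' A e_i \;\ge\; e_i' B e_i = B_{ii}
  \quad (i = 1, \dots, d),
\]
\[
  \text{so}\qquad
  \operatorname{trace}(A) = \sum_{i=1}^{d} A_{ii}
  \;\ge\; \sum_{i=1}^{d} B_{ii} = \operatorname{trace}(B).
\]
```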
    
    

    Problem 2.   An experimenter measures a response variable Y_i along with four covariates, which she imaginatively calls X1, X2, X3, and X4. (More exactly, Xi1, Xi2, Xi3, and Xi4 for the i-th observation.) She carries out a linear regression of Y_i on X1,X2,X3,X4 (including an intercept term) under the assumption that the errors are independent normal with the same error variance sigma^2.
            Data for the n=50 observations are contained in the file Experiment4.dat on the Math439 Web site. Note that the first row of Experiment4.dat is column headings and not data.

    (i) Use proc iml in SAS or a similar matrix package for the regression
       Y_i  =  mu + beta_1 Xi1 + beta_2 Xi2 + beta_3 Xi3 + beta_4 Xi4 + error_i
    to find (a) the least-squares or ML estimators for the five coefficients in the regression, (b) T-statistics for the four tests H_0:beta_i=0, and (c) Student-t P-values for H_0 in each case.
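
    The matrix steps behind such a regression can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are five invented points with a single covariate (not Experiment4.dat), the helper names are hypothetical, and P-values are left out because the Python standard library has no Student-t distribution function.

```python
import math

def transpose(a):
    return [list(r) for r in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Invented toy data: n = 5 observations, one covariate.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 4.0, 7.0, 10.0]

X = [[1.0, xi] for xi in x]                  # design matrix with intercept column
Y = [[yi] for yi in y]
xtx_inv = inverse(matmul(transpose(X), X))   # (X'X)^{-1}
beta = [b[0] for b in matmul(xtx_inv, matmul(transpose(X), Y))]

n, p = len(X), len(X[0])
df = n - p                                   # observations minus fitted coefficients
resid = [yi - sum(b * v for b, v in zip(beta, row)) for yi, row in zip(y, X)]
mse = sum(r * r for r in resid) / df
# t statistic for each coefficient: estimate / sqrt(MSE * diagonal of (X'X)^{-1})
t_stats = [b / math.sqrt(mse * xtx_inv[j][j]) for j, b in enumerate(beta)]

print(beta, df, mse, t_stats)
```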

    (ii) What is the number of degrees of freedom in the associated T-tests? How was the number of degrees of freedom calculated? (Hint: See the use of proc iml in ThreeRegIml.sas on the Math439 Web site.)
    
    

    Problem 3.   A colleague of the experimenter in Problem 2 asks if the estimates of the coefficients beta_1 and beta_2 in Problem 2 are significantly different.
            Carry out a Student t-test of the hypothesis H_0:beta_1=beta_2. What is the value of the t statistic? What is the P-value? How many degrees of freedom does the resulting t-test have?
            (Hints: If t =(0 1 -1 0 0)' and beta=(mu beta_1 .. beta_4)' in the regression, then t'beta=beta_1-beta_2. Given the theoretical distribution of betahat, show that t'betahat is normally distributed with mean t'beta and variance sigma^2 V, where V depends on t and X'X. Given H_0:t'beta=0, conclude that T=t'betahat/sqrt((MSE)*V) has a Student-t distribution. You should be able to do this problem by adding a few more lines of matrix code to the program that you wrote for Problem 2.)
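
    The arithmetic of such a contrast test can be sketched in a few lines of plain Python. The sketch assumes the standard result that the quantity V appearing in T is the quadratic form t'(X'X)^{-1}t; the fitted values below (beta_hat, xtx_inv, mse) are invented for a model with an intercept and two slopes and do not come from Experiment4.dat.

```python
import math

# Hypothetical fitted quantities, for illustration only.
beta_hat = [1.0, 2.5, 1.5]                 # (intercept, slope 1, slope 2)
xtx_inv = [[0.50, -0.10, -0.10],           # stands in for (X'X)^{-1}
           [-0.10, 0.08, 0.02],
           [-0.10, 0.02, 0.06]]
mse = 2.0
t_vec = [0.0, 1.0, -1.0]                   # picks out slope 1 minus slope 2

k = len(t_vec)
estimate = sum(t * b for t, b in zip(t_vec, beta_hat))        # t' betahat
v = sum(t_vec[i] * xtx_inv[i][j] * t_vec[j]                   # t' (X'X)^{-1} t
        for i in range(k) for j in range(k))
t_stat = estimate / math.sqrt(mse * v)

print(estimate, v, t_stat)
```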

    
    

    Problem 4.   Ten (10) rabbits of two types, 5 brown and 5 white, are measured for aortic thickening at five positions along the descending aorta. The data are collected in Table 1.

            Table 1 --- Degeneration at five positions in the aorta in 10 rabbits
          Type    Subj  Pos1  Pos2  Pos3  Pos4  Pos5 
          Brown    S01   640   566   427   475   306
                   S02   578   504   525   577   409
                   S03   683   380   342   461   530
                   S04   292   318   576   466   284
                   S05   464   574   459   440   729
          White    S06   287   276   276   449   560
                   S07   271   297   574   421   501
                   S08   262   378   396   350   554
                   S09   331   344   330   340   510
                   S10   175   302   625   362   526  

    (i) Are the two factors Type (with two levels) and Position (with levels Pos1-Pos5) crossed, nested, or neither? If they are nested, which is nested under which?
            The data in the table also have a third factor Subject, for the 10 individual rabbits. Is Subject crossed with Type? nested within Type? crossed with Position? nested within Position? Why?

    (ii) Run a full factorial ANOVA model to test Type, Position, and their interaction. Use nested Subject effects in the standard way to carry out the tests. (Hint: See MACorSinDogs.sas on the Math439 Web site, including for the ``standard way'' to test effects in nested subject models with one observation per cell.)
    Which of these three effects are significant? highly significant? For the significant effects, what are the P-values, and what are the degrees of freedom in the numerator and denominator for the F-distributions involved?
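
    For orientation, here is a minimal plain-Python sketch of the sums of squares and F ratios in a balanced design of this shape. It assumes the usual split-plot convention (the between-subject factor tested against subjects nested within it, the within-subject factor and the interaction tested against the residual), which may or may not be exactly what MACorSinDogs.sas does. The data are invented (2 types, 2 subjects per type, 3 positions), not Table 1, and P-values are omitted because the standard library has no F distribution function.

```python
# Invented data: data[type][subject] = responses at each position.
data = {
    "A": {"s1": [5.0, 6.0, 7.0], "s2": [6.0, 8.0, 7.0]},
    "B": {"s3": [3.0, 5.0, 8.0], "s4": [2.0, 6.0, 9.0]},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

types = list(data)
n_pos = 3      # positions (within-subject factor)
n_subj = 2     # subjects per type (balanced design)

all_vals = [v for t in types for s in data[t].values() for v in s]
grand = mean(all_vals)
type_mean = {t: mean(v for s in data[t].values() for v in s) for t in types}
pos_mean = [mean(s[j] for t in types for s in data[t].values())
            for j in range(n_pos)]
cell_mean = {t: [mean(s[j] for s in data[t].values()) for j in range(n_pos)]
             for t in types}
subj_mean = {t: {k: mean(s) for k, s in data[t].items()} for t in types}

ss_total = sum((v - grand) ** 2 for v in all_vals)
ss_type = n_subj * n_pos * sum((type_mean[t] - grand) ** 2 for t in types)
ss_subj = n_pos * sum((subj_mean[t][k] - type_mean[t]) ** 2
                      for t in types for k in data[t])      # Subject(Type)
ss_pos = len(types) * n_subj * sum((m - grand) ** 2 for m in pos_mean)
ss_int = n_subj * sum((cell_mean[t][j] - type_mean[t] - pos_mean[j] + grand) ** 2
                      for t in types for j in range(n_pos))
ss_err = ss_total - ss_type - ss_subj - ss_pos - ss_int      # residual

a, b = len(types), n_pos
df_type, df_subj = a - 1, a * (n_subj - 1)
df_pos, df_int = b - 1, (a - 1) * (b - 1)
df_err = a * (n_subj - 1) * (b - 1)

f_type = (ss_type / df_type) / (ss_subj / df_subj)   # against Subject(Type)
f_pos = (ss_pos / df_pos) / (ss_err / df_err)        # against residual
f_int = (ss_int / df_int) / (ss_err / df_err)        # against residual

print(f_type, (df_type, df_subj))
print(f_pos, (df_pos, df_err))
print(f_int, (df_int, df_err))
```

    The degrees of freedom printed beside each F ratio are the numerator and denominator degrees of freedom of the corresponding F distribution.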

    (iii) Display an interaction plot for the two principal factors, Type and Position, with the factor with the larger number of levels on the X-axis. Is an interaction suggested? Why? Did you find that it was significant in part (ii)?

    (iv) Use the data in Table 1 to run a MANOVA analysis to test the effect of Type, assuming one vector-valued observation for each rabbit, with five observations for each Type. Is Type significant in the MANOVA analysis? What is the P-value? How does it compare with the P-value for Type in part (ii)?
