Math 439 Homework 2 - Fall 2010


    HOMEWORK #2 due 10-19

    Arrange your answers in three parts in the following order:
              Part I: Your answers to all questions, either written by hand or using a word processor,
              Part II: The SAS programs (*.sas files) that you used for all problems in which you used SAS
              Part III: The output from the SAS programs in Part II.

              For all problems in which you use SAS, either copy or transcribe answers from the SAS output to Part I or else refer in Part I to specific pages in Part III by saying (for example) ``The scatterplot or matrix for Problem 3 is on page 17 of the SAS output (Part III).'' Make sure that you have consecutive page numbers on the SAS output in Part III by adding your own page numbers to the SAS output if necessary, so that (for example) you don't have several different page 1s in Part III. If you like, you can number pages as (for example) ``Page 3-2'' for the second page of output for Problem 3.

    Do problems 1-4 by hand, and problems 5-6 using SAS.

    1. (10)  Let X_1 and X_2 be two real-valued random variables. Show that

          Cov(X_1-X_2, X_1+X_2) = Var(X_1) - Var(X_2)  
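
A numerical sanity check of this identity (not a proof) can be sketched in Python, which is an assumption of this note rather than a tool used in the course. Sample covariance is bilinear in the same way that Cov is, so the identity should also hold for sample moments up to rounding error. The helper `sample_cov` and the simulated data are hypothetical.

```python
import random

def sample_cov(u, v):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)

random.seed(1)  # arbitrary seed, used only for reproducibility
x1 = [random.gauss(0.0, 2.0) for _ in range(500)]
# Make x2 correlated with x1 so that the check is not trivial.
x2 = [0.5 * a + random.gauss(0.0, 1.0) for a in x1]

diff = [a - b for a, b in zip(x1, x2)]   # X_1 - X_2
total = [a + b for a, b in zip(x1, x2)]  # X_1 + X_2

lhs = sample_cov(diff, total)
rhs = sample_cov(x1, x1) - sample_cov(x2, x2)
print(lhs, rhs)  # the two values should agree up to rounding error
```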
    
    

    2. (10) Let A=A' be a 2 x 2 symmetric matrix with tr(A)>0 and det(A)>0. Prove that A is positive definite.
            (Hint: Use the spectral decomposition of A.)
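
As an illustration only (not a proof), the following Python sketch computes the two eigenvalues of a symmetric 2 x 2 matrix from its trace and determinant. The function name and the example matrix are hypothetical, and Python is an assumption of this note rather than a course tool.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]]."""
    tr = a + d
    det = a * d - b * b
    # For a symmetric matrix, tr^2 - 4*det = (a - d)^2 + 4*b^2 >= 0.
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0

# Hypothetical example matrix with tr = 5 > 0 and det = 5 > 0.
lam_min, lam_max = eigenvalues_2x2_symmetric(2.0, 1.0, 3.0)
print(lam_min, lam_max)  # both eigenvalues are positive for this matrix
```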

    
    

    3. (20) Let X = (X_1   X_2   X_3)'  be a random vector in R^3 with (vector) mean E(X)=0 and covariance matrix

                  (  3   -4    1 )
        Cov(X) =  ( -4   10   -2 )
                  (  1   -2    3 )  
    Let   Y = X_1 + 2X_2 + 3X_3   and   Z = X_2 + 4X_3.  Recall that Var(Z) means the variance of Z.
              (i) Find Var(X_1) and Var(X_2).
              (ii) Find Var(Y) and Var(Z).
              (iii) Find the covariance Cov(Y,Z) = E(YZ). (The two are equal because E(X)=0.)
              (iv) Let W be the two-dimensional random vector W = (X_1   X_3)'.  Find the matrix Cov(W).
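
The rule behind this kind of calculation is that, for fixed coefficient vectors a and b, Var(a'X) = a' Cov(X) a and Cov(a'X, b'X) = a' Cov(X) b. A minimal Python sketch of that rule follows. The helper `bilinear`, the matrix `sigma`, and the vectors are hypothetical and deliberately differ from those in Problem 3.

```python
def bilinear(a, sigma, b):
    """Return a' * sigma * b for list vectors a, b and a square matrix sigma."""
    return sum(a[i] * sigma[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

# Hypothetical covariance matrix (NOT the one in Problem 3).
sigma = [[2.0, 1.0, 0.0],
         [1.0, 3.0, 1.0],
         [0.0, 1.0, 4.0]]

a = [1.0, 1.0, 0.0]    # U = X_1 + X_2
b = [0.0, 1.0, -1.0]   # V = X_2 - X_3

var_u = bilinear(a, sigma, a)    # Var(U) = a' Sigma a
var_v = bilinear(b, sigma, b)    # Var(V) = b' Sigma b
cov_uv = bilinear(a, sigma, b)   # Cov(U, V) = a' Sigma b
print(var_u, var_v, cov_uv)
```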
    
    

    4. (20) Assume that X = (X_1   X_2   X_3)'  is a vector-valued normal random variable with distribution N(mu_X, B) where

                (  5 )                 ( 0   0   0 )
         mu_X = ( -3 )    and    B  =  ( 0   2   3 )
                ( -2 )                 ( 0   3   5 )
     
    Consider the random vector Y = A X for
         A  =  ( 1   2   3 )
               ( 0   1   2 )  
              (i) What is the dimension of Y?   That is, if Y is R^r-valued, what is r?
              (ii) By results proven in class and in the text, Y is normal N(mu_Y, C) for some vector mu_Y and matrix C. Find mu_Y and C.
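
The standard result for a linear map Y = A X of a normal vector is E(Y) = A mu_X and Cov(Y) = A B A'. The Python sketch below applies it to hypothetical inputs that are not those of Problem 4. The helpers `matmul` and `transpose` are illustrative names, not part of any course file.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

# Hypothetical inputs (NOT those of Problem 4).
mu_x = [[1.0], [0.0], [2.0]]
B = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]

mu_y = matmul(A, mu_x)                    # E(Y) = A mu_X
C = matmul(matmul(A, B), transpose(A))    # Cov(Y) = A B A'
print(mu_y, C)
```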
    
    

    5. (20) Use Proc IML in SAS to do the following:
              (i) Define and display a 40x6 matrix X whose entries are realizations of independent normally-distributed random variables with mean zero and variance one. The 40x6=240 displayed values should be mostly in the range -2 to 2 with a few values outside that range.
              (Hints: To start proc iml, just enter proc iml; and begin using proc iml commands, as in ThreeRegIml.sas or MLizards.sas on the Math 439 Web site. In proc iml, the command B=J(m,n) generates an mxn matrix all of whose entries equal one. If Y is any mxn matrix, W=normal(Y) generates an mxn matrix W whose entries are realizations of independent normally-distributed random variables with mean zero and variance one. Thus W=normal(Y) depends on Y only through its dimensions, except that it also uses the value Y[1,1] (which you can change) as the starting seed of its random numbers.
              That is, consecutive runs of the program with the same starting seed Y[1,1] will yield identical random numbers. Setting Y[1,1]=0 tells SAS to seed its random numbers from the system clock, so that consecutive runs will yield different results.)
              (ii) Show theoretically that the matrix W=X'X is an instance of a Wishart distribution W(6,40,I_6) (or W_6(40,I_6) in the textbook's notation). (Hints: Do this directly, or else use Problem 7(ii) on Homework #1. The beginning of Section 9, page 19, in the Multivariate Linear Models handout on the Math 439 Web site has a clearer definition of a Wishart distribution than does the text.)
              (iii) Display the matrix W=X'X.   As noted in part (ii), this is a 6x6 matrix that is an instance of a Wishart distribution W(6,40,I_6). In particular, the diagonal elements will be independent realizations of a chi-square distribution with 40 degrees of freedom, while the off-diagonal terms will generally be smaller in absolute value.
              (iv) Find and display the 6x6 correlation matrix Q of the columns of X.
              (Hint: See the proc IML code at the end of ThreeRegIml.sas or in MLizards.sas on the Math439 Web site. The file ThreeRegIml.sas has been updated since it was first handed out. As a check, the diagonal elements of Q should be all 1s and the off-diagonal terms should be in the range of -1 to 1.)
              (v) Find and display the 6 eigenvalues of Q. (Hint: Note the use of the function eigen(evals,evecs,...) at the end of the proc IML code in ThreeRegIml.sas on the Math 439 Web site.)
              (vi) Compute la_max/la_min, where la_max is the largest and la_min is the smallest of the 6 eigenvalues. (You can do this part by hand.) (Remark: If you have done this correctly, then la_max/la_min should be somewhere in the range of 2 to 8 or nearby. For many real data sets with more than 3 or 4 covariates, the value of la_max/la_min is much larger than this. This is an indication that the true dimensionality of many multidimensional data sets is much smaller than the actual number of covariates.)
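
For cross-checking the general shape of the SAS results, the steps in parts (i) and (iii)-(vi) can be sketched in plain Python. This is an assumption of this note, not the Proc IML solution the problem asks for, and the random numbers will differ from those that SAS generates. The helpers `gram`, `correlation`, and `jacobi_eigenvalues` are hypothetical names, and the eigenvalues are found with the classical cyclic Jacobi method rather than with the SAS eigen routine.

```python
import math
import random

def gram(X):
    """Return X'X for a matrix X stored as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def correlation(X):
    """Correlation matrix of the columns of X."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    centered = [[x - m for x, m in zip(row, means)] for row in X]
    S = gram(centered)  # centered cross-product matrix
    d = [math.sqrt(S[i][i]) for i in range(len(S))]
    return [[S[i][j] / (d[i] * d[j]) for j in range(len(S))]
            for i in range(len(S))]

def jacobi_eigenvalues(S, sweeps=50, tol=1e-20):
    """Eigenvalues (ascending) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(S)
    A = [row[:] for row in S]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                gap = A[q][q] - A[p][p]
                # Rotation angle chosen so that the new A[p][q] is zero.
                theta = math.pi / 4 if gap == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / gap)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))

random.seed(439)  # arbitrary fixed seed so that reruns give the same numbers
X = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(40)]
W = gram(X)                     # 6x6 matrix X'X
Q = correlation(X)              # 6x6 correlation matrix of the columns of X
evals = jacobi_eigenvalues(Q)   # eigenvalues of Q in ascending order
ratio = evals[-1] / evals[0]    # la_max / la_min
print(ratio)
```

Since the trace of Q is 6, the six eigenvalues should sum to 6, which gives a quick consistency check on any eigenvalue routine.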

    
    

    6. (20) Table 5.5 (page 150) in the text has four measurements on m=19 beetles from the flea beetle species Haltica oleracea and corresponding measurements from n=20 beetles of another flea-beetle species, H. carduorum. (See also the data file FleaBeetles.dat.)
              (i) Use SAS to carry out the Hotelling T^2 test for all four measurements y_1,y_2,y_3,y_4 to test the hypothesis H_0:E(X)=E(W), where X_1,...,X_m (each in R^4) represent the measurements from m=19 beetles from H. oleracea and W_1,...,W_n the measurements from the second flea-beetle species. Do you accept or reject H_0?
              (ii) From the output, what is the value of the associated F statistic for the multivariate test? What is the number of degrees of freedom, both in the numerator and in the denominator? How were these derived from the number of components in the observations (d=4) and the sample sizes (m,n)?
              (iii) Carry out two-sample t-tests on the four measurements y_1,y_2,y_3,y_4 individually. Which of these are significantly different between the two samples? What are the two-sided P-values?
              (Hints: See MLizards.sas on the Math439 Web site. Do not log-transform the data. If you use proc format to assign descriptive tags to the Species variable (=1,2), make sure that you use the correct species names. See Section 5.4.2 (page 122) in the text and Section 10 in the Multivariate Linear Models notes for the relationship between a Hotelling T^2 statistic and its associated F distribution.)
              (Warning: Make sure that SAS reports that you have measurements for 39 individual beetles.)
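
The standard conversion from a two-sample Hotelling T^2 statistic to an F statistic, assuming equal covariance matrices in the two populations, is F = T^2 (m+n-d-1) / (d (m+n-2)) with d and m+n-d-1 degrees of freedom. The Python sketch below encodes that formula. The function name `hotelling_to_f` and the input values are hypothetical and are not taken from the beetle data or from the course notes.

```python
def hotelling_to_f(t2, d, m, n):
    """Convert a two-sample Hotelling T^2 statistic to its F statistic.

    Returns (F, numerator df, denominator df) for d-dimensional
    observations and sample sizes m and n, assuming equal covariances.
    """
    df1 = d
    df2 = m + n - d - 1
    f_stat = t2 * df2 / (d * (m + n - 2))
    return f_stat, df1, df2

# Hypothetical values, NOT those of the beetle data.
f_stat, df1, df2 = hotelling_to_f(t2=12.0, d=3, m=10, n=12)
print(f_stat, df1, df2)
```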

    
    
