HOMEWORK #4 due 11-11
In the following, _ means subscript and ^ means superscript.
NOTE: In all problem sets that use SAS, arrange your answers into three parts, in the following order:
In Part I, you can refer to plots or tables or large matrices that problems ask for by saying (for example), ``The scatterplot or matrix for Problem 3 is on page 17 of the SAS output.'' If necessary, add page numbers to the SAS output, so that (for example) you don't have several different page 1s in Part III.
1. Heights and weights for 58 tribesmen on a tropical island are given in Table 1, with 33 from one tribe and 25 from a second tribe. Both tribes are known to be highly inbred. Both tribes are considered to be odd by other tribes on the island.
Table 1 - Heights and Weights for Tribesmen on a Tropical Island

Tribe A (n=33):
47 166   49 163   51 152   51 157   52 143   53 145
53 153   54 136   54 152   54 155   54 163   55 136
56 140   56 149   57 132   57 133   58 125   58 135
60 129   60 129   62 138   66 110   66 130   66 145
67 113   68 118   68 135   69 116   70 124   70 132
74 119   75 100   77  97

Tribe B (n=25):
49 170   51 158   51 164   54 162   57 148   59 135
59 155   61 123   61 133   64 122   65 121   65 129
65 132   65 138   66 142   67 134   67 141   67 148
68 121   68 136   69 148   70 114   71 128   73 120
73 130
(i) Do the two tribes differ significantly by height? by weight? Do Student t-tests to test the appropriate hypothesis in each case.
(ii) Do the two tribes differ in (height,weight) together, considered as a vector? Do a Hotelling T^2 test to find out.
(iii) What are the (Pearson) correlation coefficients between height and weight within each tribe? Are they similar?
(iv) What are the covariance matrices of (height,weight) within each tribe? What is the pooled covariance matrix for the two tribes together?
(Hint: Use either dedicated SAS procedures or proc iml within SAS or both.)
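As a rough illustration of the "dedicated SAS procedures" route, the following sketch reads a few pairs from Table 1 and runs proc ttest and proc corr. It assumes that the first number in each pair is height and the second is weight, which Table 1 does not state explicitly. The data set name tribes and the variable names tribe, height and weight are made up for this example, and only the first three pairs from each tribe are typed in, so a full analysis would list all 58. The Hotelling T^2 test and the pooled covariance matrix are not covered here.

```sas
/* Sketch only: assumes each pair in Table 1 is (height, weight). */
/* Only the first three pairs per tribe are entered below.        */
data tribes;
  input tribe $ height weight @@;   /* @@ reads several observations per line */
  datalines;
A 47 166  A 49 163  A 51 152
B 49 170  B 51 158  B 51 164
;
run;

/* Two-sample t-tests for height and for weight, one variable at a time */
proc ttest data=tribes;
  class tribe;
  var height weight;
run;

/* Pearson correlations and covariance matrices within each tribe.      */
/* The BY statement needs the data grouped by tribe, as it is above.    */
proc corr data=tribes cov;
  by tribe;
  var height weight;
run;
```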
2. Use Proc IML in SAS to do the following:
(In Proc IML, the command B=J(m,n) generates an mxn matrix all of whose entries equal one. If Y is any mxn matrix, W=normal(Y) generates an mxn matrix W whose entries are realizations of independent normally-distributed random variables with mean zero and variance one. Thus W=normal(Y) depends on Y only through its dimensions.)
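A minimal sketch of the two commands just described, using a 3x2 example. The dimensions and the matrix names are arbitrary choices for illustration.

```sas
proc iml;
  B = J(3, 2);      /* 3x2 matrix of ones (the fill value defaults to 1) */
  Y = J(3, 2, 0);   /* 3x2 matrix of zeros, used here only for its shape */
  W = normal(Y);    /* 3x2 matrix of independent N(0,1) draws            */
  print B, W;
quit;
```

As far as the SAS/IML documentation describes it, the argument of normal() also supplies the random-number seed on the first call, with 0 requesting a clock-based seed, so results will generally differ from run to run in this sketch. Check the documentation for your SAS version if you need reproducible output.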
(See proc iml in MPairedSamp.sas or PCAApples.sas on the Math439 Web site. As a check, the diagonal elements of Q should all be 1 and the off-diagonal elements should lie between -1 and 1.)
(See eigen(evals,evecs,aa) in PCAApples.sas.)
3. Let X be a normally distributed random dx1 column vector with E(X)=0 and Cov(X)=A. (That is, X is N(0,A) where A is dxd.) Assume that A is invertible. Show that X'A^{-1}X has a chi-square distribution with d degrees of freedom. (Hint: Find a dxd matrix B such that N=BX is N(0,I_d) and note that X=B^{-1}N.)
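The claim in Problem 3 can be checked numerically before proving it. The sketch below is a simulation, not a proof: it draws many vectors X from N(0,A) for one arbitrarily chosen 2x2 covariance matrix A and compares the sample mean and variance of X'A^{-1}X with the chi-square values d and 2d. The matrix A, the seed, and the number of draws are all assumptions made for illustration.

```sas
proc iml;
  call randseed(439);                 /* arbitrary seed                     */
  A  = {4 1, 1 2};                    /* example covariance matrix, d = 2   */
  d  = nrow(A);
  mu = j(1, d, 0);                    /* mean vector 0, written as a row    */
  X  = randnormal(10000, mu, A);      /* each row is one draw of X'         */
  Q  = ((X * inv(A)) # X)[, +];       /* row i holds X_i' A^{-1} X_i        */
  meanQ = Q[:];                       /* should be close to d               */
  varQ  = (Q - meanQ)[##] / (nrow(Q) - 1);   /* should be close to 2d       */
  print d meanQ varQ;
quit;
```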
4. Aggregate data for five demographic variables in 14 census tracts in the Madison, Wisconsin, area are given in Table 2.
Table 2 - Data for 14 US census tracts near Madison, Wisconsin
# From Johnson&Wichern, ``Applied Multivariate Statistical Analysis'',
# 5th ed, 2002, Table 8.5, p470
# Variables: TotalPopn(1000s), Median Years of Schooling, TotalEmployed(1000s)
# Health Services Employment (100s), Median Home Value ($10,000s)
         TotPop  MedSchYr  TotEmploy  HealthEmp  MedValHom
Tract01   5.935    14.2      2.265      2.27       2.91
Tract02   1.523    13.1      0.597      0.75       2.62
Tract03   2.599    12.7      1.237      1.11       1.72
Tract04   4.009    15.2      1.649      0.81       3.02
Tract05   4.687    14.7      2.312      2.50       2.22
Tract06   8.044    15.6      3.641      4.51       2.36
Tract07   2.766    13.3      1.244      1.03       1.97
Tract08   6.538    17.0      2.618      2.39       1.85
Tract09   6.451    12.9      3.147      5.52       2.01
Tract10   3.314    12.2      1.606      2.18       1.82
Tract11   3.777    13.0      2.119      2.83       1.80
Tract12   1.530    13.8      0.798      0.84       4.25
Tract13   2.768    13.6      1.336      1.75       2.64
Tract14   6.585    14.9      2.763      1.91       3.17
Use proc iml to do a Principal Components Analysis for the data in Table 2. How many principal components are required to explain at least 85% of the total variation in the data? (Hint: See e.g. PCAApples.sas on the Math439 Web site.)
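One possible shape for such a proc iml analysis is sketched below. It is not taken from PCAApples.sas. It assumes that the principal components are computed from the sample covariance matrix, and a correlation-based analysis would standardize the columns first and may give a different answer. The data set name tracts and the matrix names are made up for this example.

```sas
data tracts;
  input Tract $ TotPop MedSchYr TotEmploy HealthEmp MedValHom;
  datalines;
Tract01 5.935 14.2 2.265 2.27 2.91
Tract02 1.523 13.1 0.597 0.75 2.62
Tract03 2.599 12.7 1.237 1.11 1.72
Tract04 4.009 15.2 1.649 0.81 3.02
Tract05 4.687 14.7 2.312 2.50 2.22
Tract06 8.044 15.6 3.641 4.51 2.36
Tract07 2.766 13.3 1.244 1.03 1.97
Tract08 6.538 17.0 2.618 2.39 1.85
Tract09 6.451 12.9 3.147 5.52 2.01
Tract10 3.314 12.2 1.606 2.18 1.82
Tract11 3.777 13.0 2.119 2.83 1.80
Tract12 1.530 13.8 0.798 0.84 4.25
Tract13 2.768 13.6 1.336 1.75 2.64
Tract14 6.585 14.9 2.763 1.91 3.17
;
run;

proc iml;
  use tracts;
  read all var {TotPop MedSchYr TotEmploy HealthEmp MedValHom} into X;
  close tracts;
  n  = nrow(X);
  Xc = X - repeat(X[:,], n, 1);     /* subtract the column means          */
  S  = Xc` * Xc / (n - 1);          /* sample covariance matrix (assumed) */
  call eigen(evals, evecs, S);      /* eigenvalues in descending order    */
  prop    = evals / sum(evals);     /* share of total variance per PC     */
  cumprop = cusum(prop);            /* cumulative share                   */
  print evals prop cumprop;
quit;
```

The number of components needed is then the first row at which cumprop reaches 0.85.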
5. Annual reports for 1990 for the 10 largest US companies are given in Table 3.
Table 3 - Data for the 10 largest US Corporations in 1990
# From Johnson&Wichern, ``Applied Multivariate Statistical Analysis'',
# 5th ed, Problem 1.4, p39, 2002
# Source: Fortune Magazine (April 23, 1990) p346-367 (c) 1990 Time Inc.
# All numbers are in millions of dollars.
                   Sales  Profits  Assets
General_Motors    126974    4224   173297
Ford               96933    3835   160893
Exxon              86656    3510    83219
IBM                63438    3758    77734
General_Electric   55264    3939   128344
Mobil              50976    1809    39080
Philip_Morris      39069    2946    38528
Chrysler           36156     359    51038
Du_Pont            35209    2480    34715
Texaco             32416    2413    25636
Use proc princomp (or a matrix package) to do a Principal Components Analysis for the data in Table 3. How many principal components are required to explain at least 90% of the variation in the data? What percentage of the variability of the data in Table 3 is explained by these principal components?
(Hint: See e.g. MensTrackPCA.sas on the Math439 Web site.)
(If you use a variable Company for the company name in Table 3 in a SAS data step, include the command length Company $16; before the input command. Otherwise SAS will truncate the company name to 8 characters and you won't be able to tell General_Motors from General_Electric. The length command tells SAS to allow up to 16 characters.)
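A sketch of a data step that follows this advice, together with a basic proc princomp call. The data set name corps is made up for this example. By default proc princomp works from the correlation matrix, and whether to use the covariance matrix instead (via the cov option) is a modelling choice that this sketch leaves open.

```sas
data corps;
  length Company $16;               /* allow names up to 16 characters */
  input Company $ Sales Profits Assets;
  datalines;
General_Motors 126974 4224 173297
Ford 96933 3835 160893
Exxon 86656 3510 83219
IBM 63438 3758 77734
General_Electric 55264 3939 128344
Mobil 50976 1809 39080
Philip_Morris 39069 2946 38528
Chrysler 36156 359 51038
Du_Pont 35209 2480 34715
Texaco 32416 2413 25636
;
run;

/* Default analysis uses the correlation matrix; add the cov option */
/* on the proc statement to analyze the covariance matrix instead.  */
proc princomp data=corps;
  var Sales Profits Assets;
run;
```

Running proc print on corps is a quick way to confirm that General_Motors and General_Electric were read in full.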