TAKEHOME FINAL due Thursday 12-18 by 5:30 P.M.
Hand in either to Professor Sawyer or to the receptionist in the
Mathematics Office.
NOTE: There should be NO COLLABORATION on the takehome final,
other than for the mechanics of using the computer.
Open textbook and notes (including course handouts).
ORGANIZE YOUR WORK in the following manner:
NOTE: In the following, _ means subscript, ^ means superscript, le means `less than or equal', and ge means `greater than or equal'.
Whole problems are equally weighted, but different parts of problems may
be weighted differently.
Seven (7) problems.
1. (i) Let Z = (Y X)' be a 5x1 vector written in block form for a 3x1 vector Y and a 2x1 vector X. Suppose that Z has a five-dimensional joint normal distribution with covariance and mean given in block matrix form by
             (  6  3  3 | -1 -1 )           ( 1 )
             (  3  2  1 |  0 -1 )           ( 3 )
    Cov(Z) = (  3  1  7 |  1  1 )    E(Z) = ( 1 )
             ( ---------------- )           ( - )
             ( -1  0  1 |  7  3 )           ( 1 )
             ( -1 -1  1 |  3  5 )           ( 3 )

Find the (3x3) conditional covariance matrix Cov(Y | X=(5 3)') and the (3x1) conditional mean vector E(Y | X=(5 3)'). Do this either by hand or else by using SAS's proc iml or a comparable matrix language.
(ii) Find the eigenvalues of the 3x3 matrices Cov(Y) and Cov(Y | X=(5 3)'). Are they similar or very different?
(Hints: You can define a matrix in proc iml by setting (for example)
    yy = { 1 2 3, 4 5 6, 7 8 9 };
(note curly braces), where spaces mean ``same row'' and commas mean
``start of new row''. (This defines a 3x3 matrix.) Alternatively, you can
define data sets using SAS datasteps and import columns to a matrix in
proc iml. (See examples on the Math439 Web site.) You can find eigenvalues
and eigenvectors of symmetric matrices in proc iml by using the function
call eigen. (See PCAApples.sas on the Math439 Web site for an example.))
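For reference, the standard formulas for a partitioned normal vector are
Cov(Y | X=x) = S_YY - S_YX S_XX^(-1) S_XY and
E(Y | X=x) = mu_Y + S_YX S_XX^(-1) (x - mu_X), where S_YY, S_YX, S_XX are
the blocks of Cov(Z). The sketch below applies them in plain Python
(standing in for proc iml) to a small hypothetical example with a 2x1 Y and
a 2x1 X. The function names and all numbers are illustrative assumptions;
they are not the matrices of Problem 1.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def inv2(m):
    """Inverse of a 2x2 matrix (enough here because X is 2x1)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def conditional_normal(mu_y, mu_x, s_yy, s_yx, s_xx, x):
    """Return (E(Y | X=x), Cov(Y | X=x)) for jointly normal (Y, X)."""
    gain = matmul(s_yx, inv2(s_xx))               # S_YX * inverse(S_XX)
    dev = [[xi - mi] for xi, mi in zip(x, mu_x)]  # column vector x - mu_X
    shift = matmul(gain, dev)
    cond_mean = [m + s[0] for m, s in zip(mu_y, shift)]
    corr = matmul(gain, transpose(s_yx))          # S_YX * inverse(S_XX) * S_XY
    k = len(s_yy)
    cond_cov = [[s_yy[i][j] - corr[i][j] for j in range(k)] for i in range(k)]
    return cond_mean, cond_cov

# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
mu_y, mu_x = [0, 1], [1, 2]
s_yy = [[4, 1], [1, 3]]
s_yx = [[1, 0], [0, 1]]
s_xx = [[2, 0], [0, 1]]
cond_mean, cond_cov = conditional_normal(mu_y, mu_x, s_yy, s_yx, s_xx, [3, 3])
print(cond_mean)
print(cond_cov)
```

Note that the conditional covariance does not depend on the observed value
x, only on the covariance blocks.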
2. An experimenter measures a response Y_i along with four
covariates, which she imaginatively calls X1, X2, X3, and X4. (More
exactly, X_i1, X_i2, X_i3, and X_i4 for the i-th observation.) She carries
out a linear regression of Y_i on X1,X2,X3,X4 (including an intercept term)
under the assumption that the errors are independent normal with the same
error variance sigma^2. Data for the n=50 observations are contained in
the file Experiment.dat.
Using proc iml in SAS or a similar matrix package,
find (a) the least-squares or ML estimators of the five coefficients
beta_i in the regression (including the intercept), (b) T-statistics
for the tests H_0:beta_i=0, and (c) Student-t P-values for H_0 in
each case.
(Hint: See ExampReg3 on the Math439 Web site.)
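The matrix computations asked for here follow the usual pattern
b = (X'X)^(-1) X'Y, s^2 = SSE/(n-p), and T_j = b_j / sqrt(s^2 [(X'X)^(-1)]_jj).
The sketch below shows that pattern in plain Python on a hypothetical data
set with one covariate and four observations. The function names and the
data are assumptions made for illustration and are unrelated to
Experiment.dat. P-values are left out because the Python standard library
has no Student-t distribution function; they would come from the t
distribution with n-p degrees of freedom.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(x_rows, y):
    """Least-squares fit with an intercept; returns (beta, t_stats, df)."""
    X = [[1.0] + list(r) for r in x_rows]        # prepend the intercept column
    n, p = len(X), len(X[0])
    xtx_inv = inverse(matmul(transpose(X), X))
    xty = matmul(transpose(X), [[v] for v in y])
    beta = [row[0] for row in matmul(xtx_inv, xty)]
    resid = [yi - sum(b * xi for b, xi in zip(beta, row))
             for yi, row in zip(y, X)]
    s2 = sum(e * e for e in resid) / (n - p)     # unbiased error variance estimate
    se = [math.sqrt(s2 * xtx_inv[j][j]) for j in range(p)]
    t_stats = [b / s for b, s in zip(beta, se)]
    return beta, t_stats, n - p

# Hypothetical data: one covariate, four observations.
beta, t_stats, df = ols([[0], [1], [2], [3]], [1, 3, 4, 8])
print(beta, t_stats, df)
```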
3. A colleague of the experimenter remarks that while the estimates of the coefficients beta_2 and beta_3 in Problem 2 appear different, theory suggests that the parameters may be the same. The colleague wonders if the experimenter has found evidence that beta_2 is not equal to beta_3.
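One standard way to examine a question of this kind is a T-test of a linear
contrast c'beta = 0, with T = c'b / sqrt(s^2 c'(X'X)^(-1) c) on n-p degrees
of freedom. The sketch below computes that statistic in plain Python. The
function name contrast_t and every number are hypothetical, and the
approach is offered as an assumption about a reasonable method rather than
as the intended solution.

```python
import math

def contrast_t(beta, s2, xtx_inv, c):
    """T-statistic for H_0: c'beta = 0 in a normal linear model."""
    estimate = sum(ci * bi for ci, bi in zip(c, beta))
    quad = sum(c[i] * xtx_inv[i][j] * c[j]
               for i in range(len(c)) for j in range(len(c)))
    return estimate / math.sqrt(s2 * quad)

# Hypothetical fitted values for an intercept and two slopes.
beta_hat = [1.0, 2.0, 1.5]
s2_hat = 0.5
xtx_inv_hat = [[0.5, 0.0, 0.0],
               [0.0, 0.2, 0.05],
               [0.0, 0.05, 0.3]]
# The contrast (0, 1, -1) compares the two slopes.
t_value = contrast_t(beta_hat, s2_hat, xtx_inv_hat, [0, 1, -1])
print(t_value)
```

The off-diagonal entry of (X'X)^(-1) matters here: the two estimates are
generally correlated, so their standard errors cannot simply be combined
as if the estimates were independent.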
4. The file HRatWeights.dat contains weekly
gains over four weeks for three groups of rats. The first group (Group=1)
was the control and was given no extra ingredients in their water,
Group 2 was given thyroxin, and Group 3 was given thiouracil.
(Hint: Use a statement like ``proc means mean; class group; var
y1-y4; run;'', but use your own variable names instead of
group and y1 y2 y3 y4.)
5. A naturalist makes 4 measurements (Height, Width, Tail
Length, Length) on 50 lizards of a particular species as a function of the
Altitude at which the lizard was collected. The data is in the file
Dat4aLizards.dat on the Math439 Web site.
(Hint: See MrEgyptSkulls.sas on the Math439
Web site, as well as the handout on Multivariate Linear Models on the
Math439 Web site.)
L for Altitude less than or equal to 10 and H
for Altitude greater than 10. Can you see a trend from the upper left
to the lower right in the scatterplot as the altitude increases or
decreases? (If so, in which way?) Is this consistent with the parameter
estimates and P-values in part (iii)?
(Hint: You can define a variable ASym in the SAS data step by an
if-then-else statement like ``if Altitude<=10 then ASym='L'; else
ASym='H';''.)
6. Consider the lizards whose measurements are in the data
set Dat4aLizards.dat and whose Altitude is either less than
or equal to 8.0 (call these Type=1) or greater than or equal to 12.0 (call
these Type=2). (The remaining lizards are discarded.) For simplicity, call
the 4 lizard measurements y1 y2 y3 y4 instead of Height Width Tail_Length
Length. The naturalist is interested in finding a rule that depends only
on y1-y4 and that will classify most Type=1 lizards as Type=1 (i.e., low
altitude) and most Type=2 lizards as Type=2.
Find a function L(data) of y1-y4
with the property that L(data)>0 predicts Type=1 and L(data)<0
predicts Type=2. Assume that SAS's default assumptions for proc
discrim hold for these lizards.
(Hint: See Dgaussdiscrim.sas. In
particular, the coefficients of the function L(data) can be computed as
the difference between two vectors in the output of SAS's proc
discrim.)
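The ``difference between two vectors'' idea can be illustrated as follows.
Under a normal model with a pooled covariance matrix S and equal prior
probabilities (assumed here to match the defaults of proc discrim), each
group k has a linear discriminant function with coefficient vector
S^(-1) m_k and constant -(1/2) m_k' S^(-1) m_k, and L(data) is the Type=1
function minus the Type=2 function. The sketch below is plain Python with
two variables instead of four; the function names, means, and covariance
are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ldf(mean, s_inv):
    """Constant and coefficient vector of one group's linear discriminant function."""
    coef = [sum(s_inv[i][j] * mean[j] for j in range(2)) for i in range(2)]
    const = -0.5 * sum(c * m for c, m in zip(coef, mean))
    return const, coef

def difference_rule(mean1, mean2, pooled_cov):
    """Constant and coefficients of L = (group 1 function) - (group 2 function)."""
    s_inv = inv2(pooled_cov)
    c1, b1 = ldf(mean1, s_inv)
    c2, b2 = ldf(mean2, s_inv)
    return c1 - c2, [u - v for u, v in zip(b1, b2)]

def evaluate(x, const, coef):
    """L(x): positive predicts Type=1, negative predicts Type=2."""
    return const + sum(b * xi for b, xi in zip(coef, x))

# Hypothetical group means and pooled covariance matrix.
const, coef = difference_rule([2, 0], [0, 0], [[2, 0], [0, 1]])
print(const, coef)
print(evaluate([2, 0], const, coef), evaluate([0, 0], const, coef))
```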
(Hint: See proc stepdisc in the comments in
Dgaussdiscrim.sas.) Do you end up with the same variables
that were significant in Problem 5(iii)?
(Hint: Use a statement like ``if Type=1 or
Type=2 then output;'' in the data step that reads
Dat4aLizards.dat, assuming that lizards found at altitudes
strictly between 8.0 and 12.0 are assigned Type=3. In a SAS data step, if
you ever use the command ``output'', then ONLY those records for which you
say ``output'' will appear in the corresponding SAS dataset. This gives
you a way of dropping the excess Type=3 records. Make sure that this works
by using a proc print statement. See also Dgaussdiscrim.sas.)
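The record-dropping logic described in this hint can be sketched outside
SAS as follows. This is plain Python with made-up records; the function
name lizard_type and the tuple layout (Altitude, y1, y2, y3, y4) are
assumptions for illustration only. Keeping a record in the list plays the
role of saying ``output'' in the data step.

```python
def lizard_type(altitude):
    """Type=1 for Altitude <= 8.0, Type=2 for Altitude >= 12.0, otherwise Type=3."""
    if altitude <= 8.0:
        return 1
    if altitude >= 12.0:
        return 2
    return 3

# Hypothetical records: (Altitude, y1, y2, y3, y4).
records = [
    (5.0, 1.1, 2.0, 6.3, 9.8),
    (8.0, 1.0, 2.1, 6.0, 9.5),
    (10.0, 1.2, 2.2, 6.1, 9.9),
    (12.0, 1.3, 2.4, 5.8, 10.2),
    (15.5, 1.4, 2.5, 5.5, 10.4),
]

# Only Type=1 and Type=2 records are kept; Type=3 records are dropped.
kept = [(lizard_type(r[0]),) + r[1:] for r in records
        if lizard_type(r[0]) in (1, 2)]

# The analogue of checking the result with proc print.
for row in kept:
    print(row)
```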
7. Apply a logistic regression to the lizard variables y1-y4
for the lizards that you used in Problem 6. (That is, Type=1 or
Type=2.) (Hint: See Dlogistic.sas on the Math439 Web site.)
(Hint: See Dlogistic.sas on the Math439 Web site.)
(Hint: Add ``/ selection=backward;'' to the end of the model statement
in proc logistic.) Do you end up with the same subset of
variables as in Problem 6?