Recommended (By-Hand) Homework Assignments:
(HW1 due Jan 23)
Jan 14 - Chapter 2 - Section 2.1 - 6,9,10,11,12,14
Jan 16 - Chapter 2 - Section 2.2 - 16,17,18,20,22,27
Jan 18 - Chapter 2 - Section 2.3 - 28,29,30,32,33,34
HW1 Answers
Jan 21 - Martin Luther King Day
(HW2 due Jan 28)
Jan 23 - Chapter 2 - Section 2.4 - 35,38,40,41,42,46
Jan 25 - Chapter 2 - Section 2.5 - 48,49,50,52,53,54
HW2 Answers
(HW3 due Feb 4)
Jan 28 - Chapter 2 - Section 2.7 - 59,60,61,62,64,70
Jan 30 - Chapter 2 - Section 2.8 - 71,72,73,74,75,76
Feb 01 - Chapter 2 - Section 2.9 - 78,79,80,81,82,83
HW3 Answers
(HW4 due Feb 11)
Feb 04 - Chapter 3 - Section 3.1 - 1,2,3,4,5,6
Feb 05 - FIRST EXAMINATION
Feb 06 - Chapter 3 - Section 3.2 - 7,8,9,10,11
Feb 08 - Chapter 3 - Section 3.3 - 12,14,15,16,17,18
HW4 Answers
(HW5 due Feb 18)
Feb 11 - Chapter 3 - Section 3.4 - 20,21,22,23,24,26
Feb 13 - Chapter 4 - Sections 4.1,4.2 - 2,3,4,5,6,8
Feb 15 - Chapter 4 - Section 4.3 - 9,10,11,12,14
HW5 Answers
(HW6 due Feb 25)
Feb 18 - Chapter 4 - Section 4.4 - 26,30,31,33,34,38
Feb 20 - Chapter 5 - Section 5.1 - 1,4,6,7,8
Feb 22 - Chapter 5 - Section 5.2 - 16,18,19,20,22,23
HW6 Answers
HW6a Answers
(HW7 due Mar 3)
Feb 25 - Chapter 5 - Sections 5.3,5.4 - 24,25,26,29,30,32
Feb 27 - Chapter 6 - Section 6.1 - 1,2,3,4,7,8
Feb 29 - Chapter 6 - Section 6.2 - 11,12,13,14,15,16
HW7 Answers
(HW8 due Mar 17)
Mar 03 - Chapter 6 - Section 6.3 - 17,18,20,22,24,30
Mar 04 - SECOND EXAMINATION
Mar 05 - Chapter 7 - Sections 7.1,7.2 - 1,7,8,12,13,16
Mar 07 - Chapter 7 - Sections 7.3,7.4 - 17,18,19,20,21,22
HW8 Answers
Mar 10 - Spring Break
Mar 12 - Spring Break
Mar 14 - Spring Break
(HW9 due Mar 26)
Mar 17 - Chapter 8 - Sections 8.1,8.2 - 1,2,3,6,7,8
Mar 19 - Chapter 8 - Sections 8.3,8.4 - 9,10,13,16,18,20
Mar 24 - Chapter 9 - Sections 9.1,9.2 - 5,6,8,11,14,16
HW9 Answers
(HW10 due Apr 04)
Mar 28 - Chapter 9 - Sections 9.3,9.4 - 17,20,22,27,28,32
Apr 02 - Chapter 10 - Sections 10.1,10.2 - 2,4,5,6,7,8
Apr 04 - Chapter 10 - Sections 10.3,10.4 - 9,10,15,16,20,24
HW10 Answers
(HW11 due Apr 14)
Apr 07 - Chapter 10 - Section 10.5 - 28,29,30,31,32,34
Apr 08 - THIRD EXAMINATION
Apr 09 - Chapter 11 - Sections 11.1,11.2,11.4 - 2,3,4,11,12,17
Apr 11 - Chapter 11 - Sections 11.5,11.6 - 22,23,28,30,34,37
HW11 Answers
(HW12 due Apr 21)
Apr 14 - Chapter 11 - Section 11.7 - 40,41,42,44,45,46
Apr 16 - Chapter 12 - Sections 12.1,12.2 - 1,2,5,7,10,12
Apr 18 - Chapter 12 - Sections 12.3,12.4 - 18,19,20,21,22,28
HW12 Answers
(HW13 due Apr 28)
Apr 21 - Chapter 13 - Section 13.1 - 2,3,6,7,9,13
Apr 23 - Chapter 14 - Sections 14.1,14.2 - 1,2,4,12,13,16
Apr 25 - Chapter 14 - Sections 14.3,14.4 - 19,20,21,23,24,25
(Sections 14.3-14.4 will not be covered on the Final)
HW13 Answers
Apr 28 - Reading Period
Apr 30 - Reading Period
May 02 - Final Examination (10:30 AM - 12:30 PM)
Required Computer Homework Assignments:
All assignments are to be done using SAS.
CHW1 due Feb 20 - Computer HW1
CHW2 due Feb 27 - Problems 4.12, 4.26, 5.2 (SAS hints for Computer HW2)
CHW3 due Mar 03 - Problems 5.22, 5.29, 6.7 (SAS hints for Computer HW3)
CHW4 due Mar 17 - Computer HW4
CHW5 due Mar 24 - Computer HW5
CHW6 due Mar 31 - Computer HW6
CHW7 due Apr 07 - Computer HW7
CHW8 due Apr 14 - Computer HW8
CHW9 due Apr 21 - Computer HW9
NOTE: See How to Format Computer Homework on the main Math3200 Web page for how Computer Homeworks should be formatted.
Note: `le' means less than or equal to. `ge' means greater than or equal to.
Computer HW1 due Wednesday Feb 20 by 4:45 PM:
See Suggestions for HOW TO ORGANIZE your answers for Computer HW1 below.
See How to Format Computer Homework in general.
(This is on the main Math3200 Web page.)
1. (a) Write a SAS program that generates 100 random
variables Xi that are uniformly distributed between 0
and 1. Use SAS to find the mean and standard deviation of the sample
of 100 r.v.s. How do they compare to the theoretical mean and standard
deviation of a random variable with that distribution?
Use the Xi to generate random integers
Yi with a discrete uniform distribution P(Yi=k)=1/10
for k=1,2,...,10. Is the sample proportion of the 100 random integers
equal to k close to 1/10 for each k with 1 le k le 10?
(b) Do the same as in part (a) with
10,000 r.v.s instead of 100. Do the results appear to improve?
(Hints: The function floor(x) in SAS (and
many other computer languages) returns the greatest integer m le x. If X
is U(0,1), then Y=1+floor(10*X) satisfies Prob(Y=k)=1/10 for k=1,2,...,10.
Use proc means to keep track of X_i and proc freq to keep track of Y_i.
See randlist.sas and SRSexamp.sas on the Math3200 Web site for examples
of the use of proc means and proc freq.)
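The hint above can be sketched as follows. This is only a minimal sketch: the dataset name unif, the variable names, and the seed are arbitrary choices and are not taken from randlist.sas or SRSexamp.sas, which may do things differently.

```sas
/* Sketch: N uniform r.v.s X and the derived discrete uniform Y */
data unif;
  call streaminit(12345);        /* arbitrary seed so the run can be repeated */
  do i = 1 to 100;               /* change 100 to 10000 for part (b) */
    x = rand('uniform');         /* X is U(0,1) */
    y = 1 + floor(10*x);         /* Y takes the values 1,2,...,10 */
    output;
  end;
run;

proc means data=unif n mean std; /* sample mean and standard deviation of X */
  var x;
run;

proc freq data=unif;             /* counts and percentages for each value of Y */
  tables y;
run;
```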
2. In a shaft and bearing assembly in a factory, the diameters
of the bearings, X, are normally distributed with mean = 0.526 inches and
standard deviation = 0.0035 inches. The diameters of the shafts, Y, are
normally distributed with mean = 0.525 inches and standard deviation =
0.0043 inches. An engineer who works at the factory, who had read
Section 2.9 of our textbook, said that the probability that the shaft
would fit inside the bearing (P(Y < X)) is 0.5716.
Test the engineer's assertion by using SAS to
generate 10,000 pairs of independent normally distributed r.v.s
Xi and Yi with the specified means and standard
deviations and count the proportion of pairs for which
Yi < Xi. What results do you obtain? Do
the results of this simulation appear to be consistent with the engineer's
statement, within the limits of sampling error?
(Hints: Use the SAS help pages to find the
syntax for using the rand function to generate independent
normal r.v.s with mean mu and standard deviation sigma. Specifically, look
up either the RAND function or streaminit in the
SAS help and documentation pages under Help on the SAS Main
Menubar. Look under Index, but Search should
also work. Once you generate independent random normal Xi and
Yi with the proper means and standard deviations, set
Ki=1 if Yi < Xi and
Ki=0 otherwise. You can define Ki by either an
``if-then-else'' construct or by setting
Ki=(Yi < Xi); SAS, like many other computer languages, evaluates a
true expression as the number 1 and a false expression as the number 0.
Since K is discrete, it can be tabulated as in Problem 1.)
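One possible shape for such a program is sketched below. The dataset name fit, the variable names, and the seed are illustrative assumptions. The loop starts at N=10 so that a `proc print' check stays short.

```sas
/* Sketch: simulate bearing (X) and shaft (Y) diameters and flag Y < X */
data fit;
  call streaminit(2008);                 /* arbitrary seed */
  do i = 1 to 10;                        /* check with 10, then use 10000 */
    x = rand('normal', 0.526, 0.0035);   /* bearing diameter: mean, std dev */
    y = rand('normal', 0.525, 0.0043);   /* shaft diameter: mean, std dev */
    k = (y < x);                         /* 1 if the shaft fits, 0 otherwise */
    output;
  end;
run;

proc print data=fit;                     /* keep only while N is small */
run;

proc freq data=fit;                      /* proportion of K=1 estimates P(Y < X) */
  tables k;
run;
```

Note that the third argument of rand('normal',mu,sigma) is the standard deviation, not the variance.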
3. As in Problem 3.23 in the text, three pretzel workers
are each to use each of three different methods of formulating pretzels (a
hard pastry) by making use of nine preformulated blocks of dough. Each
block of dough is to be made into 50 pretzels and then baked, for a total
of 450 pretzels. It is important that not only the order of the three
methods be randomized for each worker, but also the order of times that
the nine batches of pretzels are baked in the same oven be randomized.
Use SAS to define a randomized schedule giving the
order of the nine procedures to be carried out.
(Hints: There are many ways of doing this.
One is to make up nine text strings of the form that Worker#X should use
Method#N for X,N=1,2,3, randomly permute the nine text messages, and then
display them. See e.g. SRSexamp.sas on the Math3200 Web site.)
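One of the many ways mentioned in the hint is to attach a random sort key to each string and sort on it. The sketch below assumes that approach. The names schedule, task, and u are made up for illustration, and SRSexamp.sas may permute records differently.

```sas
/* Sketch: nine worker/method strings in a random time order */
data schedule;
  length task $ 40;
  call streaminit(77);                   /* arbitrary seed */
  do worker = 1 to 3;
    do method = 1 to 3;
      task = catx(' ', cats('Worker#', worker), 'should use',
                       cats('Method#', method));
      u = rand('uniform');               /* random sort key */
      output;
    end;
  end;
run;

proc sort data=schedule;                 /* sorting on u permutes the rows */
  by u;
run;

proc print data=schedule;                /* only nine rows, so printing is fine */
  var task;
run;
```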
HOW TO ORGANIZE your answers for Computer HW1:
Organize the homework that you hand in into three parts:
Part (I): Your answer to all three questions in your own words
Part (II): All of your SAS programs together
Part (III): All of your SAS output
See How to Format Computer
Homework for more detail. (This is on the main Math3200 Web page.)
For Computer HW1, your answers in Part (I) should be something
like the following:
Problem 1: For N=100, the mean and standard deviation of the 100 X
values and what their theoretical values are. (You could also use X_i as
the variable name if you like: X_i may be clearer in context than X.)
Then comment about whether the mean and standard deviation of the 100
values seem close to the theoretical values within sampling error. Then
have a table with the counts of the 100 Y (or Y_i) values for various
k=1,2,3,4,5,6,7,8,9,10 and their percentages as well, or else refer to an
explicit page number in Part (III) of your homework that has this
information. Comment about whether the percentages seem close to their
theoretical values within sampling error. Do all of the above again for
N=10,000. Comment about whether you see an improvement in fit to the
theoretical values.
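As a rough guide to what ``within sampling error'' means here, the standard formulas for a U(0,1) variable and for a sample proportion give the values below. The notation (SE for standard error, p-hat_k for the sample proportion of Y=k) is introduced only for this sketch.

```latex
\begin{align*}
\mu &= E(X) = \tfrac{1}{2}, \qquad
\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\tfrac{1}{12}} \approx 0.2887 \\
\mathrm{SE}(\bar X) &= \frac{\sigma}{\sqrt{N}} \approx
  \begin{cases} 0.0289, & N = 100 \\ 0.0029, & N = 10{,}000 \end{cases} \\
\mathrm{SE}(\hat p_k) &= \sqrt{\frac{(0.1)(0.9)}{N}} =
  \begin{cases} 0.03, & N = 100 \\ 0.003, & N = 10{,}000 \end{cases}
\end{align*}
```

Differences from the theoretical values of about one or two standard errors are what random sampling alone would typically produce.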
NOTE: DO NOT PRINT OUT any SAS dataset with N=10,000 values! This
will produce approximately 130 pages of output that no one will ever look
at. You do not need this output for any sane reason. Deleting the `proc
print' statement that displays them will NOT AFFECT anything else in the
program unless you use some very exotic `proc print' options.
In general, the sample SAS programs have a `proc print'
statement after each SAS data step to make sure that the constructed SAS
data set is what it should be. After you check that the dataset looks
reasonable, you should delete these `proc print' statements unless
their output is extremely short (or unless the output is moderately short
and you want to keep them for reassurance). Your SAS output should never
be more than a few pages long unless you are analyzing a very complicated
problem. (This will not come up in Math3200, but might if you get a job
using SAS or in more advanced Stat courses.)
NOTE: In general, you can use `proc means' to calculate the
Mean and Standard Deviation of a column in a SAS data set. See
randlist.sas and randlist.list on the Math3200
Web site for an example. Similarly, you can use `proc freq' to
construct a table of values for a discrete variable. See
SRSexamp.sas and SRSexamp.lst on the Math3200
Web site for an example of such a table. `Proc means' and `proc freq' are
two of the most commonly-used SAS routines.
Problem 2: To answer this question, write down the fraction of
times that Y<X (or Y_i<X_i) in N=10,000 random simulations. Compare
this fraction with the asserted answer of 0.5716.
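For reference, the asserted value and the size of the sampling error can be worked out as follows, assuming X and Y are independent as stated in the problem. Here p-hat denotes the simulated fraction, a symbol used only in this sketch.

```latex
\begin{align*}
X - Y &\sim N\bigl(0.526 - 0.525,\; 0.0035^2 + 0.0043^2\bigr)
       = N\bigl(0.001,\; 3.074 \times 10^{-5}\bigr) \\
P(Y < X) &= P(X - Y > 0)
       = \Phi\!\left(\frac{0.001}{\sqrt{3.074 \times 10^{-5}}}\right)
       \approx \Phi(0.1804) \approx 0.5716 \\
\mathrm{SE}(\hat p) &= \sqrt{\frac{(0.5716)(0.4284)}{10{,}000}} \approx 0.0049
\end{align*}
```

A simulated fraction within about 0.01 of 0.5716 (two standard errors) would be consistent with the engineer's statement.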
AN HISTORICAL NOTE: As you might have guessed, this problem is very
similar to Problem #18 on Exam 1. Once you get used to SAS
or any computer language, this simulation might be considered an easier
way to do this problem. My impression is that, at least during the 1950s
and 1960s, most U.S. engineers would not have been able to do
Problem #18 theoretically but could easily do a simulation. During
the Cold War, many defense industries liked to hire perhaps one
mathematician for every 10 engineers.
SUGGESTION FOR DOING PROBLEM 2: Run this program with a
`proc print' statement with N=10 records to see if (a) the X,Y values
look like they might be normally distributed random variables with those
means and variances and (b) the K values are such that K=1 if Y<X and
K=0 otherwise. The proportion of K=1 values can be found from a `proc
freq' statement, but will not be very accurate for N=10.
Then change N=10 to N=10,000, DELETE THE `PROC PRINT'
STATEMENT, and run the SAS program again for a more accurate result.
Problem 3: To answer this problem, write down the randomized time
order of the nine experimental pretzel runs. If you must (a lazier
response but still OK), refer to a `proc print' display in Part (III)
with the same information after explaining what the permuted
strings mean. For a more complete answer, comment about exactly what the
experimenter should tell the three individual pretzel workers to do in
such a way that they don't get annoyed and get jobs at another pretzel
company.
SAS Hints for Computer HW2 due Wednesday Feb 27 by 4:45 PM:
(Problems 4.12, 4.26, 5.2)
See How to Format Computer Homework on the main Math3200 Web page.
SAS hints for CHW2:
(i) SAS's proc univariate; var xx; run;
displays a huge number of single-sample statistics, including stem-leaf
and box plots. Note however that SAS's quantiles are overly rounded. You
can get text normal plots by saying
proc univariate normal plot; var xx; run;
and high-resolution normal plots by
proc univariate; probplot xx / normal; run;
WARNING: SAS's normal plots have (X,Y)
reflected in comparison with the normal plots in the text. In particular,
the appearance of outliers and light tails of distributions is reversed.
FOR EXAMPLE, outliers to the left appear in
SAS's (or STATA's) normal plots on the left and ABOVE a straight line
defined by the center of the distribution in SAS normal plots, as opposed
to BELOW the straight line on textbook normal plots, with the reverse for
outliers on the upper tail.
(ii) To enter space-separated numbers without regard for different lines
into a single SAS variable, enter (for example)
data mydata; input xx @@;
datalines;
14 25 32 17 99 12 221
114 2 57 577 14 11 14 -7
run;
This reads 15 numbers into a single column in the SAS dataset mydata.
Normally SAS reads one line, copies numbers into one or more variables,
then ignores the rest of the line. The ``trailing @@'' in this case
tells SAS to, instead, read one word at a time and to ignore line
structure. You still need a final run; on its own line, however.
(iii) See Problem 1, CHW1, for hints about generating discrete
uniform integer-valued r.v.s.
(iv) To make SAS's proc chart use integer values for its histograms rather than split up the range arbitrarily, try either
proc chart; vbar xx / discrete; run;
or
proc chart;
vbar xx / midpoints=1 to 6 by 0.50; run;
The first syntax tells SAS that xx is a discrete variable and it should use the discrete values for histogram blocks.
The second syntax tells SAS to use the histogram intervals that you
want SAS to use, not to split up the range into (for example)
seven equally-spaced intervals.
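Hints (ii) and (iv) can be combined in one short program. The sketch below uses made-up integer data (twenty values between 1 and 6) and a made-up dataset name rolls; substitute the data from the assigned problem.

```sas
/* Sketch: read integers with a trailing @@ and chart them as discrete values */
data rolls;
  input xx @@;                   /* trailing @@ ignores line structure */
datalines;
3 1 6 2 2 5 4 6 1 3
5 5 2 4 6 1 3 3 2 4
;
run;

proc chart data=rolls;
  vbar xx / discrete;            /* one bar per distinct value of xx */
run;
```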
SAS Hints for Computer HW3 due Monday Mar 3 by 4:45 PM:
(Problems 5.22, 5.29, 6.7)
See How to Format Computer Homework on the main Math3200 Web page.
SAS hints for CHW3:
(i) You can use rand('normal') to return a random
N(0,1) and rand('normal',mu,sigma) to return a random
N(mu,sigma^2).
(ii) Use SAS's function finv to return F-distribution
quantiles and compare with Table A.6. (See Samptt.sas on
the Math3200 Sample SAS programs Web site.) Remember that
finv() returns quantiles while Table A.6 has critical
values, so don't forget to convert. Alternatively, use simulation to check
the Table A.6 values. (Do this one way or the other, but you needn't
do both ways. The second way requires more programming.)
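A minimal sketch of the conversion in hint (ii), assuming that Table A.6 lists upper-tail critical values (so that the upper-alpha critical value is the 1-alpha quantile). The degrees of freedom 5 and 10 are arbitrary illustrations, not values from the assigned problems.

```sas
/* Sketch: upper-alpha F critical value from the quantile function finv */
data fcrit;
  alpha = 0.05;
  df1 = 5;                              /* numerator d.f. (illustrative) */
  df2 = 10;                             /* denominator d.f. (illustrative) */
  fcritval = finv(1 - alpha, df1, df2); /* (1-alpha) quantile of F(df1,df2) */
run;

proc print data=fcrit;                  /* one row, so printing is harmless */
run;
```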
(iii) If you say proc means Mean Stddev Stderr; then
SAS will print the SEM as well as the data sample standard deviation.
Computer HW4 due Monday Mar 17 by 4:45 PM:
See How to Format Computer Homework in general.
(This is on the main Math3200 Web page.)
Prob 1. Coverage Probabilities for two Confidence Intervals: The coverage probability of a confidence interval for an unknown parameter mu is the probability that the confidence interval actually contains mu. Thus the coverage probability of a true 95% confidence interval is 0.95, but may differ for an approximate or an incorrect 95% confidence interval.
Probs 2 and 3. Problems 7.13 and 7.16. Also, find the P-values in both problems.
Hints: Prob 1: Note that you will have to generate 1000*5
independent N(7,5^2)s. You can EITHER (I) Within a ``do'' loop of
size N=1000, generate five N(7,5^2)s, set XZ=1 if the corresponding
Z-interval contains the value 7 (otherwise XZ=99), and set XT=1 if
the T-interval contains 7 (otherwise XT=99). Then use proc freq; with
tables XZ XT; to find the sample proportions. ALTERNATIVELY, you can
(II) generate 5000 N(7,5^2)s in 1000 sets of 5 random normals, with
each set of 5 having the same value of an index variable i. Then say
``proc means noprint data=mydata; var xx; by i; output out=myoutdata
Mean=Xbar std=Stddev n=n; run;'' to write a second dataset `myoutdata'
with 1000 rows, each of which has Xbar and Stddev (Sx) for a set of 5
random variables. (See samptt.sas on the Math3200 Web site for a
similar use of ``proc means noprint.... out=...'', which is a clever idea
due to Ed Spitznagel.) Then re-open `myoutdata' to add values for XZ and
XT to the dataset and proceed as in (I). (If you really want to
become a SAS expert, you should do it both ways, but you should only hand
in one.)
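Approach (I) might look like the sketch below. It rests on an assumption that is not spelled out above: that the Z-interval is Xbar +/- 1.96*Sx/sqrt(5) (the normal critical value used with the sample standard deviation) and that the T-interval is Xbar +/- t*Sx/sqrt(5) with 4 degrees of freedom. If the problem defines the Z-interval differently, change the XZ line accordingly. The dataset name coverage and the seed are arbitrary.

```sas
/* Sketch of approach (I): coverage of Z- and T-intervals for samples of size 5 */
data coverage;
  call streaminit(3200);             /* arbitrary seed */
  array x{5} x1-x5;
  zcrit = probit(0.975);             /* normal critical value, about 1.96 */
  tcrit = tinv(0.975, 4);            /* t critical value with 5-1 = 4 d.f. */
  do rep = 1 to 1000;
    do j = 1 to 5;
      x{j} = rand('normal', 7, 5);   /* N(7,5^2): mean 7, std dev 5 */
    end;
    xbar = mean(of x1-x5);
    sx   = std(of x1-x5);
    sem  = sx / sqrt(5);
    /* ASSUMPTION: both intervals use the sample std dev sx */
    if abs(xbar - 7) <= zcrit*sem then XZ = 1; else XZ = 99;
    if abs(xbar - 7) <= tcrit*sem then XT = 1; else XT = 99;
    output;
  end;
  keep rep xbar sx XZ XT;
run;

proc freq data=coverage;             /* proportions with XZ=1 and with XT=1 */
  tables XZ XT;
run;
```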
WARNING: DO NOT PRINT OUT the samples for N=1000, which will
take approximately 88 printed pages. Good programming practice would be to
first write the code with N=10, then print out the N=10 samples, then
DELETE THE PROC PRINT STATEMENT and change N=10 to N=1000.
Probs 2 and 3: Consider using SAS's proc ttest. By
default, proc ttest; var xx; run; does a two-sided test for
H0:mu=0. Say proc ttest H0=Muval; ... to test for H0:mu=Muval.
(You can also use proc means.) Note that the lower limit of a
one-sided lower 95% confidence interval for the mean is the same as
the lower limit of a symmetric 90% confidence interval, which you
can get by saying proc ttest ... alpha=0.10; ...
Warning: Problem 7.13(b) asks for a one-sided P-value, which you
may have to convert from a two-sided P-value.
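A minimal sketch of the two options mentioned above, using eight made-up observations and a made-up null value of 50. Neither comes from Problems 7.13 or 7.16.

```sas
/* Sketch: one-sample t test of H0: mu = 50 with 90% confidence limits */
data mydata;
  input xx @@;
datalines;
52.1 49.8 50.6 51.3 48.9 50.2 51.7 49.5
;
run;

proc ttest data=mydata h0=50 alpha=0.10;
  var xx;
run;
```

To convert the two-sided P-value in the output: if the t statistic falls on the side favored by the one-sided alternative, the one-sided P-value is half the two-sided P-value; otherwise it is one minus that half.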
Computer HW5 due Monday Mar 24 by 4:45 PM:
See How to Format Computer Homework on the main Math3200 Web page.
Problems 8.8, 8.13, 8.21 (see following)
Problem 8.8: Also find the two-sided P-value for a difference in
means. (Hint: The `proc ttest' syntax for a matched-pair design is
proc ttest data=mydata; paired yy*xx; run;
See the SAS Help and Documentation for proc ttest.)
Problem 8.13: Hints: The `proc ttest' syntax for two
independent samples is proc ttest; class group; var zz; run;
WARNING: The CI in the output for the mean after Diff(1-2) is
the CI using the pooled variance assuming equal variances and Std Err
is the SEM using the pooled variance. Since the sample sizes
are equal in this case, the pooled and unpooled variance estimates for
the difference in means are the same, so the pooled and unpooled SEMs
(standard errors) are also the same, but the degrees of freedom may
differ. Find the two CIs from Xbar-Ybar and
SEM(Diff(1-2)) in the output using the two appropriate critical values.
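The equal-sample-size claim can be checked directly. In the sketch below, n is the common sample size and s_1, s_2 are the two sample standard deviations. The unpooled degrees of freedom are written using the Welch-Satterthwaite formula, which is assumed here to be the version the textbook uses.

```latex
\begin{align*}
s_p^2 &= \frac{(n-1)s_1^2 + (n-1)s_2^2}{2n-2} = \frac{s_1^2 + s_2^2}{2} \\
\mathrm{SE}_{\text{pooled}} &= \sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{n}\right)}
  = \sqrt{\frac{s_1^2 + s_2^2}{n}}
  = \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}} = \mathrm{SE}_{\text{unpooled}} \\
\nu_{\text{pooled}} &= 2n - 2, \qquad
\nu_{\text{unpooled}} = \frac{(n-1)\,(s_1^2 + s_2^2)^2}{s_1^4 + s_2^4} \le 2n - 2
\end{align*}
```

The two degrees of freedom agree only when s_1 = s_2, which is why the two CIs use different critical values even though the SEM is the same.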
Problem 8.21: Do (a) and (b). (a): Instead of a 90% confidence interval for the ratio of variances sigma_1^2/sigma_2^2, find the two-sided P-value for H_0:sigma_1^2=sigma_2^2. In `proc ttest' output, the Folded F is max(s_1^2/s_2^2, s_2^2/s_1^2) for the two samples and the Folded-F P-value is the correct two-sided P-value for a difference in population variances.
Computer HW6 due Monday Mar 31 by 4:45 PM:
See How to Format Computer Homework on the main Math3200 Web page.
Problems 9.14, 9.18, 9.32 (see following)
Hint: See the sample SAS program SampTables.sas and
output on the Math3200 Web site.
On Problem 9.32: ALSO FIND which two cells in the 4x4 table make
the largest individual contributions to the overall Pearson chi-square
statistic. Is this consistent with what you would have expected for how
hair and eye color are distributed in the general population? (Hint:
See the last example in SampTables.sas, which shows how to
tell SAS to display ``Observed-Expected'' and
``(Observed-Expected)^2/Expected'' for each cell in the table. Recall that
the overall Pearson chi-square statistic is the sum of the latter for all
cells in the table.)
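One way to request these per-cell quantities is through options on the tables statement of proc freq, sketched below on a made-up 2x2 table of counts. SampTables.sas may use a different syntax, and the dataset name, variable names, and counts here are purely illustrative.

```sas
/* Sketch: per-cell contributions to the Pearson chi-square statistic */
data haireye;
  input hair $ eye $ count @@;
datalines;
Dark Brown 30  Dark Blue 10
Fair Brown 12  Fair Blue 28
;
run;

proc freq data=haireye;
  weight count;                  /* each row is a cell count, not one person */
  tables hair*eye / chisq expected deviation cellchi2
                    norow nocol nopercent;
run;
```

Here deviation displays Observed-Expected and cellchi2 displays (Observed-Expected)^2/Expected for each cell.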
Computer HW7 due Monday Apr 07 by 4:45 PM:
See How to Format Computer Homework on the main Math3200 Web page.
Problems 10.4, 10.24, 11.2 (see below)
Hint: See the sample SAS program Samp_Reg1.sas and
output on the Math3200 Web site.
Problem 10.4: ALSO FIND the P-value for a two-sided test of H_0:b_1=0. Do you accept or reject at alpha=0.05? What are the degrees of freedom of the associated t-statistic?
Problem 10.24: ALSO FIND the P-value for a two-sided test of H_0:b_1=0 and find a symmetric 95% confidence interval for b_1.
Problem 11.2: (Hints: See the last proc reg
call in Samp_Reg1.sas. Make sure that your dataset has the
correct three columns.)
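For the simple regressions in Problems 10.4 and 10.24, a minimal sketch with six made-up (xx,yy) pairs is shown below. The clb option on the model statement is one way to get 95% confidence limits for the coefficients; Samp_Reg1.sas may do this differently.

```sas
/* Sketch: simple linear regression with confidence limits for the slope */
data mydata;
  input xx yy @@;
datalines;
1 2.1  2 3.9  3 6.2  4 7.8  5 10.1  6 12.2
;
run;

proc reg data=mydata;
  model yy = xx / clb;           /* clb adds 95% confidence limits */
run;
quit;
```

The Parameter Estimates table in the output gives the t statistic and two-sided P-value for each coefficient.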
Computer HW8 due Monday Apr 14 by 4:45 PM:
See How to Format Computer Homework on the main Math3200 Web page.
Problems 11.15, 11.34, 11.37
Hints: (i) See the sample SAS program Samp_Reg3.sas on
the Math3200 Web site for options that tell SAS to add to
(ii) If you enter proc corr data=mydata; var xx yy zz; run;
in SAS, then SAS will provide a 3x3 matrix whose non-diagonal entries have
the appropriate correlation coefficients r_{ij} along with P-values for
H_0:rho_{ij}=0.
Computer HW9 due Monday Apr 21 by 4:45 PM:
See How to Format Computer Homework on the main Math3200 Web page.
Problems 11.44, 12.7cd, 12.26 (see below)
Problem 11.44:
For the data in Exercise 11.39, introduce dummy variables Oil for
Industry=1 and Drug for Industry=2 and consider 4 possible predictors,
Profit, Growth, Oil, and Drug. (See the answers for Problem 11.39 on
p702 at the back of the book.) Then
(i) Do a stepwise regression with SAS' default settings of SLENTRY=0.15
and SLSTAY=0.15. (Hint: See the comments and procedures in
Apples.sas on the Math3200 Web site.) Which variables are
included in the model? In a regression on these variables only, which are
significant? What are their P-values?
(ii) Do a ``best-subsets'' regression with the Mallows C_p criterion.
Compare with the results from part (i).
Problem 12.7: Parts (c,d) for post-treatment differences only.
Problem 12.26:
(i) Have SAS create side-by-side boxplots for the five time values. Do the
plasma citrate levels stand out at any particular time? (Note the warning
about boxplots below.)
(ii) Have SAS generate an ANOVA table for a randomized block design with
Time as the treatment factor and Person as a blocking factor. Are there
significant differences between times? between persons? What are the
P-values in both cases?
(iii) Use both the LSD and the Tukey method to determine which pairs of
the five times are significantly different at alpha=0.05. Is this what you
would have guessed from the boxplots?
HINTS: (1) To have SAS generate side-by-side boxplots of a
variable YY for different values of a variable TYPE, try
proc boxplot data=mydata; plot YY*TYPE; run;
WARNING: The dataset must be sorted by TYPE. More exactly,
all observations with the same values of TYPE must be together. Otherwise,
the result may be dozens of very tiny boxplots.
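The sorting requirement in the warning can be met with proc sort before the boxplot call. The sketch below uses nine made-up observations that are deliberately entered out of order.

```sas
/* Sketch: sort by the grouping variable before calling proc boxplot */
data mydata;
  input TYPE $ YY @@;
datalines;
B 5.1  A 3.2  C 7.4  A 2.9  B 4.8  C 6.9
A 3.5  C 7.1  B 5.4
;
run;

proc sort data=mydata;           /* puts equal values of TYPE together */
  by TYPE;
run;

proc boxplot data=mydata;
  plot YY*TYPE;                  /* one box for each value of TYPE */
run;
```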
(2) To have SAS test whether the means of a response variable YY
are the same for all levels of a categorical variable Cat, try
proc glm data=mydata; class Cat;
model YY=Cat; run;
See for example OneWay.sas
on the Math3200 Web site. The
class
statement is necessary to tell SAS that Cat is a
categorical variable and not a numerical variable on which it should do a
simple regression.
If there is also a blocking factor (Bloc) as in a randomized block
design, change this to
proc glm data=mydata; class Cat Bloc;
model YY=Cat Bloc; run;
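For pairwise comparisons such as the LSD and Tukey methods, one option is a means statement inside proc glm, sketched below on a made-up balanced design with three treatments and three blocks. The data values are illustrative only, and OneWay.sas may set this up differently.

```sas
/* Sketch: randomized block ANOVA with LSD and Tukey comparisons */
data mydata;
  input Bloc Cat $ YY @@;
datalines;
1 A 10.2  1 B 11.5  1 C 13.1
2 A  9.8  2 B 11.9  2 C 12.6
3 A 10.9  3 B 12.4  3 C 13.8
;
run;

proc glm data=mydata;
  class Cat Bloc;                     /* both factors are categorical */
  model YY = Cat Bloc;                /* treatment plus block effects */
  means Cat / lsd tukey alpha=0.05;   /* pairwise comparisons of Cat means */
run;
quit;
```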
Last modified April 26, 2008