Math 322 Biostatistics
Professor Wickerhauser
R EXAMPLES
Example R commands for:
- mean and median.
- histograms and samples.
- deviation and diversity.
- combinations and permutations.
- box plots and confidence intervals.
- use, power and sample size in t tests.
- Student-t test power demo.
- Student-t tests from reduced data.
- multinomial pdf calculation and sampling.
- Mann-Whitney, Wilcoxon, McNemar, and median comparison tests.
- single-factor analysis of variance with unequal replication.
- multiple comparison of means.
- homoscedasticity tests.
- two- and three-factor ANOVA.
- MANOVA demo.
- bivariate normal density calculation, sampling and estimation.
- simple linear regression.
- simple correlation.
- multiple linear regression and prediction.
- Kendall's W.
- partial correlation coefficients.
- goodness of fit.
- tests of independence in contingency tables.
- Fisher's exact test.
- binomial and hypergeometric densities.
- Poisson density and one randomness test.
- tests of serial randomness.
- installing the contributed package "tseries".
- Gibbs sampling (from the February 28th, 2011 lecture).
- rejection, Metropolis, and Metropolis-Hastings sampling (from the
March 2nd, 2011 lecture).
- multivariate visualization, principal components, and linear
discriminant analysis.
- classification trees, with gene data clean-up.
- clustering by means, medoids, agglomerative and divisive trees.
- multidimensional scaling by IsoMap.
- bootstrap method to estimate sampling error in non-normal PDFs.
LINKS
- Cleaned-up NCI microarray data, 57 samples of 8 cancers with top
12 expressed genes: nci57x13.R, to be saved into your R folder and
read into the R session with load("nci57x13.R").
- NCI microarray data on 14 cancers:
  - nci.info: some information on the data.
  - nci.names: just the 64 names identifying 14 cancers, to label the
  64 rows of gene expression data.
  - nci.data: gene expression data, 64 rows of 6830 gene expression
  values. HINT: Save this to a file.
- Get the function bvnpdf() from this link, to compute bivariate
normal pdfs. Read the code for usage instructions.
- There is an online Octave to R dictionary, useful for those who know
MATLAB or Octave well and want to learn the corresponding R commands.
- Example R program dagopear.R to perform the D'Agostino-Pearson test
of normality and compute the associated statistics.
- Example R program cochran.R for Cochran's test of a dichotomous
variable without replication.
- Standard C program cochran.c for Cochran's test of a dichotomous
variable without replication.
- R program kendall.w.R to compute Kendall's coefficient of
concordance (Kendall's W). Call it using
kendall.w(tab)
where tab is a matrix with scores (or ranks) along its rows.
- In R, use solve(A) to compute the inverse of a square matrix A. Use
A <- matrix( c(1,3,2,4), nrow=2, ncol=2)
to get a 2x2 matrix with columns (1;3) and (2;4), that is, with rows
(1,2) and (3,4).
- Function deduct.R to solve HW 1 problem 6.
- Standard C program deduct2.c for the sequence counting example done
in class.
- Standard C program anova.c for single-factor analysis of variance
with unequal replication.
- Standard C program anova2.c for two-factor analysis of variance with
equal replication.
- Standard C program anova3.c for three-factor analysis of variance
with equal replication.
- Standard C program m2anova2.c for bivariate two-factor analysis of
variance with equal replication.
- Three-way ANOVA formulas.
- Three-way ANOVA SAS documentation.
- Download an old free version of MATLAB (for Windows or Linux PCs)
from this site.
- Open-source software R for statistical computing, and its manual.
- Download R from WUStL's software archive.
- R program test.R with function "runs.test" to compute the
nonparametric runs test for serial randomness. Call it using
runs.test(x)
where x is a factor time series with two levels. Source:
R project tseries home page.
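The solve(A) and matrix() item in the list above can be checked in a short R session. This is a minimal sketch using only base R; the names Ainv and check are illustrative and do not come from the course files.

```r
# Build the 2x2 matrix from the item above.
# matrix() fills column by column, so the columns are (1;3) and (2;4).
A <- matrix(c(1, 3, 2, 4), nrow = 2, ncol = 2)
print(A)

# With a single argument, solve() returns the inverse of A.
Ainv <- solve(A)
print(Ainv)

# A times its inverse should equal the 2x2 identity, up to rounding error.
check <- A %*% Ainv
print(isTRUE(all.equal(check, diag(2))))
```

Comparing with all.equal() rather than == allows for floating-point rounding in the product.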
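For readers who want to see what a runs test for serial randomness computes, here is a rough base-R sketch. It is not the code in test.R or in the tseries package: the function name runs_test_sketch and the sample series are made up for illustration, and it assumes the usual two-sided normal approximation for the number of runs.

```r
# Hypothetical helper (not the course's test.R): two-sided runs test
# based on the normal approximation to the number of runs.
runs_test_sketch <- function(x) {
  x <- factor(x)
  stopifnot(nlevels(x) == 2)           # the test needs exactly two levels
  codes <- as.integer(x)               # 1 or 2 for each observation
  n1 <- sum(codes == 1)
  n2 <- sum(codes == 2)
  n <- n1 + n2
  runs <- 1 + sum(diff(codes) != 0)    # a new run starts at every change
  mu <- 2 * n1 * n2 / n + 1            # expected number of runs
  sigma2 <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - mu) / sqrt(sigma2)
  list(runs = runs, statistic = z, p.value = 2 * pnorm(-abs(z)))
}

# Made-up two-level series for illustration.
x <- factor(c("a", "a", "b", "b", "a", "b", "a", "a", "b", "b"))
res <- runs_test_sketch(x)
print(res)
```

A number of runs far below the expected value suggests clustering, and one far above it suggests alternation; either gives a small p-value.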
Syllabus
Topics. This is a second course in elementary statistics with
applications to life sciences and medicine. It reviews basic statistics using
biological and medical examples. New topics include incidence and prevalence,
medical diagnosis, sensitivity and specificity, Bayes' rule, decision
making, maximum likelihood, logistic regression, ROC curves and survival
analysis. Each student will be required to perform and write a report on a
data analysis project.
Prerequisites. Math 3200, or Math 2200 and the permission of
the instructor.
Time. Classes meet Mondays, Wednesdays and Fridays, 12:00
noon to 1:00 pm, in the Psychology Building, Room 251.
Text. The lectures will follow Statistics Using R with Biological
Examples by Kim Seefeld and Ernst Linder, an e-text that you may
download freely. (Alternative local link.) If you desire a paper copy,
you may have it printed and bound at any copy shop from this PDF file.
Supplementary reading:
- The Analysis of Variance by Hardeo Sahai and Mohammed I. Ageel. ISBN
0-8176-4012-6, Birkhaeuser, 2000.
- Manual for R, an open-source statistical computing software package.
Homework. You are encouraged to collaborate on homework and to
work additional exercises from the indicated problem sections, although the
homework grade will be based only on the exercises listed below. Please
return your solutions to the instructor by the end of class. Problem sets
will be assigned as follows:
- HW #1, due Fri, Jan 27 (Solutions)
- HW #2, due Fri, Feb 3 (Solutions)
- HW #3, due Fri, Feb 10 (Solutions)
- HW #4, due Fri, Feb 17 (Solutions)
- HW #5, due Fri, Feb 24 (Solutions)
- HW #6, due Fri, Mar 2 (Solutions)
- HW #7, due Fri, Mar 23 (Solutions)
- HW #8, due Fri, Mar 30 (Solutions)
- HW #9, due Fri, Apr 6 (Solutions)
- HW #10, due Fri, Apr 13 (Solutions) (R codes)
- HW #11, due Fri, Apr 20 (Solutions) (R codes)
- HW #12, due Fri, Apr 27 (Solutions) (R codes)
Solutions are due at the end of class on the due date. Late homework
will not be accepted.
Tests. There will be one midterm examination in class on Friday,
March 9th, 2012. There will be one cumulative take-home final
examination emphasizing the remaining material. It is due on
Wednesday, May 9th, 2012, at 4:00 PM, in Cupples I, room 100 (the
Mathematics Department office). Classroom time is set aside on
Wednesday, May 9th, 2012, from 10:30 AM until 12:30 PM for students'
convenience.
Project. There will be one data analysis project due at 4:00 PM
on Friday, May 4th, 2012, in Cupples I, room 100. Late projects
will not be accepted. Projects may be selected from this list, or
chosen by the student with the prior approval of the instructor.
Grading. One score will be assigned for homework, one for the
midterm examination, one for the final examination, and one for the term
project. These four will contribute in respective shares of 40%, 20%, 20%,
and 20% to the course score. Letter grades, computed from the course score,
will be at least the following:
| Course score at least: | 90% | 80% | 70% | 60% |
| Letter grade at least: | A | B | C | D |
Students taking the Cr/NCr or P/F options will need a grade of D or better to
pass.
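The weighting in the Grading paragraph can be worked through in R. This is only an illustrative sketch: the component scores are made up, and the variable names are not from any course file. Because the table gives minimum letter grades only, the code reports the grade guaranteed by the table and nothing more.

```r
# Weights from the Grading paragraph: homework, midterm, final, project.
weights <- c(homework = 0.40, midterm = 0.20, final = 0.20, project = 0.20)

# Hypothetical component scores in percent (made up for illustration).
scores <- c(homework = 92, midterm = 78, final = 85, project = 88)

# Course score is the weighted sum of the four component scores.
course_score <- sum(weights * scores[names(weights)])

# Cutoffs from the table: A at 90%, B at 80%, C at 70%, D at 60%.
cutoffs <- c(A = 90, B = 80, C = 70, D = 60)
met <- names(cutoffs)[course_score >= cutoffs]
min_grade <- if (length(met) > 0) met[1] else "none guaranteed"

print(course_score)
print(min_grade)
```

With these made-up scores the weighted sum is about 87%, so the table guarantees at least a B.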
Computing. Students are encouraged to use R on their own computers or on
the computers available in the Arts and Sciences Computing
Center for both symbolic and numerical computations.
Office Hours. Mondays 4-5pm, or by appointment.
Questions? Return to
M. Victor Wickerhauser's home page for contact information.