Math 322
Biostatistics

Professor Wickerhauser

NEWS

R EXAMPLES

Example R commands for:

LINKS

  • Cleaned-up NCI microarray data, 57 samples of 8 cancers with the top 12 expressed genes: nci57x13.R. Save it into your R folder and read it into the R session with load("nci57x13.R").
  • NCI microarray data on 14 cancers:
    • nci.info: some information on the data
    • nci.names: just the 64 names identifying 14 cancers, to label the 64 rows of gene expression data.
    • nci.data: gene expression data, 64 rows of 6830 gene expression values. HINT: Save this to a file.
  • Get the function bvnpdf() from this link, to compute bivariate normal pdfs. Read the code for usage instructions.
  • There is an online Octave to R dictionary, useful for those who know MATLAB or Octave well and want to learn the corresponding R commands.
  • Example R program dagopear.R to perform the D'Agostino-Pearson test of normality and compute the associated statistics.
  • Example R program cochran.R for Cochran's test of a dichotomous variable without replication.
  • Standard C program cochran.c for Cochran's test of a dichotomous variable without replication.
  • R program kendall.w.R to compute Kendall's coefficient of concordance (Kendall's W). Call it using
     kendall.w(tab)
    where tab is a matrix with scores (or ranks) along its rows.
  • In R, use solve(A) to compute the matrix inverse of a matrix A. Use
     A <- matrix( c(1,3,2,4), nrow=2, ncol=2)
    to get a 2x2 matrix with columns (1;3) and (2;4), that is, rows (1,2) and (3,4). R fills a matrix column by column.
  • Function deduct.R to solve HW 1 problem 6.
  • Standard C program deduct2.c for the sequence counting example done in class.
  • Standard C program anova.c for single-factor analysis of variance with unequal replication.
  • Standard C program anova2.c for two-factor analysis of variance with equal replication.
  • Standard C program anova3.c for three-factor analysis of variance with equal replication.
  • Standard C program m2anova2.c for bi-variate two-factor analysis of variance with equal replication.
  • Three-way ANOVA formulas.
  • Three-way ANOVA SAS documentation.
  • Download an old free version of MATLAB (for Windows or Linux PCs) from this site.
  • Open-source software R for statistical computing, and its manual.
  • Download R from WUStL's software archive.
  • R program test.R with function "runs.test" to compute the nonparametric runs test for serial randomness. Call it using
     runs.test(x)
    where x is a factor time series with two levels. Source: R project tseries home page
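
The solve(A) item in the list above can be tried directly. The following is a minimal sketch using the same 2x2 matrix; the right-hand side b is a hypothetical value chosen only for illustration.

```r
# Build the 2x2 matrix from the list above; R fills matrices column by column.
A <- matrix(c(1, 3, 2, 4), nrow = 2, ncol = 2)

# solve(A) returns the inverse when A is square and nonsingular.
A_inv <- solve(A)
print(A_inv)

# Multiplying back should give the identity matrix, up to rounding error.
print(all.equal(A %*% A_inv, diag(2)))

# To solve A x = b, pass b as a second argument instead of forming the inverse.
b <- c(5, 6)      # hypothetical right-hand side, not from the course notes
x <- solve(A, b)
print(x)
```

Calling solve(A, b) is usually preferable to computing solve(A) %*% b, since it avoids forming the inverse explicitly.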

Syllabus

Topics. This is a second course in elementary statistics with applications to life sciences and medicine. It reviews basic statistics using biological and medical examples. New topics include incidence and prevalence, medical diagnosis, sensitivity and specificity, Bayes rule, decision making, maximum likelihood, logistic regression, ROC curves and survival analysis. Each student will be required to perform and write a report on a data analysis project.

Prerequisites. Math 3200, or Math 2200 and the permission of the instructor.

Time. Classes meet Mondays, Wednesdays and Fridays, 12:00 noon to 1:00 pm, in the Psychology Building, Room 251.

Text. The lectures will follow Statistics Using R with Biological Examples by Kim Seefeld and Ernst Linder, an e-text that you may download freely. (Alternative local link.) If you desire a paper copy, you may have it printed and bound at any copy shop from this PDF file.

Supplementary reading:

Homework. You are encouraged to collaborate on homework and to work additional exercises from the indicated problem sections, although the homework grade will be based only on the exercises listed below. Problem sets will be assigned as follows:
Solutions are due to the instructor at the end of class on the due date. Late homework will not be accepted.

Tests. There will be one midterm examination in class on Friday, March 9th, 2012. There will be one cumulative take-home final examination emphasizing the remaining material. It is due on Wednesday, May 9th, 2012, at 4:00 pm, in Cupples I, room 100 (the Mathematics Department office). Classroom time is set aside on Wednesday, May 9th, 2012, from 10:30 am until 12:30 pm for students' convenience.

Project. There will be one data analysis project due at 4:00 pm on Friday, May 4th, 2012, in Cupples I, room 100. Late projects will not be accepted. Projects may be selected from this list, or chosen by the student with the prior approval of the instructor.

Grading. One score will be assigned for homework, one for the midterm examination, one for the final examination, and one for the term project. These four will contribute in respective shares of 40%, 20%, 20%, and 20% to the course score. Letter grades, computed from the course score, will be at least the following:

Course score at least:   90%   80%   70%   60%
Letter grade at least:   A     B     C     D
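
The weighting and cutoffs above amount to simple arithmetic, sketched here in R. The component scores and the helper min_letter() are hypothetical, introduced only for illustration; the cutoffs give a guaranteed minimum grade, so the actual letter grade may be higher.

```r
# Weights from the grading policy: homework, midterm, final, project.
weights <- c(homework = 0.40, midterm = 0.20, final = 0.20, project = 0.20)

# Hypothetical component scores, each on a 0-100 scale.
scores <- c(homework = 92, midterm = 78, final = 85, project = 88)

# The course score is the weighted sum of the four components.
course_score <- sum(weights * scores[names(weights)])

# Hypothetical helper: guaranteed minimum letter grade from the cutoffs above.
# Returns NA when the score is below 60%, where no grade is guaranteed.
min_letter <- function(score) {
  cutoffs <- c(A = 90, B = 80, C = 70, D = 60)
  earned <- names(cutoffs)[score >= cutoffs]
  if (length(earned) == 0) NA_character_ else earned[1]
}

print(course_score)
print(min_letter(course_score))
```

With these made-up scores the weighted course score is 87%, which guarantees at least a B.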

Students taking the Cr/NCr or P/F options will need a grade of D or better to pass.

Computing. Students are encouraged to use R on their own computers or on the computers available in the Arts and Sciences Computing Center for both symbolic and numerical computations.

Office Hours. Mondays 4-5pm, or by appointment.


Questions? Return to M. Victor Wickerhauser's home page for contact information.