Example R commands for:
- mean and median.
- histograms and samples.
- deviation and diversity.
- combinations and
- box plots and confidence
- use, power and sample size
in t tests.
- Student-t test power demo.
- Student-t tests from reduced data.
- multinomial pdf calculation and sampling.
- Dirichlet pdf plotting in 3 variables.
- Matrix entry; Mann-Whitney, Wilcoxon,
McNemar, and median comparison tests.
- single-factor analysis of
variance with unequal replication.
- multiple comparison of
- homoscedasticity tests.
- two- and three-factor ANOVA.
- MANOVA demo.
- bivariate normal density estimation,
sampling, and plotting with persp() and contour().
- simple linear regression.
- simple correlation.
- multiple linear regression.
- Kendall's W.
- partial correlation.
- goodness of fit.
- tests of independence in
- Fisher's exact test.
- binomial and hypergeometric
- Poisson density and one
- tests of serial
- installing the contributed
- Gibbs sampling (from the February
28th, 2011 lecture).
- rejection, Metropolis, and Metropolis-Hastings
sampling (from the March 2nd, 2011 lecture).
- GenBank data and goodness of
fit (from the March 30th, 2011 lecture).
- Multivariate visualization, principal
components, Mahalanobis distance, and linear discriminant analysis.
- Classification trees, with gene data clean-up.
- Clustering by means, medoids,
agglomerative and divisive trees.
- Multidimensional scaling by IsoMap.
- Bootstrap method to estimate sampling
error in non-normal PDFs.
- Example Midterm (2016).
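The bootstrap item above points to a lecture example that is not reproduced here. The following is a minimal sketch of the general idea, not the lecture's code. It resamples the data with replacement to estimate the sampling error of a statistic (here the median) without assuming a normal population. The exponential data, the sample size of 40, and the B = 2000 resamples are arbitrary illustrative choices.

```r
# Sketch: bootstrap standard error and percentile interval for the median
# of a skewed (non-normal) sample. All data here are simulated.
set.seed(42)
x <- rexp(40, rate = 1)                      # skewed exponential sample
B <- 2000                                    # number of bootstrap resamples

# Each resample draws length(x) values from x with replacement.
boot_med <- replicate(B, median(sample(x, replace = TRUE)))

se_med <- sd(boot_med)                       # bootstrap estimate of the SE
ci <- quantile(boot_med, c(0.025, 0.975))    # 95% percentile interval

c(median = median(x), se = se_med)
ci
```

The percentile interval is the simplest bootstrap interval. It can be replaced by other variants without changing the resampling step.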
- Open-source software R
for statistical computing, and its manual.
- Download R from WUStL's software site.
- Download RStudio from its developer's website.
- Download a precompiled Maxima executable for Windows.
- Maxima project home page,
for sources, documentation, links, and precompiled binary downloads for Linux,
Macintosh and other systems.
- Download old free MatLab (for
Windows or Linux PCs) from my website.
- There is an online Octave
to R dictionary, useful
for those who know MatLab or Octave well and want to learn the
corresponding R commands.
- R program in file deduct.R to solve HW 1's DNA
sequence counting problem.
- R program in file faker.R. The call "faker(n, mu, sd)"
generates n>1 samples with exactly the prescribed mean mu and standard
deviation sd.
- Notes (condprob.pdf) on conditional
probabilities and continuous densities, for HW 3.
- R program in file bvnpdf.R,
to compute bivariate normal pdfs. Read the code for usage instructions.
- R program in file dagopear.R to perform
the D'Agostino-Pearson test of normality and compute the associated
- R program in file cochran.R for Cochran's test of a
dichotomous variable without replication.
- R program in file kendall.w.R to compute Kendall's
coefficient of concordance (Kendall's W). Call it using
where tab is a matrix with scores (or ranks) along its rows.
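The kendall.w.R program is not reproduced here. The following is a minimal sketch of the standard computation. The function name kendall_w_sketch is hypothetical. The sketch assumes that each row of tab holds one rater's scores of the same n objects. For m raters, let R_j be the rank sum of object j and let S be the sum of squared deviations of the R_j from their mean. Then W = 12S / (m^2(n^3 - n) - m ΣT), where ΣT adds t^3 - t over every group of t tied scores within each rater's row.

```r
# Hypothetical sketch of Kendall's coefficient of concordance W
# (not the course's kendall.w.R).
# Assumption: each row of `tab` is one rater's scores of the same n objects.
kendall_w_sketch <- function(tab) {
  tab <- as.matrix(tab)
  m <- nrow(tab)                      # number of raters
  n <- ncol(tab)                      # number of objects
  # Rank within each row; tied scores get average ranks.
  ranks <- t(apply(tab, 1, rank))
  Rj <- colSums(ranks)                # rank sum for each object
  S <- sum((Rj - mean(Rj))^2)         # spread of the rank sums
  # Tie correction: sum of (t^3 - t) over tie groups in every row.
  ties <- sum(apply(tab, 1, function(x) {
    tc <- table(x)                    # sizes of tie groups in this row
    sum(tc^3 - tc)
  }))
  12 * S / (m^2 * (n^3 - n) - m * ties)
}

judges <- rbind(c(1, 2, 3, 4), c(1, 2, 3, 4), c(1, 2, 3, 4))
kendall_w_sketch(judges)              # 1 (perfect agreement)
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings).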
- NCI microarray data on 14 cancers:
- nci.info: some information on the data
- nci.names: just the 64 names identifying
14 cancers, to label the 64 rows of gene expression data.
- nci.data: gene expression data, 64 rows
of 6830 gene expression values. HINT: Save this to a file.
- Cleaned-up NCI microarray data, 57 samples of 8 cancers with top
12 expressed genes: nci57x13.R, to be saved
into your R folder and read into the R session with
load("nci57x13.R"). The result is a data frame named "nci12".
- Download nci57x7.R and
load("nci57x7.R") to get the top 6 genes data. The result is a
data frame named "nci6".
- Download nci57x6831.R and
load("nci57x6831.R") to get the full gene expression data frame,
named simply "nci".
- Article and Table
1 on ABO blood types and cancer in Northern India, for the term project.
- WinBUGS and tutorials:
- Saed Sayad's notes on classifier evaluation:
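The faker.R item in the links above describes faker(n, mu, sd) only by its effect. One way to obtain exactly a prescribed mean and standard deviation is to standardize arbitrary draws and then rescale them. The following is a hypothetical sketch, not the course's code. The name faker_sketch and its argument s are illustrative. The sketch assumes "standard deviation" means the value returned by R's sd(), which uses denominator n - 1.

```r
# Hypothetical sketch (not the course's faker.R).
# Returns n > 1 values whose sample mean is exactly mu and whose sample
# standard deviation (R's sd(), denominator n - 1) is exactly s,
# up to floating-point rounding.
faker_sketch <- function(n, mu, s) {
  stopifnot(n > 1, s >= 0)
  x <- rnorm(n)                       # any continuous draws would do
  z <- (x - mean(x)) / sd(x)          # standardize: mean 0, sd 1
  mu + s * z                          # rescale to target mean and sd
}

set.seed(1)
y <- faker_sketch(10, mu = 5, s = 2)
c(mean(y), sd(y))                     # 5 and 2, up to rounding
```

Because the rescaling is linear, any starting distribution works. Only the shape of the sample depends on the distribution used for the draws.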
Topics. This is a second course in applied statistics with
examples from biology and medicine. Topics include Bayes rule, Markov
chains, maximum likelihood estimation with MCMC, classical statistical
inference, ANOVA and MANOVA, multivariate visualization, multiple
regression, correlation, and classification. Each student will be
required to perform and write a report on a
data analysis project.
Prerequisites. Math 3200, or Math 2200 and the permission of the instructor.
Time. Classes meet Mondays, Wednesdays and Fridays, 1:00pm
to 2:00pm, in Duncker Hall room 101.
Text. The lectures will follow
Statistics Using R with Biological
Examples by Kim Seefeld and Ernst Linder, an e-text that you
may download freely. (Alternative
If you want a paper copy, any copy shop can print and bind one
from this PDF file.
Supplementary readings and software may be found in the "LINKS" column above.
Homework. You are encouraged to collaborate on homework,
although each student must turn in solutions individually. Please
return your solutions to the instructor by the end of class.
For full credit, homework solutions should be on paper with the
answers properly labeled. For computations, include the R commands
used, the input provided, and the output with
labels indicating which part of the solution is thus computed.
Suggestion: copy and paste your R session into a text editing
program and delete unnecessary text and space, then print and
annotate by hand as needed. Hand in homework as you would like to
get it if you were the grader.
Homework will be assigned as follows. Solutions are due at the end of
class on the due date. Late homework will not be accepted.
- HW #1, due Fri, Jan 27
- HW #2, due Fri, Feb 3
- HW #3, due Fri, Feb 10
- HW #4, due Fri, Feb 17
- HW #5, due Fri, Feb 24
- HW #6, due Fri, Mar 3
- HW #7, due Fri, Mar 24
- HW #8, due Fri, Mar 31
- HW #9, due Fri, Apr 7
- HW #10, due Fri, Apr 14
- HW #11, due Fri, Apr 21
- HW #12, due Fri, Apr 28
Tests. There will be one midterm examination in class on
Wednesday, March 8th, 2017. No reference
material or electronic devices will be allowed.
There will be one cumulative take-home final
examination, emphasizing the later material. It is due on
Wednesday, May 10th, 2017, by 4:00pm, in my office (room 105a,
Cupples I Hall).
Project. There will be one data analysis project due at
4:00pm on Wednesday, May 3rd, 2017, in my office (room 105a, Cupples
I Hall). Late projects will not be accepted. Projects may be
selected from this list, or chosen by the
student with the prior approval of the instructor.
Grading. One score will be assigned for homework, one for the
midterm examination, one for the final examination, and one for the term
project. These four will contribute in respective shares of 40%, 20%, 20%,
and 20% to the course score. Letter grades, computed from the course score,
will be at least the following:
| Course score at least | 90% | 80% | 70% | 60% |
|---|---|---|---|---|
| Letter grade at least | A | B | C | D |
Students taking the Cr/NCr or P/F options will need a grade of D or better to
pass. Students taking the Audit option will need to attend 36 of the
40 class meetings to obtain a Successful Audit grade.
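The course-score arithmetic described above can be sketched in R. The function names course_score and min_letter are hypothetical. Each component score is assumed to be a percentage from 0 to 100.

```r
# Hypothetical sketch of the weighted course score: 40% homework,
# 20% each for midterm, final, and project.
course_score <- function(homework, midterm, final, project) {
  0.40 * homework + 0.20 * midterm + 0.20 * final + 0.20 * project
}

# Minimum letter grade guaranteed by a course score ("at least" cutoffs).
min_letter <- function(score) {
  if (score >= 90) {
    "A"
  } else if (score >= 80) {
    "B"
  } else if (score >= 70) {
    "C"
  } else if (score >= 60) {
    "D"
  } else {
    NA_character_                     # no minimum grade is stated below 60%
  }
}

course_score(85, 78, 90, 95)          # 86.6
min_letter(86.6)                      # "B"
```

For example, homework 85, midterm 78, final 90, and project 95 give 0.4(85) + 0.2(78 + 90 + 95) = 34 + 52.6 = 86.6. That score guarantees at least a B.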
Computing. Students are encouraged to use R on their own computers or on
the computers available in the Arts and Sciences Computing
Center for both symbolic and numerical computations.
Office Hours. Mondays and Wednesdays 2:00-3:00pm (after
class), or by appointment.
Questions? Return to
M. Victor Wickerhauser's home page for contact information.