## Math 322 Biostatistics

### NEWS

• The final examination is now available. It is due on Friday, May 1st, 2020, at 4:00pm CDT.
• Because of the heightened dependence on technology this semester, unexpected problems such as internet interruptions and device failures have created inequities. I wish to compensate for these inequities in a manner fair to all by way of this updated HW grading policy:
• HWs 1 through 6 will together count for 30% of the course score.
• HWs 7 through 11 will together count for 10% of the course score.
• Any of these last five HWs that is submitted will be accepted and graded.
• Those with CrowdMark technical problems may email me their HW as attached PDF, JPEG, or text files.

### R EXAMPLES

Example R commands for:

• Example Midterm (2016).
• Open-source software R for statistical computing, and its manual.
• Download old free MatLab (for Windows or Linux PCs) from my website.
• There is an online Octave to R dictionary, useful for those who know MatLab or Octave well and want to learn the corresponding R commands.
• R program in file deduct.R to solve HW 1's DNA sequence counting problem.
• R program in file faker.R, defining "faker(n, mu, sd)", which generates n>1 samples with exactly the prescribed mean mu and exactly the prescribed standard deviation sd.
• Notes (brillouin.pdf) on Brillouin and Shannon diversity, for HW 1.
• Notes (condprob.pdf) on conditional probabilities and continuous densities, for HW 3.
• R program in file bvnpdf.R, to compute bivariate normal pdfs. Read the code for usage instructions.
• R program in file dagopear.R to perform the D'Agostino-Pearson test of normality and compute the associated statistics.
• R program in file cochran.R for Cochran's test of a dichotomous variable without replication.
• R program in file kendall.w.R to compute Kendall's coefficient of concordance (Kendall's W). Call it using
kendall.w(tab)
where tab is a matrix with scores (or ranks) along its rows.
• NCI microarray data on 14 cancers:
• nci.info: some information on the data
• nci.names: just the 64 names identifying 14 cancers, to label the 64 rows of gene expression data.
• nci.data: gene expression data, 64 rows of 6830 gene expression values. HINT: Save this to a file.
• Cleaned-up NCI microarray data, 57 samples of 8 cancers with top 12 expressed genes: nci57x13.R, to be saved into your R folder and read into the R session with load("nci57x13.R"). The result is a data frame named "nci12".
• Download nci57x7.R and load("nci57x7.R") to get the top 6 genes data. The result is a data frame named "nci6".
• Article and Table 1 on ABO blood types and cancer in Northern India, for the term project.
• WinBUGS and tutorials:
• Saed Sayad's notes on classifier evaluation:
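
The idea behind a generator such as faker(n, mu, sd), listed above, can be sketched in a few lines of R. This is a minimal illustration and not the contents of faker.R: the function name exact_sample and the choice of rnorm for the starting draw are assumptions made here. The sketch standardizes a random draw so that its sample mean is 0 and its sample standard deviation (with R's n-1 divisor) is 1, then shifts and rescales it.

```r
# Hypothetical helper (not the course's faker.R): draw n > 1 values whose
# sample mean and sample standard deviation match mu and sd exactly,
# up to floating-point rounding.
exact_sample <- function(n, mu = 0, sd = 1) {
  stopifnot(n > 1, sd >= 0)
  x <- rnorm(n)                      # arbitrary starting draw (an assumption)
  z <- (x - mean(x)) / stats::sd(x)  # standardized: mean 0, sample sd 1
  mu + sd * z                        # shift and scale to the target values
}

set.seed(322)                        # any seed; fixed only for reproducibility
y <- exact_sample(10, mu = 5, sd = 2)
mean(y)   # equals 5 up to rounding error
sd(y)     # equals 2 up to rounding error
```

Because the centering and scaling use the sample's own mean and standard deviation, the result matches the targets for any starting draw, not just a normal one.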

### Syllabus

Topics. This is a second course in applied statistics with examples from biology and medicine. Topics include Bayes' rule, Markov chains, maximum likelihood estimation with MCMC, classical statistical inference, ANOVA and MANOVA, multivariate visualization, multiple regression, correlation, and classification. Each student will NO LONGER BE REQUIRED to perform a data analysis project and write a report on it.

Prerequisites. Math 3200, or Math 2200 and the permission of the instructor.

Time. Classes meet Mondays, Wednesdays and Fridays, 3:00pm to 3:50pm, in Cupples I Hall room 113.

Text. The lectures will follow Statistics Using R with Biological Examples by Kim Seefeld and Ernst Linder, an e-text that you may download freely. (Alternative local link.) If you desire a paper copy, you may have it printed and bound at any copy shop from this PDF file.

Supplementary readings and software may be found in the "LINKS" column above.

Homework. You are encouraged to collaborate on homework, although each student must turn in solutions individually. Please complete your solutions on CrowdMark by 11pm on the due date.

For full credit, homework solutions should be clearly legible with the answers properly labeled. For computations, include the R commands used, the input provided, and the output with labels indicating which part of the solution is thus computed.

Suggestion: copy and paste your R session into a text editing program and delete unnecessary text and space, then comment and annotate as needed. Hand in homework as you would like to get it if you were the grader.
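
As an illustration of the labeling described above, a cleaned-up session for a made-up problem might look like the following. The problem number, the data values, and the variable name are invented for this example and do not come from any assigned homework.

```r
# HW 0, Problem 1 (hypothetical): summary statistics for five measurements
# Input: made-up data, entered by hand
x <- c(2, 4, 4, 5, 10)

# Part (a): sample mean
mean(x)
# [1] 5

# Part (b): sample standard deviation (n - 1 divisor)
sd(x)
# [1] 3
```

Each command is preceded by a comment naming the part it answers, and the output is kept directly below the command that produced it.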

Problem sets will be assigned as follows:
| Homework | Due |
|---|---|
| HW #1 | Fri, Jan 24 |
| HW #2 | Fri, Jan 31 |
| HW #3 | Fri, Feb 7 |
| HW #4 | Fri, Feb 14 |
| HW #5 | Fri, Feb 21 |
| HW #6 | Fri, Feb 28 |
| HW #7 | Fri, Mar 27 |
| HW #8 | Fri, Apr 3 |
| HW #9 (hw09data.txt) | Fri, Apr 10 |
| HW #10 (hw10data.txt) | Fri, Apr 17 |
| HW #11 | Fri, Apr 24 (last class) |

Solutions, via CrowdMark, are due at 11:00pm on the due date. Late homework will not be accepted.

Tests. There will be one midterm examination in class on Wednesday, March 4th, 2020. No reference material or electronic devices will be allowed.
There will be one cumulative take-home final examination, emphasizing the later material. It is due on Friday, May 1st, 2020, by 4:00pm, on CrowdMark.

Grading. One score will be assigned for homework, one for the midterm examination, and one for the final examination. These three will contribute respective shares of 40% (30% for HWs 1-6, 10% for HWs 7-11), 30%, and 30% to the course score. Letter grades, computed from the course score, will be at least the following:

| Course score at least: | Letter grade at least: |
|---|---|
| 90% | A |
| 80% | B |
| 70% | C |
| 60% | D |
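
The weighting and cutoffs stated under Grading can be written out as a short R sketch. The function and argument names below are hypothetical, and each input is assumed to be a percentage between 0 and 100. Since the syllabus only guarantees a minimum letter grade, the second function returns that minimum, and NA when the score is below every listed cutoff.

```r
# Weighted course score: 30% HWs 1-6, 10% HWs 7-11, 30% midterm, 30% final.
# Names are illustrative; all inputs are percentages (0 to 100).
course_score <- function(hw1to6, hw7to11, midterm, final) {
  0.30 * hw1to6 + 0.10 * hw7to11 + 0.30 * midterm + 0.30 * final
}

# Minimum letter grade guaranteed by the cutoffs 90/80/70/60.
# Below 60 the syllabus lists no guaranteed grade, so return NA.
min_letter <- function(score) {
  cutoffs <- c(A = 90, B = 80, C = 70, D = 60)
  met <- names(cutoffs)[score >= cutoffs]   # cutoffs reached, best first
  if (length(met) == 0) NA_character_ else met[1]
}

s <- course_score(hw1to6 = 90, hw7to11 = 80, midterm = 85, final = 75)
s              # 27 + 8 + 25.5 + 22.5 = 83
min_letter(s)  # "B"
```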

Students taking the Cr/NCr or P/F options will need a grade of D or better to pass. Students taking the Audit option will need to attend 36 of the 40 class meetings to obtain a Successful Audit grade.

Computing. Students are encouraged to use R on their own computers or on the computers available in the Arts and Sciences Computing Center for both symbolic and numerical computations.

Office Hours. Mondays and Wednesdays 2:00-3:00pm (before class), Fridays 4:00-5:00pm (after class), or by appointment.