Math 322 - Biostatistics

Computer resources

When dealing with large data sets, the computer is an invaluable tool. A big part of this course is learning how to use a computer package for statistical analysis. Many statistical packages are available; the best known are probably Minitab, R, SAS, S-Plus, and SPSS. In addition, Microsoft Excel can do a fair amount of statistics.

For your own analysis you will be free to use any computer package you like. However, all examples in class and on this web page will use R.

R

R is a freely available statistical package that is commonly used by statisticians, researchers, and big corporations (see for instance this recent New York Times article).

Information about R, including the files needed for installation and extra packages that extend R's capabilities, is available at the Comprehensive R Archive Network (CRAN). There you can also find thorough documentation of the system (the official manuals are rather technical and not very beginner-friendly).

Practical tips

When trying out new features of R, take advantage of the help system. Useful commands include help(function) (or equivalently ?function), example(function), help.search("topic"), and apropos("topic"). A lot of help is also available on the internet. Instead of using Google, try the special-purpose search engines RSeek or Dan Goldstein's R Search.
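
As a small sketch of the help commands above, the following looks up the built-in function mean. The choice of mean, the search word "regression", and the restriction to the stats package are only illustrations, not anything specific to this course.

```r
# Open the help page for a function (mean is used here only as an example).
help(mean)        # typing ?mean at the prompt does the same thing

# Run the examples listed at the bottom of the help page.
example(mean)

# Search the installed help pages for a topic. Restricting the search to
# the 'stats' package is optional; it just keeps the result list short.
help.search("regression", package = "stats")

# List all objects whose names contain the given text.
mean_like <- apropos("mean")
print(mean_like)
```

Note that help() needs the exact name of a function, while help.search() and apropos() are the ones to use when you only know roughly what you are looking for.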

Working with R usually involves a bit of trial and error. It is a good idea to keep a text editor (such as Emacs, Notepad, or WinEdt) open in addition to R, into which you can copy and paste the commands that actually work. When your session is done, save the file so that you can refer to it later.

Instead of having the graphics show up in the graphics window, you can send them directly to a file that you can include in a later report. For instance, to send your graphics to a PDF file, give the command pdf("filename.pdf"). When you are done making graphics, call dev.off() to close the PDF file. R automatically starts a new page for each graphic. If you instead want each graphic to appear in a separate file, use pdf("filename%03d.pdf", onefile = FALSE); R replaces the %03d with a different three-digit number for each graphic. There are similar commands for other graphics formats, such as jpeg(), bmp(), png(), and tiff().
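
The two ways of writing PDF files described above can be sketched as follows. The simulated data and the file names plots.pdf and plot%03d.pdf are made up for the illustration, and the files are written to R's temporary directory so that nothing of yours is overwritten.

```r
# Hypothetical output location: R's per-session temporary directory.
out_dir <- tempdir()

# Made-up data, only so that there is something to plot.
set.seed(1)
x <- rnorm(100)

# Variant 1: one PDF file with one page per graphic.
single_file <- file.path(out_dir, "plots.pdf")
pdf(single_file)
hist(x)       # page 1
boxplot(x)    # page 2
invisible(dev.off())   # close the file; nothing is complete before this

# Variant 2: one PDF file per graphic (plot001.pdf, plot002.pdf, ...).
pdf(file.path(out_dir, "plot%03d.pdf"), onefile = FALSE)
hist(x)       # written to plot001.pdf
boxplot(x)    # written to plot002.pdf
invisible(dev.off())

# Show which PDF files were created.
print(list.files(out_dir, pattern = "\\.pdf$"))
```

Forgetting dev.off() is the most common mistake here: until the device is closed, the file may be empty or unreadable.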

Creating nice and clean output from your R session can be done using the functions sink(), source(), and pdf() as follows. The sink() function redirects R's printed output to a text file (by default, warnings and error messages still appear in the R window): just write sink("filename.txt"). To get the output back in your R window, write sink(). The source() function does the opposite: it reads R input from a text file. For instance, source("filename.R") reads from the file filename.R. When using source(), R is by default fairly quiet. To have R speak up, set the parameter echo = TRUE (R writes both input and output) or print.eval = TRUE (R writes only output).
Assuming that you have saved your good commands in a text file (as recommended above), say mycoolstuff.R, you can issue the commands

> sink("mycoolstuff.txt")
> pdf("mycoolstuff.pdf")
> source("mycoolstuff.R", print.eval = TRUE)
> sink()
> dev.off()

R will then create the files mycoolstuff.txt and mycoolstuff.pdf for you, containing all your results and graphics.
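
To try the same workflow without touching your own files, here is a self-contained sketch. It first writes a tiny script (the name demo_script.R and the five numbers in it are made up for the illustration) and then runs it twice, once with print.eval = TRUE and once with echo = TRUE, so that you can compare the two text files. Everything is written to R's temporary directory.

```r
# Hypothetical file names, all placed in the temporary directory.
work_dir    <- tempdir()
script_file <- file.path(work_dir, "demo_script.R")
text_file   <- file.path(work_dir, "demo_output.txt")
echo_file   <- file.path(work_dir, "demo_output_echo.txt")
plot_file   <- file.path(work_dir, "demo_plots.pdf")

# Create a small script to stand in for your saved commands.
writeLines(c(
  "x <- c(2.1, 3.4, 1.9, 5.6, 4.2)",
  "mean(x)",
  "summary(x)",
  "hist(x)"
), script_file)

pdf(plot_file)                           # graphics go to the PDF file

sink(text_file)                          # output only
source(script_file, print.eval = TRUE)
sink()                                   # output back to the R window

sink(echo_file)                          # commands and output
source(script_file, echo = TRUE)
sink()

invisible(dev.off())                     # close the PDF file

# Look at what ended up in the two text files.
output_lines <- readLines(text_file)
echo_lines   <- readLines(echo_file)
cat(output_lines, sep = "\n")
cat(echo_lines, sep = "\n")
```

The file written with echo = TRUE contains the commands as well as the results, which is usually what you want when handing in homework.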

When working with files, R by default reads from and saves to what is called the working directory. To find out what the working directory is, write getwd(). To change it, write setwd("C:/my/new/working/directory/"). Note the forward slashes, which R accepts on Windows as well.
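
A minimal sketch of checking and changing the working directory is shown below. It switches to R's temporary directory only as a stand-in for a directory of your own, and then switches back.

```r
# Remember where we started.
old_dir <- getwd()
print(old_dir)

# Change to another directory; tempdir() is just a placeholder for
# something like "C:/my/new/working/directory/".
setwd(tempdir())
print(getwd())

# File names without a path now refer to this directory.
print(list.files())

# Go back to the original working directory.
setwd(old_dir)
```

Storing the old directory first makes it easy to return to where you were when you are done.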

Example programs and files

In this section you can find data files and R files that were used in the lectures, explain something from the lectures, or give useful hints for homework.

The web page for last year's course in Biostatistics also contains many programs that may be useful.

Data files

estriol.txt The estriol level (mg/24 hr) of pregnant women, and the corresponding birthweight (g/100) of their babies.

mathability.txt Scores on a standardized math test for children of different ages.

cropyield.txt Crop yield (in tonnes) for different types of fertilizers.

birthweights.txt A (made-up) population of birth weights used in the lecture on February 2nd.

heights.txt The sex and height (in inches) of all the students in the class.

Examples from lectures

090406_examples.R Examples of R commands corresponding to the material in Chapter 11.4 (F test for linear regression).

090403_examples.R Examples of R commands corresponding to the material in Chapter 11.3 (Least squares method).

090401_lecture.R Introduction to analysis of variance and regression analysis.

090318_examples.R Examples of R commands corresponding to the material in Chapter 10.1 - 10.2 (2x2 contingency tables).

090302_examples.R Examples of R commands corresponding to the material in Chapter 8.2 - 8.7 (Two sample hypothesis testing).

090218_examples.R Examples of R commands corresponding to the material in Chapter 7.6 (Sample size determination).

090216_examples.R Examples of R commands corresponding to the material in Chapter 7.5 (Power of a test).

090211_examples.R Examples of R commands corresponding to the material in Chapter 7.1 - 7.3 (Hypothesis testing).

090204_examples.R Examples of R commands corresponding to the material in Chapter 6.6 - 6.8 (Confidence intervals).

090202_examples.R Examples of R commands corresponding to the material in Chapter 6.5 - 6.7 (Estimation).

090130_examples.R Examples of R commands corresponding to the material in Chapter 6.1 - 6.4 (Random samples).

090126_examples.R Examples of R commands corresponding to the material in Chapters 4 & 5 (Probability distributions).

090116_examples.R Examples of R commands corresponding to the material in Chapter 2 (Descriptive statistics).

090114_lecture.R The introductory lecture on how R works. This file contains quite a bit more than we had time to cover in class.

Self-defined R functions

090202_functions.R R functions used to demonstrate different estimators and the effect of sample size.