Math 2200 - Home

Math 2200, Spring 2010

Monday, May 10

Scaled final scores are (almost all) posted.
How did I calculate them?
1. I rescaled your 4 exam grades onto a straight scale. For exam 1 that's trivial, for exam 2 that means (as announced) add 5, and for exam 3 and the final I took (score - 85)/1.5 + 90.
2. I took the LARGEST of the following: a straight average, a progressive 19-23-27-32% average (I'm aware that 19+23+27+32=101, for a slight bonus), and a weighting that replaced 40% of a low E2 or E3 grade with your next lowest grade.
3. I added 1.7 points, out of my generous heart. Essentially, I'm bumping _everyone_ up a little bit.
4. The grading scale will be: 65-70 C-, 70-75 C, 75-80 C+, 80-83 B-, 83-87 B, 87-90 B+, 90-93 A-, and 93+ A. The top 3 scores in the class (all above 99) will get A+'s.
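
A rough R sketch of steps 1-4 for one made-up student. The raw scores and variable names are hypothetical. It assumes the progressive weights apply to exams 1, 2, 3 and the final in that order, and that each grade band includes its lower endpoint. The third option in step 2 (replacing part of a low E2 or E3 grade) is left out, so this illustrates the arithmetic rather than the exact calculation used.

```r
# Hypothetical raw scores for one student (made up for illustration).
raw <- c(exam1 = 88, exam2 = 80, exam3 = 70, final = 76)

# Step 1: put each exam on the straight scale.
scaled <- raw
scaled["exam2"] <- raw["exam2"] + 5
scaled[c("exam3", "final")] <- (raw[c("exam3", "final")] - 85) / 1.5 + 90

# Step 2 (partial): take the larger of the straight and progressive averages.
# Assumption: weights are in exam order; they sum to 1.01, as noted above.
straight    <- mean(scaled)
progressive <- sum(c(0.19, 0.23, 0.27, 0.32) * scaled)
best        <- max(straight, progressive)

# Step 3: the across-the-board bump.
final_score <- best + 1.7

# Step 4: look up the letter grade.
# Assumption: a score exactly on a boundary goes to the higher band.
letter <- as.character(cut(final_score,
                           breaks = c(65, 70, 75, 80, 83, 87, 90, 93, Inf),
                           labels = c("C-", "C", "C+", "B-", "B", "B+", "A-", "A"),
                           right = FALSE))

print(final_score)
print(letter)
```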

Sunday, May 9

The final exam is posted here.

Saturday, May 8

1. The update for Question 25 on Exam 3 (throwing out Q25 for those who answered false) was reverted by Telesis. I'll fix it before making final grades.
2. The final exam grades are up on mathlookup. I don't yet have all the data on them, but at first glance it appears they should be curved similarly to Exam 3.
3. I hope to have final grades made by sometime Monday.

Wednesday, May 5

The answers for the Fall 2009 Final Exam are: IIFFEBGCCBDBBEHIIEGHGAGCE.

Tuesday, May 4

Let me take this time to remind you to fill out course evaluations. I understand that tomorrow 5/5 is the last day for evaluations.
I especially encourage you to leave comments. As pre-tenure faculty, thoughtful positive comments are especially important to me. (Of course, thoughtful constructive criticism helps me understand ways in which I can improve my teaching.)

Monday, May 3

The review session will be held tomorrow, Tuesday May 4th, in Seigle 103, from 8 - 10pm.

Sunday, May 2

I'll be around my office at least between 12 and 3 Monday through Thursday this week.
I'm still waiting to hear on review session time and location.

Saturday, May 1

1. My computer was a victim of the lightning on Friday. As a result, I'll be somewhat more difficult to reach by email. Please make sure to use the "" address.
2. I did manage to borrow a computer and write a brief review document for the final.
3. I recommend Fall 2007 and 2009 finals to give you an idea of what multiple choice questions may look like. I'll try to post the answers to Fall 2009 later, but for now they were a victim of my computer's crash.
4. As announced in class: the final exam is cumulative, and covers the entire textbook. I probably won't ask questions directly about the first few chapters, but these chapters are implicit in everything else we do, so you should review them anyway. The material since exam 3 will probably make up a little more than a third of the final.

Friday, April 30

Happy WILD!
The R transcript from this morning's class is posted.

Wednesday, April 28

The R transcript from this morning's class is posted.

Wednesday, April 28

The R transcript from Monday's afternoon class is posted. (The morning class had a similar transcript.)

The afternoon class on Friday will meet in LabSci 250 at the usual 3-4 time, so that we can avoid the WILD wildness. (You can also reasonably attend the morning class, subject to space.)

I'll be around my office Thursday, 12-2:30.
I'll also be around my office most afternoons during reading week -- I'll post a schedule here, and you can always email to arrange some other time.

Monday, April 26

My usual office hours are Tuesday 12-2. To help you prepare for the final exam, I'll be around my office 2-4 as well.

Sunday, April 25

The homework sets in the schedule are now complete.

It won't be on the exam, but if you want to try your hand at pulling apart a table of data in R, then you might take a look at problem 15 of Chapter 30, or problem 14 from the Part VII Review. These both have some interesting data tables (provided on the CD), with some guidance about good ways to proceed in building a model.
I'd be happy to discuss attempts at one of these.

Friday, April 23

I've posted the transcript of R commands from this morning's class. (The afternoon class had a similar transcript.) See Thursday's note below for how to get the cereals data into R.

Thursday, April 22

I've posted the transcript of R commands from yesterday's afternoon class. (The morning class had a similar transcript.)

Before class, I read in the frisbee and cereal data. For the cereal data, I edited the text file to remove an extraneous line (with .'s) at the end, then did the commands:

Tuesday, April 20

Solutions for Exam 3 are posted here. They are brief, but I hope they are helpful.

The R commands for anova today were:
  boxplot(Distance ~ Grip)
  anova(lm(Distance ~ Grip))
See also Chapter 28 Problem 6, which is where I pulled this data from. It presents the result of the R calculation, as do most of the homework problems -- make sure that you can perform a simple one-way ANOVA with 3 groups on your TI, as well.
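
To see what the anova command is computing, here is a small sketch that rebuilds the F statistic by hand and compares it with R's table. The distances and grip labels below are made up (they are not the textbook data), and the variable names other than Distance and Grip are hypothetical.

```r
# Made-up data: 3 throws for each of 3 hypothetical grips.
Distance <- c(28, 31, 30, 33, 35, 34, 25, 27, 26)
Grip     <- factor(rep(c("A", "B", "C"), each = 3))

# R's ANOVA table, as in class.
tab <- anova(lm(Distance ~ Grip))

# The same F statistic by hand.
grand_mean  <- mean(Distance)
fitted_mean <- ave(Distance, Grip)          # each throw's own group mean
ss_between  <- sum((fitted_mean - grand_mean)^2)
ss_within   <- sum((Distance - fitted_mean)^2)
df_between  <- nlevels(Grip) - 1
df_within   <- length(Distance) - nlevels(Grip)
F_by_hand   <- (ss_between / df_between) / (ss_within / df_within)
p_by_hand   <- pf(F_by_hand, df_between, df_within, lower.tail = FALSE)

print(tab)
print(c(F = F_by_hand, p = p_by_hand))
```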

The plot of the curves of various F-distributions comes from the Wikipedia article.

Monday, April 19

You should interpret your grade on Exam 3 as follows: 86-100 = A, 71-85 = B, 51-70 = C, 30-50 = D, etc.
I will throw out Problem 25, since both true and false are defensible as answers. If you answered false, then another 2 points will show up in telesis over the next several days.
Before this adjustment, the mean was 69.3, with a sd of 18.4. Also: median = 72, Q1 = 58, Q3 = 84.

I'll be late to my office hours on Tuesday, but I should be there by 1pm.

Friday, April 16

Professor Sawyer has a nice page on doing statistics with a TI-83. It may be a useful reference, combined with the discussion from class. It includes the method for calculating invT on a TI-83.

Thursday, April 15

Exam 3 is now posted.

Quick review: to calculate invT on a TI-83, go to Math | Math | Solver. Enter an equation "0=tcdf(-99, X, df) - val", where df is the number of degrees of freedom you're interested in, and val is the value of which you want invT (e.g., .975). Hit enter to go to the next screen, change your initial guess to 2, and while your cursor is in the X= field, hit Alpha Enter.
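
If you have R handy, you can check the Solver's answer. The sketch below uses R's built-in qt function for the inverse t, and also mimics the Solver approach by finding the root of pt(x, df) - val. The values df = 10 and val = .975 are just an example, and the search interval is an arbitrary choice.

```r
df  <- 10      # example degrees of freedom
val <- 0.975   # example cumulative probability

# Direct inverse of the t cumulative distribution.
t_star <- qt(val, df)

# The same value found the way the TI Solver does it:
# solve 0 = pt(x, df) - val for x.
solved <- uniroot(function(x) pt(x, df) - val,
                  interval = c(-10, 10), tol = 1e-9)$root

print(t_star)
print(solved)
```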

Monday, April 12

The review session is TONIGHT, 8-10pm, in Seigle Hall 204.

Sunday, April 11

I've written an overview of topics for Exam 3.
I'm planning a review session for Monday 8-10pm -- location is yet TBA.

As usual, I especially recommend the Fall 2007 and Fall 2009 exams for seeing how homework problems can be turned into multiple-choice questions. (But only look at these once you've carefully thought about all of our homework problems.)
The answers to the Fall 2009 Exam 3 are: DACBDIHGBDHGBEHDHBBACACAA.

Friday, April 9

Related to the astrological data on CEOs from Wednesday: on this page is an analysis of the distribution of birthdays over the 12 months of the year. The author provides birthday data of 480,040 individuals, and finds more variation than can be accounted for by chance.
You might be interested in reproducing his analysis, and calculating P-values (which he omits). Is there anything suspicious about his analysis?
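
One way to set up such a calculation in R is sketched below. The monthly counts are made up for illustration (they are NOT the author's data), and the null model assumed here is that births are spread evenly over the days of a non-leap year, so each month's expected share is proportional to its length.

```r
# Made-up birthday counts for January through December.
counts <- c(410, 370, 420, 395, 405, 400, 430, 440, 415, 410, 390, 395)

# Assumed null model: probability proportional to days in the month.
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
p0   <- days / sum(days)

# Chi-squared goodness-of-fit test, with 12 - 1 = 11 degrees of freedom.
res <- chisq.test(counts, p = p0)

# The same statistic by hand: sum of (observed - expected)^2 / expected.
expected     <- sum(counts) * p0
chisq_byhand <- sum((counts - expected)^2 / expected)

print(res)
print(chisq_byhand)
```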

Wednesday, April 7

The exam will cover from Chapter 17 through the first part of Chapter 26 (the chi-squared test for goodness of fit).

The Chi-squared curves I showed in class today can be found in the Wikipedia article on the chi-squared distribution.
The R example with the Wilcoxon test from today was
  wilcox.test(New.Activity, Control)

But (as announced in class) any Wilcoxon problem on the exam will not require tie-breaking.
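
For practice with a tie-free case, here is a small sketch with made-up numbers (the vectors below are hypothetical, not the class data). With small samples and no ties, wilcox.test reports an exact P-value.

```r
# Made-up samples with no tied values.
new_activity <- c(1.1, 2.3, 3.5)
control      <- c(4.2, 5.6, 6.8, 7.9)

res <- wilcox.test(new_activity, control)

# R's W is the rank sum of the first sample minus n1*(n1+1)/2.
ranks   <- rank(c(new_activity, control))
n1      <- length(new_activity)
W_check <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

print(res)
print(W_check)
```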

Wednesday, March 31

The applet for comparing Student's t-distribution with the normal distribution for various degrees of freedom is linked here.

Monday, March 29

Class was cancelled Friday 3/26, due to a medical emergency.
As a result, we'll be running about 1-2 days behind for the foreseeable future. We should still be able to cover the important topics remaining in a reasonable way.

The smoking data I showed in class today can be found on the website for the Federal Committee on Statistical Methodology. (I made up the sample sizes of 10,000 and 2,800 -- you may be able to deduce them from the data.)

Sunday, March 21

I have rescored the exam, giving partial credit on the following problems:

7. Answer C, to analyze the data with and without the erroneous point, is by far the second best answer. Now worth 2/4 points.
8. Scatterplot I appears to be appropriate to analyze with linear methods, as the data does fall very well on a straight line. However, it does appear that there are two separate clusters of data, and you would likely want to seek an explanation for these clusters in addition. Thus, answer G, "III only" is a better answer than the others, and also worth 2/4 points.
24/25. I didn't explicitly say whether the random selection was with or without replacement. Both models are a priori reasonable, so I give full credit for either "True/True" or "False/False". ("True/False" or "False/True" still get 2/4 points.)

1-Var Statistics: Mean=77.43, sd=11.82, Min=36, Q1=71, Median=80, Q3=86, Max=98

You may want to check Telesis to see that your grade matches the above algorithm.
You should think of your letter grade as being found by adding 5 to your score, and taking the resulting number on a 'straight scale', i.e., 90+ = A (or A minus), 80-89=B (plus or minus), 65-79=C (plus or minus), 50-64=D, etc.

Wednesday, March 17

A pdf of the exam is posted here. The correct answers are on mathlookup, as usual.

Sunday, March 13

Akshay's scheduled office hours are cancelled, due to lack of attendance. You can still email him to set up a meeting.
He's out of town this week, but will still answer questions and comment on homework solutions via email.

Sunday, March 13

Once more, our upcoming exam will cover Chapters 8-16.
The answers to the Fall 2009 Exam 2 are: BCAIHIHDIGDCCCGFEHEHBBDEI.
The review session will be held in Seigle Hall, Room 104, Monday 8-10pm.

Just for fun: Last Thursday the Colbert Report had Scott Rasmussen on to discuss polling. How many problems can you find in Stephen Colbert's web poll?

Monday, March 8

The links section (at left) still has the link to the math department old exams page. I especially recommend the Fall 2009 and Fall 2007 exams from Math 2200. The relevant material is mostly from Exam 2, with a little bit (residuals and regression lines) from Exam 1. You should have a pretty good idea after the first exam of how our exams this semester will compare to these past exams.
I'll put an answer key up later in the week for the Fall 2009 Exam 2.

I've written an overview of topics (and some other information) for the second exam, similar to that for the first exam.

Monday, February 22

The WebMD news article shown in class is linked here. Also available are another summary and the scholarly article itself (these may be available only from a Wash U ip address).

Tuesday, February 16

If anyone picked up an extra copy of the Student Solutions Manual (with a ripped top) at the review session, please email Shira Sacks (sesacks at wustl).

Saturday, February 13

The R commands that we demonstrated yesterday in class were as follows:
fuel<-read.delim("Ch09_Fuel_efficiency.txt") # read fuel efficiency data.
fuel[1,] # display the first row of the data table.
attach(fuel) # make it possible to refer to variables in the fuel table directly
plot(City.MPG ~ Weight) # an alternate version of the plot command: City.MPG is the y variable here
plot(1/City.MPG ~ Weight) # plot 1/City.MPG against weight

Reexpressing data works similarly with other R commands, e.g.
     abline( lm(1/City.MPG ~ Weight))
     cor(1/City.MPG, Weight)
     plot(sqrt(InsectSprays$count) ~ InsectSprays$spray)

Friday, February 12 pt 2

The exam is posted here. The correct answers can be found on mathlookup -- note that both A and B received full credit on #6, and both C and I received full credit on #19.

Friday, February 12 pt 1

You can look up your score and find out what you got wrong at the math scores lookup site.

Performance on this exam was typically quite good: the mean was 90, while the quartiles are Q1=86, med=92, Q3=96 (according to R). You should read your score on a 'straight scale', i.e., 90+ = A (or A minus), 80-89=B (plus or minus), 65-79=C (plus or minus), 50-64=D, etc.

Thursday, February 11

To exclude outliers from analysis in R in the SAT Verbal vs SAT Math model, I first identified them as rows 66 and 162. I then gave the command:
The c function combines its arguments, so that I'm combining the sets 1-65 and 67-161, and then setting the new dataframe 'sat_nooutliers' to be that subset of rows.
I then drew lines on the plot using commands similar to those from previous sessions, e.g.
     abline( lm( sat_nooutliers$Verbal.SAT ~ sat_nooutliers$Math.SAT ), lty=2 )
And similarly for the subset with Sex=='M' or Sex=='F' (as done on Feb 8).
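
The subsetting command itself is missing from the note above. Going only by the description (combine rows 1-65 and 67-161 with c, and store the result as 'sat_nooutliers'), it was presumably something like the line marked below. This is a reconstruction, not a copy of the original command, and the toy 'sat' table is made up so that the example runs without the data file.

```r
# Toy stand-in for the SAT table: 162 rows of made-up scores.
sat <- data.frame(Math.SAT   = 400 + 2 * (0:161),
                  Verbal.SAT = 380 + 2 * (0:161))

# Presumed form of the missing command: keep rows 1-65 and 67-161,
# which drops the outliers in rows 66 and 162.
sat_nooutliers <- sat[c(1:65, 67:161), ]

print(nrow(sat))             # rows before
print(nrow(sat_nooutliers))  # rows after
```

An equivalent way to write the same subset is sat[-c(66, 162), ], which names the rows to drop rather than the rows to keep.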

Monday, February 8 pt 2

The homeworks for Chapters 8 and 9 are posted in MyStatLab and the homework. Hint for Chap 9 #19 -- look for leverage points.

The R code used to create the plot of SAT Verbal vs SAT Math with gender indicated is as follows:

sat<-read.delim("Ch08_SAT_scores.txt") # read SAT scores data. See the note from Feb 3.
attach(sat) # makes it possible to refer to variables in the sat table by their name alone, e.g. Verbal.SAT instead of sat$Verbal.SAT
plot(Math.SAT, Verbal.SAT, type='n') # Set up the axes with the right scales for plotting: Math.SAT on x and Verbal.SAT on y, to match the lm models below
points(Math.SAT[Sex=='F'], Verbal.SAT[Sex=='F'], col='red') # display points only of cases with Sex value of 'F' in color red
points(Math.SAT[Sex=='M'], Verbal.SAT[Sex=='M'], col='blue') # do the same for males in color blue
l_Male=lm(Verbal.SAT[Sex=='M'] ~ Math.SAT[Sex=='M']) # create linear models for males and females
l_Female=lm(Verbal.SAT[Sex=='F'] ~ Math.SAT[Sex=='F'])
abline(l_Male, col='blue') # draw the line of best fit for males and females in blue and red
abline(l_Female, col='red')
Other commands of interest: 'l_Male' will display the linear model's coefficients (the intercept and slope of the line of best fit); for the r-value, use 'cor' as on Feb 3, or 'summary(l_Male)' for R-squared. 'sat[1,]' will display the first row of the data table. You might also draw the line of best fit for the entire data set, or try it for the gender-selected subpopulations with the previously-identified outliers removed. 'subset(sat, Sex=='F')' will construct a new data table consisting of the rows from sat with variable Sex having value 'F'.

Monday, February 8 pt 1

1. No Kendall or Spearman on the exam. Otherwise everything from Chapter 7 (and 2-6).
2. Answers from Fall 2009 Exam 1: FDGHBAFHBDACAIHHHEBGFGEBH. (No solutions will be provided.)

R code from today and homework later tonight...

Friday, February 5

An overview of topics (and some other information) for the first exam is available.
Of special interest from this document: Akshay's review session will be held Monday evening, 8 - 10 pm in 306 Seigle Hall.

On the day of the exam, you'll look up your room and seat at the math department seat lookup site. This has been added to the links.

Wednesday, February 3 pt2

The R calculations shown in class today were as follows:
1.  The data file is Ch08_SAT_scores.txt. There are two lines at the end with periods in both columns, which give R problems reading the file. Open the file with a text editor and remove these two lines.
2.  sat<-read.delim("Ch08_SAT_scores.txt") # read SAT scores data
3.  sat # display data
4.  plot(sat$Math.SAT, sat$Verbal.SAT) # make scatterplot of quant vs quant data (Math.SAT on x, Verbal.SAT on y, matching the model in the next step)
5.  abline(lm (sat$Verbal.SAT ~ sat$Math.SAT)) # display line of best fit
6.  cor(sat$Verbal.SAT, sat$Math.SAT) # calculate correlation coefficient
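
As an alternative to editing the file by hand in step 1, R can be told to treat a lone period as a missing value. The sketch below assumes the trailing lines contain just a period in each column. It writes a tiny made-up file (not the real data) to a temporary location so that it can be run as is, and 'sat_demo' is a hypothetical name.

```r
# Write a miniature, made-up version of the file, including two
# trailing lines of periods like the ones described in step 1.
path <- tempfile(fileext = ".txt")
writeLines(c("Verbal.SAT\tMath.SAT",
             "520\t540",
             "610\t650",
             "480\t500",
             ".\t.",
             ".\t."), path)

# Read periods as NA so the columns stay numeric, then drop those rows.
sat_demo <- read.delim(path, na.strings = ".")
sat_demo <- na.omit(sat_demo)

print(sat_demo)
```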

Wednesday, February 3

Preparing for exams: under the links section (at left), there is a link to the math department old exams page. I especially recommend the Fall 2009 and Fall 2007 exams from Math 2200. If you want to see an exam that I personally have written, then I would suggest looking at Exam 3 from Math 131 in Spring 2009.

An extra office hour: I'll be in my office tonight from 7:00 - 8:00 pm, but I have to leave fairly promptly at 8.

Monday, February 1

1) I've set my office hours to be Tuesday 12:00 - 2:00 pm. I'm still also pleased to make appointments for other times.
2) The instructions for using R are linked at left.

Wednesday, January 27

1) As announced in class, Akshay Honnatti is available to give comments on homework, and will have help sessions on Monday and Wednesday. I've updated the syllabus with his information.
2) The cost-of-living data, with OpenOffice code to separate it into 'bins' or 'buckets' to make a histogram, is available. It's in OpenDocument spreadsheet format, which I understand is readable by recent versions of Excel (or with a filter to convert).
3) The Freakonomics blog story with an example of a pie chart used to present categorical data (talked about in 3:00 section) is linked here, and here is another nice presentation of categorical data.
4) The full set of Titanic data can be found here, if you're interested. The ASCII comma separated file is easily imported into OpenOffice or Excel.

Saturday, January 23

Homework for Friday's up now. More schedule materials and syllabus updates will be up over the weekend.

Wednesday, January 20

Welcome to Math 2200! The Syllabus and Schedule are linked at the left, as well as MyStatLab, which we will be using for optional online homeworks.

To register for MyStatLab, follow these instructions. There should currently be two homeworks on the MyStatLab page: an orientation, and one for Chapter 2.

Last modified May 14, 2010