**********************************************************;
* We now consider a question that we should have considered first:
*
* Two-sample tests often arise from comparing `Treated' versus
* `Control' (or `Non-Treated') for a set of individuals.
*
* If the individuals are not randomly assigned to the two treatment
* groups (Treated and Control), then the two groups might be
* inhomogeneous with respect to other traits such as income or
* geography or age or sex. In that case, significance for
* E(X)!=E(Y) might be due to accidental effects of these secondary
* variables and not due to the treatment.
*
* The best way to avoid this is to assign subjects to the two groups
* by using random numbers. There might still be extraneous group
* differences that cause E(X)!=E(Y), but the chance of this
* happening will be smaller.
*
* In the following example, we start with 15 individuals and randomly
* assign 6 of them to Group A (Control or Non-Treatment) and the
* rest to Group B (Treatment).
*
* The random assignment is done as follows.
* First, `Randval = ranuni(0)' assigns a random number between 0
* and 1 to each record. The records with the 6 smallest Randval
* values will be assigned to Group A. SAS has hundreds of
* different functions that can be used in data steps. Ranuni() is
* the first of these that we have encountered.
*
* We then sort the records by Randval, so that records with smaller
* values of Randval come first.
*
* We then create a new data set `twosamp' by reading records from
* the first data set. The SAS automatic variable _N_ gives the
* observation number. If _N_ <= 6, we assign that record to
* Group A, and otherwise to Group B.
*
* Finally, for neatness, we sort the new data set `twosamp' by Group
* and then by Name, so that the assigned records are sorted by name
* within each group.
*
* RANDOM-NUMBER SEEDS:
* Computer-generated random numbers are actually determined by a
* number called the initial `seed'.
* If a computer program is run twice with the same initial seed,
* then the sequence of random numbers will be identical.
*
* The function `ranuni()' requires you to specify an initial seed,
* which SAS reads only once, namely the first time that `ranuni()'
* is evaluated. If you say `ranuni(0)', then the initial seed is
* read from the computer system clock. This means that every time
* that you run the program, you will get a different sequence of
* random numbers and hence a different set of assignments. While
* this may be plausible given what `random' means, it can be
* disconcerting in practice.
*
* In contrast, if you say `ranuni(Num)' where `Num' is a positive
* integer, then Num is used as the seed. The program will then
* always use the same sequence of "random" numbers, and you will
* always get the same set of random assignments. For the sake of
* sanity, we use a positive initial seed (411) here, so that the
* same program always makes the same set of random assignments.
*
***************************************************;

title 'RANDOM ASSIGNMENTS INTO TWO GROUPS';
options ls=75 ps=60 pageno=1 nocenter;

data treatdat;
    input Name $;
    randval = ranuni(411);
    datalines;
Betty
Carol
Charles
Gary
George
Jill
Joseph
Helen
Linda
Margaret
Marion
Michael
Phillip
Rebecca
Samuel
run;

title2 'Names with random numbers';
proc print;
run;

* Sort dataset `Treatdat' by randval;
proc sort;
    by randval;
run;

title2 'Sorted names with random numbers';
proc print;
run;

* Put the first 6 records in Group A and the rest in Group B
* _N_ is the observation number
* do ... end are like parentheses in the `if then else' statement
* In particular, end does NOT mean to end the data step.
;
data twosamp;
    set treatdat;
    if _N_ le 6 then do;
        Group='A'; Text='Control';
    end;
    else do;
        Group='B'; Text='Treated';
    end;
run;

title2 'Sorted names with random numbers and sample designators';
proc print;
run;

proc sort;
    by Group Name;
run;

title2 'Sorted by Name within each Group';
proc print;
    var Name Group Text;
run;
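
* OPTIONAL CHECK (an illustrative sketch, not part of the original
* program). It should confirm that the assignment above put 6 records
* in Group A and 9 in Group B. The title text is made up for this
* example.
;
title2 'Check: number of records in each group';
proc freq data=twosamp;
    tables Group*Text / list;
run;

* OPTIONAL CHECK OF THE SEED (also only a sketch). The data set names
* seedchk1 and seedchk2 and the counter i are hypothetical. Each data
* step draws 15 numbers from ranuni(411). Because the seed is read
* afresh in each data step, proc compare should report that all
* values of randval are equal. With ranuni(0) the two data sets would
* almost certainly differ.
;
data seedchk1;
    do i = 1 to 15;
        randval = ranuni(411);
        output;
    end;
run;

data seedchk2;
    do i = 1 to 15;
        randval = ranuni(411);
        output;
    end;
run;

title2 'Check: same seed gives the same random numbers';
proc compare base=seedchk1 compare=seedchk2;
run;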