***********************************************; * An impartial panel judges the taste of apples at different levels * of 5 covariates: 3 soil nutrients and 2 amounts of shade and water. * * The covariates are highly correlated, so that the coefficients in the * final regression are unstable. * * We use ridge regression to try to get stable estimates of the * regression coefficients. * * The motivation of ``variance reduction'' techniques like ridge * regression is that if we have an estimator with a large variance * for a parameter mu that is, for example, unbiased: * * E(X)=mu Var(X)=Large * * then the expected error in estimating mu with X is * * Error(X) = E[(X-mu)^2] = Var(X) = Large * * We then look for another estimator Y (that may have no statistical * meaning at all) such that * * E(Y)-E(X) = Small Var(Y)=Medium * * Then the error using Y * * Error(Y) = E[(Y-mu)^2] = Var(Y) + (E(Y)-mu)^2 = Medium + Small^2 * * may be much smaller than Error(X) = Var(X). * * Assume that we have a regression (written in matrix form) * * Y = X beta + e * * for which the columns of the matrix X are highly correlated. Then * the usual unbiased estimator of the coefficients beta * * betahat = (X'X)^{-1}X'Y * * can have high variance because X'X is close to singular. Recall from * before that the eigenvalues of the correlized matrix form of X'X are * * 2.650, 2.277, 0.049, 0.0213, 0.002 * * The three smaller eigenvalues, and in particular the smallest * eigenvalue, cause mischief in this case. * * The basic idea is to define a perturbed estimator of beta by * * betahat(k) = (X'X+kI)^{-1} X'Y * * While betahat(k) makes no sense statistically as an estimator of * beta, we expect the variance of betahat(k) to be much smaller * since the eigenvalues of X'X+kI will be larger, while the bias * E(betahat_h)-beta will be small unless k is too large. The trick is * choosing an intermediate value of k that is large enough so that * Var(betahat(k)) is small or medium sized but small enough so that * the bias E[betahat(k)]-beta is not too large. * * The first step in this method is to notice that X'X+kI may involve * adding apples and oranges: The entries of X'X are in different * units that depend on the units of the covariates. Let * * Z = X with columns centered at column means * and column standard deviations set equal to one. * * The first column of Z is also removed, since the centering makes * it indentically zero. It can then be shown that the standard * estimator betahat can be written * * betahat = inv(sqrt(D))*inv(Z'Z)*Z'*(Y-Ybar) * * where D is the diagonal matrix whose diagonal elements are the * variances of the columns of X and inv(A)=A^{-1}. We then define * * betahat(k) = inv(sqrt(D))*inv(Z'Z+k*I)*Z'*(Y-Ybar) * * If k=0, this reduces to the standard estimator of beta. * * The idea is to use the smallest value of k such that the estimates * seem `stable'. This would be a compromise between the original * high-variance but unbiased estimator betahat and a lower-variance * estimator with a small or moderate bias. The error in using * betahat(k) is * * E[(betahat(k)_i-beta_i)^2] = Var(betahat(k))_ii * * + [E(betahat(k)_i) - beta_i]^2 * * In our case, including a small value of k>0 will dramatically decrease * Var(betahat(k)_i) for all i. Since we do not know the true beta, * the size of the second term above will be harder to estimate, but * we will keep track of SSE = Sum (Y_i - (X betahat)_i)^2 and * RMSE = sqrt(MSE) = sqrt(SSE/(n-p)). * * A set of output that tells what happens for various values of k is * called a `ridge trace'. This is used to pick a value of k. * The SAS regression procedure `proc reg' can be told to generate * a ridge trace, so that ridge regression is easy to implement. * * To see what the method is actually doing, we also carry out ridge * regression using SAS's matrix language `proc iml'. Note that the * output is identical. * * MACROS IN SAS: The `proc iml' code defines a SAS macro to output * ridge-trace information for different values of k, and even calls * the macro within a `do' loop in proc iml. * * SAS macros operate on text in general, and are not restricted to * `proc iml'. In particular, SAS macros can be used anywhere in SAS * with exactly the same syntax. See the SAS documentation for * more details. * ************************************************; title 'RIDGE REGRESSION for apple taste - YOURNAME'; title2 "USING Inv(Z`Z+k*I) INSTEAD OF Inv(Z`*Z) IN PARAMETER ESTIMATES"; options ls=75 ps=60 pageno=1 nocenter; data appledat; con=1; * A column of 1s for Proc IML; input yy nat kk pp shade water; label yy='Apple Taste' nat='Sodium' kk='Potassium' pp='Phosphorus' shade='Shade' water='Water'; datalines; 2876 20.0 38 2488 2.42 216 2078 11.1 13 2998 1.62 321 3052 19.8 31 3835 2.79 376 2265 13.9 19 2360 1.65 265 940 17.0 24 233 0.86 18 2815 16.9 26 3922 2.70 369 2661 11.6 16 4343 2.40 453 2181 14.3 22 3110 2.05 267 2052 10.5 13 2869 1.63 286 2064 18.2 31 2335 2.17 252 1551 8.3 8 1784 0.84 185 2338 20.4 36 2601 2.47 275 1753 8.7 18 2124 1.27 201 2110 7.5 4 4408 1.85 411 ; proc print; title3 'THE DATA AS SAS SEES IT'; run; ************************************************; * Generate a ridge trace using proc reg. * * The call to proc reg below writes parameter estimate data for various * values of k to the dataset `ridgest' and also plots them as a * function of k ( / ridgeplot). `Ridge plots' cannot be done in * lineprinter mode. Here we tell SAS to write the plot as a * GIF-format file with name `appleridge.gif'. SAS does not like * file names with more than 8 characters, so that this is * truncated to `applerid.gif'. * * `Proc reg' below stores different types of information in the dataset * `ridgest' in records with different values of the variable _TYPE_ . * The value of k appears in `ridgest' as the column _RIDGE_ * * The option `outvif' tells SAS to output estimates of the variances * of the parameter estimates divided by MSE for each value of k. * For k=0, these are the diagonal entries of (X`X)^{-1}. These are * the relative values of the variances of the perturbed estimators. * * The option `noprint' suppresses most `proc reg' output * * The `goption' command tell SAS to write the ridge plot as a * GIF-format graphics file (800x600 pixels). `Grafout' names * the output directory. `Name=' in the `proc reg' plot command * gives the file name. The default file name is `reg.gif'. * * Two other examples of `grafout' and `goptions' are commented out. * `Pdfc' stands for `PDF with color'. ************************************************; filename grafout "."; * Write to this directory: . means default; * filename grafout "c:\myfiles\math475\sasfiles"; goptions device=gif gsfname=grafout gsfmode=replace; * goptions device=pdfc gsfname=grafout gsfmode=replace; proc reg outest=ridgest ridge=0 to 0.30 by 0.01 outvif; title3 'RIDGE REGRESSION USING PROC REG'; model yy = nat--water / noprint; plot / ridgeplot name="appleridge"; run; data first; title4 "THE FIRST 25 RECORDS IN `ridgest'"; set ridgest; if _N_ le 25; run; proc print data=first; run; ************************************************; * Show parameter estimates and RMSE=sqrt(SSE/(n-r-1)) as a function of k ; * In a data step,`if ' means that record will be kept only * if that condition is satisfied. In this case, this keeps the * records with estimated parameter values. ************************************************; data ridgeparms; set ridgest; if _TYPE_='PARMS' or _TYPE_='RIDGE'; * Drop some columns without any useful information; drop _MODEL_ _DEPVAR_ _PCOMIT_ yy; run; proc print data=ridgeparms; title4 'PARAMETER ESTIMATES AS A FUNCTION OF K'; * Make k = _RIDGE_ the first column; id _RIDGE_; run; ************************************************; * Show variances/MSE of the parameters as a function of k ; ************************************************; data ridgevars; set ridgest; if _TYPE_='RIDGEVIF'; * Drop some columns without any useful information; drop _MODEL_ _DEPVAR_ _PCOMIT_ _RMSE_ intercept yy; run; proc print data=ridgevars; title4 'VARIANCES OF PARAMETER ESIMATES/MSE AS A FUNCTION OF K'; id _RIDGE_; run; ************************************************; * Now do the same using proc iml ; ************************************************; title3 "RIDGE REGRESSION USING PROC IML:"; proc iml; * Start the IML environment; * Create the design matrix XX and vector of observed values YY; * The syntax lists the 6 columns. Recall that `con' is con=1; * The design matrix is 14 x 6. The observed values are 14 x 1; use appledat; read all var { con nat kk pp shade water } into xx; read all var { yy } into yy; * Column vector of variable labels for labels ; names = { 'Intercept', 'Sodium', 'Potassium', 'Phosphorus', 'Shade', 'Water' }; n=nrow(xx); * Number of observations; p=ncol(xx); * Number of parameters including intercept; r=p-1; * Number of parameters excluding intercept; print "XX is the n by r+1 design matrix for r covariates", "XXC is the n by r design matrix for covariates centered", " at column means with the intercept column dropped", "ZZ is the n by r normalized centered design matrix, so that", " ZPZ=ZZ'*ZZ is the correlation matrix of the r covariates.", n r p; * Find beta and SSE using XX ; beta=inv(xx`*xx)*xx`*yy; * Standard beta estimator; yfit=xx*beta; ssex=ssq(yy-yfit); * Standard residual SS; rmsex=sqrt(ssex/(n-p)); * Root MSE; * XXC is the centered design matrix without the first column; xxc=(xx-(1/n)*j(n)*xx); * Subtract off the covariate (column) means; ycent=yy-(1/n)*sum(yy); * Ycent = Y-Ybar ; xxc = xxc[1:n, 2:p]; * Drop the 1st column (which is 0),; betaxx=beta[2:p]; * the intercept coefficient,; names=names[2:p]; * and the `Intercept' name; * Find beta and SSE using XXC ; * This should give identical results; betaxc=inv(xxc`*xxc)*xxc`*ycent; yfit=xxc*betaxc; ssec=ssq(ycent-yfit); rmsec=sqrt(ssec/(n-p)); * Root MSE; print "Beta coefficients and RMSE=Root(MSE) using both XX and XXC", names betaxx [format=10.4] " " betaxc [format=10.4] " " rmsex [format=7.3] " " rmsec [format=7.3]; dd=diag(xxc`*xxc); * Covariance matrix of r covariates; invdd=inv(sqrt(dd)); * Inverse st.dev. of covariates; zz=xxc*invdd; * Normalized centered design matrix; zpz=zz`*zz; * The correlation matrix of the columns; zpzinv=inv(zpz); print "ZPZ for centered and normalized design matrix", "(ZPZ is the correlation matrix of the covariates.)", zpz [format=8.2] zpzinv [format=8.2]; zvar=vecdiag(zpzinv); * Variance of estimators/MSE; * Find beta and sse using ZZ; betazz=invdd*zpzinv*zz`*ycent; ssez=ssq(ycent-xxc*betazz); rmsez=sqrt(ssez/(n-p)); print "Beta coefficients using XX and ZZ and Variances/MSE", names betaxx [format=10.4] " " betazz [format=10.4] " " rmsex [format=7.3] " " rmsez [format=7.3] zvar [format=8.3]; call eigen(eigenz,eigenvectors,zpz); print "Eigenvalues and Eigenvectors of ZPZ:", eigenz [format=7.5] " " names eigenvectors [format=8.3]; print "RIDGE TRACE (RIDGE REGRESSION OUTPUT):", "(ZVAR = variance of parameter estimate/MSE)", "(SSEFAC = Error SS/Full model Error SS.)", "STOP as soon as the estimates seem to stabilize", " and the ZVARs do not become too small"; ***********************************************; * Define a macro to do the steps of the ridge regression for one k; * The macro with k=0 (kk=0) will give the same results as above; * * Start the definition of the macro: Ends with command %mend * The follow code does nothing other than define the macro ***********************************************; %macro ridgereg(kk= ); kkr=&kk; * kkr is the same as kk ; zpzk=zpz+kkr*i(r); * Zpzk = Z`Z+kkr*I ; zpzkinv=inv(zpzk); betak=invdd*zpzkinv*zz`*ycent; * This is betahat(k) ; gg=zpzkinv*zpz*zpzkinv; zvar=vecdiag(gg); * This is Var(betahat(k)_i)/MSE ; sse=ssq(ycent-xxc*betak); rmse=sqrt(sse/(n-p)); ssefac=sse/ssex; call eigen(eigenv,eigenvectors,zpzk); print "Eigenvalues, beta coefficients, and Zvars using ZZ`ZZ+kkr*I", eigenv [format=5.3] " " names kkr [format=7.3] betak [format=8.2] zvar [format=8.2] " " rmse [format=6.2] " " ssefac [format=6.3]; * End of macro definition; %mend; ***********************************************; * Call the macro to build a `ridge trace'; * The first call should give the standard output; ***********************************************; %ridgereg(kk=0.0) ***********************************************; * Call the macro twelve times in a `do' loop ; ***********************************************; do kval=0.01 to 0.12 by 0.01; %ridgereg(kk=kval) end; ***********************************************; * Call the macro two more times ; ***********************************************; %ridgereg(kk=0.20) %ridgereg(kk=0.30) quit; * End proc IML;