Thursday, December 3, 2009

Generating correlated random variables using SAS

This code is based on the discussion on SITMO. It shows two ways to generate correlated random variables. For a given correlation matrix C:

1) Find the Cholesky decomposition of C. In SAS, this is the root function in IML, which returns an upper triangular matrix U such that t(U)*U = C. Multiply the matrix of randomly generated numbers by U.

2) Find the eigenvalues and eigenvectors of C. In SAS, this is call eigen in IML. Multiply the matrix of eigenvectors by the diagonal matrix of the square roots of the eigenvalues to get a matrix V, so that V*t(V) = C. Then multiply the matrix of randomly generated numbers by the transpose of V.

In both cases, the product is a matrix whose columns are series with correlation matrix C, up to sampling error.
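
Why both methods work can be sketched in a few lines. The notation here is introduced for illustration only and is not taken from the SITMO discussion: Z is the matrix of random numbers, z is one of its rows written as a column vector with independent, unit-variance entries (so E[z z'] = I), Q is the matrix of eigenvectors and Lambda the diagonal matrix of eigenvalues.

```latex
\begin{aligned}
\text{Cholesky:}\quad & C = U^{\top}U,\quad X = ZU
  \;\Rightarrow\; \operatorname{Cov}(U^{\top}z)
  = U^{\top}\,\mathbb{E}[zz^{\top}]\,U = U^{\top}U = C \\
\text{Eigen:}\quad & C = Q\Lambda Q^{\top},\quad V = Q\Lambda^{1/2},\quad X = ZV^{\top}
  \;\Rightarrow\; \operatorname{Cov}(Vz)
  = V\,\mathbb{E}[zz^{\top}]\,V^{\top} = Q\Lambda Q^{\top} = C
\end{aligned}
```

Because C has ones on its diagonal, the covariance matrix of each generated row is also its correlation matrix.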

The code:

proc iml;
C={1 0.6 0.3, 0.6 1 0.5, 0.3 0.5 1};
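/* Method 1 uses the Cholesky decomposition: */
/* root returns an upper triangular U with t(U)*U = C */
U=root(C);
/* Method 2 uses the eigenvalues and eigenvectors: */
/* v = eivec*diag(sqrt(eival)), so that v*t(v) = C */
call eigen(eival, eivec, C);
v=eivec*(diag(sqrt(eival)));
vt=t(v);
/* Fix the seed so the results are reproducible */
call randseed(12345);
/* Generate 3 independent standard normal series 500 in length */
randm = j(500,3,.);
call randgen(randm,'NORMAL');
/* Impose the correlation structure on the random series */
corr = randm * U;
corrv = randm * vt;
/* Write the uncorrelated and the two correlated matrices to data sets */
create random_data from randm;
append from randm;
create correlated_data from corr;
append from corr;
create correlated_data_v from corrv;
append from corrv;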
quit;

title1 'Correlation of randomly generated data';
proc corr data = random_data;
run;

title1 'Correlation of data using Cholesky decomposition';
proc corr data = correlated_data;
run;

title1 'Correlation of data using Eigenvalue and Eigenvector decomposition';
proc corr data = correlated_data_v;
run;
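
Before looking at the sample correlations, it can be worth checking that each factor really reproduces C. The following is a minimal sketch of such a check and is not part of the original code. The names err_chol and err_eigen are made up for illustration.

```sas
proc iml;
C = {1 0.6 0.3, 0.6 1 0.5, 0.3 0.5 1};

/* Cholesky factor: upper triangular U with t(U)*U = C */
U = root(C);

/* Eigen factor: V = eivec*diag(sqrt(eival)), so V*t(V) = C */
call eigen(eival, eivec, C);
V = eivec * diag(sqrt(eival));

/* Largest absolute element-wise reconstruction error for each factor */
/* (err_chol and err_eigen are illustrative names) */
err_chol  = max(abs(t(U) * U - C));
err_eigen = max(abs(V * t(V) - C));
print err_chol err_eigen;
quit;
```

Both errors should be close to zero, on the order of floating-point rounding. If they are not, the factor is wrong, and any mismatch in the proc corr output is more than sampling error.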

Note that the sample correlations computed from 500 observations will not exactly match the C matrix, because of sampling error. A longer series, e.g. 1000 or more, gives estimates closer to C.
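
The effect of the series length can be seen by repeating the Cholesky method for several lengths and recording how far the sample correlation matrix is from C. This is a rough sketch under a few assumptions: the lengths 500, 5000 and 50000 and the name maxdev are arbitrary choices for illustration, and it assumes a SAS/IML release that provides the corr function.

```sas
proc iml;
C = {1 0.6 0.3, 0.6 1 0.5, 0.3 0.5 1};
U = root(C);
call randseed(12345);

/* Series lengths to compare (illustrative values) */
n = {500, 5000, 50000};
maxdev = j(nrow(n), 1, .);

do i = 1 to nrow(n);
   /* Independent standard normal draws, then impose correlation C */
   z = j(n[i], 3, .);
   call randgen(z, 'NORMAL');
   x = z * U;
   /* Largest absolute gap between the sample correlations and C */
   maxdev[i] = max(abs(corr(x) - C));
end;

print n maxdev;
quit;
```

The gap typically shrinks roughly in proportion to one over the square root of the series length, so halving the error takes about four times as many observations.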
