I have a dataset with more than 30 columns and I would like to calculate the correlation between each of them (and preferably put the results in a correlation matrix). Is there any way to create a loop that runs the CORRELATION() function for all possible pairs of columns?
An example to make it clearer: Given a dataset of columns y, x1, x2 and x3, I would like to create the following table with the correlations between each one:
Here's an example of how to accomplish that:
filelayout := RECORD
UNSIGNED myid;
REAL y;
REAL x1;
REAL x2;
REAL x3;
END;
GFred := DATASET([ {1, 1.1, 2.2, 3.3, 4.4},
{2, 5, 6, 7, 8},
{3, 9, 10, 11, 12},
{4, 13.1, 14.2, 15.3, 16.4}],
filelayout);
SetVals := [GFred.y,GFred.x1,GFred.x2,GFred.x3];
OutRecCnt := COUNT(SetVals);
OneRec := DATASET([{0}],{UNSIGNED1 h});
filelayout XF(OneRec L, INTEGER C) := TRANSFORM
SELF.myid := C;
SELF.y := CORRELATION(GFred,SetVals[C],y);
SELF.x1 := CORRELATION(GFred,SetVals[C],x1);
SELF.x2 := CORRELATION(GFred,SetVals[C],x2);
SELF.x3 := CORRELATION(GFred,SetVals[C],x3);
END;
P := NORMALIZE(OneRec,OutRecCnt,XF(LEFT,COUNTER));
OUTPUT(P);