I've been searching for a question related but I have had no luck as of yet. I am looking to transform a long list of independent variables for a regression analysis. A dummy dataset would look like this:
DATA TEST (DROP = i);
DO i = 1 to 4000;
VAR = i + 100000;
output;
end;
run;
PROC TRANSPOSE
DATA = TEST
OUT = TEST_T
(DROP = _NAME_)
PREFIX = X_;
ID VAR;
VAR VAR;
RUN;
DATA TEST_ARRAY;
SET TEST_T;
ARRAY X[*] X_:;
DO J = 1 TO 40;
DO I = 1 TO DIM(X);
X[I] = RANUNI(0)*I;
OUTPUT;
END;
END;
RUN;
In this case the variable names X_i are increasing monotonically, in reality, my variables are actually X_number where the number is a six digit unique identifier. I've been trying to log transform and square all these variables such that I have a new X matrix with the following columns
X_133456 X_SQ_133456 LOG_X_133456
I tried looping a list through all variables like this
PROC CONTENTS
DATA = TEST_ARRAY
OUT = CONTENTS;
RUN;
PROC SQL NOPRINT;
SELECT NAME INTO: REG_FACTORS
SEPARATED BY " "
FROM CONTENTS;
QUIT;
DATA WANT;
SET TEST_ARRAY;
%LET index = 1;
%DO %UNTIL (%SCAN(®_factors.,&index.," ")=);
%LET factors = %SCAN(®_factors.,&index.," ");
LOG_X_&FACTORS. = LOG(X_&FACTORS.);
X_SQ_&FACTORS. = (X_&FACTORS.) ** 2;
%LET index = %EVAL(&Index + 1);
%END;
RUN;
but this blows up my server and I need to find a more efficient way of doing this, thanks in advance
EDIT: for the contributor - I managed to solve at 13:04
%LET input_factors = X_:;
PROC SQL;
SELECT
NAME
, TRANWRD(NAME,%SCAN(&input_factors.,1,'_'),'SQ')
, TRANWRD(NAME,%SCAN(&input_factors.,1,'_'),'LOG')
INTO
:factor_list separated by " "
, :sq_factor_list separated by " "
, :log_factor_list separated by " "
FROM
contents
WHERE
VARNUM < 5
WHERE
NAME LIKE "%SCAN(&input_factors.,1,'_')_"
ORDER BY
INPUT(SCAN(NAME,-1,'_'),8.)
;
QUIT;
%PUT &factor_list.;
%PUT &sq_factor_list.;
%PUT &log_factor_list.;
Use 3 arrays, one for the input values (e.g. X_31415
), and two for the new calculated values (log and square).
The trick is to dynamically generate the variable names for the calculated variables, based on the original variable names.
/* Use dictionary table to get/generate vnames */ proc sql ; select name, /* X_31415 */ tranwrd(name,'X_','X_SQ_'), /* X_SQ_31415 */ tranwrd(name,'X_','LOG_X_') /* LOG_X_31415 */ into :VARLIST separated by ' ', :SQLIST separated by ' ', :LOGLIST separated by ' ' from dictionary.columns where libname = 'WORK' and memname = 'MYDATA' and name like 'X_%' order by input(scan(name,-1,'_'),8.) /* order based on the numeric suffix */ ; quit ;
Now you can assign three arrays, looping over the input values and calculate the square and log accordingly.
data array3 ; set mydata ; array in{*} &VARLIST ; /* X_1 X_17 X_31415 X_99999 */ array sq{*} &SQLIST ; /* X_SQ_1 X_SQ_17 X_SQ_31415 X_SQ_99999 */ array lg{*} &LOGLIST ; /* LOG_X_1 LOG_X_17 LOG_X_31415 LOG_X_99999 */ do i = 1 to dim(in) ; sq{i} = in{i} ** 2 ; lg{i} = log(in{i}) ; end ; drop i ; run ;