I have a dataset that is approximately 30,000 rows & approximately 30 columns, consisting of:
Here's what the original data looks like:
enter image description here The original data
I would like to restructure the data, to 'combine' the diagnostic codes & diagnostic descriptions, so that each row consists of 1 SSN identifier, 1 diagnostic code, and it's respective diagnostic code. So that SSNs with multiple diagnoses are on separate rows.
Here's what the structure of the table I would like to look like:
enter image description here The structure of the data, after transposing the variables.
Since I am using SAS EG, I am using the 'Transpose' task. The options SAS EG gives when assigning variables are:
When I attempt to transpose both ICD Codes & Description columns, it looks like this: Pic from SAS EG Results post transpose
SAS EG creates 4 new variables: Column1 - Column4; I realize these come from the 'New column names' option from above, but it's mixing both ICD codes & descriptions in all 4 columns.
Regardless if I try to 'transpose' the ICD code columns first, while 'copying' the 'Descriptions' columns in step 1, and transposing the 'Descriptions' in a 2nd step, it continues to combine the columns, so that there are both Codes & Descriptions in the newly created SAS 'Column1 - Column4' variables.
Am I not using the correct task (proc transpose) to have it so that the first column is the SSN, the 2nd column is the ICD code, and the 3rd column is the respective ICD description?
Thanks for reading so far down!
You wouldn't want to use the EG task to do such a complex transposition. You'll need to write SAS code for it. The transpose task is meant for simple transpositions.
Something like...
data want;
set have;
array code code1-code6;
array desc desc1-desc6;
do i = 1 to dim(code);
code_out = code[i];
desc_out = code[i];
output;
end;
keep ssn code_out desc_out;
run;
Not very complicated really, but more complex than the task would support.