I'm trying to use an array to transpose a dataset from long to wide format. I would like to pull the form completion date for a repeating form into several columns so that each row will just have one participant, with all their forms on one row. Since I have to do this for several variables, I would like to avoid having to use proc transpose, so am trying to use an array to transform the data from long to wide format. I was following SAS' guide on using an array to transpose but am having trouble replicating it to my dataset. Specifically I am getting the error of "Array subscript out of range at line 39 column 4".
The following is my code that I am having trouble troubleshooting.
proc sort data=bl out=longsrt;
by pid;
run;
data wide;
set longsrt;
by pid;
keep pid c10:;
retain c10:;
array ac10_formcompletedate{*} c10_formcompletedate;
if first.pid then
do;
do i = 1 to 500;
ac10_formcompletedate ( i ) = .;
end;
end;
ac10_formcompletedate( redcap_event_name_bl ) = c10_formcompletedate;
if last.pid then output;
run;
Here's a small example of what my dataset looks like (For those familiar with REDCap, it is a REDCap export file):
data bl;
input pid redcap_event_name_bl c10_formcompletedate;
cards;
10001 baseline 2023-10-18
10001 baseline .
10001 baseline .
10001 baseline 2023-10-19
10002 baseline .
10002 baseline 2023-10-18
10002 baseline 2023-10-18
10003 baseline .
10003 baseline 2023-10-18
;
run;
Here is an example of how this should be done from start to finish, assuming I've guessed your output correctly.
data bl;
infile cards truncover;
input pid redcap_event_name_bl $ c10_formcompletedate : yymmdd10. status $;
format c10_formcompletedate yymmddd10.;
cards;
10001 baseline 2023-10-18 TESTA
10001 baseline .
10001 baseline .
10001 baseline 2023-10-19 TESTB
10002 baseline .
10002 baseline 2023-10-18 TESTA
10002 baseline 2023-10-18 TESTB
10003 baseline .
10003 baseline 2023-10-18 TESTA
;
run;
*determine maximum size of array required;
proc sql noprint;
select max(count) into :array_size TRIMMED from (select pid, count(*) as count
from bl group by pid);
quit;
%put &array_size;
data want;
set bl;
*by processing to control counter and output;
by pid;
format status_1-status_&array_size. $8.;
*at first id set variables to missing and counter to 0;
if first.pid then
do;
_counter=0;
call missing(of c10_formcompletedate_1-c10_formcompletedate_&array_size.);
call missing(of status_1-status_&array_size);
end;
*increment counter;
_counter+1;
*tell SAS to retain the values across the rwos;
retain c10_formcompletedate_1-c10_formcompletedate_&array_size. status_1-status_&array_size.;
*declare the arrays and sizes of the variables to be stored, max size was pre-computed;
array _c10fcd(&array_size)
c10_formcompletedate_1-c10_formcompletedate_&array_size.;
array _stat(&array_size) $ status_1-status_&array_size.;
*assign values in row;
_c10fcd(_counter)=c10_formcompletedate;
_stat(_counter)=status;
*output last value;
if last.pid then
output;
*format data;
format c10_formcompletedate_1-c10_formcompletedate_&array_size. yymmddd10.;
*drop unncessary variables;
drop c10_formcompletedate status _counter;
run;