Search code examples
sastranspose

How to use an array when transposing from long to wide format? Array subscript out of range error


I'm trying to use an array to transpose a dataset from long to wide format. I would like to pull the form completion date for a repeating form into several columns so that each row will just have one participant, with all their forms on one row. Since I have to do this for several variables, I would like to avoid having to use proc transpose, so am trying to use an array to transform the data from long to wide format. I was following SAS' guide on using an array to transpose but am having trouble replicating it to my dataset. Specifically I am getting the error of "Array subscript out of range at line 39 column 4".

The following is my code that I am having trouble troubleshooting.

    proc sort data=bl out=longsrt;
    by pid;
    run;

    data wide;
        set longsrt;
        by pid;
    keep pid c10:;
    retain c10:;

    array ac10_formcompletedate{*} c10_formcompletedate;

    if first.pid then
    do;
        do i = 1 to 500;
            ac10_formcompletedate ( i ) = .;
        end;

    end;

    ac10_formcompletedate( redcap_event_name_bl ) = c10_formcompletedate;

    if last.pid then output;

    run;

Here's a small example of what my dataset looks like (For those familiar with REDCap, it is a REDCap export file):

data bl;
input pid redcap_event_name_bl c10_formcompletedate;
cards;
10001 baseline 2023-10-18
10001 baseline .
10001 baseline .
10001 baseline 2023-10-19
10002 baseline .
10002 baseline 2023-10-18
10002 baseline 2023-10-18
10003 baseline .
10003 baseline 2023-10-18
;
run;

Solution

  • Here is an example of how this should be done from start to finish, assuming I've guessed your output correctly.

    data bl;
        infile cards truncover;
        input pid redcap_event_name_bl $ c10_formcompletedate : yymmdd10. status $;
        format c10_formcompletedate yymmddd10.;
        cards;
    10001 baseline 2023-10-18 TESTA
    10001 baseline .          
    10001 baseline .
    10001 baseline 2023-10-19 TESTB
    10002 baseline .
    10002 baseline 2023-10-18 TESTA
    10002 baseline 2023-10-18 TESTB
    10003 baseline .
    10003 baseline 2023-10-18 TESTA
    ;
    run;
    
    *determine maximum size of array required;
    proc sql noprint;
        select max(count) into :array_size TRIMMED from (select pid, count(*) as count 
            from bl group by pid);
    quit;
    
    %put &array_size;
    
    data want;
        set bl;
        *by processing to control counter and output;
        by pid;
        format status_1-status_&array_size. $8.;
        *at first id set variables to missing and counter to 0;
    
        if first.pid then
            do;
                _counter=0;
                call missing(of c10_formcompletedate_1-c10_formcompletedate_&array_size.);
                call missing(of status_1-status_&array_size);
            end;
        *increment counter;
        _counter+1;
        *tell SAS to retain the values across the rwos;
        retain c10_formcompletedate_1-c10_formcompletedate_&array_size. status_1-status_&array_size.;
        *declare the arrays and sizes of the variables to be stored, max size was pre-computed;
        array _c10fcd(&array_size) 
            c10_formcompletedate_1-c10_formcompletedate_&array_size.;
        array _stat(&array_size) $ status_1-status_&array_size.;
        *assign values in row;
        _c10fcd(_counter)=c10_formcompletedate;
        _stat(_counter)=status;
        *output last value;
    
        if last.pid then
            output;
        *format data;
        format c10_formcompletedate_1-c10_formcompletedate_&array_size. yymmddd10.;
        *drop unncessary variables;
        drop c10_formcompletedate status _counter;
    run;