Search code examples
datesascharactertranspose

SAS Proc transpose wide to long multiple variables


I had some data with a long text string that contained data with certain dates within them. I parsed out each individual text section and the corresponding date into different columns. Each ID will have repeats of each character variable. I would like to transpose these variables from wide to long and match them to their correct ID, while also using one character variable and date per observation. I know that description doesn't do it justice so I will post some sample data below.

data test;
length status_01 $175 status_02 $175 status_03 $175;
infile datalines dsd dlm="|" truncover;
input ID Status_01$ date_01 :mmddyy10. Status_02$ date_02 :mmddyy10. Status_03$ date_03 :mmddyy10.;
format date_01 date_02 date_03 mmddyy10.;
datalines;
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23 |example status on 02/06/24 with even more text| 02-06-24 |example status on 03/11/24 with yet again more text| 03-11-24
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23 |example status on 12/16/23 with even more text| 12-16-23 |example status on 12/24/23 with yet again more text| 12-24-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23 |example status on 05/29/23 with even more text| 05-29-23 |example status on 07/10/23 with yet again more text| 07-10-23
;
run;

I know this is a little confusing for data this was the simplest I could make the data based upon the original dataset. I would like the final product to look like the below:

data test_01;
length status_01 $175;
infile datalines dsd dlm="|" truncover;
input ID Status_01$ date_01 :mmddyy10.;
format date_01 mmddyy10.;
datalines;
1 |example status on 01/29/23 with more text to emphasize this is a text string| 01-29-23
1 |example status on 02/06/24 with even more text| 02-06-24
1 |example status on 03/11/24 with yet again more text| 03-11-24
2 |example status on 07/17/23 with more text to emphasize this is a text string| 07-17-23
2 |example status on 12/16/23 with even more text| 12-16-23
2 |example status on 12/24/23 with yet again more text| 12-24-23
3 |example status on 04/26/23 with more text to emphasize this is a text string| 04-26-23
3 |example status on 05/29/23 with even more text| 05-29-23
3 |example status on 07/10/23 with yet again more text| 07-10-23
;
run;

Thank you so much in advance!!!


Solution

  • In general for that simply an issue just use a couple of arrays and a DO loop. Assign the values from the array to NEW variables. For the DATE variable make sure to attach a format (preferable one that will print the dates in way that will not confuse half of your audience.)

    data want;
      set test;
      array c status_01-status_03;
      array d date_01-date_03;
      do index=1 to dim(c);
        status=c[index];
        date=d[index];
        output;
      end;
      format date yymmdd10.;
      drop status_01-status_03 date_01-date_03;
    run;
    

    But it appears that you have the same data replicated three times in your input dataset. Is there a reason for that? With your example data you could just add these two statements after the SET statement so that only the first copy is used for each value of ID.

    by id;
    if first.id;
    

    Results:

    enter image description here

    If there is no other way to remove the duplicates you might try using PROC SORT with the NODUPKEY option.