Search code examples
sastransposesas-macro

Transposing dataset by creating variables from substr


I have a dataset that I need to transpose by a specific design that involves substring.

The original dataset (note: this is just a mock up dataset, in reality, there are 75 nt variables, the transposed notes variable always have an odd suffix, such as nt3, nt5, nt7...):

data have;
    input ID $1. NT2 $38. NT3 $4. NT4 $38. NT5 $4.;
    cards;
1   NOTES 12:13:44 03-16-2018 CODE: ABC AML  NOTES 09:13:11 03-12-2018 CODE: OPI TEST
2   NOTES 04:25:09 01-04-2018 CODE: FDS IMD  NOTES 03:25:10 01-09-2018 CODE: FGH TEST
3   NOTES 12:22:49 11-12-2018 CODE: DGH TEST NOTES 08:02:49 11-11-2018 CODE: LKO AML
4   NOTES 22:02:21 01-14-2018 CODE: MKL TEST NOTES 07:02:21 01-10-2018 CODE: LOP IMD
5   NOTES 09:01:36 01-23-2018 CODE: HJK TEST NOTES 09:01:56 01-23-2018 CODE: UIY TEST
;
run;

enter image description here

Transpose by ID to split out time , date, code and notes:

ID    time          date   code notes
1    12:13:44    03-16-2018 ABC AML
1    09:13:11    03-12-2018 OPI TEST
2    04:25:09    01-04-2018 FDS IMD
2    03:25:10    01-09-2018 FGH TEST
3    12:22:49    11-12-2018 DGH TEST
3    08:02:49    11-11-2018 LKO AML
4    22:02:21    01-14-2018 MKL TEST
4    07:02:21    01-10-2018 LOP IMD
5    09:01:36    01-23-2018 HJK TEST
5    09:01:56    01-23-2018 UIY TEST

After transposing, ID will have multiple levels based on NT variables.

The code I have so far does not yield the desirable output:

data note11;
 length note0 $50;
 set note10;

  array t{*} nt:;;
    do _i = 1 to dim(t);
/*make sure notes are not blank*/
      if not missing(t[_i]) then note0 = catx('/',note0,vname(t(_i)));
/*timestamp, date, code*/
      if find(upcase(t[_i]),"NOTES") then do;
     timestamp=substr(left(nt0), index(left(nt0), 'NOTES')+5, 9);                                                                                                              
         date=substr(left(nt0), 15, 10);  
         code = substr(left(nt0), index(left(nt0), 'CODE:')+5);     
      end;

    end;
   drop _i;
run;

Thank you for any help!


Solution

  • Use an array.

    data note11;
     set have;
    
      array _nt{*} nt:;
        do i = 1 to dim(_nt) by 2;
    
          if not missing(_nt(i)) then do;
           timestamp=input(scan(_nt(i), 2, " "), time8.);                                                                                                              
             date=input(scan(_nt(i), 3, " "), mmddyy10.);  
             code = scan(_nt(i), 5, " ");   
             notes = _nt(i+1);
             output;
          end;
    
        end;
       drop i nt:;
       format timestamp time. date yymmddd10.;
    run;