I have a dataset that I need to transpose by a specific design that involves substring.
The original dataset (note: this is just a mock up dataset, in reality, there are 75 nt variables, the transposed notes variable always have an odd suffix, such as nt3, nt5, nt7...):
data have;
input ID $1. NT2 $38. NT3 $4. NT4 $38. NT5 $4.;
cards;
1 NOTES 12:13:44 03-16-2018 CODE: ABC AML NOTES 09:13:11 03-12-2018 CODE: OPI TEST
2 NOTES 04:25:09 01-04-2018 CODE: FDS IMD NOTES 03:25:10 01-09-2018 CODE: FGH TEST
3 NOTES 12:22:49 11-12-2018 CODE: DGH TEST NOTES 08:02:49 11-11-2018 CODE: LKO AML
4 NOTES 22:02:21 01-14-2018 CODE: MKL TEST NOTES 07:02:21 01-10-2018 CODE: LOP IMD
5 NOTES 09:01:36 01-23-2018 CODE: HJK TEST NOTES 09:01:56 01-23-2018 CODE: UIY TEST
;
run;
Transpose by ID to split out time , date, code and notes:
ID time date code notes
1 12:13:44 03-16-2018 ABC AML
1 09:13:11 03-12-2018 OPI TEST
2 04:25:09 01-04-2018 FDS IMD
2 03:25:10 01-09-2018 FGH TEST
3 12:22:49 11-12-2018 DGH TEST
3 08:02:49 11-11-2018 LKO AML
4 22:02:21 01-14-2018 MKL TEST
4 07:02:21 01-10-2018 LOP IMD
5 09:01:36 01-23-2018 HJK TEST
5 09:01:56 01-23-2018 UIY TEST
After transposing, ID will have multiple levels based on NT variables.
The code I have so far does not yield the desirable output:
data note11;
length note0 $50;
set note10;
array t{*} nt:;;
do _i = 1 to dim(t);
/*make sure notes are not blank*/
if not missing(t[_i]) then note0 = catx('/',note0,vname(t(_i)));
/*timestamp, date, code*/
if find(upcase(t[_i]),"NOTES") then do;
timestamp=substr(left(nt0), index(left(nt0), 'NOTES')+5, 9);
date=substr(left(nt0), 15, 10);
code = substr(left(nt0), index(left(nt0), 'CODE:')+5);
end;
end;
drop _i;
run;
Thank you for any help!
Use an array.
data note11;
set have;
array _nt{*} nt:;
do i = 1 to dim(_nt) by 2;
if not missing(_nt(i)) then do;
timestamp=input(scan(_nt(i), 2, " "), time8.);
date=input(scan(_nt(i), 3, " "), mmddyy10.);
code = scan(_nt(i), 5, " ");
notes = _nt(i+1);
output;
end;
end;
drop i nt:;
format timestamp time. date yymmddd10.;
run;