I have a SAS dataset, let us say
it has 4 columns A,B,C,D and the values
A = x
B = x
C = x
**D = x,y**
Here column D has two values inside a single column while converting it into CSV format it generates a new column with the value Y. How to avoid this and to convert SAS dataset into CSV file?
* get some test records in a file;
Data _null_;
file 'c:\tmp\test.txt' lrecl=80;
put '1,22,Hans Olsen,Denmark,333,4';
put '1111,2,Turner, Alfred,England,3333,4';
put '1,222,Horst Mayer,Germany,3,4444';
run;
* Read the file as a delimited file;
data test; infile 'c:\tmp\test.txt' dsd dlm=',' missover;
length v1 v2 8 v3 v4 $40 v5 v6 8;
input
'V1'n : ?? BEST5.
'V2'n : ?? BEST5.
'V3'n : $CHAR40.
'V4'n : $CHAR40.
'V5'n : ?? BEST5.
'V6'n : ?? BEST5.;
run;
* Read the file and write another file.
* If 6 delimiters and not 5, change the third to #;
data test2;
infile 'c:\tmp\test.txt' lrecl=80 truncover;
file 'c:\tmp\test2.txt' lrecl=80;
length rec $80;
drop pos len;
input rec $char80.;
if count(rec,',') = 6 then do;
call scan(rec,4,pos,len,',');
substr(rec,pos-1,1) = '','';
end;
put rec;
run;
* Read the new file as a delimited file;
data test2; infile 'c:\tmp\test2.txt' dsd dlm=',' missover;
length v1 v2 8 v3 v4 $40 v5 v6 8;
input
'V1'n : ?? BEST5.
'V2'n : ?? BEST5.
'V3'n : $CHAR40.
'V4'n : $CHAR40.
'V5'n : ?? BEST5.
'V6'n : ?? BEST5.;
run;
In this code, it add '#' but I want ',' itself in the output. Could anyone please guide me to do that? Thanks in advance!!
It sounds like you are starting with an improperly created CSV file.
1,22,Hans Olsen,Denmark,333,4
1111,2,Turner, Alfred,England,3333,4
1,222,Horst Mayer,Germany,3,4444
That should have been made like this:
1,22,Hans Olsen,Denmark,333,4
1111,2,"Turner, Alfred",England,3333,4
1,222,Horst Mayer,Germany,3,4444
If you are positive that you know that the only field with embedded commas is the third then you can use a data step to read it in and generate a valid file.
data _null_;
infile bad dsd truncover ;
file good dsd ;
length v1-v6 dummy $200;
input v1-v2 @;
do i=1 to countw(_infile_,',','q')-5;
input dummy @;
v3=catx(', ',v3,dummy);
end;
input v4-v6 ;
put v1-v6 ;
run;
Once you have a properly formatted CSV file then it is easy to read.
data want;
infile good dsd truncover ;
length v1-v2 8 v3-v4 $40 v5-v6 8;
input v1-v6 ;
run;
But if the extra comma could be in any field then you will probably need to have a human fix those lines.