Search code examples
sasdatasetdata-analysissas-macro

How to convert a SAS dataset into CSV file whereas a single filed in it has value with comma


I have a SAS dataset, let us say

it has 4 columns A,B,C,D and the values

A = x
B = x
C = x
**D = x,y**

Here column D has two values inside a single column while converting it into CSV format it generates a new column with the value Y. How to avoid this and to convert SAS dataset into CSV file?

 * get some test records in a file;
Data _null_;
    file 'c:\tmp\test.txt' lrecl=80;
    put '1,22,Hans Olsen,Denmark,333,4';
    put '1111,2,Turner, Alfred,England,3333,4';
    put '1,222,Horst Mayer,Germany,3,4444';
run;

* Read the file as a delimited file; 
data test; infile 'c:\tmp\test.txt' dsd dlm=',' missover;
    length v1 v2 8 v3 v4 $40 v5 v6 8;
    input
        'V1'n : ?? BEST5.
        'V2'n : ?? BEST5.
        'V3'n : $CHAR40.
        'V4'n : $CHAR40.
        'V5'n : ?? BEST5.
        'V6'n : ?? BEST5.;
run;

* Read the file and write another file.
* If 6 delimiters and not 5, change the third to #;
data test2; 
    infile 'c:\tmp\test.txt' lrecl=80 truncover;
    file 'c:\tmp\test2.txt' lrecl=80;
    length rec $80;
    drop pos len;
    input rec $char80.;
    if count(rec,',') = 6 then do;
        call scan(rec,4,pos,len,',');
        substr(rec,pos-1,1) = '','';
    end;
    put rec;
run;

* Read the new file as a delimited file; 
data test2; infile 'c:\tmp\test2.txt' dsd dlm=',' missover;
    length v1 v2 8 v3 v4 $40 v5 v6 8;
    input
        'V1'n : ?? BEST5.
        'V2'n : ?? BEST5.
        'V3'n : $CHAR40.
        'V4'n : $CHAR40.
        'V5'n : ?? BEST5.
        'V6'n : ?? BEST5.;
run;

In this code, it add '#' but I want ',' itself in the output. Could anyone please guide me to do that? Thanks in advance!!


Solution

  • It sounds like you are starting with an improperly created CSV file.

    1,22,Hans Olsen,Denmark,333,4
    1111,2,Turner, Alfred,England,3333,4
    1,222,Horst Mayer,Germany,3,4444
    

    That should have been made like this:

    1,22,Hans Olsen,Denmark,333,4
    1111,2,"Turner, Alfred",England,3333,4
    1,222,Horst Mayer,Germany,3,4444
    

    If you are positive that you know that the only field with embedded commas is the third then you can use a data step to read it in and generate a valid file.

    data _null_;
      infile bad dsd truncover ;
      file good dsd ;
      length v1-v6 dummy $200;
      input v1-v2  @;
      do i=1 to countw(_infile_,',','q')-5;
        input dummy @;
        v3=catx(', ',v3,dummy);
      end;
      input v4-v6 ;
      put v1-v6 ;
    run;
    

    Once you have a properly formatted CSV file then it is easy to read.

    data want;
      infile good dsd truncover ;
      length v1-v2 8 v3-v4 $40 v5-v6 8;
      input v1-v6 ;
    run;
    

    But if the extra comma could be in any field then you will probably need to have a human fix those lines.