Search code examples
csvseddouble-quotes

sed convert double quoted csv comma delimited to non quoted pipe delimited


The input file 'input.file' is:

"col one",,,"col, two",,"col, three"
,"col one",,"col, two",,"col, three"
,,"col one","col, two",,"col, three"

The desired output file is:

col one|||col, two||col, three
|col one||col, two||col, three
||col one|col, two||col, three

This is what is in place so far:

sed -r 's/"([^"])/\1/g; s/,/|/g'  ./input.file

The objective is the first substitution

s/"([^"])/\1/g

would parse arbitrary fields delineated by " and copy them to the output and the second substitution

s/,/|/g

would replace the ',' not included in double quoted fields with '|'.


Solution

  • $ cat ip.txt 
    "col one",,,"col, two",,"col, three"
    ,"col one",,"col, two",,"col, three"
    ,,"col one","col, two",,"col, three"
    
    $ perl -pe 's/"[^"]+"(*SKIP)(*F)|,/|/g; s/"//g' ip.txt
    col one|||col, two||col, three
    |col one||col, two||col, three
    ||col one|col, two||col, three
    
    • "[^"]+"(*SKIP)(*F) skip the pattern "[^"]+" and look for any other alternate match provided
      • (*F) is short-form for (*FAIL) and (?!) can also be used instead
    • |, alternate pattern to match ,
    • |/g replace all , with |
    • s/"//g then delete all "


    Further Reading: