Tags: csv, perl, awk, sed

Remove double quotes if delimiter value is not present in data


An input file is given in which every column is enclosed in double quotes and every line ends with a carriage return and a newline character. The following is required:

  • If a quoted field contains a newline, the continuation line has to be appended to the same line, as with the "ram" record in the example below, which spans two lines.

  • Remove the double quotes around a column if the delimiter (,) is not present in its value.

  • Remove the carriage return characters (^M).

To exemplify, given the following input file

"name","address","age"^M
"ram","abcd,^M
def","10"^M
"abhi","xyz","25"^M
"ad","ram,John","35"^M

I would like to obtain the following output by means of a sed/perl/awk script/oneliner.

name,address,age
ram,"abcd,def",10
abhi,xyz,25
ad,"ram,John",35
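
For testing, the input above can be reproduced with real carriage returns, since the ^M shown in the listing is only how editors display a CR. A minimal sketch, assuming a POSIX shell and using sample.txt as the file name:

```shell
# Write the five sample lines with real CR LF endings (\r\n).
# printf reuses the format string for every argument.
printf '%s\r\n' \
  '"name","address","age"' \
  '"ram","abcd,' \
  'def","10"' \
  '"abhi","xyz","25"' \
  '"ad","ram,John","35"' > sample.txt
```

Running `od -c sample.txt` afterwards should show `\r \n` at the end of each line.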

Here is what I have tried so far. For appending a line to the previous one:

sed '/^[^"]*"[^"]*$/{N;s/\n//}' sample.txt

For removing the Control-M characters:

perl -pe 's/\r//g' sample.txt

But with these I did not get the required final output shown above.


Solution

  • This might work for you (GNU sed):

    sed ':a;s/\r$//;/[^"]$/{N;s/\n//;ba};s/"\([^",]*\)"/\1/g' file
    

    The solution is in three parts:

    1. Remove the carriage return at the end of each line.
    2. Join broken lines to make whole ones.
    3. Remove double quotes surrounding fields that do not contain commas.

    First strip the trailing carriage return. Without this, every line of the sample input ends in a carriage return rather than a double quote, so the join test would always succeed. Then, if the current line does not end with a double quote, append the next line, remove the newline and repeat. Otherwise, remove the double quotes surrounding fields that contain neither double quotes nor commas.

    N.B. This supposes that fields do not contain quoted double quotes. If they do, the condition for the joining step would need to be amended and double quotes within fields would need to be catered for.
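
As a quick check, the sketch below builds the sample input from the question with real CR LF endings and runs the sed command on it with `s/\r$//` at the top of the loop, so that the join test sees the closing double quote rather than the carriage return. It assumes GNU sed; the temporary file from `mktemp` and the variable names are only for illustration.

```shell
# Build the sample input with real CR LF endings (assumption: the ^M in the
# question stands for a literal carriage return).
input=$(mktemp)
printf '%s\r\n' \
  '"name","address","age"' \
  '"ram","abcd,' \
  'def","10"' \
  '"abhi","xyz","25"' \
  '"ad","ram,John","35"' > "$input"

# 1. s/\r$//           strip the trailing CR
# 2. /[^"]$/{N;...;ba} join the next line while the line does not end in "
# 3. s/"\([^",]*\)"/\1/g  unquote fields containing neither " nor ,
result=$(sed ':a;s/\r$//;/[^"]$/{N;s/\n//;ba};s/"\([^",]*\)"/\1/g' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

On the sample this should print the four lines requested in the question, with only the two comma-containing fields still quoted.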