
awk ignore delimiter inside single quote within a parenthesis


I have a set of data in a CSV file, as below:

 Given Data:
 (12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye),
 (13,'hello','this fruit,is super tasty (sweet actually)',goodbye)

I want to print the given data as 2 rows, each running from ( to ), while ignoring the , delimiter and any ( ) that appear inside a ' ' field.

How can I do this using awk or sed on Linux?

Expected result as below:

 Expected Result: 
 row 1 = 12,'hello','this girl,is lovely(adorable actually)',goodbye
 row 2 = 13,'hello','this fruit,is super tasty (sweet actually)',goodbye

UPDATE: I just noticed that there is a comma between the 2 rows. How can I split the data into 2 rows at the , that comes after ) and before (?


Solution

  • You can use the following awk command to achieve your goal:

    awk '{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}' file.in
    

    Tested on your input.

    Explanations:

    • No in-place option is needed, because the command only prints to standard output and leaves file.in untouched. awk has no sed-style -i.bak flag (GNU awk's in-place mode is -i inplace), so redirect the output to a new file if you want to keep it.
    • {str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;} first removes the first and last character of each line (the outer parentheses), then deletes the literal \r and \n sequences, together with one following space if present, and prints the result in the format you want.
    • If the file has a header line, add the condition NR>1 before the {...}: 'NR>1{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}'. Rows are then numbered from 2, because NR still counts the header.
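
As a minimal sketch of how this program treats a single row, the snippet below wraps it in a shell function and feeds it one sample line through a quoted here-document, so that the shell passes \r and \n through as literal text. The name strip_row is hypothetical and used only for illustration. The pattern is written as a regex literal, /\\r ?|\\n ?/, which should match the same text as the quoted string form while needing only two backslashes per literal backslash.

```shell
# strip_row is a hypothetical helper name used only for this illustration.
strip_row() {
  awk '{
    str = substr($0, 2, length($0) - 2)   # drop the first and last character
    gsub(/\\r ?|\\n ?/, "", str)          # delete literal \r and \n text plus one optional space
    print "row " NR " = " str
  }'
}

# The quoted delimiter keeps the shell from interpreting backslashes in the data.
strip_row <<'EOF'
(12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye)
EOF
```

Under these assumptions the call should print row 1 = 12,'hello','this girl,is lovely(adorable actually)',goodbye. A line that still ends in ), keeps its closing parenthesis, because only one trailing character is dropped.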

    Following the change in your requirements, I have adapted the awk command to treat the , at the end of a row as part of the record separator (row separator):

    awk 'BEGIN{RS=",\n|\n"}{str=substr($0,2,length($0)-2); gsub("\\\\r ?|\\\\n ?","",str); print "row "NR" = "str;}' file.in
    

    where BEGIN{RS=",\n|\n"} defines the row separator: either a comma followed by a newline, or a newline alone. GNU awk and mawk treat a multi-character RS as a regular expression; POSIX leaves that behavior unspecified.
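
A regular-expression RS is not guaranteed by POSIX, so on an awk that lacks it a possible alternative is to keep the default newline separator and strip the trailing comma from each line instead. The sketch below takes that route; it is a swapped-in technique, not the command from the answer, and rows_from_tuples is a hypothetical function name. It assumes every row sits on its own line.

```shell
# rows_from_tuples is a hypothetical helper name used only for this illustration.
rows_from_tuples() {
  awk '{
    sub(/, *$/, "")            # drop the comma that separates two rows
    sub(/^ *\(/, "")           # drop the opening parenthesis
    sub(/\) *$/, "")           # drop the closing parenthesis
    gsub(/\\r ?|\\n ?/, "")    # delete literal \r and \n text plus one optional space
    print "row " NR " = " $0
  }'
}

# Quoted here-document: backslashes in the data reach awk unchanged.
rows_from_tuples <<'EOF'
(12,'hello','this girl,is lovely(adorable \r\n actually)',goodbye),
(13,'hello','this fruit,is super tasty (sweet actually)',goodbye)
EOF
```

Because each sub() is anchored to the start or end of the line, the commas and parentheses inside the quoted field are left alone.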