csv, awk, delimiter

Using AWK to split elements enclosed in double quotes (" ")


I have a .csv file in which each row contains three comma-separated elements that I want to split further. The rows in the file look like this:

gene_id "ENSDARG00000104632", gene_version "2", gene_name "RERG"
gene_id "ENSDARG00000104632", gene_version "2", transcript_id "ENSDART00000166186"
gene_id "ENSDARG00000104632", gene_version "2", transcript_id "ENSDART00000166186"

I want to take the strings inside the double quotes and make them into their own elements, so that every element is separated by a comma.

Basically I want it to look like this:

gene_id, ENSDARG00000104632, gene_version, 2, gene_name, RERG
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186

I had thought to do it like this:

awk 'BEGIN{OFS=",";FS="""};{print $1,$2,$3,$4,$5,$6}'

However, it seems AWK cannot recognize " as a delimiter. Does anyone have a recommendation as to how to achieve this?
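For what it's worth, awk can split on a double quote. The problem in the attempt above is the string literal: """ is not a valid awk string, so the quote has to be escaped as FS="\"". A quick check on the first sample row (piped in with printf here purely so the example is self-contained) shows what that split actually produces:

```shell
# Feed the first sample row from the question to awk (printf is used here
# only to keep the example self-contained; normally you would pass the file).
# FS = "\"" is a one-character separator, so awk splits on every literal ".
# Each field is printed in brackets to make leading/trailing spaces visible.
printf '%s\n' 'gene_id "ENSDARG00000104632", gene_version "2", gene_name "RERG"' |
  awk 'BEGIN { FS = "\"" } { for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
# prints: [gene_id ][ENSDARG00000104632][, gene_version ][2][, gene_name ][RERG][]
```

So the escaped quote does work as a field separator, but the pieces between the quotes still carry the leftover ", " and spaces, and the closing quote leaves an empty last field. A regex separator that treats any run of spaces, quotes and commas as one delimiter avoids both problems.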


Solution

    $ awk -F'[ ",]+' -v OFS=', ' '{sub(/"$/,""); $1=$1} 1' file
    gene_id, ENSDARG00000104632, gene_version, 2, gene_name, RERG
    gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
    gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
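A commented, multi-line version of the same awk program may make it easier to see what each piece does. This is a sketch: two of the sample rows are piped in with printf only so it runs on its own; with real data you would pass the file name instead of the printf pipeline.

```shell
# Two rows from the question, piped in so the example runs on its own.
printf '%s\n' \
  'gene_id "ENSDARG00000104632", gene_version "2", gene_name "RERG"' \
  'gene_id "ENSDARG00000104632", gene_version "2", transcript_id "ENSDART00000166186"' |
awk -F'[ ",]+' -v OFS=', ' '
{
    # FS is the regex [ ",]+ : any run of spaces, quotes and commas
    # counts as one separator, so  "X", y  splits into X and y.
    # Remove the closing quote at the end of the record; otherwise it
    # would be a separator with nothing after it, leaving an empty last field.
    sub(/"$/, "")
    # Assigning to a field makes awk rebuild $0 with OFS (", ") between fields.
    $1 = $1
}
# A bare pattern of 1 is always true, so the default action prints the record.
1'
```

The $1 = $1 line is what applies the new OFS: awk prints an unmodified record exactly as it was read, so without a field assignment the output would still contain the original separators.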