awk, carriage-return

Weird awk behaviour: the last character "]" overwrites the first character of the line


I ran into some weird awk behaviour when trying to put brackets around some columns in my file:

batch ID        tumor_reads2_fastqgzs   tumor_reads_fastqgzs    tumor_reads2_fastqgzs ID        tumor_reads_fastqgzs ID
9_S8    9_S8_L001_R2_001.fastq.gz       9_S8_L001_R1_001.fastq.gz       file-Fk3BgVj4yBGZqQvF2VV2Q2Z4   file-Fk3BgfQ4yBGYz756BGvbzkP8
7_S6    7_S6_L001_R2_001.fastq.gz       7_S6_L001_R1_001.fastq.gz       file-Fk3Bg884yBGYF4xXJjpf08f8   file-Fk3Bg1j4yBGvbp9VK2ZQ76G3
10_S9   10_S9_L001_R2_001.fastq.gz      10_S9_L001_R1_001.fastq.gz      file-Fk3Bfg84yBGq9g7zJk5kv435   file-Fk3BfVQ4yBGxxPZy6pjxv635
3_S3    3_S3_L001_R2_001.fastq.gz       3_S3_L001_R1_001.fastq.gz       file-Fk3Bf3Q4yBGq9g7zJk5kv42z   file-Fk3BfB04yBGYz756BGvbzkGk
15_S14  15_S14_L001_R2_001.fastq.gz     15_S14_L001_R1_001.fastq.gz     file-Fk3Bbp04yBGkyPqy2073BKf7   file-Fk3BbV84yBGq00fKK3j5KjG5

That is my file. I wanted to put brackets around columns 4 and 5, so I ran:

awk -v OFS="\t" '{if($0 ~ /^batch/){print $0}else{print $1, $2, $3, "["$4"]", "["$5"]";}}' myfile

But it printed this:

batch ID        tumor_reads2_fastqgzs   tumor_reads_fastqgzs    tumor_reads2_fastqgzs ID        tumor_reads_fastqgzs ID
]_S8    9_S8_L001_R2_001.fastq.gz       9_S8_L001_R1_001.fastq.gz       [file-Fk3BgVj4yBGZqQvF2VV2Q2Z4] [file-Fk3BgfQ4yBGYz756BGvbzkP8
]_S6    7_S6_L001_R2_001.fastq.gz       7_S6_L001_R1_001.fastq.gz       [file-Fk3Bg884yBGYF4xXJjpf08f8] [file-Fk3Bg1j4yBGvbp9VK2ZQ76G3
]0_S9   10_S9_L001_R2_001.fastq.gz      10_S9_L001_R1_001.fastq.gz      [file-Fk3Bfg84yBGq9g7zJk5kv435] [file-Fk3BfVQ4yBGxxPZy6pjxv635
]_S3    3_S3_L001_R2_001.fastq.gz       3_S3_L001_R1_001.fastq.gz       [file-Fk3Bf3Q4yBGq9g7zJk5kv42z] [file-Fk3BfB04yBGYz756BGvbzkGk
]5_S14  15_S14_L001_R2_001.fastq.gz     15_S14_L001_R1_001.fastq.gz     [file-Fk3Bbp04yBGkyPqy2073BKf7] [file-Fk3BbV84yBGq00fKK3j5KjG5

The last bracket replaces the first character of each line for some reason. Any idea why? How can I fix that? I also tried using sub(), but it did the same thing.


Solution

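  • Your code worked for me. Most likely your Input_file contains Control-M characters (carriage returns, \r), which typically come from Windows-style line endings. In that case the \r sits at the end of $5, so when the line is printed the terminal jumps back to the start of the line and the closing "]" is drawn over the first character. You can remove them with gsub() before the fields are used; could you please try the following: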

    awk -v OFS="\t" '{gsub(/\r/,"");if($0 ~ /^batch/){print $0}else{print $1, $2, $3, "["$4"]", "["$5"]";}}' Input_file
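    To confirm that carriage returns really are the cause before changing anything, a minimal sketch along these lines may help. It assumes Windows (CRLF) line endings; sample.tsv and its contents are made up for illustration and are not the asker's real file. It counts the lines that end in \r, then strips only that trailing \r and brackets columns 4 and 5.

```shell
# Build a tiny tab-separated sample with CRLF line endings.
# sample.tsv and its values are hypothetical, for illustration only.
printf 'batch ID\tc2\tc3\tc4\tc5\r\n9_S8\ta\tb\tfile-1\tfile-2\r\n' > sample.tsv

# Count the lines that end in a carriage return.
# A non-zero count means the file has CRLF line endings.
crlf_lines=$(awk '/\r$/ {n++} END {print n+0}' sample.tsv)
echo "lines ending in CR: $crlf_lines"

# Strip only the trailing CR from every line, then put brackets
# around columns 4 and 5 on every line except the header.
fixed=$(awk -v OFS='\t' '
    { sub(/\r$/, "") }                               # drop the trailing CR; awk re-splits the fields
    !/^batch/ { $4 = "[" $4 "]"; $5 = "[" $5 "]" }   # assigning a field rebuilds the line with OFS
    { print }
' sample.tsv)
printf '%s\n' "$fixed"
```

    If the count is zero, the cause is probably something else. Another option, assuming the file can be rewritten, is to remove the carriage returns once up front with tr -d '\r' < Input_file > cleaned_file (cleaned_file being a name chosen here) and then run the original awk command unchanged on the cleaned copy.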