Tags: bash, fastq, sequencing

Grep outputting strange characters using -A and -B flags for fastq analysis


I have a file that looks like this:

@HISEQ:331:C85AMANXX:8:1101:16636:1980 1:N:0:ATCACGAC
NTCTATAAACTCTTCATGCCAGTTCCCTGTCTCATCAGATAGATTCTGAGGCCTCTAGGCATCAGCCGGATATCCCTAAGGACAGTGTTGGAGGAACTGCTGAGTGGATTCATGGTCAACTACCAA
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFF
@HISEQ:331:C85AMANXX:8:1101:2337:2047 1:N:0:ATCACGAC
CTGTGAAAACTCTTCATGCCAGTTCCCTGTCTCATCAGATAGATTCTGAGGCCTCTAGGCATCAGCCGGATATCCCTAAGGACAGTGTTGGAGGAACTGCTGAGTGGATTCATGGTCAACTACCAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFF<FFF<FF<BFFFF<FFFFBFFFBFFFFF<B

I am using the following grep command:

grep -B 1 -A 2 'AGGCATCAGCCGGA' file.fastq | head > out.fastq

The output looks like this. Lines 5 and 10 contain two dashes, which I do not want:

@HISEQ:331:C85AMANXX:8:1101:16636:1980 1:N:0:ATCACGAC
NTCTATAAACTCTTCATGCCAGTTCCCTGTCTCATCAGATAGATTCTGAGGCCTCTAGGCATCAGCCGGATATCCCTAAGGACAGTGTTGGAGGAACTGCTGAGTGGATTCATGGTCAACTACCAA
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFF
--
@HISEQ:331:C85AMANXX:8:1101:2337:2047 1:N:0:ATCACGAC
CTGTGAAAACTCTTCATGCCAGTTCCCTGTCTCATCAGATAGATTCTGAGGCCTCTAGGCATCAGCCGGATATCCCTAAGGACAGTGTTGGAGGAACTGCTGAGTGGATTCATGGTCAACTACCAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFF<FFF<FF<BFFFF<FFFFBFFFBFFFFF<B
--

Is there a way to output without the dashes on lines 5 and 10?


Solution

  • By default, grep prints the separator -- between groups of context lines when context options such as -A and -B are used. A block can contain more than one match, so the number of lines per block is not constant; the separator shows where each block begins and ends.

    You can add the option --no-group-separator to suppress the separator, if your version of grep supports it (it is a GNU grep option).
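
    As a rough sketch of both approaches, the snippet below builds a small made-up FASTQ file (the read names and sequences are invented for illustration, not taken from the question) and extracts the matching records without the separator. The first command assumes GNU grep. The second is a fallback for greps that lack the option: it filters out any line consisting only of --, which assumes no real record line is exactly two dashes.

```shell
# Build a tiny FASTQ file (hypothetical data) with a non-matching record
# between two matching ones, so grep has a gap to mark with "--".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@read1
AAAGGCATCAGCCGGATT
+
FFFFFFFFFFFFFFFFFF
@read2
CCCCCCCCCCCCCCCCCC
+
FFFFFFFFFFFFFFFFFF
@read3
TTAGGCATCAGCCGGAAA
+
FFFFFFFFFFFFFFFFFF
EOF

# GNU grep: suppress the group separator directly.
with_flag=$(grep --no-group-separator -B 1 -A 2 'AGGCATCAGCCGGA' "$tmp")

# Fallback: keep the default output, then drop lines that are exactly "--".
filtered=$(grep -B 1 -A 2 'AGGCATCAGCCGGA' "$tmp" | grep -v '^--$')

rm -f "$tmp"
```

    Applied to the command in the question, the first form would be: grep --no-group-separator -B 1 -A 2 'AGGCATCAGCCGGA' file.fastq | head > out.fastq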