Search code examples
bashseq

How to determine if every three lines has the desired number of letters?


I have a big data set and would like to check if every third line has the desired number of bases.

example:

line 1
line 2
ATTGAC
line 4
line 5
TTCGGATC
line 7
line 8
GGTCAA

So line 6 contains 8 bases instead of 6. I would like my script to stop if this is the case.


Solution

  • Sounds like a job for awk:

    awk 'NR % 3 == 0 && length($0) != 6 { print "line " NR " is the wrong length"; exit }' file
    

    When the record number NR is a multiple of 3 and the length of the line isn't 6, print the message and exit.

    Output from your example (assuming that all those blank lines aren't supposed to be there):

    $ awk 'NR % 3 == 0 && length($0) != 6 { print "line " NR " is the wrong length"; exit }' file
    line 6 is the wrong length