perl bioinformatics fasta

How do I remove the first line from a FASTA format file during input?


I want to remove the first line during input from a FASTA file, so that my program takes only the amino acid sequence as input.

The first line of a FASTA file starts with > and contains the accession number of the sequence and its source. E.g.:

>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken    
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID 
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA 
DIDGDGQVNYEEFVQMMTAK*

Solution

  • Skip lines starting with >:

    while(<>) {
        next if /^>/;
        # ...
    }
    

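    Or use $. (the current input line number) to skip only the first line. This is enough when the input is a single file holding one sequence; with several records, or several files read through <>, the later header lines are not skipped, because $. keeps counting across files: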

    while(<>) {
        next if $. < 2;
        # ...
    }
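
To show what could go in place of the `# ...` placeholder, here is a minimal sketch that collects the residues into one string. The helper name `read_sequence` is hypothetical, and the sketch assumes the trailing `*` in the example is a terminator you want dropped. If the input holds several records, their sequences are simply concatenated.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA text from a filehandle and return
# the sequence with header lines, whitespace and a final '*' removed.
sub read_sequence {
    my ($fh) = @_;
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;   # skip header line(s)
        $line =~ s/\s+//g;       # drop the newline and any padding spaces
        $seq .= $line;
    }
    $seq =~ s/\*$//;             # drop the trailing '*' terminator, if present
    return $seq;
}

# Run as a script with a file name argument: print the bare sequence.
if (!caller && @ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print read_sequence($fh), "\n";
    close $fh;
}
```

Taking a filehandle rather than a file name keeps the helper usable with STDIN or an in-memory string as well as a file on disk.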