Extract only the first sequence from a FASTA file


I want to extract only the first sequence from a FASTA file that contains multiple sequences. I have the code below, but I can't get the loops to work together correctly.

while (my $line = <$in_fh>) {
    chomp $line;
    for (my $i = 1; $i <= 1; $i++) {
        print $out_fh $line;
    }
}

close $out_fh;

I think it's getting mixed up in the while loop, but no matter what I try, the output isn't correct. For example, I tried moving the for loop outside the while loop, but that didn't work either. Is it the type of loop? Thanks very much for any pointers.


Solution

  • Every FASTA record header starts with > and sequence lines should never contain that character, so it is safe to keep reading lines until you reach the second line that starts with >.

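    my $line = <$in_fh>;
    # The first line is the header of the first record: print it unconditionally
    print {$out_fh} $line if defined $line;

    while ($line = <$in_fh>) {
        # A line starting with ">" is the header of the second record: stop here
        last if $line =~ /^>/;
        # Anything else is still part of the first record's sequence
        print {$out_fh} $line;
    }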
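The same idea can also be written as a small, self-contained sketch that counts header lines instead of special-casing the first line. The helper name `extract_first_record` and the sample sequences are hypothetical, used only for illustration. It assumes, as above, that headers start with > and sequence lines never do. As a design choice, it also skips anything before the first header (for example, leading blank lines) and writes to whatever output handle you pass in, such as the `$out_fh` from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: copy only the first FASTA record from $in_fh to $out_fh.
# Assumes every header line starts with ">" and sequence lines never do.
sub extract_first_record {
    my ($in_fh, $out_fh) = @_;
    my $headers_seen = 0;
    while (my $line = <$in_fh>) {
        # Count header lines; the second one marks the start of record #2
        $headers_seen++ if $line =~ /^>/;
        last if $headers_seen > 1;
        # Skip anything that appears before the first header
        next unless $headers_seen;
        print {$out_fh} $line;
    }
    return;
}

# Usage example with in-memory filehandles and made-up sample data
my $fasta = ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n";
open my $in,  '<', \$fasta or die "Cannot open input: $!";
my $first = '';
open my $out, '>', \$first or die "Cannot open output: $!";
extract_first_record($in, $out);
close $out;
print $first;
```

With real files you would open `$in` and `$out` on file paths instead of in-memory strings. The loop logic stays the same.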