Convert a string into an array in Perl


I have a script which takes headers of a multi-fasta file and pushes them into an array. Then I want to loop through this array to find a specific pattern and perform some commands.

open(FH, '<', $ref_seq) or die $!;
while(<FH>){

    $line = $_;
    chomp $line;
    if(m/^>([^\s]+)/){
        $ref_header = $1;
        print "$ref_header\n";
        chomp $header;
        if($1 eq $header){
            $ref_header = $header;
            #print "header is $ref_header\n";
        } 
    } 
}

This code prints headers like

chr1
chr2
chr3

How can I push these headers into an array?

I tried the following code, but it splits the string into individual letters instead of making $header_array[0] equal to chr1:

@header_array = split(/\n*/, $ref_header);
print("Here's the first element $header_array[0]");

Any help will be appreciated.


Solution

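  • Shorten the code as shown below by removing the extra statements, and use push to collect the headers. You can combine push with the pattern match: in list context, a successful match returns the captured text and a failed match returns an empty list, so only header lines add an element to the array: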

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use Carp;
    
    my $in_file = shift;
    my @headers;
    
    open my $in_fh, '<', $in_file or croak "cannot open $in_file: $!";
    while ( <$in_fh> ) {
        push @headers, /^>(\S+)/;
    }
    close $in_fh or croak "cannot close $in_file: $!";
    
    print "@headers\n";
    
    # Now, loop through headers and select the ones you need, for example:
    
    for my $header ( @headers ) {
        if ( $header =~ /foo/ ) {
            # do something
        }
    }
    

    A few suggestions for fixing your original code are below:

    # Always use strict and use warnings.
    
    # Remove extra parens and make the error message more informative:
    open(FH, '<', $ref_seq) or die $!;
    while(<FH>){
    
        $line = $_;
        chomp $line;
        # [^\s] is simply \S:
        if(m/^>([^\s]+)/){
            $ref_header = $1;
            print "$ref_header\n";
            # where is $header coming from?
            chomp $header;
            # if the condition is satisfied, this assignment does not make sense:
            # $ref_header is already the same as $header:
            if($1 eq $header){
                $ref_header = $header;
                #print "header is $ref_header\n";
            } 
        } 
    }
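
To tie the points above together, here is a minimal sketch rather than a definitive fix. The helper name collect_headers, the sample sequences, and the in-memory filehandle are assumptions made for illustration and are not part of the original script. It collects the headers with push, shows why split(/\n*/, ...) produced single letters (a pattern that can match the empty string makes split break the string between every character), and selects headers with grep:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read FASTA text from an
# open filehandle and return the header names, one per record.
sub collect_headers {
    my ($fh) = @_;
    my @headers;
    while ( my $line = <$fh> ) {
        chomp $line;
        # In list context a successful match returns the captured group and
        # a failed match returns an empty list, so only header lines add
        # an element.
        push @headers, $line =~ /^>(\S+)/;
    }
    return @headers;
}

# Made-up sample data standing in for the multi-FASTA file.
my $fasta = ">chr1 first record\nACGT\n>chr2\nGGCC\n>chr3\nTTAA\n";

# Open the string as an in-memory file so the sketch needs no input file.
open my $in_fh, '<', \$fasta or die "cannot open in-memory file: $!";
my @header_array = collect_headers($in_fh);
close $in_fh or die "cannot close in-memory file: $!";

print "Here's the first element: $header_array[0]\n";

# Why the split attempt gave single letters: /\n*/ can match the empty
# string, so split breaks the string between every character.
my @letters = split /\n*/, 'chr1';
print "split gave: @letters\n";

# Select headers by pattern with grep instead of an explicit loop.
my @wanted = grep { /^chr[12]$/ } @header_array;
print "selected: @wanted\n";
```

Run as is, this should print chr1 as the first element, then c h r 1 for the split call, then chr1 chr2 for the selection. For a real file, replace \$fasta in the open call with the path to the file.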