Want to add random string to identifier line in fasta file


I want to add a random string to the existing identifier line in a FASTA file, so that I get:

MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

The sequence should then follow on the next lines as normal. I am having a problem, I think with the output format. This is what I get:

MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac

It's added to every line. (I shortened the strings to fit here.) I want to add it only to the identifier line.

This is what I have so far:

use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

my $header_line;
my $seq;
my $uniqueID;

open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");

while( <$fh> ){
    if ($_ =~ m/^(\S+)\s+(.*)/) {
        $header_line = $1;
        $seq = $2;
        $uniqueID = $currentId++;
        print $out_fh "$header_line$uniqueID\n$seq";
    } # if
} # while

close $fh;
close $out_fh;

Thanks very much, any ideas will be greatly appreciated.


Solution

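  • Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file, not just the identifier lines. For instance, on a sequence line \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT; the newline at the end of the line matches \s+; and .* matches the empty string. The if block therefore runs for sequence lines as well, and each one gets an ID appended.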

    Here's how I would write your solution. It simply appends $current_id to the end of any line that contains a pipe | character.

    use strict;
    use warnings;
    use 5.010;
    use autodie;
    
    my ($filename) = @ARGV;
    
    my $current_id = 'a' x 57;
    
    open my $in_fh,  '<', $filename;
    open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";
    
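    while ( my $line = <$in_fh> ) {
        chomp $line;
        # tr/|// counts pipe characters without modifying the line
        $line .= $current_id if $line =~ tr/|//;
        # write to the output file rather than STDOUT
        print $out_fh $line, "\n";
    }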
    
    close $out_fh;
    

    output

    MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    CTTCATCGCACATGGATAACTGTGTACCTGACT
    TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
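
The code above appends the same string to every identifier line. The question's own code used $currentId++, which suggests each identifier should get a different suffix. Below is a minimal sketch of that variant, keeping the same assumption as above that only identifier lines contain a pipe. The sub name add_unique_ids, the shortened starting ID and the second sample record (MMETSP0260|Hypothetical) are made up for illustration and do not come from the original data.

```perl
use strict;
use warnings;

# Append a different suffix to each identifier line; leave sequence lines untouched.
# Assumption: only identifier lines contain a pipe '|' character.
sub add_unique_ids {
    my ( $start_id, @lines ) = @_;
    my $id = $start_id;
    my @out;
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ tr/|// ) {
            # Post-increment returns the current ID, then Perl's string
            # auto-increment advances it: 'aaaaa' -> 'aaaab' -> 'aaaac' ...
            $line .= $id++;
        }
        push @out, $line;
    }
    return @out;
}

# Hypothetical sample input: two records, one sequence line each.
my @sample = (
    "MMETSP0259|AmphidiniumCMP1314\n",
    "CTTCATCGCACATGGATAACTGTGTACCTGACT\n",
    "MMETSP0260|Hypothetical\n",
    "TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT\n",
);

print "$_\n" for add_unique_ids( 'a' x 5, @sample );
```

With this sample the first identifier line ends in aaaaa and the second in aaaab, while the sequence lines are printed unchanged. To process a real file, the same test and increment can go inside the while loop shown above.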