regex perl pattern-matching fasta dna-sequence

How can I find matching header lines in my file and concatenate the lines that follow them, using Perl?


My multi-FASTA file is in this format:

>miRNA65 dvex2345
CGATGCTAGATGCTATGACAACGATGCCTCG-G
>miRNA60 dvex1234
T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
>miRNA65 dvex2345
T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG

I am new to Perl. I need to find the identical '>' header lines and concatenate the lines that follow them, so that the sequences are joined under a single header.

I'm expecting the following output for the above file:

>miRNA60 dvex1234
T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
>miRNA65 dvex2345
T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG.CGATGCTAGATGCTATGACAACGATGCCTCG-G

What is the best way to get this done?


Solution

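    use strict;
    use warnings;

    my %hash;
    while (my $header = <DATA>) {
            # Header lines look like ">miRNA65 dvex2345"; group records by miRNA ID.
            next unless $header =~ /^>(miRNA\d+)/;
            my $id = $1;
            $hash{$id}[0] = $header;             # keep the header line, newline included
            chomp(my $seq = <DATA>);             # the sequence is on the following line
            unshift @{ $hash{$id}[1] }, $seq;    # latest sequence goes first
    }

    for my $k (sort keys %hash) {
            # Join with '.' to match the expected output shown in the question.
            print $hash{$k}[0], join('.', @{ $hash{$k}[1] }), "\n";
    }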
    __DATA__
    >miRNA65 dvex2345
    CGATGCTAGATGCTATGACAACGATGCCTCG-G
    >miRNA60 dvex1234
    T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
    >miRNA65 dvex2345
    T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG