Count the occurrences of each two-character pair in a string in Perl


Is there a way in Perl (not BioPerl) to count the occurrences of each pair of consecutive letters?

That is, the number of AA, AC, AG, AT, CC, CA, ... in a sequence like this:

$sequence = 'AACGTACTGACGTACTGGTTGGTACGA';

PS: We can do this manually with a regular expression, e.g. $GC = ($sequence =~ s/GC/GC/g), which returns the number of GC occurrences in the sequence.

I need an automated and generic way.


Solution

  • You had me confused for a while, but I take it you want to count the dinucleotides in a given string.

    Code:

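    # Dinucleotides to count (a subset; extend the list to cover all 16)
    my @dinucs = qw(AA AC AG CC CA CG);
    my %count;
    my $sequence = 'AACGTACTGACGTACTGGTTGGTACGA';
    
    for my $dinuc (@dinucs) {
        # s///g returns the number of substitutions made, or an empty
        # string when there were none; replacing each match with itself
        # leaves $sequence unchanged.
        $count{$dinuc} = ($sequence =~ s/\Q$dinuc\E/$dinuc/g);
    }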
    

    Output from Data::Dumper:

    $VAR1 = {
              "AC" => 5,
              "CC" => "",
              "AG" => "",
              "AA" => 1,
              "CG" => 3,
              "CA" => ""
            };
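
The code above counts only the dinucleotides listed in @dinucs. Because s///g resumes after each match, it also counts non-overlapping occurrences (AAA would give one AA, not two). If a generic count of every overlapping pair is wanted, one possible sketch is to slide a two-letter window along the string with substr and tally each window in a hash. The helper name count_kmers and its $k parameter are hypothetical, introduced here only for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping window of $k letters.
# Returns a hash reference mapping each window seen to its count.
sub count_kmers {
    my ($sequence, $k) = @_;
    my %count;
    # Slide the window one position at a time; the range is empty
    # when the sequence is shorter than $k.
    for my $i (0 .. length($sequence) - $k) {
        $count{ substr($sequence, $i, $k) }++;
    }
    return \%count;
}

my $sequence = 'AACGTACTGACGTACTGGTTGGTACGA';
my $count    = count_kmers($sequence, 2);

# Report all 16 dinucleotides, printing 0 for pairs that never occur.
# (The // operator needs Perl 5.10 or later.)
for my $first (qw(A C G T)) {
    for my $second (qw(A C G T)) {
        my $dinuc = $first . $second;
        printf "%s => %d\n", $dinuc, $count->{$dinuc} // 0;
    }
}
```

For the sample sequence this agrees with the counts shown above for AC (5), CG (3) and AA (1), and pairs that never occur are reported as 0 rather than an empty string. The same helper should work for longer windows, for example count_kmers($sequence, 3) for trinucleotides.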