
Finding certain letter sequences in a string of letters


I have a string of letters and I need to find certain letter sequences, e.g. BAENNN (where N can be any letter of the alphabet) or BAEMOP, together with the position where each sequence ends. So the output should be the sequence and the position where it ends. The string can contain several such sequences at different positions.

This is what I have so far:

#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;

my $string = a long string of letters
if $string =~ m/regex/; {
    print the repeat and the position where that letter sequence ends. 

What regular expression would I need to put in? I would think it would be something like:

m/(BAE[A-Z][A-Z][A-Z] | BAEMOP)/;
   print $1 

And then something with the pos() function. But I only get one value.

Thank you guys for the help!


Solution

  • An if statement only runs once. You need a loop if you want to match multiple times. You also need the /g modifier, so that each match starts where the previous one left off; /g is also what makes pos() report the offset where the last match ended.

    Also note that BAEMOP is already matched by the BAENNN pattern (BAE[A-Z]{3}), so it doesn't need its own alternative in the regex.

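    Your idea is right, but the syntax is wrong. Conditions need parentheses (except in postfix modifiers such as print ... if COND), and the semicolon between the condition and the block has to go. Also, whitespace around | is not ignored by the regex unless you use the /x modifier: as written, the first alternative only matches when it is followed by a space, and the second only when it is preceded by one.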
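To see the whitespace problem concretely, here is a small sketch that runs the question's alternation on a made-up test string, once as written and once with /x. The helper name find_all and the test string are hypothetical, chosen only for illustration; the helper simply returns every capture from a list-context m//g.

```perl
use strict;
use warnings;
use feature qw{ say };

# Hypothetical helper: list-context m//g returns the capture of every match.
# A qr// object keeps its own modifiers (e.g. /x) when interpolated.
sub find_all {
    my ($string, $re) = @_;
    return $string =~ /$re/g;
}

my $string = 'XXBAEZZZXXBAEMOPXX';

# Without /x the spaces are literal: "BAE" + 3 letters + a space,
# or a space + "BAEMOP". The test string has no spaces, so nothing matches.
my @literal = find_all($string, qr/(BAE[A-Z][A-Z][A-Z] | BAEMOP)/);

# With /x the spaces are ignored and the alternation works as intended.
my @extended = find_all($string, qr/(BAE[A-Z][A-Z][A-Z] | BAEMOP)/x);

say 'without /x: ', scalar @literal;     # prints 0
say 'with /x:    ', join ', ', @extended; # prints BAEZZZ, BAEMOP
```

Since BAEMOP is also matched by the first alternative, dropping it (as suggested above) gives the same result here.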

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw{ say };
    
    #                      1         2
    #             12345678901234567890123456789
    my $string = 'AAABBBCCCDDDEEEBAEZZZXBAEABCZ';
    
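    # /g in scalar context resumes the search at pos($string) and, after a
    # successful match, sets pos($string) to the offset just past that match.
    while ($string =~ /(BAE[A-Z]{3})/g) {
        say $1, ' at ', pos $string;
    }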
    

    Output:

    BAEZZZ at 21
    BAEABC at 28
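pos returns the zero-based offset just past the match, which is the same number as the one-based position of the sequence's last letter (compare the ruler comment in the code). If you would rather collect the results than print them, a sketch along these lines should work. The sub name find_sequences is hypothetical; it uses $+[0] from the @+ array (documented in perlvar), which holds the end offset of the last successful match and so equals pos here.

```perl
use strict;
use warnings;
use feature qw{ say };

# Hypothetical helper: returns a list of [sequence, end position] pairs.
sub find_sequences {
    my ($string) = @_;
    my @found;
    while ($string =~ /(BAE[A-Z]{3})/g) {
        # $+[0] is the offset just past the whole match (same as pos here),
        # i.e. the 1-based position of the last matched letter.
        push @found, [ $1, $+[0] ];
    }
    return @found;
}

my @found = find_sequences('AAABBBCCCDDDEEEBAEZZZXBAEABCZ');
say "$_->[0] at $_->[1]" for @found;
```

This prints the same two lines as the loop above.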
    

    If the sequences you're looking for can overlap, you'll need look-around assertions. See perlre for details.
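For example, BAEBAEZZZ contains both BAEBAE and BAEZZZ, but a plain /g loop only finds BAEBAE, because the next search starts after it. One way to handle overlaps, sketched below on that made-up string, is to put the capture inside a zero-width lookahead: pos then stays at the start of each sequence, so the end position has to be computed as pos plus the length of the capture.

```perl
use strict;
use warnings;
use feature qw{ say };

my $string = 'BAEBAEZZZ';

my @found;
# The lookahead matches zero characters, so pos($string) is the START of
# each sequence. Perl refuses a second zero-length match at the same
# position, so the search moves on and overlapping matches are found.
while ($string =~ /(?=(BAE[A-Z]{3}))/g) {
    my $end = pos($string) + length($1);    # 1-based position of last letter
    push @found, [ $1, $end ];
    say "$1 at $end";
}

# For comparison: without the lookahead, BAEZZZ is skipped because it
# starts inside BAEBAE.
my @plain = $string =~ /(BAE[A-Z]{3})/g;
say 'plain /g: ', join ', ', @plain;
```

The loop prints BAEBAE at 6 and BAEZZZ at 9, while the plain match only returns BAEBAE.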