Perl6: Confused about BagHash/Matching


I'm attempting to count matches of a regex using a BagHash, and getting odd results.

my $fh = open "versions.txt";
my $versions = BagHash.new();

while (defined my $line = $fh.get) {
    my $last = '';
    if $line ~~ /(\d+)\.?(\d*)/ {
        say 'match ' ~ $/[0];
        if $last !eq  $/[0] {
            say 'not-same: ' ~ $/[0];
            $versions{$/[0]}++
        }
        $last = $/[0];
    }
    else {
        $last = '';
    }

}

say 'count: ' ~ $versions.elems;

Output is:

match 234
not-same: 234
match 999
not-same: 999
count: 1 # I expect 2 here. 

The test case I'm working with is:

version history thingy

version=234.234
version=999

What am I missing?


Solution

  • You are resetting $last on every iteration, because it is declared inside the loop body. Also, don't trust say. It calls .gist, which is meant for human-readable output and truncates long or infinite lists so that they don't flood a terminal or logfile. Use dd (a Rakudo-specific debugging routine) or a module to dump debug output. If you had used dd, you would have seen that $/[0] contains a Match, a complex structure that is not suited to serve as a Hash key.

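    # my @lines = 'versions.txt'.IO.lines;
    my @lines = ('version=234.234', 'version=999');
    my BagHash $versions.=new;
    for @lines {
        # state initializes $last once, so its value survives across iterations
        state $last = '';
        if .Str ~~ /(\d+) '.'? (\d*)/ {
            # $0 is a Match; stringify it before using it as a key
            $versions{$0.Str}++ if $last ne $0.Str;
            $last = $0.Str
        } else {
            $last = ''
        }
    };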
    
    dd $versions;
    # OUTPUT«BagHash $versions = ("234"=>1,"999"=>1).BagHash␤»
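
    To see why the raw capture makes a poor key, it may help to inspect it directly. The following is only a small sketch: it reuses one line from the test case in the question, and the variable names $line and $match are made up for illustration.

    ```raku
    my $line  = 'version=234.234';              # sample line from the question's test case
    my $match = $line ~~ /(\d+) '.'? (\d*)/;    # smartmatch returns the Match object

    say $match[0].^name;       # Match (the capture is an object, not a string)
    say $match[0].Str.^name;   # Str   (explicit stringification)
    say ~$match[0];            # 234   (the ~ prefix stringifies as well)
    ```

    Stringifying the capture, with .Str or the ~ prefix, gives a plain Str that is safe to use as a key.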
    

    The whole point of BagHash is that its constructor will do the counting for you. If you supply lazy lists all the way down, this can be fairly efficient.

    my @lines = ('version=234.234', 'version=999');
    dd BagHash.new(@lines».split('=')».[1]);
    # OUTPUT«("234.234"=>1,"999"=>1).BagHash␤»
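
    If the goal is to count major versions only, as in the question, the constructor approach could be combined with the regex. This is a minimal sketch rather than a definitive solution: it assumes the test file from the question plus one hypothetical extra line (version=234.1) so that one count rises above 1, and the name @majors is made up for illustration.

    ```raku
    # Lines from the question's test case, plus one hypothetical extra entry
    my @lines = 'version history thingy', '', 'version=234.234',
                'version=234.1', 'version=999';

    # Keep the stringified major version of every matching line;
    # returning Empty makes non-matching lines vanish from the result.
    my @majors = @lines.map({
        my $m = .match(/'version=' (\d+)/);
        $m ?? ~$m[0] !! Empty
    });

    # The constructor does the counting
    my $versions = BagHash.new(@majors);

    say $versions.elems;     # 2 (distinct major versions)
    say $versions{'234'};    # 2
    say $versions{'999'};    # 1
    ```

    The lookups use quoted string keys on purpose: $versions<234> would look up an IntStr allomorph, which is a different key from the Str "234" stored in the bag.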