
Hash (Multihash?) Indexing (Perl)


I have a function that counts the frequencies of trigrams in a text. No knowledge of computational linguistics is required; I just need help with the Perl code.

This is the function:

sub extract_frequencies {
    for( my $i=0; $i<=$#tag; $i++ ) {
       $wordtagfreq{"$word[$i]\t$tag[$i]"}++;
       $tagfreq{$tag[$i]}++;
    }

    # count tag trigram frequencies
    my @start = ("<s>","<s>");
    unshift @tag, @start;  # corrected
    push @tag, "<s>";
    for( my $i=2; $i<=$#tag; $i++ ) {
        $ngramfreq[3]{"$tag[$i-2]\t$tag[$i-1]\t$tag[$i]"}++;
    }
 } 

The particular code points that I do not understand are the following:

1) $ngramfreq[3]

What does the index on the hash mean here? Am I counting for each tag separately? Is it the length of the key? What is my final key (three different tag keys?)?

2) $i<=$#tag

What does $# in Perl mean?

Haven't used Perl in a while, so I hope some Perl Monks will help me.


Solution

  • [3] is an array index; it has nothing to do with a hash key. This implies that @ngramfreq is actually an array of hashes (strictly, an array of hash references):

    my @ngramfreq = (
                         { tag => 1, fish => 3 },
                         { anothertag => 4 } 
                    );
    

    And thus $ngramfreq[0] gets you the first anonymous hash, and $ngramfreq[0]{tag} then accesses the value stored under the key tag (1 in this example).
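
    Applied to the loop in the question, a minimal sketch might look like the following. The tag values are made up for illustration, and reading the index 3 as "n-gram length" is only a guess based on the variable name. The point is that slot 3 of the array holds one hash, and each key in that hash is a single string built from three tab-separated tags:

    ```perl
    use strict;
    use warnings;

    # Hypothetical tag sequence, already padded with the <s> markers.
    my @tag = ('<s>', '<s>', 'DET', 'NOUN', 'VERB', '<s>');
    my @ngramfreq;

    for my $i (2 .. $#tag) {
        # Slot 3 springs into existence as a hash reference on first use
        # (autovivification); the key is one string made of three tags.
        $ngramfreq[3]{"$tag[$i-2]\t$tag[$i-1]\t$tag[$i]"}++;
    }

    # Slots 0..2 were never assigned, so they are still undef.
    my $key = join "\t", '<s>', '<s>', 'DET';
    print "distinct trigrams: ", scalar(keys %{ $ngramfreq[3] }), "\n";
    print "count for first trigram: $ngramfreq[3]{$key}\n";
    ```

    So there is one key per trigram, not three separate tag keys.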

    $#tag is the last index in the array @tag. So with 3 elements, it would be 2, because the array indices are 0, 1, 2.
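
    A short sketch of the difference between the last index and the element count, using a made-up three-element array:

    ```perl
    use strict;
    use warnings;

    my @tag = ('DET', 'NOUN', 'VERB');   # hypothetical tags

    my $last_index = $#tag;         # index of the last element
    my $count      = scalar @tag;   # number of elements

    print "last index: $last_index, count: $count\n";

    # These two loops visit the same indices.
    for (my $i = 0; $i <= $#tag; $i++) { print "$i: $tag[$i]\n" }
    for my $i (0 .. $#tag)             { print "$i: $tag[$i]\n" }

    # For an empty array, $# is -1, so neither loop form would run at all.
    my @empty;
    print "last index of empty array: ", $#empty, "\n";
    ```

    The C-style condition $i<=$#tag in the question is therefore equivalent to $i < @tag.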

    Data::Dumper is a good way of visualising a structure, to give you an idea of how it's laid out.
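
    For example, a rough sketch of dumping a structure shaped like the one in the question (the two trigram keys are invented for the demonstration):

    ```perl
    use strict;
    use warnings;
    use Data::Dumper;

    # Sort hash keys so the dump is stable between runs.
    $Data::Dumper::Sortkeys = 1;

    my @ngramfreq;
    $ngramfreq[3]{"<s>\t<s>\tDET"}++;
    $ngramfreq[3]{"<s>\tDET\tNOUN"}++;

    # Pass a reference so the whole array is dumped as one structure.
    my $dump = Dumper(\@ngramfreq);
    print $dump;
    ```

    The dump shows an array whose first three slots are undef and whose fourth slot is a hash of trigram counts.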

    perldoc perldsc is worth a read, as it expands on data structures.