Calculate Character Frequency in Message using Perl


I am writing a Perl script to count how often each character occurs in a message. Here is the logic I am following:

  • Read one char at a time from the message using getc() and store it into an array.
  • Run a for loop starting from index 0 to the length of this array.
  • This loop will read each char of the array and assign it to a temp variable.
  • Run another for loop nested in the above, which will run from the index of the character being tested till the length of the array.
  • Using a string comparison between this character and the current array indexed char, a counter is incremented if they are equal.
  • After the inner for loop completes, I print the frequency of the char for debugging purposes.

Question: I don't want the program to recompute the frequency of a character once it has already been calculated. For instance, if the character "a" occurs 3 times, the first pass calculates the correct frequency. However, at the next occurrence of "a", since the inner loop runs from that index to the end, the reported frequency is (actual freq - 1). Similarly, for the third occurrence, the frequency is (actual freq - 2).
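
The effect described above can be reproduced in isolation. The following is a minimal sketch, not the original script: `count_from` is a hypothetical helper that counts matches from a given index to the end of the array, mirroring the inner loop, and the string "banana" is just sample input.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many elements from index $start to the end
# of @$chars equal the element at $start (same idea as the inner loop).
sub count_from {
    my ($chars, $start) = @_;
    my $count = 0;
    for my $k ($start .. $#$chars) {
        $count++ if $chars->[$k] eq $chars->[$start];
    }
    return $count;
}

my @chars = split //, "banana";

# "a" sits at indices 1, 3 and 5; each later start sees one fewer match.
for my $i (1, 3, 5) {
    print "start=$i count=", count_from(\@chars, $i), "\n";
}
```

Starting at the first "a" gives 3, at the second 2, and at the third 1, which is the (actual freq - 1), (actual freq - 2) pattern.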

To solve this, I used another temp array, to which I push each char whose frequency has already been evaluated.

Then, on the next iteration of the outer for loop, before entering the inner for loop, I compare the current char with the array of evaluated chars and set a flag. The inner for loop runs only if that flag is not set.

This is not working for me; I still get the same results.

Here's the code I have written to accomplish the above:

#!/usr/bin/perl

use strict;
use warnings;

my $input=$ARGV[0];
my ($c,$ch,$count,$flag,$s,@arr,@temp);

open(INPUT,"<$input") or die "Failed to open $input: $!";

while(defined($c = getc(INPUT)))
{
push(@arr,$c);
}

close(INPUT);

my $length=$#arr+1;

for(my $i=0;$i<$length;$i++)
{
    $count=0;
    $flag=0;
    $ch=$arr[$i];
    foreach $s (@temp)
    {
        if($ch eq $s)
        {
            $flag = 1;
        }
    }
    if($flag == 0)
    {
        for(my $k=$i;$k<$length;$k++)
        {
            if($ch eq $arr[$k])
            {
                $count = $count+1;
            }
        }
        push(@temp,$ch);
        print "The character \"".$ch."\" appears ".$count." number of times in the message"."\n";
    }
}

Solution

  • If you want to count a single character across the whole file, then use any of the methods suggested by the others. If you want a count of all the occurrences of each character in a file, then I propose:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    # read in the contents of the file
    my $contents;
    open(TMP, "<$ARGV[0]") or die ("Failed to open $ARGV[0]: $!");
    {
        local($/) = undef;
        $contents = <TMP>;
    }
    close(TMP);
    
    # split the contents around each character
    my @bits = split(//, $contents);
    
    # build the hash of each character with its respective count
    my %counts = map { 
        # use lc($_) to make the search case-insensitive
        my $foo = $_; 
    
        # filter out newlines
        $_ ne "\n" ? 
            ($foo => scalar grep {$_ eq $foo} @bits) :
            () } @bits;
    
    # reverse sort (highest first) the hash values and print
    foreach(reverse sort {$counts{$a} <=> $counts{$b}} keys %counts) {
        print "$_: $counts{$_}\n";
    }
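
As an alternative to the map/grep approach above, which rescans the whole character list once for every character, a hash can be filled in a single pass. This is a minimal sketch under assumptions: `char_frequencies` is a hypothetical name, and the input is taken as a string rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference mapping each character of
# $text (newlines skipped) to the number of times it occurs.
sub char_frequencies {
    my ($text) = @_;
    my %counts;
    for my $ch (split //, $text) {
        next if $ch eq "\n";   # filter out newlines, as in the answer above
        $counts{$ch}++;        # a missing key starts at undef; ++ makes it 1
    }
    return \%counts;
}

my $counts = char_frequencies("hello world\n");

# Highest count first; ties are broken alphabetically so the order is stable.
for my $ch (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
    print "$ch: $counts->{$ch}\n";
}
```

Because each character is visited once and the hash key itself records whether a character has been seen, no separate array of already-evaluated characters or flag is needed.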