I have a 32,768 KB encrypted .bin file that I need the entropy of. I am using Perl, as it's part of a larger project.
I have so far used the following 'technique':
use Shannon::Entropy qw/entropy/;

my $file = "test.bin";
open(my $bin, "<", $file) or die $!;
binmode $bin;
seek($bin, 0x000000, 0);
read($bin, my $entropy, 0x01FFFFF0);    # read ~32 MB into one string
print entropy($entropy);
This effectively never finishes; I gave up after 30+ minutes.
I cannot deviate from testing the entire file's entropy.
Is there any quicker way? Would splitting the file, computing the entropy of each piece, and combining the results with some math give the same entropy as if it were one file?
Here is the entropy function rewritten to avoid all the map calls:
sub entropy {
    my ($entropy, $len, $p, %t) = (0, length $_[0]);

    # Build a frequency table of every byte in the string
    $t{$_}++ foreach split '', $_[0];

    # Sum -p * log(p) over the observed byte frequencies
    foreach (values %t) {
        $p = $_ / $len;
        $entropy -= $p * log $p;
    }

    # Convert from natural log to bits
    return $entropy / log 2;
}
It may work out faster for you.
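For reference, a minimal way to call it on the whole file might look like the following. This assumes the entropy sub above is in scope, and it still slurps the entire file into memory, so it keeps the original memory cost:

use strict;
use warnings;

my $file = 'test.bin';
open my $bin, '<:raw', $file or die "Can't open $file: $!";

# Slurp the whole file into one string and score it
my $data = do { local $/; <$bin> };
close $bin;

printf "%.6f bits/byte\n", entropy($data);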
I've had second thoughts about this. You don't actually need to slurp the file into memory: $len is the length of the file, which can be got from -s $file_name, and %t is the frequency table, which can be built by reading in a block at a time. So a version of the function that calculates the entropy of a file would be:
sub file_entropy {
    my ($file_name) = @_;

    # Get the number of bytes in the file
    my $len = -s $file_name;

    my ($entropy, %t) = 0;

    open my $file, '<:raw', $file_name
        or die "Can't open $file_name: $!\n";

    # Read the file 1024 bytes at a time to build the frequency table
    while ( read $file, my $buffer, 1024 ) {
        $t{$_}++ foreach split '', $buffer;
    }
    close $file;

    # Sum -p * log(p) over the byte frequencies
    foreach (values %t) {
        my $p = $_ / $len;
        $entropy -= $p * log $p;
    }

    # Convert from natural log to bits
    return $entropy / log 2;
}
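Called on the example file, that is just print file_entropy('test.bin');.

As for the question about splitting the file: per-chunk entropy values cannot be recombined with simple arithmetic, but per-chunk byte counts can, and entropy computed from the merged counts is exactly the whole-file entropy, because it depends only on the total count of each byte, not on where the chunk boundaries fall. Here is a minimal sketch of that idea; the 1 MB chunk size and the sub names chunk_counts and entropy_from_chunks are illustrative, not part of Shannon::Entropy:

use strict;
use warnings;

# Count byte frequencies in one chunk of data
sub chunk_counts {
    my ($chunk) = @_;
    my %t;
    $t{$_}++ foreach split '', $chunk;
    return \%t;
}

# Merge the per-chunk frequency tables, then compute the entropy once.
# Chunk boundaries make no difference because only the totals matter.
sub entropy_from_chunks {
    my @tables = @_;
    my %total;
    my $len = 0;
    for my $t (@tables) {
        while ( my ($byte, $count) = each %$t ) {
            $total{$byte} += $count;
            $len          += $count;
        }
    }
    my $entropy = 0;
    foreach (values %total) {
        my $p = $_ / $len;
        $entropy -= $p * log $p;
    }
    return $entropy / log 2;
}

# Split the file into 1 MB chunks, count each, merge, then score
my @tables;
open my $fh, '<:raw', 'test.bin' or die "Can't open test.bin: $!";
while ( read $fh, my $buffer, 1024 * 1024 ) {
    push @tables, chunk_counts($buffer);
}
close $fh;
printf "%.6f bits/byte\n", entropy_from_chunks(@tables);

Note that simply averaging the per-chunk entropy values would not give the same number; the merge has to happen on the counts.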