
Perl: Out of memory error while building a 2D array at run time


I am a Perl beginner. I am trying to build a 2D array at run time from a binary file, and I am getting an "out of memory" error. I am using Perl 5.16.3 on Windows 7. My input file is ~4.2MB. My system has 4GB of physical memory; when I run this code, usage climbs to 90% and then the out of memory error appears.

I tried many ways to debug this. The code runs successfully only if I reduce b32 to b16 or less, and even then the error comes back once the file grows beyond 4MB. Watching physical memory usage in Task Manager while the code runs, I can see that it keeps increasing.

My friend suspects a memory leak, but I couldn't confirm that. I need help fixing this.

#!/usr/bin/perl
use strict;
use warnings;

open( DATA, 'debug.bin' ) or die "Unable to open:$!";
binmode DATA;
my ( $data, $n, $i );
my @2dmatrix;
while ( $n = read DATA, $data, 4 ) {
    push @2dmatrix, [ split( '', unpack( 'b32', $data ) ) ];
}
print scalar(@2dmatrix);
print "completed reading";
close(DATA);

Just to clarify the requirement: from the 2D array I build, I need to extract the contents of a column A corresponding to a particular pattern (11111111000000001111111100000000) in column B. This needs to be done on 4 sets of columns with a file size of 500MB.


Solution

  • It's not a memory leak; your program is just very inefficient in its memory use.

    For every 4 bytes you read in, you do an unpack 'b32', which creates a 32-character string; split // it, which turns it into 32 one-character strings; make an arrayref of the resulting list; and push the arrayref onto @2dmatrix. For each 4 bytes of input, that results in:

    • 32 string bodies, each at least 2 bytes (for "0\0" or "1\0"), although perl might decide to use more to avoid reallocations if the strings grow: 64 bytes.
    • 32 SVPVs (scalar variables containing strings, 28 bytes each on 32-bit, 40 bytes each on 64-bit): 896 or 1280 bytes.
    • 1 array body with 32 entries: 128 bytes on 32-bit, 256 bytes on 64-bit.
    • 1 AV (array variable): 28 bytes on 32-bit, 40 bytes on 64-bit.
    • 1 SVRV (scalar containing a reference): 16 bytes on 32-bit, 24 bytes on 64-bit.
    • 1 entry in @2dmatrix's array body: 4 bytes on 32-bit, 8 bytes on 64-bit.

    That adds up to 1136 bytes per 4 bytes of input (284x multiplication) on 32-bit and 1672 bytes per 4 bytes of input (418x multiplication) on 64-bit, not accounting for constant factors or for the fact that perl might choose to use larger string bodies (on two versions of perl I tested here, I got either 10 or 16 bytes, not 2). As such, your program will use upwards of 1.1GB of memory for a 4.2MB input on a 32-bit system, and upwards of 1.7GB of memory for a 4.2MB input on a 64-bit system.
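    The totals can be re-derived with a few lines of arithmetic. This is only a sketch that adds up the per-item figures listed above; the hash layout and the bytes_per_row name are made up for illustration, and the sizes are the ones quoted in this answer, not values measured from any particular perl build.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Per-item sizes in bytes, copied from the list above: [32-bit, 64-bit].
    my %size = (
        string_body => [ 2,  2 ],    # minimum body for "0\0" or "1\0"
        svpv        => [ 28, 40 ],
        array_slot  => [ 4,  8 ],    # one pointer per array element
        av          => [ 28, 40 ],
        svrv        => [ 16, 24 ],
    );

    # Estimated bytes used by one 32-element row built from 4 input bytes.
    sub bytes_per_row {
        my ($arch) = @_;             # 0 = 32-bit, 1 = 64-bit
        my $cols = 32;
        return $cols * $size{string_body}[$arch]    # 32 string bodies
             + $cols * $size{svpv}[$arch]           # 32 SVPVs
             + $cols * $size{array_slot}[$arch]     # the row's array body
             + $size{av}[$arch]                     # the row's AV
             + $size{svrv}[$arch]                   # the reference to the row
             + $size{array_slot}[$arch];            # slot in the outer array
    }

    for my $arch ( 0, 1 ) {
        my $row = bytes_per_row($arch);
        printf "%s-bit: %d bytes per row, %dx the input size\n",
            ( $arch ? 64 : 32 ), $row, $row / 4;
    }
    ```

    Multiplying the resulting factor by the 4.2MB input gives the rough lower bounds quoted above.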

    The solution here is to store and access the data in a more efficient way, but I can't give any specific advice because you haven't said what you're actually trying to do with @2dmatrix once you have it.
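    As a generic illustration of what "more efficient" could look like, and not something tailored to a specific use: one option is to keep the raw bytes in a single string and read individual bits on demand with vec, whose bit order matches the 'b' format of pack/unpack. The sketch below assumes 4-byte rows as in the question; bit_at and row_bits are hypothetical helper names, and the sample buffer is built inline rather than read from debug.bin.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Stand-in for the file contents: three 4-byte rows. In a real program
    # this string would be read from the binary file (assumption).
    my $buf = pack 'b32 b32 b32',
        '11111111000000001111111100000000',
        '10000000000000000000000000000001',
        '00000000000000000000000000000000';

    my $cols = 32;                             # bits per row
    my $rows = length($buf) * 8 / $cols;       # number of complete rows

    # Hypothetical helper: the bit at ($row, $col). This is the same value
    # the original code would store in $2dmatrix[$row][$col].
    sub bit_at {
        my ( $row, $col ) = @_;
        return vec( $buf, $row * $cols + $col, 1 );
    }

    # Hypothetical helper: one row as a single 32-character string,
    # produced only when it is needed.
    sub row_bits {
        my ($row) = @_;
        return unpack 'b32', substr( $buf, $row * 4, 4 );
    }

    print "rows: $rows\n";
    print "row 1, col 0:  ", bit_at( 1, 0 ),  "\n";
    print "row 1, col 31: ", bit_at( 1, 31 ), "\n";
    print "row 0: ", row_bits(0), "\n";
    ```

    Stored this way, the data takes roughly as much memory as the file itself, since there is one string in total instead of 32 scalars per row. Whether bit-level or row-level access is the better fit depends on what is done with the data afterwards.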