Search code examples
perlmultidimensional-arrayhashinitialization

Perl's 2D hash initialization - do we have any faster way?


I am looking for a fast 2D hash initialization. I know some recommended use undef, but I need it to be 1. I have tested 4 different flavors as below (please ignore the naming, as I am bad at names) Surprisingly, for_2d_init_array has the fastest execution.

But, i am still not satisfy with it, as when the $limit=4000, it requires roughly 4-5s for complete initialization. Is there any faster way to do so?

I know there is a bit-vector for such application. For now, I would like to maintain such structure.

timethese(-2, {
       for_2d_original => sub {
                                my %hash;
                                my $limit = 100;
                                for(my $i=0; $i<$limit; $i++){
                        for(my $j=0; $j<$limit; $j++){
                                $hash{$i}{$j}=1;
                                }
                                }
                           },
       for_2d_map => sub {
                                my %hash;
                                my $limit = 100;
                                for(my $i=0; $i<$limit; $i++){
                                        $hash{$i} = {map { $_ => 1 } 0..$limit-1};
                                }
                           },
       for_2d_init_hash => sub {
                                my %hash;
                                my $limit = 100;
                                for(my $i=0; $i<$limit; $i++){
                                        my %tmp_hash;
                                        for(my $j=0; $j<$limit; $j++){
                                                $tmp_hash{$j} = 1;
                                        }
                                        $hash{$i} = \%tmp_hash;
                                }
                           },
       for_2d_init_hash_w_array => sub {
                                my %hash;
                                my $limit = 100;
                                my @array = (0..$limit-1);
                                my @init = (1)x$limit;
                                for(my $i=0; $i<$limit; $i++){
                                        my %tmp_hash;
                                        @tmp_hash{@array} = @init;
                                        $hash{$i} = \%tmp_hash;
                                }
                           },
});

Results: for_2d_original: 3 wallclock secs ( 2.10 usr + 0.00 sys = 2.10 CPU) @ 751.90/s (n=1579) for_2d_map: 2 wallclock secs ( 2.18 usr + 0.00 sys = 2.18 CPU) @ 559.17/s (n=1219) for_2d_init_hash: 2 wallclock secs ( 2.15 usr + 0.00 sys = 2.15 CPU) @ 994.42/s (n=2138) for_2d_init_hash_w_array: 2 wallclock secs ( 2.12 usr + 0.00 sys = 2.12 CPU) @ 1580.19/s (n=3350)


Solution

  • I took a hint from Armali and replace with 2D array and the results is great. I know this is no longer a 2D hash, I consider this structure as fine, as I can just replace {} to [] for the remaining of the legacy code the system have. Thanks all. (initially, it is a variable sizes of hash in each rows. ) For completeness, I included the code and its results.

    timethese(-2, {
       for_2d_original => sub {
                                my %hash;
                                my $limit = 100;
                                for(my $i=0; $i<$limit; $i++){
                        for(my $j=0; $j<$limit; $j++){
                                $hash{$i}{$j}=1;
                                }
                                }
                           },
       for_2d_map => sub {
                                my %hash;
                                my $limit = 100;
                                for(my $i=0; $i<$limit; $i++){
                                        $hash{$i} = {map { $_ => 1 } 0..$limit-1};
                                }
                           },
       for_2d_init_hash => sub {
                                my %hash;
                                my $limit = 100;
                                for(my $i=0; $i<$limit; $i++){
                                        my %tmp_hash;
                                        for(my $j=0; $j<$limit; $j++){
                                                $tmp_hash{$j} = 1;
                                        }
                                        $hash{$i} = \%tmp_hash;
                                }
                           },
       for_2d_init_hash_w_array => sub {
                                my %hash;
                                my $limit = 100;
                                my @array = (0..$limit-1);
                                my @init = (1)x$limit;
                                for(my $i=0; $i<$limit; $i++){
                                        my %tmp_hash;
                                        @tmp_hash{@array} = @init;
                                        $hash{$i} = \%tmp_hash;
                                }
                           },
       for_2d_init_array => sub {
                                my $limit = 100;
                                my @init;
                                $init[$_] = [(1)x$limit] for 0..$limit-1;
                           }, 
    });
    

    Results:

    for_2d_original:  2 wallclock secs ( 2.12 usr +  0.00 sys =  2.12 CPU) @ 744.81/s (n=1579)
    for_2d_map:  2 wallclock secs ( 2.19 usr +  0.00 sys =  2.19 CPU) @ 556.62/s (n=1219)
    for_2d_init_hash:  2 wallclock secs ( 2.21 usr +  0.00 sys =  2.21 CPU) @ 978.28/s (n=2162)
    for_2d_init_hash_w_array:  2 wallclock secs ( 2.15 usr +  0.00 sys =  2.15 CPU) @ 1558.14/s (n=3350)
    for_2d_init_array:  2 wallclock secs ( 2.10 usr +  0.00 sys =  2.10 CPU) @ 6018.57/s (n=12639)