Search code examples
perlperl-data-structures

Perl Multi hash vs Single hash


I want to read and process sets of input from a file and then print it out. There are 3 keys which I need to use to store data. Assume the 3 keys are k1, k2, k3

Which of the following will give better performance

$hash{k1}->{k2}->{k3} = $val;

or

$hash{"k1,k2,k3"} = $val;

For my previous question I got the answer that all perl hash keys are treated as strings.


Solution

  • Hash lookup speed is independent of the number of items in the hash, so the version which only does one hash lookup will perform the hash lookup portion of the operation faster than the version which does three hash lookups. But, on the other hand, the single-lookup version has to concatenate the three keys into a single string before they can be used as a combined key; if this string is anonymous (e.g., $hash{"$a,$b,$c"}), this will likely involve some fun stuff like memory allocation. Overall, I would expect the concatenation to be quick enough that the one-lookup version would be faster than the three-lookup version in most cases, but the only way to know which is faster in your case would be to write the same code in both styles and Benchmark the difference.

    However, like everyone else has already said, this is a premature and worthless micro-optimization. Unless you know that you have a performance problem (or you have historical performance data which shows that a problem is developing and will be upon you in the near future) and you have profiled your code to determine that hash lookups are the cause of your performance problem, you're wasting your time worrying about this. Hash lookups are fast. It's hardly a real benchmark, but:

    $ time perl -e '$foo{bar} for 1 .. 1_000_000'
    real    0m0.089s
    user    0m0.088s
    sys 0m0.000s
    

    In this trivial (and, admittedly, highly flawed) example, I got a rate equivalent to roughly 11 million hash lookups per second. In the time you spent asking the question, your computer could have done hundreds of millions, if not billions, of hash lookups.

    Write your hash lookups in whatever style is most readable and most maintainable in your application. If you try to optimize this to be as fast as possible, the wasted programmer time will be (many!) orders of magnitude larger than any processing time that you could ever hope to save with the optimizations.