I've been reading recently about cache-oblivious data structures like auxiliary buffer heaps. These data structures lay out their elements so that nearby accesses fall on the same cache lines, which makes subsequent accesses quicker at every level of the memory hierarchy without tuning for any particular cache size.
Most of these data structures are implemented with a low-level language like C/C++. Is it worth it to try to port these data structures over to a dynamic language like Python, or does the overhead of running on a virtual machine ruin all the performance benefits of these data structures? It seems like the latter, but I thought I would ask to see if someone actually has some experience with it.
In C (or C++) you have fine-grained control over the exact size of each data structure. You also have the possibility of fine-grained control over storage allocation: you can, after all, overload `operator new`, call `malloc` directly, and otherwise lay out memory to create spatial locality.
In most dynamic languages (like Python) you have no control at all over the exact size of anything, much less its location.
In Python you may have some temporal locality, but you have little or no spatial locality.
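As a quick illustration of why (this sketch is CPython-specific; exact sizes vary by version and build):

```python
import sys
from array import array

# A Python list holds pointers to boxed objects; even "adjacent" elements
# are separately allocated objects scattered around the heap, so iterating
# the list gives you little spatial locality.
boxed = [float(i) for i in range(1000)]
print(sys.getsizeof(boxed[0]))  # each float is a full heap object, ~24 bytes in CPython

# The stdlib array module (or NumPy) stores raw machine values contiguously,
# which is about as close to controlled spatial locality as pure Python gets.
packed = array("d", range(1000))
print(packed.itemsize)  # 8 bytes per element, stored back to back
```

Even then, you control only the element storage, not where the array buffer itself lands relative to anything else.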
Temporal locality can be enhanced through simple memoization of results. This is such a common speed-up that people often write a memoization decorator to decouple the memoization (temporal locality) from the core algorithm.
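The standard library already ships such a decorator, `functools.lru_cache`. A minimal example with the usual recursive Fibonacci:

```python
from functools import lru_cache

# lru_cache memoizes results keyed by the call arguments, so repeated
# calls hit a dict lookup instead of recomputing -- temporal locality
# at the level of whole results rather than cache lines.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # answers immediately; the undecorated version would take exponential time
```

The algorithm itself stays untouched; the caching policy lives entirely in the decorator.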
I don't think that C or C++ cache-oblivious implementations translate usefully to dynamic languages, because you simply don't have enough control over layout. Instead, just exploit memoization for your speed-ups.