I'm working on a project written in C that's going to be deployed on vintage systems with very limited resources. The target system has at minimum a 16 MHz Motorola 68030 processor with a 256-byte L1 cache, and my application has a limit of 1 MB of memory, so everything I do needs to be as CPU- and memory-efficient as I can manage.
Further, I need to keep heap memory allocations to a minimum, because they're relatively slow and can result in memory fragmentation. Memory alignment is important too, which is why I'm primarily dealing with values that are 2 or 4 bytes, rather than individual bytes.
In this app, I need several static associative arrays. Each array has the same key and value type: the key is a pair of two-byte integers (which could be represented as a single four-byte integer), and the value is a two-byte integer. The data in all of these arrays is determined at compile time, and it will never change during runtime, so there's no need to add or remove keys, or change any values. Each array has between 100 and 1000 entries.
Given these specifications, how might I go about implementing such an array? What are my options in terms of trading off CPU cycles for memory, or vice versa?
Research so far:
One idea was to sort the data ahead of time and use a binary search to find a value. That has the advantage of being easy to implement and to compute ahead of time, but I'd prefer something that runs in constant time.
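For reference, a sorted-table lookup would be something along these lines (just a sketch; the entry layout and names are illustrative, with the two 16-bit keys packed into one sorted 32-bit key):

```c
#include <stdint.h>

/* Hypothetical entry layout: both 16-bit keys packed into one
   32-bit key, table sorted by that key at build time. With
   two-byte alignment on a 68k compiler this is six bytes per
   entry. */
typedef struct {
    uint32_t key;    /* (key1 << 16) | key2 */
    uint16_t value;
} Entry;

/* Binary search; returns 1 and writes *out on a hit, 0 on a miss. */
static int lookup(const Entry *table, uint16_t count,
                  uint16_t key1, uint16_t key2, uint16_t *out)
{
    uint32_t key = ((uint32_t)key1 << 16) | key2;
    uint16_t lo = 0, hi = count;          /* search [lo, hi) */

    while (lo < hi) {
        uint16_t mid = (uint16_t)((lo + hi) / 2);
        if (table[mid].key < key)
            lo = (uint16_t)(mid + 1);
        else
            hi = mid;
    }
    if (lo < count && table[lo].key == key) {
        *out = table[lo].value;
        return 1;
    }
    return 0;
}
```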
I next looked into a hash table using a minimal perfect hash function. This makes sense because the data is static, it could be stored efficiently in contiguous memory, and the hash function can be determined at compile time. However, after doing more research, I found that the way such hash functions are usually implemented is not well suited to this use case.
That leaves non-minimal perfect hash functions, and the advantage there is that only the values need to be stored in the hash table, not the keys as well. So with an efficient enough table, it looks like this might be the most memory-efficient option available. But there's still the question of the hash function, and whether there's a good one that can be used on an old processor, especially since the modulo operator will be relatively slow. However, if it still ends up being faster on average than a binary search, this might be the best approach.
I figure I'll post my solution to this. I ended up implementing a hash table with the following binary structure:
The first two words are two 16-bit seed values. The next word is the number of buckets (the length of the hash table), which must be a power of two.
Following that is the hash table itself: an array of 16-bit values (since my value type is 16 bits). When looking up a value, I calculate the hash from the two 16-bit keys and check the bucket for that hash; if the high bit is 0, there were no collisions and the word's literal value can be returned.
However, if the high bit is 1, the other 15 bits specify an offset from the end of the hash table (expressed in words, i.e. two-byte increments) to a collision list. The list's structure is: a word giving the number of entries, then for each entry the two 16-bit keys followed by the 16-bit value associated with them. I perform a linear search of these entries to find the one that matches the target keys.
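Concretely, the table image looks like this (a sketch in C comment form; the typedef name is just for illustration):

```c
#include <stdint.h>

/*
 * Table image, as one array of 16-bit words:
 *
 *   word 0              seed1
 *   word 1              seed2
 *   word 2              bucket_count (always a power of two)
 *   words 3 .. 3+B-1    buckets[bucket_count]
 *                         high bit clear: the word is the value itself
 *                         high bit set:   low 15 bits are a word offset,
 *                                         counted from the end of the
 *                                         bucket array, to a collision list
 *   after the buckets   collision lists, each one:
 *                         word 0          number of entries
 *                         then per entry  key1, key2, value
 */
typedef const uint16_t *HashTable;   /* the whole table is one word array */
```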
The hash is calculated with the following formula:
((key1 + 1) * seed1 + (key2 + 1) * seed2) & (bucket_count - 1)
I tried several different hash formulas, but this proved to be the most performant and effective on a 68030, even though multiplication is typically not as fast as addition or bitwise operations.
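Putting it together, a lookup runs roughly as follows (a simplified sketch, not my exact code; it assumes, as holds for my data, that any key hashed to a non-collision bucket is actually present in the table):

```c
#include <stdint.h>

static uint16_t hash(uint16_t key1, uint16_t key2,
                     uint16_t seed1, uint16_t seed2,
                     uint16_t bucket_count)
{
    /* Two 16x16 multiplies and an AND; the power-of-two bucket
       count lets the mask stand in for a (slow) modulo. */
    uint32_t h = (uint32_t)(key1 + 1u) * seed1
               + (uint32_t)(key2 + 1u) * seed2;
    return (uint16_t)(h & (bucket_count - 1u));
}

/* Returns 1 and writes *out on a hit, 0 on a miss in a collision
   list. A direct (non-collision) bucket is assumed to hold the
   right value, since every key I look up is known to be in the
   table. */
static int ht_lookup(const uint16_t *table,
                     uint16_t key1, uint16_t key2, uint16_t *out)
{
    uint16_t seed1   = table[0];
    uint16_t seed2   = table[1];
    uint16_t buckets = table[2];
    const uint16_t *slots = table + 3;
    uint16_t slot = slots[hash(key1, key2, seed1, seed2, buckets)];

    if (!(slot & 0x8000u)) {          /* high bit clear: no collision */
        *out = slot;
        return 1;
    }

    /* High bit set: low 15 bits are a word offset from the end of
       the bucket array to a list of (key1, key2, value) entries. */
    {
        const uint16_t *list = slots + buckets + (slot & 0x7FFFu);
        uint16_t n = *list++;         /* first word: entry count */

        while (n--) {
            if (list[0] == key1 && list[1] == key2) {
                *out = list[2];
                return 1;
            }
            list += 3;
        }
    }
    return 0;
}
```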
This structure has the advantage of being very memory efficient, since keys aren't stored except in the case of collisions. Granted, one bit of each value is sacrificed as a collision flag, but in my use case the values I'm storing will never be larger than what fits in a 15-bit integer.
It's also easy to search. With the right seed values (which can be calculated at compile time), most keys have no collisions, so a lookup is just a quick hash calculation followed by a single array access. In the case of a collision, a linear search is performed, but usually there are no more than two or three entries to scan.
During the compilation stage (when this structure is created), I wrote a script that runs through several different combinations of seed values to find the pair that produces the fewest collisions. Scanning combinations of the first 20 prime numbers reliably produced good seeds. I also try different bucket counts (the largest power of two below the number of key/value pairs, plus the next two powers of two above it) and keep whichever produces the smallest table.
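The core of that script is just counting collisions for each candidate (seed1, seed2, bucket count); a simplified sketch of it (the real script also emits the table image, which I've left out here):

```c
#include <stdint.h>
#include <string.h>

#define MAX_BUCKETS 2048u   /* covers up to 1000 entries */

/* First 20 primes, used as candidate seed values. */
static const uint16_t primes[20] = {
     2,  3,  5,  7, 11, 13, 17, 19, 23, 29,
    31, 37, 41, 43, 47, 53, 59, 61, 67, 71
};

/* Number of keys that land in an already occupied bucket. */
static unsigned count_collisions(const uint16_t (*keys)[2], unsigned n,
                                 uint16_t seed1, uint16_t seed2,
                                 uint16_t bucket_count)
{
    static uint8_t used[MAX_BUCKETS];
    unsigned i, collisions = 0;

    memset(used, 0, bucket_count);
    for (i = 0; i < n; i++) {
        uint32_t h = (uint32_t)(keys[i][0] + 1u) * seed1
                   + (uint32_t)(keys[i][1] + 1u) * seed2;
        if (used[h & (bucket_count - 1u)]++)
            collisions++;
    }
    return collisions;
}

/* Scan all seed pairs for a given bucket count; returns the best
   collision count and writes the winning seeds. */
static unsigned best_seeds(const uint16_t (*keys)[2], unsigned n,
                           uint16_t bucket_count,
                           uint16_t *seed1, uint16_t *seed2)
{
    unsigned best = (unsigned)-1;
    int i, j;

    for (i = 0; i < 20; i++) {
        for (j = 0; j < 20; j++) {
            unsigned c = count_collisions(keys, n, primes[i], primes[j],
                                          bucket_count);
            if (c < best) {
                best = c;
                *seed1 = primes[i];
                *seed2 = primes[j];
            }
        }
    }
    return best;
}
```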
This hash table implementation could be easily modified to support other sizes or types of keys, or different value sizes. A similar structure could in theory be used if the table needs to support writes as well, though that might require using linked lists for the collision values.