I have to store millions of entries in a database. Each entry is identified by a set of unique integer identifiers. For example, a value may be identified by a set of 10 integer identifiers, each of which is less than 100 million.
To reduce the size of the database, I thought of the following encoding using a single 32-bit integer value:
Identifier 1:  0 - 100,000,000
Identifier 2:  100,000,001 - 200,000,000
...
Identifier 10: 900,000,001 - 1,000,000,000
I am using Java. I can write a simple method to encode/decode. The user code does not have to know that I am encoding/decoding during fetch/store.
What I want to know is: what is the most efficient (fastest) and recommended way to implement such encoding/decoding? A simple implementation will perform a large number of multiplications/subtractions.
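For reference, the simple version I have in mind looks roughly like this (the class and method names are placeholders; it assumes a 1-based identifier index and values in 0..99,999,999):

```java
// Rough sketch of the offset scheme described above.
public final class OffsetCodec {
    private static final int SEGMENT = 100_000_000;

    /** Encode one (identifierIndex, value) pair into a single int. */
    static int encode(int identifierIndex, int value) {
        return (identifierIndex - 1) * SEGMENT + value;  // at most 999,999,999, fits in an int
    }

    /** Recover the 1-based identifier index. */
    static int decodeIndex(int encoded) {
        return encoded / SEGMENT + 1;
    }

    /** Recover the original identifier value. */
    static int decodeValue(int encoded) {
        return encoded % SEGMENT;
    }
}
```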
Is it possible to use shifts (or other bitwise operations) and choose a different partition size (the size of each segment would still have to be close to 100 million)?
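For concreteness, I imagine that would mean rounding the segment size up to the next power of two, 2^27 = 134,217,728, so that encoding and decoding become a shift and a mask. With 10 identifiers the index needs 4 bits, so the result uses at most 31 bits and still fits in a non-negative int. Something like this (again, the names are placeholders):

```java
// Rough sketch of a shift/mask variant: the value occupies the low 27 bits
// (2^27 = 134,217,728, just above 100 million) and the identifier index sits above it.
public final class ShiftCodec {
    private static final int VALUE_BITS = 27;
    private static final int VALUE_MASK = (1 << VALUE_BITS) - 1;

    /** Encode one (identifierIndex, value) pair; identifierIndex is 0-based (0..9). */
    static int encode(int identifierIndex, int value) {
        return (identifierIndex << VALUE_BITS) | value;  // at most 4 + 27 = 31 bits
    }

    /** Recover the 0-based identifier index. */
    static int decodeIndex(int encoded) {
        return encoded >>> VALUE_BITS;
    }

    /** Recover the original identifier value. */
    static int decodeValue(int encoded) {
        return encoded & VALUE_MASK;
    }
}
```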
I am open to any suggestions, ideas, or even a totally different scheme. I want to exploit the fact that the integer identifiers are bounded to drastically reduce the storage size without noticeably compromising performance.
Edit: I just wanted to add that I went through some of the answers posted on this forum. A common solution was to split the bits of the 32-bit value among the identifiers. But if I use only 2 bits for each of the 10 identifiers, the range of each identifier gets severely limited.
It sounds like you want to pack multiple integer values in the range 0...100m into a single 32-bit integer? Unless you are omitting important information that would allow storing these 0...100m values more efficiently, there is simply no way to do it.
ceil(log2(100,000,000)) = 27 bits, which means you have only 5 "spare bits".
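A quick way to confirm that count (just a sketch):

```java
public class BitCheck {
    public static void main(String[] args) {
        // Minimum number of bits needed for a value in [0, 100,000,000):
        int bitsPerValue = 32 - Integer.numberOfLeadingZeros(100_000_000 - 1);
        System.out.println(bitsPerValue);       // prints 27
        System.out.println(10 * bitsPerValue);  // 270 bits for ten such values, far beyond 32
    }
}
```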