I need to handle a data table (2D) of a maximum of 100 million rows and 30 columns. The table elements contain only 0s, 1s and dashes (-). The table is given to me in a text file. I have to store this data into some data structure and perform operations like sorting the table, deleting rows, comparing elements (both in different rows and columns), etc.
Currently I am using a 2D vector for this. I tried using an integer vector and a bit vector. Both work for 10mil rows and 25 columns but neither works for the above said maximum limit (I have a bad alloc).
I am assuming that this is because the vector needs contiguous memory. What would be the right data structure for this? Timing is also one of my considerations and would want a low access time as well. Also, if a vector is actually the right data structure, is there a better implementation I could use to make this work? I have to stick to C++ and can't use databases and stuff.
Thanks a lot!
Check out Nstate.
It is a library for Packed Arrays for an arbitrary radix (so tristates qualify).
Disclaimer: I wrote nstate.