Search code examples
databaserediskey-value-store

Sorted Key/Value Database solution


I'm looking to implement a database solution to support very fast column based access on a set of simple 2D datasets. i.e consider this dataset

==========================================================
                     SOME DATASET1
==========================================================
   ENTRY     |    Col1   |   Col2  |    Col3    ... Coln
----------------------------------------------------------
   ENTRY A        1.1        0.2         5.5       6.2
   ENTRY B        2.3        6.4         1.5       1.1
   ENTRY C        2.2        4.2         9.5       3.4
   ENTRY D        2.3        1.1         5.5       2.9
   ENTRY E        9.1        3.6         7.5       2.6

What I need is a means of simply selecting all values in column1, or column2, or column n, while preserving the sort order. My initial idea was to use redis, with the following keyspace design:

   SOMEDS1/COLUMNS/           =>     Col1, Col2, Col3 ... Coln
   SOMEDS1/ENTRIES/           =>     A, B, C, D, E
   SOMEDS1/Col1/              =>     1.1, 2.3, 2.2, 2.3, 9.1
   SOMEDS1/Coln/              =>     ......

The principle behind this design is that the number of entries in each list are not large, perhaps < 10,000 but there may be a lots of columns, and only selected columns are ever needed at a given time.

My question is has anyone implemented anything like this already, and if so can you advice on the most appopriate type of database. My initial thoughts were to go with redis but I'm open to suggestions.


Solution

  • You did not specify if you need local or remote access to your data store. If you need remote access, then Redis is probably a very good solution. If your access is purely local, an embedded database (such as BerkeleyDB) will probably be more efficient.

    The main point is to define how the data are maintained: can new entries only be added at the end of the data structure or not? If yes, Redis lists will fly to store your columns. If not, keeping your data unsorted in a hash object (associated entry and value) per column is probably better. If the number of entries is low, sorting the data after retrieval on client side is cheap anyway.

    This design is similar to the implementation you can find in some columnar databases. The main benefit of this approach is the system can compress the values for a given column with a high-compression ratio, which is interesting when the volume of data is large. The downside is real-time maintenance of the data is difficult. For examples with MySQL, you may want to have a look at Infobright or Calpont products.

    In your case, if the volume of data is limited, Redis is a good fit. But please note the representation of these data in memory will not be especially compact (involving pointers, double-linked lists and/or hash-tables) when the number of entries becomes significant (i.e. more than the thresholds described here).