Small brainstorming here.
I'm looking for the most suitable distributed storage solution: an efficient key/value store with a flat namespace and minimal latency.
I plan to store small blob records, 1 KB or less. Most of them are produced once and then consumed.
However, some records may grow up to 10 MB. That is the maximum, but it must be supported.
The data must be persisted to disk.
My first priority is a store that provides good response times over a really huge number of files, possibly several hundred million.
Of course, at this scale I don't care about the speed of iterating over my files. I want the functionality, but only for debugging or maintenance, so its performance doesn't matter.
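To give a rough sense of scale, the numbers above can be turned into a back-of-envelope estimate. This is only a sketch: the record count (300 million), the share of large records (1,000 per million), the replication factor (3) and the per-key index overhead (64 bytes) are all assumed values for illustration, not measurements.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def estimate_storage(num_records, small_size=1 * KB, large_size=10 * MB,
                     large_per_million=1000, replication=3,
                     index_bytes_per_key=64):
    """Return (raw_bytes, replicated_bytes, index_bytes) for the workload.

    All default values are assumptions; replace them with measured ones.
    """
    # Integer arithmetic avoids float rounding on very large counts.
    num_large = num_records * large_per_million // 1_000_000
    num_small = num_records - num_large
    raw = num_small * small_size + num_large * large_size
    return raw, raw * replication, num_records * index_bytes_per_key


raw, replicated, index = estimate_storage(300_000_000)
print(f"raw data:         {raw / GB:10.1f} GiB")
print(f"with replication: {replicated / GB:10.1f} GiB")
print(f"key index:        {index / GB:10.1f} GiB")
```

Under these assumptions the rare 10 MB records account for most of the raw volume, so the real share of large records is worth measuring early.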
And of course it should be a solution that scales. If it has no single point of failure (SPOF), even better.
It must run on Linux, and no cloud services are allowed (private data).
I looked at Voldemort, Cassandra and HBase.
I also checked Lustre and Ceph, but they're not key/value stores.
Couchbase and MongoDB have terrible performance with persistence enabled.
I'm running some tests but can't really launch a solid benchmark just yet. Does anyone have information about these solutions, or know of another product designed for such a workload?
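For reference, the kind of test this workload calls for could look roughly like the harness below. It is a minimal sketch, not a solid benchmark: `DictStore` is a hypothetical in-memory stand-in whose `put`/`get` would be replaced by calls to the real client, and the operation count, seed and share of 10 MB values are assumed.

```python
import random
import time


class DictStore:
    """Hypothetical in-memory stand-in; swap put/get for a real client."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def make_value(rng, large_fraction=0.001):
    """Mostly values of 1 KB or less, occasionally a 10 MB one (assumed mix)."""
    if rng.random() < large_fraction:
        return bytes(10 * 1024 * 1024)
    return bytes(rng.randint(1, 1024))


def summarize(samples):
    """Nearest-rank p50/p99 and max latency, in microseconds."""
    ordered = sorted(samples)

    def pick(pct):
        return ordered[min(len(ordered) - 1, len(ordered) * pct // 100)]

    return {"p50_us": pick(50) * 1e6,
            "p99_us": pick(99) * 1e6,
            "max_us": ordered[-1] * 1e6}


def run_benchmark(store, num_ops=2000, seed=42):
    rng = random.Random(seed)
    keys, put_times, get_times = [], [], []
    for i in range(num_ops):
        key = f"record-{i:012d}"
        value = make_value(rng)
        start = time.perf_counter()
        store.put(key, value)
        put_times.append(time.perf_counter() - start)
        keys.append(key)
    rng.shuffle(keys)  # read back in random order
    for key in keys:
        start = time.perf_counter()
        store.get(key)
        get_times.append(time.perf_counter() - start)
    return {"put": summarize(put_times), "get": summarize(get_times)}


results = run_benchmark(DictStore())
for op, stats in results.items():
    print(op, {name: round(value, 1) for name, value in stats.items()})
```

Reporting percentiles rather than an average matters here, since the occasional 10 MB record would otherwise hide in the mean.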
Have you taken a look at in-memory data grids like Infinispan or Hazelcast? They scale very well and are responsive, but storing 10 MB objects could become a problem if you ever want to run any processing on those entries. However, Hazelcast, for example, allows tasks to be executed on the cluster member that owns the target entries, which reduces the amount of data moved between members.
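To illustrate the idea of running a task on the member that owns an entry, here is a small conceptual simulation. It is not the Hazelcast API: `Member`, `Cluster` and `execute_on_key_owner` are hypothetical names, and the partitioning scheme (CRC32 of the key modulo an assumed partition count of 271) is a simplification. In Hazelcast itself the corresponding facility is, as far as I know, the distributed executor service (`IExecutorService`).

```python
import zlib


class Member:
    """One cluster node holding the entries it owns (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

    def run_local(self, key, task):
        # The task runs next to the value; only its result travels back.
        return task(self.entries[key])


class Cluster:
    def __init__(self, member_names, partition_count=271):
        # partition_count is an assumed value for this sketch.
        self.members = [Member(name) for name in member_names]
        self.partition_count = partition_count

    def owner_of(self, key):
        # Key -> partition -> member, so a key always maps to one owner.
        partition = zlib.crc32(key.encode("utf-8")) % self.partition_count
        return self.members[partition % len(self.members)]

    def put(self, key, value):
        self.owner_of(key).entries[key] = value

    def execute_on_key_owner(self, key, task):
        return self.owner_of(key).run_local(key, task)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("blob-42", bytes(10 * 1024 * 1024))  # a 10 MB record

# Compute the size where the record lives instead of fetching 10 MB.
size = cluster.execute_on_key_owner("blob-42", len)
print(cluster.owner_of("blob-42").name, size)
```

The point is that the caller receives a small integer rather than the 10 MB payload, which is how this kind of routing cuts inter-member traffic.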