Tags: cassandra, log-analysis, bigdata, database, nosql

Ideal database for grouping data by timestamp


I'm testing some NoSQL solutions for basic log analytics, and I'm looking for something optimized for reads. The data has a timestamp and some other columns that I want to count and sum. I need to group and sum by year, month, day, and hour, as well as by the values of some of the other columns. The data set will likely exceed about 50 million records and will probably live on a single server (no sharding or horizontal scaling required), but a RESTful API would be handy for tying into other applications easily.

I'm currently trying out CouchDB, but I would like to know if there's something better suited to this task.

I can probably improve this map function and the overall performance, but I wanted to check some other options first. My current map function:

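function(doc) {
  // Split the timestamp on any run of characters other than letters, digits, or underscores.
  var ts = doc.timestamp.split(/[^A-Z0-9_]+/i);
  // Key on the first five timestamp parts plus event type and name; emit 1 so a reduce can count.
  emit([ts[0], ts[1], ts[2], ts[3], ts[4], doc.eventtype, doc.name], 1);
}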
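
To make the grouping concrete, here is a minimal sketch in plain JavaScript that emulates what a view keyed like the map function above returns once its rows are summed per key prefix. It runs outside CouchDB. The sample documents, the "YYYY-MM-DD hh:mm:ss" timestamp format, and the helper names mapDoc and groupLevel are assumptions made for illustration; they are not from the question.

```javascript
// Sketch only: emulates grouping view rows by key prefix and summing them.
// Sample documents, timestamp format, and helper names are hypothetical.

// Same key layout as the map function above; with the assumed timestamp
// format the first five parts are year, month, day, hour, minute.
function mapDoc(doc) {
  var ts = doc.timestamp.split(/[^A-Z0-9_]+/i);
  return { key: [ts[0], ts[1], ts[2], ts[3], ts[4], doc.eventtype, doc.name], value: 1 };
}

// Group rows by the first `level` key components and sum their values.
function groupLevel(rows, level) {
  var totals = new Map();
  rows.forEach(function (row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    totals.set(prefix, (totals.get(prefix) || 0) + row.value);
  });
  return Array.from(totals, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

var docs = [
  { timestamp: "2012-03-01 10:15:00", eventtype: "login", name: "alice" },
  { timestamp: "2012-03-01 10:45:00", eventtype: "login", name: "bob" },
  { timestamp: "2012-03-01 11:05:00", eventtype: "error", name: "alice" },
  { timestamp: "2012-04-02 09:00:00", eventtype: "login", name: "alice" }
];
var rows = docs.map(mapDoc);

var perMonth = groupLevel(rows, 2); // group on year + month
var perHour = groupLevel(rows, 4);  // group on year + month + day + hour
console.log(perMonth);
console.log(perHour);
```

In CouchDB itself the same effect should be available by pairing the map with the built-in _sum reduce and querying the view with the group_level parameter (for example group_level=4 for hourly totals), so no custom reduce code is needed for plain counts.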

I'm not using a relational database because entries vary in the data they carry depending on the event type, and I want to handle the data dynamically rather than having to update the schema every time a new event type is logged.


Solution

  • Use a time series database, which is designed for storing and querying this kind of timestamped data.
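
As a rough illustration of the time-bucketed roll-up that this kind of store is oriented around, the sketch below truncates each event's timestamp to the start of its hour and sums a value per bucket and event type. It is plain JavaScript, not the API of any particular product. The event shape (ts in epoch milliseconds, eventtype, value) and the function names are assumptions made for the example.

```javascript
// Sketch only: rolls raw events up into fixed hourly buckets.
// Event shape and function names are hypothetical.

var HOUR_MS = 60 * 60 * 1000;

// Truncate an epoch-millisecond timestamp to the start of its UTC hour.
function hourBucket(epochMs) {
  return Math.floor(epochMs / HOUR_MS) * HOUR_MS;
}

// Sum `value` per (hour bucket, eventtype) pair.
function rollUp(events) {
  var totals = {};
  events.forEach(function (e) {
    var key = new Date(hourBucket(e.ts)).toISOString() + "|" + e.eventtype;
    totals[key] = (totals[key] || 0) + e.value;
  });
  return totals;
}

var events = [
  { ts: Date.UTC(2012, 2, 1, 10, 15), eventtype: "login", value: 1 },
  { ts: Date.UTC(2012, 2, 1, 10, 45), eventtype: "login", value: 1 },
  { ts: Date.UTC(2012, 2, 1, 11, 5), eventtype: "error", value: 3 }
];
var hourly = rollUp(events);
console.log(hourly);
```

Changing HOUR_MS to a day's worth of milliseconds gives daily buckets; month and year buckets need calendar-aware truncation because their lengths vary.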