time-series, riak, influxdb

Riak TS/InfluxDB limits on number of series


We are considering using either Riak TS or InfluxDB as time series storage for a use case where we can have hundreds of millions of series. Each series will receive only a small number of writes over time (hourly or daily). The number of data points per series is also going to be low, and queries will probably be of low complexity.
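Even with few writes per series, the aggregate write rate is set by the number of series. A quick estimate, assuming 100 million series as an illustrative lower bound for "hundreds of millions" and exactly one point per series per interval (both assumptions, not figures from the question):

```python
# Back-of-the-envelope aggregate write rate for the workload described above.
# ASSUMPTIONS: 100 million series (illustrative lower bound for "hundreds of
# millions"), and each series receives exactly one point per interval.

SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def writes_per_second(num_series: int, interval_seconds: int) -> float:
    """Average points written per second if every series gets one point per interval."""
    return num_series / interval_seconds


if __name__ == "__main__":
    series = 100_000_000
    hourly = writes_per_second(series, SECONDS_PER_HOUR)
    daily = writes_per_second(series, SECONDS_PER_DAY)
    print(f"hourly: {hourly:,.0f} writes/s")  # hourly: 27,778 writes/s
    print(f"daily:  {daily:,.0f} writes/s")   # daily:  1,157 writes/s
```

So hourly writes for 100 million series average roughly 28k points per second, and daily writes roughly 1.2k, before accounting for any burst if all series report at the top of the hour.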

While investigating both, we found that InfluxDB has some limitations on the number of series it can handle, and therefore might not be a valid solution.

I cannot find information on such a restriction for Riak TS. I imagine that, because it is built on top of Riak KV's core, it doesn't have such a hard limit, but I'd like to be sure.

Is InfluxDB still a valid solution, given that the number of data points per series is going to be low? Does Riak TS suffer from the same limitations?


Solution

  • Indeed, Riak TS doesn't have these limitations, so you can use it freely. Riak TS also scales very well; in fact, it works best as a cluster, so you should probably start with 3 boxes. You can configure the replication factor and many other settings.

    You say that your queries will have low complexity, so Riak TS's built-in query features will be more than enough.

    Riak TS lets you configure the size of the "quanta" (the time ranges used to partition the data), which makes your Riak TS instance more read- or write-oriented. In your case, however, with low traffic and few complex queries, I wouldn't worry about that.

    One thing to keep in mind is that Riak TS doesn't keep track of series names, so you'll either need series names that you can compute (like _), or a separate DB to store, list, and look up the series names. If that's an issue for you, I can give you more info/tips/examples on how to get that working.

    If you want to stay on the open-source side, I don't think InfluxDB will work well for you. If you pay for the enterprise version of InfluxDB it might work, as deniszh said, but you would be forced to run a cluster and scale up just to store more series, not because your traffic requires it.

    Some examples of InfluxDB's practical limits: https://www.reddit.com/r/Database/comments/2nw9k0/practical_limits_of_influxdb/

    You might also be interested in DalmatinerDB (https://dalmatiner.io/), as it is based on some of the same technologies as Riak TS but stores and indexes series names for you; it's also said to be faster. However, it seems to require a more complex setup to get up and running, and it's very new.
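To make the "quanta" setting from the answer more concrete: in Riak TS the quantum is declared in the table's partition key, e.g. `QUANTUM(time, 1, 'd')`. The Python below is only a hypothetical model of the idea (floor each timestamp to the start of its quantum and count how many quanta a range query crosses); it is not Riak TS's implementation, and the helper names `quantum_start` and `quanta_touched` are made up for illustration.

```python
# Hypothetical model of time quanta, loosely mirroring Riak TS's
# QUANTUM(time, size, unit) partition-key function. Illustration only,
# not Riak TS's actual code.

QUANTUM_UNIT_MS = {"s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def quantum_width_ms(size: int, unit: str) -> int:
    """Width of one quantum in milliseconds."""
    return size * QUANTUM_UNIT_MS[unit]


def quantum_start(ts_ms: int, size: int, unit: str) -> int:
    """Start (epoch ms) of the quantum containing ts_ms: floor to a multiple of the width."""
    width = quantum_width_ms(size, unit)
    return (ts_ms // width) * width


def quanta_touched(start_ms: int, end_ms: int, size: int, unit: str) -> int:
    """Number of distinct quanta overlapped by the half-open range [start_ms, end_ms)."""
    if end_ms <= start_ms:
        return 0
    width = quantum_width_ms(size, unit)
    return (end_ms - 1) // width - start_ms // width + 1


if __name__ == "__main__":
    week_ms = 7 * 86_400_000
    print(quanta_touched(0, week_ms, 1, "d"))   # 7
    print(quanta_touched(0, week_ms, 15, "m"))  # 672
```

With 1-day quanta, a one-week range query spans 7 quanta; with 15-minute quanta the same range spans 672. Roughly, larger quanta keep range reads on fewer partitions, while smaller quanta spread data over more of them, which is the read/write trade-off mentioned above.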
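For the point that Riak TS doesn't track series names, here is a minimal sketch of the two options the answer mentions: names computed deterministically from their components, plus a lookup store for listing them. Everything here is hypothetical: `series_key`, `SeriesRegistry`, and the `_` separator are illustrative choices, and the in-memory dict stands in for the separate database you would actually use.

```python
# Hypothetical helpers for "computable series names + separate lookup store".
# ASSUMPTIONS: series_key/SeriesRegistry and the "_" separator are illustrative;
# a real registry would live in a separate database, not an in-memory dict.
from __future__ import annotations

from collections import defaultdict

SEPARATOR = "_"


def series_key(*parts: str) -> str:
    """Join components into a name that can be recomputed later from the same parts."""
    if not parts:
        raise ValueError("at least one component is required")
    for part in parts:
        if SEPARATOR in part:
            # A separator inside a component would make names ambiguous to split.
            raise ValueError(f"component {part!r} must not contain {SEPARATOR!r}")
    return SEPARATOR.join(parts)


class SeriesRegistry:
    """In-memory stand-in for the separate store that lists and looks up series names."""

    def __init__(self) -> None:
        # first component (e.g. a device id) -> set of full series names
        self._by_prefix: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, *parts: str) -> str:
        """Record a series (idempotent) and return its computed name."""
        key = series_key(*parts)
        self._by_prefix[parts[0]].add(key)
        return key

    def contains(self, *parts: str) -> bool:
        """True if the series built from these parts has been registered."""
        key = series_key(*parts)
        return key in self._by_prefix.get(parts[0], set())

    def list_series(self, first_component: str) -> list[str]:
        """All registered series whose first component matches, sorted."""
        return sorted(self._by_prefix.get(first_component, set()))
```

Because the name is recomputable, reads for a known device and metric could go straight to Riak TS; the registry is only needed for listing and discovery.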
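Why the number of series, rather than the number of points, is what hurts InfluxDB here: InfluxDB indexes series, so index size and memory grow with the number of distinct series even if each one holds only a few points (at least with the in-memory index of the 1.x era). The toy counter below assumes a series is keyed by measurement plus tag set; InfluxDB's own definition also involves field keys and database/retention policy, which this sketch ignores. The function names are hypothetical.

```python
# Toy model of InfluxDB-style series cardinality.
# ASSUMPTION: a series is keyed by measurement + tag set only; InfluxDB's own
# definition also involves field keys (and database/retention policy), ignored here.
from __future__ import annotations

from math import prod


def influx_series_key(measurement: str, tags: dict[str, str]) -> str:
    """Canonical key: measurement followed by tags sorted by tag key."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}" if tag_part else measurement


def observed_cardinality(points: list[tuple[str, dict[str, str]]]) -> int:
    """Distinct series among (measurement, tags) pairs; repeated writes add no series."""
    return len({influx_series_key(m, t) for m, t in points})


def worst_case_cardinality(distinct_values_per_tag: dict[str, int]) -> int:
    """Upper bound for one measurement: product of the distinct values of each tag."""
    return prod(distinct_values_per_tag.values())
```

For example, `worst_case_cardinality({"host": 10_000, "sensor": 10_000})` gives 100,000,000 potential series for a single measurement, even if each series only ever receives a handful of daily points.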