elasticsearch time-series influxdb

InfluxDB (single node) scaling to ~200 writes per second


What is the maximum number of points per second that can be written to a single InfluxDB node? Is it feasible to scale InfluxDB without paying for the clustered version? And should I consider Elasticsearch instead of InfluxDB for time series data (~3000 bytes/sec/user) if I expect around 60 concurrent users?


Solution

  • Depends on hardware.

    The limiting factors are:

    • Cardinality of series in the DB (total unique series)
    • WAL disk throughput (this could be put on tmpfs if you don't have SSD)
    • Data disk throughput (use SSD for best results)
    • RAM (more is better)
    • CPU for ingestion, indexing and queries

    How far a single node can go largely depends on these and on the workload.

    For write-heavy workloads of low cardinality, CPU generally tends to run out faster than anything else, assuming SSDs are used and disk I/O has been optimised accordingly.

    After that, cardinality is the biggest limiting factor. Schema design plays a huge role, much bigger than the number of nodes.
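    To make the schema point concrete, here is a rough sketch of how tag choices drive cardinality. It assumes the usual upper bound (the product of the distinct values of each tag within one measurement); the tag names and counts are hypothetical and not taken from the question.

```python
from math import prod


def series_cardinality_upper_bound(tag_value_counts):
    """Upper bound on unique series for one measurement.

    Assumes every combination of tag values can occur, so the bound is
    the product of the number of distinct values per tag.
    """
    return prod(tag_value_counts.values())


# Hypothetical schemas: tag names and value counts are made up.
by_user = {"user_id": 60, "sensor": 20}
by_session = {"user_id": 60, "sensor": 20, "session_id": 10_000}

print(series_cardinality_upper_bound(by_user))     # 1200
print(series_cardinality_upper_bound(by_session))  # 12000000
```

    Adding one high-cardinality tag multiplies the series count, which is why values such as session IDs are usually better stored as fields than as tags.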

    From some benchmarks I have done, a single node easily scales to ~70K series per second, with CPU being the limiting factor. This was on an old version, though, so the figure is likely higher now. Again, it largely depends on the data and schema design.

    It is feasible to scale without the paid cluster by adding separate nodes, but not if you want to keep a homogeneous view (a single source for all your data). Scaling vertically (more CPU, RAM) works only as long as cardinality remains consistent, meaning more data points for roughly the same number of series.
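    One possible way to spread writes over separate, independent nodes is to route each series to a node on the client side. The sketch below is only an illustration under that assumption: the node addresses and the `node_for_series` helper are hypothetical, and nothing here is an InfluxDB feature.

```python
import hashlib

# Hypothetical addresses of independent single-node instances.
NODES = ["http://influx-a:8086", "http://influx-b:8086", "http://influx-c:8086"]


def node_for_series(series_key, nodes=NODES):
    """Pick a node for a series deterministically.

    Hashing the series key (measurement plus tag set) sends all points
    of one series to the same node on every call.
    """
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


print(node_for_series("metrics,user_id=42,sensor=temp"))
```

    Each node then holds only part of the data, so any query that spans series on different nodes has to be merged by the client. That is the loss of a homogeneous view described above.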

    InfluxDB suggests that up to 250K writes per second, with 25 queries per second on up to 1M unique series, is feasible on a single node. See the hardware guidelines.

    For the amount of data you have, a single node is more than enough: the size of the data does not matter, the number of series does. Avoid Elasticsearch for time series data; it needs much more infrastructure to handle the same amount of data.
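    As a back-of-the-envelope check, the numbers from the question can be set against the single-node figure quoted above. This is plain arithmetic on those numbers, not a benchmark.

```python
BYTES_PER_SEC_PER_USER = 3000       # from the question
CONCURRENT_USERS = 60               # from the question
WRITES_PER_SEC = 200                # from the question title
GUIDELINE_WRITES_PER_SEC = 250_000  # single-node figure quoted above

total_bytes_per_sec = BYTES_PER_SEC_PER_USER * CONCURRENT_USERS
share_of_guideline = WRITES_PER_SEC / GUIDELINE_WRITES_PER_SEC

print(total_bytes_per_sec)          # 180000
print(f"{share_of_guideline:.2%}")  # 0.08%
```

    Roughly 180 KB/s at ~200 writes per second is a tiny fraction of the quoted write rate, so cardinality is the number to watch rather than raw throughput.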