bigdata, data-science, data-analysis, grafana, influxdb

Which DB to use for comparing courses of data by days?


I'm currently thinking about a little "BigData" project where I want to record some utilization values every 10 minutes and write them to a DB over several months or years. I then want to analyze the data, e.g. in these ways:

  • Which time of the day is best (in terms of a low utilization)?
  • What are the differences in utilization between normal weekdays and days on the weekend?
  • At what time does the period of higher utilization begin on a normal Monday?

For this I obviously need to be able to build averaged graphs for, e.g., all Mondays that were recorded so far.

For the first "proof of concept" I set up an InfluxDB instance and Grafana, which works quite well for seeing the data being written to the DB, but the more I research on the internet, the more I see that InfluxDB is not made for what I want to do (or it cannot do it yet).

So which database would be best for recording and analyzing data like that? Or is it more a question of which tool to use to analyze the data? Which tool could that be?


Solution

  • InfluxDB's query language is not flexible enough for your kind of questions. The SQL databases supported by Grafana (MySQL, Postgres, TimescaleDB, ClickHouse) seem to fit better. The choice depends on your preferences and the amount of data. For smaller datasets, plain MySQL or Postgres may be enough. For higher loads, consider TimescaleDB. For billions of data points, ClickHouse is probably a better choice.

    If you want a lightweight but scalable NoSQL time series solution, have a look at VictoriaMetrics.
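
    To illustrate why SQL fits this kind of question, here is a minimal sketch of the "average curve for all Mondays" query. It uses Python's built-in sqlite3 module only so that it runs without a server. The table name `utilization`, its columns `ts` and `value`, and the generated demo data are assumptions made for the example, not part of the original setup. The date functions differ between databases (SQLite's `strftime('%w', ...)` is used here), so the query would need adapting for MySQL, Postgres or ClickHouse.

```python
import sqlite3
from datetime import datetime, timedelta


def build_demo_db():
    """Create an in-memory table with two weeks of made-up 10-minute samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE utilization (ts TEXT NOT NULL, value REAL NOT NULL)")
    start = datetime(2024, 1, 1)  # 2024-01-01 is a Monday
    rows = []
    for i in range(14 * 144):  # 144 samples per day at 10-minute intervals
        ts = start + timedelta(minutes=10 * i)
        # Hypothetical load pattern: busy on weekdays between 08:00 and 18:00.
        busy = ts.weekday() < 5 and 8 <= ts.hour < 18
        rows.append((ts.strftime("%Y-%m-%d %H:%M:%S"), 80.0 if busy else 20.0))
    conn.executemany("INSERT INTO utilization VALUES (?, ?)", rows)
    return conn


def average_profile(conn, weekday):
    """Average value per time-of-day slot over all recorded days of one weekday.

    weekday follows SQLite's strftime('%w'): 0 = Sunday, 1 = Monday, ...
    """
    query = """
        SELECT strftime('%H:%M', ts) AS slot, AVG(value) AS avg_value
        FROM utilization
        WHERE CAST(strftime('%w', ts) AS INTEGER) = ?
        GROUP BY slot
        ORDER BY slot
    """
    return conn.execute(query, (weekday,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for slot, avg_value in average_profile(conn, 1):  # all Mondays
        print(slot, avg_value)
```

    The same GROUP BY pattern covers the other questions: grouping by a weekday/weekend flag instead of filtering on one weekday compares the two, and ordering the result by `avg_value` shows the quietest time of day. In Grafana, a query of this shape could feed a panel directly, provided the chosen SQL data source supports the equivalent date functions.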