Search code examples
jsondatabase-designrelational-databaseiotnon-relational-database

Best IoT Database?


I have many IoT devices sending data currently to MySQL Database.

I want to port it to some other Database, which will be Open Source and provide me with:

  • JSON support
  • Scalability
  • Flexibility to add multiple columns automatically as per payload
  • Python and PHP Support
  • Extremely Fast Read, Write
  • Ability to export at least 6 months of data in CSV format

Solution

  • Shaping your database based on input data is a mistake. Think of tomorrow your data will be CSV or XML, in a slight different format. Design your database based on your abstract data model, normalize it and apply existing data to your model. Shape your structure based on what input you have and what output you plan to get. If you retrieve the same content as the input, storing data in files will be sufficient, you don't need a database.

    Also, you don't want to store "raw" records the database. Even if your database can compose a data record out of the raw element at run time, you cannot run a selection based on a certain extracted element, without visiting all the records.

    Most of the databases allow you to connect from anywhere (there is not such thing as better support of PostgreSQL in Java as compared to Python, but the quality and level of standardization for drivers may vary). The question is what features shall your driver support. For example, you may require support for bulk import (don't issue large INSERT sets to the database).

    What you actually look for is:

    • scalability: can your database grow with your data? Would the DB benefit of adding additional CPUs (MySQL particularly doesn't for large queries). Can you shard your database on multiple instances? (MySQL again fails to handle that).
    • does your model looks like a snowflake? If yes, you may consider NoSQL, otherwise stay away of it. If you manage to model as a snowflake (and this means you are open for compromises) you may use anything like Lucene based search products, Mongo, Cassandra, etc. The fact you have timeseries doesn't qualify you for NoSQL. For example, you may have 10K devices issuing 5k message types. Specific data is redundantly recorded at device level and at message type level. In that case, because of the n:m relation, you don't have the snowflake anymore.
    • why do you store the data? What queries are you going to issue?