Search code examples
sqlazureazure-sql-databaseazure-table-storage

Large Sensor Data best option. Tables SQL Vs Azure Table


I would like some advice on the best option, as I am having slow retrieval( over 30 seconds) of data on my web-based API.

I have multiple IoT Sensors(over 100), which will also be growing that TX updated sensor data to my IoT Hub, which then gets saved onto database or storage.

Previously I used to save all my sensor data to a SQL table, but as data grew, I found it was getting very slow, therefore I used Azure Table Storage. Each Sensor has got its own individual table storage, the partition key is the month and year(e.g. 202012), and the row key is a timestamp(e.g. 0002518033824243332546).

This proved to be much faster, as the amount of sensor data had reduced, as each sensor has its own table, but as the table grows for any particular sensor and I need to retrieve data across a longer period of time(1 month) this becomes very slow again. Each sensor TX's an update every 1 min, therefore each day produces 1400 records and 1 month would have about 44,640 records.

Is there any better solution for my requirement?

Would having an individual SQL table for each sensor be a good idea? How many tables can there be in SQL storage?

Thank You


Solution

  • You can consider using Azure SQL Database (premium tiers P6 or greater) to store IoT data coming from a cold path (storage accounts) or coming from hot paths that needs to be saved directly to the database.

    First consider to optimize data storage by using column stored indexes as mentioned here. With that you will compress data up to 20x and retrieve more data into memory quickly.

    CREATE TABLE Telemetry (
        _id bigint IDENTITY,
        sensorData nvarchar(max),
       INDEX cci CLUSTERED COLUMNSTORE
    )
    

    Second you can consider use In-Memory OLTP and JSON functions in Transact-SQL. With that you can treat data formatted as JSON as any other SQL data type, and extract values from the JSON text to use JSON data in the SELECT list or in a search predicate. As Columnstore-based tables are optimized for scans and aggregations rather than key lookup queries, you can also create computed columns based on JSON functions that will then expose as regular relational columns specific attributes within the original JSON column, simplifying query design and development. you can further optimize data retrieval by creating regular (row-based) non clustered indexes on computed columns to support critical queries and access paths. While these will slightly increase overall storage needs, they will help query processor to filter rows on key lookups and range scans, and can also help on other operations like aggregations and such. Notice that you can add computed columns and related indexes at any time.

    In case your JSON structure is stable and known upfront, the best option would be to design our relational schema to accommodate the most relevant attributes from JSON data, and leverage the OPENJSON function to transform these attributes to row fields when inserting new data. These will be fully relational columns (with optimized SQL data types) that can be used for all sort of retrieval and analytical purposes, from complex filtering to aggregations, and with that you just need to create proper indexes to support various access paths. However, you can still decide to keep the entire JSON fragment and store it in a VARCHAR(max) field in the same table if further processing may be needed.

    In case, you can save your IoT data as row fields with SQL Data types, consider using also time-based table partitioning. Have all your data in one table but partitioned by range of dates as explained here.