For an IoT project with our customer we work with multiple factories sending data from multiple devices to the Azure IoT hub. We extract the last package every minute for each factory and device and store this data in a CosmosDb. Goal is to build reporting on the various parameters for every device over time.
The document we store in CosmosDb is around 1Kb. If we take 100 factories into account, each containing 100 devices, and every minute 1 package is stored for these devices, then after one year we will have collected around 5Tb of data.
This is quite expensive storage. The data that is stored is readonly and will not be updated. So I was exploring the analytical store to offload the data. The storage of the analytical store is 10x cheaper.
I have enabled the synapse link on my CosmosDb and container. I created a synapse workspace and connected to the CosmosDb as a connected service.
Now I tried to query the data using the SQL Serverless pool. But even a simple query showing only 1 or 2 columns of only a few records is taking several seconds to return.
So I am wondering if this is the normal behaviour? If this is the case, then the analytical store doesn't seem to be a good candidate for doing live reporing in a webportal.
For our particular use case we ended up with using Azure Data Explorer. This resource is perfect for ingesting timebased IoT telemetry data. Querying 25 million records, performing grouping (on timespans) and aggregation on these groups is done in less than 0.3 seconds.