
Azure Time Series Insights (TSI): initial considerations and best practices


My apologies for the bad title!

I am in the initial phase of designing an Azure Time Series Insights solution and I have run into a number of uncertainties. The background for looking into TSI is that we currently have a rather badly designed Cosmos DB database which contains close to 1 TB of IoT data, and it is growing by the minute. By "badly" I mean that the partition key was designed in such a way that we have no control over the size of the partitions. Since there is a storage limit of 10 GB(?) per logical partition key, we will soon run out of space and need to come up with a new solution. Also, historical queries against the Cosmos DB database do not respond within an acceptable time frame, and none of our experiments with throughput calculations and changes have brought the response time down to an acceptable level.

We are in the business of logging IoT time series data, including metadata, from different sensors. We have a number of clients, both smaller and larger, with 30 to 300 sensors each. On the client side the sensors are grouped into locations and sub-locations.

An example of an event could be something like this:

{
  deviceId,
  datetime,
  clientId,
  locationId,
  sub-locationId,
  sensor,
  value,
  metadata{}
}

Now that we know how to design a better partition key in Cosmos DB, would the approach described below also be considered good practice in TSI when composing the TimeSeriesId?

  • In a completely different Cosmos DB solution we included eventDate.datepart(YYYY-MM) as part of the partition key, to stop partitions from growing out of bounds and to make the response time of queries within one partition more predictable.

Or does TSI handle time series data differently, making a datepart in the TimeSeriesId unnecessary?
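As a minimal sketch of the month-bucketing idea in the bullet above, assuming a hypothetical key format of `<deviceId>-<YYYY-MM>` (the question does not say which other fields that partition key combines), the key value could be computed per event like this:

```python
from datetime import datetime, timezone


def partition_key(device_id: str, event_time: datetime) -> str:
    """Build a hypothetical Cosmos DB partition key value: '<deviceId>-<YYYY-MM>'.

    Bucketing by month caps how much data a single logical partition can
    accumulate, because a new key value starts every calendar month.
    """
    # Normalise to UTC so events near a month boundary land in a consistent bucket
    month_bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m")
    return f"{device_id}-{month_bucket}"


# Two events from the same device in different months get different keys
print(partition_key("sensor-042", datetime(2021, 3, 31, 23, 0, tzinfo=timezone.utc)))  # sensor-042-2021-03
print(partition_key("sensor-042", datetime(2021, 4, 1, 0, 0, tzinfo=timezone.utc)))    # sensor-042-2021-04
```

The trade-off is that a historical query spanning several months has to touch several logical partitions, one per month bucket.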

With the TSI query API in mind, should I also consider how simple the composed TimeSeriesId is? As far as I can tell, the TimeSeriesId has to be provided in the body of each API request, and when composing a query in a back-end service I have access to all our client IDs and location/sub-location IDs, which are more accessible than the device IDs.
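To make the point about request bodies concrete, here is a sketch that only builds (does not send) a `getEvents` body for the TSI Gen2 Query API, which, as I understand it, is posted to `https://<environmentFqdn>/timeseries/query?api-version=2020-07-31`. The two-property ID (clientId, deviceId) and its values are hypothetical placeholders; the point is that the full ID value array has to travel with every request.

```python
import json
from datetime import datetime, timezone


def get_events_body(time_series_id: list, start: datetime, end: datetime, take: int = 1000) -> dict:
    """Build a TSI Gen2 Query API 'getEvents' request body (sketch).

    time_series_id holds one value per Time Series ID property, assumed to be
    in the same order the properties were declared for the environment.
    """
    def iso(ts: datetime) -> str:
        # Search span bounds as UTC ISO-8601 timestamps
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "getEvents": {
            "timeSeriesId": time_series_id,
            "searchSpan": {"from": iso(start), "to": iso(end)},
            "take": take,  # upper bound on the number of events returned
        }
    }


body = get_events_body(
    ["client-7", "sensor-042"],  # hypothetical (clientId, deviceId) values
    datetime(2021, 3, 1, tzinfo=timezone.utc),
    datetime(2021, 4, 1, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```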

And finally, when storing IoT data for multiple clients, would it be best practice to provision a new TSI solution for each client, or does TSI support something like the collections in Cosmos DB?


Solution

  • As stated in this article, when using a composite key you need to query against all of the key properties, not just one or some of them. That is a consideration when choosing between a single key and a composite key. The article also gives this tip:

    If your event source is an IoT hub, your Time Series ID will likely be iothub-connection-device-id.

    So I assume you will have at least one IoT hub sourcing the events reported by the devices, in which case you can use iothub-connection-device-id.
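The composite-key constraint described in the answer can be sketched as a small helper that assembles the Time Series ID values for a query and refuses a partial ID. The function name and the composite property list are hypothetical; only `iothub-connection-device-id` comes from the answer.

```python
from typing import Mapping, Sequence


def time_series_id(key_properties: Sequence[str], known: Mapping[str, str]) -> list:
    """Return the Time Series ID value array for a TSI query (sketch).

    A query must supply every key property, so a partial ID (for example
    only clientId out of a composite key) is rejected here.
    """
    missing = [p for p in key_properties if p not in known]
    if missing:
        raise ValueError(f"missing Time Series ID properties: {missing}")
    # Values are returned in the order the key properties are listed
    return [known[p] for p in key_properties]


# Single-property ID, as suggested when the event source is an IoT hub
print(time_series_id(["iothub-connection-device-id"], {"iothub-connection-device-id": "sensor-042"}))
# -> ['sensor-042']

# Hypothetical composite ID: knowing only the clientId is not enough
try:
    time_series_id(["clientId", "locationId", "deviceId"], {"clientId": "client-7"})
except ValueError as err:
    print(err)  # missing Time Series ID properties: ['locationId', 'deviceId']
```

With a single device-based ID the back end only needs the device ID to query; with a composite ID it needs every component, even if some are easier to look up than others.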