
What is the storage limit in Azure Data Explorer and what does it depend on?


I could not find this in the official documentation, and the service limits page is also silent on this aspect.

  1. What is the maximum size of a database in Azure Data Explorer?
  2. What is the maximum total size of all databases in the cluster?
  3. Where exactly is my data stored? Is it on the cluster nodes' HDDs?

Solution

  • There is no hard limit on the amount of data that can be ingested into a database. Ingested data is persisted to durable, effectively limitless storage (Azure Blob Storage), not to the cluster nodes' disks. Based on the database's effective caching policy, ingested data can also be cached on the cluster nodes' local SSDs (see the caching-policy sketch below).

    The total amount of data that can fit in a single cluster's hot cache depends on the number of nodes and the chosen SKU. If, for example, the maximum SSD size per VM is 4 TB and the maximum number of nodes in a cluster is 1,000, then you can have up to 4,000 TB of (compressed) data available in the hot cache.

    The compression ratio varies with the schema and the data itself. If, for example, the compression ratio is 10, the example above translates to up to 40 PB of original (uncompressed) data available in the hot cache, with additional data still queryable from cold storage (not cached); see the sizing sketch below.

    Whether it makes sense to store and cache that much data in a single, very large cluster depends greatly on the scenario and on the workloads running against the cluster.
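
    Sizing sketch: the arithmetic above, spelled out in a short Python snippet. The SSD size per node, the node count, and the compression ratio are illustrative values taken from the example, not fixed service limits.

        # Hot-cache sizing arithmetic (illustrative numbers, not service limits).
        def hot_cache_capacity_tb(ssd_per_node_tb: float, node_count: int) -> float:
            """Total compressed data that fits in the cluster's hot cache, in TB."""
            return ssd_per_node_tb * node_count

        def original_data_tb(compressed_tb: float, compression_ratio: float) -> float:
            """Original (uncompressed) data volume the compressed cache represents."""
            return compressed_tb * compression_ratio

        compressed = hot_cache_capacity_tb(ssd_per_node_tb=4, node_count=1000)  # 4,000 TB
        original = original_data_tb(compressed, compression_ratio=10)           # 40,000 TB = 40 PB
        print(f"Hot cache (compressed): {compressed:,.0f} TB")
        print(f"Equivalent original data: {original:,.0f} TB (~{original / 1000:.0f} PB)")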
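
    Caching-policy sketch: a minimal example of inspecting and changing a database's caching policy from Python with the azure-kusto-data package. The cluster URI and database name are placeholders, and Azure CLI authentication is just one of several supported options.

        # pip install azure-kusto-data
        from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

        cluster = "https://mycluster.westeurope.kusto.windows.net"  # placeholder cluster URI
        database = "MyDatabase"                                     # placeholder database name

        kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
        client = KustoClient(kcsb)

        # Show the database's current caching policy.
        response = client.execute_mgmt(database, f".show database {database} policy caching")
        for row in response.primary_results[0]:
            print(row)

        # Keep the most recently ingested 30 days of data on the nodes' local SSDs;
        # older data stays in blob storage and remains queryable, just not cached.
        client.execute_mgmt(database, f".alter database {database} policy caching hot = 30d")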