azure cloud partitioning azure-data-explorer adx

Azure Data Explorer: How do Partitioning Policy and Merge Policy work?


In our ADX cluster there is no partitioning policy and no merge policy on a table, but ADX still creates extents. I am confused about how this works and what the default settings are. Does anyone know?

Further, how does a combination of partition keys work? For example, I have

{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "2021-01-01T00:00:00",
        "RangeSize": "7.00:00:00",
        "OverrideCreationTime": false
      }
    }
  ]
}

Will this create a new partition for every new tenant_id within each 7-day range? And is the limit 128 partitions? Or how should I read this?

And what is the benefit of building these small extents based on the partitioning policy when there is a merge policy that merges the small extents into bigger ones? Why not build a bigger one right away?

Thanks


What I did: searched the docs and tried to Google it.


Solution

  • In our ADX cluster there is no partitioning policy and no merge policy on a table, but ADX still creates extents

    if you ingest data, extents will be created (either immediately - if you use batch ingestion - or eventually - if you use streaming ingestion).

    a partitioning policy (null by default; defining one is rarely required) changes how extents are partitioned, and a merge policy (defined by default; changing it is rarely required) controls how extents are merged.

    how does a combination of partition keys work? Will this create a new partition for every new tenant_id within each 7-day range? And is the limit 128 partitions? Or how should I read this?

    given the policy you included, extents in the table will be partitioned as follows:

    • all records for which the result of hash_xxhash64(tenant_id, 128) has the same value (a value between 0 and 127) and for which the result of bin_at(timestamp, 7d, datetime(2021-01-01T00:00:00)) has the same value - will be included in the same set of extents, and will have the same partition metadata.

    • afterwards, extents that have the same partition metadata (for both partition keys) may get merged together, until they reach optimum size (managed by the system). extents that have different partition metadata (for either partition key) can't be merged.
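The grouping rule described in the two bullets above can be sketched in Python. This is only an illustration of the "same hash value and same 7-day bin" logic, not ADX's implementation: BLAKE2b from the standard library stands in for XxHash64 (so the partition numbers will not match what ADX computes), the `Seed` and `PartitionAssignmentMode` properties are not modeled, and the function names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Tuple

MAX_PARTITION_COUNT = 128          # "MaxPartitionCount" from the policy
REFERENCE = datetime(2021, 1, 1)   # "Reference" from the policy
RANGE_SIZE = timedelta(days=7)     # "RangeSize" from the policy


def hash_partition(tenant_id: str) -> int:
    """Stand-in for hash_xxhash64(tenant_id, 128).

    Assumption: BLAKE2b replaces XxHash64 here, so only the
    "hash modulo 128" shape matches ADX, not the actual values.
    """
    digest = hashlib.blake2b(tenant_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % MAX_PARTITION_COUNT


def range_partition(ts: datetime) -> datetime:
    """Mirror bin_at(timestamp, 7d, REFERENCE): start of the 7-day bin holding ts."""
    bins = (ts - REFERENCE) // RANGE_SIZE  # floor division, also valid before REFERENCE
    return REFERENCE + bins * RANGE_SIZE


def partition_metadata(tenant_id: str, ts: datetime) -> Tuple[int, datetime]:
    """Records with equal metadata can share extents; unequal metadata cannot merge."""
    return hash_partition(tenant_id), range_partition(ts)


a = partition_metadata("tenant-a", datetime(2021, 1, 3, 12))
b = partition_metadata("tenant-a", datetime(2021, 1, 7, 23))
c = partition_metadata("tenant-a", datetime(2021, 1, 8))
print(a == b)  # True: same tenant, same 7-day bin
print(a == c)  # False: 2021-01-08 starts the next bin
```

Note that the hash key yields at most 128 distinct values no matter how many tenants exist, so several tenants can share one hash partition; a new tenant_id does not get a partition of its own.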

    what is the benefit of building these small extents based on the partitioning policy when there is a merge policy that merges the small extents into bigger ones? Why not build a bigger one right away?

    I would recommend you go over the following posts/documents:

    1. Data partitioning in Kusto
    2. Extents overview
    3. Partitioning policy