Search code examples
mergepartitioningazure-data-explorer

Why is extents with the same partition metadata not merging in Azure Data Explorer


I am optimizing a database with a string based (hashed) and time based (uniform) partitioning policy. To optimize query performance, I am investigating the setting "MaxPartitionCount" for the hash partition key (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy).

When choosing the default setting of 128 bins, I end up getting 128*2 extents per each uniform range datetime partition. I would expect to only get 128 extents per each uniform range datatime partiton as each pair of extents with the same partition metadata satisfy the conditions to merge, set by the merge and sharding policy. Any suggestion why this is the case?

Minimal example where I only use uniform range partitioning policy and small part of our dataset:

Partitioning policy:

  "EffectiveDateTime" : "1970-01-01T00:00:00",
  "PartitionKeys": [
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "1970-01-01T00:00:00",
        "RangeSize": "01.00:00:00",
        "OverrideCreationTime": true
      }
    }

In this case I get two extents per day, as seen in the numbers below where sizes are in bytes.

OriginalSize ExtentSize CompressedSize RowCount MaxCreatedOn Column B
87058761 8022944 7932142 838350 2023-10-26T23:52:30Z 2023-10-26T07:10:00Z
720604715 60644931 59245697 6920424 2023-10-26T23:59:50Z 2023-10-26T00:00:00Z

Comparing the numbers to the merge policy shown below i dont understand why these two extents don't merge?

  "RowCountUpperBoundForMerge": 16000000,
  "OriginalSizeMBUpperBoundForMerge": 30000,
  "MaxExtentsToMerge": 100,
  "LoopPeriod": "01:00:00",
  "MaxRangeInHours": 48,
  "AllowRebuild": true,
  "AllowMerge": true,
  "Lookback": {
    "Kind": "All",
    "CustomPeriod": null
  },
  "ShardEngineMaxExtentSizeInMb": 8192,

Solution

  • the constraints for whether or not 2 shards can be merged together are partially controlled by documented policies (e.g. sharding & merge), however there are additional ones that are internal to the system and are not documented nor contractual.

    but, surely, the shards being in the same partition and aligning with values in the sharding & merge policies isn't sufficient, i.e. these aren't the only constraint (as your question implies your expectation is).