Search code examples
azuremetadatapartitioningazure-data-explorer

How to find the partition metadata of an extent in Azure Data Explorer


See this for background: Azure Data Explorer: How do Partitioning Policy and Merge Policy work?

I have created a table with a Partitioning policy based on both a string and timestamp column. The performance gain for my queries is huge with this policy:

  "PartitionKeys": [
    {
      "ColumnName": "sensor_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 30,
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "1970-01-01T00:00:00",
        "RangeSize": "15.00:00:00",
        "OverrideCreationTime": true
      }
    }
  ],

Using .show database extents partitioning statistics I am also sure that I have 100% PartionedRowPercentage. However, for a 15-day period I get 79 extents and the extents is not near the max size set by the merging and sharding policy. I am therefore trying to find the metadata for a particular extent.

My first question is if anyone know of a command that show the partion metadata for an extent.

My second question is that when I run the following query, the output per sensor ID is widely different, but shouldn't each sensor_id in a particular extent have the same output of the XxHash64 function?

T
| where extent_id() == "e1ccecab-522c-48ce-801f-5b173add04e7"
| distinct sensor_id
| project sensor_id, hashed= hash_xxhash64(sensor_id, 30)
| sort by hashed asc  

Solution

  • My first question is if anyone know of a command that show the partion metadata for an extent.

    there isn't an official way to show this (nor is the partition metadata format contractual).

    that said, given an extent, you can run something like this, to see the partition key values - if the extent is partitioned, you should get a single partition key value.

    for 2 partition keys:

    TableName
    | where extent_id() == '<some_guid>'
    | summarize row_count = count()
      by 
        hash_partition_value = __hash_xxh64(<hash_partition_column_name>, <max_partition_count>, <seed>),
        datetime_partition_value = bin_at(<datetime_partition_column_name>, <range_size>, <reference>)
    

    for a policy with 2 partition keys (hash & uniform datetime range):

    TableName
    | where extent_id() == '<some_guid>'
    | summarize row_count = count()
      by 
        hash_partition_value = __hash_xxh64(<hash_partition_column_name>, <max_partition_count>, <seed>),
        datetime_partition_value = bin_at(<datetime_partition_column_name>, <range_size>, <reference>)
    

    for a policy with only a hash partition key:

    TableName
    | where extent_id() == '<some_guid>'
    | summarize row_count = count()
      by hash_partition_value = __hash_xxh64(<hash_partition_column_name>, <max_partition_count>, <seed>)
    

    for a policy with only a uniform range datetime partition key:

    TableName
    | where extent_id() == '<some_guid>'
    | summarize row_count = count()
      by datetime_partition_value = bin_at(<datetime_partition_column_name>, <range_size>, <reference>)
    

    all variables encapsulated in <,> can be found in the policy you've defined.

    My second question is that when I run the following query, the output per sensor ID is widely different, but shouldn't each sensor_id in a particular extent have the same output of the XxHash64 function?

    in a partitioned extent, all records belong to the same hash/datetime partition, for the arguments the extents was partitioned according to. i.e. __hash_xxh64(<hash_partition_column_name>, <max_partition_count>, <seed>) should have the same value for all records in the extent.

    if you see differently, that means the extent is either not partitioned, or partitioned according to a different policy you may have previously set (e.g. check if you forgot to modify the EffectiveDateTime of your current policy, to re-partition previously-partitioned extents that were partitioned by the different policy you had set)