I am optimizing a database with a string-based (hashed) and time-based (uniform) partitioning policy. To optimize query performance, I am investigating the setting "MaxPartitionCount" for the hash partition key (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/partitioningpolicy).
When choosing the default setting of 128 bins, I end up getting 128*2 extents for each uniform range datetime partition. I would expect to get only 128 extents for each uniform range datetime partition, as each pair of extents with the same partition metadata satisfies the conditions to merge set by the merge and sharding policies. Any suggestions as to why this is the case?
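For context, the hash partition key I'm experimenting with looks roughly like this (a sketch based on the linked docs; the column name and seed are illustrative):

```json
{
  "ColumnName": "tenant_id",
  "Kind": "Hash",
  "Properties": {
    "Function": "XxHash64",
    "MaxPartitionCount": 128,
    "Seed": 1,
    "PartitionAssignmentMode": "Uniform"
  }
}
```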
Minimal example, where I only use the uniform range partitioning policy and a small part of our dataset:
Partitioning policy:
```json
{
  "EffectiveDateTime": "1970-01-01T00:00:00",
  "PartitionKeys": [
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "1970-01-01T00:00:00",
        "RangeSize": "01.00:00:00",
        "OverrideCreationTime": true
      }
    }
  ]
}
```
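For reference, a policy like this can be applied with a `.alter` management command along these lines (the table name is illustrative):

```kusto
.alter table MyTable policy partitioning '{"EffectiveDateTime":"1970-01-01T00:00:00","PartitionKeys":[{"ColumnName":"timestamp","Kind":"UniformRange","Properties":{"Reference":"1970-01-01T00:00:00","RangeSize":"01.00:00:00","OverrideCreationTime":true}}]}'
```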
In this case I get two extents per day, as seen in the numbers below (sizes are in bytes).
OriginalSize | ExtentSize | CompressedSize | RowCount | MaxCreatedOn | Column B |
---|---|---|---|---|---|
87058761 | 8022944 | 7932142 | 838350 | 2023-10-26T23:52:30Z | 2023-10-26T07:10:00Z |
720604715 | 60644931 | 59245697 | 6920424 | 2023-10-26T23:59:50Z | 2023-10-26T00:00:00Z |
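(For reference, numbers like these can be retrieved from the extents-listing command, something along these lines; the table name and time filter are illustrative:)

```kusto
.show table MyTable extents
| where MaxCreatedOn between (datetime(2023-10-26) .. datetime(2023-10-27))
| project OriginalSize, ExtentSize, CompressedSize, RowCount, MaxCreatedOn, MinCreatedOn
```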
Comparing the numbers to the merge policy shown below, I don't understand why these two extents don't merge.
```json
{
  "RowCountUpperBoundForMerge": 16000000,
  "OriginalSizeMBUpperBoundForMerge": 30000,
  "MaxExtentsToMerge": 100,
  "LoopPeriod": "01:00:00",
  "MaxRangeInHours": 48,
  "AllowRebuild": true,
  "AllowMerge": true,
  "Lookback": {
    "Kind": "All",
    "CustomPeriod": null
  }
}
```
and, from the sharding policy, "ShardEngineMaxExtentSizeInMb": 8192.
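(For completeness, the effective policies can be inspected with the following management commands; the table name is illustrative:)

```kusto
.show table MyTable policy merge
.show table MyTable policy sharding
```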
The constraints that determine whether or not two shards can be merged together are partially controlled by documented policies (e.g., the sharding and merge policies); however, there are additional constraints that are internal to the system and are neither documented nor contractual.
So the shards being in the same partition and aligning with the values in the sharding and merge policies isn't sufficient; these aren't the only constraints (as your question implies you expected).