Search code examples
kqlazure-data-explorerkusto-explorer

Percentile Changing Day Level to Week Level || KQL || ADX || Kusto


Based on my previous post below url, I have prepared aggregate table from source raw table using materialized view.

Materialized view over materialized view || ADX || KQL || Kusto

Calculating Percentile value on raw data from my source table which is different from aggregate table.

Below is percentile calculating on raw data or source table.

enter image description here

Below is percentile calculating on aggregate data which is from raw data or source table. stg01 is aggregate table in my real time scenario which is Materialized view from source table.

enter image description here

Below is code for calculating Percentile from source table

let MySourceTable = datatable(col1:int, col2:int, Value:int, EventUTCDateTime:datetime)
[
1,2,10,'2022-01-01 01:00:01',
1,2,20,'2022-01-01 02:01:01',
1,2,30,'2022-01-01 03:05:01',
1,2,40,'2022-01-02 01:30:01',
1,2,20,'2022-01-02 02:30:01',
1,2,50,'2022-01-03 07:30:01',
1,2,50,'2022-01-05 06:30:01'
];
MySourceTable
| extend EventUTCDate = bin(EventUTCDateTime,30m)
| project col1,col2,Value,EventUTCDate
| summarize RecordsCount=count(), TotalValue=sum(Value), Min=min(Value), Max=max(Value),Median=percentile(Value,50) by col1,col2
| project col1,col2, Mean=TotalValue/RecordsCount, Min, Max,Median

Below is code for calculating Percentile from aggregated table

let MySourceTable = datatable(col1:int, col2:int, Value:int, EventUTCDateTime:datetime)
[
1,2,10,'2022-01-01 01:00:01',
1,2,20,'2022-01-01 02:01:01',
1,2,30,'2022-01-01 03:05:01',
1,2,40,'2022-01-02 01:30:01',
1,2,20,'2022-01-02 02:30:01',
1,2,50,'2022-01-03 07:30:01',
1,2,50,'2022-01-05 06:30:01'
];
let stg01 = MySourceTable
| extend EventUTCDate = bin(EventUTCDateTime,1d)
| project col1,col2,Value,EventUTCDate
| summarize RecordsCount=count(), TotalValue=sum(Value), Min=min(Value), Max=max(Value),Median=percentile(Value,50) by col1,col2,EventUTCDate;
stg01
| summarize RecordsCount=sum(RecordsCount), TotalValue=sum(TotalValue),Min=min(Min), Max=max(Max),Median=percentile(Median,50) by col1,col2
| project col1,col2, Mean=TotalValue/RecordsCount, Min, Max,Median 

Need suggestions and inputs how to mitigate this issue, I want to present same percentile value from both source table and aggregate table.


Solution

  • I think you're looking for tdigest() - see using hll and tdigest. If you preserve the tdigest from the source table, you can calculate the 2nd aggregation based on that, instead of over the percentile.