Tags: kql, azure-monitor-workbooks, azure-monitor

How can I improve a KQL query over a large dataset for a heatmap?


I have the KQL query below, which produces a really nice heatmap of the top access by country for Azure WAF.

The challenge is that this query cannot cover more than 24 hours because the number of records is far too large. How can I improve it so that it can display weekly or even monthly stats?

```kql
// source: https://datahub.io/core/geoip2-ipv4
set notruncation;
// Country/CIDR table; the CSV starts with a header row, so skip it.
let CountryDB = externaldata(Network:string, geoname_id:string, continent_code:string, continent_name:string, country_iso_code:string, country_name:string)
    [@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
    with (format="csv", ignoreFirstRecord=true)
| extend Dummy=1;                      // constant key used for the cross join below
let AppGWAccess = AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| where userAgent_s !contains "bot"    // substring match; !in ("bot") only excludes the exact value "bot"
| project TimeGenerated, clientIP_s;
AppGWAccess
| extend Dummy=1
// Request count per client IP per 6-hour window
| summarize count() by Hour=bin(TimeGenerated, 6h), clientIP_s, Dummy
// For each window, pair every IP with every CIDR range, then keep the matching range
| partition by Hour (
    lookup CountryDB on Dummy
    | where ipv4_is_match(clientIP_s, Network)
)
| summarize sum(count_) by country_name
```
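A hedged sketch of the same query restructured in two ways: it collapses the raw log to one row per client IP before any geolocation, and it replaces the `Dummy` cross join plus `ipv4_is_match` filter with the `ipv4_lookup` plugin, which matches each IP against the CIDR ranges directly. It assumes the `ipv4_lookup` plugin is available in your workspace and that the CSV at the URL above still has the same columns; the `Requests` column name and the 7-day window are illustrative.

```kql
// Country/CIDR ranges from the same CSV; skip its header row.
let CountryDB = externaldata(Network:string, geoname_id:string, continent_code:string,
                             continent_name:string, country_iso_code:string, country_name:string)
    [@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
    with (format="csv", ignoreFirstRecord=true);
AzureDiagnostics
| where TimeGenerated > ago(7d)                      // illustrative window
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| where userAgent_s !contains "bot"                  // case-insensitive substring filter
// One row per client IP before geolocating (Requests is an illustrative name).
| summarize Requests = count() by clientIP_s
// Match each IP against the CIDR ranges in Network; unmatched IPs are dropped by default.
| evaluate ipv4_lookup(CountryDB, clientIP_s, Network)
| summarize Requests = sum(Requests) by country_name
| order by Requests desc
```

Because the number of distinct client IPs is usually much smaller than the number of requests, the lookup runs over far fewer rows than the per-window cross join in the original query.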

Solution

  • What you're doing is re-aggregating all of the raw data into 6-hour buckets every time the query runs. Instead, create a materialized view that performs the aggregation in the background for you.

    Quoting the documentation:

    Materialized views expose an aggregation query over a source table. Materialized views always return an up-to-date result of the aggregation query (always fresh). Querying a materialized view is more performant than running the aggregation directly over the source table, which is performed each query.
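A minimal sketch of that approach, assuming the Application Gateway access logs are available as a table in an Azure Data Explorer database, since materialized views are created with ADX management commands. The source table name `AppGwAccessLogs` and the view name `AppGwDailyByIp` are hypothetical. The view keeps one row per client IP per day, so weekly or monthly heatmaps scan pre-aggregated rows instead of raw requests.

```kql
// Hypothetical names: source table AppGwAccessLogs, view AppGwDailyByIp.
// backfill=true also aggregates records that already exist in the source table.
.create async materialized-view with (backfill=true) AppGwDailyByIp on table AppGwAccessLogs
{
    AppGwAccessLogs
    | where userAgent_s !contains "bot"
    | summarize Requests = count() by clientIP_s, Day = bin(TimeGenerated, 1d)
}
```

The heatmap query then reads from the view. This sketch assumes the `ipv4_lookup` plugin is available, which matches each IP against a CIDR column instead of cross-joining:

```kql
let CountryDB = externaldata(Network:string, geoname_id:string, continent_code:string,
                             continent_name:string, country_iso_code:string, country_name:string)
    [@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
    with (format="csv", ignoreFirstRecord=true);
AppGwDailyByIp
| where Day > ago(30d)                               // monthly view; use 7d for weekly
| summarize Requests = sum(Requests) by clientIP_s   // roll daily rows up per IP
| evaluate ipv4_lookup(CountryDB, clientIP_s, Network)
| summarize Requests = sum(Requests) by country_name
```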