Search code examples
azure-application-insightskql

Kusto query - Return top 5 per day by category


I'm trying to count each ocurrence of "name" by "headsection" and "day"

Let's say I have the following table structure (a small snippet):

Timestamp Headsection Name
01/01/2021 1 A
01/01/2021 2 AA
01/01/2021 3 AAA
01/01/2021 1 B
01/01/2021 2 BB
01/01/2021 3 BBB
01/01/2021 1 C
01/01/2021 2 CC
01/01/2021 3 CCC
01/01/2021 1 D
01/01/2021 2 CC
01/01/2021 3 DDD
01/01/2021 1 E
01/01/2021 2 DD
01/01/2021 3 EEE
01/01/2021 1 A
01/01/2021 2 EE
01/01/2021 3 DDD

In essence, I want to count the top 5 names by each day by headsection

So with 3 headsections - Each day in a year should contain 15 rows

For visualising, I want to summarize the table as such:

Timestamp Headsection Name Name_count
01/01/2021 1 A 2
01/01/2021 1 B 1
01/01/2021 1 C 1
01/01/2021 1 D 1
01/01/2021 1 E 1
01/01/2021 2 CC 2
01/01/2021 2 AA 1
01/01/2021 2 BB 1
01/01/2021 2 DD 1
01/01/2021 2 EE 1
01/01/2021 3 DDD 2
01/01/2021 3 AAA 1
01/01/2021 3 BBB 1
01/01/2021 3 CCC 1
01/01/2021 3 EEE 1

I've set the query to

|where timestamp between (startofday(datetime(2021-01-01)) .. endofday(now()))

Which means that the query should be able to turn an input table to the output table for each day up until now.

In example, the following 15 rows should be 01/02/2021 (January 2nd), with top 5 "names" that day by headsection.

I'm almost new to KQL, so I could really need some help!

I've tried experimented with top-nested and the summarize operator, but I can't seem to make it work.


Solution

  • This should do the trick:

    let NumItemsByDayAndHeadsection = 5;
    datatable(Timestamp:datetime, Headsection:long, Name:string) [
        datetime(2021-01-01), 1, "A",
        datetime(2021-01-01), 2, "AA",
        datetime(2021-01-01), 3, "AAA",
        datetime(2021-01-01), 1, "B",
        datetime(2021-01-01), 2, "BB",
        datetime(2021-01-01), 3, "BBB",
        datetime(2021-01-01), 1, "C",
        datetime(2021-01-01), 2, "CC",
        datetime(2021-01-01), 3, "CCC",
        datetime(2021-01-01), 1, "D",
        datetime(2021-01-01), 2, "DD",
        datetime(2021-01-01), 3, "DDD",
        datetime(2021-01-01), 1, "E",
        datetime(2021-01-01), 2, "EE",
        datetime(2021-01-01), 3, "EEE",
        datetime(2021-01-01), 1, "A",
        datetime(2021-01-01), 2, "EE",
        datetime(2021-01-01), 3, "DDD"
    ]
    | summarize NameCount = count() by Timestamp, Headsection, Name
    | order by Headsection asc, NameCount desc
    | summarize make_list(Timestamp, NumItemsByDayAndHeadsection), make_list(Name, NumItemsByDayAndHeadsection), make_list(NameCount, NumItemsByDayAndHeadsection) by Timestamp, Headsection
    | mv-expand list_Timestamp, list_Name, list_NameCount
    | project Timestamp, Headsection, Name = list_Name, NameCount = list_NameCount
    

    Output:

    Timestamp Headsection Name NameCount
    2021-01-01 00:00:00.0000000 1 A 2
    2021-01-01 00:00:00.0000000 1 B 1
    2021-01-01 00:00:00.0000000 1 C 1
    2021-01-01 00:00:00.0000000 1 D 1
    2021-01-01 00:00:00.0000000 1 E 1
    2021-01-01 00:00:00.0000000 2 EE 2
    2021-01-01 00:00:00.0000000 2 AA 1
    2021-01-01 00:00:00.0000000 2 BB 1
    2021-01-01 00:00:00.0000000 2 CC 1
    2021-01-01 00:00:00.0000000 2 DD 1
    2021-01-01 00:00:00.0000000 3 DDD 2
    2021-01-01 00:00:00.0000000 3 AAA 1
    2021-01-01 00:00:00.0000000 3 BBB 1
    2021-01-01 00:00:00.0000000 3 CCC 1
    2021-01-01 00:00:00.0000000 3 EEE 1