Search code examples
azure-data-explorerkql

How do I render the count of multiple rows in Azure Data Explorer?


Input data:

Location
europe_w
asia_s
us_e
asia_e
us_w
us_c
asia_c

I want this result using a Kusto query in Azure Data Explorer:

Location Count
asia 3
europe 1
us 3

Can't seem to figure out how to count by a specific substring and output it as a new value showing the total count.


Solution

  • Code

    datatable(Location:string)
    [
    'europe_w',
    'asia_s',
    'us_e',
    'asia_e',
    'us_w',
    'us_c',
    'asia_c'
    ]
    | extend Region = substring(Location,0,indexof(Location,"_"))
    | summarize Count = count() by Region
    

    In this query, extend operator is used to extract the substring before the first underscore character (_) in the Location column to create a new column Region. substring function extracts the substring before the first underscore character. Finally, to count the occurrences of each Region summarize operator is used.

    The output of this query:

    Region Count
    europe 1
    asia 3
    us 3

    You can also use the below code.

    datatable(Location:string)
    [
        'europe_w',
        'asia_s',
        'us_e',
        'asia_e',
        'us_w',
        'us_c',
        'asia_c'
    ]
    |summarize count() by tostring(split(Location, "_")[0])
    

    This query gives the same expected result.

    Edit:

    how would you do if the input data was like this: europew, asias, use, asiae, usw, usc, asiac (no specific char that separates the string)?

    datatable(Location:string)
    [
      'europe_w',
      'asia_s',
      'us_e',
      'asia_e',
      'us_w',
      'us_c',
      'asia_c'
    ]
    | extend Region = case(
        Location contains "europe", "europe",
        Location contains "asia", "asia",
        Location contains "us", "us",
        "unknown"
      )
    | summarize Count = count() by Region
    | where Region != "unknown"
    

    This query uses the case() function to create a new column called Region that maps each location to its corresponding region. If the Location column contains the string europe, the Region column will be set to europe. If it contains the string asia, the Region column will be set to asia. If it contains the string us, the Region column will be set to us. You can extend this if you have multiple base locations. If it does not contain any of these strings, the Region column will be set to unknown.