Input data:
Location |
---|
europe_w |
asia_s |
us_e |
asia_e |
us_w |
us_c |
asia_c |
I want this result using a Kusto query in Azure Data Explorer:
Location | Count |
---|---|
asia | 3 |
europe | 1 |
us | 3 |
Can't seem to figure out how to count by a specific substring and output it as a new value showing the total count.
Code
datatable(Location:string)
[
'europe_w',
'asia_s',
'us_e',
'asia_e',
'us_w',
'us_c',
'asia_c'
]
| extend Region = substring(Location,0,indexof(Location,"_"))
| summarize Count = count() by Region
In this query, extend
operator is used to extract the substring before the first underscore character (_
) in the Location
column to create a new column Region
. substring
function extracts the substring before the first underscore character. Finally, to count the occurrences of each Region
summarize
operator is used.
The output of this query:
Region | Count |
---|---|
europe | 1 |
asia | 3 |
us | 3 |
You can also use the below code.
datatable(Location:string)
[
'europe_w',
'asia_s',
'us_e',
'asia_e',
'us_w',
'us_c',
'asia_c'
]
|summarize count() by tostring(split(Location, "_")[0])
This query gives the same expected result.
Edit:
how would you do if the input data was like this: europew, asias, use, asiae, usw, usc, asiac (no specific char that separates the string)?
datatable(Location:string)
[
'europe_w',
'asia_s',
'us_e',
'asia_e',
'us_w',
'us_c',
'asia_c'
]
| extend Region = case(
Location contains "europe", "europe",
Location contains "asia", "asia",
Location contains "us", "us",
"unknown"
)
| summarize Count = count() by Region
| where Region != "unknown"
This query uses the case()
function to create a new column called Region
that maps each location to its corresponding region. If the Location
column contains the string europe
, the Region
column will be set to europe
. If it contains the string asia
, the Region
column will be set to asia
. If it contains the string us
, the Region
column will be set to us
. You can extend this if you have multiple base locations. If it does not contain any of these strings, the Region
column will be set to unknown
.