I'm new to Kusto and trying to figure out how to de-dupe in KQL. I have a table in Kusto that stores account information. For some of the accounts, the table has 2 rows: one with updated_c == true and another row with updated_c == false. I would like to write a KQL script that would survive the row with updated_c == true in these cases of duplicate accounts. In other words, I'm trying to de-dupe this dataset where when there are 2 rows per each account, it would survive the one with updated_c == true. And if there are only one row per perspective account, it would survive it regardless of updated_c vlaues.
account Id | updated_c |
---|---|
12 | true |
12 | false |
would anyone be able to help me learn how to de-dupe this?
This should do it.
datatable
| extend updated_c_int = toint(updated_c)
| summarize arg_max(updated_c_int, *) by accountId
| project accountId, updated_c
extend updated_c_int = toint(updated_c)
: Convert the boolean updated_c
to an integer. true
becomes 1 and false
becomes 0.summarize arg_max(updated_c_int, *) by accountId
: For each accountId
, get the row with the highest value of updated_c_int
. If there's a tie, arg_max
will return the first row it encounters.project accountId, updated_c
: Only select the columns accountId
and updated_c
in the final output.Given your example input, this query should return:
accountId updated_c
12 true