I have a table in Kusto that has some duplicates, which I can determine as each unique record will have a unique Key
column. I figured a simple way to delete duplicates would be with a command like this:
.delete table TABLENAME records <| TABLENAME
| summarize MaxIngestionTime = max(ingestion_time()) by Key
| join kind=inner (TABLENAME) on Key
| where ingestion_time() != MaxIngestionTime
However this fails because .delete
doesn't allow summarize
in the predicate. I tried getting around this by doing the summarize
in a separate query and materializing it like so:
let maxingest = materialize(TABLENAME
| summarize MaxIngestionTime = max(ingestion_time()) by Key);
.delete table TABLENAME records <| TABLENAME
| join kind=inner (maxingest) on Key
| where ingestion_time() != MaxIngestionTime
But this doesn't work either as it says that it can't resolve table or column expression "maxingest". Is there any other way to accomplish this?
This is a possible solution that requires sorting the table by Key and then ingestion_time()
and the assigning each record an index form 1 to n where n
is the number of duplicates that you get. The challenge is that you can't introduce new columns and are limited by the specific functions you can call.
Here is a sample table creation script where you ingest duplicated keys into the table:
.create-merge table SampleTest (
Key: int,
Content: string
);
.append SampleTest <|
datatable(Key: int, Content: string) [
1, "a",
2, "b"
]
.append SampleTest <|
datatable(Key: int, Content: string) [
1, "c",
2, "d"
]
Here is how you delete the duplicated records, keeping the latest ones only:
.delete table SampleTest records <|
SampleTest
| sort by Key, ingestion_time() desc
| where row_cumsum(1,prev(Key) !=Key) > 1
Here is what is happening:
Key
, and then the ingestion_time()
in descending orderwhere
clause and use the row_cumsum
function to create an index variable. This variable increments by 1 for every record, and resets to 1 if the Key
changes. Only the latest record will have a value of 1 so the duplicates are deleted.