I have a dataset in the following manner:
I want to remove duplicate timestamps for each 'id' such that for each timestamp the value of 'revision' is the highest, i.e, the final dataset should look like this:
I want to do this in the query editor of power BI. I have tried doing this for one particular 'id' and it works. I selected id=1001, then I sorted revision in descending order and added an index column in the table. Then I applied 'remove duplicates' on the timestamp column. This process works for one value of 'id'. I want to do this separately for each id such that I have each timestamp occur only once for any particular 'id'.
How do I do this?
In power query, you can use the "group by" feature, with some modification, to accomplish this. Here is the step you will need to add to do this:
= Table.Group(#"Previous Step", {"timestamp", "id"}, {{"revision", each List.Max([revision]), type nullable number}, {"value", each Table.Max(_, "revision")[value]}})
This is essentially telling power query to take the highest revision for each timestamp and id combination, and to also return the corresponding value.
Here is a detailed blog post of how this code works: https://www.ehansalytics.com/blog/2020/7/16/return-row-based-on-max-value-from-one-column-when-grouping
After sorting, here are the results: