Search code examples
powerbiduplicatescolumnsorting

How to remove duplicates in power BI grouping by a particular column?


I have a dataset in the following manner:

enter image description here

I want to remove duplicate timestamps for each 'id' such that for each timestamp the value of 'revision' is the highest, i.e, the final dataset should look like this:

enter image description here

I want to do this in the query editor of power BI. I have tried doing this for one particular 'id' and it works. I selected id=1001, then I sorted revision in descending order and added an index column in the table. Then I applied 'remove duplicates' on the timestamp column. This process works for one value of 'id'. I want to do this separately for each id such that I have each timestamp occur only once for any particular 'id'.

How do I do this?


Solution

  • In power query, you can use the "group by" feature, with some modification, to accomplish this. Here is the step you will need to add to do this:

    = Table.Group(#"Previous Step", {"timestamp", "id"}, {{"revision", each List.Max([revision]), type nullable number}, {"value", each Table.Max(_, "revision")[value]}})
    

    This is essentially telling power query to take the highest revision for each timestamp and id combination, and to also return the corresponding value.

    Here is a detailed blog post of how this code works: https://www.ehansalytics.com/blog/2020/7/16/return-row-based-on-max-value-from-one-column-when-grouping

    After sorting, here are the results:

    enter image description here