Can PostgreSQL return a list of values with latter values overwriting previous ones?


The table below represents decision events on the license of the given file.

I want to end up with a list of the file's licenses. If the Removed column of an event is False, the license is added, but each license appears at most once per file. If the Removed column of an event is True, all previous add events for that license are overturned. An event made later in time takes precedence over earlier ones.

For the events below, I want to return the list [A, B]. License A has events add, remove, add, so it is returned. License B has events add, add, so it is returned, but not duplicated. License C has events add, add, remove, so it is not returned, as the later removal nullifies both add events.

Is this possible to achieve with a PostgreSQL query, or do I need to process data afterwards?

+------+---------+---------+
| Time | License | Removed |
+------+---------+---------+
|    1 | A       | False   |
|    2 | A       | True    |
|    3 | A       | False   |
|    4 | B       | False   |
|    5 | B       | False   |
|    6 | C       | False   |
|    7 | C       | False   |
|    8 | C       | True    |
+------+---------+---------+

Solution

  • If I understand correctly, you want the last event for every license; a license belongs in your list when that last event has Removed = False. This is simple with window functions: partition the rows by license, sort each partition by time (descending), and select the most recent entry of each partition:

    SELECT "License", "Removed" FROM (
        -- license_events is a placeholder name; the question does not name the table
        SELECT *, rank() OVER (PARTITION BY "License" ORDER BY "Time" DESC)
          FROM license_events) X
     WHERE rank = 1
    

    If you want to better understand how it works, try executing the inner SELECT by itself.
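
    To try this on the sample data from the question, here is a minimal sketch. The table name license_events and the column types are my assumptions, since the question shows only the data. The last query adds a filter on "Removed" (assumed to be a boolean column) so that only licenses whose newest event is an add remain:

    -- Assumed table name and column types; adjust to your schema.
    CREATE TEMP TABLE license_events (
        "Time"    integer,
        "License" text,
        "Removed" boolean
    );

    INSERT INTO license_events ("Time", "License", "Removed") VALUES
        (1, 'A', false), (2, 'A', true),  (3, 'A', false),
        (4, 'B', false), (5, 'B', false),
        (6, 'C', false), (7, 'C', false), (8, 'C', true);

    -- Inner query alone: every event, ranked newest-first within its license.
    SELECT *, rank() OVER (PARTITION BY "License" ORDER BY "Time" DESC)
      FROM license_events
     ORDER BY "License", "Time" DESC;

    -- Keep the newest event per license, then drop licenses whose newest event is a removal.
    SELECT "License"
      FROM (
        SELECT *, rank() OVER (PARTITION BY "License" ORDER BY "Time" DESC)
          FROM license_events) X
     WHERE rank = 1 AND NOT "Removed"
     ORDER BY "License";

    With the eight rows above, the last query returns A and B: the newest event for C (time 8) is a removal, so C is filtered out. If two events for the same license can share a "Time" value, rank() gives both rank 1; row_number() with an extra tie-breaking sort key would be one way to pick a single row in that case.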