Search code examples
sqlpostgresqldatetimesubquerygreatest-n-per-group

How to sort and group data in request?


I have got table.

ID       section_name     purchaseNumber    docPublishDate  parsing_status
88940   notifications   0373200124515000035   2015-02-27       null
88939   notifications   0373200124515000035   2015-02-29       null
88938   notifications   0373200124515000034   2015-02-27       null
88687   notifications   0373100064615000120   2015-02-28       null
88937   notifications   0373100064615000120   2015-02-27       null
89122   notifications   0373100064615000120   2015-02-27       null
88936   notifications   0373200124515000032   2015-02-27       null
88988   notifications   0373100064615000120   2015-03-02       null
88696   notifications   0373100066915000038   2015-02-27       null
88963   notifications   0373200174515000013   2015-02-27       null

It have some duplicates in purchaseNumber. I need select from this table only newest records for processing. In other words I need to:

  1. group elements by purchaseNumber
  2. sort them by docPublishDate and select newest.
  3. if any of elements in group is have status parsing_status true (suppose it was newest from step 2) any elements SHOULD NOT be returned.

After processing parsing_status flag is changing to true.

docPublishDate - TimeStamp

The items with old dates should not be selected if at last one newest was processed. I tried to do some grouping. But code simply return me all data:

SELECT 
  "id", "section_name", "purchaseNumber", "docPublishDate", "parsing_status"
  FROM "xml_files" WHERE parsing_status IS NULL GROUP BY "purchaseNumber", "id", "section_name", "docPublishDate", "parsing_status" ORDER BY "docPublishDate" DESC 

Solution

  • I understand that you want the latest row per purchase, excepted those whose parsing status is true. One option uses distinct on in a subquery to get the latest element, and then filters in an outer query:

    select *
    from (
        select distinct on (purchasenumber) x.*
        from xml_files x
        order by purchasenumber, docpublishdate desc
    ) x
    where parsing_status is distinct from true