Search code examples
sqlpostgresql

Return rows where column value is final superset of preceding rows


I am using PostgreSQL DB and have use case where I need to return all the rows from table whose column value is superset of same column value of other preceding rows.

Example - the table has following rows:

-------------------------------------------
id   |   name       |   datetime
-------------------------------------------
1    |   N          |   2024-08-17 04:42:57 
2    |   NEW        |   2024-08-17 04:42:58    
3    |   NEW YOR    |   2024-08-17 04:42:58    
4    |   NEW YORK   |   2024-08-17 04:42:59    
5    |   W          |   2024-08-17 04:56:07    
6    |   WA         |   2024-08-17 04:56:07    
7    |   WASHI      |   2024-08-17 04:56:07    
8    |   WASHINGT   |   2024-08-17 04:56:08   
9    |   WASHINGTON |   2024-08-17 04:56:08   
10   |   NEW Y      |   2024-08-17 05:12:18   
11   |   NEW YORK   |   2024-08-17 05:12:18

I need SQL which can return following result against above table:

-------------------------------------------
id   |   name       |   datetime
-------------------------------------------
4    |   NEW YORK   |   2024-08-17 04:42:59    
9    |   WASHINGTON |   2024-08-17 04:56:08   
11   |   NEW YORK   |   2024-08-17 05:12:18

Solution

  • Your requirement can be translated to:

    "Take rows where the name in the next row does not start with the name of the current row."

    So:

    SELECT id, name, datetime 
    FROM  (
       SELECT *, lead(name, 1, '') OVER (ORDER BY id) AS next_name
       FROM   tbl
       ORDER  BY id
       ) sub
    WHERE  NOT (next_name ^@ name);
    

    fiddle

    Assuming rows are indeed sorted by id as you make it seem, and involved columns are defined NOT NULL.

    Two steps:

    1. In subquery sub, take the name of the next row according to your sort order with the window function lead(), let's call it next_name.
      To cover the corner case of the last row, default to the empty string (''), when there is no next row.

    2. In the outer SELECT, filter rows where next_name does not start with name. Should be exactly what you ask for.

    ^@ is the "starts with" operator added with Postgres 11. See:

    (Substitute with LIKE or ~ in older versions.)

    Other possible (more verbose) solutions:

    • a recursive CTEs
    • a procedural solution in a PL/pgSQL function.