Search code examples
pythonpostgresqlsqlalchemysubquerycalculated-columns

Calculated boolean column showing if column is newest


I have a table with a structure that more or less (I've simplified it for the question) looks like the following:

id (P.K.) creation_ts some_field
1 2021-08-19 foo
2 2021-08-18 foo
3 2021-08-17 foo
4 NULL bar
5 2021-01-01 bar
6 2021-01-02 bar

And I'm trying build a query to show a calculated column that per each row with the same value for "some_field" (erm... grouped by the value of "some_field" if you may?) would add an extra column is_newest showing which row is the newest.

id (P.K.) creation_ts some_field is_newest
1 2021-08-19 foo TRUE
2 2021-08-18 foo FALSE
3 2021-08-17 foo FALSE
4 NULL bar FALSE
5 2021-01-01 bar FALSE
6 2021-01-02 bar TRUE

The intention of this is to create a SqlAlchemy Hybrid property so we could quickly query things like "get me the newest record WHERE some_field = 'foo'"

I imagine this must be some kind of CASE statement (at least that's what I gather from this other S.O. answer, which looks pretty promising) but the best thing I can come with is something like:

    @is_newest.expression
    def is_newest(cls):
        subq = sa.sql.exists().where(
            sa.and_(
                cls.id != cls.id,
                cls.some_field == cls.some_field,
                # Dirty trick: If there aren't newer records than this one,
                # then this must be the newest
                cls.creation_ts > cls.creation_ts,
            )
        )
        return sa.case([(subq, False)], else_=True).label("is_newest")

but no: that seems pretty wrong to me (and it isn't working, since some unit tests are failing), because it produces a query like...

SELECT table.id, table.creation_ts, table.some_field
 FROM table WHERE 
  CASE WHEN (EXISTS (
    SELECT * FROM table WHERE 
       table.id != table.id 
       AND table.some_field = table.some_field
       AND table.creation_ts > table.creation_ts)
    ) THEN False 
    ELSE True END IS true 
AND table.some_field = 'foo'

... which doesn't look right. Though to be honest, I'm not very sure what would "look right" (I'm quite a newbie with Postgres)

Any hint would be appreciated. Thank you in advance


Solution

  • You may use this window function query as native:

    select the_table.*, 
     coalesce(creation_ts = max(creation_ts) over (partition by some_field), false) is_newest
    from the_table;
    

    or (better) make a view out of it and then use the view instead of the table. Result:

    id|creation_ts|some_field|is_newest|
    --+-----------+----------+---------+
     1|2021-08-19 |foo       |true     |
     2|2021-08-18 |foo       |false    |
     3|2021-08-17 |foo       |false    |
     4|           |bar       |false    |
     5|2021-01-01 |bar       |false    |
     6|2021-01-02 |bar       |true     |
    

    Edit
    Added coalesce to is_newest expression.