Skip result row if value is missing in group


I have a Postgres table like this:

datetime          tenant_id  orders_today
2023-06-25 10:00  tenant2    2
2023-06-25 10:00  tenant1    1
2023-06-25 11:00  tenant1    5
2023-06-25 11:00  tenant2    2
2023-06-25 12:00  tenant1    5

Note that no orders_today value has been generated yet for tenant2 at 12:00.

I use a query like this to summarize today's orders:

SELECT datetime, SUM(orders_today)
FROM orders
GROUP BY datetime

But this gives me the following result:

datetime          sum
2023-06-25 10:00  3
2023-06-25 11:00  7
2023-06-25 12:00  5

How can I make it skip the 12:00 group, where the value for tenant2 is missing? And, if possible, can I make it use tenant2's previous value from 11:00 instead?


Solution
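
For the first part of the question (skip any hour that lacks a row for one of the tenants, without substituting anything), a minimal sketch keeps the plain aggregate and filters groups in HAVING. It assumes, like the answer below, exactly two tenants and at most one row per tenant and hour. The sample data is inlined as a CTE named orders so the query runs stand-alone; the column types are assumed.

```sql
-- Sample data from the question, inlined as a CTE so the query is self-contained.
-- Column types (timestamp, text, int) are assumed.
WITH orders (datetime, tenant_id, orders_today) AS (
   VALUES
     (timestamp '2023-06-25 10:00', 'tenant2', 2)
   , (timestamp '2023-06-25 10:00', 'tenant1', 1)
   , (timestamp '2023-06-25 11:00', 'tenant1', 5)
   , (timestamp '2023-06-25 11:00', 'tenant2', 2)
   , (timestamp '2023-06-25 12:00', 'tenant1', 5)
   )
SELECT datetime, sum(orders_today) AS sum_orders_today
FROM   orders
GROUP  BY datetime
HAVING count(*) = 2  -- drop hours that lack a row for one of the two tenants
ORDER  BY datetime;
```

On the sample data this returns 3 for 10:00 and 7 for 11:00; the 12:00 group is dropped. To avoid hard-coding 2, the right-hand side could be (SELECT count(DISTINCT tenant_id) FROM orders).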

  • Can I make it use the previous value for tenant2 from 11:00?

    Assuming:

    • (datetime, tenant_id) is the PRIMARY KEY, so it is UNIQUE and both columns are NOT NULL.
    • You want results for a given time range.
    • You want results for a given set of tenants ('tenant1' and 'tenant2' in the example).
    • One row for every full hour in the time range is expected.
    • Substitute for missing rows with the value for the same tenant from the previous hour.
    • If we can't get orders for each tenant this way, omit the row.
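
The first assumption, written as DDL, might look like this. It is a sketch: the column types (timestamp, text, int) are guesses, and the table is created as TEMP so trying it out does not touch a real table (for the rest of the session, though, it shadows any permanent table named orders).

```sql
-- Hypothetical table definition matching the assumptions above.
-- TEMP: the table is dropped automatically at the end of the session.
CREATE TEMP TABLE orders (
   datetime     timestamp
 , tenant_id    text
 , orders_today int
 , PRIMARY KEY (datetime, tenant_id)  -- implies UNIQUE and NOT NULL on both columns
);

-- Sample rows from the question.
INSERT INTO orders (datetime, tenant_id, orders_today) VALUES
  ('2023-06-25 10:00', 'tenant2', 2)
, ('2023-06-25 10:00', 'tenant1', 1)
, ('2023-06-25 11:00', 'tenant1', 5)
, ('2023-06-25 11:00', 'tenant2', 2)
, ('2023-06-25 12:00', 'tenant1', 5);
```

With this key, each (datetime, tenant_id) pair exists at most once, so each of the two LEFT JOINs in the query below matches at most one row and cannot multiply rows before the aggregation.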

    Generate every combination of tenant and timestamp of interest, LEFT JOIN that to your table, LEFT JOIN again for a possible substitute from the previous hour, then aggregate:

    SELECT datetime, sum(orders) AS sum_orders_today
    FROM  (
       SELECT d.datetime, COALESCE(o.orders_today, o1.orders_today) AS orders
       FROM   generate_series(timestamp '2023-06-25 10:00'
                            , timestamp '2023-06-25 13:00'
                            , interval '1 hour') AS d(datetime)    -- given time range
       CROSS  JOIN (VALUES ('tenant1'), ('tenant2')) t(tenant_id)  -- given tenants
       LEFT   JOIN orders o USING (tenant_id, datetime)            -- regular data
       LEFT   JOIN orders o1 ON o1.tenant_id = t.tenant_id         -- substitutes
                            AND o1.datetime  = d.datetime - interval '1 hour'
       ) sub
    GROUP  BY 1
    HAVING count(orders) = 2  -- omit hours without full data set
    ORDER  BY 1;
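
To see what the aggregate works with, the inner subquery can be run on its own. The sketch below inlines the question's sample data as a CTE (column types assumed) and also returns the raw and substitute values; the column names actual and prev_hour are hypothetical, added only for inspection.

```sql
-- Sample data from the question; column types are assumed.
WITH orders (datetime, tenant_id, orders_today) AS (
   VALUES
     (timestamp '2023-06-25 10:00', 'tenant2', 2)
   , (timestamp '2023-06-25 10:00', 'tenant1', 1)
   , (timestamp '2023-06-25 11:00', 'tenant1', 5)
   , (timestamp '2023-06-25 11:00', 'tenant2', 2)
   , (timestamp '2023-06-25 12:00', 'tenant1', 5)
   )
SELECT d.datetime, t.tenant_id
     , o.orders_today                             AS actual     -- row for this hour, if any
     , o1.orders_today                            AS prev_hour  -- candidate substitute
     , COALESCE(o.orders_today, o1.orders_today)  AS orders     -- what gets summed
FROM   generate_series(timestamp '2023-06-25 10:00'
                     , timestamp '2023-06-25 13:00'
                     , interval '1 hour') AS d(datetime)
CROSS  JOIN (VALUES ('tenant1'), ('tenant2')) t(tenant_id)
LEFT   JOIN orders o USING (tenant_id, datetime)
LEFT   JOIN orders o1 ON o1.tenant_id = t.tenant_id
                     AND o1.datetime  = d.datetime - interval '1 hour'
ORDER  BY d.datetime, t.tenant_id;
```

On this data, only tenant2 at 13:00 ends up with NULL in orders (it has no row at 12:00 either), so HAVING count(orders) = 2 drops 13:00, while 12:00 is kept with 5 + 2 = 7.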
    

    If my assumptions don't hold, you have to be more specific.
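
As one example of loosening two of the assumptions (fixed tenant list, fixed time range), both can be derived from the table itself. This is a sketch under the remaining assumptions (primary key, full-hour timestamps, substitution only from the immediately preceding hour); the CTE names tenants and hours are illustrative. Comparing against count(*) instead of a literal 2 keeps the HAVING clause correct for any number of tenants, because with the primary key the grid has exactly one row per tenant and hour.

```sql
-- Sample data from the question; column types are assumed.
WITH orders (datetime, tenant_id, orders_today) AS (
   VALUES
     (timestamp '2023-06-25 10:00', 'tenant2', 2)
   , (timestamp '2023-06-25 10:00', 'tenant1', 1)
   , (timestamp '2023-06-25 11:00', 'tenant1', 5)
   , (timestamp '2023-06-25 11:00', 'tenant2', 2)
   , (timestamp '2023-06-25 12:00', 'tenant1', 5)
   )
, tenants AS (SELECT DISTINCT tenant_id FROM orders)  -- every tenant seen in the data
, hours AS (                                          -- every hour between first and last row
   SELECT h.datetime
   FROM  (SELECT min(datetime) AS lo, max(datetime) AS hi FROM orders) r
        , generate_series(r.lo, r.hi, interval '1 hour') AS h(datetime)
   )
SELECT h.datetime
     , sum(COALESCE(o.orders_today, o1.orders_today)) AS sum_orders_today
FROM   hours h
CROSS  JOIN tenants t
LEFT   JOIN orders o  ON o.tenant_id  = t.tenant_id
                     AND o.datetime   = h.datetime
LEFT   JOIN orders o1 ON o1.tenant_id = t.tenant_id
                     AND o1.datetime  = h.datetime - interval '1 hour'
GROUP  BY h.datetime
HAVING count(COALESCE(o.orders_today, o1.orders_today)) = count(*)  -- a value for every tenant
ORDER  BY h.datetime;
```

On the sample data the derived range is 10:00 to 12:00, and the query returns 3, 7 and 7 for those hours. A tenant only counts if it has at least one row in the table.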