Search code examples
sqlsqlitepostgresqlaggregate-functionssql-limit

Select elements where sum of field is less than N


Given this table:

# select * from messages;
id verbosity
1 20
2 20
3 20
4 30
5 100

(5 rows) I would like to select N messages for which sum of verbosity is lower than N. So if N = 70, the desired result will be messages with id 1, 2 and 3. It's important the solution is database-independent. It should work at least on PostgreSQL and SQLite.

Something like:

SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;

doesn't sum all values from the verbosity column.


Solution

  • SELECT m.id, sum(m1.verbosity) AS total
    FROM   messages m
    JOIN   messages m1 ON m1.id <= m.id
    WHERE  m.verbosity < 70    -- optional, to avoid pointless evaluation
    GROUP  BY m.id
    HAVING SUM(m1.verbosity) < 70
    ORDER  BY total DESC
    LIMIT  1;
    

    This assumes a unique, ascending id like you have in your example.


    In modern Postgres - or generally with modern standard SQL (but not in SQLite):

    Simple CTE

    WITH cte AS (
       SELECT *, sum(verbosity) OVER (ORDER BY id) AS total
       FROM   messages
       )
    SELECT *
    FROM   cte
    WHERE  total < 70
    ORDER  BY id;
    

    Recursive CTE

    Should be faster for big tables where you only retrieve a small set.

    WITH RECURSIVE cte AS (
       (  -- parentheses required
       SELECT id, verbosity, verbosity AS total
       FROM   messages
       ORDER  BY id
       LIMIT  1
       )
    
       UNION ALL 
       SELECT c1.id, c1.verbosity, c.total + c1.verbosity 
       FROM   cte c
       JOIN   LATERAL (
          SELECT *
          FROM   messages
          WHERE  id > c.id
          ORDER  BY id
          LIMIT  1
          ) c1 ON  c1.verbosity < 70 - c.total
       WHERE c.total < 70
       )
    SELECT *
    FROM   cte
    ORDER  BY id;
    

    All standard SQL, except for LIMIT.

    Strictly speaking, there is no such thing as "database-independent". There are various SQL-standards, but no RDBMS complies completely. LIMIT works for PostgreSQL and SQLite (and some others). Use TOP 1 for SQL Server, rownum for Oracle. Here's a comprehensive list on Wikipedia.

    The SQL:2008 standard would be:

    ...
    FETCH  FIRST 1 ROWS ONLY
    

    ... which PostgreSQL supports - but hardly any other RDBMS.

    The pure alternative that works with more systems would be to wrap it in a subquery and

    SELECT max(total) FROM <subquery>
    

    But that is slow and unwieldy.

    db<>fiddle here
    Old sqlfiddle