sql sqlite postgresql aggregate-functions sql-limit

Select elements where sum of field is less than N

Given this table:

# select * from messages;

id	verbosity
1	20
2	20
3	20
4	30
5	100

(5 rows) I would like to select N messages for which sum of verbosity is lower than N. So if N = 70, the desired result will be messages with id 1, 2 and 3. It's important the solution is database-independent. It should work at least on PostgreSQL and SQLite.

Something like:

SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;

doesn't sum all values from the verbosity column.

Solution

SELECT m.id, sum(m1.verbosity) AS total
FROM   messages m
JOIN   messages m1 ON m1.id <= m.id
WHERE  m.verbosity < 70    -- optional, to avoid pointless evaluation
GROUP  BY m.id
HAVING SUM(m1.verbosity) < 70
ORDER  BY total DESC
LIMIT  1;

This assumes a unique, ascending id like you have in your example.

In modern Postgres - or generally with modern standard SQL (but not in SQLite):

Simple CTE

WITH cte AS (
   SELECT *, sum(verbosity) OVER (ORDER BY id) AS total
   FROM   messages
   )
SELECT *
FROM   cte
WHERE  total < 70
ORDER  BY id;

Recursive CTE

Should be faster for big tables where you only retrieve a small set.

WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT id, verbosity, verbosity AS total
   FROM   messages
   ORDER  BY id
   LIMIT  1
   )

   UNION ALL 
   SELECT c1.id, c1.verbosity, c.total + c1.verbosity 
   FROM   cte c
   JOIN   LATERAL (
      SELECT *
      FROM   messages
      WHERE  id > c.id
      ORDER  BY id
      LIMIT  1
      ) c1 ON  c1.verbosity < 70 - c.total
   WHERE c.total < 70
   )
SELECT *
FROM   cte
ORDER  BY id;

All standard SQL, except for LIMIT.

Strictly speaking, there is no such thing as "database-independent". There are various SQL-standards, but no RDBMS complies completely. LIMIT works for PostgreSQL and SQLite (and some others). Use TOP 1 for SQL Server, rownum for Oracle. Here's a comprehensive list on Wikipedia.

The SQL:2008 standard would be:

...
FETCH  FIRST 1 ROWS ONLY

... which PostgreSQL supports - but hardly any other RDBMS.

The pure alternative that works with more systems would be to wrap it in a subquery and

SELECT max(total) FROM <subquery>

But that is slow and unwieldy.

db<>fiddle here
_{Old sqlfiddle}