CREATE TABLE inventory_box (
box_id varchar(10),
value integer
);
INSERT INTO inventory_box VALUES ('1', 10), ('2', 15), ('3', 20);
I prepared a sql fiddle with the schema.
I would like to select a list of inventory boxes with combined value of above 20
possible result 1. box 1 + box 2 (10 + 15 >= 20)
Here is what I am doing right now:
SELECT * FROM inventory_box LIMIT 1 OFFSET 0;
-- count on the client side and see if I got enough
-- got 10
SELECT * FROM inventory_box LIMIT 1 OFFSET 1;
-- count on the client side and see if I got enough
-- got 15, add it to the first query which returned 10
-- total is 25, ok, got enough, return answer
I am looking for a solution where the scan will stop as soon as it reaches the target value
Your question update clarifies that your actual requirements are much simpler than a full-blown "subset sum problem" as suspected by Ulrich:
Just fetch rows until the sum is big enough.
I sort by box_id
to get deterministic results. You might even drop the ORDER BY
altogether to get any valid result a bit faster, yet.
WITH RECURSIVE i AS (
SELECT *, row_number() OVER (ORDER BY box_id) AS rn
FROM inventory_box
)
, r AS (
SELECT box_id, val, val AS total, 2 AS rn
FROM i
WHERE rn = 1
UNION ALL
SELECT i.box_id, i.val, r.total + i.val, r.rn + 1
FROM r
JOIN i USING (rn)
WHERE r.total < 20
)
SELECT box_id, val, total
FROM r
ORDER BY box_id;
FOR
loopUsing sum()
as window aggregate function (cheapest).
CREATE OR REPLACE FUNCTION f_shop_for(_total int)
RETURNS TABLE (box_id text, val int, total int)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
total := 0;
FOR box_id, val, total IN
SELECT i.box_id, i.val
, sum(i.val) OVER (ORDER BY i.box_id) AS total
FROM inventory_box i
LOOP
RETURN NEXT;
EXIT WHEN total >= _total;
END LOOP;
END
$func$;
Call:
SELECT * FROM f_shop_for(20);
I tested both with a big table of 1 million rows. The function only reads the necessary rows from index and table. The CTE is very slow, seems to scan the whole table ...
Aside: sorting by a varchar
column (box_id
) with numeric data yields dubious results. Should be a numeric type, really?