Search code examples
sqloracle-databaseoracle12ccumulative-sum

Select where cumulative sum is less than a number (in order of priority)


I have a table with id, cost, and priority columns:

create table a_test_table (id number(4,0), cost number(15,2), priority number(4,0));

insert into a_test_table (id, cost, priority) values (1, 1000000, 10);
insert into a_test_table (id, cost, priority) values (2, 10000000, 9);
insert into a_test_table (id, cost, priority) values (3, 5000000, 8);
insert into a_test_table (id, cost, priority) values (4, 19000000, 7);
insert into a_test_table (id, cost, priority) values (5, 20000000, 6);
insert into a_test_table (id, cost, priority) values (6, 15000000, 5);
insert into a_test_table (id, cost, priority) values (7, 2000000, 4);
insert into a_test_table (id, cost, priority) values (8, 3000000, 3);
insert into a_test_table (id, cost, priority) values (9, 3000000, 2);
insert into a_test_table (id, cost, priority) values (10, 8000000, 1);
commit;

select 
    id,
    to_char(cost, '$999,999,999') as cost,
    priority
from 
    a_test_table;
        ID COST            PRIORITY
---------- ------------- ----------
         1    $1,000,000         10
         2   $10,000,000          9
         3    $5,000,000          8
         4   $19,000,000          7
         5   $20,000,000          6
         6   $15,000,000          5
         7    $2,000,000          4
         8    $3,000,000          3
         9    $3,000,000          2
        10    $8,000,000          1

Starting with the highest priority (descending), I want to select the rows where the cost adds up to less than (or equal to) $20,000,000.

The result would look like this:

       ID COST            PRIORITY
---------- ------------- ----------
         1    $1,000,000         10
         2   $10,000,000          9
         3    $5,000,000          8
         7    $2,000,000          4

      Total: $18,000,000

How can I do this with Oracle SQL?


Solution

  • Here is a way to do it in pure SQL. I won't swear there isn't a better way.

    Basically, it uses a recursive common table expression (i.e., WITH costed...) to compute every possible combination of elements totaling less than 20,000,000.

    Then it gets the first full path from that result.

    Then, it gets all the rows in that path.

    NOTE: the logic assumes that no id is longer than 5 digits. That's the LPAD(id,5,'0') stuff.

    WITH costed (id, cost, priority, running_cost, path) as 
    ( SELECT id, cost, priority, cost running_cost, lpad(id,5,'0') path
      FROM   a_test_table
      WHERE  cost <= 20000000
      UNION ALL 
      SELECT a.id, a.cost, a.priority, a.cost + costed.running_Cost, costed.path || '|' || lpad(a.id,5,'0')
      FROM   costed, a_test_table a 
      WHERE  a.priority < costed.priority
      AND    a.cost + costed.running_cost <= 20000000),
    best_path as (  
    SELECT *
    FROM   costed c 
    where not exists ( SELECT 'longer path' FROM costed c2 WHERE c2.path like c.path || '|%' )
    order by path
    fetch first 1 row only )
    SELECT att.* 
    FROM best_path cross join a_test_table att
    WHERE best_path.path like '%' || lpad(att.id,5,'0') || '%'
    order by att.priority desc;
    
    +----+----------+----------+
    | ID |   COST   | PRIORITY |
    +----+----------+----------+
    |  1 |  1000000 |       10 |
    |  2 | 10000000 |        9 |
    |  3 |  5000000 |        8 |
    |  7 |  2000000 |        4 |
    +----+----------+----------+
    

    UPDATE - Shorter version

    This version uses MATCH_RECOGNIZE to find all the rows in the best group following the recursive CTE:

    WITH costed (id, cost, priority, running_cost, path) as 
    ( SELECT id, cost, priority, cost running_cost, lpad(id,5,'0') path
      FROM   a_test_table
      WHERE  cost <= 20000000
      UNION ALL 
      SELECT a.id, a.cost, a.priority, a.cost + costed.running_Cost, costed.path || '|' || lpad(a.id,5,'0')
      FROM   costed, a_test_table a 
      WHERE  a.priority < costed.priority
      AND    a.cost + costed.running_cost <= 20000000)
      search depth first by priority desc set ord
    SELECT id, cost, priority
    FROM   costed c 
    MATCH_RECOGNIZE (
      ORDER BY path
      MEASURES
        MATCH_NUMBER() AS mno
      ALL ROWS PER MATCH
      PATTERN (STRT ADDON*)
      DEFINE
        ADDON AS ADDON.PATH = PREV(ADDON.PATH) || '|' || LPAD(ADDON.ID,5,'0')
        )
    WHERE mno = 1
    ORDER BY priority DESC;
    

    UPDATE -- Even shorter version, using clever idea from the SQL*Server link the OP posted

    *Edit: removed use of ROWNUM=1 in anchor part of recursive CTE, since it depended on the arbitrary order in which rows would be returned. I'm surprised no one dinged me on that. *

    WITH costed (id, cost, priority, running_cost) as 
    ( SELECT id, cost, priority, cost running_cost
      FROM   ( SELECT * FROM a_test_table
               WHERE  cost <= 20000000
               ORDER BY priority desc
               FETCH FIRST 1 ROW ONLY )
      UNION ALL 
      SELECT a.id, a.cost, a.priority, a.cost + costed.running_Cost
      FROM   costed CROSS APPLY ( SELECT b.*
                                  FROM   a_test_table b 
                                  WHERE  b.priority < costed.priority
                                  AND    b.cost + costed.running_cost <= 20000000
                                  FETCH FIRST 1 ROW ONLY
                                  ) a
    )
    CYCLE id SET is_cycle TO 'Y' DEFAULT 'N'
    select id, cost, priority from costed
    order by priority desc