Search code examples
oracle11gsqr

Returning a TABLE like structure in the SELECT


I am working with SQR to put together a report. I can't change the structure of the database, nor can I use PL/SQL to complete this task.

Since the report can be run from remote locations, I don't want to make multiple calls to the database from within SQR. My goal is to return everything in 1 SQL that includes only the records that I need to report to increase the run-time over a slow connection.

I have it working right now, but I'm concerned with the performance on the database.

The "transactions" table has the following fields that can be used for this purpose:

account_num number(10) -- the account number
seq_num number(10) -- not a real sequence, it is unique to account_num
check_num number(10) -- the number on the check
postdate date

The primary key is (account_num, seq_num)

Sample data looks like this:

account_num    seq_num  check_num   postdate
----------- ---------- ---------- ----------
          1         11        200 2014-07-13
          1         16        201 2014-07-14
          1         23        205 2014-07-15
          2         52        282 2014-07-13
          2         66        284 2014-07-14
          2         72        231 2014-07-15
          3         11        201 2014-07-13
          3         12        202 2014-07-14
          3         15        203 2014-07-15

Note: There are many other types of transactions in the table, but we are filtering the list on the type of transaction, which isn't very important for this question, so I left that out. The volume of transactions seems to average around 750,000 per month (for all transactions, not just checks), and out of that, on average, about 10,000 check transactions get reported.

The selection criteria is to return all check transactions occurring between two dates (inclusive - usually the first day of the month and the last day of the month) where the difference between any sorted check numbers for an account is greater than X (we will use 10 in this case).

Using the above sample data, the results look like this:

account_num    seq_num  check_num   postdate
----------- ---------- ---------- ----------
          2         52        282 2014-07-13
          2         66        284 2014-07-14
          2         72        231 2014-07-15

All checks from account_num 2 are returned because the difference between check_num 282 and 231 is greater than 10.

I built the following SQL to return the results above:

select
  t1.*
from
  transactions t1
join (
  select
    t3.account_num,
    t3.min_postdate,
    t3.max_postdate,
    max(t3.check_diff)
  from (
    select distinct
      t4.account_num,
      lead(t4.check_num, 1, t4.check_num) over (partition by t4.account_num order by t4.check_num) - t4.check_num as check_diff,
      min(t4.postdate) over (partition by t4.account_num) min_postdate,
      max(t4.postdate) over (partition by t4.account_num) max_postdate
    from
      transactions t4
    where
      t4.postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))) t3
  group by
    t3.account_num,
    t3.min_postdate,
    t3.max_postdate
  having max(t3.check_diff) > 10) t2
    on t1.account_num = t2.account_num
    and t1.postdate between t2.min_postdate and t2.max_postdate
;

I would like to return the seq_num of all the checks from t4 so I end up using the primary key on t1. I've tried using LISTAGG, which works to get the numbers together.

listagg(t4.seq_num,',') within group (order by seq_num) over (partition by account_num) sqe_nums

But this is where I'm stuck... using a comma delimited string. I can get it to work using INSTR but it can't use the primary key and the performance terrible.

instr(t1.seq_num || ',', t2.seq_nbrs || ',') > 0

And I tried joining to it this:

join (
  select
    t2.account_num,
    regexp_substr(t2.seq_nums,'[^,]+{1}',1,level) seq_num
  from
    dual
  connect by
    level <= length(regexp_replace(t2.seq_nums,'[^,]*')) + 1) t5
  on t1.account_num = t5. accout_num 
  and t1.sqe_num = t5.seq_num

But I should have known better (ORA-00904) - t2 will never be visible inside the select of the join.

Does anyone have any clever ideas?


Solution

  • I'd avoid the joins altogether by using subqueries and more analytic functions:

    select
      account_num, seq_num, check_num, postdate
    from
      (
        select account_num,
          seq_num,
          check_num,
          postdate,
          max(check_gap) over (partition by account_num) as max_check_gap
        from
          (
            select account_num,
              seq_num,
              check_num,
              postdate,     
              lead(check_num) over (partition by account_num order by check_num)
                - check_num as check_gap
            from
              transactions
            where postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))
        )
      )
    where
      max_check_gap > 10
    order by account_num, check_num;
    

    SQL Fiddle with you original query, an intermediate attempt that misread the 10-check gap rule, and this version. All give the same result for this data.

    This doesn't tackle the specific question you asked but hopefully addresses your underlying performance concern in a different way.


    If you did want to stick with the joins - which hits the table multiple times so will be less efficient - you could use collect. This is a rough way, the table access could probably be improved:

    select
      t1.*
    from
      transactions t1
    join (
      select
        t3.account_num,
        collect(t3.seq_num) as seq_nums,
        t3.min_postdate,
        t3.max_postdate,
        max(t3.check_diff)
      from (
        select distinct
          t4.account_num,
          t4.seq_num,
          lead(t4.check_num, 1, t4.check_num) over (partition by t4.account_num order by t4.check_num) - t4.check_num as check_diff,
          min(t4.postdate) over (partition by t4.account_num) min_postdate,
          max(t4.postdate) over (partition by t4.account_num) max_postdate
        from
          transactions t4
        where
          t4.postdate between trunc(sysdate,'mm') and last_day(trunc(sysdate))) t3
      group by
        t3.account_num,
        t3.min_postdate,
        t3.max_postdate
      having max(t3.check_diff) > 10) t2
        on t1.account_num = t2.account_num
        and t1.seq_num in (select * from table(t2.seq_nums))
    ;
    

    SQL Fiddle.