Search code examples
oracle-databasesubqueryquery-optimizationrownum

Oracle: Max, Partition by, or even rownum?


I am from SQL Server background, so my skill in Oracle is minimum. It looks like Partition by is better than max in terms of performance. Or do I use rownum to archive my result table?

I have the following table - TableW.

| P_TYPE      | TRX_DATE       | PROGRAM_NO | REF_NO    | SEQ_ID |  Select 
|-------------|----------------|------------|-----------|--------|
| 'Local'     | 2016/9/5 14:37 | C1         | null      | E1     |  Yes (latest in Sept 5)
| 'Local'     | 2016/9/5 14:36 | C1         | null      | E1     |
| 'Local'     | 2016/9/5 11:08 | C1         | null      | E1     |
|-------------|----------------|------------|-----------|--------|
| 'Local'     | 2016/9/2 15:16 | C1         | null      | E1     |  Yes (latest in Sept 2)
|-------------|----------------|------------|-----------|--------|
| 'Local'     | 2016/9/1 15:20 | C1         | null      | E1     |  Yes (latest in Sept 1)
| 'Local'     | 2016/9/1 14:33 | C1         | null      | E1     |
|-------------|----------------|------------|-----------|--------|
| '3rd Party' | 2016/9/4 18:00 | null       | D1        | E2     |  Yes
| '3rd Party' | 2016/9/4 17:55 | null       | D1        | E2     |

Here is what I want to get:

For column P_TYPE, if it is value of 'Local', use column PROGRAM_NO and SEQ_ID. Otherwise, use REF_NO and SEQ_ID. If value in column P_TYPE is same, check TRX_DATE. If column TRX_DATE indicates the same date, pick the one with latest time stamp. Another day? Another entry with latest time stamp.

| P_TYPE      | TRX_DATE       | PROGRAM_NO | REF_NO    | SEQ_ID |
|-------------|----------------|------------|-----------|--------|
| 'Local'     | 2016/9/5 14:37 | C1         | null      | E1     |  
| 'Local'     | 2016/9/2 15:16 | C1         | null      | E1     |  
| 'Local'     | 2016/9/1 15:20 | C1         | null      | E1     |  
| '3rd Party' | 2016/9/4 18:00 | null       | D1        | E2     |  

A script that I receive is to use SELECT MAX in the WHERE clause:

SELECT *
FROM TableW a
WHERE TRX_DATE = 
    CASE P_TYPE
        WHEN 'Local' THEN
            (SELECT MAX(TRX_DATE) FROM TableW
                WHERE PROGRAM_NO = a.PROGRAM_NO AND SEQ_ID = a.SEQ_ID)
        ELSE
            (SELECT MAX(TRX_DATE) FROM TableW
                WHERE REF_NO = a.REF_NO AND SEQ_ID = a.SEQ_ID)
    END
ORDER BY TRX_DATE desc, REF_NO ASC, SEQ_ID;

It does the job. However with some research, it seems like partition by is not that costly. Refer: Tune SQL statement with max subquery

I try to rewrite the query as:

SELECT *
FROM (
SELECT *,
    CASE P_TYPE 
        WHEN 'Local' THEN 
            MAX(TRX_DATE) OVER (PARTITION BY PROGRAM_NO, SEQ_ID)
        ELSE
            MAX(TRX_DATE) OVER (PARTITION BY REF_NO, SEQ_ID)
    END AS MAX_TRX_DATE
FROM TableW
WHERE P_TYPE = 'Local'
)
WHERE TRX_DATE = MAX_TRX_DATE

However, I get only this:

| P_TYPE      | TRX_DATE       | PROGRAM_NO | REF_NO    | SEQ_ID |
|-------------|----------------|------------|-----------|--------|
| 'Local'     | 2016/9/5 14:37 | C1         | null      | E1     |  

Any guideline please. If possible, please illustrate your suggestion with statistics. Thanks.

EDIT: It looks like using row_number and partition by will greatly reduce the execution plan and even the time?

| CASE             | OPERATION        | CARDINALITY | COST | LAST CR     | LAST ELAPSED  |
|                  |                  |             |      | BUFFER GETS | TIME          |
|------------------|------------------|-------------|------|-------------|---------------|
| 1 - max() in     | SELECT STATEMENT |             |  76  |             |               |
|     where clause | SORT (ORDER BY)  |      1      |  76  |     477     |      3602     |
|------------------|------------------|-------------|------|-------------|---------------|
| 2 - row_number   | SELECT STATEMENT |             |  18  |             |               |
|                  | SORT (ORDER BY)  |      8      |  18  |      53     |       607     |
|------------------|------------------|-------------|------|-------------|---------------|

Solution

  • For the Local rows you need to include the day when defining the window partition as all values for PROGRAM_NO, REF_NO are identical for those rows:

    select *
    from (
      SELECT *,
             CASE P_TYPE
               when 'Local' then 
                  row_number() over (partition by program_no, seq_id, trunc(trx_date) order by trx_date desc)
               else 
                  row_number() over (partition by ref_no, seq_id order by trx_date desc)
             end as rn
      FROM TableW a
    ) t
    where rn = 1;
    

    Online example: http://rextester.com/CZTY80559

    (The example uses Postgres, but apart from the different way of "ignoring" the time part of a timestamp, it will be the same in Oracle)