Search code examples
oracle-databasegroup-byrow-number

row_number() gets wrong result in certain conditions


I have a SQL query that works perfectly in SQL Server, but it fails on Oracle and, in my opinion, it shouldn't.

This is the example to reproduce it:

CREATE TABLE TEST
   ( TEST_ID     NUMBER(37,0) NOT NULL,
     TEST_NAME   VARCHAR2(50 BYTE), 
     TEST_GROUP  VARCHAR2(20 BYTE), 
     CONSTRAINT TEST_PK PRIMARY KEY (TEST_ID) );

INSERT INTO TEST (TEST_ID, TEST_NAME) VALUES (1, 'TEST 1');
INSERT INTO TEST (TEST_ID, TEST_NAME, TEST_GROUP) VALUES (2, 'TEST 2', 'A');
INSERT INTO TEST (TEST_ID, TEST_NAME, TEST_GROUP) VALUES (3, 'TEST 3', 'B');
INSERT INTO TEST (TEST_ID, TEST_NAME, TEST_GROUP) VALUES (4, 'TEST 4', 'A');

This query returns the expected information:

SELECT TEST_GROUP, COUNT(R$), MIN(R$) R$_A, MAX(R$) R$_Z 
  FROM (
          SELECT MAIN.*, ROW_NUMBER() OVER (ORDER BY TEST_GROUP, TEST_ID) R$
            FROM ( SELECT TEST_ID, TEST_NAME, TEST_GROUP 
                   FROM TEST 
                   GROUP BY TEST_ID, TEST_NAME, TEST_GROUP 
                 ) MAIN
       ) MAIN
GROUP BY TEST_GROUP

It returns three TEST_GROUPS with the right calculations.

TEST_GROUP     COUNT(R$)    R$_A    R$_Z
-------------- --------- ------- -------
A                      2       1       2
B                      1       3       3
(null)                 1       4       4

Explain Plan:

OPERATION                OBJECT_NAME     CARDINALITY     COST 

SELECT STATEMENT                                   4        3       
SORT (GROUP BY NOSORT)                             4        3               
VIEW                                               4        3                       
WINDOW (NOSORT)                                    4        3                               
SORT (GROUP BY)                                    4        3                                       
TABLE ACCESS (FULL)      TEST                      4        3 

Other XML 
{info} 

info type="db_version" 
12.1.0.1 

info type="parse_schema" 
"BABTEC" 

info type="dynamic_sampling" 
2 

info type="plan_hash" 
1486410247 

info type="plan_hash_2" 
1249517352 

{hint} 

FULL(@"SEL$335DD26A" "TEST"@"SEL$3") 
NO_ACCESS(@"SEL$1" "MAIN"@"SEL$1") 
OUTLINE(@"SEL$3") 
OUTLINE(@"SEL$2") 
OUTLINE_LEAF(@"SEL$1") 
MERGE(@"SEL$3") 
OUTLINE_LEAF(@"SEL$335DD26A") 
ALL_ROWS 
DB_VERSION('12.1.0.1') 
OPTIMIZER_FEATURES_ENABLE('12.1.0.1') 
IGNORE_OPTIM_EMBEDDED_HINTS 

But if we change the sorting in the ROW_NUMBER (by changing from the default ASC to DESC) it does not:

SELECT TEST_GROUP, COUNT(R$), MIN(R$) R$_A, MAX(R$) R$_Z 
FROM (
        SELECT MAIN.*, ROW_NUMBER() OVER (ORDER BY TEST_GROUP **DESC**, TEST_ID) R$
        FROM ( SELECT TEST_ID, TEST_NAME, TEST_GROUP 
               FROM TEST 
               GROUP BY TEST_ID, TEST_NAME, TEST_GROUP 
             ) MAIN
     ) MAIN
GROUP BY TEST_GROUP;

It only returns a single group.

TEST_GROUP     COUNT(R$)    R$_A    R$_Z
-------------- --------- ------- -------
A                      4       1       4

Explain Plan:

OPERATION                OBJECT_NAME     CARDINALITY     COST 

SELECT STATEMENT                                   4        3       
HASH(GROUP BY)                                     4        3               
VIEW                                               4        3                       
WINDOW (NOSORT)                                    4        3                               
SORT (GROUP BY)                                    4        3                                       
TABLE ACCESS (FULL)      TEST                      4        3 

Other XML 
{info} 

info type="db_version" 
12.1.0.1 

info type="parse_schema" 
"BABTEC" 

info type="dynamic_sampling" 
2 

info type="plan_hash" 
1128091058 

info type="plan_hash_2" 
3776505473 

{hint} 

FULL(@"SEL$335DD26A" "TEST"@"SEL$3") 
NO_ACCESS(@"SEL$1" "MAIN"@"SEL$1") 
OUTLINE(@"SEL$3") 
OUTLINE(@"SEL$2") 
OUTLINE_LEAF(@"SEL$1") 
MERGE(@"SEL$3") 
OUTLINE_LEAF(@"SEL$335DD26A") 
ALL_ROWS 
DB_VERSION('12.1.0.1') 
OPTIMIZER_FEATURES_ENABLE('12.1.0.1') 
IGNORE_OPTIM_EMBEDDED_HINTS 

Notice that to reproduce the problem, it is required that the most internal query has a GROUP BY expression. If not, the result is the one we expect:

SELECT TEST_GROUP, COUNT(R$), MIN(R$) R$_A, MAX(R$) R$_Z 
FROM (
          SELECT MAIN.*, ROW_NUMBER() OVER (ORDER BY TEST_GROUP DESC, TEST_ID) R$
          FROM ( SELECT TEST_ID, TEST_NAME, TEST_GROUP 
                 FROM TEST ) MAIN
     ) MAIN
GROUP BY TEST_GROUP;

TEST_GROUP     COUNT(R$)    R$_A    R$_Z
----------------------------------------
(null)                 1       1       1
B                      1       2       2
A                      2       3       4

We are using Oracle Database 12c Release 12.1.0.1.0 - 64bit

There is a workaround for this problem that is add an ORDER BY clause after the GROUP BY, but this is only valid in Oracle, it fails in SQLServer. The query will be:

SELECT TEST_GROUP, COUNT(R$), MIN(R$) R$_A, MAX(R$) R$_Z 
FROM (
         SELECT MAIN.*, ROW_NUMBER() OVER (ORDER BY TEST_GROUP DESC, TEST_ID) R$
         FROM ( SELECT TEST_ID, TEST_NAME, TEST_GROUP 
                FROM TEST 
                GROUP BY TEST_ID, TEST_NAME, TEST_GROUP 
                ORDER BY TEST_GROUP DESC ) MAIN
     ) MAIN
GROUP BY TEST_GROUP;

TEST_GROUP     COUNT(R$)    R$_A    R$_Z
-------------- --------- ------- -------
(null)                 1       1       1
B                      1       2       2
A                      2       3       4

Any help will be appreciated


Solution

  • This seems to be bug 18353141. It's reproducible in 11.2.0.4 as well as 12.1.0.1 if NLS_SORT is set and NLS_COMP is set to binary:

    alter session set NLS_SORT=spanish;
    alter session set NLS_COMP=binary;
    
    -- your second query
    
    T  COUNT(R$)       R$_A       R$_Z
    - ---------- ---------- ----------
    A          4          1          4
    

    Changing the sort to linguistic fixes it:

    alter session set NLS_SORT=spanish;
    alter session set NLS_COMP=linguistic;
    
    -- your second query
    
    T  COUNT(R$)       R$_A       R$_Z
    - ---------- ---------- ----------
               1          1          1
    B          1          2          2
    A          2          3          4
    

    You can also modify the query to make the analytic order-by and the group-by different; e.g. this works just concatenating null (before the DESC):

    alter session set NLS_SORT=spanish;
    alter session set NLS_COMP=binary;
    
    SELECT TEST_GROUP, COUNT(R$), MIN(R$) R$_A, MAX(R$) R$_Z 
      FROM (
            SELECT MAIN.*, ROW_NUMBER() OVER (ORDER BY TEST_GROUP||null DESC, TEST_ID) R$
    -------------------------------------------------------------^^^^^^
              FROM ( SELECT TEST_ID, TEST_NAME, TEST_GROUP 
                       FROM TEST 
                   GROUP BY TEST_ID, TEST_NAME, TEST_GROUP 
                   ) MAIN
           ) MAIN
    GROUP BY TEST_GROUP;
    
    T  COUNT(R$)       R$_A       R$_Z
    - ---------- ---------- ----------
               1          1          1
    B          1          2          2
    A          2          3          4
    

    but it sounds like you want the same query to work in both SQL Server and Oracle so you'll need to find a way of modifying it that is valid for both.

    It's fixed in the 12.1.0.2 patch set, and an individual patch is available for 12.1.0.1 if you aren't able to apply the patch set.