Search code examples
sqloraclecaselistagganalytic-functions

listagg produces ORA-01489 if used as window function in conditional expression


My query returns many (thousands of) rows. Column l has certain value for very small amount of rows (up to 10). For each such row I want to output aggregated comma-separated values of very short (up to 5 chars) varchar column v over all of these rows. For rows not having the special value of l I want to simply output the v value for that row.

Synthetized example of same problem: from first 10000 integers, I want to output 1,2,3,4,5,6,7,8,9 for each single-digit number; that number for multiple-digit number. (Yes, silly example but real case makes sense.)

with x (v,l) as (
  select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select case l
         when 1 then listagg(v,',') within group (order by v) over (partition by l)
         else v
       end
from x
order by 1;

The problem is, listagg function fails on ORA-01489: result of string concatenation is too long error.

I am aware of 4000 char limit of listagg function as well as xmlagg-based workaround. I just don't get the limit is enough for data I want to concatenate even though not enough for all data. In example above, the partition of 9 single-digit numbers fits into 4000 chars, the partition of 9000 four-digit numbers not. I expected the case expression would prevent execution of window for unrelated rows but, for some reason, it seems the db engine evaluates window for all rows. (Also note that order by clause causes query to fail-fast - without it some rows are returned before failure.)

Can you please explain some reasoning for this behaviour? I suspect the window computation is logically before select clause but without any evidence. Reproduced on Oracle 11g, 18c and 19 (livesql).


Solution

  • Well you are using SQL which is not procedural, so you can't expect that some parts of the code path will not be executed, only because they are not used. (So filling a bug as other suggested will have no success).

    Anyway you can do the often used trick based on the fact that listagg ignores null values.

    So this formulation works fine:

    with x (v,l) as (
      select to_char(level), length(to_char(level)) from dual connect by level <= 10000
    )
    select   nvl(listagg(case when l = 1 then v end,',') within group (order by v) over (partition by l),v) lst
    from x
    order by 1;
    

    giving

    LST
    ------------------
    1,2,3,4,5,6,7,8,9
    1,2,3,4,5,6,7,8,9
    ..
    10
    100
    1000
    10000
    

    The explanation of the problem can be found in the execution plan (showing only the relevant part)

    ----------------------------------------------------------------------------------------
    | Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                |      |     1 |    35 |     4  (50)| 00:00:01 |
    |   1 |  SORT ORDER BY                  |      |     1 |    35 |     4  (50)| 00:00:01 |
    |   2 |   WINDOW SORT                   |      |     1 |    35 |     4  (50)| 00:00:01 |
    |   3 |    VIEW                         |      |     1 |    35 |     2   (0)| 00:00:01 |
    |*  4 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
    |   5 |      FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
    ...
    Column Projection Information (identified by operation id):
    -----------------------------------------------------------
     
       1 - (#keys=1) CASE "L" WHEN 1 THEN LISTAGG("V",',') WITHIN GROUP ( ORDER BY 
           "V") OVER ( PARTITION BY "L") ELSE "V" END [4000]
       2 - (#keys=2) "L"[NUMBER,22], "V"[VARCHAR2,40], LISTAGG("V",',') WITHIN 
           GROUP ( ORDER BY "V") OVER ( PARTITION BY "L")[4000]
       3 - "V"[VARCHAR2,40], "L"[NUMBER,22]
       4 - LEVEL[4]
    

    So in the line 2 the listagg is calculated (for all rows) only to be filtered in the line 1.