Search code examples
sqloracleplsqloracle19cpipelined-function

Oracle Query over Pipeline Function - Strange Error when Data contains ASCII Extended Characters


I have an issue with an Oracle pipeline function, and I am getting crazy to understand what is happening. My Oracle Database is version 19c running over Red Hat 7.2 and configured in AL32UTF8 as CharacterSet.

Let me explain the scenario.

I have the following setup with two types and one pipeline function in order to generate files using parallel processes, thereby I can speed up incredibly the generation of big files.

Two types

--
-- DUMP_PARALLEL_OBJECT  (Type) 
--
CREATE OR REPLACE TYPE CPL_DATA_OUT.dump_parallel_object AS OBJECT
(file_name VARCHAR2 (128), no_records NUMBER, seq_id NUMBER);
/

--
-- DUMP_PARALLEL_OBJECT_NTT  (Type) 
--
CREATE OR REPLACE TYPE CPL_DATA_OUT.dump_parallel_object_ntt AS TABLE OF cpl_data_out.dump_parallel_object;
/

Pipelined Function

This is the pipeline function to get the output files in chunks that I can join then using cat in Linux.

CREATE OR REPLACE function CPL_DATA_OUT.fn_generate_parallel_file
(
p_source    IN SYS_REFCURSOR,
p_filename  IN VARCHAR2,
p_directory IN VARCHAR2,
p_extension IN VARCHAR2 DEFAULT 'csv',
p_limit     IN NUMBER DEFAULT 10000
) return dump_parallel_object_ntt
pipelined
parallel_enable (partition p_source by any)
as
   type row_ntt is table of varchar2(32767);
   v_rows    row_ntt;
   v_file    UTL_FILE.FILE_TYPE;
   v_buffer  VARCHAR2(32767);
   v_sid     NUMBER;
   v_name    VARCHAR2(128);
   v_lines   PLS_INTEGER := 0;
   c_eol     CONSTANT VARCHAR2(1) := CHR(10);
   c_eollen  CONSTANT PLS_INTEGER := LENGTH(c_eol);
   c_maxline CONSTANT PLS_INTEGER := 32767;
begin

   SELECT generate_random_number.nextval INTO v_sid FROM dual;
   v_name := p_filename || '_' || TO_CHAR(v_sid) || '.' || p_extension;
   v_file := UTL_FILE.FOPEN(p_directory, v_name, 'w', 32767);

   LOOP
     FETCH p_source BULK COLLECT INTO v_rows LIMIT p_limit;

      FOR i IN 1 .. v_rows.COUNT LOOP

         IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline THEN
            v_buffer := v_buffer || c_eol || v_rows(i);
         ELSE
            IF v_buffer IS NOT NULL THEN
               UTL_FILE.PUT_LINE(v_file, v_buffer);
            END IF;
            v_buffer := v_rows(i);
         END IF;

      END LOOP;

      v_lines := v_lines + v_rows.COUNT;

      EXIT WHEN p_source%NOTFOUND;
   END LOOP;
   CLOSE p_source;

   UTL_FILE.PUT_LINE(v_file, v_buffer);
   UTL_FILE.FCLOSE(v_file);

   PIPE ROW (dump_parallel_object(v_name, v_lines, v_sid));
   RETURN;

END fn_generate_parallel_file;
/

I am using a sequence inside the function to assign unique numbers to those files. Let's test the scenario

Problematic Table

SQL> desc SRD_OUT.FCT_EMPROLE_TRANSFORM
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DAT_MONTH                                          DATE
 PERSNR                                             VARCHAR2(6 CHAR)
 ARBEITS_STATUS                                     VARCHAR2(50 CHAR)
 NAME                                               VARCHAR2(50 CHAR)
 VORNAME                                            VARCHAR2(50 CHAR)
 FTE                                                NUMBER
 WOCHENSTUNDEN                                      NUMBER
 FUNKTION                                           VARCHAR2(50 CHAR)
 OE                                                 VARCHAR2(70 CHAR)
 DIREKTION                                          VARCHAR2(50 CHAR)
 BEREICH                                            VARCHAR2(50 CHAR)
 N_NUMMER                                           VARCHAR2(50 CHAR)
 FTE_VALUE                                          NUMBER
 CENTERKEY                                          VARCHAR2(200 CHAR)
 ROLLE                                              VARCHAR2(200 CHAR)
 BEMESSUNGSFAKTOR                                   VARCHAR2(50 CHAR)
 COD_PROCESS                                        VARCHAR2(30 CHAR)
 DAT_EFFECTIVE                                      DATE

SQL> select count(*) from SRD_OUT.FCT_EMPROLE_TRANSFORM ;

  COUNT(*)
----------
     20436

If I run the function against the dba_objects or other similar table/views, everything works as it should.

SQL> COL FILE_NAME FOR A50
SQL> set lines 220
SQL> r
  1  SELECT *
  2  FROM TABLE(
  3  cpl_data_out.fn_generate_parallel_file(
  4  CURSOR(
  5  SELECT /*+ PARALLEL(s,10) */
  6  "OWNER"          ||'~'||
  7  "OBJECT_NAME"    ||'~'||
  8  "SUBOBJECT_NAME" ||'~'||
  9  "OBJECT_ID"      ||'~'||
 10  "DATA_OBJECT_ID" ||'~'||
 11  "OBJECT_TYPE"    ||'~'||
 12  "CREATED"        ||'~'||
 13  "LAST_DDL_TIME" as csv
 14  FROM DBA_OBJECTS s)
 15  , 'test_file'
 16  , 'DIR_SRD_OUT'
 17  , 'csv')
 18  ) nt
 19*

FILE_NAME                                          NO_RECORDS     SEQ_ID
-------------------------------------------------- ---------- ----------
test_file_459.csv                                       25496        459
test_file_449.csv                                       25496        449
test_file_453.csv                                       25496        453
test_file_461.csv                                       25496        461
test_file_455.csv                                       25499        455
test_file_451.csv                                       25496        451
test_file_447.csv                                       25496        447
test_file_443.csv                                       25496        443
test_file_457.csv                                       25496        457
test_file_445.csv                                       25497        445

10 rows selected.

As you may see, the pipelined function works as expected, it creates 10 csv files that I can join later on using cat. However, if I try to run it against the table shown above, this happens ( for the purpose of the example, I am just using some columns of the table )

Working

SQL> SELECT *
  2  FROM TABLE(
  3  cpl_data_out.fn_generate_parallel_file(
  4  CURSOR(
  5  SELECT /*+ PARALLEL(s,10) */
  6  "DAT_MONTH"      ||'~'||
  7  "PERSNR"         ||'~'||
  8  "COD_PROCESS"    ||'~'||
  9  "DAT_EFFECTIVE"
 10  as csv
 11  FROM SRD_OUT.FCT_EMPROLE_TRANSFORM s)
 12  , 'test_file'
 13  , 'DIR_SRD_OUT'
 14  , 'csv')
 15* ) nt
SQL> /

FILE_NAME                                          NO_RECORDS     SEQ_ID
-------------------------------------------------- ---------- ----------
test_file_569.csv                                         456        569
test_file_571.csv                                         489        571
test_file_575.csv                                         314        575
test_file_573.csv                                         483        573
test_file_577.csv                                         496        577
test_file_581.csv                                         487        581
test_file_579.csv                                         430        579
test_file_567.csv                                        3500        567
test_file_565.csv                                        3606        565
test_file_563.csv                                       10175        563

10 rows selected.

Not Working

SQL> SELECT *
  2  FROM TABLE(
  3  cpl_data_out.fn_generate_parallel_file(
  4  CURSOR(
  5  SELECT /*+ PARALLEL(s,10) */
  6  "DAT_MONTH"      ||'~'||
  7  "PERSNR"         ||'~'||
  8  "COD_PROCESS"    ||'~'||
  9  "DAT_EFFECTIVE"  ||'~'||
 10  "ROLLE"
 11  as csv
 12  FROM SRD_OUT.FCT_EMPROLE_TRANSFORM s)
 13  , 'test_file'
 14  , 'DIR_SRD_OUT'
 15  , 'csv')
 16* ) nt
SQL> /
ERROR:
ORA-12801: error signaled in parallel query server P005
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at "CPL_DATA_OUT.FN_GENERATE_PARALLEL_FILE", line 34
ORA-06512: at line 1

The only difference between the two queries is the column "ROLLE" which contains ASCII extended characters ( as letters in German such as "äüöß" ). It happens with every column that contains such characters.

Actually the error refers to this line: v_buffer := v_buffer || c_eol || v_rows(i); , but I have no clue what is wrong there when such characters are involved.

SQL> set pages 200
SQL> r
  1* select distinct rolle from SRD_OUT.FCT_EMPROLE_TRANSFORM

ROLLE
------------------------
Filialleiter (große Filiale)
Vertriebsdirektor Vermögensberatung

I don't really understand what relationship exists between those Extended ASCII characters and the function. What should I change in my function to make it work with such characters ?

Thank you all for your help.


Solution

  • When you do:

    IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline
    

    you are counting the number of characters in the buffer and collection variables. When you only have single-byte characters that's OK, but with any multibyte characters you could get a situation where the number of characters totals less than 32767, but the number of bytes exceeds that. The check passes; but then you do:

    v_buffer := v_buffer || c_eol || v_rows(i)
    

    which exceeds the size of the buffer, and throws the error. If your buffer was declared smaller than the maximum and using character semantics you might still get away with that; but with the maximum size (and any semantics) it will fail.

    If you count bytes instead of characters it won't exceed the byte limit:

    IF LENGTHB(v_buffer) + c_eollen + LENGTHB(v_rows(i)) <= c_maxline