oracle-database, plsql, clob

Migration script with dbms_lob.substr gets "character string buffer too small"


I have a problem with a script that migrates data from an old table to a new one. One of the columns in the old table is a CLOB, but in the new table it is a VARCHAR2. I tried to insert the data using the code below, but I ran into this error:

ORA-06502: PL/SQL: numeric or value error: character string buffer too small.

DECLARE
    CURSOR CUR IS
        SELECT T.*
          FROM ACTIVITY_EVENT T
         WHERE T.POST_TEXT IS NOT NULL;
    R CUR%ROWTYPE;
BEGIN
    FOR R IN CUR
    LOOP
        INSERT INTO STREAM_TEXT
            WITH STR AS
             (SELECT T.*
                FROM STREAM T
               WHERE T.OLD_ID = R.ID)
            SELECT SEQ$STREAM_TEXT.NEXTVAL AS ID,
                   DBMS_LOB.SUBSTR(T.POST_TEXT, 4000, 1) AS TEXT,
                   T.DT AS DT,
                   'READY' AS STATE,
                   STR.ID AS STREAM_ID
              FROM ACTIVITY_EVENT T
              LEFT JOIN STR
                ON STR.OLD_ID = T.ID
             WHERE T.POST_TEXT IS NOT NULL
               AND STR.OLD_ID = T.ID;
    END LOOP;
END;

I tried this without the loop first and got the same error, so I added the loop, but the result is the same.

This simple query fails with the same error:

SELECT T.ID, DBMS_LOB.SUBSTR(T.POST_TEXT, 4000, 1)
FROM ACTIVITY_EVENT T
WHERE T.POST_TEXT IS NOT NULL

Solution

  • You'll get this error if your source column is actually an NCLOB, rather than a CLOB. This is OK:

    create table t42 (id number, dt date, post_text clob);
    insert into t42 (id, dt, post_text) values (1, sysdate, dbms_random.string('p', 4000));
    select id, dbms_lob.substr(post_text, 4000, 1) from t42;
    

    But just changing CLOB to NCLOB, this errors with any length above 2000:

    create table t42 (id number, dt date, post_text nclob);
    insert into t42 (id, dt, post_text) values (1, sysdate, dbms_random.string('p', 4000));
    select id, dbms_lob.substr(post_text, 4000, 1) from t42;
    
    SQL Error: ORA-06502: PL/SQL: numeric or value error: character string buffer too small
    ORA-06512: at line 1
    

    This is with AL32UTF8 and AL16UTF16 as the database and national character sets.
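
    You can check which character sets your own database uses, and whether the source column really is a CLOB or an NCLOB, straight from the data dictionary (ACTIVITY_EVENT and POST_TEXT are just the names from your question):

    -- database and national character sets
    select parameter, value
      from nls_database_parameters
     where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

    -- is the source column declared as CLOB or NCLOB?
    select column_name, data_type
      from user_tab_columns
     where table_name = 'ACTIVITY_EVENT'
       and column_name = 'POST_TEXT';
    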

    So if your source column is an NCLOB, you can only extract the first 2000 characters to put into your stream_text table.
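
    For example, against the NCLOB version of the demo table above, asking for 2000 characters works, because 2000 characters of AL16UTF16 data is 4000 bytes, which just fits in a SQL VARCHAR2:

    select id, dbms_lob.substr(post_text, 2000, 1) from t42;
    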

    You will also see this if the source column is a CLOB and the first 4000 characters contain any multibyte characters. dbms_lob.substr(x, 4000, 1) always gets the first 4000 characters of a CLOB, which may be more than 4000 bytes; and 4000 bytes is the maximum size of a VARCHAR2 value in a SQL context, even if the column is declared as varchar2(4000 char), since the value still can't exceed the 4000-byte limit.
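
    You can reproduce that case too; this is a quick sketch assuming an AL32UTF8 database, where the accented character below takes two bytes, so 4000 of them is 8000 bytes:

    create table t43 (id number, post_text clob);
    
    declare
        v varchar2(32767);
    begin
        -- 4000 copies of a two-byte character (U+00E9), i.e. 8000 bytes in AL32UTF8
        v := rpad(unistr('\00E9'), 4000, unistr('\00E9'));
        insert into t43 (id, post_text) values (1, v);
        commit;
    end;
    /
    
    -- fails: the first 4000 characters are 8000 bytes, which can't fit
    -- into a 4000-byte VARCHAR2 in a SQL context
    select id, dbms_lob.substr(post_text, 4000, 1) from t43;
    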

    If you want to get the maximum 4000 characters out then you can do that via a PL/SQL VARCHAR2 variable, with an additional substrb() call:

    DECLARE
        CURSOR CUR IS
            SELECT SEQ$STREAM_TEXT.NEXTVAL AS ID,
                   T.POST_TEXT,
                   T.DT AS DT,
                   'READY' AS STATE,
                   S.ID AS STREAM_ID
              FROM ACTIVITY_EVENT T
              LEFT JOIN STREAM S
                ON S.OLD_ID = T.ID
             WHERE T.POST_TEXT IS NOT NULL;
    
        TMP_TEXT VARCHAR2(4000);
    BEGIN
        FOR R IN CUR
        LOOP
            TMP_TEXT := SUBSTRB(DBMS_LOB.SUBSTR(R.POST_TEXT, 4000, 1), 1, 4000);
            INSERT INTO STREAM_TEXT (ID, TEXT, DT, STATE, STREAM_ID)
            VALUES (R.ID, TMP_TEXT, R.DT, R.STATE, R.STREAM_ID);
        END LOOP;
    END;
    /
    

    The substrb(..., 1, 4000) part won't work in SQL either, as the inner expression is still too big, but it does work in PL/SQL. You get the first 4000 bytes of the first 4000 characters. (Though you may still have a problem if the 4000th byte falls part-way through a multibyte character.)
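
    If that edge case matters for your data, one option is a small helper function (a sketch, not part of the original code, so adjust to taste) that takes the first 4000 characters and then drops whole characters from the end until the value fits in 4000 bytes, so nothing is ever cut mid-character:

    CREATE OR REPLACE FUNCTION CLOB_TO_VC4000(P_LOB IN CLOB) RETURN VARCHAR2 IS
        V VARCHAR2(32767); -- the intermediate value can exceed 4000 bytes
    BEGIN
        V := DBMS_LOB.SUBSTR(P_LOB, 4000, 1); -- first 4000 characters
        -- drop whole characters until the result fits in a 4000-byte VARCHAR2
        WHILE LENGTHB(V) > 4000 LOOP
            V := SUBSTR(V, 1, LENGTH(V) - 1);
        END LOOP;
        RETURN V;
    END;
    /
    

    Then TMP_TEXT := CLOB_TO_VC4000(R.POST_TEXT); would replace the SUBSTRB call in the loop above.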

    I've guessed the column names in the target table, so use the real ones, obviously. Doing a bulk insert would be better if you have a lot of data: fetch into a collection and use FORALL to insert in bulk rather than row by row. Something like this would work as a starting point:

    DECLARE
        TYPE TMP_REC_TYPE IS RECORD (
          ID STREAM_TEXT.ID%TYPE,
          POST_TEXT ACTIVITY_EVENT.POST_TEXT%TYPE,
          TEXT STREAM_TEXT.TEXT%TYPE,
          DT STREAM_TEXT.DT%TYPE,
          STATE STREAM_TEXT.STATE%TYPE,
          STREAM_ID STREAM_TEXT.STREAM_ID%TYPE
        );
        TYPE TMP_REC_TAB_TYPE IS TABLE OF TMP_REC_TYPE;
        TMP_REC_TAB TMP_REC_TAB_TYPE;
        RC SYS_REFCURSOR;
    BEGIN
        OPEN RC FOR
            SELECT SEQ$STREAM_TEXT.NEXTVAL AS ID,
                   T.POST_TEXT,
                   NULL AS TEXT,
                   T.DT AS DT,
                   'READY' AS STATE,
                   S.ID AS STREAM_ID
              FROM ACTIVITY_EVENT T
              LEFT JOIN STREAM S
                ON S.OLD_ID = T.ID
             WHERE T.POST_TEXT IS NOT NULL;
        LOOP
            FETCH RC BULK COLLECT INTO TMP_REC_TAB LIMIT 100;
            FOR I IN 1..TMP_REC_TAB.COUNT LOOP -- populate text field
                TMP_REC_TAB(I).TEXT := SUBSTRB(
                  DBMS_LOB.SUBSTR(TMP_REC_TAB(I).POST_TEXT, 4000, 1), 1, 4000);
            END LOOP;
            FORALL I IN 1..TMP_REC_TAB.COUNT -- bulk insert
                INSERT INTO STREAM_TEXT (ID, TEXT, DT, STATE, STREAM_ID)
                VALUES (TMP_REC_TAB(I).ID, TMP_REC_TAB(I).TEXT, TMP_REC_TAB(I).DT,
                    TMP_REC_TAB(I).STATE, TMP_REC_TAB(I).STREAM_ID);
            EXIT WHEN RC%NOTFOUND;
        END LOOP;
        CLOSE RC;
    END;
    /