Search code examples
sqloraclevarcharclob

Concatenating CLOBs and VARCHAR in Oracle SQL


I have a table (WORKLOG) in an Oracle SQL database which has a VARCHAR description field of 250 characters. On a separate table (LONGDESCRIPTION) there is a CLOB field which holds a long description. My goal is to build a query which returns the first 4000 characters of the concatenation of the short and long descriptions.

I initially tried this with SUBSTR functions, but that didn't work so well with the CLOB: it was overflowing the VARCHAR(4000) field I'd set up for it (ORA-64203: Destination buffer too small to hold CLOB data after character set conversion.). After searching on here I tried using DBMS_LOB.SUBSTR, but this didn't seem to work either: I got the same 64203, or ORA-22831: Offset or offset+amount does not land on character boundary. I have tried a few ways to fix it, using slightly different functions, but each runs into trouble, and I'm having no luck working out a way to avoid cutting a character but also ensuring I don't overflow the 4000 limit.

Here's the code I'm using. The idea is that it should use the short description if there is no long description, use a truncated long description if there is no short description or if the short description is the same as the start of the long description, and otherwise concatenate the two. The REGEXP functions are to try and remove HTML formatting.

SELECT

  WORKLOG.WORKLOGID as Worklog,
  WORKLOG.DESCRIPTION as Description,
  CASE
    WHEN  WORKLOG.DESCRIPTION is null  AND  LENGTH(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>')) > 4000
      THEN  CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, 3996) as VARCHAR(3996)) || '...'
    WHEN  WORKLOG.DESCRIPTION is null  
      THEN  CAST(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>') as VARCHAR(4000))
    WHEN  LONGDESCRIPTION.LDKEY is null
      THEN  WORKLOG.DESCRIPTION
    WHEN  CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, LENGTH(WORKLOG.DESCRIPTION)) as VARCHAR(4000)) = WORKLOG.DESCRIPTION  AND  LENGTH(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>')) > 4000
      THEN  CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, 3996) as VARCHAR(3996)) || '...'
    WHEN  CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, LENGTH(WORKLOG.DESCRIPTION)) as VARCHAR(4000)) = WORKLOG.DESCRIPTION  
      THEN  CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, 4000) as VARCHAR(4000))
    WHEN  LENGTH(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>')) > 3732
      THEN  WORKLOG.DESCRIPTION || ' // ' || CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, 3732) as VARCHAR(3732)) || '...' 
    ELSE    WORKLOG.DESCRIPTION || ' // ' || CAST(SUBSTR(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>'), 1, 3732) as VARCHAR(3735))
    END as Description_Full

FROM
  Maximo.WORKLOG
  
LEFT JOIN
  Maximo.LONGDESCRIPTION
  ON  LONGDESCRIPTION.LDOWNERTABLE = 'WORKLOG'
  AND  LONGDESCRIPTION.LDOWNERCOL = 'DESCRIPTION'
  AND  LONGDESCRIPTION.LDKEY = WORKLOG.WORKLOGID
  AND  LENGTH(REGEXP_REPLACE(LONGDESCRIPTION.LDTEXT, '<.*?>')) > 0

Examples where the description field is 2 characters long and the maximum is 10:

Description Long description Expected result
ABC ABC
ABCDEFGHIJKL ABCDEFGHIJ
AB AB
AB ABCDE ABCDE
AB ABCDEFGHIJKL ABCDEFGHIJ
AB XY AB // XY
AB XYZABCDEF AB // XYZA

Any help would be greatly appreciated.


Solution

  • Assuming that your CLOB contains well-formed XML or XHTML then you should use a proper XML parser (and not regular expressions) and can use:

    SELECT w.WORKLOGID as Worklog,
           w.DESCRIPTION as Description,
           SUBSTR(
             CASE
             WHEN w.description IS NOT NULL AND xml.text LIKE w.description || '%'
             THEN xml.text
             WHEN w.description IS NOT NULL AND xml.text IS NOT NULL
             THEN TO_CLOB(w.DESCRIPTION) || '//' || xml.text
             WHEN w.description IS NOT NULL
             THEN TO_CLOB(w.DESCRIPTION)
             WHEN xml.text IS NOT NULL
             THEN xml.text
             END,
             1,
             4000
           ) AS Description_Full
    FROM   /*Maximo.*/WORKLOG w
           LEFT JOIN (
             /*Maximo.*/LONGDESCRIPTION ld
             CROSS APPLY XMLTABLE(
               '/HTML/BODY'
               PASSING XMLTYPE(
                 COALESCE(ld.LONGDESCRIPTION, TO_CLOB('<HTML></HTML>') )
               )
               COLUMNS
                 text CLOB PATH '.'
             ) xml
           )
           ON      ld.LDOWNERTABLE = 'WORKLOG'
               AND ld.LDOWNERCOL   = 'DESCRIPTION'
               AND ld.LDKEY        = w.WORKLOGID
    

    Which, for the sample data:

    CREATE TABLE worklog (
      worklogid   NUMBER,
      description VARCHAR2(250)
    );
    
    CREATE TABLE longdescription (
      LDOWNERTABLE    VARCHAR2(30),
      LDOWNERCOL      VARCHAR2(30),
      LDKEY           NUMBER,
      LONGDESCRIPTION CLOB
    );
    
    INSERT INTO worklog (worklogid, description)
    SELECT 1, 'Value1' FROM DUAL UNION ALL
    SELECT 2, 'Value2' FROM DUAL UNION ALL
    SELECT 3, NULL FROM DUAL UNION ALL
    SELECT 4, NULL FROM DUAL;
    
    INSERT INTO longdescription
    SELECT 'WORKLOG', 'DESCRIPTION', 1, '<HTML><BODY>Test</BODY></HTML>' FROM DUAL UNION ALL
    SELECT 'WORKLOG', 'DESCRIPTION', 2, NULL FROM DUAL UNION ALL
    SELECT 'WORKLOG', 'DESCRIPTION', 3, '<HTML><BODY>Test</BODY></HTML>' FROM DUAL UNION ALL
    SELECT 'WORKLOG', 'DESCRIPTION', 4, NULL FROM DUAL;
    

    Outputs:

    WORKLOG DESCRIPTION DESCRIPTION_FULL
    1 Value1 Value1//Test
    3 null Test
    2 Value2 Value2
    4 null null

    db<>fiddle here