Search code examples
sqlxmlxml-parsingdb2xmltable

How to skip special characters in XML when reading XML using XMLPARSE in SQL


I'm trying to parse an XML for reading some special nodes. However one of the nodes has user comments which contain all types of special characters including &, <, >, & etc. Unfortunately I cannot edit the XML's.

While I am using REPLACE to remove some of them, the '<' and '>' is clashing with the node delimiters.

Any suggestions from experts here? Maybe some possible solutions. Please note I am a novice so detailed guidance will be appreciated.

  1. REPLACE all special characters including '<' and '>'
  2. Skip reading any special character
  3. Limit the data I read. The comment node comes quiet far down in the XML where as I just need the top of the XML CLOB. If there is a way to limit XMLPARSE to certain number of characters, that can work but its not an ideal solution
  4. Utilize another function instead of XMLPARSE.

Here's how my script looks like:

SELECT * 
FROM TRANSACTION_DATA_TABLE tdt, trxData,
XMLTABLE
('$d/*:Request/*:AppData' PASSING XMLPARSE(DOCUMENT REPLACE(tdt.TRAN_DATA, '&', '') AS "d"
COLUMNS
FNAME CHAR(20) PATH '*:FirstName',
LNAME CHAR(20) PATH '*:LastName'
)  

TIA.


Solution

  • It would really help to have DDL and DML provided so that the problem can be treated fully, but here is something that may assist. You might also add to the fiddle(s) as an addition to your question.

    Here is a simple example of how to return wanted elements of XML and then strip some characters from them - for DB2:

    CREATE TABLE TRANSACTION_DATA_TABLE (
      ID INTEGER,
      TRAN_DATA XML
    );
    
    INSERT INTO TRANSACTION_DATA_TABLE (ID, TRAN_DATA) VALUES (
      1,
      XMLPARSE(DOCUMENT '<TransactionData>
                  <Request>
                    <AppData>
                      <FirstName>John & Freddy</FirstName>
                      <LastName>Doe</LastName>
                    </AppData>
                  </Request>
                </TransactionData>' PRESERVE WHITESPACE)
    );
    
    SELECT tdt.ID,
           TRANSLATE(d.FirstName, '', '&<>') AS FirstName,
           TRANSLATE(d.LastName, '', '&<>') AS LastName
    FROM TRANSACTION_DATA_TABLE tdt,
         XMLTABLE('/TransactionData/Request/AppData'
                  PASSING tdt.TRAN_DATA
                  COLUMNS
                    FirstName VARCHAR(200) PATH 'FirstName',
                    LastName VARCHAR(200) PATH 'LastName') as d;
    
    ID FIRSTNAME LASTNAME
    1 John Freddy Doe

    DB2 fiddle

    or. use regex_replace perhaps?

    SELECT tdt.ID,
           REGEXP_REPLACE(d.FirstName, '[&<>]', '') AS FirstName,
           REGEXP_REPLACE(d.LastName, '[&<>]', '') AS LastName
    FROM TRANSACTION_DATA_TABLE tdt,
         XMLTABLE('/TransactionData/Request/AppData'
                  PASSING tdt.TRAN_DATA
                  COLUMNS
                    FirstName VARCHAR2(200) PATH 'FirstName',
                    LastName VARCHAR2(200) PATH 'LastName') as d;
    

    Oracle fiddle (as I wasn't sure which db this was for)