Search code examples
sqlxmloracle-databaseplsqlpayload

PLSQL - REGEXP_SUBSTR Remove XML Tags


I currently have a payload that was generated by oracle xml gateway that I need to pull some exact information out. The payload information is store within a db table meaning that I am attempting to us regexp_substr to accomplish this task.

This is the tag that is in the middle of the XML document

<IDCODE>S2200</IDCODE>

    "<?xml version="1.0" encoding="UTF-8" standalone='no'?>
<!DOCTYPE PROCESS_INVOICE_002 SYSTEM "asfasdf.dtd">
<!-- Oracle eXtensible Markup Language Gateway Server  -->
<PROCESS_INVOICE_002>
  <CNTROLAREA>
    <BSR>
      <VERB value="PROCESS"/>
      <NOUN value="INVOICE"/>
      <REVISION value="002"/>
    </BSR>
    <SENDER>
      <LOGICALID/>
      <COMPONENT/>
      <TASK/>
      <REFERENCEID/>
      <CONFIRMATION/>
      <LANGUAGE/>
      <CODEPAGE/>
      <AUTHID/>
    </SENDER>
    <DATETIME qualifier="CREATION">
      <YEAR/>
      <MONTH/>
      <DAY/>
      <HOUR/>
      <MINUTE/>
      <SECOND/>
      <SUBSECOND/>
      <TIMEZONE/>
    </DATETIME>
  </CNTROLAREA>
  <DATAAREA>
    <PROCESS_INVOICE>
      <INVHEADER>
        <AMOUNT qualifier="DOCUMENT" type="T" index="1">
          <VALUE>78538</VALUE>
          <NUMOFDEC>8</NUMOFDEC>
          <SIGN>+</SIGN>
          <CURRENCY>USD</CURRENCY>
          <DRCR>D</DRCR>
        </AMOUNT>
        <DATETIME qualifier="DOCUMENT" index="1">
          <YEAR>2020</YEAR>
          <MONTH>11</MONTH>
          <DAY>28</DAY>
          <HOUR>00</HOUR>
          <MINUTE>00</MINUTE>
          <SECOND>00</SECOND>
          <SUBSECOND>0000</SUBSECOND>
          <TIMEZONE>+0000</TIMEZONE>
        </DATETIME>
        <DOCUMENTID>81989184</DOCUMENTID>
        <DESCRIPTN/>
        <DOCTYPE>INV</DOCTYPE>
        <PAYMETHOD/>
        <REASONCODE/>
        <USERAREA>
          <NOTEREFCODE/>
          <NOTESREF/>
          <VENDNUMQUAL>IA</VENDNUMQUAL>
          <VENDNUM>98181</VENDNUM>
          <DEPTNUMQUAL>DP</DEPTNUMQUAL>
          <DEPTNUM>85</DEPTNUM>
          <ORDNUMQUAL/>
          <ORDNUM>0</ORDNUM>
          <CUSTCODEQUAL/>
          <CUSTCODE/>
          <NETDAYS/>
          <DATETIMEQUAL/>
          <FOBCODE/>
          <UOM/>
          <TOTALQUANTITY/>
        </USERAREA>
        <PARTNER>
          <NAME index="1">COMPANY NAME</NAME>
          <ONETIME/>
          <PARTNRID/>
          <PARTNRTYPE>Supplier</PARTNRTYPE>
          <SYNCIND/>
          <ACTIVE/>
          <CURRENCY/>
          <DESCRIPTN/>
          <DUNSNUMBER/>
          <GLENTITYS/>
          <PARENTID/>
          <PARTNRIDX/>
          <PARTNRRATG/>
          <PARTNRROLE/>
          <PAYMETHOD/>
          <TAXEXEMPT/>
          <TAXID/>
          <TERMID/>
          <USERAREA>
            <IDQUAL/>
            <IDCODE/>
          </USERAREA>
          <CONTACT>
            <NAME index="1">PROFILE</NAME>
            <CONTCTTYPE/>
            <DESCRIPTN/>
            <EMAIL/>
            <FAX index="1"/>
            <TELEPHONE index="1"/>
            <USERAREA/>
          </CONTACT>
        </PARTNER>
        <PARTNER>
          <NAME index="1">CUSTOMER NAME</NAME>
          <ONETIME/>
          <PARTNRID>981698198</PARTNRID>
          <PARTNRTYPE>ShipTo</PARTNRTYPE>
          <SYNCIND/>
          <ACTIVE/>
          <CURRENCY/>
          <DESCRIPTN/>
          <DUNSNUMBER/>
          <GLENTITYS/>
          <PARENTID/>
          <PARTNRIDX/>
          <PARTNRRATG/>
          <PARTNRROLE/>
          <PAYMETHOD/>
          <TAXEXEMPT/>
          <TAXID/>
          <TERMID/>
          <USERAREA>
            <IDQUAL>ZZ</IDQUAL>
            <IDCODE>S2200</IDCODE>
          </USERAREA>
          <ADDRESS>
            <ADDRLINE index="1">123 MAIN STREET</ADDRLINE>
            <ADDRTYPE/>
            <CITY>HAM CITY</CITY>
            <COUNTRY>United States</COUNTRY>
            <COUNTY>NEW YORK</COUNTY>
            <DESCRIPTN/>
            <FAX index="1"/>
            <POSTALCODE>18080</POSTALCODE>
            <REGION/>
            <STATEPROVN>NY</STATEPROVN>
            <TAXJRSDCTN/>
            <TELEPHONE index="1"/>
            <URL/>
            <USERAREA/>
          </ADDRESS>

REGEX that I am using in the query

TRIM(regexp_substr(ed.payload, '?.+(</IDCODE>)')) Store_NUM,
TRIM(regexp_substr(ed.payload, '(^IDCODE)?.+(</IDCODE>)')) Store_Number

The Outcome that I am receiving from the above SQL regexp_substr. The issue is that I have made it to the correct tab but I can't figure out how to strip the \<IDCODE> and the \</IDCODE> for the output

-Field can have 4 or 5 chars -letters or numbers

<IDCODE>S2200</IDCODE> Store_NUM

<IDCODE>S2200</IDCODE> Store_Number


Solution

  • I believe you are looking for this if I am understanding you correctly. Return everything in the group between the tags.

    SELECT REGEXP_SUBSTR('<IDCODE>S2200</IDCODE>', '<IDCODE>(.*)</IDCODE>', 1, 1, NULL, 1) Store_Number
    from dual;
    
    
    STORE_NUMBER
    ------------
    S2200       
    1 row selected.