Search code examples
sqlregexoracle-databasesubstr

Manipulating with regexp_substr


I have an ETL task for datawarehouse-ing purposes, I need to extract the second part of a String after a delimiter occurence such as: '#', 'ý', '-'. For example test case string:

'Tori 1#MHK-MahallaKingaveKD' I should retrieve only 'MHK'

'HPHelm2ýFFS-Tredddline' I should retrieve only 'FFS'

I already tried using the cases above:

TRIM(CASE
WHEN INSTR('HPHelm2ýFFS-Tredddline', '#',1,1) > 0
    THEN (REPLACE(
          REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1,2), 
          '#'
       ))
    ELSE (CASE 
            WHEN INSTR('HPHelm2ýFFS-Tredddline', '-',1,1) > 0
    THEN (REPLACE(
          REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1,2), 
          '-'
       ))
       ELSE (CASE 
            WHEN INSTR('HPHelm2ýFFS-Tredddline','-') = 0 AND INSTR('HPHelm2ýFFS-Tredddline','ý') = 0 AND INSTR('HPHelm2ýFFS-Tredddline','#') = 0
    THEN 'HPHelm2ýFFS-Tredddline'
        ELSE (CASE
            WHEN INSTR('HPHelm2ýFFS-Tredddline','ý',1,1) > 0
    THEN (REPLACE(
          REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1,2), 
          'ý'
       ))
            END)
          END)   
            END)
END)

Using the code above I can retrieve:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'

Expected output:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'

So I have to exclude '-' and the string after.

I guess I should modify the regexp_substr pattern but can't seem to find a clear solution since '-' is specified in the case when statements as a delimiter.


Solution

  • I suggest retrieving the second occurrence of 1+ chars other than your delimiter chars:

    regexp_substr(col, '[^#ý-]+', 1, 2)
    

    Here, the search starts with the first char in the record (1), and the second occurrence is returned (2).

    The [^#ý-]+ pattern matches one or more (+) chars other than #, ý and -.