Search code examples
oraclehivehiveql

How to convert a hive query with regex to oracle


I have this text:

Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples

I just want to get the part after 'Process explanation' but not include 'final activity...'

So like this:

The bottle is then melted to form liquid glass.

This is the current hive query which I want to convert to oracle:

SELECT REGEXP_EXTRACT(
               'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples',
               '.*(process[ \t]*(explanation)?[ \t]*:[ \t]*)(.*?)([ \t]*;[ \t]*final[ \t]+activity[ \t]+for[ \t]+manager.*$|$)',
               3) as extracted
FROM my_table

Solution

  • I've come up with something like this:

    with strings as
    (SELECT '1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' str FROM DUAL
    union all
    SELECT '2Process explanation:The bottle is then melted to form liquid glass;' str FROM DUAL
    union all
    SELECT '3Process :The bottle is then melted to form liquid glass' str FROM DUAL
    union all
    SELECT '4Process explanation: plasma gasification combined with centrifugal activity' str FROM DUAL
    union all
    SELECT '5Final activity for manager:Labeling of previous samples' str FROM DUAL
    )
    SELECT str
    , REGEXP_SUBSTR(
                   str,
               '(.*process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*)([A-Za-z0-9 ]*)([[:blank:]]*;[[:blank:]]*final[[:blank:]]*activity[[:blank:]]*for[[:blank:]]*manager.*$)?',
               1, 1, 'i',3)
                    as extracted
    FROM strings
    

    Resulting in:

    STR EXTRACTED
    1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples The bottle is then melted to form liquid glass
    2Process explanation:The bottle is then melted to form liquid glass; The bottle is then melted to form liquid glass
    3Process :The bottle is then melted to form liquid glass The bottle is then melted to form liquid glass
    4Process explanation: plasma gasification combined with centrifugal activity plasma gasification combined with centrifugal activity
    5Final activity for manager:Labeling of previous samples -

    Assuming matching blank group instead of your space and tab list [ \t] is ok. Edit: Modified the regexp a bit cause with possibility of last group being empty '.*' kept catching entire line.