Search code examples
oraclehivehiveql

How to convert this hive query to oracle


I have this hive query:

select REGEXP_EXTRACT( lower(column_name) , '.*(build[ \t]*(app)?[ \t]*:[ \t]*)(.*?)([ \t]*;[ \t]*essential[ \t]+reason[ \t]+info[ \t]+compilation.*$|$)', 3) from table

How do I convert it to oracle query?

I have tried using regexp_substr() but it doesn't work like it needs to. Thanks!


Solution

  • It would help a lot if you could edit your question to add some example data with your expected output so we could see how it "doesn't work".

    regexp_substr() follows the same syntax as regexp_extract(), but Oracle (like many vendors) only supports a specific set of regular expression operators.

    In your case, the main issue I see is that \t isn't interpreted as "tab character" in Oracle. You have several alternatives to match tab characters:

    with test_data as (select 'build'||chr(9)||':' as s from dual) -- 'build' with a tab character then a ':'
    select 
        regexp_substr(s, 'build[ \t]*:') as slash_t, -- doesn't work
        regexp_substr(s, 'build[[:space:]]*:') as posix_cc, -- matches any whitespace characters
        regexp_substr(s, 'build\s*:') as perl_cc, -- same as posix, but perl dialect
        regexp_substr(s, 'build[    ]*:') as literal_tab, -- StackOverflow formatting ruins this, but there was a tab there
        regexp_substr(s, 'build[ '||chr(9)||']*:') as chr_tab -- chr(9) is tab
    from test_data;
    

    Output:

    SLASH_T POSIX_CC PERL_CC LITERAL_TAB CHR_TAB
    ------- -------- ------- ----------- -------
            build  : build : build   :   build :