Search code examples
stringhiveimpala

In impala/hive, How can I extract the word before and after a specific keyword in a string?


I have a string column in impala called text that contains descriptions. I would like to get the words before and after a specific keyword.

Example:

  • text= This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom....

  • keyword= m2

desired result: two columns, word before = 50 and word after= apartment

Any ideas?


Solution

  • You can use regexp_extract to match words before and after m2 and extract them seperately.

    with t as ( select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text)
    select 
        regexp_extract(t.text , "(\\w+)\\s+m2", 1) as word_before,
        regexp_extract(t.text , "m2\\s+(\\w+)", 1) as word_after
    from t ;
    
    +--------------+-------------+--+
    | word_before  | word_after  |
    +--------------+-------------+--+
    | 50           | apartment   |
    +--------------+-------------+--+