I have a string column in impala called text
that contains descriptions. I would like to get the words before and after a specific keyword.
Example:
text
= This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom....
keyword
= m2
desired result: two columns, word before
= 50 and word after
= apartment
Any ideas?
You can use regexp_extract
to match words before and after m2
and extract them seperately.
with t as ( select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text)
select
regexp_extract(t.text , "(\\w+)\\s+m2", 1) as word_before,
regexp_extract(t.text , "m2\\s+(\\w+)", 1) as word_after
from t ;
+--------------+-------------+--+
| word_before | word_after |
+--------------+-------------+--+
| 50 | apartment |
+--------------+-------------+--+