I would like to extract 3 words before the selay dervice
but the query returns an empty column :(
with a as (
select * from tablename1 b
where lower(ptranscript) rlike 'selay dervice'
)
select *,regexp_extract(lower(a.ptranscript),'([a-zA-Z0-9]+\s+){3}selay dervice',0) from a
##########update 1
as pointed by Raid earlier, in Hive we cannot use \s
and have to use \\s
. I updated the above regex accordingly and it works
with a as (
select * from tablename1 b
where lower(ptranscript) rlike 'selay dervice'
)
select *,regexp_extract(lower(a.ptranscript),'([a-zA-Z0-9]+\\s+){3}selay dervice',0) from a
Try below:
with a as (
select * from tablename1 b
where lower(ptranscript) rlike 'selay dervice'
)
select *,regexp_extract(lower(a.ptranscript),'(?:[a-zA-Z0-9]+ ){3}selay dervice',0) from a
Note that if there are less than 3 words before selay dervice you will get empty results.
I tested similar query in latest apache hive and got something like below:
+----------------------------------+-----------------------------+
| key | regex_ext |
+----------------------------------+-----------------------------+
| rlk1 selay dervice | |
| selay dervice k4 | |
| k5 selay dervice ew | |
| thre word b4 selay dervice | thre word b4 selay dervice |
| four word be four selay dervice | word be four selay dervice |
+----------------------------------+-----------------------------+
Edit 1: Result does not vary with or without ? All 3 versions below gives same result.
As per docs \s matches any white space not just spacebar