I am using a regex function in Impala to find a folder name in a file path, but it doesn't seem to give me the correct result.
I want to parse out "one" from this file path:
/this/one/path/to/hdfs
This is the regex which I used:
regexp_extract(filepath,'[/]+',0)
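To see what that pattern actually matches, here is a minimal sketch that emulates regexp_extract with Python's re module. The helper regexp_extract below is hypothetical, not Impala's implementation: it returns the requested group of the first match and, as an assumption, returns an empty string when nothing matches. Impala uses the RE2 library, which should agree with Python on simple patterns like these.

```python
import re

def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical stand-in for Impala's regexp_extract: return group `index`
    of the first match, or '' when there is no match (assumed behaviour)."""
    m = re.search(pattern, subject)
    if m is None:
        return ""
    return m.group(index) or ""

path = "/this/one/path/to/hdfs"

# '[/]+' only matches runs of slashes, so the whole match (index 0) is "/".
print(regexp_extract(path, r"[/]+", 0))    # /
# Wrapping it in a capturing group changes the numbering, not what is matched.
print(regexp_extract(path, r"([/]+)", 1))  # /
```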
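The pattern [/]+ only matches runs of slashes, so regexp_extract(filepath,'[/]+',0) returns just the first "/" and never the folder name. If we wish to capture the / itself, then we might just want to try ([/]+). To extract one, we need an expression that also matches the folders around it, such as: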
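(?:/[a-z]+/)(.+?)(?:/.+)
Impala uses RE2 syntax, where / needs no escaping. Because non-capturing groups (?:...) are not numbered, (.+?) is capture group 1, so the index passed to regexp_extract is 1 (there is no group 2). Our code might look like:
regexp_extract(filepath, '(?:/[a-z]+/)(.+?)(?:/.+)', 1)
or
regexp_extract(filepath, '(?:/.+?/)(.+?)(?:/.+)', 1)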
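As a quick check of the group numbering, the sketch below compiles both patterns with Python's re module (used here only as a stand-in for Impala's RE2) and prints how many capturing groups each has, together with group 1 for the sample path.

```python
import re

path = "/this/one/path/to/hdfs"

# The two patterns from above, with the slashes left unescaped.
patterns = [
    r"(?:/[a-z]+/)(.+?)(?:/.+)",
    r"(?:/.+?/)(.+?)(?:/.+)",
]

for pattern in patterns:
    compiled = re.compile(pattern)
    m = compiled.search(path)
    # (?:...) groups are not numbered, so the only capturing group is (.+?).
    print(compiled.groups, m.group(1))  # prints: 1 one
```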
Here, we do not capture what comes before one; it is matched by a non-capturing group:
(?:/[a-z]+/)
Then we capture one with a lazy group, which matches as few characters as possible:
(.+?)
Finally, we add a right boundary after one in another non-capturing group:
(?:/.+)
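The three pieces can be assembled and inspected separately. This is a minimal sketch in Python's re module (a stand-in for RE2); the variable names left, target and right are illustrative only. It also shows why the capture is lazy: with a greedy (.+), the group keeps backtracking only as far as the last slash.

```python
import re

path = "/this/one/path/to/hdfs"

left = r"(?:/[a-z]+/)"   # left boundary, matched but not captured: "/this/"
target = r"(.+?)"        # lazy capture: as few characters as possible
right = r"(?:/.+)"       # right boundary, matched but not captured: "/path/to/hdfs"

m = re.search(left + target + right, path)
print(m.group(0))  # /this/one/path/to/hdfs  (the whole match)
print(m.group(1))  # one
print(m.span(1))   # (6, 9)

# With a greedy (.+) instead, the capture runs up to the last slash.
greedy = re.search(left + r"(.+)" + right, path)
print(greedy.group(1))  # one/path/to
```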
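Depending on which folder in the path one is, we can modify our expression. For example, if the first folder may contain characters other than lowercase letters, replacing [a-z]+ with .+? lets the left boundary accept any first folder, and in this case this expression also works:
(?:/.+?/)(.+?)(?:/.+)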
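To compare the two variants, the sketch below runs both against a few sample paths using Python's re module as a stand-in for RE2. The helper second_folder and the extra paths are hypothetical examples, and the helper assumes an empty result when nothing matches.

```python
import re

LETTERS_ONLY = r"(?:/[a-z]+/)(.+?)(?:/.+)"
ANY_FIRST_FOLDER = r"(?:/.+?/)(.+?)(?:/.+)"

def second_folder(path: str, pattern: str) -> str:
    """Hypothetical helper: capture group 1 of the first match, or ''."""
    m = re.search(pattern, path)
    return m.group(1) if m else ""

for path in ["/this/one/path/to/hdfs", "/data_2019/one/path/to/hdfs", "/this/one"]:
    print(path, "->",
          repr(second_folder(path, LETTERS_ONLY)),
          repr(second_folder(path, ANY_FIRST_FOLDER)))
```

For /data_2019/one/path/to/hdfs the letters-only pattern cannot match the first folder, so the search starts at a later slash and silently returns path, while the .+? variant returns one. Both return an empty result for /this/one, because the right boundary requires at least one more folder after one.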