Tags: regex, impala

Regex for extracting part of a file path


I am using a regex function in Impala to find the folder name in a file path, but it doesn't seem to give me the correct result.

I want to parse out "one" from this file path:

/this/one/path/to/hdfs

This is the regex which I used:

regexp_extract(filepath,'[/]+',0)

Solution

  • The pattern [/]+ only matches a run of slashes, so with index 0 it returns just the leading /. If we wish to capture the /, then we might want to try ([\/]+). To extract one instead, we could use an expression such as:

    (?:\/[a-z]+\/)(.+?)(?:\/.+)
    

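    Because (.+?) is the only capturing group in these patterns, the index passed to regexp_extract must be 1 (index 0 would return the whole match), so our code might look like:

    regexp_extract(filepath, '(?:\/[a-z]+\/)(.+?)(?:\/.+)', 1)
    

    or

    regexp_extract(filepath, '(?:\/.+?\/)(.+?)(?:\/.+)', 1)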
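As a quick sanity check outside Impala, the sketch below uses Python's re module as a rough stand-in for Impala's regular-expression engine (an assumption: both treat these particular patterns the same way). The regexp_extract function here is a hypothetical Python helper that mirrors the (subject, pattern, index) arguments and returns an empty string when nothing matches; it is not Impala's implementation. It shows that each pattern has exactly one capturing group, so group 1 holds the folder name, and that the question's [/]+ pattern only yields the leading slash.

```python
import re


# Hypothetical Python stand-in for Impala's regexp_extract(subject, pattern, index).
# Assumption: an empty string is returned when the pattern does not match.
def regexp_extract(subject: str, pattern: str, index: int) -> str:
    match = re.search(pattern, subject)
    return match.group(index) if match else ""


filepath = "/this/one/path/to/hdfs"
patterns = [
    r"(?:\/[a-z]+\/)(.+?)(?:\/.+)",
    r"(?:\/.+?\/)(.+?)(?:\/.+)",
]

for pattern in patterns:
    # Each pattern has exactly one capturing group, so group 1 holds the folder.
    print(re.compile(pattern).groups, regexp_extract(filepath, pattern, 1))  # 1 one

# The question's pattern: group 0 is just the first run of slashes.
print(regexp_extract(filepath, r"[/]+", 0))  # /
```

In Python, asking for group 2 of these patterns raises IndexError rather than returning anything useful, which is another way to see that the index has to be 1.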
    

    Compartments

    In this case, we use a non-capturing group to match what comes before one without capturing it:

    (?:\/[a-z]+\/)
    

    then we capture one using:

    (.+?)
    

    and finally we add a right boundary after one in another non-capturing group:

    (?:\/.+)
    

    RegEx Circuit

    jex.im can be used to visualize this regular expression as a diagram.


    Depending on which slash one follows, we can modify our expression. For example, in this case, the following expression should also work:

    (?:\/.+?\/)(.+?)(?:\/.+)
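The two left boundaries behave differently once the first folder contains characters outside [a-z]. The sketch below uses Python's re module as an assumed stand-in for Impala's engine and a made-up path, /data_2024/one/path/to/hdfs. It also includes nth_folder, a hypothetical helper (not part of the original answer) that picks the folder after a chosen slash using an anchored pattern built from [^/]+.

```python
import re
from typing import Optional

# Hypothetical example path whose first folder contains characters outside [a-z].
filepath = "/data_2024/one/path/to/hdfs"

letters_only = re.search(r"(?:\/[a-z]+\/)(.+?)(?:\/.+)", filepath)
any_chars = re.search(r"(?:\/.+?\/)(.+?)(?:\/.+)", filepath)

# [a-z]+ cannot match "data_2024", so the match starts at "/one/" and captures "path".
print(letters_only.group(1))  # path
# .+? accepts any characters in the first folder, so "one" is captured.
print(any_chars.group(1))     # one


def nth_folder(path: str, n: int) -> Optional[str]:
    """Hypothetical helper: return the n-th folder (0-based) of an absolute path.

    The pattern is anchored with ^ and uses [^/]+ so no segment can span a slash.
    """
    match = re.search(r"^/(?:[^/]+/){%d}([^/]+)" % n, path)
    return match.group(1) if match else None


print(nth_folder("/this/one/path/to/hdfs", 1))  # one
```

Assuming Impala's RE2-based engine accepts the {1} repetition (RE2 does support counted repetition), the equivalent call would presumably be regexp_extract(filepath, '^/(?:[^/]+/){1}([^/]+)', 1), which also avoids backslash escapes entirely.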
    
