
Scala Regex capture separate groups


I need to capture 3 groups separately. For example:

s3://some/path/TIMESTAMP/part-1234.parquet

|----- g1 ---------|------ g2 ------|--------- g3 ---------|

where g3 is the file name, g2 is the TIMESTAMP, and g1 is everything that precedes the timestamp.

I've come up with a pattern that captures 2 groups in Scala:

val pattern = "(.*?)(part.*.parquet)$".r
val pattern(fileBasePath, filename) = row.file_path

What I'm looking for is something like this:

val pattern(fileBasePath, timestamp, filename) = row.file_path

What would the pattern look like for the above?


Solution

  • You can use

    val pattern = """^(.*?)/([^/]+)/(part.*\.parquet)$""".r
    

    Details

    • ^ - start of string
    • (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
    • / - a / char
    • ([^/]+) - Group 2: any one or more chars other than /
    • / - a / char
    • (part.*\.parquet) - Group 3: the literal text part, then any zero or more chars other than line break chars, as many as possible, and then a literal .parquet substring
    • $ - end of string.
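
As a minimal sketch of how the pattern above could be used, the snippet below wraps it in a helper that returns the three groups. The names ParquetPathExample and splitPath are hypothetical and only for illustration. It uses a match expression rather than the val pattern(...) = ... destructuring from the question, on the assumption that some paths may not fit the expected shape; the destructuring form throws a MatchError in that case.

```scala
object ParquetPathExample {
  // Pattern from the answer: base path, timestamp directory, file name.
  val pattern = """^(.*?)/([^/]+)/(part.*\.parquet)$""".r

  // Hypothetical helper: returns the three captured groups,
  // or None when the path does not have the expected shape.
  def splitPath(path: String): Option[(String, String, String)] =
    path match {
      case pattern(fileBasePath, timestamp, filename) =>
        Some((fileBasePath, timestamp, filename))
      case _ =>
        None
    }
}
```

Note that the two / separators sit outside the capture groups, so for the sample path the first group comes back as s3://some/path without a trailing slash.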