I need to capture three groups separately. For example:
s3://some/path/TIMESTAMP/part-1234.parquet
|---- g1 -----|-- g2 ---|----- g3 -------|
where g3 is the file name, g2 is the TIMESTAMP, and g1 is everything that precedes the timestamp.
So far I can capture two groups in Scala:
val pattern = "(.*?)(part.*.parquet)$".r
val pattern(fileBasePath, filename) = row.file_path
What I'm looking for is something like this:
val pattern(fileBasePath, timestamp, filename) = row.file_path
What would the pattern look like for the above?
You can use
val pattern = """^(.*?)/([^/]+)/(part.*\.parquet)$""".r
Details
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
/ - a / char
([^/]+) - Group 2: any one or more chars other than /
/ - a / char
(part.*\.parquet) - Group 3: part, any zero or more chars other than line break chars, as many as possible, and then a .parquet substring
$ - end of string.
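As a minimal sketch of how the pattern could be used safely: destructuring with `val pattern(...) = ...` throws a `scala.MatchError` when the input does not match, so the example below wraps the match in a function returning an `Option`. The object name `SplitS3Path`, the function `split`, and the sample paths are hypothetical and only serve as illustration.

```scala
object SplitS3Path {
  // The pattern from the answer: base path, timestamp directory, file name.
  private val pattern = """^(.*?)/([^/]+)/(part.*\.parquet)$""".r

  // Hypothetical helper: returns None instead of throwing a MatchError
  // when the path does not have the expected shape.
  def split(path: String): Option[(String, String, String)] = path match {
    case pattern(fileBasePath, timestamp, filename) =>
      Some((fileBasePath, timestamp, filename))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // Sample inputs (assumed for illustration only).
    println(split("s3://some/path/TIMESTAMP/part-1234.parquet"))
    println(split("s3://some/path/README.md"))
  }
}
```

If every row is guaranteed to match, the one-line destructuring from the question is enough; the `Option` form is only useful when some paths might not end in a part file.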