I'm extracting a substring from a filename with format xxxxx_ID.extension
Example of strings that match correctly:
aaaa_bbbb_ID1.txt
xxxxxx_yy_ID2.xml
xxxx_ID3.zzz
I need the ID part. I tried with:
def fileMatch = ("aaaa_bbbb_ID1.txt" =~ /(?<=_)([^_]+)(?=\.\w+$)/);
assert fileMatch.size() > 0
println fileMatch[0]
Where:
(?<=_) to match the last underscore
([^_]+) matches the ID to be extracted (a string with no underscore inside)
(?=\.\w+$) to match the extension
It returns [ID1, ID1]. Here I was expecting just one result, why does it match the ID twice?
I know I could extract the first match with fileMatch[0][0], but I'm wondering if I'm doing anything wrong.
I also tried (?<=_)([^_]+)(?=\.[^.]+$) with the same result.
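When you find a regex match with the =~ operator in Groovy, fileMatch[0] returns either the whole match as a string (if there are no capturing groups in the pattern) or a list containing the whole match followed by the "captured" substrings (if you specified capturing groups in the pattern). That is why you get [ID1, ID1]: the ID is not matched twice; the first item is the whole match and the second is the content of capturing group 1, and here they happen to be identical.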
If you remove the capturing group (i.e. if you remove the capturing parentheses, ([^_]+) => [^_]+) and use
/(?<=_)[^_]+(?=\.\w+$)/
you can obtain the whole match text with fileMatch[0].
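As a minimal sketch of the group-free pattern applied to the three sample filenames from the question (the names pattern, names and ids are only illustrative):

```groovy
// Lookarounds only, no capturing group: m[0] is a plain String.
def pattern = /(?<=_)[^_]+(?=\.\w+$)/

def names = ['aaaa_bbbb_ID1.txt', 'xxxxxx_yy_ID2.xml', 'xxxx_ID3.zzz']

// A Matcher is truthy when it finds a match, so guard before indexing.
def ids = names.collect { name ->
    def m = (name =~ pattern)
    m ? m[0] : null
}

println ids
```

Guarding with the matcher's truth value avoids an exception from m[0] when a filename does not follow the expected format.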
With fileMatch.size(), you get the number of matches found in the input, so assert fileMatch.size() > 0 only checks that there is at least one match. If there are capturing groups in your pattern, fileMatch[0] is a list and you can access its items via fileMatch[0][0] (the whole match), fileMatch[0][1] (the first captured substring), etc.
Note that the number of items in the list returned by fileMatch[0] is the number of capturing groups in the pattern + 1 (the first item is the entire match value).
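To see this with the original pattern from the question, here is a small sketch (the variable name m is just for illustration) that checks the match count, the group count, and the shape of the list returned by indexing:

```groovy
// Original pattern from the question, with one capturing group.
def m = ('aaaa_bbbb_ID1.txt' =~ /(?<=_)([^_]+)(?=\.\w+$)/)

assert m.size() == 1                      // one match found in the input
assert m.groupCount() == 1                // one capturing group in the pattern
assert m[0] == ['ID1', 'ID1']             // [whole match, group 1]
assert m[0].size() == m.groupCount() + 1  // groups + 1 for the whole match
assert m[0][1] == 'ID1'                   // content of group 1
```

Since the lookarounds consume no characters, the whole match and group 1 are the same text here, which is what makes the result look like a duplicate.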