I am looking for a regex that I have already racked my brains over. I want to process input like the following:
Input
T_0-p(T_1-p(T_2,K_0),CW_0)
T_0, K_0 and CW_0 are elements. An element always consists of word characters followed by _ and an integer. Elements are separated from each other by a - or sit inside the p() operator. Inside the p() operator, the elements are separated by a comma. Another p() or - may also occur inside a p() operator.
What I want is two regexes to capture these elements. The first should capture the elements that are outside of any brackets. Currently I am using this one:
Regex
(?<![,\(])(?<s>\w+_\d)(?![,)])
This gives me:
Capture
T_0
This is working fine for me.
The other one is the one I am struggling with. It should capture what is inside the outermost p() operator, split at the commas.
So I could work with an output like this:
Capture
Capture 1 : T_1-p(T_2,K_0)
Capture 2 : CW_0
What I tried was this:
Regex
p\((?<p1>.+?),(?<p2>.+?)\)
But this obviously does not work if there is another p() operator inside a p() operator. To handle that, it needs to be modified: it has to check that the capture contains as many opening brackets as closing ones. Is there a way to do that with a regular expression? Can anyone help me with that? Or do you have another idea on how to accomplish it?
Sorry if there is an obvious way of doing that, I am new to regex.
I want to implement this with Julia. Julia has Perl-compatible regular expressions, as provided by the PCRE library.
You need these regexps:
(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|\w+_\d+
(?:\G(?!^),|p\()(\w+_\d+(?:-p(\((?:[^()]++|(?2))*\)))?)
See the regex demo #1 and regex demo #2.
Regex #1 details:
- (\((?:[^()]++|(?1))*\))(*SKIP)(*F) - a string between (potentially nested, balanced) parentheses that is matched and then the match is failed
- | - or
- \w+_\d+ - one or more letters, digits or underscores, then _, and then one or more digits.
You may add ( and ) around the \w+_\d+ pattern if you need a group by all means. Note it will be Group 2.
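To make the effect of regex #1 easier to follow, here is a minimal sketch of the same idea without PCRE recursion or backtracking verbs. It is written in Python rather than Julia and is not the regex above: it tracks the parenthesis depth by hand and keeps only the elements found at depth 0. The function name top_level_elements is hypothetical, and the sketch assumes the input has balanced parentheses.

```python
import re

# Same element shape as in the answer: word characters, "_", then digits.
ELEMENT = re.compile(r"\w+_\d+")


def top_level_elements(text):
    """Return the elements that sit outside every pair of parentheses.

    Assumption: the parentheses in `text` are balanced.
    """
    # depth_at[i] is the nesting depth of the character text[i].
    depth_at = []
    depth = 0
    for ch in text:
        if ch == ")":
            depth -= 1          # a closing bracket belongs to the outer level
        depth_at.append(depth)
        if ch == "(":
            depth += 1          # everything after "(" is one level deeper
    # Keep only matches that start outside any parentheses.
    return [m.group() for m in ELEMENT.finditer(text)
            if depth_at[m.start()] == 0]


print(top_level_elements("T_0-p(T_1-p(T_2,K_0),CW_0)"))
```

For the input from the question this prints ['T_0'], which is the capture regex #1 is meant to produce.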
Regex #2 details:
- (?:\G(?!^),|p\() - either the end of the previous match followed by a , char, or p(
- (\w+_\d+(?:-p(\((?:[^()]++|(?2))*\)))?) - Group 1:
  - \w+_\d+ - one or more letters, digits or underscores, then _, and then one or more digits
  - (?:-p(\((?:[^()]++|(?2))*\)))? - an optional occurrence of
    - -p - a -p string
    - (\((?:[^()]++|(?2))*\)) - Group 2 (it must be defined here because we need to recurse the pattern): (, then zero or more occurrences of either one or more chars other than ( and ) or the Group 2 pattern recursed, and then a ) char.
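As a cross-check for regex #2, or as a fallback where recursion and \G are not available, the same split can be sketched with a plain depth counter. This is a Python illustration of the alternative technique, not a translation of the regex: it finds the first p( and splits its contents at commas that are not nested inside further brackets. The function name outer_p_arguments is hypothetical, and the sketch assumes the first p( in the string opens the outermost operator.

```python
def outer_p_arguments(text):
    """Return the comma-separated arguments of the first p(...) in `text`.

    Assumption: the first "p(" found opens the outermost p() operator.
    Returns an empty list if there is no p( or the brackets never close.
    """
    start = text.find("p(")
    if start == -1:
        return []

    args = []      # finished arguments
    current = []   # characters of the argument being collected
    depth = 0      # nesting depth relative to the outermost p(

    for ch in text[start + 2:]:
        if ch == ")" and depth == 0:
            # Closing bracket of the outermost p(): last argument is done.
            args.append("".join(current))
            return args
        if ch == "," and depth == 0:
            # Top-level comma: the current argument ends here.
            args.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)

    return []  # ran out of input before the outermost p( was closed


print(outer_p_arguments("T_0-p(T_1-p(T_2,K_0),CW_0)"))
```

For the input from the question this prints ['T_1-p(T_2,K_0)', 'CW_0'], matching the two captures asked for. Calling the function again on one of the returned arguments would unpack the next nesting level.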