I need help using a regex to split the parameters in a string into parameter and value groups.
Input -> collection_a=['U1', 'U2'], collection_b=['U1', 'U2']
output -> Group Parameter = collection_a
Group Value = ['U1', 'U2']
Group Parameter = collection_b
Group Value = ['U1', 'U2']
Input -> collection=['U1', 'U2'], callback_macro=utils.user_email(user_id=$$)
output -> Group Parameter = collection
Group Value = ['U1', 'U2']
Group Parameter = callback_macro
Group Value = utils.user_email(user_id=$$)
Input -> collection=['U1', 'U2'], callback_macro=utils.user_email(user_id=$$, config={'user': 'ADMIN'})
output -> Group Parameter = collection
Group Value = ['U1', 'U2']
Group Parameter = callback_macro
Group Value = utils.user_email(user_id=$$)
Input -> collection=['U1','U2'], callback_macro=string.replace(value=$$, pattern=^(.*)$, replacement={'user': $1})
output -> Group Parameter = collection
Group Value = ['U1', 'U2']
Group Parameter = callback_macro
Group Value = string.replace(value=$$, pattern=^(.*)$, replacement={'user': $1})
I'm using this regex: /((\s*(?<parameter>[a-z_]+)\s*=\s*(?<value>((?!(,\s*[a-z_]+)\s*=\s*).)*)),{1,})/g
It works perfectly for cases 1 and 2 but breaks for cases 3 and 4, because in those cases the argument's value itself contains =.
Regex link - https://regex101.com/r/U2CaLb/1
It would be very difficult to do a split with these requirements.
You could just match key/value pairs and put them into an array.
Note that the third input sample does not follow the pattern of the others.
Here are some options to choose from.
Method 1 This regex matches a single level of brackets ([], () or {}) as part of the value.
(\w+)=((?:\[.*?\]|\(.*?\)|{.*?}|[^,\r\n])*)
https://regex101.com/r/fnPb6e/1
 ( \w+ )                       # (1)
 =
 (                             # (2 start)
      (?:
           \[ .*? \]
        |  \( .*? \)
        |  { .*? }
        |  [^,\r\n]
      )*
 )                             # (2 end)
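As a rough illustration of "match key/value pairs and put them into an array", here is a minimal Python sketch using the Method 1 pattern with the built-in re module. The function name split_params is hypothetical, and the braces are escaped only as a precaution; otherwise the pattern is the one shown above.

```python
import re

# Method 1 pattern: one level of [], () or {} is consumed as a unit;
# any other character is consumed until the next comma or line break.
PAIR = re.compile(r"(\w+)=((?:\[.*?\]|\(.*?\)|\{.*?\}|[^,\r\n])*)")

def split_params(text):
    """Return a list of (parameter, value) tuples found in text."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(text)]

sample = "collection=['U1', 'U2'], callback_macro=utils.user_email(user_id=$$)"
for name, value in split_params(sample):
    print(f"Group Parameter = {name}")
    print(f"Group Value = {value}")
```

Since only one level of brackets is handled, a value with nested parentheses such as pattern=^(.*)$ in the fourth sample is beyond this pattern and needs one of the balanced methods.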
For balanced-bracket text, on PCRE-style engines that support recursion (not .NET):
Method 2 This regex is the simple version: nested (), [] and {} are each balanced independently, with the matching end of each pair found to completion.
(\w+)=((?:(\[(?:[^\[\]]++|(?3))*\])|(\((?:[^()]++|(?4))*\))|({(?:[^{}]++|(?5))*})|[^,\r\n])*)
https://regex101.com/r/dgWMDZ/1
 ( \w+ )                       # (1)
 =
 (                             # (2 start)
      (?:
           (                             # (3 start)
                \[
                (?:
                     [^\[\]]++
                  |  (?3)
                )*
                \]
           )                             # (3 end)
        |  (                             # (4 start)
                \(
                (?:
                     [^()]++
                  |  (?4)
                )*
                \)
           )                             # (4 end)
        |  (                             # (5 start)
                {
                (?:
                     [^{}]++
                  |  (?5)
                )*
                }
           )                             # (5 end)
        |  [^,\r\n]
      )*
 )                             # (2 end)
Method 3 This regex will balance the different braces nested inside
each other if applicable. This will allow for more internal brace structure
items and is probably not needed for this scenario, but could be in the future.
This is the enhanced Method 2 version and encompasses that functionality.
(\w+)=((?:(\[(?:[^\[\](){}]++|(?3))*\]|\((?:[^\[\](){}]++|(?3))*\)|{(?:[^\[\](){}]++|(?3))*})|[^,\r\n])*)
https://regex101.com/r/ZubSke/1
 ( \w+ )                       # (1)
 =
 (                             # (2 start)
      (?:
           (                             # (3 start)
                \[
                (?:
                     [^\[\](){}]++
                  |  (?3)
                )*
                \]
             |
                \(
                (?:
                     [^\[\](){}]++
                  |  (?3)
                )*
                \)
             |
                {
                (?:
                     [^\[\](){}]++
                  |  (?3)
                )*
                }
           )                             # (3 end)
        |
           [^,\r\n]
      )*
 )                             # (2 end)
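Recursion such as (?3) is not available in every engine; Python's built-in re module, for one, lacks it. The sketch below is not the regex above but a hand-written scanner that approximates the same idea with an explicit stack, so that only commas outside every bracket pair end a value. The names split_top_level and parse_pairs are hypothetical, and trimming whitespace around names and values is an assumption of this sketch.

```python
import re

# Map each opening bracket to the closer it expects.
OPENERS = {"[": "]", "(": ")", "{": "}"}

def split_top_level(text):
    """Cut text at commas that are not inside [], () or {}."""
    parts, stack, start = [], [], 0
    for i, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(OPENERS[ch])   # remember the expected closer
        elif stack and ch == stack[-1]:
            stack.pop()                 # innermost pair is now closed
        elif ch == "," and not stack:
            parts.append(text[start:i])
            start = i + 1
        # Unmatched closers fall through as ordinary characters.
    parts.append(text[start:])
    return parts

def parse_pairs(text):
    """Return (parameter, value) tuples for each top-level name=value."""
    pairs = []
    for part in split_top_level(text):
        name, sep, value = part.partition("=")
        if sep and re.fullmatch(r"\w+", name.strip()):
            pairs.append((name.strip(), value.strip()))
    return pairs

sample = ("collection=['U1','U2'], callback_macro=string.replace("
          "value=$$, pattern=^(.*)$, replacement={'user': $1})")
for name, value in parse_pairs(sample):
    print(f"Group Parameter = {name}")
    print(f"Group Value = {value}")
```

An opener that is never closed keeps the stack non-empty, so the value then runs to the end of the input. That differs from the regex, whose [^,\r\n] fallback would still stop at the next comma.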
Method 4 Same as Method 3, with added handling of simple single- or double-quoted strings.
This regex blends quote parsing into any other delimiter pair balance, as well as outside of these delimiters.
Note that there is a pass-through "garbage collector", [^,\r\n], to catch unbalanced delimiters.
This is by design, as balanced text is really just a suggestion during the fleshing-out process.
(\w+)=((?:(\[(?:[^\[\](){}'"]++|(?4)|(?3))*\]|\((?:[^\[\](){}'"]++|(?4)|(?3))*\)|{(?:[^\[\](){}'"]++|(?4)|(?3))*})|('[^'\r\n]*?'|"[^"\r\n]*?")|[^,\r\n])*)
https://regex101.com/r/090iI7/1
 ( \w+ )                       # (1)
 =
 (                             # (2 start)
      (?:
           (                             # (3 start)
                \[
                (?:
                     [^\[\](){}'"]++
                  |  (?4)
                  |  (?3)
                )*
                \]
             |
                \(
                (?:
                     [^\[\](){}'"]++
                  |  (?4)
                  |  (?3)
                )*
                \)
             |
                {
                (?:
                     [^\[\](){}'"]++
                  |  (?4)
                  |  (?3)
                )*
                }
           )                             # (3 end)
        |  (                             # (4 start)
                ' [^'\r\n]*? '
             |
                " [^"\r\n]*? "
           )                             # (4 end)
        |
           [^,\r\n]
      )*
 )                             # (2 end)
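For engines without recursion, the quote handling can be approximated by a small hand-written scanner instead of the regex above. In this Python sketch, a quoted string that closes on the same line is skipped whole, so commas and brackets inside it are ignored; an unclosed quote passes through as an ordinary character. The name parse_pairs and the sample input are hypothetical, and escaped quotes are not handled, in keeping with the "simple" strings of the pattern.

```python
import re

# Map each opening bracket to the closer it expects.
OPENERS = {"[": "]", "(": ")", "{": "}"}

def parse_pairs(text):
    """Collect (parameter, value) pairs, honouring brackets and quotes."""
    parts, stack, start, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            end = text.find(ch, i + 1)
            inner = text[i:end]
            # Skip the whole string only if it closes before a line break.
            if end != -1 and "\n" not in inner and "\r" not in inner:
                i = end
        elif ch in OPENERS:
            stack.append(OPENERS[ch])   # remember the expected closer
        elif stack and ch == stack[-1]:
            stack.pop()                 # innermost pair is now closed
        elif ch == "," and not stack:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])

    pairs = []
    for part in parts:
        name, sep, value = part.partition("=")
        if sep and re.fullmatch(r"\w+", name.strip()):
            pairs.append((name.strip(), value.strip()))
    return pairs

# Hypothetical input: a comma and brackets hidden inside quoted strings.
sample = '''label='a, (b', items=[1, "x]y", 2]'''
for name, value in parse_pairs(sample):
    print(f"Group Parameter = {name}")
    print(f"Group Value = {value}")
```

Without the quote branch, the ( inside 'a, (b' would open a pair that never closes and swallow the rest of the input, which is the kind of case the quote handling is meant to cover.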