I'm trying to match a comma separated list with prefixed values which contains also a comma.
I finally made it to match all occurrence which doesn't have a ,
.
Sample String (With NL for visualization - original string doesn't have NL):
field01=Value 1,
field02=Value 2,
field03=<xml value>,
field04=127.0.0.1,
field05=User-Agent: curl/7.28.0\r\nHost: example.org\r\nAccept: */*,
field06=Location, Resource,
field07={Item 1},{Item 2}
My actual RegEx looks like this not optimized piece ....
(?'fields'(field[0-9]{2,3})=?([\s\w\d_<>.:="*?\-\/\\(){}<>'#]+))([^,](?&fields))*
Any one has a clue how to solve this?
EDIT:
The first pattern is near to my expected result.
This is a anonymized full example of the string:
asm01=Predictable Resource Location,Information Leakage,asm02=N/A,asm04=Uncategorized,asm08=2021-02-15 09:18:16,asm09=127.0.0.1,asm10=443,asm11=N/A,asm15=,asm16=DE,asm17=User-Agent: curl/7.29.0\r\nHost: dev.example.com\r\nAccept: */*\r\nX-Forwarded-For: 127.0.0.1\r\n\r\n,asm18=/Common/_www.example.com_live_v1,asm20=127.0.0.1,asm22=,asm27=HEAD,asm34=/Common/_www.example.com_live_v1,asm35=HTTPS,asm39=blocked,asm41=0,asm42=3,asm43=0,asm44=Error,asm46=200000028,200100015,asm47=Unix hidden (dot-file) access,.htaccess access,asm48={Unix/Linux Signatures},{Apache/NCSA HTTP Server Signatures},asm50=40622,asm52=200000028,asm53=Unix hidden (dot-file) access,asm54={Unix/Linux Signatures},asm55=,asm61=,asm62=,asm63=8985143867830069446,asm64=example-waf.example.com,asm65=/.htaccess,asm67=Attack signature detected,asm68=<?xml version='1.0' encoding='UTF-8'?><BAD_MSG><violation_masks><block>13020008202d8a-f803000000000000</block><alarm>417020008202f8a-f803000000000000</alarm><learn>13000008202f8a-f800000000000000</learn><staging>200000-0</staging></violation_masks><request-violations><violation><viol_index>42</viol_index><viol_name>VIOL_ATTACK_SIGNATURE</viol_name><context>request</context><sig_data><sig_id>200000028</sig_id><blocking_mask>7</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>0</offset><length>2</length></kw_data></sig_data><sig_data><sig_id>200000028</sig_id><blocking_mask>4</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>0</offset><length>3</length></kw_data></sig_data><sig_data><sig_id>200100015</sig_id><blocking_mask>7</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>1</offset><length>9</length></kw_data></sig_data></violation></request-violations></BAD_MSG>,asm69=5,asm71=/Common/_dev.example.com_SSL,asm75=127.0.0.1,asm100=,asm101=HEAD /.htaccess HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: dev.example.com\r\nAccept: */*\r\nX-Forwarded-For: 127.0.0.1\r\n\r\n#015
The pattern does not work as the fields
group matches the string field
You are trying to repeat the named group fields
but the example strings do not have the string field
.
Note that [^,]
matches any char except a comma, you can omit the capture group inside the named group field
as it already is a group and \w
also matches \d
With 2 capture groups:
\b(asm[0-9]+)=(.*?)(?=,asm[0-9]+=|$)
\b
A word boundary(asm[0-9]+)
Capture group 1, match asm and 1+ digits=
Match literally(.*?)
Capture group 2, match any char as least as possible(?=
Positive lookahead, assert what is at the right is
,asm[0-9]+=
Match ,asm
followed by 1+ digits and =
|
Or$
Assert the end of the string)
Close lookahead