I have a custom transport format that packages data in the following format:
[a:000,"name","field","field","field"]
I'm trying to split each line to get the first character after the left bracket and all the comma-separated values: a, 000, "name", "field", "field", etc.
I cobbled together this regex:
[^?,:\[\]]
This splits out every individual character rather than the colon/comma-delimited fields. I also understand it won't accommodate commas within quotes. So it's clearly rubbish!
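To see why, here is a small Python sketch (Python is an assumption; the question names no language). The character class has no quantifier, so each match is a single character. Adding `+` (a suggestion of this edit, not something from the answer below) makes each match a whole run between delimiters, though like the original it still breaks on commas inside quotes.

```python
import re

record = '[a:000,"name","field","field","field"]'

# Original class: each match is exactly one character that is not ? , : [ or ]
single_chars = re.findall(r'[^?,:\[\]]', record)

# With a + quantifier, each match is a maximal run of such characters,
# i.e. the text between delimiters (the double quotes are kept)
runs = re.findall(r'[^?,:\[\]]+', record)

print(single_chars[:5])  # ['a', '0', '0', '0', '"']
print(runs)              # ['a', '000', '"name"', '"field"', '"field"', '"field"']
```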
Embedded commas aren't really a huge issue, as we control the data at both ends, so I could just escape them.
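Escaping may not even be necessary: a CSV parser already treats a double-quoted field as a single value, commas included. A minimal Python sketch, assuming each record is one line shaped like the sample above and that the leading value is everything before the first colon; `parse_record` is a hypothetical helper name.

```python
import csv

def parse_record(line: str) -> list[str]:
    """Split a hypothetical record like [a:000,"name","field"] into its parts."""
    line = line.strip()
    if not (line.startswith("[") and line.endswith("]")):
        raise ValueError(f"not a bracketed record: {line!r}")
    body = line[1:-1]
    # Assumption: the leading value is everything before the first colon
    tag, sep, rest = body.partition(":")
    if not sep:
        raise ValueError(f"missing ':' after leading value: {line!r}")
    # csv.reader keeps a double-quoted field together even if it contains commas
    fields = next(csv.reader([rest]))
    return [tag] + fields

print(parse_record('[a:000,"name","field","field","field"]'))
# ['a', '000', 'name', 'field', 'field', 'field']
print(parse_record('[a:000,"last, first","field"]'))
# ['a', '000', 'last, first', 'field']
```

Note that the csv module strips the surrounding quotes, so `"name"` comes back as `name`.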
Thanks for any insight!
Instead of trying to split on multiple characters and ignore some of them, match the pieces you actually want. Since you didn't specify an implementation language, I'm posting this in Perl, but you could apply it to any regex flavor that supports lookbehind and lookahead.
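use strict;
use warnings;
my $subject = '[a:000,"name","field","field","field"]';
while ($subject =~ m/(\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))/g) {
    print "$1\n";    # matched text (capture group 1)
}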
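For comparison, a sketch of the same pattern in Python (a port for illustration, not part of the original answer). Python's `re` accepts these lookbehinds because each has a fixed width, and `findall` returns the text of the single capture group for every match.

```python
import re

# Same pattern as the Perl loop; all lookbehinds are fixed-width, as re requires
TOKEN = re.compile(r'(\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))')

subject = '[a:000,"name","field","field","field"]'

# With exactly one capture group, findall returns that group's text per match
tokens = TOKEN.findall(subject)
print(tokens)  # ['a', '000', 'name', 'field', 'field', 'field']
```

Because the quotes sit inside lookarounds, they are not part of the matched text.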
Explanation:
# (\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))
#
# Match the regular expression below and capture its match into backreference number 1 «(\w+(?=:)|(?<=:)\d+|(?<=,")[^"]*?(?="))»
# Match either the regular expression below (attempting the next alternative only if this one fails) «\w+(?=:)»
# Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=:)»
# Match the character “:” literally «:»
# Or match regular expression number 2 below (attempting the next alternative only if this one fails) «(?<=:)\d+»
# Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=:)»
# Match the character “:” literally «:»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Or match regular expression number 3 below (the entire group fails if this one fails to match) «(?<=,")[^"]*?(?=")»
# Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=,")»
# Match the characters “,"” literally «,"»
# Match any character that is NOT a “"” «[^"]*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
# Match the character “"” literally «"»
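The three alternatives described above can also be written in verbose mode with one named group each, which reports which alternative produced each token. A Python sketch; the group names `tag`, `number` and `text` and the helper `classify` are illustrative assumptions.

```python
import re

# The three alternatives from the explanation, one named group each
TOKEN = re.compile(r'''
    (?P<tag>\w+(?=:))             # word characters immediately before a colon
  | (?P<number>(?<=:)\d+)         # digits immediately after a colon
  | (?P<text>(?<=,")[^"]*?(?="))  # shortest run of non-quotes between ," and "
''', re.VERBOSE)

def classify(subject: str) -> list[tuple[str, str]]:
    # lastgroup is the name of the alternative that produced each match
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(subject)]

print(classify('[a:000,"name","field"]'))
# [('tag', 'a'), ('number', '000'), ('text', 'name'), ('text', 'field')]
```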