For Simple Java Mail I'm trying to deal with a somewhat free-form list of delimited email addresses. Note that I'm specifically not validating, just getting the addresses out of a list of addresses. For this use case the addresses can be assumed to be valid.
Here is an example of a valid input:
",Sixpack, Joe 1 <>, Sixpack, Joe 2 <> ;Sixpack, Joe, 3<> ,,;;"
So there are two basic forms "" and "Joe Sixpack ", which can appear in a comma- or semicolon-delimited string, ignoring whitespace padding. The problem is that the names can contain delimiters as valid characters.
The following array shows the data needed (trailing spaces or delimiters would not be a big problem):
"Sixpack, Joe 1 <>",
"Sixpack, Joe 2 <>",
"Sixpack, Joe, 3<>",
I can't think of a clean way to deal with this. Any suggestions on how I can reliably recognize whether a comma is part of a name or a delimiter?
Final solution (variation on the accepted answer):
var string = ",Sixpack, Joe 1 <>, Sixpack, Joe 2 <> ;Sixpack, Joe, 3<> ,,;;"
// recognize value tails and replace the delimiters there, disambiguating delimiters
const result = string
.replace(/(@.*?>?)\s*[,;]/g, "$1<|>")
.replace(/<\|>$/,"") // remove trailing delimiter
.split(/\s*<\|>\s*/) // split on delimiter including surround space
Or in Java:
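public static String[] extractEmailAddresses(String emailAddressList) {
    return emailAddressList
            // recognize value tails and replace the delimiters there with a temporary token
            .replaceAll("(@.*?>?)\\s*[,;]", "$1<|>")
            .replaceAll("<\\|>$", "") // remove trailing delimiter
            .split("\\s*<\\|>\\s*"); // split on the token, including surrounding space
}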
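As a quick check of the JavaScript variant above, here is a minimal sketch run against sample input. The example.com addresses and the names `extractAddresses` and `sample` are placeholders, not taken from the original post. The final trim-and-filter steps are an addition, assuming that leftover delimiters and empty entries are unwanted.

```javascript
// Sketch only: the example.com addresses are hypothetical placeholders.
function extractAddresses(list) {
  return list
    // mark the delimiter after each address tail ("@...>" or a bare "@...")
    .replace(/(@.*?>?)\s*[,;]/g, "$1<|>")
    // split on the marker, swallowing surrounding whitespace
    .split(/\s*<\|>\s*/)
    // added step (assumption): strip stray leading/trailing delimiters
    .map((entry) => entry.replace(/^[\s,;]+|[\s,;]+$/g, ""))
    // added step (assumption): drop entries that are now empty
    .filter((entry) => entry.length > 0);
}

const sample =
  ",Sixpack, Joe 1 <joe1@example.com>, Sixpack, Joe 2 <joe2@example.com> ;Sixpack, Joe, 3<joe3@example.com> ,,;;";

console.log(extractAddresses(sample));
// [ 'Sixpack, Joe 1 <joe1@example.com>',
//   'Sixpack, Joe 2 <joe2@example.com>',
//   'Sixpack, Joe, 3<joe3@example.com>' ]
```

Without the last two steps, the first entry keeps its leading comma and a final ",;;" entry remains, which the question treats as tolerable.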
Using Java's replaceAll and split functions (mimicked in JavaScript below), I would lock onto what you know ends an item (the ".com"), replace the separator characters that follow it with a unique temporary token (a UUID or something like <|>), and then split on that token.
Here is a JavaScript example, but Java's replaceAll and split can do the same job.
var string = ",Joe Sixpack <>, Sixpack, Joe <> ;Sixpack, Joe<> ,,;;"
const result = string.replace(/(\.com>?)[\s,;]+/g, "$1<|>").replace(/<\|>$/,"").split("<|>")
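To make the three steps of this one-liner visible, the sketch below runs them separately and prints the intermediate string. The example.com and example.org addresses and the variable names are placeholders, not from the original answer. The last few lines probe what seems to be a limitation of anchoring on ".com".

```javascript
// Sketch only: the example.com / example.org addresses are hypothetical placeholders.
const input =
  ",Joe Sixpack <joe1@example.com>, Sixpack, Joe <joe2@example.com> ;Sixpack, Joe<joe3@example.com> ,,;;";

// Step 1: replace the delimiter run after each ".com" tail with the temporary token
const marked = input.replace(/(\.com>?)[\s,;]+/g, "$1<|>");
// Step 2: drop the token left at the very end
const trimmed = marked.replace(/<\|>$/, "");
// Step 3: split on the token
const parts = trimmed.split("<|>");

console.log(marked);
// ,Joe Sixpack <joe1@example.com><|>Sixpack, Joe <joe2@example.com><|>Sixpack, Joe<joe3@example.com><|>
console.log(parts.length); // 3

// Limitation: an address that does not end in ".com" never gets a token after it
const mixed = "Doe, Jane <jane@example.org>, Doe, John <john@example.com>";
const mixedParts = mixed.replace(/(\.com>?)[\s,;]+/g, "$1<|>").split("<|>");
console.log(mixedParts.length); // 1
```

The first entry still carries the leading comma from the input, and the mixed case stays unsplit. Anchoring on "@" instead of ".com", as the final solution in the question does, avoids depending on the top-level domain.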