Looking for a quick and dirty way to parse Australian street addresses into their parts:
3A/45 Jindabyne Rd, Oakleigh, VIC 3166
should split into:
"3A", 45, "Jindabyne Rd", "Oakleigh", "VIC", 3166
Suburb names can have multiple words, as can street names.
See: Parse A Steet Address into components
Has to be in Java, cannot make http requests (e.g. to web APIs).
EDIT: Assume that the format specified is always followed. I have no issue with spitting incorrectly formatted strings back at the user with a message telling them to follow the format (which I've described above).
Given your reply to my other answer, this should do for the strictly-formatted case you specify:
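import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sample = "3A/45 Jindabyne Rd, Oakleigh, VIC 3166";
// Expected shape: [unit/]number street, suburb, STATE postcode
Pattern pattern = Pattern.compile("(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");
Matcher m = pattern.matcher(sample);
// matches() requires the whole string to fit the pattern, so leading or trailing junk is rejected
if (m.matches()) {
    System.out.println("Unit: " + m.group(2)); // null when there is no unit
    System.out.println("Number: " + m.group(3));
    System.out.println("Street: " + m.group(4));
    System.out.println("Suburb: " + m.group(5));
    System.out.println("State: " + m.group(6));
    System.out.println("Postcode: " + m.group(7));
} else {
    throw new IllegalArgumentException("Address does not follow the expected format: " + sample);
}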
This still works if you remove the '3A/' (in which case m.group(2) will be null), if the street number is '45A' or '45-47', or if we add a space to the road ('Jindabyne East Rd') or to the suburb ('Oakleigh South').
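To check those variations quickly, here is a minimal sketch that wraps the same expression in a reusable method. The class name AddressParser, the parse method and the array layout are my own choices for illustration, not anything standard, and it uses matches() so that a string with extra text around the address is rejected rather than partially matched.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the return shape are illustrative only.
class AddressParser {
    // Same expression as in the answer, compiled once and reused.
    private static final Pattern ADDRESS = Pattern.compile(
            "(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");

    /** Returns {unit, number, street, suburb, state, postcode}; unit is null when absent. */
    static String[] parse(String input) {
        Matcher m = ADDRESS.matcher(input);
        // matches() only succeeds if the entire input fits the format.
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "Expected '[unit/]number street, suburb, STATE postcode' but got: " + input);
        }
        return new String[] {
            m.group(2), m.group(3), m.group(4), m.group(5), m.group(6), m.group(7)
        };
    }

    public static void main(String[] args) {
        String[] samples = {
            "3A/45 Jindabyne Rd, Oakleigh, VIC 3166",
            "45 Jindabyne Rd, Oakleigh, VIC 3166",
            "45-47 Jindabyne East Rd, Oakleigh South, VIC 3166"
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

Anything that does not fit the format ends up in the IllegalArgumentException branch, which is where you would show the user the "please follow the format" message.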
Just to explain that regex further, if you're not familiar with regular expressions:
(([^/ ]+)/)?
is the equivalent of just ([^/ ]+/)? -- that is, 'anything not including a forward slash or a space, followed by a slash'. The question mark makes it optional (so the whole clause can be missing), and the extra parentheses in the final version are to create a smaller inner group, without the slash, for later extraction.
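If the difference between the two forms is not obvious, this small sketch runs both against the start of an address. The class and method names are made up for the demonstration; lookingAt() is used because only the beginning of the string is being tested here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the single-group and nested-group forms.
class UnitGroupDemo {
    // Single group: the captured text includes the trailing slash.
    static String withSlash(String input) {
        Matcher m = Pattern.compile("([^/ ]+/)?").matcher(input);
        m.lookingAt(); // always succeeds, since the whole clause is optional
        return m.group(1);
    }

    // Nested groups: group(2) is the inner group, i.e. the unit without the slash.
    static String withoutSlash(String input) {
        Matcher m = Pattern.compile("(([^/ ]+)/)?").matcher(input);
        m.lookingAt();
        return m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(withSlash("3A/45 Jindabyne Rd"));    // 3A/
        System.out.println(withoutSlash("3A/45 Jindabyne Rd")); // 3A
        System.out.println(withoutSlash("45 Jindabyne Rd"));    // null
    }
}
```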
([^ ]+)
is 'capture anything that's not a space (which is followed by a space)' -- this is the street number.
([^,]+),
is 'capture anything that's not a comma (which is followed by comma and space)' -- this is the street name. Anything is valid in the street name as long as it's not a comma.
([^,]+),
is the same again, in this case to capture the suburb.
([^ ]+)
captures the next non-space string (state abbreviation) and skips the space after it.
(\\d+)
rounds off by capturing one or more digits (the postcode).
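As an optional variation, the same pieces can be given names instead of numbers, which makes the extraction code a little easier to read. This is only a sketch: it relies on named capturing groups (available since Java 7), and the group names, the class name NamedGroupAddress and the Map return type are my own assumptions rather than part of the original pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the same expression using named groups.
class NamedGroupAddress {
    private static final Pattern ADDRESS = Pattern.compile(
            "(?:(?<unit>[^/ ]+)/)?"     // optional unit, slash kept out of the group
            + "(?<number>[^ ]+) "       // street number, up to the first space
            + "(?<street>[^,]+), "      // street name, up to the first comma
            + "(?<suburb>[^,]+), "      // suburb, up to the next comma
            + "(?<state>[^ ]+) "        // state abbreviation
            + "(?<postcode>\\d+)");     // postcode digits

    static Map<String, String> parse(String input) {
        Matcher m = ADDRESS.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Address does not follow the expected format: " + input);
        }
        // LinkedHashMap keeps insertion order and allows the null unit.
        Map<String, String> parts = new LinkedHashMap<>();
        for (String name : new String[] {"unit", "number", "street", "suburb", "state", "postcode"}) {
            parts.put(name, m.group(name));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("3A/45 Jindabyne Rd, Oakleigh, VIC 3166"));
    }
}
```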
Hope that's helpful.