Java: Parse Australian Street Addresses

Looking for a quick and dirty way to parse Australian street addresses into their parts:
3A/45 Jindabyne Rd, Oakleigh, VIC 3166

should split into:
"3A", 45, "Jindabyne Rd", "Oakleigh", "VIC", 3166

Suburb names can have multiple words, as can street names.

See: Parse A Street Address into components

Has to be in Java, and cannot make HTTP requests (e.g. to web APIs).

EDIT: Assume that the specified format is always followed. I have no issue with spitting incorrectly formatted strings back at the user with a message telling them to follow the format (which I've described above).

Solution

Given your reply to my other answer, this should do for the strictly-formatted case you specify:

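    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ParseAddress {
        public static void main(String[] args) {
            String sample = "3A/45 Jindabyne Rd, Oakleigh, VIC 3166";
            Pattern pattern = Pattern.compile("(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");
            Matcher m = pattern.matcher(sample);
            // matches() requires the entire string to fit; find() would also accept trailing text such as " Australia"
            if (m.matches()) {
                System.out.println("Unit: " + m.group(2)); // null when there is no unit
                System.out.println("Number: " + m.group(3));
                System.out.println("Street: " + m.group(4));
                System.out.println("Suburb: " + m.group(5));
                System.out.println("State: " + m.group(6));
                System.out.println("Postcode: " + m.group(7));
            } else {
                throw new IllegalArgumentException("Address does not follow the expected format: " + sample);
            }
        }
    }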
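If you want the parts back as values rather than printed lines, the same regex can be wrapped in a small method. This is a sketch assuming Java 16+ (for records); the names AddressParser, Address and parse are hypothetical. The street number and postcode are kept as strings: the number because of values like '45A', the postcode because some Australian postcodes start with 0 (Darwin's is 0800) and an int would drop the leading zero.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressParser {

    // Hypothetical value type for the parsed parts; unit is null when absent.
    public record Address(String unit, String number, String street,
                          String suburb, String state, String postcode) {}

    // Same regex as the answer; compiled once because Pattern is immutable and thread-safe.
    private static final Pattern ADDRESS = Pattern.compile(
            "(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");

    public static Address parse(String input) {
        Matcher m = ADDRESS.matcher(input.trim());
        if (!m.matches()) {
            // Illustrative wording for the "please follow the format" message.
            throw new IllegalArgumentException(
                    "Please enter the address as: [unit/]number street, suburb, STATE postcode");
        }
        return new Address(m.group(2), m.group(3), m.group(4),
                m.group(5), m.group(6), m.group(7));
    }

    public static void main(String[] args) {
        System.out.println(parse("3A/45 Jindabyne Rd, Oakleigh, VIC 3166"));
    }
}
```

Callers can catch the IllegalArgumentException and show its message to the user, which matches the "spit it back" approach from the question.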

This also works without the '3A/' prefix (m.group(2) is then null), with street numbers such as '45A' or '45-47', and with multi-word street names ('Jindabyne East Rd') or suburbs ('Oakleigh South').
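A quick way to check those variants is to run the same regex over each of them. The class and method names (AddressVariants, split) are illustrative, and returning null for a non-matching address is just one possible choice.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressVariants {
    static final Pattern PATTERN = Pattern.compile(
            "(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");

    // Returns {unit, number, street, suburb, state, postcode}, or null if the format is not followed.
    static String[] split(String address) {
        Matcher m = PATTERN.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(2), m.group(3), m.group(4), m.group(5), m.group(6), m.group(7)};
    }

    public static void main(String[] args) {
        String[] samples = {
            "45 Jindabyne Rd, Oakleigh, VIC 3166",               // no unit: first element is null
            "3A/45A Jindabyne Rd, Oakleigh, VIC 3166",           // lettered street number
            "45-47 Jindabyne East Rd, Oakleigh South, VIC 3166", // number range, multi-word names
        };
        for (String s : samples) {
            System.out.println(Arrays.toString(split(s)));
        }
    }
}
```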

Just to explain that regex further, if you're not familiar with regular expressions:

(([^/ ]+)/)? matches the same text as the simpler ([^/ ]+/)?: one or more characters that are neither a forward slash nor a space, followed by a slash. The question mark makes the whole clause optional, and the extra inner parentheses create a second group that holds the unit without the slash, for later extraction.
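To see what the nested parentheses do, the sketch below prints group 1 (unit plus slash) and group 2 (unit only). It also shows an alternative that is not part of the answer: a non-capturing group (?:...), which keeps the grouping for the question mark but does not create a group, so the unit becomes group 1 and every later group number drops by one. The class name UnitGroupDemo is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitGroupDemo {
    // The answer's pattern: group 1 is "unit/", group 2 is the unit alone.
    static final Pattern ORIGINAL = Pattern.compile(
            "(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");

    // Alternative: (?:...) groups without capturing, so only six groups remain.
    static final Pattern NON_CAPTURING = Pattern.compile(
            "(?:([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");

    public static void main(String[] args) {
        String sample = "3A/45 Jindabyne Rd, Oakleigh, VIC 3166";

        Matcher m = ORIGINAL.matcher(sample);
        if (m.matches()) {
            System.out.println(m.group(1) + " | " + m.group(2) + " | groups: " + m.groupCount());
        }

        Matcher n = NON_CAPTURING.matcher(sample);
        if (n.matches()) {
            System.out.println(n.group(1) + " | groups: " + n.groupCount());
        }
    }
}
```

If you switch to the non-capturing form, remember to renumber every m.group(...) call accordingly.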

([^ ]+) followed by a space captures one or more characters that are not spaces: this is the street number.

([^,]+) followed by a comma and a space captures one or more characters that are not commas: this is the street name. Anything is valid in the street name as long as it isn't a comma.

([^,]+), is the same again, in this case to capture the suburb.

([^ ]+) captures the next run of non-space characters (the state abbreviation); the space after it is matched but not captured.

(\\d+) rounds off by capturing one or more digits (the postcode). The double backslash is Java string-literal escaping; the regex engine itself sees \d.
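The last two groups are fairly permissive: ([^ ]+) accepts any token, including one with a stray comma such as "Vic,", and \\d+ accepts any number of digits. If you want stricter validation, one possible tightening (an assumption, not part of the answer) is to list the eight state and territory abbreviations explicitly and require exactly four postcode digits, since Australian postcodes are four digits long. Group numbers stay the same. Note this version also rejects lowercase states like "vic"; add Pattern.CASE_INSENSITIVE if you want to allow them. The class name StrictAddressPattern is hypothetical.

```java
import java.util.regex.Pattern;

public class StrictAddressPattern {
    // The answer's permissive pattern.
    static final Pattern LOOSE = Pattern.compile(
            "(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), ([^ ]+) (\\d+)");

    // Assumed tightening: explicit state/territory list and a four-digit postcode.
    static final Pattern STRICT = Pattern.compile(
            "(([^/ ]+)/)?([^ ]+) ([^,]+), ([^,]+), (NSW|VIC|QLD|SA|WA|TAS|NT|ACT) (\\d{4})");

    public static void main(String[] args) {
        String[] inputs = {
            "3A/45 Jindabyne Rd, Oakleigh, VIC 3166",
            "3A/45 Jindabyne Rd, Oakleigh, Vic, 3166", // loose pattern captures "Vic," as the state
            "3A/45 Jindabyne Rd, Oakleigh, VIC 31666", // five-digit postcode
        };
        for (String s : inputs) {
            // Prints: loose result, strict result, input
            System.out.println(LOOSE.matcher(s).matches() + " " + STRICT.matcher(s).matches() + "  " + s);
        }
    }
}
```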

Hope that's helpful.