I am trying to parse addresses into groups, and I have this regular expression:
(^.*?(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),(?:)? ?(.*?),? ?([A-Z]{2,3}),? ?(\d{,4})$
which captures and groups these addresses:
139 McKinnon Road, PINELANDS, NT, 829
108 East Point Road, Fannie Bay, NT, 820
3-11 Hamilton Street, Townsville City, QLD, 4810
40 17 Geranium Street, THE GARDENS, NT, 820
Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540
316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810
but it does not capture addresses in this format:
1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155
31 Stephen Street SOUTH TOOWOOMBA QLD 4350
I would like to split these addresses into separate groups, like this:
street_address = 1, 3, 5 Demeter Street & 12 Hermes Avenue
subrub = ROUSE HILL
state = NSW
postcode = 2155
How can I capture both address formats using the above expression? Here is my regex code.
You can match each of your four groups separately, using one specific regex per group:
<street_address>: .*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)
<subrub>: [A-Za-z ]+
<state>: [A-Z]+
<postcode>: \d+
Your final regex is simply the concatenation of these regexes, joined by an optional comma followed by a mandatory space (",? "):
(?P<street_address>.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),? (?P<subrub>[A-Za-z ]+),? (?P<state>[A-Z]+),? (?P<postcode>\d+)
Check the demo here.
Note: In your Python code, you'll be able to extract each group by its corresponding name.
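As a minimal sketch of that note, the snippet below builds the same pattern by joining the four per-group regexes with ",? " and then reads the groups by name. The names STREET_TYPES, PARTS, ADDRESS_RE and parse_address are hypothetical and introduced only for illustration; the group name "subrub" keeps the spelling used in the question.

```python
import re

# Street-type keywords copied from the answer's alternation, in the same order.
STREET_TYPES = (
    "Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|"
    "Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|"
    "Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|"
    "Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|"
    "Meadow Way|Track|Kulkyne Way|Narabang Way|Bank"
)

# One named group per field (hypothetical helper list, not part of the answer).
PARTS = [
    rf"(?P<street_address>.*(?:{STREET_TYPES}))",
    r"(?P<subrub>[A-Za-z ]+)",
    r"(?P<state>[A-Z]+)",
    r"(?P<postcode>\d+)",
]

# Join the parts with an optional comma followed by a mandatory space.
ADDRESS_RE = re.compile(",? ".join(PARTS))


def parse_address(text):
    """Return the four named groups as a dict, or None if nothing matches."""
    match = ADDRESS_RE.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "139 McKinnon Road, PINELANDS, NT, 829",
        "1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155",
        "31 Stephen Street SOUTH TOOWOOMBA QLD 4350",
    ]
    for address in samples:
        print(parse_address(address))
```

Because the street part uses a greedy .* the match extends to the last street-type keyword in the line, which is what keeps "Demeter Street & 12 Hermes Avenue" together in one group. The matching is case-sensitive, so an upper-case suburb such as "ST GEORGES BASIN" is not mistaken for the "St" keyword.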