I have address formats which are like
123, abc street, Dallas, Tx 75701, 500-998-7898, Directions X Y Z west of
I want to delete anything after the ZIP CODE that matches US format in either 5 digit format or 5 digit + 4 digit code format as well as Canadian postal code format like Y89 567 or Y89567.
So my final answer should look like.
123, abc street, Dallas, Tx 75701
Appreciate any help.
Here is one way to do this. Note that anything that "looks like" a U.S. or Canadian zip code will be assumed to be one. Also, you will miss things like Beverly Hills, Calif. 90210
because "Calif." is not "seen" as a state code. I am also a bit careless; if there is anything after the zip code, it shouldn't be a letter or a digit. I didn't write the "test" for that; left as exercise, if needed.
I assume that if no zip code is found, the entire input string must be returned.
with
inputs ( addr ) as (
select '123, abc street, Dallas, Tx 75701, 500-998-7898' from dual union all
select '336 Main St, New City, NZ 39023-8882, john.d@email' from dual union all
select '837B Hilltop, Canadian City, ON Z34802 4028048 kilo' from dual union all
select '12345 Circle Drive, Lakeview, MN' from dual
)
select addr,
regexp_substr( addr,
'(.*?([[:alpha:]]{2} \d{5}(-\d{4})?|[[:alpha:]]\d{2} ?\d{3}|$))',
1, 1, null, 1) as clean_addr
from inputs
;
ADDR CLEAN_ADDR
--------------------------------------------------- --------------------------------------
123, abc street, Dallas, Tx 75701, 500-998-7898 123, abc street, Dallas, Tx 75701
336 Main St, New City, NZ 39023-8882, john.d@email 336 Main St, New City, NZ 39023-8882
837B Hilltop, Canadian City, ON Z34802 4028048 kilo 837B Hilltop, Canadian City, ON Z34802
12345 Circle Drive, Lakeview, MN 12345 Circle Drive, Lakeview, MN