I really appreciate all of you for your help.
We are in the process of standardizing the addresses of our customers.
I have a main customer table which contains the customers data, the address field to be standardized.
I have a mapping table that contains mapping for from_str to to_str values.
I need to get the to_str that is correspond to the from_str into the address data.
If the length of the address is more than 35 characters after standardizing, then from previous space(' ') from right side to end of the string, is created as a seperate field address2
Could you please help with sql or pl/sql? Using Oracle 12c database.
The code I wrote below is not working for all from_str values...only working for first 2 rows.
Any help is appreciated.
code so far:
with addresses as
(
select cust_id,address addr from
(
select 10 cust_id,'9 Help Street, Level 4' address from dual union all
select 11 cust_id,'22 Victoria Street' address from dual union all
select 12 cust_id,'1495 Franklin Str' address from dual union all
select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all
select 14 cust_id,'2 Jakaranda St' address from dual union all
select 15 cust_id,'61, Science Park Rd' address from dual union all
select 16 cust_id,'61, Social park road' address from dual union all
select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all
select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all
select 19 cust_id,'8000 W FLORISSANT AVE' address from dual union all
select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all
select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all
select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW' address from dual
) t1
),
replacements as
(
select id,to_str,from_string from_str from
(
select 1 id,'St' to_str,'Street' from_string from dual union all
select 2 id,'St' to_str,'St.' from_string from dual union all
select 3 id,'St' to_str,'Str' from_string from dual union all
select 4 id,'St' to_str,'St' from_string from dual union all
select 5 id,'Rd' to_str,'Rd.' from_string from dual union all
select 6 id,'Rd' to_str,'road' from_string from dual union all
select 7 id,'Av' to_str,'Av.' from_string from dual union all
select 8 id,'Av' to_str,'Ave.' from_string from dual union all
select 9 id,'Av' to_str,'Avenue' from_string from dual union all
select 10 id,'Av' to_str,'Aven.' from_string from dual union all
select 11 id,'West' to_str,'W' from_string from dual union all
select 12 id,'South West' to_str,'SW.' from_string from dual
) t2
),
r(addr,test_addr,l) as
(
select addr,regexp_replace(addr,'(^|\W)' || from_str || '(\W|$)','\1' || to_str || '\2') test_addr,
id - 1
from
addresses,
replacements
where id = (select count(*) from replacements)
union all
select addr,regexp_replace(addr,'(^|\W)' || from_str || '(\W|$)','\1' || to_str || '\2') test_addr,
l - 1
from r,
replacements
where id = l
)
select addr,test_addr,l
from r
where l=0
;
Expected Output:
cust_id address
10 9 Help St, Level 4
11 22 Victoria St
12 1495 Franklin St
13 30 Hasivim St ,Petah-Tikva
14 2 Jakaranda St
15 61, Science Park Rd
16 61, Social park Rd
17 Av Hermanos Escobar 5756
18 Av Hermanos Escobar 5756
19 8000 West FLORISSANT Ave
20 8600 MEMORIAL PKWY South West
if length of the address is more than 35 characters then the expected output is:
cust_id address address2
21 8200 FLORISSANTMEMORIALWAYABOVE South West
22 8600 MEMORIALFLORISSANT PKWY South West
That's an... interesting way to apply multiple replace operations. So - you have 2 issues, as you mentioned. For the first one, the recursive part of your CTE is doing regexp_replace()
on addr
instead of test_addr
(the modified output of the previous recursive step). So only the last rule in the list is ever applied.
r(addr,test_addr, l) as
(
select addr,regexp_replace(addr,'(^|\W)' || from_str || '(\W|$)','\1' || to_str || '\2') test_addr,
id - 1
from
addresses,
replacements
where id = (select count(*) from replacements)
union all
-- if you do regexp_replace on addr, it throws out the previous replace (which is in r.test_addr)
select addr,regexp_replace(test_addr,'(^|\W)' || from_str || '(\W|$)','\1' || to_str || '\2') test_addr,
l - 1
from r,
replacements
where id = l
)
For the "over-35-characters" issue, I would suggest using substr/instr - while awkward to read, they're usually fast.
select addr,test_addr,l,
case when length(test_addr) > 35 then
substr(test_addr, 1, instr(substr(test_addr,1,35), ' ', -1))
else test_addr
end as addr1,
case when length(test_addr) > 35 then
substr(test_addr, instr(substr(test_addr,1,35), ' ', -1))
else null
end as addr2
from r
where l=0
;
There's probably a more graceful way to do that part, it's just the first thing that came to mind.