I have a table with people and their hometown names, but there are same cities with different written, see:
Name | Hometown |
---|---|
João | São Paulo |
Maria | Sao Paul |
Pedro | São Paulo. |
Maria | S. Paulo |
And I need to process this in order to formalize that data to be like this:
Name | Hometown |
---|---|
João | São Paulo |
Maria | São Paulo |
Pedro | São Paulo |
Maria | São Paulo |
I tried this stack and would it be exactly what I need but does not work with my entire dataset.
Consider below approach (considering you have lookup table with all proper cities names) for purpose of example - I have it as CTE with just few ones
with cities as (
select 'São Paulo' as city union all
select 'Los Angeles' union all
select 'Dnipro' union all
select 'Kyiv'
)
select Name, City as Hometown
from your_table
left join cities
on soundex(Hometown) = soundex(city)
if applied to sample data in your question - output is
Note: you obviously need to take care of potential duplication in case if some cities sounds similar, in this case adding country constraints might help ...