I have a table with records in one column which are just different by how they are written. So how can I find those and save the corresponding id's in a new table?
e.g. I have the following records in a column for cities.
Id name
1 berlin
2 ber lin
3 ber-lin
4 Berlin
5 Hamburg
6 New York
7 NewYork
So my first assumption would be to remove any special characters including white spaces, then lowercase. And see who matches and then write the id to a new table?
What would be the best and most reliable way to find machtes?
If removing some characters (' ' and '-' in the example) and lower-casing is enough to identify duplicates:
CREATE TABLE tbl_folded AS
SELECT lower(translate(name, ' -', '')) AS base_name
, array_agg(id) AS ids
FROM tbl
GROUP BY 1;
translate()
is particularly useful to replace (or remove) a list of single characters.
Use CREATE TABLE AS
to create a new table from the results of a query.
Related: