Platform: PostgreSQL
Tables:
shortlist: name (text), city (text)...
data1: name (text), ranking (integer), score1 (double)...
data2: name (text), ranking (integer), score1 (double)...
data3: name (text), ranking (integer), score1 (double)...
data4: name (text), ranking (integer), score1 (double)...
There is a limited number of data tables of similar format.
I would like to join each row in shortlist
with the row in each data
table that has the most similar name determined by similarity(shortlist.name, data#.name)
.
Pseudo code of the same idea:
for each s_row in shortlist:
select shortlist.*
join (SELECT data1.*, similarity(s_row.name, data1.name) AS sim FROM data1 ORDER BY sim DESC LIMIT 1)
join (SELECT data2.*, similarity(s_row.name, data2.name) AS sim FROM data2 ORDER BY sim DESC LIMIT 1)
join (SELECT data3.*, similarity(s_row.name, data3.name) AS sim FROM data3 ORDER BY sim DESC LIMIT 1)
join (SELECT data4.*, similarity(s_row.name, data4.name) AS sim FROM data4 ORDER BY sim DESC LIMIT 1)
Is there a way to do this in SQL?
I am not entirely sure what you are after but something like this:
select s.name,
d1.name as d1_name,
d2.name as d2_name
from shortlist s
left join lateral (
SELECT data1.*, similarity(s.name, data1.name) AS sim
FROM data1
ORDER BY sim
DESC LIMIT 1
) d1 on true
left join lateral (
SELECT data2.*, similarity(s.name, data2.name) AS sim
FROM data2
ORDER BY sim DESC
LIMIT 1
) d2 on true
You want an outer join (left join
) for each table because otherwise you would not see anything if there is no match in at least one of the tables.