I'm returning a unique list of id
's from the users
table, where
specific columns in a related table (positions
) contain a matching string.
The related table may have multiple records for each user record.
The query is taking a really really long time (its not scaleable), so I'm wondering if I'm structuring the query wrong in some fundamental way?
Users Table:
id | name
-----------
1 | frank
2 | kim
3 | jane
Positions Table:
id | user_id | title | company | description
--------------------------------------------------
1 | 1 | manager | apple | 'Managed a team of...'
2 | 1 | assistant | apple | 'Assisted the...'
3 | 2 | developer | huawei | 'Build a feature that...'
For example: I want to return the user's id
if a related positions
record contains "apple" in either the title
, company
or description
columns.
Query:
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%'
I like the idea of moving this into a join
clause. But what I'm looking to do is filter users conditional on below. And I'm not sure how to do both in a join
.
1) finding the keyword in title, company, description
or
2) finding the keyword with full-text search in an associated string version of a document in another table.
select
to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) @@ to_tsquery('apple')
So I was originally thinking it might look like,
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
(select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%')
or
(select
to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) @@ to_tsquery('apple'))
But then it was really slow - I can confirm the slowness is from the first condition, not the full-text search.
Might not be the best solution, but a quick option is:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON (
p.user_id = u.id
AND ( description || title || company )
LIKE '%apple%'
);
Basically got rid of the subquery, unnecessary string_agg usage, grouping on position table etc.
What it does is doing conditional join and removing duplicate is covered by distinct on
.
PS! I used table aliases u
and p
to shorten the example
EDIT: adding also WHERE example as requested
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON ( p.user_id = u.id )
WHERE ( p.description || p.title || p.company ) LIKE '%apple%'
OR ...your other conditions...;
EDIT2: new details revealed setting new requirements of the original question. So adding new example for updated ask:
Since you doing lookups to 2 different tables (positions and uploads) with OR condition then simple JOIN wouldn't work.
But both lookups are verification type lookups - only looking does %apple%
exists, then you do not need to aggregate and group by and convert the data.
Using EXISTS
that returns TRUE
for first match found is what you seem to need anyway. So removing all unnecessary part and using with LIMIT 1
to return positive value if first match found and NULL if not (latter will make EXISTS
to become FALSE
) will give you same result.
So here is how you could solve it:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
WHERE EXISTS (
SELECT 1
FROM positions p
WHERE p.users_id = u.id::int
AND ( description || title || company ) LIKE '%apple%'
LIMIT 1
)
OR EXISTS (
SELECT 1
FROM uploads up
WHERE up.user_id = u.id::int -- you had here reference to table 'document', but it doesn't exists in your example query, so I just added relation to 'upoads' table as you have in FROM, assuming 'content' column exists there
AND up.content LIKE '%apple%'
LIMIT 1
);
NB! in your example queries have references to tables/aliases like documents
which doesn't reflect anywhere in the FROM
part. So either you have cut in your example real query with wrong naming or you have made other way typo is something you need to verify and adjust my example query accordingly.