I have the following postgres SQL table with three columns;
id linktype to_id
322754 4 1
322754 4 1
322754 4 2
322798 4 2
322798 4 2
322797 4 3
322791 4 6
322790 4 3
What I am trying to do is filter for to_id
values where there are more than one corresponding id
value that is different.
So from the above I would simply want to_id
value 3 returned, since it is the only to_id
to have more than one DISTINCT id corresponding to it.
Use a having
clause
demo at db<>fiddle
select to_id
from test
group by to_id
having count(distinct id)>1;
to_id |
---|
2 |
3 |
To make that faster, use exists()
select distinct to_id
from test a
where exists(
select from test b
where a.id<>b.id
and a.to_id=b.to_id);
Or a self-join:
select distinct a.to_id
from test a
join test b
on a.id<>b.id
and a.to_id=b.to_id;
There are other methods using except
/intersect
, an in/not in
, but exists()
is usually quicker.
Don't forget about the index:
create index on test (to_id,id);
It speeds things up for all 3 methods above, significantly. Also note that the optimal method and index depends on the characteristics of your data set: on 100k records where only 807 to_id
's match your criteria, the last two examples win: demo1. If out of that 100k records I make 28k match that, the first example becomes faster: demo2.
Make sure to run your own benchmarks and adjust your approach.