I have two tables that I need to run query against each. The output of these two queries then need to be intersected and sorted based on the the number of occurrences of names found in both tables.
For example:
Query 1:
select * from userA
group by name
order by count(name) desc
---------------------------
ID | name | count(name)
---------------------------
1 | John | 2
2 | Mike | 1
3 | Laura | 1
---------------------------
Query 2:
select * from userB
group by name
order by count(name) desc
---------------------------
ID | name | count(name)
---------------------------
3 | Laura | 3
1 | John | 1
5 | Peter | 1
---------------------------
Query 3:
select * from userA
group by name
order by count(name) desc
intersect
select * from userB
group by name
order by count(name) desc
---------------------------
ID | name | count(name)
---------------------------
1 | John | 1
3 | Laura | 1
---------------------------
The problem is the intersect will re-run the count function discarding the count run from each query. What I'd like to see is the following output:
---------------------------
ID | name | count(name)
---------------------------
3 | Laura | 4
1 | John | 3
---------------------------
Does anyone know how to accomplish this?
Assuming your dataset is something like this:
userA
id name
---------- ----------
1 John
1 John
2 Mike
3 Laura
userB
id name
---------- ----------
1 John
3 Laura
3 Laura
3 Laura
5 Peter
You can write a query like so to get the desired result
select id, name, count(*)
from (
select id, name, 'A' as source from userA
union all
select id, name, 'B' from userB
) t
group by id, name
having count(distinct source) = 2;
Result
id name count(*)
---------- ---------- ----------
1 John 3
3 Laura 4
Explanation
Combine the dataset since you want to know the count of John and Laura from both tables, combined. Union All
will allow keeping duplicates from both tables. When combining, remember the sources.
select id, name, 'A' as source from userA
union all
select id, name, 'B' from userB
The query above will combine data from both and give you this result:
id name source
---------- ---------- ----------
1 John A
1 John A
2 Mike A
3 Laura A
1 John B
3 Laura B
3 Laura B
3 Laura B
5 Peter B
Now, let's extract records that come from both tables only. So, we group by id and name and use the having
clause to extract count(distinct source) = 2
. That means, give me records that have 2 sources. Laura and John happen to been in both tables.
When selecting data, we ask for count(*)
to get the number of records for id+name combination.