Search code examples
sqliteintersect

Intersect keeping the count from both queries for ordering purpose


I have two tables that I need to run query against each. The output of these two queries then need to be intersected and sorted based on the the number of occurrences of names found in both tables.

For example:

Query 1:

select * from userA
group by name
order by count(name) desc

---------------------------
ID  |  name  | count(name)
---------------------------
 1  |  John  |  2
 2  |  Mike  |  1
 3  |  Laura |  1
---------------------------

Query 2:

select * from userB
group by name
order by count(name) desc

---------------------------
ID  |  name  | count(name)
---------------------------
 3  |  Laura |  3
 1  |  John  |  1
 5  |  Peter |  1
---------------------------

Query 3:

select * from userA
group by name
order by count(name) desc
intersect
select * from userB
group by name
order by count(name) desc

---------------------------
ID  |  name   | count(name)
---------------------------
 1  |  John   |  1
 3  |  Laura  |  1
---------------------------

The problem is the intersect will re-run the count function discarding the count run from each query. What I'd like to see is the following output:

---------------------------
ID  |  name   | count(name)
---------------------------
 3  |  Laura  |  4
 1  |  John   |  3
---------------------------

Does anyone know how to accomplish this?


Solution

  • Assuming your dataset is something like this:

    userA

    id          name      
    ----------  ----------
    1           John      
    1           John      
    2           Mike      
    3           Laura
    

    userB

    id          name      
    ----------  ----------
    1           John      
    3           Laura     
    3           Laura     
    3           Laura     
    5           Peter
    

    You can write a query like so to get the desired result

    select id, name, count(*)
    from (
        select id, name, 'A' as source from userA
        union all
        select id, name, 'B' from userB
    ) t
    group by id, name
    having count(distinct source) = 2;
    

    Result

    id          name        count(*)  
    ----------  ----------  ----------
    1           John        3         
    3           Laura       4         
    

    Explanation

    Combine the dataset since you want to know the count of John and Laura from both tables, combined. Union All will allow keeping duplicates from both tables. When combining, remember the sources.

    select id, name, 'A' as source from userA
    union all
    select id, name, 'B' from userB
    

    The query above will combine data from both and give you this result:

    id          name        source    
    ----------  ----------  ----------
    1           John        A         
    1           John        A         
    2           Mike        A         
    3           Laura       A         
    1           John        B         
    3           Laura       B         
    3           Laura       B         
    3           Laura       B         
    5           Peter       B
    

    Now, let's extract records that come from both tables only. So, we group by id and name and use the having clause to extract count(distinct source) = 2. That means, give me records that have 2 sources. Laura and John happen to been in both tables.

    When selecting data, we ask for count(*) to get the number of records for id+name combination.