Search code examples
sqlpostgresqlcountinner-joinlateral-join

Merging similar SQL data for counts efficiently?


I am trying to understand the best way to merge a count from an INNER JOIN of two tables and grouping, but the best thing I can do is to query each table separately and then apply a union and further GROUP BY. This seems a bit convoluted, but with joins the counts get messed up, and I am not sure what other method I can use to query this efficiently.

Correct code, but probably inefficient:

SELECT
  x.id,
  sum(x.question_count) AS question_count,
  sum(x.card_count) AS card_count
FROM (
  SELECT
    c.id,
    count(*) AS question_count,
    0 AS card_count
  FROM
    concepts AS c
    INNER JOIN questions ON c.id = questions."conceptId"
  GROUP BY
    c.id
  UNION ALL
  SELECT
    c.id,
    0 AS question_count,
    count(*) AS card_count
  FROM
    concepts AS c
    INNER JOIN cards ON c.id = cards."conceptId"
  GROUP BY
    c.id) AS x
GROUP BY
  x.id
ORDER BY x.id;

output:

id q_count c_count
1 1 2
2 7 9
3 1 1

My hopeful join code, that gives incorrect counts:

SELECT
  x."conceptId",
  q_count,
  c_count
FROM (
  SELECT
    q."conceptId",
    count(*) AS q_count
  FROM
    questions AS q
  GROUP BY
    q."conceptId") AS x
  INNER JOIN (
    SELECT
      c."conceptId",
      count(*) AS c_count
    FROM
      questions AS c
    GROUP BY
      c."conceptId") AS y ON x."conceptId" = y."conceptId";

output:

id q_count c_count
1 1 1
2 7 7
3 1 1

Solution

  • You would typically aggregate the related table first, then join them to the base table:

    select co.id, qu.question_count, ca.card_count
    from concepts co
    inner join (
        select conceptId, count(*) question_count
        from questions
        group by conceptId
    ) qu on qu.conceptId = co.id
    inner join (
        select conceptId, count(*) card_count
        from cards 
        group by conceptId
    ) ca on ca.conceptId = co.id
    

    Depending on the availability of your data in the related tables, you might want left joins instead.

    Depending on your data as well (typically, if you have few concepts and many questions and cards), it might be more efficient to use correlated subqueries or lateral joins, like so:

    select co.id, qu.question_count, ca.card_count
    from concepts co
    cross join lateral (select count(*) question_count from questions qu where qu.conceptId = co.id) qu
    cross join lateral (select count(*) card_count     from cards     ca where ca.conceptId = co.id) ca