Search code examples
javascriptsqlknex.js

Knex subquery to sum data from 2nd table


I'm trying to write a query using knex to SUM the votes for each question but am not getting the correct sum. I can write the subquery in SQL but can't seem to piece it all together. I am a student and not sure if I'm doing something wrong with Knex or if my underlying logic is wrong. Thanks in advance for any help!

My knex query looks like this

return knex
  .from('question')
  .select(
    'question.id AS question_id',
    knex.raw(
      `count(DISTINCT vote) AS number_of_votes`, //this returns the number_of_votes for each question_id as expected
    ),
    knex.raw(
      `sum(vote.vote) AS sum_of_votes`, //something wrong here... E.g., question_id 1 has 3 down votes so the sum should be -3, however I am getting -9
    ),
  )
  .leftJoin('user', 'question.user_id', 'user.id')
  .leftJoin('vote', 'question.id', 'vote.question_id')
  .groupBy('question.id', 'user.id');

There are 3 tables that look like:

user

  • id
  • user_name

question

  • id
  • title
  • body
  • user_id (FK references user.id)

vote

  • question_id (FK references question.id)
  • user_id (FK references user.id)
  • vote (-1 or 1)
  • PRIMARY KEY (question_id, user_id)

I did manage to write the query as a stand-alone SQL query and verified that it works as expected. This is what I am trying to accomplish in the above knex query:

SELECT question.id, sum(vote.vote) AS sum_of_votes FROM question LEFT JOIN vote ON question.id = vote.question_id GROUP BY question.id;

Solution

  • So, broadly your SQL query is correct (after fixing a couple of typos) although as @felixmosh points out it has no user information in it: might be tricky to figure out who voted for what! But perhaps you don't need that for your purposes.

    Your posted solution will do the trick, but is perhaps not the most efficient query for the job as it involves a subquery and several joins. Here's the SQL it generates:

    SELECT "question"."id" AS "question_id",
      count(DISTINCT vote) AS number_of_votes,
      (
        SELECT sum(vote) FROM vote
          WHERE question_id = question.id
          GROUP BY question_id
      ) AS sum_of_votes
      FROM "question"
      LEFT JOIN "user" ON "question"."user_id" = "user"."id"
      LEFT JOIN "vote" ON "question"."id" = "vote"."question_id"
      GROUP BY "question"."id", "user"."id";
    

    We can take a simpler approach to get the same information. How about this?

    SELECT question_id,
      count(vote) AS number_of_votes,
      sum(vote) AS sum_of_votes
      FROM vote
      GROUP BY question_id;
    

    This gets all the information you were looking for, without joining any tables or using subqueries. It also avoids DISTINCT, which could lead to incorrectly counting the number of votes. The Knex to generate such a query looks like this:

    knex("vote")
      .select("question_id")
      .count("vote AS number_of_votes")
      .sum("vote AS sum_of_votes")
      .groupBy("question_id")
    

    You only really need to join tables here if you were looking for further information from those tables (such as the user's name or the question's title).