Search code examples
mysqlsqlmysql-5.6

MySQL - can't retrieve max value within a left join and 2 distinct select perimeters


I'm using MySql 5.6 and I have 2 tables (simplified them to reduce nb of columns to what matters for this question) with a query that is using today a LEFT JOIN to Select rows.

Here is the fiddle to test your solution: https://www.db-fiddle.com/f/nGVFhr3xMwKk9CDw6N6FWc/13

Table 'query_results'

+-----------------------------+------------+--------------+-----------
| query_result_id             | query_id   | author       |  datecol
+-----------------------------+------------+--------------+-----------
| 100                         |         1  | john         |   80
| 101                         |         1  | eric         |   70
| 102                         |         2  | emily        |   100
| 103                         |         2  | emily        |   100
| 104                         |         4  | emily        |   120
| 105                         |         3  | emily        |   50
+-----------------------------+------------+--------------+-----------

Table 'customers_emails'

+-------------------+-----------------+--------------+-----------+-------------+
| customer_email_id | query_result_id | customer_id  | author    |  email_nb 
+-------------------+-----------------+--------------+-----------+-------------+
| 5                 |         758     | 12           |  mathew   |   0  
| 12                |         102     | 12           |  emily    |   0     
| 13                |         103     | 12           |  emily    |   1
| 14                |         104     | 12           |  emily    |   9
| 15                |         102     | 7            |  emily    |   2
+-------------------+-----------------+--------------+-----------+-------------+

My current query successfully fetches all the query_results for a given query_id = 2 and a given customer_id = 12 and some other minor constraints (such as datecol > 30).

  SELECT            
    qr.query_result_id,
    qr.query_id,
    qr.author
  FROM
    query_results qr
  LEFT JOIN
    customers_emails coe
  ON
    qr.author = coe.author AND
    coe.customer_id = 12           
  WHERE        
    qr.query_id = 2 AND
    qr.datecol >= 30 AND
    qr.author IS NOT NULL            
      AND qr.author NOT IN (
        SELECT author
        FROM customers_emails
        WHERE 
          (
            customer_id = 12 AND
            email_nb = 3
          )
      )
  GROUP BY
    qr.author
  ORDER BY 
    qr.query_result_id ASC
  LIMIT 
    2

This query above works perfectly (I'm satisfied) and gives me:

Today My output is :

+-------------------+-----------------+--------------+
| query_result_id   | query_id        | author       | 
+-------------------+-----------------+--------------+
| 102               |         2       | emily        |                 
+-------------------+-----------------+--------------+

Now my goal and where I fail is: I simply want to add a new column called max_email_nb to the current output which would be **"the highest email_nb sent

  • by the given customer_id (=12)
  • on this given query_id (=2)
  • to each line's author from the current SQL output, in the example above it's emily (but there could be more lines and can't be predicted: it comes from the current query!)**

I tried to use MAX():

      SELECT            
        qr.query_result_id,
        qr.query_id,
        qr.author,
        MAX(coe.email_nb) as max_email_nb
      FROM
        query_results qr
      LEFT JOIN
        customers_emails coe
      ON
        qr.author = coe.author AND
        coe.customer_id = 12           
      WHERE        
        qr.query_id = 2 AND
        qr.datecol >= 30 AND
        qr.author IS NOT NULL            
          AND qr.author NOT IN (
            SELECT author
            FROM customers_emails
            WHERE 
              (
                customer_id = 12 AND
                email_nb = 3
              )
          )
      GROUP BY
        qr.author
      ORDER BY 
        qr.query_result_id ASC
      LIMIT 
        2

Today My output is :

+-------------------+-----------------+--------------+-----------------+-------------
| query_result_id   | query_id        | author       |  max_email_nb   | ... 
+-------------------+-----------------+--------------+-----------------+------------
| 102               |         2       | emily        |  9              |                
+-------------------+-----------------+--------------+-----------------+------------

The value in max_email_nb is incorrect : based on what I aim to do, I expect the value of max_email_nb to be equal to 1 instead of 9. The output I expect is:

+-------------------+-----------------+--------------+-----------------+-------------
| query_result_id   | query_id        | author       |  max_email_nb   | ... 
+-------------------+-----------------+--------------+-----------------+------------
| 102               |         2       | emily        |  1              |                
+-------------------+-----------------+--------------+-----------------+------------

Indeed I want to retrieve for each query_result that is outputted by my SQL query, the highest email_nb sent by a given Customer_id 12 on query_id 2 to this author (in this specific line coming from the current SQL query it's emily).

So where does this incorrect 9 value come from ? It comes from from this input line:

+-------------------+-----------------+--------------+-----------+-------------+
| customer_email_id | query_result_id | customer_id  | author    |  email_nb   | 
+-------------------+-----------------+--------------+-----------+-------------+
| 14                |         104     | 12           |  emily    |   9  

... so it's associated with query_result_id= 104, which itself is defined here:

+-----------------------------+------------+--------------+-----------
| query_result_id             | query_id   | author       |  datecol
+-----------------------------+------------+--------------+-----------
| 104                         |         4  | emily        |   120

...so it's with query_id = 4 ! But as I said when defining what my goal was that I was looking for sth associated with query_id = 2 , that's why I should not get 9 but the value 1!

Here is a fiddle to test your solution: https://www.db-fiddle.com/f/nGVFhr3xMwKk9CDw6N6FWc/13

I tried subqueries, I tried inner joins...but nothing works.


Solution

  • I tried this on fiddle and it worked like charm,

    You can also try this.

    SELECT            
        qr.query_result_id,
        qr.query_id,    
        qr.author,
        MAX(coe.email_nb) as max_email_nb
      FROM
        query_results qr
      LEFT JOIN
        customers_emails coe
      ON
        qr.author = coe.author  
        and coe.customer_id = 12
        and qr.query_result_id = coe.query_result_id
      WHERE 
        qr.query_id = 2 AND
        qr.datecol >= 30 AND
        qr.author IS NOT NULL            
          AND qr.query_result_id NOT IN (
            SELECT query_result_id
            FROM customers_emails
            WHERE 
              (
                customer_id = 12 AND
                email_nb = 3
              )
          )
      GROUP BY
        qr.author
      ORDER BY 
        qr.query_result_id ASC
      LIMIT 
        20