Search code examples
mysqlsqlleft-joininner-joinquery-performance

three tables relation order is correct but values are false


i have those tables

     table1
|  id  |  name  |
|  1   |  axe   |
|  2   |  bow   |
|  3   |  car   |
|  4   |  dart  |


        table2                                        table3
|  t1_id  |  number  |                        |  t1_id  |  letter  |
|  1      |  5       |                        |  1      |  a       |
|  1      |  6       |                        |  1      |  b       |
|  1      |  2       |                        |  1      |  c       |
|  2      |  2       |                        |  2      |  a       |
|  2      |  2       |                        |  2      |  c       |
|  2      |  3       |                        |  2      |  r       |
|  3      |  8       |                        |  3      |  y       |
|  3      |  3       |                        |  3      |  i       |
|  3      |  1       |                        |  3      |  a       |
|  4      |  8       |                        |  4      |  a       |
|  4      |  9       |                        |  4      |  b       |
|  4      |  10      |                        |  4      |  c       |

and table1(id) is linked with table2(t1_id), table3(t1_id)

i run it to get them order by the highest letter_count match then by the highest average_number match to get this proper result http://www.sqlfiddle.com/#!9/69086b/8/0

SELECT 
  t1.id, 
  t1.name

FROM 
  table1 t1

INNER JOIN 
  table2 t2
    ON t2.t1_id = t1.id

LEFT JOIN 
  table3 t3
    ON t3.t1_id = t1.id
      AND t3.letter IN ('a', 'b', 'c')

GROUP BY
  t1.id

ORDER BY
  COUNT(t3.letter) DESC,
  AVG(t2.number) DESC

|  id  |  name  |
|  4   |  dart  |
|  1   |  axe   |
|  2   |  bow   |
|  3   |  car   |

and everything is working ok


but when i wanted to check if there is any problems with the query i decided to check the letter_count and avg_number so i used this query

SELECT 
  t1.id, 
  t1.name, 
  COUNT(t3.letter) AS letter_count, 
  AVG(t2.number) AS avg_number

FROM 
  table1 t1

INNER JOIN 
  table2 t2
    ON t2.t1_id = t1.id

LEFT JOIN 
  table3 t3
    ON t3.t1_id = t1.id
      AND t3.letter IN ('a', 'b', 'c')

GROUP BY
  t1.id

ORDER BY
  letter_count DESC,
  avg_number DESC

what i expected the result to be was

|  id  |  name  |  letter_count  |  avg_number   |
|  4   |  dart  |  3             |  9            |
|  1   |  axe   |  3             |  4.3333333333 |
|  2   |  bow   |  2             |  2.3333333333 |
|  3   |  car   |  1             |  4            |

but the result i got was http://www.sqlfiddle.com/#!9/69086b/3/0

|  id  |  name  |  letter_count  |  avg_number   |
|  4   |  dart  |  9             |  9            |
|  1   |  axe   |  9             |  4.3333333333 |
|  2   |  bow   |  6             |  2.3333333333 |
|  3   |  car   |  3             |  4            |

what surprised me was the multiplied row of letter_count which can be solved by derived queries but i don't want to select the letter_count or number_average i only want to ORDER BY them

does keeping the query like it is with ORDER BY only wont affect the query performance or should i still use derived queries even if i don't need to select the data values since the order is correct anyway or would derived query be faster in huge tables?


Solution

  • You've really asked 2 questions here:

    1. Does the ORDER BY clause affect query performance
    2. Why is my letter count not as expected

    Either way, to evaluate the ORDER BY clause, the expressions need to be evaluated so that the order can be determined. In your first example you need to specify the expressions because those columns are not contained within the SELECT statement.

    However in your second query you have selected the columns that you want to order by and because ORDER BY is evaluated AFTER the entire query has been processed, you can simply use the column ALIAS in the ORDER BY clause, instead of executing the function again.

    some RDBMS query optimisers will convert your expressions in the ORDER BY statements to use the column alias for you IF you are ordering on expressions that exist in your SELECT clause

    You were right to do this however, your letter count expression is not current due to the duplicates in the response.

    You can simply change the COUNT expression to use a distinct clause to only count unique values.

    COUNT(DICTINCT t3.letter)
    

    This makes your original query now look like this:

    SELECT 
      t1.id, 
      t1.name
    
    FROM 
      table1 t1
    
    INNER JOIN 
      table2 t2
        ON t2.t1_id = t1.id
    
    LEFT JOIN 
      table3 t3
        ON t3.t1_id = t1.id
          AND t3.letter IN ('a', 'b', 'c')
    
    GROUP BY
      t1.id
    
    ORDER BY
      COUNT(DICTINCT t3.letter) DESC,
      AVG(t2.number) DESC