I have two tables, one called Health_User and the other called Diary. They have users' demographic information, and their recorded values respectively. What I want to do is retrieving the recorded values, but:
So I have several queries which should get the same results:
# Query 1
SELECT d.user_id, d.id AS diary_id, d.glucose_value, d.unit
FROM Diary AS d
JOIN (
SELECT id
FROM Health_User
WHERE is_tester = false
) AS u
ON d.user_id = u.id
WHERE ((d.glucose_value >= 20 AND d.glucose_value <= 600 AND d.unit = 'mg/dL')
OR (d.glucose_value >= 20/18.02 AND d.glucose_value <= 600/18.02 AND d.unit = 'mmol/L'));
# Query 2
SELECT d.user_id, d.id AS diary_id, d.glucose_value, d.unit
FROM Diary AS d
JOIN Health_User AS u
ON d.user_id = u.id
WHERE u.is_tester = false
AND ((d.glucose_value >= 20 AND d.glucose_value <= 600 AND d.unit = 'mg/dL')
OR (d.glucose_value >= 20/18.02 AND d.glucose_value <= 600/18.02 AND d.unit = 'mmol/L'));
# Query 3
SELECT d.user_id, d.id AS diary_id, d.glucose_value, d.unit
FROM Health_User AS u
JOIN (
SELECT id, user_id, glucose_value, unit
FROM Diary
WHERE ((glucose_value >= 20 AND glucose_value <= 600 AND unit = 'mg/dL')
OR (glucose_value >= 20/18.02 AND glucose_value <= 600/18.02 AND unit = 'mmol/L'))
) AS d
ON d.user_id = u.id
WHERE u.is_tester = false;
Here I have three questions:
Question 1: I would speculate that Query 1 would have better performance than Query 2, because a) it joins only one column instead of the whole table of Health_User and b) it filters out testers before joining the tables. Am I correct?
Question 2: The conditional limitation is more complex for Diary (See the last WHERE
clause in Query 1). Is it better to switch Diary inside the JOIN
and make Health_User outside like Query 3, or it makes no difference?
Question 3: Is there any even better solution in terms of performance?
There would be a difference if the database executed the queries in the order your queries suggest (first filter, then join or vice versa).
As it is, PostgreSQL has a query optimizer that rearranges the query to find the most efficient execution order, and all your queries will end up with the same execution plan, which you can verify using the SQL statement EXPLAIN
.
For inner joins, it does not influence the result if you filter before or after the join; you could also write all the conditions into the join condition without changing the result. The optimizer knows that.
You can speed up execution by creating appropriate indexes. It depends on the distribution of the data to know if a certain index is useful. The rule of thumb is that indexes on selective conditions (that filter out many data) are more useful. Work with EXPLAIN
to find the best indexes.