I have come across a postgresql query with a lot of WHERE statements like this
WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL)
I am struggling to understand what data this query is including/excluding.
I tried rewriting the code above as
WHERE category NOT IN ('A','B')
WHERE category NOT IN ('A', 'B') OR category IS NULL
WHERE (category NOT IN ('A', 'B') OR category IS NULL)
And all three gave different answers to the original code.
Could someone explain to me what data in included/excluded in each of the four cases above?
Say for example the data looked like
ID | Category |
---|---|
1 | A |
2 | B |
3 | C |
4 | D |
5 | NULL |
For (1) I would just get ID's 3, 4. But I am unsure about the others.
EDIT: WHERE (category NOT IN ('A', 'B') OR category IS NULL)
and
WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL)
Give the same answer.
But WHERE category NOT IN ('A', 'B') OR category IS NULL
without parenthesis gives a different answer.
To correctly understand the output of the mentioned queries you have to think in the following way: take one by one all the lines that satisfy the WHERE clause.
Query 1
WHERE category NOT IN ('A','B')
The query 1 should give all the lines with the attribute category
not in the set specified.
If you proceed step by step, one line at a time you can see that:
category
column contains values in the set ('A','B')
category
column doesn't contain values in the set ('A','B')
To better understand the last point the clause WHERE category NOT IN ('A','B')
can be rewritten as WHERE category<>'A' AND category<>'B'
. Since category
is NULL the logical expression is evaluated in the following way WHERE NULL<>'A' AND NULL<>'B'
, which output is UNKNOWN, so the line will not be included in the output result.
Queries 2 & 3
WHERE category NOT IN ('A', 'B') OR category IS NULL
Queries 2 and 3 are the same, since parentheses in this case doesn't affect the evaluation order of the logical operators.
In this particular case the last line of the example table above is included in the output since category NOT IN ('A','B')
is evaluated as UNKNOWN and category IS NULL
is evaluated to TRUE. For the same reason mentioned above (Three-Valued Logic) the result of UNKNOWN or TRUE
is TRUE.