Search code examples
postgresqlwhere-clausewhere-in

PostgreSQL WHERE Clause what is included/excluded?


I have come across a postgresql query with a lot of WHERE statements like this

WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL) 

I am struggling to understand what data this query is including/excluding.

I tried rewriting the code above as

  1. WHERE category NOT IN ('A','B')

  2. WHERE category NOT IN ('A', 'B') OR category IS NULL

  3. WHERE (category NOT IN ('A', 'B') OR category IS NULL)

And all three gave different answers to the original code.

Could someone explain to me what data in included/excluded in each of the four cases above?

Say for example the data looked like

ID Category
1 A
2 B
3 C
4 D
5 NULL

For (1) I would just get ID's 3, 4. But I am unsure about the others.

EDIT: WHERE (category NOT IN ('A', 'B') OR category IS NULL) and

WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL)

Give the same answer.

But WHERE category NOT IN ('A', 'B') OR category IS NULL without parenthesis gives a different answer.


Solution

  • To correctly understand the output of the mentioned queries you have to think in the following way: take one by one all the lines that satisfy the WHERE clause.

    Query 1

    1. WHERE category NOT IN ('A','B')

    The query 1 should give all the lines with the attribute category not in the set specified. If you proceed step by step, one line at a time you can see that:

    • the first 2 lines are not included in the output since the category column contains values in the set ('A','B')
    • the next 2 lines are included in the output since the category column doesn't contain values in the set ('A','B')
    • the last line is not included in the output since the NULL values are evaluated as UNKNOWN according to the Three-Valued Logic

    To better understand the last point the clause WHERE category NOT IN ('A','B') can be rewritten as WHERE category<>'A' AND category<>'B'. Since category is NULL the logical expression is evaluated in the following way WHERE NULL<>'A' AND NULL<>'B', which output is UNKNOWN, so the line will not be included in the output result.

    Queries 2 & 3

    1. WHERE category NOT IN ('A', 'B') OR category IS NULL

    Queries 2 and 3 are the same, since parentheses in this case doesn't affect the evaluation order of the logical operators.

    In this particular case the last line of the example table above is included in the output since category NOT IN ('A','B') is evaluated as UNKNOWN and category IS NULL is evaluated to TRUE. For the same reason mentioned above (Three-Valued Logic) the result of UNKNOWN or TRUE is TRUE.