Search code examples
sqlsql-serversql-server-2008-r2where-clauseclause

Where Clause 'drops' more rows than expected


I have a SQL Server 2008 R2 query that was returning "hypothetically" 100 rows. I'm actually working with 7k - 8k rows.

The Where clause is something like this:

Where Col_a = 'Y'
  And Col_b = 'N'
  And Col_c = 'X'

and 25 of the rows had 'P' in Col_d.

I added:

And Col_d = 'P'

and the query returned the expected 25 rows.

Then I changed to

And Col_d <> 'P'

I expected to get 75 rows but I got only 50.

I thought adding "And Col_d <> 'P'" would only restrict the rows in which there is a 'P' in Col_d.

Why is that not the case and how do I figure out what else is getting dropped when I say And Col_d <> 'P'?

As I said - I am actually working with larger numbers so it is not that easy to eyeball it.

I'd appreciate any help.

Thanks!


Solution

  • As stated in the comment, null is a special case when it comes to comparisons.

    Assume the following data.

    id     someVal
    ----
    0      null
    1      1
    2      2
    

    With a query:

    select id
    from table
    where someVal = 1
    

    would return id 1

    select id
    from table
    where someVal <> 1
    

    would return id 2

    select id
    from table
    where someVal is null
    

    would return id 0

    select id
    from table
    where someVal is not null
    

    would return both ids 1 and 2.

    If you wanted nulls to be "counted" as values in a = <> comparison, it needs to be cast to something like:

    select id
    from table
    where isNull(someVal, -1) <> 1
    

    returns 0 and 2

    Or you can change your ANSI Null setting.

    What I want to do is only exclude the rows that have 'P' in Col_d

    So in your specific case, because you want to treat null in Col_D as a non P row, your query could look like this:

    select * 
    from someTable
    Where Col_a = 'Y'
      And Col_b = 'N'
      And Col_c = 'X'
      And isNull(Col_D, 'someArbitraryValue') <> 'P'
    

    You have to do the above, because as I pointed out throughout the answer and in the links null does not compare the same way as values. You need to make the null something that is not null, (accomplished with isNull(Col_D, 'someArbitraryValue')) or change ANSI NULL setting in order to compare it as equal or not equal to some value.

    Or as @Andrew pointed out magic numbers are bad (someArbitraryValue), so you could instead do:

    select * 
    from someTable
    Where Col_a = 'Y'
      And Col_b = 'N'
      And Col_c = 'X'
      And (Col_D <> 'P' OR Col_D is null)
    

    Normally I would do the directly above query, I was doing it the other way to mostly point out the differences in null comparison vs a value.