Search code examples
sqlamazon-redshiftcase

Redshift CASE filter has unexpected behaviour with integer values


I found a potential bug in Redshift when I tested an SQL code of mine. Although the CASE condition in the calculation column evaluates to 'yes', it's not filtered out by the WHERE statement:

with tt as (
    select
    90 AS num,
    91 AS next_num
)
select 
    num, 
    next_num, 
    (num-next_num) / num AS result
    , case when (num-next_num) / num between -0.2 and 0.2 then 'yes' else 'no' end calculation
from tt 
where calculation = 'no'

It makes no difference if the CASE condition is explicitly stated in the WHERE statement or not. However, it does make a difference if num and next_num are declared as NUMERIC (then it works) or not. So apparently, the calculation is handled differently in the SELECT statement vs. the WHERE statement.

Is this something that's supposed to be like that and I should've known better, or is it a mistake on Redshift's end? In SQL Server, it works well and no contradictory behavior occurs.


Solution

  • This is known "observation" and comes down to implicit casting. See Redshift, casting of a decimal value is not rounding off

    And as I observed in that answer "I'll let the database philosophers debate if this is a bug or not."

    In this case let's simplify your case to its bare bones:

    select (-0.1)::int AS result
    where result < -0.2 or result > 0.2
    

    This shows the same behavior you observe - result is 0 but the row is still selected. Seems odd, right?

    This can be further clarified by looking only at the negative comparison as this is the one that matters.

    select (-0.1)::int AS result
    where result < -0.2
    

    Seems like an obvious bug, right?

    But not so fast. This all comes down to implicit casting. How is Redshift deciding to make the comparison between result and -0.2? Comparing an INT and a DECIMAL. Remember the WHERE clause is computed first, then the result. So there are two different paths being computed and the WHERE clause has an implicit casting process to perform.

    Let's shift things around a little bit:

    select (-0.1) AS result
    where result::int < -0.2
    

    This also produces a row with the value -0.1 but clearly that doesn't meet the WHERE clause. This looks bad for Redshift.

    But let's change this again:

    select (-0.1) AS result
    where result::decimal(3,0) < -0.2
    

    Again a row is produced. Is INT the same as DECIMAL(3,0) when used in evaluating a WHERE condition? It looks like the "WHERE path" for casting is making some simplifications.

    As the above mentioned answer shows implicit casting from one definition to a DECIMAL can result is truncation or rounding depending on the circumstances. The only difference you have in this code is negative truncation and this viewed in the binary world truncates down: -0.2 becomes -1 and thus passes the WHERE condition.

    The bottom line here hasn't changed - if you let the DB make implicit decisions on how to cast your data it can decide 2 different cast paths for the data and the WHERE. The results of the query can look "confused".

    The best answer is to avoid implicit casting which is good coding and also prevents the DB from casting differently on the two paths.