Search code examples
amazon-redshift

Unable to match WHERE condition on a calculated field of a View in AWS Redshift


When I include a column in WHERE condition from a view it does not match the value. Instead, when i use TRIM(column) in where condition it works fine. The data does not include any trailing spaces as its a calculated field created by me. The view is fetching data from other views. Here are the steps to recreate the issue on your side.

I have created a test.

create table test(id int);
insert into test values(1),(2),(3),(4),(2),(3);

I created a view over it to ignore id 4.

create or replace view test_view as select * from test where id <> 4;

I created a calculated field over this view using another view.

create view test_view_1 as 
Select case id 
when 1 then 'Aman'::text 
when 2 then 'Boy'::character varying 
else 'Active' 
end as status from test_view;

Now let's see the data.

select * from test_view_1

Output: Status (column name)

Aman
Boy
Active
Boy
Active

When I use the where condition it gives me no output.

select * from test_view_1 where status = 'Aman';

No Output

When I use TRIM on status it gives me output.

select * from test_view_1 where TRIM(status) = 'Aman';

Output: Status (column name)

Aman

I am so lost. I hope this step-wise description of this problem will help you understand and help me in better way. I am required to use view over multiple views due to some restrictions. I hope I can find some help on this.

NOTE: This was not the issue when I was using COLLATE SENSITIVE which is default in Redshift. This issue arised when I was working on a database with COLLATE INSENSITIVE.


Solution

  • Interesting case. I believe you have found a Redshift bug with the new collate insensitive feature (at lease as it applies as a database feature). You should submit this to AWS.

    What appears to be going on here is that collate insensitive is getting crosswise with block metadata comparisons. You see Redshift compares WHERE clause values with block metadata values to see which blocks need to be read in. This can only happen if the column referenced in the WHERE clause is not operated on by a function. If a block is not read in, any results from this block can show up in the result. I can't prove this but it fits with the experiments I ran - see below.

    Redshift is taking all your SQL, including view definitions, and running it through the query compiler. The it looks for ways to "safely" work backwards from your WHERE clause to find out if it can apply your WHERE clause to the block metadata for the table. Since your CASE statement is simple it can to this: 'Aman' only comes about if id is a 1 so check the metadata for a value of 1. This all works fine when collate sensitive is on.

    Somehow Redshift is getting crossed up applying this inferred metadata check when collate is insensitive for the database. I was able to reproduce your example - see below.

    Your TRIM() case works because applying a function to the column in the WHERE clause breaks Redshift's ability to make a metadata comparison. I changed this function to NVL() and again the query worked. Here's this recreation:

    -- create database foo COLLATE  CASE_INSENSITIVE;
    -- switch to this new DB
    
    drop table if exists test cascade;
    create table test(id int);
    insert into test values(1),(2),(3),(4),(2),(3);
    
    create or replace view test_view as select * from test where id <> 4;
    
    
    create view test_view_1 as 
    Select case id 
    when 1 then 'Aman'::text 
    when 2 then 'Boy'::character varying 
    else 'Active' 
    end as status from test_view;
    
    select * from test_view_1;
    
    select * from test_view_1 where nvl(status,'') = 'Aman';
    

    Other examples of functions that make the query work:

    select * from test_view_1 where status||'' = 'Aman';
    select * from test_view_1 where translate(status,'x','y') = 'Aman';
    

    Not surprisingly this function did not make the query work as changing the string case is a no-op for this DB:

    select * from test_view_1 where lower(status) = 'Aman';
    

    Beyond submitting a bug report you can work around this issue exactly as you have already with TRIM() or almost any other function applied in the WHERE clause. This means that all queries can hit this hazard if not adjusted and this could cause continued problems until the bug is fixed. Other workarounds might be possible but getting a read as to the true cause would give confidence in any workaround.

    Now this is just a theory as to the mechanism behind this issue but you definitely contact AWS support.