Search code examples
sqlselectsnowflake-cloud-data-platformwhere-clause

SQL Snowflake: Value does not show up in SELECT * search but does show up when included in WHERE Clause


Hello and thank you in advance!

I have a SQL query that returns a list of customers for a course ID where the status is 'active'. I noticed the total count was incorrect and that there were a few missing customers who do not show up in the result list. When I include those customers in the WHERE clause, they do show up. If they satisfy all other WHERE filters, shouldn't they show up without being directly referenced?

Query where the specific customer does NOT show up in the result list

 SELECT * FROM COURSES_ACTIVE
 WHERE COURSE_ID = '123aaaBBB'
 AND COURSE_STATUS = 'Active'; 

Query where the specific customer DOES show up in the result list

 SELECT * FROM COURSES_ACTIVE
 WHERE COURSE_ID = '123aaaBBB'
 AND COURSE_STATUS = 'Active'
 AND CUSTOMER_ID = '555cccDDD'; 

Things I have tried

Originally I thought maybe it was something to do with the fact that IDs needed to be case sensitive and maybe in a previous SELECT statement things were getting combined incorrectly with GROUP BY. I made sure all IDs were COLLATE("CUSTOMER_ID ", 'binary'). I have confirmed in all previous queries the CUSTOMER_IDs do show up without including them in the WHERE clause.

I have gotten the correct CUSTOMER_IDs to show up when I stick a "::VARCHAR" after the SELECT CUSTOMER_ID::VARCHAR, but as that is not case sensitive, I am worried I again won't have all IDs.

If I stick "::BINARY" at the end of the SELECT CUSTOMER_ID::BINAY, I get the error "The following string is not a legal hex-encoded value: '888ooo000'" <--this fills with a random CUSTOMER_ID

Full code

WITH COURSES_ALL AS (SELECT CUSTOMER_ID,
                            COURSE_ID,
                            MAX(COURSE_EXPIRATION_DATE) CURRENT_EXPIRATION_DATE
                     FROM DATABASE
                     GROUP BY CUSTOMER_ID,
                              COURSE_ID

  -- Course get new row entries so I use MAX to make sure I am only working with the most recent entry.  
  -- I have confirmed that if I run my query here, all CUSTOMER_IDs show up.

COURSES_ACTIVE AS (SELECT CUSTOMER_ID,
                          COURSE_ID,
                          CURRENT_EXPIRATION_DATE,
                          CASE
                              WHEN CURRENT_EXPIRATION_DATE < '2024-01-31' THEN 'Inactive'
                              WHEN CURRENT_EXPIRATION_DATE >= '2024-01-31' THEN 'Active'
                          END  COURSE_STATUS,
                          FROM COURSES_ALL)

 SELECT * FROM COURSES_ACTIVE
 WHERE COURSE_ID = '123aaaBBB'
 AND COURSE_STATUS = 'Active';
 --   AND CUSTOMER_ID = '555cccDDD'; //adding this line back in results in the customer showing up in the results

Solution

  • For anyone who finds this in the future, turns out our source data was corrupted with incorrect data types. Once our admin reset and refreshed all IDs, they now work as expected.