Search code examples
sqlpostgresqlsubquery

Subquery question error "Too many arguments"


I have a question.

Why this code works:

-- users's first purchase day 
WITH befor_campaign AS (
    SELECT user_id, 
           MIN(created_at) AS first_day_of_purchase
    FROM marketing_campaign
    GROUP BY user_id ) 
    
-- Find users purchased products on the first day:
    ,user_w_first_day_purchase_after_mkt_work AS
    (SELECT 
        user_id, product_id
    FROM marketing_campaign
    WHERE (user_id, created_at) IN (SELECT * FROM befor_campaign)
    )

-- Pull all users who meet requirments   
    ,data_set AS (
    SELECT * 
    FROM marketing_campaign
    WHERE (user_id, created_at) NOT IN (SELECT * FROM befor_campaign)
    AND (user_id, product_id) NOT IN (SELECT * FROM user_w_first_day_purchase)
    )

SELECT count(distinct user_id) FROM data_set

But this doesn't works and give me the error "Too many arguments"?

    WITH Before_MKT as
    (
     SELECT user_id, product_id,
       Min(created_at)
     FROM marketing_campaign
     Group By user_id, product_id
     Order by user_id
     )
 
    ,data_set AS (
    SELECT user_id 
    FROM marketing_campaign
    where (user_id) NOT IN (SELECT * FROM Before_MKT)
    )

SELECT count(distinct user_id) FROM data_set


I'm trying to optimize my query transforming two steps in one, but it doesn't works and I don't understand why. Someone can help me, please? Thanks a lot!


Solution

  • The * in this context expands out to all columns. So this code:

    WHERE (user_id, created_at) IN (SELECT * FROM befor_campaign)
    

    is really:

    WHERE (user_id, created_at) IN (SELECT user_id, first_day_of_purchase FROM befor_campaign)
    

    A tuple with two elements is compared to a tuple with two elements. No problem.

    However, this code:

    where (user_id) NOT IN (SELECT * FROM Before_MKT)
    

    Expands out to:

    where (user_id) NOT IN (SELECT user_id, product_id, "MIN(created_at)" FROM Before_MKT)
    

    One column on the left, three on the right. No go. You get an error.

    I strongly discourage you from using NOT IN with subqueries. It does not handle NULL values intuitively -- if any of the returned values are NULL, then no rows are returned at all. A safe construct is NOT EXISTS.

    I also discourage you from using SELECT * in such queries. Just list out the columns explicitly, so it is clear what the code should be doing.