I have a question.
Why this code works:
-- users's first purchase day
WITH befor_campaign AS (
SELECT user_id,
MIN(created_at) AS first_day_of_purchase
FROM marketing_campaign
GROUP BY user_id )
-- Find users purchased products on the first day:
,user_w_first_day_purchase_after_mkt_work AS
(SELECT
user_id, product_id
FROM marketing_campaign
WHERE (user_id, created_at) IN (SELECT * FROM befor_campaign)
)
-- Pull all users who meet requirments
,data_set AS (
SELECT *
FROM marketing_campaign
WHERE (user_id, created_at) NOT IN (SELECT * FROM befor_campaign)
AND (user_id, product_id) NOT IN (SELECT * FROM user_w_first_day_purchase)
)
SELECT count(distinct user_id) FROM data_set
But this doesn't works and give me the error "Too many arguments"?
WITH Before_MKT as
(
SELECT user_id, product_id,
Min(created_at)
FROM marketing_campaign
Group By user_id, product_id
Order by user_id
)
,data_set AS (
SELECT user_id
FROM marketing_campaign
where (user_id) NOT IN (SELECT * FROM Before_MKT)
)
SELECT count(distinct user_id) FROM data_set
I'm trying to optimize my query transforming two steps in one, but it doesn't works and I don't understand why. Someone can help me, please? Thanks a lot!
The *
in this context expands out to all columns. So this code:
WHERE (user_id, created_at) IN (SELECT * FROM befor_campaign)
is really:
WHERE (user_id, created_at) IN (SELECT user_id, first_day_of_purchase FROM befor_campaign)
A tuple with two elements is compared to a tuple with two elements. No problem.
However, this code:
where (user_id) NOT IN (SELECT * FROM Before_MKT)
Expands out to:
where (user_id) NOT IN (SELECT user_id, product_id, "MIN(created_at)" FROM Before_MKT)
One column on the left, three on the right. No go. You get an error.
I strongly discourage you from using NOT IN
with subqueries. It does not handle NULL
values intuitively -- if any of the returned values are NULL
, then no rows are returned at all. A safe construct is NOT EXISTS
.
I also discourage you from using SELECT *
in such queries. Just list out the columns explicitly, so it is clear what the code should be doing.