I have a where clause in Spark SQL that for some reason doesn't return any records. I think it doesn't work and so I would like to ask what would be equivalent to it?
SELECT
c.client_id,
current_date() as insert_date
FROM
CLIENT_SUB c
WHERE
(c.client_id, insert_date) not in (SELECT client_id, insert_date from CLIENT_SUBSCRIBER_CONTRACT)
I heard I could do it with a join
I would recommend not exists
: it is null
-safe, while not it
isn't - and it usually scales better, too.
I am also suspiscious about the reference to insert_date
: do you really mean that, or you actually want current_date()
?
select cs.client_id, current_date() as insert_date
from client_sub cs
where not exists (
select 1
from client_subscriber_contract csc
where
csc.client_id = c.client_id
and csc.insert_date = cs.insert_date
-- or, maybe: csc.insert_date = current_date()
)
For performance, consider an index on client_subscriber_contract(client_id, insert_date)
.