Below is the T-SQL code. I tried to convert it to PySpark using window functions; my attempt follows it.
case
when eventaction = 'OUT' and lag(eventaction,1) over (PARTITION BY barcode order by barcode,eventdate,transactionid) <> 'IN'
then 'TYPE4'
else ''
end as TYPE_FLAG,
The PySpark code below raises an error when using the lag window function:
Tgt_df = Tgt_df.withColumn(
'TYPE_FLAG',
F.when(
(F.col('eventaction')=='OUT')
&(F.lag('eventaction',1).over(w).isNotIn(['IN'])),
"TYPE4"
).otherwise(''))
But it's not working. What should I do?
It raises an error because Column objects have no isNotIn method.
That would have been obvious if you just posted the error message...
Instead, negate isin with the ~ (not) operator:
&( ~ F.lag('eventaction',1).over(w).isin(['IN'])),
The list of available Column methods is in the official documentation.