
Implementing the lag window function and an isNotIn-style check in PySpark


Below is the T-SQL code. I tried to convert it to PySpark using window functions; my attempt follows it.

  case 
             when eventaction = 'OUT' and lag(eventaction,1) over (PARTITION BY barcode order by barcode,eventdate,transactionid) <> 'IN'  
                  then 'TYPE4'
             else ''
      end as TYPE_FLAG,

The PySpark code below, which uses the lag window function, raises an error:

Tgt_df = Tgt_df.withColumn(
    'TYPE_FLAG',
    F.when(
        (F.col('eventaction')=='OUT')
        &(F.lag('eventaction',1).over(w).isNotIn(['IN'])),
    "TYPE4"
).otherwise(''))  

But it's not working. What should I do?


Solution

  • It raises an error because Column objects have no isNotIn method. The error message would have made that obvious if you had posted it...

    Instead, use the ~ (not) operator.

    &( ~ F.lag('eventaction',1).over(w).isin(['IN'])),
    

    The list of available Column methods is in the official documentation.