Search code examples
pysparkleft-joindatabricksazure-databricks

Pyspark join doesn't take 5 positional arguments?


I'm implementing LEFT JOIN on 5 columns in Pyspark. But it's throwing an error as shown below

TypeError: join() takes from 2 to 4 positional arguments but 5 were given

Code implemented :

Tgt_df_time_in_zone_detail = Tgt_df_view_time_in_zone_detail_dtaas.join(Tgt_df_individual_in_shift_tiz
,Tgt_df_view_time_in_zone_detail_dtaas.id_individual == Tgt_df_individual_in_shift_tiz.id_individual, 
(Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start >= Tgt_df_individual_in_shift_tiz.swipein)
 &   (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start <= Tgt_df_individual_in_shift_tiz.swipeout)
 & (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end >= Tgt_df_individual_in_shift_tiz.swipein) 
&(Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end <= Tgt_df_individual_in_shift_tiz.swipeout)
, "left_outer") 

Why Pyspark doesn't take join on 5 columns? What's the better way to do it then!?


Solution

  • Guess, you missed & in between your 1st and 2nd condition. Try this, if it works.

    Tgt_df_time_in_zone_detail = Tgt_df_view_time_in_zone_detail_dtaas.join(Tgt_df_individual_in_shift_tiz,
    (Tgt_df_view_time_in_zone_detail_dtaas.id_individual == Tgt_df_individual_in_shift_tiz.id_individual)
    & (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start >= Tgt_df_individual_in_shift_tiz.swipein)
    & (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_start <= Tgt_df_individual_in_shift_tiz.swipeout)
    & (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end >= Tgt_df_individual_in_shift_tiz.swipein) 
    & (Tgt_df_view_time_in_zone_detail_dtaas.timestamp_end <= Tgt_df_individual_in_shift_tiz.swipeout)
    , "left_outer")