join, pyspark, databricks

Joining tables in PySpark based on a condition


I would like to join two PySpark tables. The first table has the columns id, startTime and endTime, whereas the second table has Time and value. For every id, the merged table should contain the Time and value rows for which startTime <= Time <= endTime. The startTime to endTime ranges of different ids may overlap.


Solution

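  • Assuming df1 (id, startTime, endTime) and df2 (Time, value) are your two DataFrames, join them on a range condition. between() includes both bounds, so it matches startTime <= Time <= endTime: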

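    from pyspark.sql import functions as F
    
    # Range join: pair each df1 row with every df2 row whose Time
    # lies within [startTime, endTime] (both bounds included).
    result = df1.join(
        df2,
        on=F.col("Time").between(F.col("startTime"), F.col("endTime")),
        how="inner",  # ids with no matching Time are dropped
    )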
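
To sanity-check what the join should return on a small sample, the same inclusive range condition can be mimicked in plain Python without Spark. This is only a sketch of the expected semantics, not of how Spark executes the join: the helper name range_join and the sample rows are made up for illustration, and times are plain integers here.

```python
# Plain-Python reference for the inclusive range join (no Spark required).
# range_join and the sample rows below are hypothetical, for illustration only.

def range_join(intervals, readings):
    """Pair each (id, startTime, endTime) with every (Time, value)
    whose Time lies in [startTime, endTime], both bounds included."""
    return [
        (id_, time, value)
        for id_, start, end in intervals
        for time, value in readings
        if start <= time <= end
    ]

# Two ids whose ranges overlap on Time 4..5.
intervals = [("a", 1, 5), ("b", 4, 8)]
readings = [(0, 10.0), (4, 20.0), (5, 30.0), (8, 40.0), (9, 50.0)]

rows = range_join(intervals, readings)
for row in rows:
    print(row)
```

Because the ranges overlap, the readings at Time 4 and 5 appear once for each id, which is also what the inner join above produces: one output row per matching (id, Time) pair. Readings outside every range (Time 0 and 9 in this sample) do not appear at all.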