I would like to join two PySpark tables. The first table has the columns id, startTime, and endTime, while the second has Time and value. The merged table should contain, for every id, the columns Time and value for all rows where startTime <= Time <= endTime. Note that the (startTime, endTime) intervals may overlap across different ids, so a given Time can match more than one id.
Assuming df1 and df2 are your two DataFrames, you can use a non-equi (range) join condition:

from pyspark.sql import functions as F

result = df1.join(
    df2,
    # Match every df2 row whose Time falls inside a df1 interval.
    # Qualifying the columns with their DataFrame avoids ambiguity.
    on=df2["Time"].between(df1["startTime"], df1["endTime"]),
    how="inner",
)

Because the intervals may overlap, a single Time from df2 can join to several ids, which is exactly the behavior you described. Be aware that range joins like this cannot use an equality hash join, so on large data Spark effectively evaluates a filtered cross join, which can be slow.
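To make the expected output concrete, here is a small pure-Python sketch of the same interval-join semantics (the data is made up for illustration; the tuple layouts mirror the df1 and df2 columns above). Note how the overlapping intervals for "a" and "b" both pick up Time=4:

```python
# df1 rows: (id, startTime, endTime) -- intervals for "a" and "b" overlap
df1 = [
    ("a", 1, 5),
    ("b", 3, 8),
]
# df2 rows: (Time, value)
df2 = [
    (2, "x"),
    (4, "y"),
]

# Inner join on startTime <= Time <= endTime, keeping (id, Time, value)
result = [
    (id_, t, v)
    for (id_, start, end) in df1
    for (t, v) in df2
    if start <= t <= end
]
print(result)
# -> [('a', 2, 'x'), ('a', 4, 'y'), ('b', 4, 'y')]
```

Time=4 falls in both intervals, so it appears once per matching id, just as the PySpark join above will duplicate it.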