Tags: python, dataframe, pyspark

Find average of value within a range defined in a different table


I'm trying to find the average of the values within a range defined in a different table and store the result in a new column.

Table 1 provides the start and stop times: the start time is the timestamp at which the average calculation should begin, and the stop time is the timestamp at which it should end.

Table 1.

StartTime  StopTime
100        140

Table 2.

Timestamp  Value
80         15.0
90         10.0
100        13.0
110        9.0
120        19.0
130        38.0
140        1.0
150        39.0

Does someone know how to do this in PySpark?

According to tables 1 and 2, the average is (13 + 9 + 19 + 38 + 1) / 5 = 16, and it should be stored in a new column called AverageValue in table 1.

Table 1.

StartTime  StopTime  AverageValue
100        140       16
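
For reproducibility, the two tables can be built like this (a minimal sketch, assuming an active SparkSession named spark; df1 and df2 stand for tables 1 and 2):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Table 1: the interval over which to average
df1 = spark.createDataFrame([(100, 140)], ['StartTime', 'StopTime'])

# Table 2: the timestamped values
df2 = spark.createDataFrame(
    [(80, 15.0), (90, 10.0), (100, 13.0), (110, 9.0),
     (120, 19.0), (130, 38.0), (140, 1.0), (150, 39.0)],
    ['Timestamp', 'Value'],
)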

I've tried the following lines, but they did not work.

w = Window.orderBy("Timestamp").rangeBetween(table_1.StartTime, table_1.StopTime)
table_1 = table_1.withColumn('AverageValue', F.avg(table_2.Value).over(w))

Solution

  • The window attempt fails because rangeBetween expects its frame boundaries as offsets relative to the current row, not columns taken from another DataFrame. Instead, join the two DataFrames so that Timestamp in table 2 is contained in the [StartTime, StopTime] interval from table 1, then group the joined DataFrame by the interval and aggregate the Value column:

    from pyspark.sql import functions as F

    # Keep timestamps inside [StartTime, StopTime], then average per interval.
    df3 = df1.join(df2, on=(df1['StartTime'] <= df2['Timestamp']) & (df1['StopTime'] >= df2['Timestamp']), how='left')
    df3 = df3.groupby('StartTime', 'StopTime').agg(F.avg('Value').alias('AverageValue'))
    

    df3.show()
    
    +---------+--------+------------+
    |StartTime|StopTime|AverageValue|
    +---------+--------+------------+
    |      100|     140|        16.0|
    +---------+--------+------------+
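
  • A note on performance: this is a non-equi join, which Spark typically executes as a nested-loop join. If table 1 is small, it can help to broadcast it explicitly, and the condition can be written more compactly with Column.between (inclusive on both ends). A sketch of the same aggregation under those assumptions:

    from pyspark.sql import functions as F

    # Broadcast the small intervals table and use between() for the range test.
    df3 = (
        F.broadcast(df1)
        .join(df2, on=df2['Timestamp'].between(df1['StartTime'], df1['StopTime']), how='left')
        .groupby('StartTime', 'StopTime')
        .agg(F.avg('Value').alias('AverageValue'))
    )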