apache-spark pyspark apache-spark-sql spark-structured-streaming

Pivoting streaming dataframes without aggregation in pyspark

ID	type	value
A	car	camry
A	price	20000
B	car	tesla
B	price	40000

This is an example of the DataFrame being streamed.

I need the output to look like this:

ID	car	price
A	camry	20000
B	tesla	40000

What's a good way to transform this? I have been researching pivoting, but pivot requires an aggregation, which I don't need.

Solution

You could filter the DataFrame (df) twice, once per type, and join the two results on ID:

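(
    # Rows with type == "car": rename the generic "value" column to "car"
    df.filter(df.type == "car").withColumnRenamed("value", "car")
    # Inner join on ID with the rows whose type == "price" (value renamed to "price")
    .join(
        df.filter(df.type == "price").withColumnRenamed("value", "price"),
        on="ID",
    )
    # Keep only the pivoted columns (drops the two leftover "type" columns)
    .select("ID", "car", "price")
)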