Tags: apache-spark, pyspark, apache-spark-sql, spark-structured-streaming

Pivoting streaming dataframes without aggregation in pyspark


ID type value
A car camry
A price 20000
B car tesla
B price 40000

Example dataframe that is being streamed.

I need the output to look like this. Does anyone have suggestions?

ID car price
A camry 20000
B tesla 40000

What's a good way to transform this? I have been researching pivoting, but it requires an aggregation, which is not something I need.


Solution

  • You could filter the DataFrame (df) twice, once per type, and join the two results on ID:

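    # Rows holding the car name, with "value" renamed to "car"
    cars = df.filter(df["type"] == "car").withColumnRenamed("value", "car")
    # Rows holding the price, with "value" renamed to "price"
    prices = df.filter(df["type"] == "price").withColumnRenamed("value", "price")

    # Inner join on ID, then keep only the pivoted columns
    result = cars.join(prices, on="ID").select("ID", "car", "price")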
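
To see why the filter-and-join approach above needs no aggregation, it may help to trace the same logic on plain Python rows, without Spark. This is only a sketch of what the join computes: the helper name `pivot_by_join` is hypothetical, the rows are the sample data from the question, and values are kept as strings on the assumption that the `value` column is a string.

```python
# Plain-Python trace of the filter-twice-then-join idea (not Spark code).
rows = [
    {"ID": "A", "type": "car", "value": "camry"},
    {"ID": "A", "type": "price", "value": "20000"},
    {"ID": "B", "type": "car", "value": "tesla"},
    {"ID": "B", "type": "price", "value": "40000"},
]


def pivot_by_join(rows):
    """Hypothetical helper: inner-join the 'car' rows with the 'price' rows on ID."""
    # Mirrors df.filter(type == "car").withColumnRenamed("value", "car")
    cars = [{"ID": r["ID"], "car": r["value"]} for r in rows if r["type"] == "car"]
    # Mirrors df.filter(type == "price").withColumnRenamed("value", "price")
    prices = [{"ID": r["ID"], "price": r["value"]} for r in rows if r["type"] == "price"]
    # Inner join on ID: only IDs present on both sides produce an output row
    return [
        {"ID": c["ID"], "car": c["car"], "price": p["price"]}
        for c in cars
        for p in prices
        if c["ID"] == p["ID"]
    ]


result = pivot_by_join(rows)
for row in result:
    print(row["ID"], row["car"], row["price"])
```

Because each (ID, type) pair appears exactly once in the sample data, the join yields one row per ID and nothing has to be aggregated. Two caveats follow from the inner join: an ID that has a car row but no price row (or the reverse) is dropped, and an ID with duplicate rows of one type would produce several output rows.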