Tags: apache-spark, pyspark, databricks, spark-structured-streaming, azure-eventhub

How to apply a user-defined function over read-stream data in PySpark on Databricks


I have the code below to read the Event Hub data into Databricks. [Image: event-hub-databricks-code]

Question: in the read_df DataFrame I have a body column that is encoded JSON. I want to apply a user-defined function that returns a DataFrame with the decoded body values. Let's say the function name is decode(encoded_body_value). How do I apply it over the read-stream data so that this operation also becomes streaming? That is, as and when an event arrives, it should trigger the decode and produce a DataFrame with the decoded values of body.


Solution

  • Since a UDF works at the row level, it works with a streaming DataFrame as well. Just do:

    read_df.select(decode(col("body")).alias("decoded"))
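To make the row-level decoding concrete, here is a minimal sketch. The question doesn't say how the body is encoded, so this assumes (purely for illustration) a base64-encoded JSON payload; decode_body is a hypothetical helper, and the Spark wiring with udf and col is shown in comments since it needs a live Spark session and Event Hub source:

```python
import base64
import json

# Hypothetical decoder: assumes the Event Hub body is base64-encoded JSON.
# Replace with your actual decryption/decoding logic.
def decode_body(encoded_body_value):
    return json.loads(base64.b64decode(encoded_body_value).decode("utf-8"))

# Wiring it into the streaming query (sketch; requires a SparkSession):
#
# from pyspark.sql.functions import col, udf
# from pyspark.sql.types import MapType, StringType
#
# decode = udf(decode_body, MapType(StringType(), StringType()))
# decoded_df = read_df.select(decode(col("body")).alias("decoded"))
#
# Because the UDF is evaluated row by row, decoded_df is itself a streaming
# DataFrame: each arriving event is decoded as part of the same query.

# Quick local check of the decoder on a sample payload:
sample = base64.b64encode(json.dumps({"device": "d1", "temp": "21"}).encode()).decode()
print(decode_body(sample))
```

Note that the UDF's return type (here a map of strings) must be declared when registering it; if the JSON schema is known and fixed, Spark's built-in from_json with an explicit schema is usually faster than a Python UDF for the JSON-parsing step.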