I would like to read events from Event Hub using Databricks. The events are in JSON format, but they can have different schemas (this matters because the solutions I found pass a fixed schema to the from_json(jsonStr, schema) function, which I can't do in my use case). When I use
.withColumn('Value', col('value').cast(StringType()))
on the dataframe, the resulting JSON output contains backslashes: {\"time\": 1432826855000,\"host\":......
I found a solution in "How to prevent spark sql with kafka from adding backslash to JSON string in dataframe", but in the Delta Live Tables framework we create streaming tables by returning a dataframe, so I can't use that solution.
Should I use non-PySpark functions in the ETL process, as in "How to remove backslash from decoded JSON string?"? Would that be efficient while streaming from Event Hub to bronze?
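For reference, here is a minimal sketch of the kind of DLT streaming table I mean, assuming the Event Hubs Kafka-compatible endpoint is used (the namespace, hub name, connection string, and table name below are placeholders, not my real setup):

```python
import dlt
from pyspark.sql.functions import col

# Placeholder Event Hubs settings (Kafka-compatible endpoint) - replace with your own.
EH_NAMESPACE = "my-eventhub-namespace"
EH_NAME = "my-eventhub"
EH_CONN_STR = "Endpoint=sb://...;..."  # full Event Hubs connection string

kafka_options = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "subscribe": EH_NAME,
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{EH_CONN_STR}";'
    ),
    "startingOffsets": "earliest",
}

@dlt.table(name="bronze_events", comment="Raw Event Hub payloads kept as JSON strings")
def bronze_events():
    return (
        spark.readStream
        .format("kafka")
        .options(**kafka_options)
        .load()
        # 'value' arrives as binary; casting to string keeps the raw JSON as-is,
        # regardless of which schema each individual event happens to have.
        .withColumn("Value", col("value").cast("string"))
        .select("Value", "timestamp")
    )
```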
You shouldn't worry about those backslashes - they are just the visual representation of your string when you display the data, because the string has the " character embedded in it. Internally, the data is stored without backslashes, like: {"time": 1432826855000,"host":......
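If you want to convince yourself, read the bronze table back as a batch DataFrame and print the actual stored string. A rough sketch (the "bronze_events" table and "Value" column names follow the placeholders used above, adjust to yours):

```python
from pyspark.sql.functions import col, get_json_object

# Batch read of the already-written table (first() is not allowed on a streaming DataFrame).
df = spark.read.table("bronze_events")

row = df.select("Value").first()
print(row["Value"])
# -> {"time": 1432826855000,"host": ...   (no backslashes in the stored string)

# And when you do know which field you need, you can still extract it downstream
# without declaring a full schema up front, e.g. with get_json_object:
with_time = df.withColumn("event_time", get_json_object(col("Value"), "$.time"))
```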