I have the following sql statement to query the databricks pipeline event logs and it works.
I tried to rewrite it into a python code, but I failed.
Could somebody provide me any advice? Many thanks!!
SELECT timestamp, details:user_action:action, details:user_action:user_name
FROM event_log_raw
WHERE event_type = 'user_action'
Please Note the details column here is string type, not struct nor array
The following solutions I tried didn't work
df
is a spark dataFrame generated from the table event_log_raw
df.filter(df.event_type == 'user_action').select("timestamp", "details:user_action:action", "details:user_action:user_name")
df.filter(df.event_type == 'user_action').select("timestamp", "details.user_action.action", "details.user_action.user_name")
Instead of using select
you need to use selectExpr
because the string like details:user_action:action
is a JSON path expression for extracting data from the JSON string (doc):
df.filter(df.event_type == 'user_action')
.selectExpr("timestamp", "details:user_action:action", "details:user_action:user_name")