I am trying to run a SQL query in azure databricks. But this query is taking too long to execute. Could you please suggest what optimization I can do in this below query to make it more faster
SELECT
LOWER(get_json_object(element_at(params,'ans.client_payload'), '$.ACTIONBLOCK_ID')) AS content_id,
lower(element_at(params,'event.type')) as event_type,
count(distinct(userguid)) as num_usr,
count(*) as num_events
FROM my_table
where event_date >= '2023-04-03'
AND event_code = 'ABC_SERVICE'
AND lower(element_at(params,'event.subcategory')) = 'message'
AND lower(element_at(params,'event.type')) in ('render', 'click')
AND element_at(params,'consumer.name') in ('ABC', 'BCC')
AND element_at(params,'source.name') = 'NNC'
and regexp_like(element_at(params,'exp.campaign_id'), '\b12345\b')
and userguid is not NULL
GROUP BY
LOWER(get_json_object(element_at(params,'ans.client_payload'), '$.ACTIONBLOCK_ID')),
lower(element_at(params,'event.type'))
I am trying to execute this query but this is taking too long to execute. looking for suggestions to optimize this query.
SELECT
content_id,
event_type,
count(distinct(userguid)) as num_usr,
count(*) as num_events
FROM (
SELECT
LOWER(get_json_object(element_at(params,'ans.client_payload'), '$.ACTIONBLOCK_ID')) AS content_id,
lower(element_at(params,'event.type')) as event_type,
userguid
FROM my_table
WHERE
event_date >= '2023-04-03'
AND event_code = 'ABC_SERVICE'
AND lower(element_at(params,'event.subcategory')) = 'message'
AND lower(element_at(params,'event.type')) in ('render', 'click')
AND element_at(params,'consumer.name') in ('ABC', 'BCC')
AND element_at(params,'source.name') = 'NNC'
and regexp_like(element_at(params,'exp.campaign_id'), '\b12345\b')
and userguid is not NULL
)
GROUP BY
content_id,
event_type
you may want to create an index on event_date
event_code
element_at(params,'event.subcategory')
element_at(params,'event.type')
element_at(params,'consumer.name')
element_at(params,'source.name')
element_at(params,'exp.campaign_id')
userguid
ps. be mindful of the size of the index(you don't want it to be too big)