We run this below query daily and this query runs for 3 hours or so, owing due to sheer volume of data in the transaction table. Is there any way we can tune this query or reduce the execution time?
CREATE TEMPORARY TABLE t1 AS
SELECT DISTINCT EVENT_DATE FROM (
SELECT DISTINCT EVENT_DATE FROM mstr_wrk.cust_transation
WHERE load_date BETWEEN CAST(CAST('2019-03-05 04:00:31.0' AS TIMESTAMP) AS DATE) AND CURRENT_DATE() AND event_title = 'SETUP'
AND state != 'INACTIVE' AND mode != 'DORMANT') T
I tried to reduce the number of reducers to help speed up, and also tried to enable vectorization but not much luck here. We are running on tez.
mstr_wrk.cust_transation
is partitioned by load_date
, partition pruning will not work because you are using functions. This will cause table full scan. Calculate dates in the shell script and pass as a parametersCheck this script performance before parametrizing your script
CREATE TEMPORARY TABLE t1 AS
SELECT DISTINCT EVENT_DATE FROM mstr_wrk.cust_transation
WHERE load_date >= '2019-03-05' AND load_date <= '2019-03-07'
AND event_title = 'SETUP'
AND state != 'INACTIVE' AND mode != 'DORMANT'