I want to optimize the following DB SELECT query:
SELECT
ls.caller_id,
ls.caller_path_id,
ls.caller_path_name,
ls.caller_data->>'worker' AS worker,
ls.caller_stage,
ls.current_status,
ls.date_inserted AS date_closed,
ls.caller_data->>'name' AS firstname,
ls.caller_data->>'surname' AS surname,
ls.caller_data->>'cell' AS cell,
ls.caller_data->>'home' AS home,
ls.caller_data->>'work' AS work,
ls.caller_data->>'other' AS other
FROM mv.caller_stage ls
WHERE ls.caller_stage = 'ACTIVE'
AND (REPLACE(ls.caller_data->>'caller_expiry_date', '/', '-' ))::DATE > NOW()::DATE
AND ls.active
AND caller_data @> jsonb_build_object('worker', 'Alice')
AND ls.date_inserted >= (NOW()- CONCAT('3 days')::INTERVAL)
AND ((ls.caller_path_name LIKE 'Mode %' AND ls.current_status IN (
'Received',
'Wrong Number',
'Does Not Know ID',
'No Such Person','Unreachable',
'Under 21',
'No Permission',
'Do Not Contact',
'Not Interested',
'Does Not Understand',
'Query',
'Engaged',
'No Address',
'Wrong ID'))
OR (ls.caller_path_name LIKE 'Method %' AND ls.current_status IN (
'Successfully Captured'
)))
AND ( _search_text IS NULL
OR(
_search_text IS NOT NULL AND(
ls.caller_data->>'surname' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'name' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'cell' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'home' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'work' ILIKE CONCAT('%',_search_text,'%')
OR ls.caller_data->>'other' ILIKE CONCAT('%',_search_text,'%')
)
))
Runtime is still too high so I was asked to further optimize it.
I tried adding relevant indexes, and I was able to get a 3x improvement.
Would like to know if I can further improve the SELECT
query performance via modifying any of the clauses, such as the IN
or ILIKE
clauses?
You have sacrificed performance for a simpler data model by using JSON in your table. That is okay. You must live with the fact that accessing data from a table where you want to filter by JSON content is rather slow.
But there are some columns that you use in your WHERE
clause that are not JSON. Let's look at these and decide which is the most restrictive condition.
Does ls.caller_stage = 'ACTIVE'
limit the number of rows immensely? Does ls.active
? Does caller_path_name LIKE 'Mode %'
? Does ls.date_inserted >= (NOW()- CONCAT('3 days')::INTERVAL)
? The latter is a strange expression by the way. You concat the string '3 days'? With what? You make this an interval? Why not INTERVAL '3 days'
right away? You subtract this from NOW
, thus getting a timestamp that you compare with a date. (At least the name date_inserted
suggests a date.) Is this intended?
I don't consider current_status IN (...)
by the way, because IN
is like many OR
and indexes usually don't help much with that.
Let's suppose that the condition on the date is the most restrictive, so I put this first in the index. Then let's just say caller_stage
is the second restrictive, caller_path_name the third and active
the least one. So we build this index:
create index idx on mv.caller_stage (date_inserted, caller_stage, caller_path_name, active);
This index is likely to help the DBMS find the data quickly. However, it may decide not to use it. That would be the case for instance, if there were only four days of data in the table, of which you want three.
Disclaimer: I am not a PostgreSQL developer. Postgres offers a lot that many other DBMS don't. It is probable that they offer ways to index JSON data, and this may or may not help here.