I have performance issue on a query, potentiall wrongly written.
Dear all. I have two tables. One table "test_cdr" , with rougly 5M rows, contains three columns
Another table "codes" contains two columns
Now I need to puplate the test_cdr.dest by:
like this, I can then sum the duration of each call per destination type.
I am, atm, using this query
UPDATE test_cdr AS main
JOIN (
SELECT c.POINT_TARGET, t.dest
FROM test_cdr c
JOIN codes t ON c.POINT_TARGET LIKE CONCAT(t.area_codes, '%')
WHERE c.dest IS NULL -- Only update rows where dest is NULL
LIMIT 1000 -- Adjust the batch size as needed
) AS matched ON main.POINT_TARGET = matched.POINT_TARGET
SET main.dest = matched.dest;
It only works with LIMIT and it returns roughly 10000 rows (every 20 seconds - pretty lame performance) As soon as I remove LIMIT, the query runs but nothing is performed on the db (I see no disk activity, while with LIMIT there is disk activity).
Could you help me figure out the issue or maybe provide a better option? Btw, I have used INDEXES.
Thank you.
I'm not sure if this query improves the performance.
Furthermore, it updates all CDRs having destination NULL and maybe that's not what you want.
It does however take the longest area code in case of multiple matches.
For instance, if the dialed number is +3906123456 it considers area code +3906 (Rome) instead of +390 (Italy fixed).
Anyway, give it a try and please be gentle in case it's not useful.
UPDATE test_cdr AS cdrs
SET cdrs.dest = (SELECT cod1.dest
FROM codes AS cod1
WHERE cod1.area_codes = LEFT(cdrs.point_target, LENGTH(cod1.area_codes))
AND NOT EXISTS (SELECT *
FROM codes AS cod2
WHERE cod2.area_codes = LEFT(cdrs.point_target, LENGTH(cod2.area_codes))
AND LENGTH(cod2.area_codes) > LENGTH(cod1.area_codes))
LIMIT 1)
WHERE cdrs.dest IS NULL;
I'm quite convinced that
t.area_codes = LEFT(c.point_target, LENGTH(t.area_codes))
is faster than
c.point_target LIKE CONCAT(t.area_codes, '%')
UPDATE:
I suppose there's a unique key on codes.area_codes. In that case you can remove LIMIT 1
from the subquery since that query will never return more than 1 value.
Furthermore, if it doesn't find the corresponding area code, the destination will be updated to NULL which causes the same dialed number to be re-examined the next time.