
How can I make this MySQL query faster if I cannot use indexing or partitioning?


I am building a web application around the Zabbix MySQL (MariaDB) database.

I need to display a table of all hostnames with a list of current problems for each host. To do so, I run the following SQL query (a single HTTP GET request triggers 7 such queries, each with a different events.name value, to check for every possible problem):

SELECT distinct(hosts.hostid), max(CONVERT(CONCAT(events.eventid, events.value, events.severity), UNSIGNED))
FROM hosts
INNER JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
INNER JOIN hstgrp ON hosts_groups.groupid = hstgrp.groupid

INNER JOIN items ON hosts.hostid = items.hostid
INNER JOIN functions ON items.itemid = functions.itemid
INNER JOIN events ON functions.triggerid = events.objectid
WHERE events.name = %s
AND hstgrp.groupid = %s
AND hosts.status != 3 # 3 - not templates
GROUP BY hosts.hostid;

The total time for the SQL queries ranges from 20 to 120 seconds. I suspect the issue is related to the size of the events table and the fact that new events are added to it very quickly.

The results of the EXPLAIN command: (image not included)

I suppose I could try indexing the events.name column, but I am afraid it could negatively affect the Zabbix application. Another option is partitioning, but Zabbix has its own partitioning how-to, so I am afraid to do that too.

What other options do I have to make the query faster, and what could explain such a large variation in query time (up to 6-7 times)?

EDIT:

If I restrict the events by time, e.g. to the last 10 days, the queries run faster, but I lose some events, since an error event could have occurred a month ago and never been resolved.

SELECT hosts.hostid, max(CONVERT(CONCAT(events.eventid, events.value, events.severity), UNSIGNED))
FROM hosts
INNER JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
INNER JOIN hstgrp ON hosts_groups.groupid = hstgrp.groupid

INNER JOIN items ON hosts.hostid = items.hostid
INNER JOIN functions ON items.itemid = functions.itemid
INNER JOIN events ON functions.triggerid = events.objectid
WHERE events.eventid >= (select eventid from events  where events.clock >= 1602773508 limit 1) AND events.name = "Устройство недоступно"
AND hstgrp.groupid = 15
AND hosts.status != 3 # 3 - not templates
GROUP BY hosts.hostid;

EDIT:

The results from the problem table contradict the results from the events table: hosts that the query below reports as unreachable respond to ping and are not marked as unreachable in the Zabbix interface. Query:

SELECT distinct(hosts.hostid) FROM hosts
INNER JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
INNER JOIN hstgrp ON hosts_groups.groupid = hstgrp.groupid

INNER JOIN items ON hosts.hostid = items.hostid
INNER JOIN functions ON items.itemid = functions.itemid
INNER JOIN problem ON functions.triggerid = problem.objectid
WHERE problem.name = "Device is unreachable"
AND hstgrp.groupid = 15
AND hosts.status != 3 ;

In addition, I found that a single host can have several problems with the same name but different times (clock), though I expected at most one problem with a given name per host:

SELECT hosts.hostid, problem.name, problem.clock FROM hosts
INNER JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
INNER JOIN hstgrp ON hosts_groups.groupid = hstgrp.groupid

INNER JOIN items ON hosts.hostid = items.hostid
INNER JOIN functions ON items.itemid = functions.itemid
INNER JOIN problem ON functions.triggerid = problem.objectid
WHERE problem.name = "Device is unreachable"
AND hstgrp.groupid = 15
AND hosts.status != 3 ;

Results for one host from the problem table:

10398 Device is unreachable 1603625463
10398 Device is unreachable 1603630863
10398 Device is unreachable 1603661463
10398 Device is unreachable 1603679463
10398 Device is unreachable 1603697463


Solution

  • Answer "built" from the comments.

    While the best course of action is to use the problem.get API, from the documentation you can infer how the problem table works and use it for a SQL query:

    This method is for retrieving unresolved problems. It is also possible, if specified, to additionally retrieve recently resolved problems. The period that determines how old is “recently” is defined in Administration → General.

    Problems that were resolved prior to that period are not kept in the problem table. To retrieve problems that were resolved further back in the past, use the event.get method.

    You should join the problem table and not the events table, which contains every event that has ever happened.
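
As a rough sketch of that advice: since recently resolved problems stay in the problem table for a while, a query that wants only current problems probably needs to filter them out. The example below assumes the problem table has an r_eventid column that stays NULL until the problem is resolved, and that source = 0 and object = 0 select trigger problems; check both against your Zabbix version. The in-memory SQLite tables (reduced to the columns the query touches), the sample rows, and the helper names build_demo_db and unresolved_hosts are made up for illustration. With a MySQL driver the ? placeholders would be %s.

```python
import sqlite3

# Simplified stand-ins for the Zabbix tables (illustration only).
SCHEMA = """
CREATE TABLE hosts        (hostid INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE hosts_groups (hostid INTEGER, groupid INTEGER);
CREATE TABLE items        (itemid INTEGER PRIMARY KEY, hostid INTEGER);
CREATE TABLE functions    (functionid INTEGER PRIMARY KEY, itemid INTEGER, triggerid INTEGER);
CREATE TABLE problem      (eventid INTEGER PRIMARY KEY, source INTEGER, object INTEGER,
                           objectid INTEGER, clock INTEGER, r_eventid INTEGER, name TEXT);
"""

# One row per host that has at least one unresolved problem with the given name.
# GROUP BY collapses duplicates caused by several items/functions per trigger.
UNRESOLVED_SQL = """
SELECT hosts.hostid, MAX(problem.eventid) AS latest_eventid
FROM hosts
INNER JOIN hosts_groups ON hosts.hostid = hosts_groups.hostid
INNER JOIN items ON hosts.hostid = items.hostid
INNER JOIN functions ON items.itemid = functions.itemid
INNER JOIN problem ON functions.triggerid = problem.objectid
WHERE problem.source = 0 AND problem.object = 0   -- assumed: trigger problems
  AND problem.r_eventid IS NULL                   -- assumed: not yet resolved
  AND problem.name = ?
  AND hosts_groups.groupid = ?
  AND hosts.status != 3
GROUP BY hosts.hostid
ORDER BY hosts.hostid
"""


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                     [(10398, 0), (10399, 0), (10400, 3)])  # 10400 is a template
    conn.executemany("INSERT INTO hosts_groups VALUES (?, ?)",
                     [(10398, 15), (10399, 15), (10400, 15)])
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(1, 10398), (2, 10399), (3, 10400)])
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?)",
                     [(1, 1, 500), (2, 2, 501), (3, 3, 502)])
    name = "Device is unreachable"
    conn.executemany(
        "INSERT INTO problem VALUES (?, 0, 0, ?, ?, ?, ?)",
        [
            (1, 500, 1603625463, 2, name),     # host 10398, already resolved
            (3, 500, 1603697463, None, name),  # host 10398, still open
            (4, 501, 1603630863, 5, name),     # host 10399, already resolved
            (6, 502, 1603661463, None, name),  # template, excluded by status
        ],
    )
    return conn


def unresolved_hosts(conn, problem_name, groupid):
    """Return (hostid, latest_eventid) pairs for hosts with an open problem."""
    return conn.execute(UNRESOLVED_SQL, (problem_name, groupid)).fetchall()


if __name__ == "__main__":
    print(unresolved_hosts(build_demo_db(), "Device is unreachable", 15))
```

In this sketch the join to hstgrp is dropped because hosts_groups already carries groupid. Without the r_eventid filter, host 10399 would also be returned even though its problem is resolved, which may be the source of the mismatch with the Zabbix interface described in the question.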