A performance test was run for one hour on our system. The run produced so many logs that it took Filebeat (version 7.2.1) three more hours to process all of them. What is confusing is that the CPU utilization reports produced by Zabbix and by Kibana Stack Monitoring disagree.
The Zabbix report shows that CPU utilization was about 20% for the three hours after the test finished (i.e. from 13:00 on).
Kibana Stack Monitoring, on the other hand, shows CPU utilization of about 80% for the same period.
The tooltip on Kibana says:
Percentage of CPU time spent executing (user+kernel mode) for the Beat process.
So the Kibana figure clearly covers only the Filebeat process. That does not fit with the 20% reported by Zabbix.
Note that in filebeat.yml the value for max_procs is not set, which means that by default Filebeat uses all the logical CPUs in the system (see here). We have four cores in the system, and four logical CPUs in total. Output from lscpu:
CPU(s): 4
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
We use systemctl to run Filebeat.
Could anyone explain this behavior? Could it be that Filebeat is somehow using only one core, or that systemctl is limiting it to one core?
Just a hypothesis:
Zabbix system.cpu.* items are normalized, so your 20% usage should match both scenarios:
From what I found online, it seems to me (but correct me if I'm wrong, I'm not an ELK expert!) that the Kibana/Metricbeat CPU usage is not normalized by default, so the upper limit with 4 CPUs is 400%.
There's a discussion here and the corresponding workaround with normalized percentages
That would explain why Zabbix's normalized 20% is equal to Kibana/Metricbeat's non-normalized 80%: 80% of one CPU spread over 4 logical CPUs is 80 / 4 = 20% of the whole machine.
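To make the arithmetic concrete, here is a minimal sketch of the conversion between the two scales. The function names are my own (hypothetical), and it assumes the Kibana figure really is on a 0 to 100 × N scale, which is the hypothesis above rather than something I have confirmed.

```python
def normalize_cpu_pct(process_pct: float, logical_cpus: int) -> float:
    """Convert a per-process percentage on a 0..100*N scale
    (100% = one fully busy logical CPU) to a 0..100 whole-machine scale."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return process_pct / logical_cpus


def denormalize_cpu_pct(system_pct: float, logical_cpus: int) -> float:
    """Inverse conversion: whole-machine percentage back to the 0..100*N scale."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return system_pct * logical_cpus


# Values from the question: 80% in Kibana, 4 logical CPUs from lscpu.
print(normalize_cpu_pct(80.0, 4))    # whole-machine share, comparable to Zabbix
print(denormalize_cpu_pct(20.0, 4))  # Zabbix figure expressed on the Kibana scale
```

Note that this only holds if Filebeat was the only significant CPU consumer on the host during those three hours, since the Zabbix figure covers all processes.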
But is Filebeat using one CPU at 80% or four CPUs at 20% each? I can't tell from these numbers, since both cases give the same totals, but with max_procs unset it should be able to use all 4.
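If you want to check this directly, one option on Linux is to look at the per-thread CPU counters under /proc. The sketch below is only an illustration, not something I ran against Filebeat: the helper names are made up, and it relies on the stat layout documented in proc(5) (utime and stime are fields 14 and 15, the last CPU the thread ran on is field 39).

```python
from pathlib import Path


def parse_thread_stat(line: str) -> tuple[int, int]:
    """Return (cpu_ticks, last_cpu) from one /proc/<pid>/task/<tid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split after the last ')'. fields[0] is then field 3 (state).
    fields = line[line.rindex(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    last_cpu = int(fields[36])                        # field 39
    return utime + stime, last_cpu


def thread_cpu_ticks(pid: int) -> dict[int, tuple[int, int]]:
    """Map thread id -> (cpu_ticks, last_cpu) for every thread of a process."""
    result = {}
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        result[int(stat.parent.name)] = parse_thread_stat(stat.read_text())
    return result
```

Calling thread_cpu_ticks with the Filebeat PID twice, a few seconds apart, and comparing the tick counts shows whether one thread accumulates nearly all the CPU time or several threads share it. The last_cpu value is only a snapshot, because the kernel is free to move threads between cores.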