I have an executor service that is supposed to write some things to disk every minute.
It is scheduled like this:
scheduledCacheDump = new ScheduledThreadPoolExecutor(1);
scheduledCacheDump.scheduleAtFixedRate(this::saveCachedRecords,
60,
60,
TimeUnit.SECONDS
);
The task uses a shared list that is filled by the main thread, so it is synchronized on that list:
private void saveCachedRecords() {
LOG.info(String.format("Scheduled record dump to disk. We have %d records to save.", recordCache.size()));
synchronized (recordCache) {
Iterator<Record> iterator = recordCache.iterator();
while (iterator.hasNext()) {
// save record to disk
iterator.remove();
}
}
}
My list is declared as such:
private final List<Record> recordCache = new ArrayList<>();
The main thread receives data by batch, so every second or so, it receives 30 records that it caches in the list. The rest of the time it is waiting on a socket.
What I don't understand, is that from the logs, my scheduled task is often triggered with way more that 1 minute in between:
sept. 16 09:30:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:31:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:32:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:33:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:34:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:35:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:42:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:43:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:44:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:45:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:46:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:55:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:56:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:57:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:58:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 09:59:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 10:04:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 10:05:43 Scheduled record dump to disk. We have 27 records to save.
sept. 16 10:06:43 Scheduled record dump to disk. We have 27 records to save.
Look at this:
=> 5 minutes
Or even:
=> 9 minutes
My log is in the synchronized()
scope, so I don't know if the task is actually scheduled on time and waits 10 minutes on the lock, or if it is just a real scheduling issue. I will be moving it out of it, but generally I can't fathom how a thread could stay blocked for 10 minutes on a lock that is released about every second.
How can I investigate this?
For information: the machine on which it runs is a KVM machine, could this be a factor?
Oh my..
This was not Java's fault at all. This was systemd's fault.
My process runs as a systemd service, so the logs I pull come from systemd-journald. And guess what, there's a rate limiting in systemd. And my daemon would hit it when the scheduler was triggered, so I got PLENTY of lines like this:
Suppressed 570 messages from /system.slice/xxx.service
Suppressed 769 messages from /system.slice/xxx.service
Suppressed 745 messages from /system.slice/xxx.service
Suppressed 729 messages from /system.slice/xxx.service
Suppressed 717 messages from /system.slice/xxx.service
Suppressed 95 messages from /system.slice/xxx.service
Suppressed 543 messages from /system.slice/xxx.service
So.. yeah, I removed the throttling on journald and I have my traces every minute now.
The solution was:
/etc/systemd/journald.conf
RateLimitInterval=0
at the endsystemctl restart systemd-journald
systemctl restart myservice
Now I have all things going smoothly.
I'll update the title for future reference :)