
How to do reliable integration testing of Unix signal handling in PHP?


I am writing a server system that runs in the background. In very simplified terms it has its own scripting language, which means that a process can be written in that language to run on its own, or it can call another process, etc. I am converting this system from a trivial PHP cron-job in which only one instance is permitted at a time to a set of long-running processes managed by Supervisor.

With that in mind, I am aware that these processes can be killed at any time, either by me during development, or by Supervisor in the normal course of stopping or restarting a worker. I would like to add proper signal handling to ensure that workers tidy up after themselves and, where appropriate, log that a task was left in an interrupted state.

I have worked out how to enable signal handling using ticks and pcntl_signal(), and my handling currently seems to work OK. However, I would like to test this to make sure it is reliable. I have written some early integration tests, but they don't feel all that solid, mainly because during development there were all sorts of weird race-condition issues that were tricky to pin down.
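
For reference, the registration looks roughly like the sketch below. This is a simplified outline rather than my real worker code; the handler body and the main loop are stand-ins.

    <?php
    // Minimal sketch of the worker's signal handling (not the real code).
    declare(ticks=1); // let the ticks mechanism dispatch pending signals

    $shutdown = false;

    $handler = function (int $signo) use (&$shutdown) {
        // Just record that a signal arrived; the main loop does the tidy-up.
        $shutdown = true;
    };

    pcntl_signal(SIGTERM, $handler);
    pcntl_signal(SIGINT, $handler);

    while (!$shutdown) {
        // ... do a unit of work, write logs, update the database ...
        usleep(100000);
    }

    // ... tidy up here: flush buffers, log that the task was interrupted ...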

I'd like some advice or direction on how to send kill signals in PHPUnit tests, with a view to improving confidence that my signal handling is robust. My present strategy (sketched in code after this list):

  • Uses PHPUnit
  • As the core system runs it creates log files of various kinds, which can be used to monitor when to kill off the task
  • The core system is launched as a separate PHP script, pushed to the background with a system() call in the PHPUnit test. The command is similar to php script.php > $logFile 2>&1 & i.e. all output is redirected to a log file and the process is backgrounded, so the test method can monitor it
  • The background script writes its PID to a file, which will be the PID to kill
  • This is picked up reliably by the test by scanning repeatedly for it and usleeping between scans
  • The test then waits for a specific state by scanning the database, usleeping between scans, and issuing a kill <pid> when it is ready
  • It then waits for the signal handler to kick in and write a new database state, usleeping again to avoid hammering the database
  • Finally, after a maximum delay time, it determines whether or not the database is in the correct state, which passes or fails the test.
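
In code, the test method looks roughly like the sketch below. The helper and state-check names (pollUntil(), taskIsRunning(), taskMarkedInterrupted()) are invented for the example and stand in for my real polling and database checks.

    <?php
    use PHPUnit\Framework\TestCase;

    class SignalHandlingTest extends TestCase
    {
        public function testWorkerTidiesUpOnSigterm(): void
        {
            $logFile = '/tmp/worker-test.log';
            $pidFile = '/tmp/worker-test.pid';

            // Launch the worker in the background, redirecting all output to a log.
            system(sprintf('php script.php > %s 2>&1 &', escapeshellarg($logFile)));

            // Wait (with a timeout) for the worker to write its PID file.
            $pid = $this->pollUntil(fn () => @file_get_contents($pidFile) ?: null);
            $this->assertNotNull($pid, 'Worker never wrote its PID file');

            // Wait for the database to show the worker is mid-task, then kill it
            // (equivalent to issuing kill <pid> from the shell).
            $this->assertTrue((bool) $this->pollUntil(fn () => $this->taskIsRunning()));
            posix_kill((int) $pid, SIGTERM);

            // Wait for the signal handler to record the interrupted state.
            $this->assertTrue((bool) $this->pollUntil(fn () => $this->taskMarkedInterrupted()));
        }

        // Poll a callback until it returns a truthy value or the timeout expires.
        private function pollUntil(callable $check, int $timeoutMs = 5000, int $stepMs = 100)
        {
            for ($elapsed = 0; $elapsed < $timeoutMs; $elapsed += $stepMs) {
                $result = $check();
                if ($result) {
                    return $result;
                }
                usleep($stepMs * 1000);
            }
            return null;
        }

        // These would query the database for task state; they are placeholders
        // for whatever checks the real tests perform.
        private function taskIsRunning(): bool { return true; }
        private function taskMarkedInterrupted(): bool { return true; }
    }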

Of course, with all this waiting and checking, it feels a bit ropey, and quite ripe for race conditions of all sorts. My current feeling is that the tests will fail around 2% of the time, but I've not been able to get them to fail for a day or so. I plan to do some soak testing, and if I get any failures from that I'll post them here.

I wonder if I can simplify it by asking the system under test to kill itself, which will remove two levels of wait-checking (one to wait for the PID, and another to wait for the database to enter the correct state before the kill command). That would still leave the wait-check loop after the kill is issued, but I may yet find that having that one check is not a problem in practice.
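
To be clear, by "kill itself" I just mean the worker sending the signal to its own PID when it reaches a known point. Something like the sketch below would do it; the configuration object and option name are made up for illustration.

    // At a point controlled by test configuration, the worker signals itself
    // rather than waiting for the test to time an external kill command.
    // ($config and the option name are hypothetical.)
    if ($config->get('kill_self_when_ready', false)) {
        posix_kill(posix_getpid(), SIGTERM);
    }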

That said, I am conscious that my whole approach may be ham-fisted, and that there is a better way to do this sort of thing. Any ideas? At present my thinking is just to increase my wait timeouts, in case PHPUnit is introducing any strange delays. I'll also see if I can get a failure case so I can examine the logs.


Update: sadly it won't simplify things much. I just tried this on a simple signal integration test I regard as reliable, and since the backgrounded system() returns immediately, the test still has to loop-wait to identify the right log record, and then for the right post-kill result. However, it no longer has to wait for a PID to be written to a temp file, so that is at least one loop eliminated.


Solution

  • As I mentioned in the question, the first reliability change I tried was to inject the ability for worker tasks to run kill on themselves. In my case this was built into the system, but readers may find that writing a child test class and changing their DI config is a convenient way to do it (a rough sketch of that approach appears after the list below).

    This seems to have improved reliability a good deal. Originally, there were several wait loops in the tests, and the test would have to run the kill at the right moment:

    1. Wait for the PID of the child to become available
    2. Wait for the child log files to indicate it is ready to kill
    3. Issue the kill
    4. Wait for the child log files to indicate the signal handler ran correctly

    The issue may have been in (2) - if that wait is too short then the kill may sometimes arrive too late, and even if a reliable maximum wait time is found, unexpected CPU load may still make it prone to failure.
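
    For readers who prefer the child-class route, a rough illustration follows. The class and hook names are invented here; the point is just that the worker signals itself at a well-defined moment.

        // Hypothetical test double: a subclass of the real worker class that
        // sends SIGTERM to its own PID at a known point in its run loop.
        class SelfKillingWorker extends Worker
        {
            protected function afterItemProcessed(): void
            {
                parent::afterItemProcessed();
                // Trigger the signal handler deterministically, rather than
                // having the test guess when to issue an external kill.
                posix_kill(posix_getpid(), SIGTERM);
            }
        }

        // The DI config would then be pointed at SelfKillingWorker when the
        // background script is launched from the integration tests.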

    I have now written a quick script to repeatedly run the PHPUnit tests, either for 200 iterations or to the first failure, whichever comes first. This currently passes 200 iterations, so for the time being I'll regard the test reliability as having gone up. However, I will update here if this changes - perhaps running the tests under a high nice value will trigger a failure.
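
    The repeat-runner itself is nothing special - roughly the sketch below, assuming a Composer-installed PHPUnit at vendor/bin/phpunit and an integration test path that will differ per project.

        <?php
        // Run the signal-handling tests repeatedly: stop at the first failure
        // or after 200 clean runs, whichever comes first.
        for ($i = 1; $i <= 200; $i++) {
            passthru('vendor/bin/phpunit tests/integration', $exitCode);
            if ($exitCode !== 0) {
                echo "Failed on iteration $i\n";
                exit(1);
            }
        }
        echo "All 200 iterations passed\n";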

    Other answers are still most welcome.