If a running process's executable is deleted, I've noticed fork
fails where the child process is never executed.
For example, consider the code below:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
sleep(5);
pid_t forkResult;
forkResult = fork();
printf("after fork %d \n", forkResult);
return 0;
}
If I compile this and delete the resulting executable before fork
is called, I never see fork
return a pid of 0, meaning the child process never starts. I only have a Mac running Big Sur, so not sure if this repros on other OS's.
Does anyone know why this would be? My understanding is an executable should work just fine even if it's deleted while still running.
The expectation that the process should continue even if the binary was deleted is correct, however not fully correct in case of macOS
. The example is tripping on a side-effect of the System Integrity Protection
(SIP
) mechanism inside the macOS kernel, however before explaining what is exactly going on, we need to make several experiments which will help us to better understand the whole scenario.
To demonstrate what is going on, I had modified the example to count to 9, than do the fork, after the fork, the child will print a message "I am done", wait 1 second and exit by printing the 0
as the PID. The parent will continue to count to 14 and print the child PID. The code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(void) {
for(int i=0; i <10; i++)
{
sleep(1);
printf("%i ", i);
}
pid_t forkResult;
forkResult = fork();
if (forkResult != 0) {
for(int i=10; i < 15; i++) {
sleep(1);
printf("%i ", i);
}
} else {
sleep(1);
printf("I am done ");
}
printf("after fork %d \n", forkResult);
return 0;
}
After compiling it, I have started the normal scenario:
╰> ./a.out
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 4385
So, the normal scenario works as expected. The fact that we see the count from 0 to 9 two times, is due to the copy of the buffers for stdout
that was done in the fork call.
Now is time to do the negative scenario, we will wait for 5 seconds after the start and remove the binary.
╰> ./a.out & (sleep 5 && rm a.out)
[4] 8555
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 8677
[4] 8555 done ./a.out
We see that the output is only from the parent. Since the parent had counted to 14, and shows valid PID for the child, however the child is missing, it never printed anything. So, the child creation failed after the fork()
was performed, otherwise fork()
would have received and error instead of a valid PID
. Traces from ktrace
reveal that the child was created under the pid and was waken up:
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.3 MACH_DISPATCH 1bc 0 84 4 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.2 TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 41 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.0(0.0) TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623783 +04 0.0 TMR_TimerCallEnter 9931ba49ead1bd17 0 330e7e4e9a59 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04 0.0 imp_thread_qos_and_relprio 88775d 20000 20200 6 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623854 +04 0.0 imp_update_thread 88775d 811200 140000100 1f 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.1(0.8) imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0(1.1) imp_thread_qos_and_relprio 88775d 30000 20200 40 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0 imp_thread_qos_workq_override 88775d 30000 20200 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0 imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.1(0.1) imp_update_thread 88775d c15200 140000100 25 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623855 +04 0.0(0.2) imp_thread_qos_workq_override 88775d 30000 20200 40 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623857 +04 1.3 TURNSTILE_turnstile_added_to_thread_heap 88775d 9931ba6049ddcc77 0 0 888065 2 a.out(8677)
test5-ko.txt:2021-04-07 13:34:26.623858 +04 1.0 MACH_MKRUNNABLE 88775d 25 0 5 888065 2 a.out(8677)
t
So the child's process was dispatched with MACH_DISPATCH
and made runnable with MACH_MKRUNNABLE
. This is the reason the parent got valid PID
after the fork()
.
Further more the ktrace
for the normal scenario shows that the process had issued BSC_exit
and and imp_task_terminated
system call occurred, which is the normal way for a process to exit. However, in the second scenario where we had deleted the file, the trace doesn't show BSC_exit
. This means that the child was terminated by the kernel, not by a normal termination. And we know that the termination happend after the child was created properly, since the parent had received the valid PID and the PID was made runnable.
This bring us closer to the understanding of what is going on here. But, before we have the conclusion, let's show another even more "twisted" example.
What if we replace the binary on the filesystem after we started the process?
Here is the test to answer this question: we will start the process, remove the binary and create an empty file with the same name on his place with touch
.
╰> ./a.out & (sleep 5 && rm a.out; touch a.out)
[1] 6264
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 6851
[1] + 6722 done ./a.out
Wait a minute, this works!? What is going on here!?!?
This strange example gives us important clue that will help us to explain what is going on.
The reason why the third example works, while the second one is failing, reveals a lot of what is going on here. As mentioned on the beginning, we are tripping on a side-effect of SIP
, more precisely on the runtime protection
mechanism.
To protect the system integrity, SIP
will examine the running processes for the system protection
and special entitlement
. From the apple documentation: ...When a process is started, the kernel checks to see whether the main executable is protected on disk or is signed with an special system entitlement. If either is true, then a flag is set to denote that it is protected against modification. Any attempt to attach to a protected process is denied by the kernel...
When we had removed the binary from the filesystem, the protection mechanism was not able to identify the type of process for the child nor the special system entitlements since the binary file was missing from the disk. This triggered the protection mechanism to treat this process as an intruder in the system and terminate it, hanse we had not seen the BSC_exit
for the child process.
In the third example, when we created dummy entry on the file system with touch
, the SIP
was able to detect that this is not a special process nor it has special entitlements and allowed the process to continue. This is a very solid indication that we ware tripping on the SIP
realtime protection mechanism.
To prove that this is the case, I have disabled the SIP
which requires a restart in the recovery mode and executed the test
╰> csrutil status
System Integrity Protection status: disabled.
╰> ./a.out & (sleep 5 && rm a.out)
[1] 1504
0 1 2 3 4 5 6 7 8 9 I am done after fork 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 after fork 1626
So, the whole issue was caused by the System Integrity Protection
. More details can be fond in the documentation
All the SIP
needed was to have a file on the filesystem with the process name, so the mechanism can run the verification and decide to allow the child to continue the execution. This is showing us that we are observing a side-effect, rather than designed behavior, since the empty file was not even a valid dwarf
, yet the execution had proceed.