Here is my situation: I am writing a UNIX pipe (|) for an Operating Systems course. I finished my implementation; however, I noticed that I was passing the test cases non-deterministically. The test case in question is of the form:
def test_no_orphans(self):
    self.assertTrue(self.make, msg='make failed')
    subprocess.call(('strace', '-o', 'trace.log','./pipe','ls','wc','cat','cat'))
    ps = subprocess.Popen(['grep','-o','clone(','trace.log'], stdout=subprocess.PIPE)
    out1 = subprocess.check_output(('wc','-l'), stdin=ps.stdout)
    ps.wait()
    ps.stdout.close()
    ps = subprocess.Popen(['grep','-o','wait','trace.log'], stdout=subprocess.PIPE)
    out2 = subprocess.check_output(('wc','-l'), stdin=ps.stdout)
    ps.wait()
    ps.stdout.close()
    out1 = int(out1.decode("utf-8")[0])
    out2 = int(out2.decode("utf-8")[0])
    if out1 == out2 or out1 < out2:
        orphan_check = True
    else:
        orphan_check = False
    self.assertTrue(orphan_check, msg="Found orphan processes")
    subprocess.call(['rm', 'trace.log'])
    self.assertTrue(self._make_clean, msg='make clean failed')
After inspecting the relevant log, I find that:
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = 1330
close(4) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1330, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
pipe([4, 5]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = 1331
close(5) = 0
close(3) = 0
pipe([3, 5]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = ? ERESTARTNOINTR (To be restarted)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1331, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = 1332
close(5) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1332, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
close(4) = 0
pipe([4, 5]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = 1333
Specifically, the line with ERESTARTNOINTR (To be restarted) is causing me to fail. After my professor helped me debug and said it is possibly a bug in the test cases, I reported this to the TA in charge of the assignment.
I was wondering why exactly clone() could fail, as it seems like fork() handles this in a nice C wrapper for us. I was also wondering what specifically is happening here, and why system calls in general seem to have an incomplete and a resumed state. My OS professor said he doesn't know too much about it and that it could be "race conditions in the kernel".
The man page for clone(2) says that ERESTARTNOINTR is returned when:
System call was interrupted by a signal and will be restarted. (This can be seen only during a trace.)
The failing clone syscall in the provided log was interrupted by the SIGCHLD signal for pid 1331.
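Since a restarted clone never created a child, a check that counts every "clone(" in the trace will overcount. A minimal sketch of one way to count only the calls that actually returned a child pid follows. The function name count_created_children and the SAMPLE_TRACE constant are made up for illustration (the sample lines are copied from the log above), and the sketch assumes strace was run without -f, so each line starts with the syscall name:

```python
import re

# Matches only clone() lines whose return value is a plain pid, e.g. "... = 1331".
# A restarted call ("= ? ERESTARTNOINTR (To be restarted)") or a failed call
# ("= -1 EAGAIN ...") does not match, because its return value is not a bare
# non-negative integer at the end of the line.
CLONE_OK = re.compile(r"^clone\(.*?\)\s+=\s+(\d+)\s*$")

# Illustrative sample: four lines taken verbatim from the trace in the question.
SAMPLE_TRACE = """\
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = 1331
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = ? ERESTARTNOINTR (To be restarted)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1331, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f04174ad850) = 1332
"""


def count_created_children(trace_lines):
    """Count clone() calls in strace output that returned a child pid."""
    return sum(1 for line in trace_lines if CLONE_OK.match(line))


if __name__ == "__main__":
    lines = SAMPLE_TRACE.splitlines()
    naive = sum("clone(" in line for line in lines)  # what grep -o 'clone(' sees
    print("naive count:", naive)                      # naive count: 3
    print("children created:", count_created_children(lines))  # children created: 2
```

On the sample, the naive count is 3 while only 2 children were created, which matches the off-by-one that makes the orphan check fail intermittently.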