I'm trying to use the third argument of an SA_SIGINFO sigaction handler
to jump to the interrupted context directly.
I thought this:
void action(int Sig, siginfo_t *Info, void *Uctx) {
    ucontext_t *uc = Uctx;
    setcontext(uc);
}
would have the same effect as just:
void action(int Sig, siginfo_t *Info, void *Uctx) {
    return;
}
but curiously it accepts three signals (each of which invokes the setcontext-calling handler), and then it segfaults in setcontext:
Dump of assembler code for function setcontext:
0x00007ffff7a34180 <+0>: push %rdi
0x00007ffff7a34181 <+1>: lea 0x128(%rdi),%rsi
0x00007ffff7a34188 <+8>: xor %edx,%edx
0x00007ffff7a3418a <+10>: mov $0x2,%edi
0x00007ffff7a3418f <+15>: mov $0x8,%r10d
0x00007ffff7a34195 <+21>: mov $0xe,%eax
0x00007ffff7a3419a <+26>: syscall
0x00007ffff7a3419c <+28>: pop %rdi
0x00007ffff7a3419d <+29>: cmp $0xfffffffffffff001,%rax
0x00007ffff7a341a3 <+35>: jae 0x7ffff7a34200 <setcontext+128>
0x00007ffff7a341a5 <+37>: mov 0xe0(%rdi),%rcx
0x00007ffff7a341ac <+44>: fldenv (%rcx)
=> 0x00007ffff7a341ae <+46>: ldmxcsr 0x1c0(%rdi)
0x00007ffff7a341b5 <+53>: mov 0xa0(%rdi),%rsp
0x00007ffff7a341bc <+60>: mov 0x80(%rdi),%rbx
and the fault address shown by strace is 0 (a catchable SIGSEGV).
Here's an example program that's using a timer to send the three signals:
#include <unistd.h>
#include <sys/time.h>
#include <ucontext.h>
#include <signal.h>
void action(int Sig, siginfo_t *Info, void *Uctx) {
    ucontext_t *uc = Uctx;
    setcontext(uc);
}
int main(void) {
    char ch[100];
    sigaction(SIGALRM, &(struct sigaction){.sa_sigaction = action, .sa_flags = SA_SIGINFO}, 0);
    setitimer(ITIMER_REAL, &(struct itimerval){.it_interval.tv_sec = 1, .it_value.tv_sec = 1}, 0);
    write(1, "enter\n", 6);
    for (;;) {
        write(1, "{\n", 2);
        read(0, &ch[0], sizeof(ch));
        write(1, "}\n", 2);
    }
}
What is going on in this situation?
I think this is something that's simply not meant to work: you are only supposed to call setcontext with a context obtained from getcontext or makecontext, and not with the context passed to a signal handler.
The man page hints at this obliquely:
If the context was obtained by a call to a signal handler, then old standard text says that "program execution continues with the program instruction following the instruction interrupted by the signal". However, this sentence was removed in SUSv2, and the present verdict is "the result is unspecified".
Also, the glibc source of setcontext has a comment:
This implementation is intended to be used for synchronous context switches only. Therefore, it does not have to restore anything other than the PRESERVED state.
Indeed, it does not attempt to restore any of the floating-point registers, and it zeroes rax (so that the resumed getcontext call appears to return 0). That would be pretty bad for trying to resume code that isn't expecting its registers to change spontaneously.
Asynchronous context switching would be needed for something like preemptive multitasking in userspace. I think the idea is that since pthreads is now firmly established, people should have no need for this, so it's not supported. getcontext/setcontext date from an earlier era, and in fact have since been removed from the POSIX spec on the premise that pthreads should be used instead.
This particular crash seems to be caused by a mismatch between the kernel's layout of struct ucontext_t and what libc expects. In particular, libc expects the floating-point state, including the saved value of mxcsr, at a particular offset within struct ucontext_t. However, the kernel pushes the floating-point state at a separate location on the stack (which happens to overlap where libc expects it), and includes a pointer to it inside struct ucontext_t. So libc's setcontext attempts to load some garbage value into mxcsr, which has some of the reserved bits 16-31 set, and this causes a general protection fault.
However, as noted above, this mismatch is the least of the problems.