Why does a SystemTap script report a "read fault ... near operator" error?

I'm running SystemTap on CentOS Linux release 7.6.1810. The version of SystemTap is:

$ stap -V
Systemtap translator/driver (version 4.0/0.172/0.176, rpm 4.0-11.el7)
Copyright (C) 2005-2018 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.18 ... 4.19-rc7
enabled features: AVAHI BOOST_STRING_REF DYNINST BPF JAVA PYTHON2 LIBRPM LIBSQLITE3 LIBVIRT LIBXML2 NLS NSS READLINE


$ uname -rm
3.10.0-957.21.3.el7.x86_64 x86_64

$ rpm -qa | grep kernel-devel
kernel-devel-3.10.0-957.21.3.el7.x86_64

$ rpm -qa | grep kernel-debuginfo
kernel-debuginfo-3.10.0-957.21.3.el7.x86_64
kernel-debuginfo-common-x86_64-3.10.0-957.21.3.el7.x86_64

I have a SystemTap script named sg.stp, which I use to find out why k8s pods of a RabbitMQ cluster occasionally terminate with exit code 137:

global target_pid = 32719
probe signal.send{
  if (sig_pid == target_pid) {
    printf("%s(%d) send %s to %s(%d)\n", execname(), pid(), sig_name, pid_name, sig_pid);
    printf("parent of sender: %s(%d)\n", pexecname(), ppid())
    printf("task_ancestry:%s\n", task_ancestry(pid2task(pid()), 1));
  }
}

When I run the script, it reported an error after a while:

$  stap sg.stp
ERROR: read fault [man error::fault] at 0x4a8 near operator '@cast' at /usr/share/systemtap/tapset/linux/task.stpm:2:5
epmd(29073) send SIGCHLD to rabbitmq-server(32719)
parent of sender: rabbitmq-server(32719)
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

Solution

pid2task() can return NULL

Check whether pid2task(pid()) or task_current() returns NULL, like this:

task = pid2task(pid());
if (task) {
  printf("task_ancestry:%s\n", task_ancestry(task, 1));
} else {
  printf("task_ancestry not available\n");
}

Note that I am not completely sure about the following explanation:

The task_struct may no longer be available, even when you are in the context of the running pid(), because the process has already died and its task_struct has been cleaned up.

In that case pid2task() returns NULL. AFAICS this can happen to pid() in the following two situations (and perhaps more):

Your probe is asynchronous to the running process. With signal probes, this seems to be the case here.
The .return probe executes too late, perhaps because the call was stuck in the kernel for too long (as with blocking calls).

For the latter there seems to be an easy workaround:

Instead of task_ancestry(task_current(), 1) use @entry(task_ancestry(task_current(), 1)). This way the data is gathered at the entry point of the call, where it is very likely that the process is still perfectly alive.
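A minimal sketch of the @entry idea, assuming a DWARF-based .return probe. The probe point kernel.function("vfs_read") is only a placeholder chosen for illustration and is not taken from the question. I have not verified this on the kernel in question.

global target_pid = 32719

# Hypothetical probe point: any kernel.function(...).return probe would do.
probe kernel.function("vfs_read").return {
  if (pid() == target_pid) {
    # @entry() evaluates its expression when the function is entered and
    # hands the saved value to the .return handler.
    printf("%s(%d) vfs_read returned, ancestry at entry: %s\n",
           execname(), pid(),
           @entry(task_ancestry(task_current(), 1)))
  }
}

Note that the @entry() expression is evaluated on every entry to the probed function, not only for the target pid, so this adds some overhead.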

However, in your signal case I do not see such a simple workaround, hence you must check for NULL.

Note that I am not completely sure that this is your problem, nor that checking for NULL without some kind of page locking is a perfect solution. Even if you get a pointer to some structure, the pages that contain the structure might go away in the middle of the probe, thanks to SMP. Perhaps stap somehow protects against this, but I doubt it. Race conditions like this are really hard to debug and avoid.
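As a hedged sketch of how to keep the script running even if the NULL check is not enough: SystemTap has a try/catch statement that catches run-time errors raised inside the try block, which should include read faults. The version below is the script from the question with both the NULL check and a try/catch added. The wording of the fallback messages is my own.

global target_pid = 32719

probe signal.send {
  if (sig_pid == target_pid) {
    printf("%s(%d) send %s to %s(%d)\n", execname(), pid(), sig_name, pid_name, sig_pid)
    printf("parent of sender: %s(%d)\n", pexecname(), ppid())

    task = pid2task(pid())
    if (task) {
      try {
        # May still fault if the task_struct goes away in the meantime.
        printf("task_ancestry:%s\n", task_ancestry(task, 1))
      } catch (msg) {
        # msg holds the error text; the probe handler continues afterwards.
        printf("task_ancestry failed: %s\n", msg)
      }
    } else {
      printf("task_ancestry not available\n")
    }
  }
}

With this, a fault should be reported as a normal output line instead of ending the session with "Pass 5: run failed".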