I'm developing an executor in Rust to handle compute-intensive tasks as part of a larger web application. The executor receives jobs from RabbitMQ (using lapin) and processes various CPU-bound operations, which include some that internally utilize rayon. I'm aware that async can complicate rayon's usage. Post-computation or task cancellation, it communicates results or cancellation status back to RabbitMQ.
The operations in question are out of my control, consisting mainly of long-running, parallel graph algorithms. I'm exploring strategies to cancel these tasks reactively through a separate RabbitMQ cancellation queue without rewriting the underlying libraries due to their (to my understanding) non-compatibility with async patterns and the general lack of built-in task cancellation support for safety reasons.
I've prototyped an executor using tokio, only to recognize the potential issues with rayon after the fact. The current cancellation strategy is based on tokio oneshot channels, which inefficiently drop tasks without liberating CPU resources. I am looking for advice on crafting an idiomatic executor that enables effective task cancellation and conforms to Rust's safety guarantees.
Edit: I‘ll try to be more specific: Essentially, I am looking for a strategy to cancel (long-running, blocking, CPU-bound) tasks from another thread, effectively aborting the computation and freeing up the CPU. Gracefully signaling shutdown using a mechanism such as tokio‘s oneshot channel is not an option since I don‘t have control over the implementation of the long running operation.
The only sound way to stop a thread is for it to cooperate with that. There are multiple ways to do this, but they all involve something pervasive:
async code may intentionally yield (return
poll()) to give the executor or containing future a chance to decide not to poll it again.
So, given your constraints of calling into code you do not control, the most you can possibly do is let the thread continue until control returns to code you wrote (at which point you can
panic!() to cleanly terminate the thread).
Some ideas for how to proceed:
rayon::join or equivalent.
You may wonder: why does cancelling a thread corrupt state? Two examples:
The classic example that got Java to deprecate its
Thread.stop() is the consideration of what happens to locks the thread is holding: if they remain locked, then whatever the locks are shared with will be itself blocked forever, creating a contagion of hung threads; if they are unlocked, then invalid half-edited state may be revealed. (In Rust, we have “poisoning” of locks, but this does not address e.g. possibly dangling pointers or other kinds of invalid values; only application-level situations that could arise from a panic unwind.)
Threads may borrow data from other threads' stacks; both Rayon and
std::thread::scope() do this. Cancelling a thread abruptly would lead to use-after-free.
Both of these things could potentially be addressed, by adding suitable checks to them or prohibiting them (e.g. using strictly channels and atomics, no locks), but you'd have to make sure that all of the code the thread executes obeys suitable restrictions, and Rust, being a language without a “VM” or “runtime”, does not impose such restrictions globally.