Search code examples
cwindowsassemblyx86-64delimited-continuations

Practical Delimited Continuations in C / x64 ASM


I've look at a paper called A Primer on Scheduling Fork-Join Parallelism with Work Stealing. I want to implement continuation stealing, where the rest of the code after calling spawn is eligible to be stolen. Here's the code from the paper.

1 e();
2 spawn f(); 
3 g();
4 sync;
5 h();

An import design choice is which branch to offer to thief threads. Using Figure 1, the choices are:

Child Stealing:

  • f() is made available to thief threads.
  • The thread that executed e() executes g().

Continuation Stealing:

  • Also called “parent stealing”.
  • The thread that executed e() executes f().
  • The continuation (which will next call g()) becomes available to thief threads.

I hear that saving a continuation requires saving both sets of registers (volatile/non-volatile/FPU). In the fiber implementation I did, I ended up implementing child stealing. I read about the (theoretical) negatives of child stealing (unbounded number of runnable tasks, see the paper for more info), so I want to use continuations instead.

I'm thinking of two functions, shift and reset, where reset delimits the current continuation, and shift reifies the current continuation. Is what I'm asking even plausible in a C environment?

EDIT: I'm thinking of making reset save return address / NV GPRs for the current function call (= line 3), and making shift transfer control to the next continuation after returning a value to the caller of reset.


Solution

  • I've implemented work stealing for a HLL called PARLANSE rather than C on an x86. PARLANSE is used daily to build production symbolic parallel programs at the million line scale.

    In general, you have preserve the registers of both the continuation or the "child". Consider that your compiler may see a computation in f() and see the same computation in g(), and might lift that computation to the point just before the spawn, and place that computation result in a register that both f() and g() use as in implied parameter. Yes, this assumes a sophisticated compiler, but if you are using a stupid compiler that doesn't optimize, why are you trying to go parallel for speed?

    In specific, however, your compiler could arrange for the registers to be empty before the call to spawn if it understood what spawn means. Then neither the continuation or the child has to preserve registers. (The PARLANSE compiler in fact does this).

    So how much has to be saved depends on how much your compiler is willing to help, and that depends on whether it knows what spawn really does.

    Your local friendly C compiler likely doesn't know about your implementation of spawn. So either you do something to force a register flush (don't ask me, its your compiler) or you put up with the fact that you personally don't know what's in the registers, and your implementation preserves them all to be safe.

    If the amount of work spawned is significant, arguably it wouldn't matter if you saved all the registers. However, the x86 (and other modern architectures) seems have an enormous amount of state, mostly in the vector registers, that might be in use; last time I looked it was well in excess of 500 bytes ~~ 100 writes to memory to save these and IMHO that's an excessive price. If you don't believe these registers are going to be passed from the parent thread to the spawned thread, then you can work on enforcing spawn with no registers.

    If you spawn routine wakes up using a standard continuation mechanism you have invented, then you have worry about whether your continuations pass large register state or not, also. Same problem, same solutions as for spawn; the compiler has to help or you personally have to intervene.

    You'll find this a lot of fun.

    [If you want to make it really interesting, try timeslicing the threads in case they go into deep computation without an occasional yeild causing thread starvation. Now you surely have save the entire state. I managed to get PARLANSE to realize spawning with no registers saved, yet have the time slicing save/restore full register state, by saving full state on a time slice, and continuing at a special place that refilled all the registers before it passed control to the time-sliced PC location].