Tags: java, caching, thread-local, virtual-threads

Is it possible to create a ThreadLocal for the carrier thread of a Java virtual thread?


JEP-425: Virtual Threads states that a new virtual thread "should be created for every application task" and twice refers to the possibility of having "millions" of virtual threads running in the JVM.

The same JEP implies that each virtual thread will have access to its own thread-local value:

Virtual threads support thread-local variables [...] just like platform threads, so they can run existing code that uses thread locals.

Thread locals are often used to cache an object that is expensive to create and not thread-safe. The JEP warns:

However, because virtual threads can be very numerous, use thread locals after careful consideration.

Numerous indeed! Especially given that virtual threads are not pooled (or at least shouldn't be). Since a virtual thread represents a short-lived task, using thread locals in a virtual thread to cache an expensive object seems borderline meaningless. Unless! We can, from a virtual thread, create and access thread locals bound to its carrier thread 🤔
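To make the concern concrete, here is a minimal sketch (not taken from the JEP) that counts how many times a thread-local initializer runs when every task gets its own thread. The class name `ThreadLocalCacheDemo`, the `CREATED` counter and the `runTasks` helper are made up for illustration. `Thread::new` is used as a platform-thread stand-in so the sketch also runs on older JDKs; on Java 21+ one could pass `Thread.ofVirtual().factory()` instead, and I would expect the same count.

```java
import java.text.DateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalCacheDemo {
    // Hypothetical counter: how often the "expensive" object is actually created.
    static final AtomicInteger CREATED = new AtomicInteger();

    static final ThreadLocal<DateFormat> CACHED = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return DateFormat.getInstance();
    });

    // Starts one short-lived thread per task and returns the number of
    // DateFormat instances that were created along the way.
    public static int runTasks(int tasks, ThreadFactory factory) {
        CREATED.set(0);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Thread t = factory.newThread(() -> CACHED.get().format(new Date()));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        // Assumption: on Java 21+, Thread.ofVirtual().factory() can be passed here.
        System.out.println(runTasks(100, Thread::new)); // prints 100
    }
}
```

One instance per task means the "cache" is never hit twice, which is exactly the problem with a thread-per-task model.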

For clarification, I would like to go from something like this (which would have been perfectly acceptable when using only native threads capped to the size of a pool, but is clearly no longer an effective caching mechanism when millions of virtual threads are continuously re-created):

static final ThreadLocal<DateFormat> CACHED = ThreadLocal.withInitial(DateFormat::getInstance);

To this (alas this class is not part of the public API):

static final ThreadLocal<DateFormat> CACHED = new jdk.internal.misc.CarrierThreadLocal<>();
// CACHED.set(...)

Before we even get there, one must ask: is this a safe practice?

Well, as far as I understand virtual threads, they are merely logical stages executed on a platform thread (a.k.a. the "carrier thread") with the ability to unmount instead of blocking while they wait. So I am assuming - please correct me if I am wrong - that 1) the virtual thread will never be interleaved by another virtual thread on the same carrier thread or rescheduled on another carrier thread unless the code would have blocked otherwise and therefore, if 2) the operation we invoke on the cached object never blocks, then the task/virtual thread will simply run from start to finish on the same carrier and so yes, it would be safe to cache the object on a platform thread-local.

At the risk of answering my own question, JEP-425 indicates this is not possible:

Thread-local variables of the carrier are unavailable to the virtual thread, and vice-versa.

I could not find a public API to get the carrier thread or allocate thread locals explicitly on a platform thread [from a virtual thread], but that's not to say my humble research is definitive. Maybe there is a way?

Then I read JEP-429: Scoped Values which at first glance seems to be a stab by the Java Gods to get rid of the ThreadLocal altogether, or at least provide an alternative for virtual threads. In fact, the JEP uses verbiage such as "migration to scoped values" and says they are "preferred to thread-local variables, especially when using large numbers of virtual threads".

For all the use cases discussed in the JEP, I can only agree. But towards the bottom of this document, we also find this:

There are a few scenarios that favor thread-local variables. An example is caching objects that are expensive to create and use, such as instances of java.text.DateFormat. Notoriously, a DateFormat object is mutable, so it cannot be shared between threads without synchronization. Giving each thread its own DateFormat object, via a thread-local variable that persists for the lifetime of the thread, is often a practical approach.

In light of what was discussed earlier, using a thread-local may be "practical" but is far from ideal. In fact, JEP-429 itself starts off with a very telling remark: "if each of a million virtual threads has mutable thread-local variables, the memory footprint may be significant".

To summarize:

Have you found a way to allocate thread locals on a carrier thread from the virtual thread?

If not, is it safe to say that for applications using virtual threads, the practice of caching objects in a thread-local is dead and one will have to implement/use a different approach such as a concurrent cache/map/pool/whatever?


Solution

  • You wrote

    So I am assuming - please correct me if I am wrong - that

    1. the virtual thread will never be interleaved by another virtual thread on the same carrier thread or rescheduled on another carrier thread unless the code would have blocked otherwise and therefore, if
    2. the operation we invoke on the cached object never blocks, then the task/virtual thread will simply run from start to finish on the same carrier and so yes, it would be safe to cache the object on a platform thread-local.

    But the State of Loom document states:

    You must not make any assumptions about where the scheduling points are any more than you would for today’s threads. Even without forced preemption, any JDK or library method you call could introduce blocking, and so a task-switching point.

    and further:

    To that end, we plan for the VM to support an operation that tries to forcefully preempt execution at any safepoint. How that capability will be exposed to the schedulers is TBD, and will likely not make it to the first Preview.

    So

    1. The assumption that a virtual thread only releases the carrier thread when it is about to be blocked applies only to the current preview. Preemptive switching between virtual threads is allowed and even planned for the future.

    2. Even if we assume that a virtual thread can only release the carrier thread when performing blocking operations, we can’t predict when a blocking operation may occur.

      • One example of an operation outside our control is class loading. Loading class data is a blocking operation, and class loading is implemented lazily in common JVMs. It’s even possible that a method that has been invoked multiple times suddenly executes an uncommon path that uses a class that has not been used before.

      • Another example is resource loading. Even an example as simple as your DateFormat already involves resources organized in an unspecified way, such as time zone data or localized month and weekday names.

    So, there’s no way of having a safely working carrier-local cache, and your assumption that using thread locals (or the like) for caching is dead is indeed right. You may use an object pool instead, but since this implies some sort of synchronization, you might as well consider just using a single DateFormat¹ and synchronizing on it. This would implement your initial idea of not releasing the carrier thread during the use of the object.

    Of course, in this specific example, the better option is to use a DateTimeFormatter from the java.time API, which is thread-safe and hence allows a single instance to be shared by all threads.

    ¹ or one of multiple, selected in a way that does not involve synchronization
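
The two alternatives suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SharedFormatters`, the method names, the pattern and the UTC zone are hypothetical choices made so that both variants produce the same, predictable text.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SharedFormatters {
    // Alternative 1: a single mutable DateFormat shared by all threads,
    // with every use guarded by the formatter's own monitor.
    private static final DateFormat LEGACY = newLegacyFormat();

    private static DateFormat newLegacyFormat() {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    public static String formatLegacy(Date date) {
        synchronized (LEGACY) {
            return LEGACY.format(date);
        }
    }

    // Alternative 2: DateTimeFormatter is immutable and thread-safe,
    // so one instance can be shared without any locking.
    private static final DateTimeFormatter MODERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    public static String formatModern(Instant instant) {
        return MODERN.format(instant);
    }

    public static void main(String[] args) {
        System.out.println(formatLegacy(new Date(0L)));  // prints 1970-01-01 00:00:00
        System.out.println(formatModern(Instant.EPOCH)); // prints 1970-01-01 00:00:00
    }
}
```

Neither variant depends on which thread, virtual or platform, happens to call it, so no per-thread state is needed at all.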