Tags: java, performance, blocking, java-21, virtual-threads

Virtual threads seem to block carrier threads on external service call


I have a Spring Boot microservice that consumes messages from a RabbitMQ, composes emails, and sends them to an SMTP server. It consists of the following components:

  1. Email sender that composes an email and sends it to the SMTP server:
@Service
@RequiredArgsConstructor
public class MessageSender {

    private final JavaMailSender sender;

    public void sendMessage(final RabbitEmailDto emailDto) {
        MimeMessage message = sender.createMimeMessage();
        message.setRecipients(Message.RecipientType.TO, emailDto.getTo());

        MimeMessageHelper helper = new MimeMessageHelper(message, CharEncoding.UTF_8);
        helper.setSubject(emailDto.getData().getEmail().getSubject());
        helper.setText(emailDto.getHtml(), true);
        helper.setFrom(emailDto.getFrom());

        sender.send(message);
    }
}
  2. Message processor that takes a list of RabbitMQ messages, invokes the message sender for each of them in a separate virtual thread, and returns a list of futures holding the send results:
@Service
@RequiredArgsConstructor
public class MessageProcessor {

    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    private final MessageSender messageSender;
    private final Jackson2JsonMessageConverter converter;

    public List<MessageProcessingFuture> processMessages(List<Message> messages) {
        List<MessageProcessingFuture> processingFutures = new ArrayList<>();

        for (Message message : messages) {
            // processMessageAsync uses the executor field, so only the message is passed
            MessageProcessingFuture messageProcessingFuture = new MessageProcessingFuture(
                    message.getMessageProperties().getDeliveryTag(),
                    processMessageAsync(message)
            );
            processingFutures.add(messageProcessingFuture);
        }

        return processingFutures;
    }

    private Future<?> processMessageAsync(final Message message) {
        RabbitEmailDto rabbitEmailDto = (RabbitEmailDto) converter.fromMessage(message);
        MessageDeliverySystemEmailDto email = rabbitEmailDto.getData().getEmail();

        return executor.submit(() -> messageSender.sendMessage(rabbitEmailDto));
    }
}
  3. RabbitMQ message listener that consumes messages from the Rabbit queue, passes them to the processor, and then handles the futures returned by the processor, sending an acknowledgement or a rejection to RabbitMQ depending on whether Future.get() threw an exception:
@Component
@RequiredArgsConstructor
public class BatchMessageListener implements ChannelAwareBatchMessageListener {

    private final MessageProcessor messageProcessor;

    @Override
    @MeasureExecutionTime
    public void onMessageBatch(final List<Message> messages, final Channel channel) {
        messageProcessor.processMessages(messages)
                .forEach(processingFuture -> processFuture(processingFuture, channel));
    }

    private void processFuture(final MessageProcessingFuture future, final Channel channel) {
        try {
            future.deliveryFuture().get();
            channel.basicAck(future.deliveryTag(), false);
        } catch (Exception e) {
            channel.basicReject(future.deliveryTag(), false);
        }
    }
}

I can see in logs that the MessageSender.sendMessage method is indeed executing in a virtual thread, identified like VirtualThread[#100]/runnable@ForkJoinPool-1-worker-1.

And I can see that I have 4 of those workers on our production server. (Am I correct that those workers are the actual platform threads, or carrier threads?)

I also see that it usually takes about 1 sec for the MessageSender.sendMessage method to complete, with most of this time spent awaiting a response from the SMTP server.

Based on what I had learned about the virtual threads, I anticipated that processing of a batch of 100 messages (it is my configured batch size for the BatchMessageListener) will take about 1 second because platform threads will not block on the calls to the SMTP server. And those 4 platform threads will be shared among 100 virtual threads, effectively allowing 100 nearly simultaneous calls to the SMTP server.

However, in practice, I have observed that the messages are being processed 4 at a time, and it takes about 25 sec to process all the 100 messages.

During local testing on my computer, I intentionally introduced a 1-second delay by adding Thread.sleep(1000); before the sender.send(message); line in the MessageSender to simulate network latency. In that test, a batch of 100 messages was indeed processed in just about 1 second, despite the fact that I had only 10 carrier threads according to the logs.

I'm puzzled. Why do the carrier threads not block on the Thread.sleep call, but block on a call to an external service? Am I doing something wrong?


Solution

    Pinning

    This sounds like pinning, where a blocked virtual thread is not unmounted from its carrier thread.

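    Pinning is usually due to code that blocks while running inside a synchronized block or method, or while executing a native method or foreign function.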

    If you have four carrier threads (platform threads dedicated to serving virtual threads), and these virtual threads are pinned much of the time, then you have effectively limited the work throughput to those four carrier threads. The remaining 96 of 100 tasks must wait until the first four complete, then 92 wait until the next four complete, and so on. In such a case, you should be using platform threads rather than virtual threads. The virtual threads are of no benefit, and are actually creating extra work.
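    The arithmetic behind that limit can be sketched in code. With 100 tasks of about 1 second each and only 4 carriers doing the work, the batch runs in 25 waves, which matches the roughly 25 seconds you observed. The sketch below does not pin anything. It only simulates the effect by comparing a fixed pool of 4 platform threads against unpinned virtual threads, using scaled-down numbers (20 tasks of 50 ms). The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CarrierLimitDemo {

    // Submits `tasks` jobs that each block for `blockMillis`, waits for all of
    // them to finish, and returns the elapsed wall-clock time in milliseconds.
    public static long timeBatch(ExecutorService executor, int tasks, long blockMillis) {
        long start = System.nanoTime();
        try (executor) { // close() waits for the submitted tasks (Java 19+)
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(blockMillis); // stand-in for the SMTP round trip
                    return null;
                });
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumption: 4 workers stand in for 4 carriers that are fully pinned.
        long limited = timeBatch(Executors.newFixedThreadPool(4), 20, 50);
        // Virtual threads that unmount while sleeping, as in your local test.
        long unpinned = timeBatch(Executors.newVirtualThreadPerTaskExecutor(), 20, 50);
        System.out.println("4 workers: " + limited + " ms, virtual threads: " + unpinned + " ms");
    }
}
```

    With 4 workers the batch needs at least 5 waves of 50 ms, while the unpinned virtual threads overlap their waits, so the second figure should come out well below the first.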

    Detect pinning

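    See the article, Java Virtual Thread Pinning, by Todd Ginsberg for a guide on how to detect pinning. Two tools available in Java 21 are the jdk.VirtualThreadPinned event in Java Flight Recorder (JFR) and the jdk.tracePinnedThreads system property.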

    You can view JFR results with JDK Mission Control (JMC).

    Note that you can adjust the threshold for detecting pinning. I vaguely recall the default is 20 milliseconds. But you should verify and decide your own useful value.
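    As a rough illustration of the JFR route, here is a minimal sketch that records jdk.VirtualThreadPinned events programmatically with the jdk.jfr API and sets the threshold explicitly. The class, the method names, and the sample workload are hypothetical, and in a real service you would more likely start the recording from the command line or from JMC.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class PinningRecorder {

    // Runs the workload under a JFR recording and returns the number of
    // jdk.VirtualThreadPinned events that exceeded the given threshold.
    public static long countPinnedEvents(Runnable workload, Duration threshold) throws Exception {
        Path file = Files.createTempFile("pinning", ".jfr");
        try (Recording recording = new Recording()) {
            recording.enable("jdk.VirtualThreadPinned").withThreshold(threshold);
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(file);
        }
        List<RecordedEvent> events = RecordingFile.readAllEvents(file);
        Files.deleteIfExists(file);
        return events.stream()
                .filter(e -> e.getEventType().getName().equals("jdk.VirtualThreadPinned"))
                .count();
    }

    // Sample workload: virtual threads that block while holding a monitor.
    public static void blockInsideSynchronized() {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 4; i++) {
                Object lock = new Object(); // one monitor per task, so no contention
                executor.submit(() -> {
                    synchronized (lock) {
                        Thread.sleep(50); // blocking while synchronized
                    }
                    return null;
                });
            }
        }
    }

    public static void main(String[] args) throws Exception {
        long pinned = countPinnedEvents(PinningRecorder::blockInsideSynchronized, Duration.ofMillis(20));
        System.out.println("Pinned events recorded: " + pinned);
    }
}
```

    The count depends on the JDK in use. On a release where blocking inside synchronized pins the virtual thread, such as Java 21, the sample workload should produce events. On a release where it does not, the count may be zero.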

    Workarounds/alternatives

    You later commented:

    I see that the org.springframework.mail.javamail.JavaMailSender, which I use, is calling several synchronized methods while sending an email. Am I getting it correct that there isn't much that can be done to benefit from the virtual threads in this case?

    Correct… if that synchronized code is long-running.

    Virtual threads are not appropriate for tasks that involve long-running code that is either synchronized or native (JNI, etc.). In both cases the JVM’s virtual thread scheduler is unable to unmount the virtual thread when such code blocks, so the virtual thread stays mounted on its carrier thread, and that carrier cannot run any other virtual thread in the meantime.

    For occasional encounters, this situation is no big deal. But for repeated or sustained encounters, this situation means you will get little or no benefit from virtual threads, and virtual threads may actually impair overall performance. It sounds like your case fits the “repeated or sustained encounters” category.

    The Project Loom team continues to look for ways around the synchronized limitation. But it is still an issue as of Java 21 and 22.

    Replacing synchronized with ReentrantLock

    If the synchronized code is your own, then replace any long-running uses of synchronized with a ReentrantLock to regain efficiency with virtual threads. That change is simple enough.

    Some folks have misinterpreted this guidance as “replace all use of synchronized with ReentrantLock”. This is not necessary. Only long-running synchronized code need be modified. Brief code, such as guarding access to a variable that is usually available, can be left in place as so little time is spent within the synchronized code.
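    To make the change concrete, here is a minimal sketch of a long-running guarded section converted from synchronized to ReentrantLock. GuardedResource and its methods are hypothetical names, and Thread.sleep stands in for whatever blocking call your own code makes while holding the lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class GuardedResource {

    private final ReentrantLock lock = new ReentrantLock();
    private int callCount;

    // Before: public synchronized void slowCall() { ... }
    // After: the same mutual exclusion, but a virtual thread that blocks
    // here can be unmounted from its carrier.
    public void slowCall() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(10); // stand-in for a long-running blocking call
            callCount++;
        } finally {
            lock.unlock(); // always release, even if the call throws
        }
    }

    public int callCount() {
        lock.lock();
        try {
            return callCount;
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` virtual threads against one shared resource and returns
    // the number of completed calls.
    public static int runDemo(int tasks) {
        GuardedResource resource = new GuardedResource();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    resource.slowCall();
                    return null;
                });
            }
        }
        return resource.callCount();
    }
}
```

    The lock still serializes callers, so this does not make the guarded work itself any faster. What changes is that a waiting or blocked virtual thread no longer holds on to a carrier thread.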

    If you cannot modify the source code of long-running synchronized tasks, then use platform threads rather than virtual threads. But keep in mind, if those tasks have multiple sub-tasks that can be multi-threaded, the task in a platform thread may benefit from running those sub-tasks with virtual threads.
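    For the platform-thread fallback, a minimal sketch might look like the following. It assumes a fixed pool sized to the batch, which is a choice you would need to tune for your own server. The class and method names are hypothetical, and Thread.sleep again stands in for the synchronized library call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PlatformPoolFallback {

    // Runs `tasks` jobs on a fixed pool of platform threads and returns how
    // many of them ended up on a virtual thread (none are expected to).
    public static long countVirtual(int tasks, int poolSize) throws Exception {
        List<Future<Boolean>> results = new ArrayList<>();
        try (ExecutorService pool = Executors.newFixedThreadPool(poolSize)) {
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> {
                    Thread.sleep(10); // stand-in for the blocking send
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits for the submitted tasks (Java 19+)
        long virtual = 0;
        for (Future<Boolean> result : results) {
            if (result.get()) {
                virtual++;
            }
        }
        return virtual;
    }
}
```

    In the MessageProcessor from the question, the equivalent change would be swapping Executors.newVirtualThreadPerTaskExecutor() for a platform-thread executor. The rest of the code can stay as it is, since it only depends on ExecutorService.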

    For more info

    I'm no expert on this. So listen to the experts, not me.