java concurrency java.util.concurrent fork-join forkjoinpool

ForkJoinPool.invoke() and ForkJoinTask.invoke() or compute()

I was reading about the Java fork/join framework. What extra benefits are there in not calling invoke() directly on an implementation of ForkJoinTask (e.g. RecursiveTask), but instead instantiating a ForkJoinPool and calling pool.invoke(task)? What exactly happens when we call these two methods, both named invoke()?

From the source, it seems that if recursiveTask.invoke() is called, it invokes exec() and eventually compute(), in a managed-thread-pool manner. That makes the pool.invoke(task) idiom even more confusing.

I wrote some simple code to test for a performance difference, but I didn't see any. Maybe the test code is wrong? See below:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class MyForkJoinTask extends RecursiveAction {

    private static int totalWorkInMillis = 20000;
    protected static int sThreshold = 1000;

    private int workInMillis;


    public MyForkJoinTask(int work) {
        this.workInMillis = work;
    }

    // Simulates a unit of work small enough to run sequentially.
    protected void computeDirectly() {
        try {
            Thread.sleep(workInMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    }

    @Override
    protected void compute() {
        if (workInMillis < sThreshold) {
            computeDirectly();
            return;
        }

        int discountedWork = (int) (workInMillis * 0.9);
        int split = discountedWork / 2;

        invokeAll(new MyForkJoinTask(split),
                new MyForkJoinTask(split));
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("Total work is %d in millis.%n", totalWorkInMillis);
        System.out.printf("Threshold is %d in millis.%n", sThreshold);

        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println(Integer.toString(processors) + " processor"
                + (processors != 1 ? "s are " : " is ")
                + "available");

        MyForkJoinTask fb = new MyForkJoinTask(totalWorkInMillis);

        ForkJoinPool pool = new ForkJoinPool();

        long startTime = System.currentTimeMillis();


        // These two seem to make no difference!
        pool.invoke(fb);
//        fb.compute();


        long endTime = System.currentTimeMillis();

        System.out.println("Took " + (endTime - startTime) +
                " milliseconds.");
    }
}

Solution

The compute() method of RecursiveTask (or RecursiveAction, as in your code) is just the abstract method that holds the task's code. Calling it does not take a thread from the pool: if you call it like a normal method, it runs on the calling thread, not on a pool-managed thread.

ForkJoinPool.invoke() submits the task to the pool, where it starts running on a separate, pool-managed thread; that thread calls compute(), and invoke() waits for the result.

You can see this in the wording of the Javadoc for RecursiveTask and ForkJoinPool: invoke() actually performs the task, whereas compute() only encapsulates the computation. From RecursiveTask:

protected abstract V compute()
The main computation performed by this task.

And from ForkJoinPool:

public <T> T invoke(ForkJoinTask<T> task)
Performs the given task, returning its result upon completion. ...

So when you call compute() directly, the first call to compute() runs outside the fork/join pool. You can verify this by adding this log line inside compute():

System.out.println(inForkJoinPool()); // static method inherited from ForkJoinTask

You can also check that it's running on the same thread as main() by logging the thread ID:

System.out.println(Thread.currentThread().getId());
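Putting those two log lines into a complete program makes the comparison easy to run. This is a minimal sketch using a hypothetical Probe task (not from the question) whose compute() prints its thread name and returns inForkJoinPool(). It also covers the task.invoke() case from the question: on the JDKs I know of (8 onward), ForkJoinTask.invoke() begins by executing the task on the calling thread, so called from main() its top-level compute() also reports false, and only forked subtasks go to a pool. For pool.invoke() you'd typically see true, but treat that line as something to observe on your own JDK rather than a guarantee.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class WhereDoesComputeRun {

    // Hypothetical probe task: reports whether compute() ran on a fork/join worker thread.
    static class Probe extends RecursiveTask<Boolean> {
        @Override
        protected Boolean compute() {
            System.out.println("  compute() on thread: " + Thread.currentThread().getName());
            return inForkJoinPool();
        }
    }

    // Submits the probe to an explicitly created pool and waits for the result.
    static boolean viaPoolInvoke(ForkJoinPool pool) {
        return pool.invoke(new Probe());
    }

    // Calls compute() like any ordinary method, on the current thread.
    static boolean viaDirectCompute() {
        return new Probe().compute();
    }

    // ForkJoinTask.invoke() called from a thread that is not a pool worker.
    static boolean viaTaskInvoke() {
        return new Probe().invoke();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            System.out.println("pool.invoke(task): inForkJoinPool = " + viaPoolInvoke(pool));
            System.out.println("task.compute():    inForkJoinPool = " + viaDirectCompute());
            System.out.println("task.invoke():     inForkJoinPool = " + viaTaskInvoke());
        } finally {
            pool.shutdown();
        }
    }
}
```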

Once you call invokeAll(), the subtasks it forks are run in a pool. Note, though, that this is NOT necessarily the pool you created before calling compute(). In Java 8, ForkJoinTask.fork() called from a thread that is not a pool worker schedules the task in ForkJoinPool.commonPool(), which is why you can comment out your new ForkJoinPool() line and the code will still run. Interestingly, the Java 7 documentation says that invokeAll() may only be called from within ForkJoinPool computations (otherwise it can throw exceptions such as ClassCastException), but the Java 8 documentation doesn't say this. I've only tested this on Java 8, not Java 7, but quite possibly your code would throw an exception when calling compute() directly on Java 7.
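To see which pool the forked subtasks actually land in, here is a sketch with a hypothetical Splitter task (not from the question) whose leaves record ForkJoinTask.getPool(), which returns null when the current thread is not a pool worker. When the top-level compute() is called directly, the leaves should only ever report the calling thread or the common pool, never a pool created in main(); the exact mix can vary between runs, because the calling thread may execute some subtasks itself while it waits. The pool.invoke() line is printed for comparison only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class WhichPoolRunsSubtasks {

    // Hypothetical task: splits n in half until n == 1, recording where each leaf ran.
    static class Splitter extends RecursiveAction {
        private final int n;
        private final Set<String> seen;

        Splitter(int n, Set<String> seen) {
            this.n = n;
            this.seen = seen;
        }

        @Override
        protected void compute() {
            if (n <= 1) {
                ForkJoinPool p = getPool(); // null if this thread is not a pool worker
                seen.add(p == null ? "calling thread"
                        : p == ForkJoinPool.commonPool() ? "common pool"
                        : "some other pool");
                return;
            }
            invokeAll(new Splitter(n / 2, seen), new Splitter(n - n / 2, seen));
        }
    }

    // Top-level compute() called directly, like the question's commented-out line.
    static Set<String> viaDirectCompute(int n) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        new Splitter(n, seen).compute();
        return seen;
    }

    // The same task submitted to an explicitly created pool.
    static Set<String> viaPoolInvoke(ForkJoinPool pool, int n) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        pool.invoke(new Splitter(n, seen));
        return seen;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            System.out.println("compute():   leaves ran on " + viaDirectCompute(64));
            System.out.println("pool.invoke: leaves ran on " + viaPoolInvoke(pool, 64));
        } finally {
            pool.shutdown();
        }
    }
}
```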

Both versions report the same time because millisecond resolution isn't fine enough to capture the difference between starting the first task on a pool-managed thread and running the first compute() call on the existing thread. That difference is tiny compared with the total time your tasks spend in Thread.sleep().
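If you want to look at that difference directly, System.nanoTime() is a better clock than System.currentTimeMillis() for short intervals. The sketch below uses a hypothetical Trivial task that does no real work, so the elapsed time is mostly dispatch overhead. It takes a single rough sample after a warm-up loop; it is not a proper benchmark (a harness such as JMH would be needed for trustworthy numbers), and the values will vary by machine and JDK.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class InvokeOverhead {

    // Hypothetical task with almost no work, so timing reflects dispatch overhead.
    static class Trivial extends RecursiveTask<Integer> {
        @Override
        protected Integer compute() {
            return 42;
        }
    }

    // Elapsed nanoseconds for submitting the task to a pool and waiting for it.
    static long nanosViaPool(ForkJoinPool pool) {
        long start = System.nanoTime();
        pool.invoke(new Trivial());
        return System.nanoTime() - start;
    }

    // Elapsed nanoseconds for calling compute() directly on the current thread.
    static long nanosViaDirectCompute() {
        long start = System.nanoTime();
        new Trivial().compute();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            // Warm up so class loading, JIT compilation and worker startup are not measured.
            for (int i = 0; i < 10_000; i++) {
                nanosViaPool(pool);
                nanosViaDirectCompute();
            }
            System.out.printf("pool.invoke(): %,d ns%n", nanosViaPool(pool));
            System.out.printf("compute():     %,d ns%n", nanosViaDirectCompute());
        } finally {
            pool.shutdown();
        }
    }
}
```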

The OCA/OCP study guide by Sierra and Bates recommends using the fork/join framework by calling invoke() on the pool. That makes it clear which pool you are using, and it lets you submit multiple tasks to the same pool, which saves the overhead of creating a new pool each time. Logically, it's also cleaner to keep all of your task computation on pool-managed threads (or at least I think it is).

pool.invoke() runs the task in a particular pool that you chose, rather than leaving it to the framework to pick one (in Java 8, the common pool) when task.invoke() or invokeAll() first forks work from outside a pool. It means you can reuse the pool for new tasks and specify things like the parallelism level (the target number of active threads) when you create it. That's the difference. Add those log lines to your code, play around with it, and you'll see what it's doing.
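Here is a minimal sketch of the explicit-pool style, assuming a hypothetical RangeSum task that sums the integers in [from, to) and is not from the question. It creates one pool with a parallelism of 4, reuses it for several invoke() calls, and shuts it down at the end: a pool you create yourself is yours to manage, whereas the common pool is shared and calls to shutdown() have no effect on it. The threshold of 1,000 is an arbitrary, untuned cut-off.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ReusablePool {

    // Hypothetical example task: sums the integers in [from, to) by recursive splitting.
    static class RangeSum extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // assumed cut-off, not tuned
        private final long from;
        private final long to;

        RangeSum(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (long i = from; i < to; i++) {
                    sum += i;
                }
                return sum;
            }
            long mid = (from + to) >>> 1;
            RangeSum left = new RangeSum(from, mid);
            left.fork();                                  // schedule the left half asynchronously
            long right = new RangeSum(mid, to).compute(); // compute the right half on this thread
            return left.join() + right;
        }
    }

    public static void main(String[] args) {
        // One explicitly sized pool, created once and reused for several tasks.
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            System.out.println("parallelism = " + pool.getParallelism());
            for (long n : new long[] {10_000, 100_000, 1_000_000}) {
                System.out.printf("sum[0, %d) = %d%n", n, pool.invoke(new RangeSum(0, n)));
            }
        } finally {
            pool.shutdown(); // release the worker threads of the pool we created
        }
    }
}
```

Because the pool is created once, every invoke() after the first reuses its worker threads instead of paying for a new pool.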