Search code examples
graalvmfastr

Does GraalVM use same thread and heap space when calling polyglot functions?


If I call R code from Java within GraalVM (using GraalVM's polyglot function), does the R code and the Java code run on the same Java thread (ie there's no switching between OS or Java threads etc?) Also, is it the same "memory/heap" space? That is, in the example code below (which I took from https://www.baeldung.com/java-r-integration)

public double mean(int[] values) {
    Context polyglot = Context.newBuilder().allowAllAccess(true).build();
    String meanScriptContent = RUtils.getMeanScriptContent(); 
    polyglot.eval("R", meanScriptContent);
    Value rBindings = polyglot.getBindings("R");
    Value rInput = rBindings.getMember("c").execute(values);
    return rBindings.getMember("customMean").execute(rInput).asDouble();
}

does the call rBindings.getMember("c").execute(values) cause the values object (an array of ints) to be copied? Or is GraalVM smart enough to consider it a pointer to the same memory space? If it's a copy, is the copying time the same (or similar, ie within say 20%) time as if I were to a normal java clone() operation? Finally, does calling a polyglot function (in this case customMean implemented in R) have the same overhead as calling a native Java function? Bonus question: can the GraalVM JIT compiler even compile accross the layers, eg say I had this:

final long sum = IntStream.range(0,10000)
.stream()
.map(x -> x+4)
.map(x -> <<<FastR version of the following inverse operation: x-4 >>>)
.sum();

would the GraalVM compiler be as smart as say a normal Java JIT compiler and realize that the whole above statement can be simply written without the two map operations (Since they cancel each other out)?

FYI: I'm considering using GraalVM to run both my Java code and my R code, once the issue I identified here is resolved (Why is FASTR (ie GraalVM version of R) 10x *slower* compared to normal R despite Oracle's claim of 40x *faster*?) and one of the motivitations is that I hope to eliminate the 50% of time that calling R (using RServe()) from Java is spent on network IO (because Java communicates with RServer over TCP/IP and RServe and Java are on different threads and memory spaces etc etc.)


Solution

  • does the R code and the Java code run on the same Java thread. Also, is it the same "memory/heap" space?

    Yes and yes. You can even use GraalVM VisualVM to inspect the heap: it provides standard Java view where you can see instances of FastR internal representations like RIntVector mingled with the rest of the other Java objects, or R view where you can see integer vectors, lists, environments, ...

    does the call rBindings.getMember("c").execute(values) cause the values object (an array of ints) to be copied?

    In general yes: most objects are passed to R as-is. Inside R you have two choices:

    • Explicitly convert them to some concrete type, i.e., as.integer(arg), which does not make a copy, but tells R explicitly how you want that value to be treated as "native" R type including R's value semantics.
    • Leave it up to the default rules, which will be applied once your objects is passed to some R builtin, e.g., int[] is treated as integer vector (but note that treating it as a list would be also reasonable in some cases). Again no copies here. And the object itself keeps its reference semantics.

    However, sometimes FastR needs to make a copy:

    • some builtin functions cannot handle foreign objects yet
    • R language often implicitly copies vectors, because of its value semantics, arguments coercion, etc.
    • when a vector is passed to native R extension, we need to move its data to off heap memory

    I would say that if you happen to have a very large vector, say GBs of data, you need to be very careful about it even in regular R. Note: FastR vectors are by default backed by Java arrays, so their size limitations apply to FastR vectors too.

    Finally, does calling a polyglot function (in this case customMean implemented in R) have the same overhead as calling a native Java function?

    Mostly yes, except that the function cannot be pulled and inlined into the surrounding Java code(+). The call itself is as fast as regular Java call. For the example you give: it cannot be optimized as you suggest, because the R function cannot be inlined(+). However, I would be very skeptical that any compiler can optimize this as you suggest even if both functions where pure Java code. That being said, yes: some things that compiler can optimize, like eliminating some useless computations that it can analyze well, is not going to work because of the impossibility to inline code across the Java <-> R boundary(+).

    (+) Unless you'd run the Java code with Espresso (Java on Truffle), but then you would not be using Context API but Espresso's interop support.