Generic method to convert primitive arrays

I have old code with a lot of methods like long[] toLongArray(int[] array), covering many different combinations of primitive types on both the source and target side. I wonder whether it is possible to replace them with one generic method without losing performance.
First I created a simple method using MethodHandles for the int[] -> long[] pair:

static final MethodHandle getIntElement  = MethodHandles.arrayElementGetter(int[].class);
static final MethodHandle setLongElement = MethodHandles.arrayElementSetter(long[].class);
static long[] specializedMethodHandle(int[] array) throws Throwable {
    long[] newArray = new long[array.length];
    for (int i = 0; i < array.length; i++) {
        // read the int from the source array, widen it to long, store it in the target array
        setLongElement.invokeExact(newArray, i, (long) (int) getIntElement.invokeExact(array, i));
    }
    return newArray;
}

It works great, with the same performance as a manual loop, so I decided to make it generic:

static Map<Class<?>, MethodHandle> metHanGettersObj = Map.of(int[].class, MethodHandles.arrayElementGetter(int[].class).asType(MethodType.methodType(Object.class, Object.class, int.class)));
static Map<Class<?>, MethodHandle> metHanSettersObj = Map.of(long[].class, MethodHandles.arrayElementSetter(long[].class).asType(MethodType.methodType(void.class, Object.class, int.class, Object.class)));
static <F, T> T genericMethodHandleObject(Class<T> to, F array) throws Throwable {
    int length = Array.getLength(array);
    Object newArray = Array.newInstance(to.getComponentType(), length);
    MethodHandle getElement = metHanGettersObj.get(array.getClass());
    MethodHandle setElement = metHanSettersObj.get(to);
    for (int i = 0; i < length; i++) setElement.invokeExact(newArray, i, getElement.invokeExact(array, i));
    return (T) newArray;
}

But this is much slower: for my example array of 500,000 elements it was over 15x slower.
Interestingly, a CompiledScript made with the Nashorn JavaScript engine (a simple copy loop inside) is around 20% faster than this code.
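One likely contributor to the slowdown is that the adapted handles have the erased types (Object, int)Object and (Object, int, Object)void. As a result, every element travels from the int[] to the long[] as a boxed Integer. Because the handles are read from a map into local variables, the JIT may also be unable to treat them as constants. That second part is an assumption and is not shown here. The sketch below demonstrates only the boxing. The class and helper names (BoxingDemo, copyViaObjects, elementClass) are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

public class BoxingDemo {
    // Same adaptation as metHanGettersObj: (int[],int)int becomes (Object,int)Object
    static final MethodHandle GET_OBJ = MethodHandles.arrayElementGetter(int[].class)
            .asType(MethodType.methodType(Object.class, Object.class, int.class));

    // Same adaptation as metHanSettersObj: (long[],int,long)void becomes (Object,int,Object)void
    static final MethodHandle SET_OBJ = MethodHandles.arrayElementSetter(long[].class)
            .asType(MethodType.methodType(void.class, Object.class, int.class, Object.class));

    // Returns the runtime class of the value produced by the Object-typed getter
    static Class<?> elementClass(int[] source) {
        try {
            Object boxed = (Object) GET_OBJ.invokeExact((Object) source, 0);
            return boxed.getClass();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Copies int[] into long[] through the Object-typed adapters, like genericMethodHandleObject
    static long[] copyViaObjects(int[] source) {
        Object target = new long[source.length];
        try {
            for (int i = 0; i < source.length; i++) {
                // GET_OBJ boxes the int; SET_OBJ unboxes it and widens it to long before storing
                Object boxed = (Object) GET_OBJ.invokeExact((Object) source, i);
                SET_OBJ.invokeExact(target, i, boxed);
            }
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return (long[]) target;
    }

    public static void main(String[] args) {
        int[] source = {7, -3};
        System.out.println(elementClass(source).getName());
        System.out.println(Arrays.toString(copyViaObjects(source)));
    }
}
```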

So does anyone know another way to do this? I will probably not use it anywhere, because it is starting to get too "hacky". Right now I just want to know whether it is possible at all. The non-generic method handle version works fine, so why is this one so slow, and can it be made faster?

Solution

You can build an array converter method handle by chaining MethodHandles combinators together, and then cache it in a static map.

Here's a benchmark including the code. The convertBootstrap method creates the converter; that's where the real magic happens:

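import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Array;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

import org.openjdk.jmh.annotations.*;

import static java.lang.invoke.MethodHandles.*;
import static java.lang.invoke.MethodType.methodType;

@BenchmarkMode({ Mode.AverageTime })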
@Warmup(iterations = 10, batchSize = 1)
@Measurement(iterations = 10, batchSize = 1)
@Fork(1)
@State(Scope.Thread)
public class MyBenchmark {

    int[] input;

    static final Map<Class<?>, Map<Class<?>, Function<?, ?>>> cacheGeneric = new HashMap<>();

    @Setup
    public void setup() {
        input = new Random(1).ints().limit(500_000).toArray();
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long[] manual() {
        long[] result = new long[input.length];
        for(int i = 0 ; i < input.length; i++) {
            result[i] = input[i];
        }
        return result;
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long[] cachedGeneric() {
        return getWrapped(int[].class, long[].class).apply(input);
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long[] reflective() throws Throwable {
        return genericMethodHandleObject(long[].class, input);
    }

    static Map<Class<?>, MethodHandle> metHanGettersObj = Map.of(int[].class, MethodHandles.arrayElementGetter(int[].class).asType(MethodType.methodType(Object.class, Object.class, int.class)));
    static Map<Class<?>, MethodHandle> metHanSettersObj = Map.of(long[].class, MethodHandles.arrayElementSetter(long[].class).asType(MethodType.methodType(void.class, Object.class, int.class, Object.class)));
    static <F, T> T genericMethodHandleObject(Class<T> to, F array) throws Throwable {
        int length = Array.getLength(array);
        Object newArray = Array.newInstance(to.getComponentType(), length);
        MethodHandle getElement = metHanGettersObj.get(array.getClass());
        MethodHandle setElement = metHanSettersObj.get(to);
        for (int i = 0; i < length; i++) setElement.invokeExact(newArray, i, getElement.invokeExact(array, i));
        return (T) newArray;
    }

    @SuppressWarnings("unchecked")
    public static <F, T> Function<F, T> getWrapped(Class<F> from, Class<T> to) {
        return (Function<F, T>) cacheGeneric.computeIfAbsent(from, k -> new HashMap<>())
            .computeIfAbsent(
                to, k -> {
                    MethodHandle mh = convertBootstrap(from, to);
                    return arr -> {
                        try {
                            return (T) mh.invoke(arr);
                        } catch (Throwable e) {
                            throw new RuntimeException(e);
                        }
                    };
                });
    }

    public static MethodHandle convertBootstrap(Class<?> from, Class<?> to) {       
        MethodHandle getter = arrayElementGetter(from);
        MethodHandle setter = arrayElementSetter(to);

        MethodHandle body = explicitCastArguments(setter, methodType(void.class, to, int.class, from.getComponentType()));      
        body = collectArguments(body, 2, getter); // get from 1 array, set in other
        body = permuteArguments(body, methodType(void.class, to, int.class, from), 0, 1, 2, 1);
        body = collectArguments(identity(to), 1, body); // create pass-through for first argument
        body = permuteArguments(body, methodType(to, to, int.class, from), 0, 0, 1, 2);

        MethodHandle lenGetter = arrayLength(from);
        MethodHandle cons = MethodHandles.arrayConstructor(to);
        MethodHandle init = collectArguments(cons, 0, lenGetter);

        MethodHandle loop = countedLoop(lenGetter, init, body);
        return loop;
    }
}
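To see what each combinator in convertBootstrap does, you can run the same chain outside JMH and record the handle type after each step. The sketch below restates the answer's method. The class name ConverterTrace, the trace list and the apply helper are hypothetical additions. It needs Java 9 or later for arrayLength, arrayConstructor and countedLoop. Because explicitCastArguments applies Java cast semantics, the same chain should also handle narrowing pairs such as double[] to int[].

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static java.lang.invoke.MethodHandles.*;
import static java.lang.invoke.MethodType.methodType;

public class ConverterTrace {
    // Same combinator chain as convertBootstrap; records the handle type after each step.
    // Comments use F[] for the source array type and T[] for the target array type.
    static MethodHandle convert(Class<?> from, Class<?> to, List<MethodType> trace) {
        MethodHandle getter = arrayElementGetter(from); // (F[],int)f
        MethodHandle setter = arrayElementSetter(to);   // (T[],int,t)void

        // Convert the stored value like a Java cast (widening or narrowing): (T[],int,f)void
        MethodHandle body = explicitCastArguments(setter,
                methodType(void.class, to, int.class, from.getComponentType()));
        trace.add(body.type());
        // Compute the value argument with the getter: (T[],int,F[],int)void
        body = collectArguments(body, 2, getter);
        trace.add(body.type());
        // Use the same index for reading and writing: (T[],int,F[])void
        body = permuteArguments(body, methodType(void.class, to, int.class, from), 0, 1, 2, 1);
        trace.add(body.type());
        // Run the store, then return the target array unchanged: (T[],T[],int,F[])T[]
        body = collectArguments(identity(to), 1, body);
        trace.add(body.type());
        // Merge the two target-array parameters into the (V,int,A...)V shape countedLoop expects
        body = permuteArguments(body, methodType(to, to, int.class, from), 0, 0, 1, 2);
        trace.add(body.type());

        MethodHandle lenGetter = arrayLength(from);                               // (F[])int
        MethodHandle init = collectArguments(arrayConstructor(to), 0, lenGetter); // (F[])T[]
        MethodHandle loop = countedLoop(lenGetter, init, body);                   // (F[])T[]
        trace.add(loop.type());
        return loop;
    }

    // Invokes a converter without exposing the checked Throwable
    static Object apply(MethodHandle converter, Object array) {
        try {
            return converter.invoke(array);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        List<MethodType> trace = new ArrayList<>();
        MethodHandle intToLong = convert(int[].class, long[].class, trace);
        trace.forEach(System.out::println);
        System.out.println(Arrays.toString((long[]) apply(intToLong, new int[] {1, -2, 3})));

        MethodHandle doubleToInt = convert(double[].class, int[].class, new ArrayList<>());
        System.out.println(Arrays.toString((int[]) apply(doubleToInt, new double[] {2.9, -1.5})));
    }
}
```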

The benchmark results for my method and the manual loop are about the same (lower score is better):

# JMH version: 1.19
# VM version: JDK 10.0.1, VM 10.0.1+10

Benchmark                  Mode  Cnt   Score   Error  Units
MyBenchmark.cachedGeneric  avgt   10   1.175 ± 0.046  ms/op
MyBenchmark.manual         avgt   10   1.149 ± 0.098  ms/op
MyBenchmark.reflective     avgt   10  10.165 ± 0.665  ms/op

I was actually really surprised by how well this gets optimized (unless I made a mistake somewhere in the benchmark, but I can't find one). If you increase the number of elements to 5 million, you can see the difference again:

Benchmark                  Mode  Cnt    Score    Error  Units
MyBenchmark.cachedGeneric  avgt   10  277.764 ± 14.217  ms/op
MyBenchmark.manual         avgt   10   14.851 ±  0.317  ms/op
MyBenchmark.reflective     avgt   10   76.599 ±  3.695  ms/op

To me, those numbers suggest that some loop-unrolling, inlining or other JIT limit is being hit, since the difference is suddenly much bigger.

You will probably also see a performance drop when the array types are not statically known.
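When the element types are known where the conversion is written, one option is to build each converter once and store it in a static final field. You then call it with invokeExact using the exact array types. This is a sketch, not benchmarked here, and it assumes HotSpot constant-folds static final MethodHandle fields, which it generally does. The class and method names (KnownTypeConverters, toLongArray, toDoubleArray) are hypothetical. convertBootstrap is the answer's method condensed, with its logic unchanged.

```java
import java.lang.invoke.MethodHandle;
import java.util.Arrays;

import static java.lang.invoke.MethodHandles.*;
import static java.lang.invoke.MethodType.methodType;

public class KnownTypeConverters {
    // The answer's convertBootstrap, condensed: builds a (from)to handle that loops over the array
    static MethodHandle convertBootstrap(Class<?> from, Class<?> to) {
        MethodHandle body = explicitCastArguments(arrayElementSetter(to),
                methodType(void.class, to, int.class, from.getComponentType()));
        body = collectArguments(body, 2, arrayElementGetter(from));
        body = permuteArguments(body, methodType(void.class, to, int.class, from), 0, 1, 2, 1);
        body = collectArguments(identity(to), 1, body);
        body = permuteArguments(body, methodType(to, to, int.class, from), 0, 0, 1, 2);
        MethodHandle lenGetter = arrayLength(from);
        MethodHandle init = collectArguments(arrayConstructor(to), 0, lenGetter);
        return countedLoop(lenGetter, init, body);
    }

    // Built once at class initialization; as static final fields, HotSpot can generally
    // treat these handles as constants (an assumption about the JIT, not a guarantee)
    private static final MethodHandle INT_TO_LONG = convertBootstrap(int[].class, long[].class);
    private static final MethodHandle FLOAT_TO_DOUBLE = convertBootstrap(float[].class, double[].class);

    static long[] toLongArray(int[] array) {
        try {
            // invokeExact: the call-site type (int[])long[] must match the handle type exactly
            return (long[]) INT_TO_LONG.invokeExact(array);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static double[] toDoubleArray(float[] array) {
        try {
            return (double[]) FLOAT_TO_DOUBLE.invokeExact(array);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toLongArray(new int[] {1, 2, 3})));      // [1, 2, 3]
        System.out.println(Arrays.toString(toDoubleArray(new float[] {0.5f, -2f}))); // [0.5, -2.0]
    }
}
```

The trade-off is that you give up the single generic entry point. Each type pair needs its own field and small wrapper method, much like the original hand-written methods, but the loop itself is generated rather than written by hand.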