Tags: java, data-structures, jvm, programming-languages, primitive

Garbage-collected languages with efficient numeric data types


I am searching for a language or library (preferably JVM-based) that handles numeric values (integers and floating-point numbers) in a manner that is both convenient and efficient.

  • Convenient: supported by the collection framework and generics.
  • Efficient: incurs no noticeable overhead when primitives are the building blocks of data-heavy processing software
    (specifically, processing multiple GB of text with more than 100,000,000
    items).

Deficiencies of the languages I have considered so far:

  • Plain Java: auto-boxing is quite convenient, but wrapping every value in an object adds substantial memory and CPU overhead.
  • Scala and Kotlin: both also seem to rely on Java's boxed primitives for generic collections, so there is no efficiency advantage here.
  • Python: again, it seems to box all numeric values, and we ran into prohibitive performance problems with vanilla Python. NumPy, which provides a different implementation, does not support the features we need.

Is there a language that handles primitives with the same convenience but efficiently (relative to that language's general performance)?


Solution

  • C# fits the criteria, depending on what you mean by the efficiency requirement. It doesn't run on the JVM, of course.

    Unlike Java, which implements generics with type erasure, C# reifies generics: type arguments are preserved at run time, and for value types such as int the runtime generates specialized code for each instantiation, which is similar in effect to C++ templates. That means that when you make a List<int>, the underlying array is an array of int, not an array of objects. The code that implements the List methods is also compiled specifically for List<int> and can take advantage of int-specific optimizations.

    For this reason, data processing with primitive types is generally faster in C# than in Java when you use all the convenient language features. It can still fall well short of what you can get with C++, however, because the runtime checks that prevent buffer overruns and similar errors (array bounds checks, for example) are not free.
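
As a rough illustration of the difference described above, the following Java sketch contrasts an erased `List<Integer>`, which stores references to boxed `Integer` objects, with a hand-written list backed by an `int[]`, which is roughly the storage layout a `List<int>` gets in C#. `IntList` and `BoxingDemo` are hypothetical names introduced for this example, not JDK classes, and the sketch only shows the storage difference, not measured timings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical growable list backed by an int[]; not part of the JDK. */
final class IntList {
    private int[] data = new int[16];
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            // Double the capacity when the backing array is full.
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value; // stored as a raw int, no Integer object
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += data[i];
        }
        return total;
    }
}

final class BoxingDemo {
    /** Type arguments are erased, so both lists share one runtime class. */
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return ints.getClass() == strings.getClass();
    }

    /** Sums 0..n-1 through a generic list; every element is a boxed Integer. */
    static long sumBoxed(int n) {
        List<Integer> boxed = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxed.add(i); // autoboxing: int -> Integer
        }
        long total = 0;
        for (int v : boxed) { // unboxing: Integer -> int
            total += v;
        }
        return total;
    }

    /** Sums 0..n-1 through the primitive-backed list; no boxing involved. */
    static long sumPrimitive(int n) {
        IntList list = new IntList();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list.sum();
    }

    public static void main(String[] args) {
        System.out.println("same runtime class: " + sameRuntimeClass());
        System.out.println("sums agree: " + (sumBoxed(1000) == sumPrimitive(1000)));
    }
}
```

The trade-off is that `IntList` does not implement `java.util.List`, so it cannot be passed to code written against the generic collection interfaces; that is the convenience the question asks to keep.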