Search code examples
javaperformancecollectionsguava

Defining a set of ground-rules for high-performance data structures (java)


I generally use vectors/arraylists , hashmaps/treemaps, and other java collections interchangeably, with exception of the fact that there are sometimes functional API requirements (for example, I might need a sorted data set in certain instances).

Lately, however, I've found a need to push java performance to the limit for some algorithms I'm running.

Is there a set of guidelines for high-performance data structures, that I can use as ground rules for my coding ?

I'm looking for general rules, but, in this context, answers to the following questions might also be very helpful :

1) When should I use multidimensional arrays instead of nested Collections ?

2) Vectors vs. ArrayLists - is there truly a performance difference ?

3) Do collection API's like Google's collections, java tricks (like reflection and casting), and other common java developer idioms tend to slow down the JVM when it is under heavy load ?

4) Do primitives vs regular objects (i.e. Double vs double) slow down the JVM when doing lots of calculations ?

5) Are there other important guidelines for dealing with large collections in java programs which need to be high-performance ?

  • Note : at this point, I'm not doing any multithreading... I realize that there are other constraints which might apply once I start parallelizing .

Solution

  • All performance issues should be addressed first by profiling (for both time and memory/object use). Don't optimize things that aren't a factor in the performance of your code. With that caveat, there are some general rules of thumb (that should all be tested by profiling!)

    1) When should I use multidimensional arrays instead of nested Collections ?

    When you don't need the dynamic sizing of Collections and don't need to feed your data to anything that requires a Collection, then multidimensional arrays (arrays of arrays, actually) can be a faster.

    2) Vectors vs. ArrayLists - is there truly a performance difference ?

    Yes. Many methods in Vector are synchronized, which is expensive. If you aren't multithreading, then avoid Vector. Even if you are, the granularity of the synchronization is usually wrong and you're better off providing thread safety yourself.

    3) Do collection API's like Google's collections, java tricks (like reflection and casting), and other common java developer idioms tend to slow down the JVM when it is under heavy load ?

    Reflection is slow; garbage collection is slow. Anything you can do to avoid those will speed things up.

    4) Do primitives vs regular objects (i.e. Double vs double) slow down the JVM when doing lots of calculations ?

    Yes. Autoboxing/unboxing can create huge amounts of garbage very quickly. This all has to be collected, which will also slow down your program.

    5) Are there other important guidelines for dealing with large collections in java programs which need to be high-performance ?

    Prefer local method variables to field accesses. You can find many other guidelines by searching the web. The main thing, though, is to profile.

    Edit: There's a good collection of performance hints here.