Search code examples
javaguavapreprocessor

Managing highly repetitive code and documentation in Java


Highly repetitive code is generally a bad thing, and there are design patterns that can help minimize this. However, sometimes it's simply inevitable due to the constraints of the language itself. Take the following example from java.util.Arrays:

/**
 * Assigns the specified long value to each element of the specified
 * range of the specified array of longs. The range to be filled
 * extends from index <tt>fromIndex</tt>, inclusive, to index
 * <tt>toIndex</tt>, exclusive. (If <tt>fromIndex==toIndex</tt>, the
 * range to be filled is empty.)
 *
 * @param a the array to be filled
 * @param fromIndex the index of the first element (inclusive) to be
 *        filled with the specified value
 * @param toIndex the index of the last element (exclusive) to be
 *        filled with the specified value
 * @param val the value to be stored in all elements of the array
 * @throws IllegalArgumentException if <tt>fromIndex &gt; toIndex</tt>
 * @throws ArrayIndexOutOfBoundsException if <tt>fromIndex &lt; 0</tt> or
 *         <tt>toIndex &gt; a.length</tt>
 */
public static void fill(long[] a, int fromIndex, int toIndex, long val) {
    rangeCheck(a.length, fromIndex, toIndex);
    for (int i = fromIndex; i<toIndex; i++)
        a[i] = val;
}

The above snippet appears in the source code eight times, with very little variation in the documentation/method signature but exactly the same method body, one for each of the root array types boolean[], byte[], char[], double[], float[], int[], short[], and Object[].

I believe that unless one resorts to reflection (which is an entirely different subject in itself), this repetition is inevitable. I understand that as a utility class, such high concentration of repetitive Java code is highly atypical, but even with the best practice, repetition does happen! Refactoring doesn't always work because it's not always possible (the obvious case is when the repetition is in the documentation).

Obviously maintaining this source code is a nightmare. A slight typo in the documentation, or a minor bug in the implementation, is multiplied by however many repetitions was made. In fact, the best example happens to involve this exact class:

Google Research Blog - Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken (by Joshua Bloch, Software Engineer)

The bug is a surprisingly subtle one, occurring in what many thought to be just a simple and straightforward algorithm.

 // int mid = (low + high)/2; // the bug
    int mid = (low + high) >>> 1; // the fix

The above line appears 11 times in the source code!

So my questions are:

  • How are these kinds of repetitive Java code/documentation handled in practice? How are they developed, maintained, and tested?
    • Do you start with "the original", and make it as mature as possible, and then copy and paste as necessary and hope you didn't make a mistake?
    • And if you did make a mistake in the original, then just fix it everywhere, unless you're comfortable with deleting the copies and repeating the whole replication process?
    • And you apply this same process for the testing code as well?
    • Would Java benefit from some sort of limited-use source code preprocessing for this kind of thing?
    • Perhaps Sun has their own preprocessor to help write, maintain, document and test these kind of repetitive library code?

A comment requested another example, so I pulled this one from Google Collections: com.google.common.base.Predicates lines 276-310 (AndPredicate) vs lines 312-346 (OrPredicate).

The source for these two classes are identical, except for:

  • AndPredicate vs OrPredicate (each appears five times in its class)
  • "And(" vs Or(" (in the respective toString() methods)
  • #and vs #or (in the @see Javadoc comments)
  • true vs false (in apply; ! can be rewritten out of the expression)
  • -1 /* all bits on */ vs 0 /* all bits off */ in hashCode()
  • &= vs |= in hashCode()

Solution

  • For people that absolutely need performance, boxing and unboxing and generified collections and whatnot are big no-no's.

    The same problem happens in performance computing where you need the same complex to work both for float and double (say some of the method shown in Goldberd's "What every computer scientist should know about floating-point numbers" paper).

    There's a reason why Trove's TIntIntHashMap runs circles around Java's HashMap<Integer,Integer> when working with a similar amount of data.

    Now how are Trove collection's source code written?

    By using source code instrumentation of course :)

    There are several Java libraries for higher performance (much higher than the default Java ones) that use code generators to create the repeated source code.

    We all know that "source code instrumentation" is evil and that code generation is crap, but still that's how people who really know what they're doing (i.e. the kind of people that write stuff like Trove) do it :)

    For what it is worth we generate source code that contains big warnings like:

    /*
     * This .java source file has been auto-generated from the template xxxxx
     * 
     * DO NOT MODIFY THIS FILE FOR IT SHALL GET OVERWRITTEN
     * 
     */