Search code examples
javacomputer-sciencecorrectnessproof-of-correctness

Why would this algorithm not adhere to its specification? How to attempt proving its correctness?


Write the method getLargeAndSmallLetters that returns a String that contains all upper- and lowercase ASCII letters in the following form (upper- and lowercase letters separated by white space, letter groups separated by comma, no comma at the end):

A a, B b, C c, D d, E e, ..., Z z

The specification of the algorithm does not bear any details about implementation specifics, i.e. the use of a StringBuilder construct, efficiency guidelines the algorithm has to meet or any best practices regarding readability of the implementation.

My version of the algorithm that compiles and returns the desired output

public static String getLargeAndSmallLetters() {
    StringBuilder builder = new StringBuilder();
    for (int i = 64; i < 96; ++i) { // WRONG: hard coded
        if (Character.isLetter(i)) {
            String upper = Character.toString((char)i);
            String lower = Character.toString((char)(i+32));
            builder.append(upper + " " + lower + ", "); // WRONG: in loop
        }
    }
    String result = builder.toString();
    result = result.substring(0, result.length() - 2);
    return result;
 }

What my docent probably would have liked to see is a for loop like this

for (char lower = 'a'; lower <= 'z'; ++lower) {
    Character upper = Character.toUpperCase(lower);
    builder.append(upper)
           .append(" ")
           .append(lower)
           .append(", ");
}

Why would the algorithm be "academically" more correct this way? This could have been solved with simple string concatenation, too.

I first want to understand why my solution is not optimal (but correct IMO). Having a C# background and can only roughly guess reasons as of why my code is not optimal:

  • for loop goes through too many iterations. I correct this in an "ugly" but correct way with if (Character.isLetter(i)).
  • it's not as readable
  • it casts ints to chars. Is there happening any kind of boxing/unboxing that impacts performance?
  • the way String is implemented in Java, am I creating more immutable Strings inside the string pool by using the string concatenation with the + operator inside my for loop? Would using StringBuilder be more efficient because it internally reuses the " " and ", " Strings more efficiently?

Now what I also would like to do is to provide an "academical argument" that proves the correctness of my answer. Here some ideas:

  1. The compiler optimizes my "mistakes" away and actually generates the same byte code for both the "wrong" and "correct" answer
  2. The JIT compiler executes the same machine code for both "correct" and "wrong" answer implementations
  3. Provide a mathematical proof of correctness of the algorithm

Option 3 seems the most straight forward way of proofing my answers correctness to me.

For options 1 and 2 I think I'd need to correct the for loop to for (int i = 65; i < 91; ++i) and remove the if (Character.isLetter(i)) statement.

Any suggestions, additions, corrections to my thoughts would be greatly appreciated.


Solution

  • The compiler optimizes my "mistakes" away and actually generates the same byte code for both the "wrong" and "correct" answer

    No, the compiler does very little by way of optimization. It might change some of your string concatenations into StringBuilder appends, but that's about it; it otherwise just faithfully reproduces what you've written in your code, in bytecode form. Basically all of the smartness happens in the JIT.

    You can see very easily that they don't produce the same bytecode: decompile them with javap -c <YourClass>.

    The JIT compiler executes the same machine code for both "correct" and "wrong" answer implementations

    The JIT does execute the code, eventually. For something like this, it will probably execute once, and so not be worth JIT'ing.

    But I strongly doubt that the JIT would produce the same code for both approaches, because the two approaches are quite different.

    Provide a mathematical proof of correctness of the algorithm

    Formal proof is hard in Java. There are an awful lot of subtle semantics that you have to take into account in the proof; or, at least, prove that those semantics aren't involved in the execution of your code.

    All you can really do is assert that it is functionally equivalent. As in, not prove that it's correct, but demonstrate that it's not incorrect.

    It's pretty easy to write a test for this, as you know the correct output:

    A a, B b, C c, ...
    

    So just check that your code produces that, and you're done. Or, of course, check that your code produces the same output as the model answer.

    Your answer doesn't actually quite do the same as the model answer. You chop off the , from the end; the model answer does not.