java localization number-formatting

NumberFormat.parse("3.14") returning 314 when used with German Locale


When I try to read the string "3.14" as a float using German Locale I expect one of two things to happen:

(1) throw an error, because that is not a valid way to write 3.14 in German
(2) fall back to a more standard decimal notation and read the number as 3.14, because that is how any German would read that number

But instead I am getting 314.

import java.text.NumberFormat;
import java.util.Locale;


public class MyClass {
    public static void main(String args[]) throws Exception {
        System.out.println(
            NumberFormat.getNumberInstance(Locale.GERMANY).parse("3.14")
        ); // prints 314
    }
}

The Oracle documentation for parse states:

Number parse(String source)
Parses text from the beginning of the given string to produce a number.

This does not really explain what I am seeing here, as it doesn't specify any non-happy path. What is Java's understanding of a German decimal number, and how can I fail fast and safely convert Strings to numbers, assuming German decimal notation?


Solution

  • The basic assumption that NumberFormat would validate its input is wrong. A modern developer might expect validation, especially because the method declares ParseException as a checked exception, but that exception is only thrown when the beginning of the string cannot be parsed at all. Since the JDK is open source, I can read the code and see that I was very wrong: this Java 1.1 code was written with different design principles than I am used to.

    The critical code section in the concrete class used here (for one implementation) is in openjdk > DecimalFormat.java > int subparseNumber, where the input string gets converted into a "DigitList". The digit list for "3.14" with a German locale is indeed [3, 1, 4], because the thousands separator is simply skipped, as @GiacomoCatenazzi pointed out in his comment (see footnote 1), so subsequent code has to interpret it as 314. Also, when an invalid character is encountered, parsing just stops, so for example "0x134" -> 0 with no error.
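To make this visible from the outside, here is a minimal sketch using the parse(String, ParsePosition) overload, which reports how far parsing got. The class and helper names (ParsePositionDemo, consumedChars) are made up for illustration.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ParsePositionDemo {
    // Hypothetical helper: returns how many characters parse() consumed.
    static int consumedChars(NumberFormat format, String input) {
        ParsePosition pos = new ParsePosition(0);
        format.parse(input, pos);
        return pos.getIndex();
    }

    public static void main(String[] args) {
        NumberFormat german = NumberFormat.getNumberInstance(Locale.GERMANY);
        // Parsing stops at 'x': only the leading "0" is consumed.
        System.out.println(consumedChars(german, "0x134")); // 1
        // The '.' is skipped as a grouping separator: everything is consumed.
        System.out.println(consumedChars(german, "3.14"));  // 4
    }
}
```

The position reveals the truncated "0x134" case, but not the "3.14" case, because the whole string is consumed there.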

    There is more to learn from the source code: NumberFormat is not thread-safe, so you must not share one instance across threads without external synchronization. The modern assumption that a function like format.parse(input) -> obj would be trivially safe, because input and format are only read, does not hold: parsing changes the internal state of the NumberFormat instance. You can only reuse the instance after parse completes.
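One possible way around the shared-state problem is to give every thread its own instance. The following is only a sketch with made-up names (PerThreadFormat, parseGerman); creating a fresh NumberFormat per call or synchronizing access would work as well.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class PerThreadFormat {
    // One NumberFormat per thread, so no instance is ever shared.
    private static final ThreadLocal<NumberFormat> GERMAN =
        ThreadLocal.withInitial(() -> NumberFormat.getNumberInstance(Locale.GERMANY));

    // Hypothetical helper; wraps the checked exception for convenience.
    static Number parseGerman(String input) {
        try {
            return GERMAN.get().parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + input, e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(parseGerman("3,14"));
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

This only addresses thread safety; the parsing itself is as lenient as before.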


    So how do I make a fail-fast conversion of Strings to numbers in Java?

    (1) If you know the target type and the number is in the standard decimal format, this works:

    Float.valueOf("3,14"); // NumberFormatException
    Float.valueOf("3.14"); // 3.14f
    

    Note that NumberFormat.getNumberInstance(Locale.US).parse("3,14") will return 314, not an error, so this lack of validation is in no way exclusive to the German locale.

    (2) If I have to read numbers from German-locale strings, I must check myself that the input matches expectations. NumberFormat does not provide any way to do that, nor does there seem to be a satisfying fail-fast, non-GIGO (garbage in, garbage out) answer to this 12-year-old question about the problem: Convert String with Dot or Comma to Float Number

    The best idea I have is to validate the input myself and restrict it that way. Here is a solution that is stricter than necessary, banning thousands separators completely, but for my use case this is fine:

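    // Reject the German thousands separator outright (stricter than necessary).
    if (inputString.contains(".")) {
        throw new NumberFormatException("Unexpected '.' in: " + inputString);
    }
    // Float.valueOf throws NumberFormatException for input it cannot parse.
    return Float.valueOf(inputString.replace(',', '.'));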
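If correctly grouped thousands separators should be accepted as well, a regular expression can do the validation before converting. This is only a sketch under an assumed grammar (optional sign, digits either ungrouped or grouped in threes with '.', optional ',' fraction); the class name GermanDecimalParser is hypothetical, and it returns a BigDecimal rather than a Float.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

class GermanDecimalParser {
    // Assumed grammar: optional sign, then either ungrouped digits or
    // groups of three separated by '.', then an optional ',' fraction.
    private static final Pattern GERMAN_DECIMAL =
        Pattern.compile("[+-]?(\\d{1,3}(\\.\\d{3})+|\\d+)(,\\d+)?");

    static BigDecimal parse(String input) {
        if (input == null || !GERMAN_DECIMAL.matcher(input).matches()) {
            throw new NumberFormatException("Not a German decimal number: " + input);
        }
        // Drop the validated grouping separators, then switch to '.' decimals.
        String normalized = input.replace(".", "").replace(',', '.');
        return new BigDecimal(normalized);
    }

    public static void main(String[] args) {
        System.out.println(parse("3,14"));    // 3.14
        System.out.println(parse("1.234,5")); // 1234.5
        System.out.println(parse("3.14"));    // throws NumberFormatException
    }
}
```

Because "3.14" has only two digits after the '.', it matches neither the grouped nor the ungrouped alternative and is rejected instead of being read as 314.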
    

    1 You can actually call format.setGroupingUsed(false), and then "3.14" is parsed as 3 instead of 314, so it is not entirely true that grouping characters are always ignored. But there is no code that uses the grouping character to judge the correctness of the input String, even though DecimalFormat has setGroupingSize (and a matching getter), which controls how many digits are grouped together.
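Combining setGroupingUsed(false) with a ParsePosition check gives a rough fail-fast variant that still relies on NumberFormat. This is a sketch, not a complete validator: the names StrictGermanFormat and parseStrict are made up, and DecimalFormat may still accept other forms, such as exponent notation.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class StrictGermanFormat {
    // Hypothetical helper: rejects input that parse() did not fully consume.
    static Number parseStrict(String input) {
        // Fresh instance per call, because NumberFormat is not thread-safe.
        NumberFormat format = NumberFormat.getNumberInstance(Locale.GERMANY);
        format.setGroupingUsed(false); // '.' is no longer skipped as a separator
        ParsePosition pos = new ParsePosition(0);
        Number result = format.parse(input, pos);
        // parse() stops silently at the first unexpected character, so anything
        // short of the full length means part of the input was ignored.
        if (result == null || pos.getIndex() != input.length()) {
            throw new NumberFormatException("Not a German decimal number: " + input);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("3,14")); // 3.14
        System.out.println(parseStrict("3.14")); // throws NumberFormatException
    }
}
```

With grouping disabled, parsing "3.14" stops after the "3", the position check notices the unconsumed ".14", and the call fails instead of returning 3.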