
Why does Integer.parseInt("\uD835\uDFE8") fail?


I was under the impression that Java supports Unicode characters. I wrote this test and found that it fails. Why? Is it a bug, or is it documented somewhere?

// U+1D7E8 MATHEMATICAL SANS-SERIF DIGIT SIX "𝟨", written as a surrogate pair
String unicodeNum6 = "\uD835\uDFE8";
int codePoint6 = unicodeNum6.codePointAt(0);
int val6 = Character.getNumericValue(codePoint6);
System.out.println("unicodeNum6 = " + unicodeNum6
    + ", codePoint6 = " + codePoint6 + ", val6 = " + val6);
// The next line throws NumberFormatException
int unicodeNum6Int = Integer.parseInt(unicodeNum6);

This fails with: Exception in thread "main" java.lang.NumberFormatException: For input string: "𝟨"

I find this unexpected, since the println works and prints the expected line:

unicodeNum6 = 𝟨, codePoint6 = 120808, val6 = 6

So Java clearly knows the numeric value of the Unicode character, but parseInt does not use it.

Can someone give a good reason why it should fail?


Solution

  • It's not a bug; the behaviour is documented. According to the documentation for parseInt(String s, int radix) (emphasis mine):

    The characters in the string must all be digits of the specified radix (as determined by whether Character.digit(char, int) returns a nonnegative value), except that the first character may be an ASCII minus sign '-' ('\u002D') to indicate a negative value or an ASCII plus sign '+' ('\u002B') to indicate a positive value.

    If you try:

    int aa = Character.digit('\uD835', 10);
    int bb = Character.digit('\uDFE8', 10);
    

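    You'll see that both return -1. The documented check uses Character.digit(char, int), so the two halves of the surrogate pair are tested separately as chars, and neither half is a digit on its own.
    Note that Integer.parseInt(unicodeNum6) is equivalent to Integer.parseInt(unicodeNum6, 10).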
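
If you do need to accept such digits, one option is to walk the string by code point and use the Character.digit(int codePoint, int radix) overload, which handles supplementary characters. The following is only a minimal sketch: the class CodePointParse and the method parseUnicodeInt are hypothetical names, not part of the JDK, and the helper handles non-negative decimal input only (no sign, no radix argument).

```java
public class CodePointParse {

    // Hypothetical helper, not a JDK method: parses a non-negative decimal
    // integer by iterating over code points instead of chars.
    public static int parseUnicodeInt(String s) {
        if (s == null || s.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // The int overload sees the whole code point, not one surrogate half.
            int digit = Character.digit(cp, 10);
            if (digit < 0) {
                throw new NumberFormatException("For input string: \"" + s + "\"");
            }
            // multiplyExact/addExact throw ArithmeticException on int overflow.
            result = Math.addExact(Math.multiplyExact(result, 10), digit);
            // Advance by 2 chars for a surrogate pair, 1 otherwise.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String unicodeNum6 = "\uD835\uDFE8";
        // char overload on one surrogate half: prints -1
        System.out.println(Character.digit('\uD835', 10));
        // int overload on the full code point: prints 6
        System.out.println(Character.digit(unicodeNum6.codePointAt(0), 10));
        // code-point-aware parse: prints 6
        System.out.println(parseUnicodeInt(unicodeNum6));
    }
}
```

The same loop also accepts ordinary ASCII digits, so a mixed string such as "4\uD835\uDFE82" parses as 462.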