
Why does Apache Commons consider '१२३' numeric?


According to Apache Commons Lang's documentation for StringUtils.isNumeric(), the String '१२३' is numeric.

I suspected this might be a mistake in the documentation, so I ran tests to verify the statement. They confirmed that, according to Apache Commons, it is numeric.

Why is this String numeric? What do those characters represent?


Solution

  • Because the "CharSequence contains only Unicode digits" (quoting your linked documentation).

    All of the characters return true for Character.isDigit, whose documentation says:

    Some Unicode character ranges that contain digits:

    • '\u0030' through '\u0039', ISO-LATIN-1 digits ('0' through '9')
    • '\u0660' through '\u0669', Arabic-Indic digits
    • '\u06F0' through '\u06F9', Extended Arabic-Indic digits
    • '\u0966' through '\u096F', Devanagari digits
    • '\uFF10' through '\uFF19', Fullwidth digits

    Many other character ranges contain digits as well.
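To see this rule in action, here is a minimal Java sketch that needs only the JDK. The helper `allUnicodeDigits` is hypothetical: it mirrors the behavior the documentation describes for StringUtils.isNumeric (false for an empty string, otherwise true only if every char passes Character.isDigit), but it is not the library's source. It also walks each range quoted above and checks that every character is a digit whose Character.getNumericValue equals its position in the range. The sample strings use \u escapes so the file compiles regardless of source encoding.

```java
public class UnicodeDigitDemo {

    // Hypothetical helper, not Commons Lang code: empty -> false,
    // otherwise true only if every char satisfies Character.isDigit.
    static boolean allUnicodeDigits(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isDigit(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First code point (the zero digit) of each range quoted from the Javadoc:
        // ISO-LATIN-1, Arabic-Indic, Extended Arabic-Indic, Devanagari, Fullwidth.
        int[] rangeStarts = {0x0030, 0x0660, 0x06F0, 0x0966, 0xFF10};
        for (int start : rangeStarts) {
            for (int offset = 0; offset <= 9; offset++) {
                char c = (char) (start + offset);
                // Each character must be a digit whose numeric value equals its offset.
                if (!Character.isDigit(c) || Character.getNumericValue(c) != offset) {
                    throw new AssertionError("unexpected: U+" + Integer.toHexString(c));
                }
            }
        }

        // Label / input pairs; labels are ASCII so console encoding does not matter.
        String[][] samples = {
            {"ASCII", "123"},
            {"Devanagari", "\u0967\u0968\u0969"},
            {"Arabic-Indic", "\u0661\u0662\u0663"},
            {"Fullwidth", "\uFF11\uFF12\uFF13"},
            {"Mixed with letter", "12a"},
            {"Empty", ""}
        };
        for (String[] sample : samples) {
            System.out.println(sample[0] + ": " + allUnicodeDigits(sample[1]));
        }
    }
}
```

Running main should print true for the ASCII, Devanagari, Arabic-Indic and Fullwidth samples and false for "12a" and the empty string. If only '0' through '9' should be accepted, testing c >= '0' && c <= '9' instead of Character.isDigit is one way to restrict the check.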

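    १२३ are Devanagari digits: U+0967 (DEVANAGARI DIGIT ONE), U+0968 (DEVANAGARI DIGIT TWO) and U+0969 (DEVANAGARI DIGIT THREE), so the string represents the number 123.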
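To inspect the characters yourself, a small JDK-only sketch can print each code point's Unicode name, block and numeric value, then combine the values positionally. The class and method names (`DevanagariDigits`, `describe`, `decimalValue`) are illustrative only; Locale.ROOT is passed to String.format so %d is not rendered with locale-specific digits.

```java
import java.util.Locale;

public class DevanagariDigits {

    // Formats e.g. "U+0967 DEVANAGARI DIGIT ONE (block DEVANAGARI, value 1)".
    static String describe(int codePoint) {
        return String.format(Locale.ROOT, "U+%04X %s (block %s, value %d)",
                codePoint,
                Character.getName(codePoint),
                Character.UnicodeBlock.of(codePoint),
                Character.getNumericValue(codePoint));
    }

    // Combines each digit's numeric value, most significant digit first.
    static int decimalValue(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            value = value * 10 + Character.getNumericValue(digits.charAt(i));
        }
        return value;
    }

    public static void main(String[] args) {
        String s = "\u0967\u0968\u0969"; // the string from the question
        s.codePoints().forEach(cp -> System.out.println(describe(cp)));
        System.out.println("decimal value: " + decimalValue(s));
    }
}
```

It prints:

U+0967 DEVANAGARI DIGIT ONE (block DEVANAGARI, value 1)
U+0968 DEVANAGARI DIGIT TWO (block DEVANAGARI, value 2)
U+0969 DEVANAGARI DIGIT THREE (block DEVANAGARI, value 3)
decimal value: 123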