java unicode identifier scjp

What are "connecting characters" in Java identifiers?


I am studying for the SCJP and have a question about this line:

Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore ( _ ). Identifiers cannot start with a number!

It states that a valid identifier name can start with a connecting character such as underscore. I thought underscores were the only valid option? What other connecting characters are there?


Solution

  • Here is a list of connecting characters, that is, the characters in the Unicode category Pc (Connector Punctuation). They are used to connect words.

    http://www.fileformat.info/info/unicode/category/Pc/list.htm

    U+005F _ LOW LINE
    U+203F ‿ UNDERTIE
    U+2040 ⁀ CHARACTER TIE
    U+2054 ⁔ INVERTED UNDERTIE
    U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
    U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
    U+FE4D ﹍ DASHED LOW LINE
    U+FE4E ﹎ CENTRELINE LOW LINE
    U+FE4F ﹏ WAVY LOW LINE
    U+FF3F ＿ FULLWIDTH LOW LINE
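
    The list above can also be rebuilt from the JVM itself. The following is a minimal sketch, assuming that Character.getType reflects the Unicode version bundled with your JDK; the class and method names are made up for illustration, and the exact set printed may differ between Java releases.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical helper class, not part of the original answer.
    final class ConnectorPunctuation {

        /** Collects every code point whose general category is Pc on this JVM. */
        static List<Integer> connectorCodePoints() {
            List<Integer> result = new ArrayList<>();
            for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
                if (Character.getType(cp) == Character.CONNECTOR_PUNCTUATION) {
                    result.add(cp);
                }
            }
            return result;
        }

        public static void main(String[] args) {
            for (int cp : connectorCodePoints()) {
                // Print code point, the character itself, and its Unicode name.
                System.out.printf("U+%04X %s %s%n",
                        cp, new String(Character.toChars(cp)), Character.getName(cp));
            }
        }
    }
    ```

    Every code point this returns should also satisfy Character.isJavaIdentifierStart, since connecting punctuation is one of the categories that method accepts.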
    

    This compiles on Java 7. (From Java 9 onward a lone _ is a reserved keyword, so the first name has to be dropped there.)

    int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;
    

    An example. Here ︴tp︴ names a column and tp holds that column's value for a given row.

    Column<Double> ︴tp︴ = table.getColumn("tp", double.class);
    
    double tp = row.getDouble(︴tp︴);
    

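    The following

    for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
        // keep the code points that may start an identifier but are not alphabetic
        if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i)) {
            // toChars also handles supplementary code points, which a (char) cast would truncate
            System.out.print(new String(Character.toChars(i)) + " ");
        }
    }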
    

    prints

    $ _ ¢ £ ¤ ¥ ؋ ৲ ৳ ৻ ૱ ௹ ฿ ៛ ‿ ⁀ ⁔ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱ ₲ ₳ ₴ ₵ ₶ ₷ ₸ ₹ ꠸ ﷼ ︳ ︴ ﹍ ﹎ ﹏ ﹩ ＄ ＿ ￠ ￡ ￥ ￦
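
    To try individual names against the same rules, a small checker can walk a string code point by code point. This is only a sketch: the class and method names are hypothetical, and it deliberately does not reject keywords or literals (int, true, null, or a lone _ on Java 9+), which the compiler also refuses as identifiers.

    ```java
    // Hypothetical helper class, not part of the original answer.
    final class IdentifierCheck {

        /**
         * Returns true if the first code point passes Character.isJavaIdentifierStart
         * and every following code point passes Character.isJavaIdentifierPart.
         * Assumption: keywords and literals are not checked here.
         */
        static boolean looksLikeIdentifier(String s) {
            if (s.isEmpty()) {
                return false;
            }
            int first = s.codePointAt(0);
            if (!Character.isJavaIdentifierStart(first)) {
                return false;
            }
            for (int i = Character.charCount(first); i < s.length(); ) {
                int cp = s.codePointAt(i);
                if (!Character.isJavaIdentifierPart(cp)) {
                    return false;
                }
                i += Character.charCount(cp);
            }
            return true;
        }

        public static void main(String[] args) {
            // \u203F is UNDERTIE, \uFE34 is PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
            String[] samples = {"_tp", "\u203Ftp", "$tp", "\uFE34tp\uFE34", "1tp", "tp-1"};
            for (String s : samples) {
                System.out.println(s + " -> " + looksLikeIdentifier(s));
            }
        }
    }
    ```

    Digits are accepted by isJavaIdentifierPart but not by isJavaIdentifierStart, which is why "1tp" fails while "tp1" would pass.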