
What characters are usable in a variable name in ObjectScript on a "Unicode" installation?


I have a parser (in Java) for ObjectScript which works quite well, except for one thing: I don't parse "Unicode variable names".

The problem is that the documentation is not very explicit on this subject; what is more, it misdefines Unicode as "16 bits". This suggests to me that only characters within the BMP (Basic Multilingual Plane) are allowed.

But which ones? The number of Unicode blocks defined in the JDK is frighteningly high, and the number of scripts isn't any better.

I could maybe use Character.isLetter() (note that I chose the overload taking a char, not an int), but I'm sure that even that would be too broad...


Solution

  • Eduard was pretty much correct: a local variable name starts with a percent sign or an "alphabetic" character, followed by any number of "alphabetic" characters or digits.

    [\p{Alphabetic}%][\p{Alphabetic}\d]*
    

    The most important question here is what "alphabetic" means. It means a Latin letter, or a character that is alphabetic in the current Caché locale. For example, with a Russian/Unicode locale installed you could write something like:

    set порусски = 1
    

    or, with a Japanese locale:

    USER>set a=$c(12354)
    
    USER>set @a=88
    
    USER>write
    
    a="あ"
    あ=88
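
Since the parser in question is written in Java, here is a minimal sketch of how the pattern above might be checked there. The class and method names (ObjectScriptNames, isLocalVariableName) are hypothetical. Two assumptions are baked in: java.util.regex spells the Unicode binary property as \p{IsAlphabetic} rather than \p{Alphabetic}, and names are limited to the BMP, following the "16 bits" reading in the question. Note that this accepts every Unicode alphabetic character, so it is a superset of what a given Caché locale actually allows.

```java
import java.util.regex.Pattern;

final class ObjectScriptNames {
    // First character: '%' or an alphabetic character.
    // Remaining characters: alphabetic characters or digits.
    // In Java, \d matches only ASCII digits unless UNICODE_CHARACTER_CLASS is set.
    private static final Pattern LOCAL_NAME =
            Pattern.compile("[\\p{IsAlphabetic}%][\\p{IsAlphabetic}\\d]*");

    private ObjectScriptNames() {}

    static boolean isLocalVariableName(String s) {
        // Assumption: only BMP characters are allowed, so any surrogate
        // (i.e. any part of a supplementary code point) is rejected.
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return false;
            }
        }
        return LOCAL_NAME.matcher(s).matches();
    }
}
```

With this sketch, names such as порусски, あ and %x1 are accepted, while 1abc, a%b and the empty string are rejected. Narrowing the check to the alphabetic set of one specific locale would need that locale's character table, which is not covered here.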