JavaTokenParsers
in Scala provides convenient regexps for matching integer and floating-point numbers, and double-quoted strings. But that's ALL it does. How do I do the obvious thing of converting these matched strings back into the values they represent? This is pretty easy to do for numbers, using toDouble
or toInt
, etc. But how do you do the equivalent for strings? For example, if I type the string
"Unicode \u20ac is a Euro sign, which I would write \\u20ac in a string. \243 is a pound sign.\n\r And \f is a \"form feed\", with embedded quotes.\n\r"
And then I run this through JavaTokenParsers
, I'll duly get a string back in which the embedded quotes are parsed correctly, but which still has a double quote as its first and last character and still contains the raw backslash sequences. How do I get the equivalent Java string with the escape sequences processed? I can't believe there's no library function to do this, but I can't find one.
It seems that there is no such function; at least, none is used in the Scala compiler. That's not a conclusive answer, though: a library function may have been introduced afterwards.
In case you want to read (or copy-n-paste) this code, here's the related code I found.
The tokenization logic of the Scala compiler is distributed among different files.
The top-level method seems to be fetchToken
in src/compiler/scala/tools/nsc/ast/parser/Scanners.scala
, which in turn delegates to logic in src/compiler/scala/tools/nsc/util/CharArrayReader.scala
(one of the scanner's ancestors), in particular nextChar
and potentialUnicode
. Other escapes are handled in getLitChar
, again in Scanners.scala
.
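If you would rather not dig through the compiler sources, a hand-rolled version is short. What follows is only a minimal sketch, not the compiler's code: the names Unescape and unquote are made up for illustration, and it assumes the input is the complete double-quoted token as matched by the parser. It strips the surrounding quotes and then handles the single-character escapes, octal escapes of up to three digits, and backslash-u escapes with four hex digits.

```scala
object Unescape {
  /** Hypothetical helper (not a library function): removes the surrounding
    * double quotes and processes Java-style escape sequences. */
  def unquote(literal: String): String = {
    require(
      literal.length >= 2 && literal.head == '"' && literal.last == '"',
      "expected a double-quoted string literal")
    val s = literal.substring(1, literal.length - 1)
    val out = new StringBuilder
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c != '\\') {
        // Ordinary character: copy it through unchanged.
        out.append(c); i += 1
      } else {
        if (i + 1 >= s.length)
          throw new IllegalArgumentException("dangling backslash")
        s.charAt(i + 1) match {
          case 'b'  => out.append('\b'); i += 2
          case 't'  => out.append('\t'); i += 2
          case 'n'  => out.append('\n'); i += 2
          case 'f'  => out.append('\f'); i += 2
          case 'r'  => out.append('\r'); i += 2
          case '"'  => out.append('"'); i += 2
          case '\'' => out.append('\''); i += 2
          case '\\' => out.append('\\'); i += 2
          case 'u' =>
            // Unicode escape: one or more 'u' characters, then four hex digits.
            var j = i + 1
            while (j < s.length && s.charAt(j) == 'u') j += 1
            if (j + 4 > s.length)
              throw new IllegalArgumentException("truncated unicode escape")
            out.append(Integer.parseInt(s.substring(j, j + 4), 16).toChar)
            i = j + 4
          case d if d >= '0' && d <= '7' =>
            // Octal escape: up to three digits, or two if the first digit is above 3.
            val maxLen = if (d <= '3') 3 else 2
            var j = i + 1
            while (j < s.length && j - (i + 1) < maxLen &&
                   s.charAt(j) >= '0' && s.charAt(j) <= '7') j += 1
            out.append(Integer.parseInt(s.substring(i + 1, j), 8).toChar)
            i = j
          case other =>
            throw new IllegalArgumentException("invalid escape: \\" + other)
        }
      }
    }
    out.toString
  }
}
```

You would call Unescape.unquote on each token that stringLiteral returns, for instance by mapping over the parser result. Because the input is scanned left to right, a doubled backslash is consumed before the following u is looked at, so \\u20ac stays a literal backslash followed by u20ac, as in the question's example. Later Scala releases also include StringContext.processEscapes, which may cover part of this; which escapes it accepts differs between versions, so check it against your own input before relying on it.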