java parsing semantics symbol-table

Does Java have ambiguous syntax that requires more information about an identifier?


NOTE: This question is not about "Java does not have pointers".

In C, the code identifier1 * identifier2 is ambiguous, with two possible meanings:

  1. If identifier1 is a type, this is a pointer declaration.
  2. If identifier1 is a variable, this is a multiplication expression.
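
To make the dependency concrete, here is a minimal sketch in Java of a toy classifier for the C statement identifier1 * identifier2. The class StarAmbiguity, its classify method, and the set of known type names are all hypothetical names made up for illustration. This is not how Clang is implemented; it only shows why the token stream alone is not enough in C.

```java
import java.util.Set;

// Toy illustration (all names hypothetical): the same tokens get a different
// parse depending on what a symbol table says about the first identifier.
class StarAmbiguity {
    enum Kind { POINTER_DECLARATION, MULTIPLICATION }

    // Classifies the C statement "<first> * <second>;" given the set of
    // identifiers currently known to name types (for example via typedef).
    static Kind classify(String first, Set<String> typeNames) {
        return typeNames.contains(first)
                ? Kind.POINTER_DECLARATION   // "T * p;" declares p as a pointer to T
                : Kind.MULTIPLICATION;       // "a * b;" multiplies a by b
    }

    public static void main(String[] args) {
        Set<String> typeNames = Set.of("T"); // assumed contents of the symbol table
        System.out.println(classify("T", typeNames));
        System.out.println(classify("a", typeNames));
    }
}
```

The point of the sketch is that classify needs the typeNames argument at all: without it, both calls receive identical input and cannot be told apart.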

The problem is that I cannot choose the right production when building the syntax tree. I checked Clang's code, and it seems that Clang has to bring type checking (using a symbol table) into the parsing phase (correct me if I'm wrong).

Then I checked the code of javac (OpenJDK), and it seems that no semantic analysis is involved in the parsing phase. The parser can build an AST from the tokens alone.

So I'm curious whether Java has the same syntactic ambiguity problem, where a parser that doesn't know an identifier's type cannot choose the right production.

Or, more generally: does Java have any syntactic ambiguity where a parser cannot choose a production without information beyond the token stream?


Solution

  • Tokenization is context-sensitive to some degree in most languages. Java, however, does not have operators that are this sensitive. You can chain tokens so that the sequence is ambiguous in isolation, but not once it is seen as part of a larger syntactic construct:

    A < B can be part of either public class A < B > { ... } or if (A < B) { ... }. The first is a generic class declaration; the second is a comparison.

    This is just the first example off the top of my head, and I presume there are more. However, Java's operators are narrowly defined and cannot be overloaded (unlike in C++). Also, unlike C/C++, Java has only one accessor operator (the dot: .), with one exception (since Java 8, the double colon :: used for method references). C++ has a whole bunch of them, so Java is much less chaotic in this respect.

    To the specific question of whether Java is always syntactically decidable: yes. A well-implemented compiler can always decide which construct is present from the token stream alone.
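
As a small illustration of the A < B example in the answer, the sketch below uses the < token in a class declaration, in a type argument list, and in a comparison. The names LessThanDemo, Box and isSmaller are hypothetical and chosen only for this example. In each case the surrounding tokens, not the types of the identifiers, indicate which grammar rule applies.

```java
// The '<' token in different syntactic contexts (all names hypothetical).
class LessThanDemo {
    static boolean isSmaller(int a, int b) {
        return a < b;                      // '<' is the relational operator
    }

    public static void main(String[] args) {
        Box<Integer> box = new Box<>(3);   // '<' opens a type argument list
        System.out.println(isSmaller(box.get(), 5));
    }
}

// '<' directly after the class name opens a type parameter list.
class Box<T> {
    private final T value;

    Box(T value) {
        this.value = value;
    }

    T get() {
        return value;
    }
}
```

Here the keyword class before Box and the expression context inside isSmaller are already visible in the token stream, so a parser can pick the production without consulting a symbol table.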