struct birthday{ int day:6; }b-day;
When I declare `b-day` as a structure variable, the compiler reports the following error:
error: expected ':', ',', ';', '}' or '__attribute__' before '-' token
But after removing the hyphen from the variable name it works. Why?
The boring answer is that the language definition doesn't allow `-` to be part of an identifier (variable name, function name, typedef name, enumeration constant, tag name, etc.).
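As a minimal sketch of the fix, the same declaration compiles once the hyphen is replaced with an underscore, which is a legal identifier character. The switch to `unsigned int` and the helper `store_day` are my own additions for illustration and are not part of the question.

```c
/* An underscore is allowed in identifiers, so b_day works where b-day does not. */
struct birthday {
    unsigned int day : 6;   /* 6-bit unsigned field, holds 0..63 */
} b_day;

/* Hypothetical helper: stores a day in the bit-field and reads it back. */
unsigned int store_day(unsigned int d)
{
    b_day.day = d;
    return b_day.day;
}
```

Calling `store_day(17)` should hand back 17, since any day of the month fits in six bits.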
Why that's the case probably boils down to a couple of things:
At the preprocessing stage, your source text is broken up into a sequence of tokens: identifiers, punctuators, string literals, and numeric constants. Whitespace is not significant except that it separates tokens of the same type. If you write `a=b+c;`, the compiler sees the sequence of tokens identifier (`a`), punctuator (`=`), identifier (`b`), punctuator (`+`), identifier (`c`), and punctuator (`;`). This is before it does any syntax analysis. It's not looking at the meaning or the structure of that statement, it's just breaking it down into its component parts.
It can do this because the characters `=`, `+`, and `;` can never be part of an identifier, so it can clearly see where identifiers begin and end.[1]
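To make the idea concrete, here is a rough sketch of a tokenizer for a tiny subset of C. It is not how any real compiler is written. The names `token_kind`, `token`, and `tokenize` are made up for this example, and it simplifies by treating every punctuator as a single character.

```c
#include <ctype.h>
#include <stddef.h>

/* Token categories used by this sketch (a simplification of C's real token set). */
enum token_kind { TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

struct token {
    enum token_kind kind;
    const char *start;  /* first character of the token inside the source string */
    size_t len;         /* number of characters in the token */
};

/* Splits src into at most max tokens and returns how many were found.
   Identifiers are [A-Za-z_][A-Za-z0-9_]*, numbers are runs of digits,
   and every other non-space character is a one-character punctuator. */
size_t tokenize(const char *src, struct token *out, size_t max)
{
    size_t n = 0;
    while (*src != '\0' && n < max) {
        if (isspace((unsigned char)*src)) {
            src++;              /* whitespace only separates tokens */
            continue;
        }
        const char *begin = src;
        enum token_kind kind;
        if (isalpha((unsigned char)*src) || *src == '_') {
            kind = TOK_IDENT;
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
        } else if (isdigit((unsigned char)*src)) {
            kind = TOK_NUMBER;
            while (isdigit((unsigned char)*src))
                src++;
        } else {
            kind = TOK_PUNCT;   /* '=', '+', ';', '-' and so on end any identifier */
            src++;
        }
        out[n].kind = kind;
        out[n].start = begin;
        out[n].len = (size_t)(src - begin);
        n++;
    }
    return n;
}
```

Under these rules `a=b+c;` splits into six tokens, and `b-day` splits into three (`b`, `-`, `day`), which is why the compiler complains at the `-`.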
The tokenizer is "greedy" and will build the longest valid token it can. In a declaration like `int a;` you need the whitespace to tell the preprocessor that `int` and `a` are separate tokens, otherwise it will try to mash them together into a single token `inta`. Similarly, in a statement like `a=b- -c;`, you need that whitespace (or parentheses, `a=b-(-c);`) to signify you're subtracting `-c` from `b`, otherwise the tokenizer will interpret it as `a = b-- c`, which isn't what you want.
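A small sketch of the greedy behavior, using a variant that still compiles: `b---c` is split as `b--` followed by `- c`, while `b- -c` keeps the two minus signs apart. The function names are hypothetical and only there for illustration.

```c
/* b---c is tokenized greedily as b -- - c, i.e. (b--) - c. */
int munch_greedy(int b, int c)
{
    return b---c;
}

/* The space splits the tokens: b - -c, i.e. b - (-c). */
int munch_spaced(int b, int c)
{
    return b- -c;
}
```

With `b = 5` and `c = 2`, the first function should give 3 (the post-decrement yields the old value of `b`) and the second should give 7.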
So, if a `-` could be part of an identifier, how should `x=a-b+c` be tokenized? Is `a-b` a single token or three? How would you write your tokenizer such that it could keep track of that? Would you require whitespace before and after `-` to signify that it's an operator and not part of a variable?
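The ambiguity can be sketched with a scanner that measures how long the identifier at the start of a string is, once under C's actual rule and once under a made-up rule that lets `-` appear inside a name. The function `ident_length` and its `allow_hyphen` flag are assumptions for illustration only.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the identifier at the start of s.
   With allow_hyphen == 0 this follows C's rule; with allow_hyphen != 0 it
   follows a hypothetical rule in which '-' may appear inside a name. */
size_t ident_length(const char *s, int allow_hyphen)
{
    size_t n = 0;
    /* An identifier must start with a letter or underscore. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_' ||
           (allow_hyphen && s[n] == '-'))
        n++;
    return n;
}
```

For the input `a-b+c`, C's rule stops after `a` (length 1), while the hypothetical rule greedily swallows `a-b` (length 3), so the subtraction silently disappears unless the programmer adds whitespace.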
It's certainly possible to define a language that allows `-` to be both an operator and part of an identifier (see COBOL), but it adds complexity to the tokenizing stage of compiling, and it's just plain easier to not allow it.
[1] The same reasoning applies to `T *p;` and `T* p;` when declaring pointer variables: the `*` can never be part of an identifier, so whitespace isn't necessary to separate the type from the variable name. You could write it as `T*p;` or even `T * p;` and it will be treated exactly the same.
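As a quick sketch of that point, the four spellings below all declare a pointer to `int`. The function `pointer_spacing_demo` is a hypothetical name used only for this example.

```c
/* Returns 1 if all four differently spaced declarations behave identically. */
int pointer_spacing_demo(void)
{
    int value = 42;
    int *p1 = &value;
    int* p2 = &value;
    int*p3 = &value;
    int * p4 = &value;
    return (p1 == p2) && (p2 == p3) && (p3 == p4) && (*p4 == 42);
}
```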