I am designing a custom CSS-like language (CSS plus custom extensions) which would work roughly like this:
[object.member.value = 5]{
object.member.anothervalue:8
object.member.yetanothervalue:'hello'
object.member.yetyetanothervalue.anothervalue:blue
}
Basically, the language lets you check some conditions (ifs, which can be nested) and then apply some values to the object. There are no loops. The rules would be stored in plain text files and loaded into the application (C++) at startup.
The idea is to translate this CSS-ish file into a C++ tree or something similar, which can be evaluated at runtime.
I am considering using a lexer generator and a parser generator (Flex, Yacc, Bison, etc.).
Which tools or libraries would you suggest?
If you expect to do this sort of thing more than once, learn how to use parser generators. It will save you a lot of pain in the long run.
Start simple. The tools will do lots of things for you, and generally with very little effort. Let them do that. Get things working before you try to do things which are complex.
The rest of this assumes that you will use flex and bison (which are lex and yacc lookalikes). You don't have to; there are many alternatives. If you decide to try one of the other alternatives, ignore the rest of this answer.
But flex and bison are solid, well-maintained, well-debugged packages with a lot of documentation, and they've been used extensively over a long period of time. Read the documentation first.
- flex will read from standard input, or from a file you provide through yyin, automatically. Let it do that.
- flex will track line numbers for you (with %option yylineno). Let it do that.
- bison will generate token numbers for you automatically. Let it do that.
- bison and flex are optimized for single-character tokens. Not only do you not need to provide token numbers, you don't even need to provide token names. In your flex file, just put this at or near the end:
    . { return yytext[0]; }
  and don't bother writing rules to handle single-character tokens. Don't worry about the fact that you will tokenize illegal characters; bison will produce an error message for you.
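To see why the catch-all rule removes the need for token names, here is a minimal hand-written C++ stand-in for a scanner. It is not flex output, only a sketch of the convention: multi-character lexemes get named token codes, and every other character is returned as its own character code, so a grammar can refer to it as a literal such as '=' or '{'. The names Token, IDENT, NUMBER, STRING and next_token are hypothetical, and 258 is just an illustrative starting value above the character range; bison assigns the real numbers itself.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical token codes. bison numbers named tokens on its own, above the
// range of plain characters; 258 is only an illustrative value.
enum Token { END = 0, IDENT = 258, NUMBER, STRING };

static bool is_ident_char(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Hand-written stand-in for a generated scanner. Returns a named token for
// identifiers, integers and single-quoted strings, and the character itself
// for anything else, which mirrors `. { return yytext[0]; }`.
int next_token(const std::string& src, std::size_t& pos, std::string& text) {
    assert(pos <= src.size());
    while (pos < src.size() &&
           std::isspace(static_cast<unsigned char>(src[pos]))) {
        ++pos;  // skip whitespace between tokens
    }
    text.clear();
    if (pos >= src.size()) return END;

    unsigned char c = static_cast<unsigned char>(src[pos]);
    if (std::isalpha(c) || c == '_') {
        while (pos < src.size() && is_ident_char(src[pos])) text += src[pos++];
        return IDENT;
    }
    if (std::isdigit(c)) {
        while (pos < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[pos]))) {
            text += src[pos++];
        }
        return NUMBER;
    }
    if (c == '\'') {
        ++pos;  // skip opening quote
        while (pos < src.size() && src[pos] != '\'') text += src[pos++];
        if (pos < src.size()) ++pos;  // skip closing quote if present
        return STRING;
    }
    // Catch-all: the token code is simply the character code.
    text = src[pos++];
    return c;
}
```

Called repeatedly on an input like [a.b = 5], this returns '[', IDENT, '.', IDENT, '=', NUMBER, ']' and then END. An unexpected character such as '@' is still returned as a token, and it is then up to the parser to reject it.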
Also, don't allow flex to insert a default rule. (%option nodefault is enough to suppress it.)
A couple of other tips:
- yytext is a global; pretend that it isn't. The scanner overwrites it on the next token, so you must copy any string which is needed for further processing. strdup is your friend; use it rather than messing around with malloc and strcpy. Use asprintf as well; char* out; asprintf(&out, "%s%s%s", s1, s2, s3); is far and away the easiest way to concatenate three strings. There are easily available unrestricted implementations for platforms which don't have these things, so don't worry about the "but they're not POSIX/Standard C/yadda yadda yadda" arguments. And don't even think about fixed-length buffers. You don't need them. Honest.
- Convert numbers with strtol once in the scanner, and then you don't even need to think about memory allocations.
- Do free() strings when you don't need them any more, but if you find that difficult, start by leaking memory and then fix things after you have your parser working. (I know some people will find that sacrilegious, but as long as you remember to do it before production, it's fine; you'll feel a lot more motivated once you have the basics working.)
And finally:
- In bison, if you find yourself with mysterious shift/reduce conflicts, use a GLR parser (%glr-parser): yes, it's a bit slower, but if it saves you some pain, it's worth it. You can always go back and fix things up later. (GLR parsers won't save you from all grammar problems. You still need to make sure your grammar isn't ambiguous. But they can help.)
- Stick to the C interfaces. It's OK to compile with C++, and you can use standard C++ containers and other nice features; just don't use them in your semantic values, because that doesn't play nicely with bison's internal stack management. (Pointers to C++ containers are just fine, though.) And remember that flex and bison are just control flow; the bulk of your program is going to be written in C/C++, so you're not entering a new world by using the compiler tools. You're also not getting a free pass: you need to know how to use C/C++ before you start writing your parser.
Hope that helps. Good luck.
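To make the tips about strings, numbers and semantic values concrete, here is a minimal C++ sketch of what the scanner and parser actions might do for the language in the question. Everything named here (Assignment, Block, SemanticValue, scan_ident, scan_number, join_path, make_block, add_assignment) is hypothetical. In a real project the union would be declared with bison's %union and the helpers would be called from flex and bison actions. strdup and asprintf are assumed to be available, as they are on glibc and the BSDs.

```cpp
#include <cassert>
#include <stdio.h>   // asprintf (GNU/BSD extension, not ISO C++)
#include <stdlib.h>  // strtol, free
#include <string.h>  // strdup (POSIX)
#include <string>
#include <vector>

// Hypothetical tree node for one "path:value" line inside a block.
struct Assignment {
    std::string path;   // e.g. "object.member.anothervalue"
    std::string value;  // raw text of the value
};

// Hypothetical tree node for "[path = number]{ assignments }".
struct Block {
    std::string cond_path;
    long cond_value;
    std::vector<Assignment> body;
};

// Semantic values hold only plain C types and raw pointers, never C++
// objects by value. A pointer to a C++ object is fine.
union SemanticValue {
    char* str;     // heap copy of a lexeme
    long num;      // already-converted number
    Block* block;  // pointer into the tree
};

// Scanner side: copy the lexeme, because yytext is overwritten later.
SemanticValue scan_ident(const char* yytext_like) {
    SemanticValue v;
    v.str = strdup(yytext_like);
    return v;
}

// Scanner side: convert once with strtol; nothing to allocate or free.
SemanticValue scan_number(const char* yytext_like) {
    SemanticValue v;
    v.num = strtol(yytext_like, nullptr, 10);
    return v;
}

// Parser side, for a rule like  path: path '.' IDENT
// Takes ownership of both pieces and returns a new heap string.
char* join_path(char* left, char* right) {
    char* out = nullptr;
    if (asprintf(&out, "%s.%s", left, right) < 0) out = nullptr;
    free(left);
    free(right);
    return out;
}

// Parser side: build a node, then free the C string it was built from.
Block* make_block(char* path, long value) {
    assert(path != nullptr);
    Block* b = new Block{path, value, {}};
    free(path);
    return b;
}

// Parser side: append one assignment and release both C strings.
void add_assignment(Block* b, char* path, char* value) {
    b->body.push_back({path, value});
    free(path);
    free(value);
}
```

The design choice is that C strings live only as long as it takes to move them into the tree, so each helper frees what it was given. The caller owns the returned Block and deletes it when the rules are no longer needed.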