I am new to C#. I have a question about parsing a string. Suppose I have a file that contains some lines such as
PC: SWITCH_A == ON
or
PC: defined(SWITCH_B) && SWITCH_C == OFF
All the operators (==, &&, defined) are strings here, and all the switch names (SWITCH_A) and their values (OFF) are identifiers. How do I parse this kind of string? Do I first have to tokenize the lines, splitting them by newlines or whitespace, and then build an abstract syntax tree to parse them? Do I also need to store all the identifiers in a dictionary first? I have no experience with parsing. Can anyone help and show me with an example how to do it, and which methods and classes should be included? Thanks.
Unfortunately, yes. You have to tokenize the lines yourself if the syntax you are parsing is custom rather than a standard syntax for which a compiler or parser already exists.
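You could take advantage of expression trees. They live in the System.Linq.Expressions namespace of the .NET Framework and let you build code as a tree of objects that can be compiled and executed at run time. The Dynamic Language Runtime also uses them to implement dynamic languages.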
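As a rough illustration, here is how the second example line could be turned into an expression tree by hand. This is only a sketch under assumptions that are not in the question: the switches are held in a Dictionary<string, string>, defined(X) is mapped to ContainsKey, and the class name ExpressionTreeSketch is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class ExpressionTreeSketch
{
    // Builds the equivalent of:
    //   env => env.ContainsKey("SWITCH_B") && env["SWITCH_C"] == "OFF"
    public static Func<Dictionary<string, string>, bool> Build()
    {
        ParameterExpression env =
            Expression.Parameter(typeof(Dictionary<string, string>), "env");

        // defined(SWITCH_B)  ->  env.ContainsKey("SWITCH_B")
        Expression isDefined =
            Expression.Call(env, "ContainsKey", null, Expression.Constant("SWITCH_B"));

        // SWITCH_C  ->  env["SWITCH_C"] (the indexer property is named "Item")
        Expression switchC =
            Expression.Property(env, "Item", Expression.Constant("SWITCH_C"));

        // SWITCH_C == OFF (OFF is treated as the string "OFF", an assumption)
        Expression equalsOff = Expression.Equal(switchC, Expression.Constant("OFF"));

        // && short-circuits, just like in C#
        Expression body = Expression.AndAlso(isDefined, equalsOff);

        return Expression
            .Lambda<Func<Dictionary<string, string>, bool>>(body, env)
            .Compile();
    }
}
```

In a real parser the calls to Expression.Call, Expression.Equal and Expression.AndAlso would be made while walking the tokens rather than hard-coded. Note that the indexer throws a KeyNotFoundException if SWITCH_C is missing from the dictionary.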
To start parsing the syntax you have to have a grammar document that describes all the possible cases of the syntax in each line. After that, you can start parsing the lines and building your expression tree.
Parsing any source code typically goes a character at a time since each character might change the entire semantics of the piece that is being parsed.
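To make the character-at-a-time idea concrete, here is a minimal tokenizer sketch. It assumes the "PC:" prefix has already been removed and that the only tokens are identifiers or numbers, parentheses, && and ==. The Tokenizer class and Tokenize method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class Tokenizer
{
    // Splits text such as "defined(SWITCH_B) && SWITCH_C == OFF" into tokens,
    // looking at one character at a time.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c))
            {
                i++; // whitespace only separates tokens
            }
            else if (char.IsLetterOrDigit(c) || c == '_')
            {
                // identifier, keyword (defined) or integer: read to its end
                int start = i;
                while (i < text.Length && (char.IsLetterOrDigit(text[i]) || text[i] == '_'))
                    i++;
                tokens.Add(text.Substring(start, i - start));
            }
            else if (c == '(' || c == ')')
            {
                tokens.Add(c.ToString());
                i++;
            }
            else if (i + 1 < text.Length &&
                     (text.Substring(i, 2) == "&&" || text.Substring(i, 2) == "=="))
            {
                // two-character operators
                tokens.Add(text.Substring(i, 2));
                i += 2;
            }
            else
            {
                throw new FormatException(
                    "Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }
}
```

For the input defined(SWITCH_B) && SWITCH_C == OFF this yields the tokens defined, (, SWITCH_B, ), &&, SWITCH_C, ==, OFF.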
So, I suggest you start with a grammar document for the syntax that you have and then start writing your parser.
Make sure that there isn't anything already out there for the syntax you are trying to parse, as this kind of project tends to be error-prone and time-consuming.
Your high-level grammar is
Expression ::= Identifier | IntegerValue | BooleanExpression
Identifier and IntegerValue are plain tokens that appear literally in the source, so you need to start by looking for a BooleanExpression.
To find a BooleanExpression, you need to look for a BooleanBinaryExpression, a BooleanUnaryExpression, a TrueExpression, or a FalseExpression.
You can detect a BooleanBinaryExpression by looking for the && or == operators and then taking the left and right operands.
To detect a BooleanUnaryExpression, look for the word defined and then parse the identifier in the parentheses.
And so on...
Notice that your grammar is recursive: the definitions of AndExpression and EqualsExpression point back to Expression:
AndExpression ::= Expression '&&' Expression
EqualsExpression ::= Expression '==' Expression
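One way to handle this recursion is a small recursive-descent parser that builds a tree of nodes and then evaluates it. The sketch below is only one possible shape, with several assumptions of my own: because the rules above start with Expression itself (left recursion), they are rewritten into precedence levels where == binds tighter than &&; the "PC:" prefix is already stripped; a name that is not a defined switch (ON, OFF, 42) stands for itself; and all class names (Node, Operand, Defined, Binary, Parser) are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public abstract class Node
{
    public abstract object Eval(IDictionary<string, string> env);
}

public sealed class Operand : Node // Identifier or IntegerValue
{
    private readonly string _text;
    public Operand(string text) { _text = text; }
    // Assumption: a defined switch yields its value, anything else stands for itself.
    public override object Eval(IDictionary<string, string> env)
    {
        return env.ContainsKey(_text) ? env[_text] : _text;
    }
}

public sealed class Defined : Node // defined(NAME)
{
    private readonly string _name;
    public Defined(string name) { _name = name; }
    public override object Eval(IDictionary<string, string> env) { return env.ContainsKey(_name); }
}

public sealed class Binary : Node // left && right, or left == right
{
    private readonly string _op;
    private readonly Node _left, _right;
    public Binary(string op, Node left, Node right) { _op = op; _left = left; _right = right; }
    public override object Eval(IDictionary<string, string> env)
    {
        // The casts throw InvalidCastException if an operand of && is not boolean.
        if (_op == "&&") return (bool)_left.Eval(env) && (bool)_right.Eval(env);
        return Equals(_left.Eval(env), _right.Eval(env));
    }
}

public sealed class Parser
{
    private readonly List<string> _tokens = new List<string>();
    private int _pos;

    public Parser(string text)
    {
        // Quick tokenization; characters that match nothing are silently skipped.
        foreach (Match m in Regex.Matches(text, @"\w+|&&|==|[()]"))
            _tokens.Add(m.Value);
    }

    private string Peek() { return _pos < _tokens.Count ? _tokens[_pos] : ""; }
    private string Next() { string t = Peek(); _pos++; return t; }
    private void Expect(string token)
    {
        if (Next() != token) throw new FormatException("Expected '" + token + "'");
    }

    // Expression ::= Equality ('&&' Equality)*
    public Node ParseExpression()
    {
        Node left = ParseEquality();
        while (Peek() == "&&") { Next(); left = new Binary("&&", left, ParseEquality()); }
        return left;
    }

    // Equality ::= Primary ('==' Primary)?
    private Node ParseEquality()
    {
        Node left = ParsePrimary();
        if (Peek() == "==") { Next(); left = new Binary("==", left, ParsePrimary()); }
        return left;
    }

    // Primary ::= 'defined' '(' Identifier ')' | '(' Expression ')' | Identifier | IntegerValue
    private Node ParsePrimary()
    {
        string token = Next();
        if (token == "defined")
        {
            Expect("(");
            Node node = new Defined(Next());
            Expect(")");
            return node;
        }
        if (token == "(")
        {
            Node inner = ParseExpression(); // the recursion back to Expression
            Expect(")");
            return inner;
        }
        if (token == "" || token == ")" || token == "&&" || token == "==")
            throw new FormatException("Unexpected token '" + token + "'");
        return new Operand(token);
    }
}
```

Usage would look like new Parser("defined(SWITCH_B) && SWITCH_C == OFF").ParseExpression().Eval(switches), where switches is a dictionary of the switch names and values you have collected. Each Parse method corresponds to one grammar rule, which is why writing the grammar down first pays off.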
The String class in the .NET Framework has a number of methods (such as StartsWith, IndexOf, Substring, Split and Trim) that can help you detect and parse the pieces of your grammar.
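For very simple lines, those String methods may be enough on their own. The sketch below uses only StartsWith, Substring, Trim and Split. The helper names (LineHelpers, StripPrefix, SplitConjuncts) are hypothetical, and the split is naive: it does not understand parentheses, so treat it as a starting point rather than a parser.

```csharp
using System;

public static class LineHelpers
{
    // Returns the text after the "PC:" prefix, or an empty string
    // if the line does not start with that prefix.
    public static string StripPrefix(string line)
    {
        string trimmed = line.Trim();
        if (!trimmed.StartsWith("PC:", StringComparison.Ordinal))
            return "";
        return trimmed.Substring(3).Trim();
    }

    // Splits an expression on "&&" and trims each part.
    // Naive: "&&" inside parentheses is split as well.
    public static string[] SplitConjuncts(string expression)
    {
        string[] parts = expression.Split(new[] { "&&" }, StringSplitOptions.None);
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim();
        return parts;
    }
}
```

For the line PC: defined(SWITCH_B) && SWITCH_C == OFF, StripPrefix returns defined(SWITCH_B) && SWITCH_C == OFF, and SplitConjuncts then returns the two parts defined(SWITCH_B) and SWITCH_C == OFF.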
Another alternative is to look for a parser generator that targets C#. For example, see ANTLR.