I am writing a parser using the following library: https://www.nuget.org/packages/Irony
My current goal is to parse a file that contains lines of plain text. Each line starts with either a space or a tab character.
This is what my grammar class looks like:
NonTerminal program = new NonTerminal("program");
NonTerminal textStatement = new NonTerminal("textStatement");
NonTerminal textStatements = new NonTerminal("textStatements");
FreeTextLiteral text = new FreeTextLiteral("text", "\r\n");
KeyTerm whitespace = ToTerm(" ", "whitespace");
KeyTerm tab = ToTerm("\t", "tab");
KeyTerm newline = ToTerm("\n", "newline");
textStatement.Rule = ((whitespace | tab) + text + newline);
textStatements.Rule = MakePlusRule(textStatements, textStatement);
program.Rule = textStatements;
this.Root = program;
And this is the content of the target file (the dashed lines are not part of the file):
----------------------
 test
----------------------
Surprisingly, parsing fails with the following message:
Column 1, Line 0:
Syntax error, expected: whitespace, tab
It looks like the grammar is configured to skip spaces and tabs by default, so the scanner skips the leading " " and starts parsing at the letter "t". This is fine in most cases, but not in this one: I'm trying to write a Python-like language, so tracking leading whitespace is important.
I'm not expecting you to write the whole grammar for me, just suggest a generic approach. Any help is appreciated, thanks!
UPD: I ended up overriding two methods in the grammar class:
// Treat space and tab as ordinary characters so the grammar can match them.
public override bool IsWhitespaceOrDelimiter(char ch)
{
    if (ch == ' ' || ch == '\t')
        return false;
    return base.IsWhitespaceOrDelimiter(ch);
}

// Same loop as before, with the space and tab cases commented out
// so that the scanner no longer skips them.
public override void SkipWhitespace(ISourceStream source)
{
    while (!source.EOF())
    {
        switch (source.PreviewChar)
        {
            //case ' ':
            //case '\t':
            //    break;
            case '\r':
            case '\n':
            case '\v':
                if (UsesNewLine) return;
                break;
            default:
                return;
        }
        source.PreviewPosition++;
    }
}
If you want to handle a space as an explicit character in the grammar, you need to override the IsWhitespaceOrDelimiter method and return false for the space. The same applies to the tab and any other characters you want to match explicitly.
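For reference, here is a minimal sketch of how this could look inside a grammar class, combined with the SkipWhitespace change from the update above. The names LineGrammar and SignificantChars are made up for illustration, and the grammar rules from the question would still have to go in the constructor. It assumes the Irony.Parsing namespace and only the members already used in the question (IsWhitespaceOrDelimiter, SkipWhitespace, UsesNewLine), so treat it as a starting point rather than a tested solution.

```csharp
using System.Collections.Generic;
using Irony.Parsing;

// Hypothetical grammar class; only the whitespace handling is shown.
public class LineGrammar : Grammar
{
    // Characters the grammar wants to match explicitly (an assumption:
    // extend this set with whatever else must not be skipped).
    private static readonly HashSet<char> SignificantChars =
        new HashSet<char> { ' ', '\t' };

    public LineGrammar()
    {
        // The rules from the question (program, textStatements, ...) go here.
    }

    // Report significant characters as ordinary characters,
    // and defer to the base class for everything else.
    public override bool IsWhitespaceOrDelimiter(char ch)
    {
        if (SignificantChars.Contains(ch))
            return false;
        return base.IsWhitespaceOrDelimiter(ch);
    }

    // Skip only line breaks, and only when the grammar is not line-based.
    // Spaces and tabs are never skipped, so they reach the grammar as input.
    public override void SkipWhitespace(ISourceStream source)
    {
        while (!source.EOF())
        {
            char ch = source.PreviewChar;
            bool isLineBreak = ch == '\r' || ch == '\n' || ch == '\v';
            if (!isLineBreak || UsesNewLine)
                return;
            source.PreviewPosition++;
        }
    }
}
```

Keeping the characters in one set means adding another significant character is a one-line change. SkipWhitespace needs no change in that case, because it stops at anything that is not a line break.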