I want to match strings whose first character is a letter, followed by any number of characters that are either digits or letters, and ending with a letter. For example, a11a11a is correct, but a11aa11 is incorrect because it ends with a digit rather than a letter.
I wrote the following code to do it:
var grammar =
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from end in Parse.Letter.Once()
    select new string(first.Concat(rest).Concat(end).ToArray());
var result = grammar.TryParse("a111a");
Unfortunately, LetterOrDigit.Many() consumes the last letter too, so the final Parse.Letter.Once() has nothing left to match.
Is there any way to avoid this?
Here is one solution, which replaces Many() with recursive rules so that the parser can fall back to a final letter:
Parser<IEnumerable<char>> A = null, B = null, C = null;
var letter = Parse.Letter.Once();
var digit = Parse.Digit.Once();
B =
    (
        from d in digit
        from cs in Parse.Ref(() => C)
        select d.Concat(cs)
    ).Or
    (
        from l in letter
        from bs in Parse.Ref(() => B)
        select l.Concat(bs)
    ).Or(letter);
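C =
    (
        from d in digit
        from bs in Parse.Ref(() => B)
        select d.Concat(bs)
    ).Or
    (
        // A letter that follows a digit may itself be followed by more
        // characters; without this alternative, "a1a1a" stops at "a1a".
        from l in letter
        from bs in Parse.Ref(() => B)
        select l.Concat(bs)
    ).Or(letter);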
A =
    (
        from l in letter
        from bs in Parse.Ref(() => B)
        select l.Concat(bs)
    ).Or(letter);
var grammar =
    from _ in Parse.WhiteSpace.Many()
    from a in A
    from __ in Parse.WhiteSpace.Many()
    select a;
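The clauses in the Or calls need to be in the correct order: the longer, recursive alternatives must come before the bare letter, because Or returns the first alternative that succeeds.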
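To illustrate why the order matters, here is a minimal sketch with two parsers that differ only in the order of their Or alternatives. The names OrOrderDemo, TwoLetters, LongFirst and ShortFirst are made up for this example, and it assumes the Sprache NuGet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sprache;

// Hypothetical demo class, not part of Sprache.
public static class OrOrderDemo
{
    private static readonly Parser<IEnumerable<char>> Letter = Parse.Letter.Once();

    // Matches exactly two letters in a row.
    private static readonly Parser<IEnumerable<char>> TwoLetters =
        from a in Letter
        from b in Letter
        select a.Concat(b);

    // Longer alternative first: falls back to a single letter only
    // when two letters cannot be matched.
    public static string LongFirst(string input) =>
        new string(TwoLetters.Or(Letter).Parse(input).ToArray());

    // Shorter alternative first: the single letter succeeds immediately,
    // so the two-letter alternative is never tried.
    public static string ShortFirst(string input) =>
        new string(Letter.Or(TwoLetters).Parse(input).ToArray());
}
```

On the input "ab", LongFirst should return "ab" while ShortFirst should return only "a", which is the same effect that putting the bare letter first would have in the grammar above.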
A commenter recommended using regular expressions. You can use them within Sprache:
Parse.Regex("[a-z]([a-z0-9]*[a-z])?")
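A minimal sketch of how that parser might be used to validate a whole string. The class and method names (IdentifierParser, IsValid) are made up for this example, and it assumes the Sprache NuGet package is referenced. Chaining .End() makes the parse fail unless the match reaches the end of the input. Note that [a-z] only covers lowercase ASCII letters, whereas Parse.Letter accepts any Unicode letter, so the pattern would need adjusting if uppercase or non-ASCII letters should match.

```csharp
using Sprache;

// Hypothetical wrapper around the regex parser, for illustration only.
public static class IdentifierParser
{
    // Same pattern as above; .End() rejects input with trailing characters
    // that the pattern did not consume (for example a trailing digit).
    private static readonly Parser<string> Identifier =
        Parse.Regex("[a-z]([a-z0-9]*[a-z])?").End();

    public static bool IsValid(string input) =>
        Identifier.TryParse(input).WasSuccessful;
}
```

With this sketch, IsValid("a11a11a") should be true, and IsValid("a11aa11") should be false because the regex stops at "a11aa" and .End() then sees the leftover "11".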