Tags: c#, parsing, sprache

Sprache -- Cannot recognise this sequence


I want to match strings where the first character is a letter, followed by any number of characters that are digits or letters, and where the last character is a letter. For example, a11a11a is valid, but a11aa11 is not, because it ends with a digit rather than a letter.

I wrote the following code to do it:

var grammar =
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from end in Parse.Letter.Once()
    select new string(first.Concat(rest).Concat(end).ToArray());

var result = grammar.TryParse("a111a");

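Unfortunately, LetterOrDigit.Many() consumes the last letter too, so the final Parse.Letter.Once() finds no input left and the whole parse fails. Sprache does not backtrack into Many() to give a character back.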

Any way to avoid this?
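The root cause is that Many() is greedy and never returns characters it has already consumed. A minimal sketch of one workaround, assuming the word is the entire input: let Many() take the whole run of letters and digits, then reject the result with a LINQ where clause if it does not end in a letter (Sprache implements Where for parsers, so query syntax accepts it). The class and field names are hypothetical.

```csharp
using System;
using Sprache;

public static class WhereSketch
{
    // Hypothetical parser: a letter, then the greedy run of letters/digits.
    // The where clause fails the parse when the run ends in a digit;
    // End() requires that nothing is left over after the word.
    public static readonly Parser<string> Word =
        (from first in Parse.Letter
         from rest in Parse.LetterOrDigit.Many().Text()
         where rest.Length == 0 || char.IsLetter(rest[rest.Length - 1])
         select first + rest).End();

    public static void Main()
    {
        Console.WriteLine(Word.TryParse("a111a").WasSuccessful);   // True
        Console.WriteLine(Word.TryParse("a11a11a").WasSuccessful); // True
        Console.WriteLine(Word.TryParse("a11aa11").WasSuccessful); // False
    }
}
```

Because this version validates only after consuming, it cannot leave trailing digits for a following parser, so it suits whole-input validation better than embedding in a larger grammar.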


Solution

  • Here is a solution:

    // Needs: using System.Collections.Generic; using System.Linq; using Sprache;
    Parser<IEnumerable<char>> A = null, B = null, C = null;
    
    var letter = Parse.Letter.Once();
    var digit = Parse.Digit.Once();
    
    B =
        (
        from d in digit
        from cs in Parse.Ref(() => C)
        select d.Concat(cs)
        ).Or
        (
            from l in letter
            from bs in Parse.Ref(() => B)
            select l.Concat(bs)
        ).Or(letter);
    
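    // C follows a digit. Like B, it needs the letter-then-B alternative;
    // with only .Or(letter), an input such as a1a1a would stop after a1a.
    C = (
        from d in digit
        from bs in Parse.Ref(() => B)
        select d.Concat(bs)
        ).Or
        (
            from l in letter
            from bs in Parse.Ref(() => B)
            select l.Concat(bs)
        ).Or(letter);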
    
    A = (
        from l in letter
        from bs in Parse.Ref(() => B)
        select l.Concat(bs)
        ).Or(letter);
    
    var grammar =
        (
        from _ in Parse.WhiteSpace.Many()
        from a in A
        from __ in Parse.WhiteSpace.Many()
        select a
        ).End(); // End() rejects leftover input, e.g. the trailing "11" in a11aa11
    

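    The alternatives in each Or must stay in this order. Sprache's Or returns the first alternative that succeeds, so the bare letter fallback has to come last; if it came first, B would match a single letter and stop before the rest of the word.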

    A commenter suggested using a regular expression instead. Sprache can embed one directly with Parse.Regex:

    Parse.Regex("[a-z]([a-z0-9]*[a-z])?")
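A sketch comparing the regex with a non-recursive combinator form of the same rule, assuming a word is a letter followed by zero or more segments, each made of digits and then one letter (for lowercase ASCII input this is the same set of strings as the regex). Both parsers are wrapped in End(), because otherwise TryParse accepts a valid prefix such as a11aa and ignores the trailing 11. The class and field names are hypothetical.

```csharp
using System;
using Sprache;

public static class WordParsers
{
    // One segment: zero or more digits followed by exactly one letter.
    // Declared first so it is initialised before ViaCombinators uses it.
    static readonly Parser<string> Segment =
        from digits in Parse.Digit.Many().Text()
        from letter in Parse.Letter
        select digits + letter;

    // A letter followed by any number of segments, so the word always ends in a letter.
    // Many() stops, rewinding to before the failed attempt, when a segment cannot
    // finish (e.g. a trailing run of digits); End() then rejects that leftover input.
    public static readonly Parser<string> ViaCombinators =
        (from first in Parse.Letter
         from rest in Segment.Many()
         select first + string.Concat(rest)).End();

    // The regex from the answer, also required to consume the whole input.
    public static readonly Parser<string> ViaRegex =
        Parse.Regex("[a-z]([a-z0-9]*[a-z])?").End();

    public static void Main()
    {
        foreach (var input in new[] { "a", "a111a", "a1a1a", "a11a11a", "a11aa11" })
        {
            Console.WriteLine(
                $"{input}: {ViaCombinators.TryParse(input).WasSuccessful} {ViaRegex.TryParse(input).WasSuccessful}");
        }
        // a: True True
        // a111a: True True
        // a1a1a: True True
        // a11a11a: True True
        // a11aa11: False False
    }
}
```

One difference to keep in mind: Parse.Letter uses char.IsLetter, so the combinator version also accepts upper-case and non-ASCII letters (for example A1b), while [a-z] does not. Widen the character classes (for example [a-zA-Z]) or narrow the combinator if the two must agree.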