regex ruby treetop

How can you require an undetermined character to be repeated consecutively a certain number of times in Ruby Treetop?


I want to create a rule that requires a non-number, non-letter character to be consecutively repeated three times. The rule would look something like this:

# Note, this code does not do what I want! 

grammar ThreeCharacters

  rule threeConsecutiveCharacters
    (![a-zA-Z0-9] .) 3..3
  end

end

Is there any way to require the first character that it detects to be repeated three times?
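
To make the target behaviour concrete outside Treetop: the rule above accepts any three non-alphanumeric characters (for example `!@#`), whereas the goal is three copies of the same one. A minimal plain-Ruby sketch of that goal, using a regular expression with a backreference, might look like this. The constant and method names are hypothetical and chosen only for illustration, and this is not a Treetop rule.

```ruby
# Hypothetical helper, not part of Treetop: accepts exactly three copies of
# one character that is neither an ASCII letter nor a digit.
# \1 refers back to whatever the first group captured.
THREE_SAME_SYMBOLS = /\A([^a-zA-Z0-9])\1{2}\z/

def three_consecutive?(text)
  THREE_SAME_SYMBOLS.match?(text)
end

puts three_consecutive?("!!!")  # the same symbol three times
puts three_consecutive?("!@#")  # three symbols, but not the same one
puts three_consecutive?("aaa")  # letters are excluded
```

The backreference is what carries the "first character detected" forward; a PEG has no built-in equivalent, which is why the question arises.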

There was previously a similar question about detecting indentation levels: "PEG for Python style indentation"

The solution there was to first initialize the indentation stack:

&{|s| @indents = [-1] }

Then save the indentation for the current line:

&{|s|
  level = s[0].indentation.text_value.length
  @indents << level
  true
}

Whenever a new line begins it peeks at the indentation like this:

!{|s|
  # Peek at the following indentation:
  save = index; i = _nt_indentation; index = save
  # We're closing if the indentation is less or the same as our enclosing block's:
  closing = i.text_value.length <= @indents.last
}

If the indentation is larger it adds the new indentation level to the stack.

I could create something similar for my problem, but this seems like a very tedious way to solve it. Are there any other ways to create my rule?


Solution

  • Yes, you can do it this way in Treetop. This kind of thing is not generally possible with a PEG because of the way packrat parsing works: it is greedy, but here you need to limit its greed using semantic information from earlier in the parse. It is only the addition of semantic predicates (&{...}) in Treetop that makes it possible. So yes, it is tedious. You might consider using Rattler instead, as it has a significant number of features beyond those available in Treetop. As maintainer of Treetop but not a user of Rattler, I can't advise on it in detail, but I am very impressed by its feature set and I think it will handle this case better.

    If you proceed with Treetop, bear in mind that every semantic predicate should return a boolean value indicating success or failure. This is not explicit in the initialisation of @indents above, which succeeds only because the assigned array happens to be truthy in Ruby.
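
For this particular rule, one possible shape of the semantic-predicate approach is sketched below. It is untested and rests on some assumptions: the helper rule `symbol` is a hypothetical name introduced here, and the predicate relies on Treetop passing the block an array of the nodes matched so far in the current sequence, which must be referenced by position rather than by label.

```
grammar ThreeCharacters

  # Match three symbols, then let the predicate reject the sequence
  # unless the second and third equal the first.
  rule threeConsecutiveCharacters
    symbol symbol symbol
    &{ |s| s[1].text_value == s[0].text_value && s[2].text_value == s[0].text_value }
  end

  # Hypothetical helper rule: one character that is neither a letter nor a digit.
  rule symbol
    ![a-zA-Z0-9] .
  end

end
```

Loaded with `Treetop.load_from_string` or compiled from a `.treetop` file, the generated `ThreeCharactersParser` should return a syntax node for `!!!` and `nil` for `!@#`. The predicate returns an explicit boolean, in line with the advice above.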