Search code examples
yaccbisonbnfpushdown-automaton

Deriving a state machine from a BNF grammar


I am trying to put together a proof of concept of an XSS-safe string interpolation scheme.

Given a string with substitutions,

"Hello <b>$planetoid</b>!"

I want break it into literal portions and substitutions ("Hello<b>" planetoid "</b>!") and then run a state machine left to right over the literal portions. When I reach an interpolated value (planetoid in the above), I need to be able to get from the state to an appropriate escaping function.

Does anyone know of any examples of how to use lex/yacc/bison to derive a state machine and be able to associate labels in the grammar with output states? I want to derive a state machine that I can use both in javascript, and to try and replace PHP's underlying string implementation.

My reasons for doing this are described here.

cheers, mike


Solution

  • In general, it is not possible to create a state machine for grammars that can be represented in BNF. State machines can only recognize regular languages and BNF can specify context-free languages. Yacc can create parsers. Would that be sufficient?