I am trying to put together a proof of concept of an XSS-safe string interpolation scheme.
Given a string with substitutions,
"Hello <b>$planetoid</b>!"
I want break it into literal portions and substitutions ("Hello<b>" planetoid "</b>!")
and then run a state machine left to right over the literal portions. When I reach an interpolated value (planetoid
in the above), I need to be able to get from the state to an appropriate escaping function.
Does anyone know of any examples of how to use lex/yacc/bison to derive a state machine and be able to associate labels in the grammar with output states? I want to derive a state machine that I can use both in javascript, and to try and replace PHP's underlying string implementation.
My reasons for doing this are described here.
cheers, mike
In general, it is not possible to create a state machine for grammars that can be represented in BNF. State machines can only recognize regular languages and BNF can specify context-free languages. Yacc can create parsers. Would that be sufficient?