
Regular expression parser generator


Sometimes it would be convenient to have a highly optimized function for regex search instead of linking a library that builds matchers at runtime. Is there a parser generator that fits this role?

Ideally, it would:

  • create a single C function
  • generate a DFA corresponding to the given regular expression
  • be as efficient as KMP or Boyer-Moore in simple cases
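Regarding the last point: for a plain literal pattern, the DFA for an unanchored search is the Knuth-Morris-Pratt string-matching automaton, so a DFA matcher reads each input byte exactly once, just like KMP. The sketch below builds that automaton at run time only for illustration; a generator would emit the table (or equivalent code) as constants. The name `dfa_find_literal` and the `MAX_PAT` limit are hypothetical. Boyer-Moore is a different case: it can skip input bytes, which a byte-at-a-time DFA scan does not do.

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET 256
#define MAX_PAT  32   /* hypothetical limit for this sketch (~32 KiB table on the stack) */

/* Returns the index of the first occurrence of pat in text[0..n), or -1. */
long dfa_find_literal(const char *pat, const char *text, size_t n)
{
    int dfa[MAX_PAT][ALPHABET];
    size_t m = strlen(pat);
    if (m == 0) return 0;          /* empty pattern matches at position 0 */
    if (m > MAX_PAT) return -1;    /* pattern too long for this sketch */

    /* State j means "the last j bytes read equal pat[0..j)". */
    memset(dfa[0], 0, sizeof dfa[0]);
    dfa[0][(unsigned char)pat[0]] = 1;
    for (size_t x = 0, j = 1; j < m; j++) {
        for (int c = 0; c < ALPHABET; c++)
            dfa[j][c] = dfa[x][c];                  /* mismatch: act like restart state x */
        dfa[j][(unsigned char)pat[j]] = (int)j + 1; /* match: advance one state */
        x = (size_t)dfa[x][(unsigned char)pat[j]];  /* update the restart state */
    }

    /* Scan: one table lookup per input byte, never backing up. */
    size_t state = 0;
    for (size_t i = 0; i < n; i++) {
        state = (size_t)dfa[state][(unsigned char)text[i]];
        if (state == m) return (long)(i + 1 - m);
    }
    return -1;
}
```

For example, `dfa_find_literal("abab", "xxababab", 8)` returns 2.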

Solution

  • Here is a list of tools that all suit your needs:

    1. Lex/Flex is perhaps the best-known tool for building lexical analyzers (scanners) from regular expressions. Lex is useful in many scenarios, but it can impose too much overhead for simple matching tasks because of its heavyweight processing loop, which imposes a stream "pull" model and input buffering. It was designed to scan entire files rather than short strings.

    2. re2c. A preprocessor that generates C-based recognizers from regular expressions. The generated state machines run very fast and integrate easily into any program, with no runtime dependencies.

    3. Ragel State Machine Compiler. Another preprocessor, which generates FSM code from a high-level regular-language notation (regular expressions are one instance of it). It targets a range of host languages (C, C++, Objective-C, D, Java and Ruby), can execute user actions on various FSM events, and more. It can also output the state machine definition in Graphviz format, for visualizing states and transitions.
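To give an idea of what "a single C function implementing a DFA" looks like, here is a hand-written sketch in the directly-coded style (one label per state, one jump per transition) that generators such as re2c, and Ragel in its goto-based modes, can emit. It is not actual output of either tool; the regex `ab*c` and the function name `search_ab_star_c` are hypothetical examples.

```c
#include <stddef.h>

/* Directly-coded DFA for an unanchored search of the regex ab*c.
   Returns the index one past the end of the earliest-ending match
   in s[0..n), or -1 if there is none. */
long search_ab_star_c(const char *s, size_t n)
{
    size_t i = 0;

state_start:                  /* no useful prefix of the pattern seen yet */
    if (i == n) return -1;
    if (s[i++] == 'a') goto state_seen_a;
    goto state_start;

state_seen_a:                 /* input so far ends with 'a' followed by zero or more 'b' */
    if (i == n) return -1;
    switch (s[i++]) {
    case 'a':                 /* a fresh 'a' restarts the match in this same state */
    case 'b': goto state_seen_a;
    case 'c': return (long)i; /* accepting transition: the match ends here */
    default:  goto state_start;
    }
}
```

For example, `search_ab_star_c("xxabbbc", 7)` returns 7. The value of a generator is that it derives these states automatically; for anything larger than a toy expression, writing them by hand quickly becomes impractical and error-prone.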