parsing, parser-generator, ragel

LIN Descriptor File parser


I am researching possible parsers as part of developing a PC application that parses a LIN Descriptor File (LDF). The current parser is based on flex and bison. I now need to redesign it, since the current one is incapable of detecting certain errors.

I have previously used Ragel (https://en.wikipedia.org/wiki/Ragel) to parse regular expression commands (regex: https://en.wikipedia.org/wiki/Regular_expression), and it proved very handy.

However, given the complexity of an LDF file, I am unsure whether Ragel (with C++ as the host language) is the best approach for parsing it. The reason is that an LDF file contains a lot of data that is not fixed or constant, but varies between vendors. Also, the LDF fields must retain references to other fields so that errors in the file can be detected. Ragel seems better suited to input whose structure is fixed (that is what I found while developing a regex parser).

Could anyone who has already worked on such a project provide some tips for selecting a suitable parser for the LIN Descriptor File?

Example of a LIN Descriptor File: http://microchipdeveloper.com/lin:protocol-app-ldf


Solution

  • If you feel that an LALR(1) parser (which is what bison generates by default) is not adequate for your parsing problem, then a finite automaton (which is what Ragel generates) cannot be better. The FA is strictly less powerful.

    Even without knowing much about the nature of the checks you want to implement, I'm pretty sure that the appropriate strategy is to parse the file into some simple hierarchical data structure (i.e. a tree of some form, usually called an AST in parsing literature) using a flex/bison grammar, and then walk the resulting data structure to perform whatever semantic checks seem necessary.

    Attempting to do semantic checks while parsing usually leads to over-complicated, badly-factored and unscalable solutions. That is not a problem with the bison tool, but rather with a particular style of using it which does not take into account what we have learned about the importance of separation of concerns.

    Refactoring your existing grammar so that it uses "just a grammar" -- that is, so that it just generates the required semantic representation -- is probably a much simpler task than reimplementing with some other parser generator (which is unlikely to provide any real advantage, in any case).

    And you should definitely resist the temptation to abandon parser generators in favour of an even less modular solution: you might succeed in building something, but the probability is that the result will be even less maintainable and extensible than what you currently have.
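
To make the "parse first, check afterwards" idea concrete, here is a minimal C++ sketch of a semantic pass that runs over an already-built tree. Everything in it is an assumption made for illustration: the `LdfFile`, `Signal` and `Frame` types are a hypothetical, heavily simplified stand-in for whatever the bison actions would actually construct, and `checkReferences` covers only two cross-reference rules (a signal's publisher must be a declared node, and a frame may only use declared signals). A real LDF has many more sections and rules.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified AST. In practice the grammar actions would
// fill in structures like these and do nothing else.
struct Signal {
    std::string name;
    std::string publisher;             // should name a declared node
};

struct Frame {
    std::string name;
    std::vector<std::string> signals;  // each should name a declared signal
};

struct LdfFile {
    std::vector<std::string> nodes;    // master and slave node names
    std::vector<Signal> signals;
    std::vector<Frame> frames;
};

// Semantic pass: walks the finished tree and collects every problem it
// finds instead of stopping at the first one.
std::vector<std::string> checkReferences(const LdfFile& ldf) {
    std::vector<std::string> errors;
    const std::set<std::string> nodeNames(ldf.nodes.begin(), ldf.nodes.end());
    std::set<std::string> signalNames;

    for (const Signal& s : ldf.signals) {
        // insert() reports whether the name was new.
        if (!signalNames.insert(s.name).second)
            errors.push_back("duplicate signal '" + s.name + "'");
        if (nodeNames.count(s.publisher) == 0)
            errors.push_back("signal '" + s.name +
                             "' has unknown publisher '" + s.publisher + "'");
    }

    for (const Frame& f : ldf.frames)
        for (const std::string& sig : f.signals)
            if (signalNames.count(sig) == 0)
                errors.push_back("frame '" + f.name +
                                 "' uses undeclared signal '" + sig + "'");

    return errors;
}
```

Because the check sees the whole file at once, the order in which sections appear does not matter, and adding a new rule means adding another loop here rather than touching the grammar.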