Tags: c++, boost, qi

Returning multiple data types from Boost::spirit parse


I would like to parse about 5-10 different message types that share a common format (such as JSON, for example), but each has specific fields that need to be validated. Each message should eventually be parsed into a custom class/struct whose members have concrete types that don't require any casting (e.g. a field is an int rather than a variant/tuple). I see two approaches to the problem:

  1. Write a grammar for each specific message type that validates both the message format (the JSON boilerplate, in this example) and the content of the fields, returning a truly custom struct

  2. Write a grammar that only validates the structure (just the JSON rules) and returns a more generic object (with fields that are variants/tuples, etc.), then validate/translate at a higher level into a custom struct (casting and checking the various variant fields); a sketch of that translation layer follows the example classes below

I see these as the pros and cons of each:

Pros for 1:

  • All validation is done within boost::spirit
  • Karma generators (if written) would look like the existing Spirit parsing code

Cons for 1:

  • A new grammar has to be written and maintained for each new message type that may be invented in the future (and the Spirit syntax is not intuitive)

Pros for 2:

  • Complex spirit code is written once and rarely touched

Cons for 2:

  • Validation and translation of generic message objects will be exactly the kind of messy code that Spirit was supposed to eliminate in the first place

Which method is preferable? Is there a third way to parse into multiple types using one grammar?

Here are some example messages and the classes they should eventually be parsed into:

{"messageType": "messageTypeA", "numberParam": 1}
{"messageType": "messageTypeB", "stringParam": "Test"}

class MessageTypeA
{
public:
    double numberParam;
};

class MessageTypeB
{
public:
    std::string stringParam;
};
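
For concreteness, here is a minimal sketch of the translation layer that option 2 would require, assuming the generic grammar produces a map from field name to a variant of the possible scalar values (JsonValue, JsonObject and toMessageTypeA are hypothetical names, not part of any existing API):

#include <map>
#include <stdexcept>
#include <string>
#include <boost/variant.hpp>

// Hypothetical generic result of the structure-only grammar
using JsonValue  = boost::variant<double, std::string>;
using JsonObject = std::map<std::string, JsonValue>;

// Hand-written validation/translation into the concrete message class above
MessageTypeA toMessageTypeA(JsonObject const& obj)
{
    MessageTypeA msg;
    auto it = obj.find("numberParam");
    if (it == obj.end())
        throw std::runtime_error("messageTypeA: missing numberParam");
    // boost::get throws boost::bad_get if the field holds the wrong type
    msg.numberParam = boost::get<double>(it->second);
    return msg;
}

Repeated for 5-10 message types, this is the hand-written checking that option 2 trades the per-message grammars for.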

Solution

  • I think this question is really close to a recent one, where I did precisely that and gave two answers:

    1. Answer #1 taking the generalist approach, and just interpreting according to a specific "scheme"
    2. Answer #2 taking the ad-hoc approach, which was perceived to be easier by the OP

    My vote is with the first option, because

    • "complex code gets written & tested once, and touched rarely" outweigh the other factors in my experience,
    • in fact the grammar greatly benefits from separated responsibilities and keeping the AST really close to the natural rule attributes
    • I've written multiple "flavours" of JSON backends (OData, Edm, versions, metadata levels) for my Fusion-adapted types (using Fusion "reflection"). These all share the same parser/generator.
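
    As a self-contained sketch (not the code from the linked answers) of what that Fusion "reflection" buys: once a message struct is Fusion-adapted, generic code can visit its members, so one parsing/generating layer can serve every message type without per-message plumbing:

    #include <iostream>
    #include <boost/fusion/include/adapt_struct.hpp>
    #include <boost/fusion/include/for_each.hpp>

    struct MessageTypeA { double numberParam; };

    BOOST_FUSION_ADAPT_STRUCT(MessageTypeA, (double, numberParam))

    // Generic functor: works for any Fusion-adapted message type
    struct PrintMember
    {
        template <typename T>
        void operator()(T const& value) const { std::cout << value << '\n'; }
    };

    int main()
    {
        MessageTypeA msg{42.0};
        boost::fusion::for_each(msg, PrintMember{}); // visits every member, prints 42
    }

    Spirit Qi and Karma can use the same adapted structs directly as rule attributes, which is what lets the different JSON flavours share a single parser/generator.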

    Even the OP in the linked question seemed to later need exactly the flexibility that my first answer already afforded:

    Oh well. I'm pretty surprised that you accepted this answer then, since the other answer does exactly the same, except it does accept and ignore "other" JSON content. Did you miss the update that defined extract_from? It uses exactly the same data structure - the one you suggested in the question. – sehe Jan 4 at 16:41