c++ boost segmentation-fault boost-spirit boost-spirit-x3

X3 parser segfaults with debug output (BOOST_SPIRIT_X3_DEBUG)

Update

This question touches on two issues (as shown by the accepted answer). Both are present in the version of Boost Spirit X3 that ships with Boost 1.64, and both are now fixed (or at least detected at compile time) in the develop branch at the time of writing (2017-04-30).

I have updated the MCVE project to reflect the changes I made in order to use the develop branch instead of the latest Boost release, in the hope that it might help others who face similar issues.

The original question

I am trying to learn how to break up Spirit X3 parsers into separate reusable grammars, as encouraged by the example code (rexpr_full and calc in particular) and the presentations from CppCon 2015 and BoostCon.

I have a symbol table (essentially mapping different types to an enum class of the types I am supporting), which I would like to reuse in several parsers. The only example of symbol tables I could find is the roman numerals example, which is in a single source file.

When I try to move the symbol table into its own cpp/h file, in the style of the more structured examples, my parser segfaults if I try to parse any string which is not in the symbol table. If the symbol table is defined in the same compilation unit as the parsers that use it, the parser throws an expectation exception instead (which is what I would expect it to do).

With BOOST_SPIRIT_X3_DEBUG defined I get the following output:

<FruitType>
  <try>GrannySmith: Mammals</try>
  <Identifier>
    <try>GrannySmith: Mammals</try>
    <success>: Mammals</success>
    <attributes>[[
Process finished with exit code 11

I have made a small project which shows what I am trying to achieve and is available here: https://github.com/sigbjornlo/spirit_fruit_mcve

My questions:

Why does moving the symbol parser to a separate compilation unit cause a segmentation fault in this case?
What is the recommended way of making a symbol table reusable in multiple parsers? (In the MCVE I obviously only use the fruit parser in one other parser, but in my full project I want to use it in several other parsers.)

Below is the code for the MCVE project:

main.cpp

#include <iostream>
#include <string>

#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>

#include "common.h"
#include "fruit.h"

namespace ast {
    struct FruitType {
        std::string identifier;
        FRUIT fruit;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::FruitType, identifier, fruit);

namespace parser {
    // Identifier
    using identifier_type = x3::rule<class identifier, std::string>;
    const auto identifier = identifier_type {"Identifier"};
    const auto identifier_def = x3::raw[x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')]];
    BOOST_SPIRIT_DEFINE(identifier);

    // FruitType
    struct fruit_type_class;
    const auto fruit_type = x3::rule<fruit_type_class, ast::FruitType> {"FruitType"};

    // Using the sequence operator creates a grammar which fails gracefully given invalid input.
    // const auto fruit_type_def = identifier >> ':' >> make_fruit_grammar();

    // Using the expectation operator causes EXC_BAD_ACCESS exception with invalid input.
    // Instead, I would have expected an expectation failure exception.
    // Indeed, an expectation failure exception is thrown when the fruit grammar is defined here in this compilation unit instead of in fruit.cpp.
    const auto fruit_type_def = identifier > ':' > make_fruit_grammar();

    BOOST_SPIRIT_DEFINE(fruit_type);
}

int main() {
    std::string input = "GrannySmith: Mammals";
    parser::iterator_type it = input.begin(), end = input.end();

    const auto& grammar = parser::fruit_type;
    auto result = ast::FruitType {};

    bool successful_parse = boost::spirit::x3::phrase_parse(it, end, grammar, boost::spirit::x3::ascii::space, result);
    if (successful_parse && it == end) {
        std::cout << "Parsing succeeded!\n";
        std::cout << result.identifier << " is a kind of " << to_string(result.fruit) << "!\n";
    } else {
        std::cout << "Parsing failed!\n";
    }

    return 0;
}

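std::string to_string(FRUIT fruit) {
    switch (fruit) {
        case FRUIT::APPLES:
            return "apple";
        case FRUIT::ORANGES:
            return "orange";
    }
    // Unreachable for valid enumerators; avoids falling off the end of a non-void function.
    return "unknown";
}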

common.h

#ifndef SPIRIT_FRUIT_COMMON_H
#define SPIRIT_FRUIT_COMMON_H

#include <string>
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

enum class FRUIT {
    APPLES,
    ORANGES
};

std::string to_string(FRUIT fruit);

namespace parser {
    using iterator_type = std::string::const_iterator;
    using context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;
}

#endif //SPIRIT_FRUIT_COMMON_H

fruit.h

#ifndef SPIRIT_FRUIT_FRUIT_H
#define SPIRIT_FRUIT_FRUIT_H

#include <boost/spirit/home/x3.hpp>

#include "common.h"

namespace parser {
    struct fruit_class;
    using fruit_grammar = x3::rule<fruit_class, FRUIT>;

    BOOST_SPIRIT_DECLARE(fruit_grammar)

    fruit_grammar make_fruit_grammar();
}


#endif //SPIRIT_FRUIT_FRUIT_H

fruit.cpp

#include "fruit.h"

namespace parser {
    struct fruit_symbol_table : x3::symbols<FRUIT> {
        fruit_symbol_table() {
            add
                    ("Apples", FRUIT::APPLES)
                    ("Oranges", FRUIT::ORANGES);
        }
    };

    struct fruit_class;
    const auto fruit = x3::rule<fruit_class, FRUIT> {"Fruit"};
    const auto fruit_def = fruit_symbol_table {};
    BOOST_SPIRIT_DEFINE(fruit);

    BOOST_SPIRIT_INSTANTIATE(fruit_grammar, iterator_type, context_type);

    fruit_grammar make_fruit_grammar() {
        return fruit;
    }
}

Solution

Very good work on the reproducer. This reminded me of my PR https://github.com/boostorg/spirit/pull/229 (see the analysis in "Strange semantic behaviour of boost spirit x3 after splitting").

The problem would be the Static Initialization Order Fiasco: copies of the debug names of rules are taken before they have been initialized.
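To illustrate the general C++ hazard (not Spirit's actual code), here is a minimal sketch of the usual "construct on first use" workaround. The names `fruit_rule_name`, `rule_copy` and `copied_rule` are hypothetical stand-ins for a rule's debug name and for an object that copies it during static initialization.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a rule's debug name; not part of Spirit X3.
// A function-local static is constructed the first time the function runs,
// so a caller can never observe it before it has been initialized.
inline const std::string& fruit_rule_name() {
    static const std::string name = "Fruit";
    return name;
}

// An object with static storage duration that copies the name while it is
// being initialized. Going through the accessor forces the name to be
// constructed first, whatever order the translation units are initialized in.
struct rule_copy {
    std::string name;
    rule_copy() : name(fruit_rule_name()) {}
};

static const rule_copy copied_rule;

// Small self-check: the copy taken during static initialization is intact.
inline void check_copied_rule() {
    assert(copied_rule.name == "Fruit");
}
```

Had `copied_rule` instead read a plain namespace-scope `std::string` defined in another translation unit, the language would not guarantee that the string was constructed before the copy was taken.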

Indeed, disabling the debug information removes the crash, and the expectation failure is thrown correctly.

The same happens with the develop branch¹, so either there is another similar thing, or I missed a spot. For now, know you can disable the debug output. I'll post an update if I find the spot.

UPDATE:

I didn't miss a spot. There's a separate issue in call_rule_definition where it parameterizes the context_debug<> helper class with the actual attribute type instead of the transformed one:

#if defined(BOOST_SPIRIT_X3_DEBUG)
                typedef typename make_attribute::type dbg_attribute_type;
                context_debug<Iterator, dbg_attribute_type>
                dbg(rule_name, first, last, dbg_attribute_type(attr_), ok_parse);
#endif

The comment seems to suggest that this behaviour is as desired: it tries to print the attribute before transformation. However, it totally doesn't work unless the synthesized attribute type matches the actual attribute type. In this case, it makes context_debug take a dangling reference to a temporary converted attribute, leading to Undefined Behaviour.

It's in fact also undefined behaviour in the working cases. I can only assume things happen to pan out nicely in the inline-definition case, making it seem like things work as intended.
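The dangling-reference pattern can be sketched outside of Spirit. In this simplified illustration, `debug_scope` is a hypothetical stand-in for `context_debug` that only stores a reference to the attribute; the real class has more parameters and different internals.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for context_debug: it keeps a reference, not a copy.
template <typename Attribute>
struct debug_scope {
    const Attribute& attr;
    explicit debug_scope(const Attribute& a) : attr(a) {}
};

// Safe: the helper is parameterized with the type of the object that really
// exists, so the reference binds to the caller's attribute and no conversion
// (and therefore no temporary) is involved.
template <typename Attribute>
bool refers_to_original(const Attribute& attr) {
    debug_scope<Attribute> dbg(attr);
    return &dbg.attr == &attr;
}

// Unsafe pattern, left commented out because using it is undefined behaviour:
//     const char* attr = "Apples";
//     debug_scope<std::string> dbg{std::string(attr)}; // temporary dies at the ';'
//     std::cout << dbg.attr;                           // reads a dangling reference
```

Calling `refers_to_original` with any lvalue returns `true`, which is the property the debug helper needs: the reference it holds must outlive the helper itself.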

To the best of my knowledge this would be a clean fix, preventing any unwarranted conversions and the temporaries that come with them:

#if defined(BOOST_SPIRIT_X3_DEBUG)
                context_debug<Iterator, transform_attr>
                dbg(rule_name, first, last, attr_, ok_parse);
#endif

I've created a pull request for this: https://github.com/boostorg/spirit/pull/232

¹ The develop branch doesn't seem to have been merged into the 1.64 release.