Tags: c++, parsing, boost-spirit, boost-spirit-x3

Strange semantic behaviour of boost spirit x3 after splitting


I came across strange behaviour in Boost Spirit X3 after I split my grammar into the recommended parser.hpp, parser_def.hpp and parser.cpp files. My example grammar parses simple enums:

enum = "enum" > identifier > "{" > identifier % "," > "}"

This is my enum grammar. When I don't split the enum and identifier parsers into the recommended files, everything works fine; in particular, the string "enum {foo, bar}" throws an expectation failure, as expected. This example can be found here: unsplit working example

But when I split the exact same grammar into separate files, the parser throws

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

when trying to parse the same string "enum {foo, bar}".

This example can be found here: split strange example

  1. ast.hpp

    #pragma once
    
    #include <vector>
    #include <string>
    #include <boost/fusion/include/adapt_struct.hpp>
    
    
    
    namespace ast{
    
    namespace x3 = boost::spirit::x3;
    
    struct Enum {
        std::string _name;
        std::vector<std::string> _elements;
    };
    
    
    }
    
    BOOST_FUSION_ADAPT_STRUCT(ast::Enum, _name, _elements)
    
  2. config.hpp

    #pragma once 
    
    #include <boost/spirit/home/x3.hpp>
    
    namespace parser{
    
        namespace x3 = boost::spirit::x3;
    
        typedef std::string::const_iterator iterator_type;
        typedef x3::phrase_parse_context<x3::ascii::space_type>::type context_type;
    
    }
    
  3. enum.cpp

    #include "enum_def.hpp"
    #include "config.hpp"
    
    namespace parser { namespace impl {
         BOOST_SPIRIT_INSTANTIATE(enum_type, iterator_type, context_type)
    }}
    
    namespace parser {
    
    const impl::enum_type& enum_parser()
    {
        return impl::enum_parser;
    }
    
    }
    
  4. enum_def.hpp

    #pragma once
    
    #include "identifier.hpp"
    #include "enum.hpp"
    #include "ast.hpp"
    
    namespace parser{ namespace impl{
    
        namespace x3=boost::spirit::x3;
    
        const enum_type enum_parser = "enum";
    
        namespace{
            const auto& identifier = parser::identifier();
        }
        auto const enum_parser_def =
            "enum"
            > identifier
            > "{"
            > identifier % ","
            >"}";
    
        BOOST_SPIRIT_DEFINE(enum_parser)
    }}
    
  5. enum.hpp

    #pragma once
    
    #include <boost/spirit/home/x3.hpp>
    #include "ast.hpp"
    
    namespace parser{ namespace impl{
        namespace x3=boost::spirit::x3;
    
        typedef x3::rule<class enum_class, ast::Enum> enum_type;
    
        BOOST_SPIRIT_DECLARE(enum_type)
    
    }}
    
    namespace parser{
        const impl::enum_type& enum_parser();
    }
    
  6. identifier.cpp

    #include "identifier_def.hpp"
    #include "config.hpp"
    
    namespace parser { namespace impl {
         BOOST_SPIRIT_INSTANTIATE(identifier_type, iterator_type, context_type)
    }}
    
    namespace parser {
    
    const impl::identifier_type& identifier()
    {
        return impl::identifier;
    }
    
    }
    
  7. identifier_def.hpp

    #pragma once
    #include <boost/spirit/home/x3.hpp>
    #include "identifier.hpp"
    
    namespace parser{ namespace impl{
    
        namespace x3=boost::spirit::x3;
    
        const identifier_type identifier = "identifier";    
    
        auto const identifier_def = x3::lexeme[
            ((x3::alpha | '_') >> *(x3::alnum | '_'))
        ];
    
        BOOST_SPIRIT_DEFINE(identifier)
    }}
    
  8. identifier.hpp

    #pragma once
    #include <boost/spirit/home/x3.hpp>
    
    namespace parser{ namespace impl{
        namespace x3=boost::spirit::x3;
    
        typedef x3::rule<class identifier_class, std::string> identifier_type;
    
        BOOST_SPIRIT_DECLARE(identifier_type)
    }}
    
    
    namespace parser{
        const impl::identifier_type& identifier();
    }
    
  9. main.cpp

    #include <boost/spirit/home/x3.hpp>
    #include "ast.hpp"
    #include "enum.hpp"
    
    namespace x3 = boost::spirit::x3;
    
    template<typename Parser, typename Attribute>
    bool test(const std::string& str, Parser&& p, Attribute&& attr)
    {
        using iterator_type = std::string::const_iterator;
        iterator_type in = str.begin();
        iterator_type end = str.end();
    
        bool ret = x3::phrase_parse(in, end, p, x3::ascii::space, attr);
        ret &= (in == end);
        return ret;
    
    }
    
    int main(){
        ast::Enum attr;
        test("enum foo{foo,bar}", parser::enum_parser(), attr);
        test("enum {foo,bar}", parser::enum_parser(), attr);    
    }
    

Is this a bug, am I missing something, or is this an expected behaviour?

EDIT: Here is my repo with an example that throws a std::logic_error instead of an expectation_failure.


Solution

  • I've found the cause of the bug.

    The bug is caused by the fact that the expect directive takes its subject parser by value, and that copy can be made before the initializer of parser::impl::identifier has run.

    To visualize it, imagine the static initializers in enum.cpp (in particular the one for parser::impl::enum_parser_def) running before the one for parser::impl::identifier in identifier.cpp. A compiler is allowed to do this.

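    The copy therefore has a null name field (objects with static storage duration are zero-initialized before any dynamic initialization runs). This fails as soon as the expectation point tries to construct the x3::expectation_failure and fill in its which_ member, because constructing a std::string from a null pointer is illegal; libstdc++ reports it by throwing the std::logic_error shown in the question.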
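
    The mechanism can be reproduced without Spirit or static initialization at all. The sketch below uses hypothetical stand-in types (`FakeRule`, `FakeExpect`, `which`), not the real x3 classes: an expect-like wrapper copies its subject by value while the subject's name is still null, and initializing the original afterwards does not reach the copy. The null check in `which` exists only so the sketch is safe to run; in the failing program the null pointer goes straight into `std::string`.

    ```cpp
    #include <cassert>
    #include <string>

    // Hypothetical stand-in for a rule: carries its name as a raw pointer,
    // which is null while the object is only zero-initialized.
    struct FakeRule {
        char const* name = nullptr;
    };

    // Hypothetical stand-in for the expect directive: stores its subject
    // BY VALUE, so it snapshots the rule's state at the moment of the copy.
    struct FakeExpect {
        FakeRule subject;
    };

    // Stand-in for building the "which" text of an expectation failure.
    // Guarded here; passing a null pointer to std::string is not allowed.
    std::string which(FakeRule const& r) {
        return r.name ? std::string(r.name) : std::string("<null rule name>");
    }

    // Simulates the bad ordering: the wrapper is built first, and the
    // rule's own initialization happens only afterwards.
    FakeExpect build_before_init(FakeRule& rule) {
        FakeExpect e{rule};        // copy taken while rule.name == nullptr
        rule.name = "identifier";  // the rule is "initialized" too late
        return e;
    }
    ```

    After `build_before_init`, the original rule reports "identifier" while the copy inside the wrapper still holds a null name, which is exactly the state the expectation point runs into.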

    All in all, I fear the root cause here is the static initialization order fiasco. I'll see whether I can fix it and submit a PR.
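
    A common general remedy for the static initialization order fiasco is construct-on-first-use: keep the object in a function-local static so it is initialized the first time it is requested. Note that the question's `parser::identifier()` accessor does not give this guarantee by itself, because it returns a reference to the namespace-scope object `impl::identifier`. The following is a minimal sketch with hypothetical types and functions (`Rule`, `Sequence`, `identifier_rule`, `enum_rule`, `has_name`), not a patch for x3; whether the idiom fits x3's `BOOST_SPIRIT_DEFINE` machinery is not established here.

    ```cpp
    #include <cassert>
    #include <cstring>

    // Hypothetical stand-ins: a named rule, and a composite that stores
    // its subject by value (as the expect directive does).
    struct Rule {
        char const* name;
    };

    struct Sequence {
        Rule subject;
    };

    // Construct-on-first-use: a function-local static is initialized the
    // first time control reaches its declaration (thread-safe since C++11),
    // not at an unspecified point during program start-up.
    Rule const& identifier_rule() {
        static Rule const r{"identifier"};
        return r;
    }

    Sequence const& enum_rule() {
        // Calling identifier_rule() guarantees the subject is initialized
        // before it is copied, whichever file each function lives in.
        static Sequence const s{identifier_rule()};
        return s;
    }

    // Returns true if the rule's name is non-null and equals `expected`.
    bool has_name(Rule const& r, char const* expected) {
        return r.name != nullptr && std::strcmp(r.name, expected) == 0;
    }
    ```

    The trade-off is that every use goes through a function call and a one-time initialization check, in exchange for an order that no longer depends on how the linker arranges the translation units.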

    WORKAROUND:

    An immediate workaround is to list the source files in reverse order, so that the definition is initialized before its use:

    set(SOURCE_FILES 
        identifier.cpp
        enum.cpp 
        main.cpp 
    )
    

    Note that even if this fixes it on your compiler (it does on mine), you are relying on unspecified behaviour: the standard does NOT specify the order of dynamic initialization of namespace-scope objects across translation units.
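
    Another way to take the initialization order out of the picture is constant initialization: if an object's initializer is a constant expression (for example a `constexpr` constructor called with a string literal), the object is initialized during static initialization, before any dynamic initializer in any translation unit runs. The sketch below uses a hypothetical `ConstRule` type; whether `x3::rule`'s name constructor is `constexpr` depends on your Boost version, so check the headers before relying on this. In C++20, adding `constinit` to the definition makes the compiler reject it if constant initialization does not happen.

    ```cpp
    #include <cassert>
    #include <cstring>

    // Hypothetical rule-like type with a constexpr constructor.
    struct ConstRule {
        char const* name;
        constexpr explicit ConstRule(char const* n) : name(n) {}
    };

    // Declared here but defined further down, so the copy below uses it
    // before its definition, much like a use from another file would.
    extern ConstRule const const_identifier;

    // Not a constant expression (it reads a non-constexpr object), so this
    // copy is made during dynamic initialization.
    ConstRule const early_copy = const_identifier;

    // Constant initialization: completed before ANY dynamic initialization,
    // so early_copy always sees the real name, never a null pointer.
    ConstRule const const_identifier{"identifier"};

    // Returns true if the rule's name is non-null and equals `expected`.
    bool has_name(ConstRule const& r, char const* expected) {
        return r.name != nullptr && std::strcmp(r.name, expected) == 0;
    }
    ```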