Tags: c++, boost, boost-spirit, boost-spirit-x3

Spirit X3, How to get attribute type to match rule type?


While developing a Spirit X3 parser I want to use semantic actions (see footnote 1). It is important for me to be in control of how attributes are stored into STL containers.

This question is about how to make the parser attribute, _attr( ctx ), match the rule type, _val( ctx ), so that it can be assigned properly. Maybe it boils down to how to apply the undocumented transform_attribute feature, but please read along to see whether that is actually what solves it in the example code.

Printing types of objects/variables

What I found very useful is the ability to print the types of _attr( ctx ) and _val( ctx ) in a semantic action while I am experimenting with different grammar expressions.

So, based on an answer by Howard Hinnant, I wrote a utility header file that provides this facility according to my preferences.

The code below goes in a file named utility.h:

#include <cstdlib>   // free
#include <string>
#include <type_traits>
#include <typeinfo>
#include <cxxabi.h>

namespace utility
{

template<typename T>
std::string type2string()
{
  std::string r;
  typedef typename std::remove_reference<T>::type TR;

  std::string space = "";
  if ( std::is_const<TR>::value )
    { r = "const"; space = " "; }
  if ( std::is_volatile<TR>::value )
    { r += space + "volatile"; space = " "; }

  int status;
  char* demangled =
    abi::__cxa_demangle( typeid(TR).name(), nullptr, nullptr, &status );
  switch ( status )
  {
    case  0: { goto proceed; }
    case -1: { r = "type2string failed: malloc failure"; goto fail; }
    case -2: { r = "type2string failed: " + std::string(typeid(TR).name()) +
      " nonvalid C++ ABI name"; goto fail; }
    case -3: { r = "type2string failed: invalid argument(s)"; goto fail; }
    default: { r = "type2string failed: unknown status " +
      std::to_string( status ); goto fail; }
  }
  proceed:
  r += space + demangled;
  free( demangled );

  /* references are without a space */
  if ( std::is_lvalue_reference<T>::value ) { r += '&'; }
  if ( std::is_rvalue_reference<T>::value ) { r += "&&"; }

  fail:
  return r;
}

}
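
As a quick check of what such a helper prints, here is a condensed sketch of the same idea. The name type_name is hypothetical (it is not part of utility.h), it assumes a GCC/Clang toolchain that provides <cxxabi.h>, and it falls back to the mangled name instead of reporting the individual error codes.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <type_traits>
#include <typeinfo>

// Condensed, hypothetical variant of type2string(), for illustration only.
template <typename T>
std::string type_name()
{
    using TR = typename std::remove_reference<T>::type;

    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer; unique_ptr releases it with free.
    std::unique_ptr<char, void (*)(void*)> demangled(
        abi::__cxa_demangle(typeid(TR).name(), nullptr, nullptr, &status),
        std::free);

    std::string r;
    if (std::is_const<TR>::value)    r += "const ";
    if (std::is_volatile<TR>::value) r += "volatile ";

    // Fall back to the mangled name if demangling failed.
    r += (status == 0 && demangled) ? demangled.get() : typeid(TR).name();

    // References are appended without a space, as in type2string().
    if (std::is_lvalue_reference<T>::value) r += '&';
    if (std::is_rvalue_reference<T>::value) r += "&&";
    return r;
}
```

Calling type_name<const int&>() yields "const int&"; inside a semantic action the call would be type_name<decltype( _attr( ctx ) )>().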

Now the actual working example code:

#include <cstddef>
#include <cstdio>
#include <cstdint>

#define BOOST_SPIRIT_X3_DEBUG
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>

#include <iostream> // std::cin, used with getline in main
#include <string>
#include <vector>
#include <utility> // this is for std::move
#include "utility.h" // to print types

namespace client
{
  namespace x3 = boost::spirit::x3;
  namespace ascii = boost::spirit::x3::ascii;

  namespace semantic_actions
  {
    using x3::_val;  // assign to _val( ctx )
    using x3::_attr; // from _attr( ctx )    

    struct move_assign
    {  
      template <typename Context>
      void operator()(const Context& ctx) const
      {
        printf( "move_assign\n" );
        _val( ctx ) = std::move( _attr( ctx ) );
      }
    };

    struct print_type
    {
      template <typename Context>
      void operator()(const Context& ctx) const
      {
        printf( "print_type\n" );

        std::string str;
        str = utility::type2string< decltype( _attr( ctx ) ) >();
        printf( "_attr type: %s\n", str.c_str() );

        // reuse str
        str = utility::type2string< decltype( _val( ctx ) ) >();
        printf( "_val type: %s\n", str.c_str() );
      }
    };
  }

  namespace parser
  {
    using x3::char_;
    using x3::lit;
    using namespace semantic_actions;

    x3::rule<struct main_rule_class, std::string> main_rule_ = "main_rule";

    const auto main_rule__def = (*( !lit(';') >> char_) >> lit(';'))[print_type()][move_assign()];

    BOOST_SPIRIT_DEFINE( main_rule_ )

    const auto entry_point = x3::skip(x3::space)[ main_rule_ ];
  }
}

int main()
{
  printf( "Give me a string to test rule.\n" );
  printf( "Type [q or Q] to quit.\n" );

  std::string input_str;
  std::string output_str;

  while (getline(std::cin, input_str))
  {
    if ( input_str.empty() || input_str[0] == 'q' || input_str[0] == 'Q')
    { break; }

    auto first = input_str.begin(), last = input_str.end();

    if ( parse( first, last, client::parser::entry_point, output_str) )
    {
      printf( "Parsing succeeded\n" );
      printf( "input:  \"%s\"\n", input_str.c_str() );
      printf( "output: \"%s\"\n", output_str.c_str() );
    }
    else
    {
      printf( "Parsing failed\n" );
    }
  }

  return 0;
}

The input is always: abcd;

output:

Give me a string to test rule.
Type [q or Q] to quit.
<main_rule>
  <try>abcd;</try>
print_type
_attr type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
_val type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
move_assign
  <success></success>
  <attributes>[a, b, c, d]</attributes>
</main_rule>
Parsing succeeded
input:  "abcd;"
output: "abcd"

OK, so far so good. But suppose I would like to include the semicolon in the parsed result. I change the grammar line to:

const auto main_rule__def = (*( !lit(';') >> char_) >> char_(";"))[print_type()];

Note: I removed the semantic action [move_assign()] because it fails to compile due to incompatible _attr and _val types. Now the output is:

Give me a string to test rule.
Type [q or Q] to quit.
<main_rule>
  <try>abcd;</try>
print_type
_attr type: boost::fusion::deque<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char>&
_val type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
  <success></success>
  <attributes>[]</attributes>
</main_rule>
Parsing succeeded
input:  "abcd;"
output: ""

Now the _attr type of boost::fusion::deque<> is not what I want; I just want it to be std::string. I don't understand why, when the complete right-hand side of the grammar assignment sits inside the semantic action brackets, _attr is still not of the _val type. Would the X3 feature transform_attribute help here? And how should I apply it? Or is there another good way to solve this, without having to work with Boost.Fusion class interfaces or other implementation details?

Current workaround

The current workaround for me is to define another rule just to be assigned from the first rule with a semantic action. Only there the _attr is of std::string type.

  namespace parser
  {
    using x3::char_;
    using x3::lit;
    using namespace semantic_actions;

    x3::rule<struct main_rule_class, std::string> main_rule_ = "main_rule";
    x3::rule<struct main_rule2_class, std::string> main_rule2_ = "main_rule2";

    const auto main_rule__def = *( !lit(';') >> char_) >> char_(";");
    const auto main_rule2__def = main_rule_[print_type()][move_assign()];

    BOOST_SPIRIT_DEFINE( main_rule_, main_rule2_ )

    const auto entry_point = x3::skip(x3::space)[ main_rule2_ ];
  }

output:

Give me a string to test rule.
Type [q or Q] to quit.
<main_rule2>
  <try>abcd;</try>
  <main_rule>
    <try>abcd;</try>
    <success></success>
    <attributes>[a, b, c, d, ;]</attributes>
  </main_rule>
print_type
_attr type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
_val type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&
move_assign
  <success></success>
  <attributes>[a, b, c, d, ;]</attributes>
</main_rule2>
Parsing succeeded
input:  "abcd;"
output: "abcd;"

I hope there is a way without having to make another rule just to get the type of _attr to match _val.

(1) I don't appreciate the hidden cleverness the authors put into this library, as just one innocent-looking change can break the application, whereas a more explicit and elaborate approach would communicate much more clearly what is going on. I just have to get this off my chest.


Solution

  • Direct Answer

    transform_attribute is not yet documented for X3 (https://www.boost.org/doc/libs/1_70_0/libs/spirit/doc/x3/html/index.html) but you can find its Qi counterpart here: https://www.boost.org/doc/libs/1_70_0/libs/spirit/doc/html/spirit/advanced/customize/transform.html.

    "Would the X3 feature transform_attribute help here? And how should I apply that?"

    Regardless, it's an implementation detail that you can easily access by using rules. I like to use anonymous rules to help with this:

    template <typename T>
        struct as_type {
            template <typename E>
            constexpr auto operator[](E e) const { return x3::rule<struct _, T> {} = e; }
        };
    
    template <typename T>
        static inline constexpr as_type<T> as;
    

    Now you can write

    const auto main_rule__def = as<std::string> [ (*(char_ - ';') >> char_(';')) ];
    

    Full example:

    #include <iostream>
    //#define BOOST_SPIRIT_X3_DEBUG
    #include <boost/spirit/home/x3.hpp>
    #include <iomanip> // std::quoted
    
    namespace client {
        namespace x3 = boost::spirit::x3;
        namespace ascii = boost::spirit::x3::ascii;
    
        namespace parser {
            using x3::char_;
            using x3::lit;
    
            x3::rule<struct main_rule_class, std::string> main_rule_ = "main_rule";
    
            template <typename T>
                struct as_type {
                    template <typename E>
                    constexpr auto operator[](E e) const { return x3::rule<struct _, T> {} = e; }
                };
    
            template <typename T>
                static inline constexpr as_type<T> as;
    
            const auto main_rule__def = as<std::string> [ (*(char_ - ';') >> char_(';')) ];
    
            BOOST_SPIRIT_DEFINE(main_rule_)
    
            const auto entry_point = x3::skip(x3::space)[main_rule_];
        } // namespace parser
    } // namespace client
    
    int main() {
        std::string output_str;
        for(std::string const input_str : { "abcd;" }) {
            auto first = input_str.begin(), last = input_str.end();
    
            if (parse(first, last, client::parser::entry_point, output_str)) {
                std::cout << "Parsing succeeded\n";
                std::cout << "input:  " << std::quoted(input_str) << "\n";
                std::cout << "output:  " << std::quoted(output_str) << "\n";
            } else {
                std::cout << "Parsing failed\n";
            }
        }
    }
    

    Prints

    Parsing succeeded
    input:  "abcd;"
    output:  "abcd;"
    

    In theory there might be performance overhead, but I strongly suspect all compilers will inline everything here since nothing has external linkage or vtables, and everything is const/constexpr.

    Alternatives, simplifications:

    Use x3::raw

    In this case you could have gotten the behaviour you want using an existing directive: x3::raw


    const auto main_rule__def = x3::raw [ *(char_ - ';') >> ';' ];
    

    Don't use rule<> always

    A rule<> is only required if you have recursive rules or need external linkage on rules (to define them in separate translation units). Without it, the whole program shrinks to:


    #include <iostream>
    #include <boost/spirit/home/x3.hpp>
    #include <iomanip> // std::quoted
    
    namespace x3 = boost::spirit::x3;
    namespace client::parser {
        auto const entry_point = x3::raw [ *(x3::char_ - ';') >> ';' ];
    }
    
    int main() {
        for(std::string const input : { "abcd;" }) {
            std::string output;
            if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
                std::cout << "Parsing succeeded\n";
                std::cout << "input:  " << std::quoted(input) << "\n";
                std::cout << "output: " << std::quoted(output) << "\n";
            } else {
                std::cout << "Parsing failed\n";
            }
        }
    }
    

    Finally - About skipping

    I don't think you want char_ - ';' (or the more elaborate way you spelled it: !lit(';') >> char_). With the skipper it will parse across whitespace ("ab c\nd ;" -> "abcd;").

    You would probably want to make the rule more restrictive (like lexeme [+(graph - ';')] or even simply raw[lexeme[+(alnum|'_')]] or lexeme[+char_("a-zA-Z0-9_")]).

    See Boost spirit skipper issues