I'm writing a parser in Spirit X3 in order to get familiar with it, and even though I'm pretty familiar with Qi, I'm still hitting some stumbling blocks in X3.
For example, the Qi examples include a basic XML parser that shows you how to match a previously matched value using Phoenix placeholders. However, I've only partly been able to figure out how to do the same in X3:
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
namespace x3 = boost::spirit::x3;
namespace mytest
{
struct SimpleElement
{
std::string tag;
std::string content;
};
} // namespace mytest
BOOST_FUSION_ADAPT_STRUCT
(
mytest::SimpleElement, tag, content
)
namespace mytest
{
namespace x3 = boost::spirit::x3;
namespace ascii = boost::spirit::x3::ascii;
using x3::lit;
using x3::lexeme;
using ascii::char_;
const x3::rule<class SimpleElementID, SimpleElement> simpleTag = "simpleTag";
auto assignTag = [](auto& ctx)
{
x3::_val(ctx).tag = x3::_attr(ctx);
};
auto testTag = [](auto& ctx)
{
x3::_pass(ctx) =
(x3::_val(ctx).tag == x3::_attr(ctx));
};
auto assignContent = [](auto& ctx)
{
x3::_val(ctx).content = x3::_attr(ctx);
};
auto const simpleTag_def
= '['
>> x3::lexeme[+(char_ - ']')][assignTag]
>> ']'
>> x3::lexeme[
+(char_ - x3::lit("[/"))]
[assignContent]
>> "[/"
>> x3::lexeme[+(char_ - ']')][testTag]
>> ']'
;
BOOST_SPIRIT_DEFINE(simpleTag);
} // namespace mytest
int main()
{
const std::string text = "[test]Hello World![/test]";
std::string::const_iterator start = std::begin(text);
const std::string::const_iterator stop = std::end(text);
mytest::SimpleElement element{};
bool result =
phrase_parse(start, stop, mytest::simpleTag, x3::ascii::space, element);
if (!result)
{
std::cout << "failed to parse!\n";
}
else
{
std::cout << "tag : " << element.tag << '\n';
std::cout << "content: " << element.content << '\n';
}
}
(Link: https://wandbox.org/permlink/xLZN9plcOwkSKCrD )
This works. However, if I try to parse something like [test]Hello [/World[/test], it doesn't work, because I have not specified the correct exclusion here (the content stops at the first "[/", whether or not it starts the real closing tag):
>> x3::lexeme[
+(char_ - x3::lit("[/"))]
[assignContent]
Essentially I want to tell the parser something like:
>> x3::lexeme[
+(char_ - (x3::lit("[/") >> *the start tag* >> ']') )]
[assignContent]
How could I go about doing this? Also, is the way in which I'm referencing the start tag and later matching it the "best" way to do this in X3 or is there a better/more preferred way?
Thank you!
Nice question.
The best answer would be to do exactly what XML does: outlaw [/ inside the tag data. In fact, XML outlaws < (because it could be opening a nested tag, and you don't want to have to potentially read ahead through the entire stream to find out whether it is a valid subtag).
XML uses character entities ("escapes" like &lt; and &gt;) or unparsed character data (<![CDATA[ ... ]]> sections) to encode content that requires these characters.
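To make the escaping idea concrete, here is a minimal sketch in plain C++. The scheme itself is an assumption made up for illustration (it is not part of any BBCode or Spirit convention): '&' is written as &amp; and '[' as &#91;, so escaped content can never contain "[/". The function names escapeContent and unescapeContent are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical escaping scheme (an assumption, not a standard):
//   '&' -> "&amp;"   and   '[' -> "&#91;"
// After escaping, the content contains no '[' at all, hence no "[/".
std::string escapeContent(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        if (c == '&')      out += "&amp;";
        else if (c == '[') out += "&#91;";
        else               out += c;
    }
    return out;
}

// Inverse of escapeContent: scan left to right and decode the two entities.
std::string unescapeContent(const std::string& escaped)
{
    std::string out;
    std::size_t i = 0;
    while (i < escaped.size()) {
        if (escaped.compare(i, 5, "&amp;") == 0)      { out += '&'; i += 5; }
        else if (escaped.compare(i, 5, "&#91;") == 0) { out += '['; i += 5; }
        else                                          { out += escaped[i++]; }
    }
    return out;
}
```

With content escaped this way, the simple rule from the question, +(char_ - "[/"), is sufficient again, at the cost of an unescape step after parsing.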
Next up, you can, of course, do a negative lookahead assertion (!closeTag, or the difference operator as in char_ - closeTag) using the tag attribute member like you already did.
Reshuffling the rule spelling a little, it's not even that bad.
Note that I removed the need for manual propagation of the tag/contents by using the , true> template argument (force_attribute) on the simpleTag rule. See Boost Spirit: "Semantic actions are evil"?
const x3::rule<class SimpleElementID, SimpleElement, true> simpleTag = "simpleTag";
auto testTag = [](auto& ctx) { _pass(ctx) = (_val(ctx).tag == _attr(ctx)); };
auto openTag = '[' >> x3::lexeme[+(char_ - ']')] >> ']';
auto closeTag = "[/" >> x3::lexeme[+(char_ - ']')] [testTag] >> ']';
auto tagContents = x3::lexeme[ +(char_ - closeTag) ];
auto const simpleTag_def
= openTag
>> tagContents
>> x3::omit [ closeTag ]
;
See it Live On Coliru
That works, but it ends up being quite clumsy, because it means using semantic actions all around and it also goes against the natural binding of attribute references.
Thinking outside the box a little:
In Qi you'd use qi::locals or inherited attributes for this (see a very similar example in the docs: MiniXML).
Both of these would have the net effect of extending the parser context with your piece(s) of information.
X3 has no such "high-level" features. But it does have the building block to extend your context: x3::with<>(data) [ p ].
In this simple example it would seem overkill, but at some point you will appreciate how you can use extra context in your rules without holding your attribute types hostage:
struct TagName{};
auto openTag
= x3::rule<struct openTagID, std::string, true> {"openTag"}
= ('[' >> x3::lexeme[+(char_ - ']')] >> ']')
[([](auto& ctx) { x3::get<TagName>(ctx) = _attr(ctx); })]
;
auto closeTag
= x3::rule<struct closeTagID, std::string, true> {"closeTag"}
= ("[/" >> x3::lexeme[+(char_ - ']')] >> ']')
[([](auto& ctx) { _pass(ctx) = (x3::get<TagName>(ctx) == _attr(ctx)); })]
;
auto tagContents
= x3::rule<struct tagContentsID, std::string> {"tagContents"}
= x3::lexeme[ +(char_ - closeTag) ];
auto const simpleTag
= x3::rule<class SimpleElementID, SimpleElement, true> {"simpleTag"}
= x3::with<TagName>(std::string()) [
openTag
>> tagContents
>> x3::omit [ closeTag ]
];
See it Live On Coliru