Tags: c++, parsing, boost, symbols, boost-spirit

How do I take the output of a parse and use it to look up a value in a symbols table?


As the code below shows, I'm taking the output of one parse and using it to look up the number in a symbols table in a second parse. How do I do this in a single rule? The docs and a lot of searching lead me to believe this can be done with a local variable, but I can't figure out how to use my symbols table quad on that variable.

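#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>

namespace qi = boost::spirit::qi;

int main()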
{
  using boost::phoenix::ref;
  using qi::_1;
  using qi::_val;
  using qi::no_case;
  using qi::_a;
  using qi::symbols;
  using qi::char_;
  using qi::omit;

  symbols<char, int> quad;
  quad.add
    ("1", 1)
    ("2", 2)
    ("3", 3)
    ("4", 4)
    ("NE", 1)
    ("SE", 2)
    ("SW", 3)
    ("NW", 4)
    ;

  std::wstring s = L"N44°30'14.950\"W";

  std::wstring out;
  int iQuad;
  qi::parse(s.begin(), s.end(), 
    no_case[char_('N')] >> omit[*(qi::char_ - no_case[char_("NSEW")])] >> no_case[char_('W')],
    out);
  qi::parse(out.begin(), out.end(), quad, iQuad);
  return 0;
}

Solution

  • Yes, it can be done with a local variable.

    However, that demotes symbols to a regular map, so let's just use a map instead¹.

    1. The simplest thing

    Firstly, I'd consider doing the simplest thing:

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <iostream>
    
    namespace Rules {
        namespace qi = boost::spirit::qi;
    
        qi::rule<std::wstring::const_iterator, int()> quad = qi::no_case [
            ('N' >> *~qi::char_("EW") >> 'E')[ qi::_val = 1 ] |
            ('S' >> *~qi::char_("EW") >> 'E')[ qi::_val = 2 ] |
            ('S' >> *~qi::char_("EW") >> 'W')[ qi::_val = 3 ] |
            ('N' >> *~qi::char_("EW") >> 'W')[ qi::_val = 4 ] 
        ];
    }
    
    int main() {
        for (std::wstring const s : {
                L"NE", L"SE", L"SW", L"NW",
                L"N44°30'14.950\"E", 
                L"N44°30'14.950\"W", 
                L"S44°30'14.950\"W", 
                L"S44°30'14.950\"E", 
                L"1", L"2", L"3", L"4",
            })
        {
            int iQuad;
            auto f = s.begin(), l = s.end();
            bool ok = parse(f, l, Rules::quad, iQuad);
    
            if (ok)
                std::wcout << L"Parsed: '" << s << L"' -> " << iQuad << L"\n";
            else
                std::wcout << L"Parse failed '" << s << L"'\n";
    
            if (f!=l)
                std::wcout << L"Remaining unparsed: '" << std::wstring(f,l) << L"'\n";
        }
    }
    

    Which prints


    Parsed: 'NE' -> 1
    Parsed: 'SE' -> 2
    Parsed: 'SW' -> 3
    Parsed: 'NW' -> 4
    Parsed: 'N44?30'14.950"E' -> 1
    Parsed: 'N44?30'14.950"W' -> 4
    Parsed: 'S44?30'14.950"W' -> 3
    Parsed: 'S44?30'14.950"E' -> 2
    Parse failed '1'
    Remaining unparsed: '1'
    Parse failed '2'
    Remaining unparsed: '2'
    Parse failed '3'
    Remaining unparsed: '3'
    Parse failed '4'
    Remaining unparsed: '4'
    

    If you want the numerals to parse as well, just add an alternative for them:

    qi::rule<std::wstring::const_iterator, int()> quad = qi::no_case [
        (qi::int_(1) | qi::int_(2) | qi::int_(3) | qi::int_(4)) [ qi::_val = qi::_1 ] |
        ('N' >> *~qi::char_("EW") >> 'E')[ qi::_val = 1 ] |
        ('S' >> *~qi::char_("EW") >> 'E')[ qi::_val = 2 ] |
        ('S' >> *~qi::char_("EW") >> 'W')[ qi::_val = 3 ] |
        ('N' >> *~qi::char_("EW") >> 'W')[ qi::_val = 4 ] 
    ];
    

    All of this can be optimized, but I'll venture the guess that it is already more efficient than anything based on symbols and a two-phase parse.

    2. Using a map lookup

    Just... use a map:

    template <typename It> struct MapLookup : qi::grammar<It, int()> {
        MapLookup() : MapLookup::base_type(start) {
            namespace px = boost::phoenix;
    
            start = qi::as_string [
                qi::char_("1234") | 
                qi::char_("nsNS") >> qi::omit[*~qi::char_("weWE")] >> qi::char_("weWE")
            ] [ qi::_val = px::ref(_lookup)[qi::_1] ];
        }
      private:
        struct ci {
            template <typename A, typename B>
            bool operator()(A const& a, B const& b) const { return boost::ilexicographical_compare(a, b); }
        };
        std::map<std::string, int, ci> _lookup = { 
            { "NE", 1 }, { "SE", 2 }, { "SW", 3 }, { "NW", 4 },
            { "1" , 1 }, { "2",  2 }, { "3",  3 }, { "4",  4 } };
        qi::rule<It, int()> start;
    };
    

    See it Live On Coliru too.
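
    To see the lookup on its own, here is a minimal standard-library sketch of the same idea without Spirit. The names ci_less, quad_map, make_quad_map and quad_of are hypothetical and only for illustration, and the comparator is a plain ASCII stand-in for boost::ilexicographical_compare. One deliberate difference: it uses find() rather than operator[], because operator[] on a std::map inserts a default entry for a missing key.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparator: orders strings while ignoring ASCII case,
// standing in for boost::ilexicographical_compare in the grammar above.
struct ci_less {
    bool operator()(std::string const& a, std::string const& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::toupper(x) < std::toupper(y);
            });
    }
};

using quad_map = std::map<std::string, int, ci_less>;

// Hypothetical helper: builds the same table as _lookup.
inline quad_map make_quad_map() {
    return {
        { "NE", 1 }, { "SE", 2 }, { "SW", 3 }, { "NW", 4 },
        { "1",  1 }, { "2",  2 }, { "3",  3 }, { "4",  4 },
    };
}

// Returns the quadrant for `key`, or 0 when the key is unknown
// (the 0 sentinel is an assumption of this sketch).
// find() is used so that a miss does not insert a new entry.
inline int quad_of(quad_map const& m, std::string const& key) {
    auto it = m.find(key);
    return it == m.end() ? 0 : it->second;
}
```

    With this table, quad_of(make_quad_map(), "nw") yields 4 regardless of case. In the grammar the parser only ever produces keys that exist, so the miss case should not arise there.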

    3. Optimizing it

    qi::symbols uses tries. You might think that's faster, and it is in fact pretty fast for lookups. But not on very small key sets, on a node-based container, using dynamically allocated temporary keys.

    In other words, we can do much better:

    template <typename It> struct FastLookup : qi::grammar<It, int()> {
        using key = std::array<char, 2>;
    
        FastLookup() : FastLookup::base_type(start) {
            namespace px = boost::phoenix;
    
            start = 
                qi::int_ [ qi::_pass = (qi::_1 > 0 && qi::_1 <= 4), qi::_val = qi::_1 ] |
                qi::raw [
                    qi::char_("nsNS") >> qi::omit[*~qi::char_("weWE")] >> qi::char_("weWE")
                ] [ qi::_val = _lookup(qi::_1) ];
        }
      private:
        struct lookup_f {
            template <typename R> int operator()(R const& range) const {
                using key = std::tuple<char, char>;
                static constexpr key index[] = { key {'N','E'}, key {'S','E'}, key {'S','W'}, key {'N','W'}, };
    
                using namespace std;
                auto a = std::toupper(*range.begin());
                auto b = std::toupper(*(range.end()-1));
                return 1 + (find(begin(index), end(index), key(a, b)) - begin(index));
            }
        };
    
        boost::phoenix::function<lookup_f> _lookup;
        qi::rule<It, int()> start;
    };
    

    See it Live Again On Coliru
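
    The core of lookup_f can also be tried in isolation. The following is a rough sketch using only the standard library; quad_from_ends is a hypothetical name, and it takes a std::string instead of the iterator range that qi::raw supplies. It differs from lookup_f in one respect: for a pair that is not in the table it returns 0 (an assumption of this sketch), whereas lookup_f would return 5 because find() lands on the end of the table. Inside the grammar that case cannot occur, since the parser only matches N or S followed eventually by E or W.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical free function mirroring lookup_f: only the first and last
// characters of the matched text decide the quadrant.
inline int quad_from_ends(std::string const& text) {
    using key = std::pair<char, char>;
    static const key index[] = { {'N','E'}, {'S','E'}, {'S','W'}, {'N','W'} };

    if (text.empty()) return 0; // assumption: 0 signals "no match"

    // Upper-case a single char safely (toupper expects an unsigned char value).
    auto up = [](char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    };
    key const k(up(text.front()), up(text.back()));

    // Linear search over four entries; position plus one is the quadrant.
    auto it = std::find(std::begin(index), std::end(index), k);
    return it == std::end(index)
        ? 0
        : 1 + static_cast<int>(it - std::begin(index));
}
```

    For the input from the question, N44°30'14.950"W, the first and last characters are N and W, so the function returns 4. No key string is allocated and no tree is walked, which is the point of this variant.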


    ¹ If you insist, you can of course still use symbols in your own code.