Search code examples
c++boostboost-spiritboost-spirit-qi

Boost Spirit Qi validating input parser


I have a very basic Boost Spirit Qi grammar to parse either IP port or IP port range, i.e. either "6322" or "6322-6325".

The grammar looks like:

  template<class It>
  void init_port_rule(u16_rule<It>& port)
  {
    port = boost::spirit::qi::uint_parser<uint16_t, 10, 2, 5>();
    port.name("valid port range: (10, 65535)");
  }

  typedef boost::fusion::vector
    < std::uint16_t
    , boost::optional<std::uint16_t>
    > port_range_type
  ;

  template<class It>
  struct port_range_grammar
    : boost::spirit::qi::grammar
      < It
      , port_range_type()
      >
  {
    typedef typename port_range_grammar::base_type::sig_type signature;

    port_range_grammar()
      : port_range_grammar::base_type(start, "port_range")
    {
      init_port_rule(port);
      using namespace boost::spirit::qi;
      start = port > -(lit("-") > port);
    }

  private:
    boost::spirit::qi::rule<It, signature> start;
    boost::spirit::qi::rule<It, std::uint16_t()> port;
  };

I am a bit stuck to define, that in a range port1 must be less than port2. I think I have to use eps parser here, but do not seem to find the proper way to specify it. Any suggestions are very welcome.


Solution

  • You can indeed use semantic actions. You don't always need to attach them to an eps node, though. Here's what you'd get if you do:

    port %= uint_parser<uint16_t, 10, 2, 5>() >> eps[ _pass = (_val>=10 && _val<=65535) ];
    start = (port >> -('-' >> port)) >> eps(validate(_val));
    

    Note that the one rule uses Simple Form eps with semantic action attached. This requires operator%= to still invoke automatic attribute propagation.

    The second instance uses the Semantic Predicate form of eps. The validate function needs to be a Phoenix Actor, I defined it like:

    struct validations {
        bool operator()(PortRange const& range) const {
            if (range.end)
                return range.start<*range.end;
            return true;
        }
    };
    boost::phoenix::function<validations> validate;
    

    More Generic/Consistent

    Note you can use the second rule style on both rules like so:

    port %= uint_parser<Port, 10, 2, 5>() >> eps(validate(_val));
    start = (port >> -('-' >> port))      >> eps(validate(_val));
    

    if you simply add an overload to validate a single port:

    struct validations {
        bool operator()(Port const& port) const {
            return port>=10 && port<=65535;
        }
        bool operator()(PortRange const& range) const {
            if (range.end)
                return range.start<*range.end;
            return true;
        }
    };
    

    First Tests

    Let's define some nice edge cases and test them!

    Live On Coliru

    #include <boost/fusion/adapted/struct.hpp>
    #include <boost/optional/optional_io.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    namespace qi = boost::spirit::qi;
    
    using Port = std::uint16_t;
    
    struct PortRange {
        Port start;
        boost::optional<Port> end;
    };
    
    BOOST_FUSION_ADAPT_STRUCT(PortRange, start, end)
    
    template <class It, typename Attr = PortRange> struct port_range_grammar : qi::grammar<It, Attr()> {
    
        port_range_grammar() : port_range_grammar::base_type(start, "port_range") {
            using namespace qi;
    
            port %= uint_parser<Port, 10, 2, 5>() >> eps(validate(_val));
            start = (port >> -('-' >> port))      >> eps(validate(_val));
    
            port.name("valid port range: (10, 65535)");
        }
    
      private:
        struct validations {
            bool operator()(Port const& port) const {
                return port>=10 && port<=65535;
            }
            bool operator()(PortRange const& range) const {
                if (range.end)
                    return range.start<*range.end;
                return true;
            }
        };
        boost::phoenix::function<validations> validate;
        qi::rule<It, Attr()> start;
        qi::rule<It, Port()> port;
    };
    
    int main() {
        using It = std::string::const_iterator;
        port_range_grammar<It> const g;
    
        std::string const valid[]   = {"10", "6322", "6322-6325", "65535"};
        std::string const invalid[] = {"9", "09", "065535", "65536", "-1", "6325-6322"};
    
        std::cout << " -------- valid cases\n";
        for (std::string const input : valid) {
            It f=input.begin(), l = input.end();
            PortRange range;
            bool accepted = parse(f, l, g, range);
            if (accepted)
                std::cout << "Parsed '" << input << "' to " << boost::fusion::as_vector(range) << "\n";
            else
                std::cout << "TEST FAILED '" << input << "'\n";
        }
    
        std::cout << " -------- invalid cases\n";
        for (std::string const input : invalid) {
            It f=input.begin(), l = input.end();
            PortRange range;
            bool accepted = parse(f, l, g, range);
            if (accepted)
                std::cout << "TEST FAILED '" << input << "' (returned " << boost::fusion::as_vector(range) << ")\n";
        }
    }
    

    Prints:

     -------- valid cases
    Parsed '10' to (10 --)
    Parsed '6322' to (6322 --)
    Parsed '6322-6325' to (6322  6325)
    Parsed '65535' to (65535 --)
     -------- invalid cases
    TEST FAILED '065535' (returned (6553 --))
    

    CONGRATULATIONS We found a broken edge case

    Turns out that by limiting uint_parser to 5 positions, we may leave characters in the input, so that 065535 parses as 6553 (leaving '5' unparsed...). Fixing that is simple:

    start = (port >> -('-' >> port)) >> eoi >> eps(validate(_val));
    

    Or indeed:

    start %= (port >> -('-' >> port)) >> eoi[ _pass = validate(_val) ];
    

    Fixed version Live On Coliru

    A Few Words About The Attribute Type

    You will have noticed I revised your attribute type. Most of this is "good taste". Note, in practice you might want to represent your range as either single-port or range:

    using Port = std::uint16_t;
    
    struct PortRange {
        Port start, end;
    };
    
    using PortOrRange = boost::variant<Port, PortRange>;
    

    Which you would then parse like:

    port %= uint_parser<Port, 10, 2, 5>() >> eps(validate(_val));
    range = (port >> '-' >> port)         >> eps(validate(_val));
    
    start = (range | port) >> eoi;
    

    Full demo Live On Coliru

    You might think this will get unweildy to use. I agree!

    Simplify Instead

    Let's do without variant or optional in the first place. Let's make a single port just a range which happens to have start==end:

    using Port = std::uint16_t;
    
    struct PortRange {
        Port start, end;
    };
    

    Parse it like:

    start = port >> -('-' >> port | attr(0)) >> eoi >> eps(validate(_val));
    

    All we do in validate is to check whether end is 0:

        bool operator()(PortRange& range) const {
            if (range.end == 0) 
                range.end = range.start;
            return range.start <= range.end;
        }
    

    And now the output is: Live On Coliru

     -------- valid cases
    Parsed '10' to (10-10)
    Parsed '6322' to (6322-6322)
    Parsed '6322-6325' to (6322-6325)
    Parsed '65535' to (65535-65535)
     -------- invalid cases
    

    Note how you can now always enumerate start..end without knowing whether there was a port or a port-range. This may be convenient (depending a bit on the logic you're implementing).