c++ parsing boost boost-spirit

Writing a parser for a matrix-like input with Boost Spirit


I'm trying to write a parser that accepts input of the form MATRIX.{variableName} = [1,2,3;4,5,6], where the matrix (2x3 in this case) is written roughly in MATLAB's format: commas separate the values in a row and a semicolon starts a new row.

The initial idea was to store the input in a 2D std::vector for further processing. This is my first time writing a parser and I'm somewhat clueless about the Spirit framework.

My current (not so intuitive) solution takes input like MATRIX (2,3) = [1,2,3,4,5,6] to represent the same matrix. It stores the data in a one-dimensional vector and uses the row and column counts to process it later (somewhat like Eigen's implementation of dynamic matrices, I believe).

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <vector>

namespace client
{
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;
    namespace phoenix = boost::phoenix;

    // Parses "MATRIX (rows,cols) = [v1,v2,...]" into rows, cols and a flat vector of values.
    template <typename Iterator>
    bool parse_matrix(Iterator first, Iterator last, unsigned& rows, unsigned& cols, std::vector<double>& vals)
    {
        using qi::double_;
        using qi::uint_;
        using qi::_1;
        using qi::lit;
        using qi::phrase_parse;
        using ascii::space;
        using phoenix::push_back;

        rows = 0, cols = 0;
        bool r = phrase_parse(first, last,

            //  Begin grammar
            (
                lit("MATRIX") >> '(' >> uint_[phoenix::ref(rows) = _1] >> ',' >> uint_[phoenix::ref(cols) = _1] >> ')' >> '='
                 >> '[' >> double_[push_back(phoenix::ref(vals),_1)]
                        >> *(',' >> double_[push_back(phoenix::ref(vals),_1)]) >> ']'
            ),
            //  End grammar

            space);

        if (!r || first != last) // fail if we did not get a full match
            return false;
        if (vals.size() != (rows*cols)) // fail if the value count does not match the declared size
            return false;
        return true;
    }
}

I was thinking it might be possible to call a function when a certain character (such as a semicolon) is parsed, for example to append a std::vector<double> to a std::vector<std::vector<double> >. Is this possible? If not, how do I go about implementing my initial idea?


Solution

  • I'd suggest:

    • Not using semantic actions for attribute propagation. You could still use them to add validation criteria (see Boost Spirit: "Semantic actions are evil"?)

    • Using automatic attribute propagation so you don't have to pass references around

    • Not validating during parsing unless you have pressing reasons to do so.

    A minimal viable parser then becomes:


    #include <boost/spirit/include/qi.hpp>
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>

    using It  = boost::spirit::istream_iterator;
    using Row = std::vector<double>;
    using Mat = std::vector<Row>;

    int main() {
        // Disable stream-level whitespace skipping; the Spirit skipper handles it instead
        It f(std::cin>>std::noskipws), l;

        Mat matrix;
        std::string name;

        {
            using namespace boost::spirit::qi;
            // No skipper on this rule, so the identifier is matched without embedded whitespace
            rule<It, std::string()> varname_ = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");

            // double_ matches the element type of Row; ',' separates values, ';' separates rows
            if (phrase_parse(f, l, 
                    lit("MATRIX") >> '.' >> '{' >> varname_ >> '}' >> '=' 
                        >> '[' >> (double_ % ',' % ';') >> ']',
                    space, name, matrix))
            {
                std::cout << "Parsed: variable named '" << name << "' [";

                for(auto& row : matrix)
                    std::copy(row.begin(), row.end(), std::ostream_iterator<double>(std::cout<<"\n\t",", "));
                std::cout << "\n]\n";
            } else {
                std::cout << "Parse failed\n";
            }
        }

        if (f!=l)
            std::cout << "Remaining input: '" << std::string(f,l) << "'\n";
    }


    For the input "MATRIX.{variable_name}=[1,2,3;4,5,6]" this prints:

    Parsed: variable named 'variable_name' [
        1, 2, 3, 
        4, 5, 6, 
    ]
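
Because the grammar above does no validation, a ragged input such as [1,2,3;4,5] still parses. One way to handle that is to check the shape after parsing, and at the same step convert to the flat rows/cols layout described in the question. The following is only a sketch in plain standard C++: `FlatMatrix` and `flatten` are hypothetical names introduced here for illustration and are not part of Boost Spirit or of the code above.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<double>;
using Mat = std::vector<Row>;

// Hypothetical helper type (an assumption for this sketch): a row-major matrix
struct FlatMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> vals; // element (r, c) lives at vals[r * cols + c]
};

// Copies m into out in row-major order.
// Returns false if the rows do not all have the same length.
bool flatten(const Mat& m, FlatMatrix& out) {
    out.rows = m.size();
    out.cols = m.empty() ? 0 : m.front().size();
    out.vals.clear();
    out.vals.reserve(out.rows * out.cols);

    for (const Row& row : m) {
        if (row.size() != out.cols)
            return false; // ragged input, e.g. [1,2,3;4,5]
        out.vals.insert(out.vals.end(), row.begin(), row.end());
    }
    return true;
}
```

After a successful parse, calling `flatten(matrix, flat)` on the parsed `Mat` would either fill `flat` or report that the rows are inconsistent, which keeps the grammar itself free of validation logic.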
    

    If you want to catch inconsistent row lengths early on, see e.g. this answer: