
Using Boost.Spirit.Lex and stream iterators


I want to use Boost.Spirit.Lex to lex a binary file. For this purpose I wrote the following program (here is an extract):

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/support_multi_pass.hpp>
#include <boost/bind.hpp>
#include <boost/ref.hpp>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

namespace spirit = boost::spirit;
namespace lex = spirit::lex;

#define X 1
#define Y 2
#define Z 3

template<typename L>
class word_count_tokens : public lex::lexer<L>
{
    public:
        word_count_tokens () {
            this->self.add
                ("[^ \t\n]+", X)
                ("\n", Y)
                (".", Z);
        }
};

class counter
{
    public:
        typedef bool result_type;

        template<typename T>
        bool operator () (const T &t, size_t &c, size_t &w, size_t &l) const {
            switch (t.id ()) {
                case X:
                    ++w; c += t.value ().size ();
                    break;
                case Y:
                    ++l; ++c;
                    break;
                case Z:
                    ++c;
                    break;
            }

            return true;
        }
};

int main (int argc, char **argv)
{
    std::ifstream ifs (argv[1], std::ios::in | std::ios::binary);
    auto first = spirit::make_default_multi_pass (std::istream_iterator<char> (ifs));
    auto last = spirit::make_default_multi_pass (std::istream_iterator<char> ());
    size_t w, c, l;
    word_count_tokens<lex::lexertl::lexer<>> word_count_functor;

    w = c = l = 0;

    bool r = lex::tokenize (first, last, word_count_functor, boost::bind (counter (), _1, boost::ref (c), boost::ref (w), boost::ref (l)));

    ifs.close ();

    if (r) {
        std::cout << l << ", " << w << ", " << c << std::endl;
    }

    return 0;
}

The build returns the following error:

lexer.hpp:390:46: error: non-const lvalue reference to type 'const char *' cannot bind to a value of unrelated type

The error comes from the definition of the concrete lexer, lex::lexertl::lexer<>: its first template parameter defaults to a token type whose iterator is const char *. I get the same error if I use spirit::istream_iterator or spirit::make_default_multi_pass (.....).
But if I specify the template parameters of lex::lexertl::lexer<> explicitly, I get a plethora of errors!

Solutions?

Update

I have now posted the complete source file; it is the word_counter example from the site.


Solution

  • Okay, since the question was changed, here's a new answer, addressing some points with the complete code sample.

    1. First, you need a token type that matches your iterator type, i.e.

      word_count_tokens<lex::lexertl::lexer<lex::lexertl::token<boost::spirit::istream_iterator>>> word_count_functor;
      // instead of:
      // word_count_tokens<lex::lexertl::lexer<>> word_count_functor;
      

      It is customary to typedef lex::lexertl::token<boost::spirit::istream_iterator> rather than spell it out each time.

    2. You need to base your token IDs on lex::min_token_id instead of using the raw values 1, 2, 3. Also, make them an enum for ease of maintenance:

      enum token_ids {
          X = lex::min_token_id + 1,
          Y,
          Z,
      };
      
    3. You can no longer call .size() on the default token value(), since its iterator range is no longer a random-access range. Instead, use boost::distance(), which works on any range, including iterator_range:

              ++w; c += boost::distance(t.value()); // t.value ().size ();
      

    Combining these fixes:

    #include <boost/spirit/include/lex_lexertl.hpp>
    #include <boost/spirit/include/support_istream_iterator.hpp>
    #include <boost/bind.hpp>
    #include <fstream>
    
    namespace spirit = boost::spirit;
    namespace lex    = spirit::lex;
    
    enum token_ids {
        X = lex::min_token_id + 1,
        Y,
        Z,
    };
    
    template<typename L>
    class word_count_tokens : public lex::lexer<L>
    {
        public:
            word_count_tokens () {
                this->self.add
                    ("[^ \t\n]+", X)
                    ("\n"       , Y)
                    ("."        , Z);
            }
    };
    
    struct counter
    {
        typedef bool result_type;
    
        template<typename T>
        bool operator () (const T &t, size_t &c, size_t &w, size_t &l) const {
            switch (t.id ()) {
                case X:
                    ++w; c += boost::distance(t.value()); // t.value ().size ();
                    break;
                case Y:
                    ++l; ++c;
                    break;
                case Z:
                    ++c;
                    break;
            }
    
            return true;
        }
    };
    
    int main (int argc, char **argv)
    {
        std::ifstream ifs (argv[1], std::ios::in | std::ios::binary);
        ifs >> std::noskipws;
        boost::spirit::istream_iterator first(ifs), last;
        word_count_tokens<lex::lexertl::lexer<lex::lexertl::token<boost::spirit::istream_iterator>>> word_count_functor;
    
        size_t w = 0, c = 0, l = 0;
        bool r = lex::tokenize (first, last, word_count_functor, 
                boost::bind (counter (), _1, boost::ref (c), boost::ref (w), boost::ref (l)));
    
        ifs.close ();
    
        if (r) {
            std::cout << l << ", " << w << ", " << c << std::endl;
        }
    }
    

    When run on its own source file, this prints:

    65, 183, 1665