c++ boost-spirit-qi

Skipping blank lines when reading line delimited list of strings


I'm trying to parse a simple text file using boost::spirit. The text file is a line delimited list of strings. I can get it to mostly work, except for when it comes to blank lines, which I would like to skip.

I've tried several approaches, but I either stop parsing at the blank line, or I get the blank line included in my results.

Is there a way to tell my grammar to skip blank lines?

code

std::ifstream ifs("one.txt");
ifs >> std::noskipws;

std::vector< std::string > people;

if (parse(
     istream_iterator(ifs),
     istream_iterator(),
     *(as_string[+print >> (eol | eoi)]),
     people))
{
  std::cout << "Size = " << people.size() << std::endl;

  for (auto person : people)
  {
     std::cout << person << std::endl;
  }
}

one.txt

Sally
Joe
Frank
Mary Ann

Bob

What I Get

Sally
Joe
Frank
Mary Ann

What I Want to Get

Sally
Joe
Frank
Mary Ann
Bob

Bonus: Can I strip leading or trailing spaces from the lines in the grammar at the same time? I need to keep the space in Mary Ann of course.


Solution

  • if (qi::phrase_parse(
                first, last,
                -qi::as_string[qi::lexeme[+(qi::char_ - qi::eol)]] % qi::eol,
                qi::blank,
                people))
    

    I'll refer to Boost spirit skipper issues for more background. Quick notes:

    if (qi::phrase_parse(
    //      ^ ----- use phrase_parse to parse with a skipper (`qi::blank` here)
                first, last,
                -qi::as_string[qi::lexeme[+(qi::char_ - qi::eol)]] % qi::eol,
    //          |                  |      |                          ^---- 1.
    //          +---- 2.           |      +---- 4.
    //       5. ----v       3. ----+      
                qi::blank,
                people))
    
    1. match list of items separated by newlines
    2. '-' makes the item optional (ignoring blank lines)
    3. lexeme includes whitespace inside the subexpression (but it does still pre-skip, so lines with only whitespace count as empty lines; use no_skip if you don't want preskip to happen)
    4. + requires at least 1 match, so empty names are not considered a name
    5. the blank skipper skips whitespace (spaces and tabs) but not newlines, because newlines are significant to your grammar. Also note that the lexeme still keeps the internal whitespace

    See it Live On Coliru
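
To make notes 1 to 5 concrete, here is a rough standard-library sketch of what the grammar does with each line, written without Spirit so that the behaviour is explicit. It is an illustration under assumptions, not how Spirit works internally: the name `emulate_grammar` is hypothetical, only '\n' is treated as a line break, and "blank" means space or tab. As far as I can tell from the notes, trailing blanks stay in the result, because the lexeme consumes everything up to the line break.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: mimics the effect of
//   -as_string[lexeme[+(char_ - eol)]] % eol   with a `blank` skipper.
std::vector<std::string> emulate_grammar(const std::string& input) {
    std::vector<std::string> names;
    std::string::size_type pos = 0;
    while (pos <= input.size()) {
        // Note 1: items are separated by newlines.
        std::string::size_type end = input.find('\n', pos);
        if (end == std::string::npos) end = input.size();

        // Notes 3 and 5: leading blanks are pre-skipped, internal ones are kept.
        std::string::size_type start = pos;
        while (start < end && (input[start] == ' ' || input[start] == '\t')) ++start;

        // Notes 2 and 4: an item needs at least one character; otherwise the
        // optional item matches nothing and the line contributes no name.
        if (start < end) names.push_back(input.substr(start, end - start));

        pos = end + 1;
    }
    return names;
}
```

Fed the contents of one.txt from the question, this should yield the five names with the blank line dropped and the space in Mary Ann preserved.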

    UPDATE In response to the comment, the added complexity was due to skipping whitespace. If you are happy trimming whitespace after the fact, by all means, use

    if (parse(first, last, - as_string[+(char_ - eol)] % eol, people))
    

    See it Live On Coliru as well
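
If you go the "trim after the fact" route, a minimal sketch of the post-processing step could look like the following. The helper names `trimmed` and `trim_all` are made up for this example and are not part of Boost.Spirit. Since the skipper-less grammar would store a line consisting only of spaces as a name, the sketch also drops entries that are empty after trimming.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copy of `s` without leading/trailing whitespace.
// Whitespace inside the text (as in "Mary Ann") is left alone.
std::string trimmed(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();  // all whitespace
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: trims every parsed line and discards the ones
// that turn out to be empty.
std::vector<std::string> trim_all(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        std::string t = trimmed(line);
        if (!t.empty()) out.push_back(std::move(t));
    }
    return out;
}
```

You would call it on the parse result, e.g. `people = trim_all(people);`. If you already depend on Boost, `boost::algorithm::trim` from Boost.StringAlgo can replace the hand-written `trimmed`.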