c++ boost split c++14 delimiter

boost::split to split only if delimiter is present in c++


I am using boost::split, but if the string contains no delimiter it returns the whole string as the only element of the vector. I want it to return nothing if the delimiter is not present in the string.

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
using namespace std;
  
int main()
{
    string input("abcd");
    vector<string> result;
    boost::split(result, input, boost::is_any_of("\t"));
  
    for (const auto& s : result)
        cout << s << endl;
    return 0;
}

The output is abcd. I want the vector to be empty if the delimiter is not present in the string. Please suggest a solution.
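A minimal sketch of the straightforward fix: check whether any delimiter occurs first, and only then split. It uses only the standard library so it compiles without Boost; the name split_if_delimited is hypothetical. Like boost::split with boost::is_any_of and the default token_compress_off, it treats every character in delims as a separator and keeps empty tokens between adjacent separators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `input` on any character in `delims`,
// but returns an empty vector when no delimiter occurs at all.
std::vector<std::string> split_if_delimited(std::string_view input,
                                            std::string_view delims) {
    assert(!delims.empty() && "need at least one delimiter character");
    std::vector<std::string> result;
    // No delimiter present: leave the result empty, as the question asks.
    if (input.find_first_of(delims) == std::string_view::npos)
        return result;

    std::size_t start = 0;
    for (;;) {
        std::size_t pos = input.find_first_of(delims, start);
        // Each token runs up to the next delimiter (or the end of input);
        // adjacent delimiters yield empty tokens.
        result.emplace_back(input.substr(start, pos - start));
        if (pos == std::string_view::npos)
            break;
        start = pos + 1;
    }
    return result;
}
```

With this helper the question's "abcd" produces an empty vector, while "ab\tcd" yields {"ab", "cd"}. If Boost is already a dependency, the same idea applied to the original code is simply a guard such as if (input.find('\t') != string::npos) in front of the existing boost::split call.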


Solution

  • It looks a bit like you might need a validating parser. Regex could be a good stepping stone, but I'd suggest a parser generator, as in all likelihood you will require more.

    My crystal ball whispers that you might be parsing command-line output or CSV/TSV files.
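Since regex is mentioned as a starting point, here is a minimal sketch of that route using std::regex (a substitution for illustration; the answer itself goes on to use Spirit X3). The hypothetical split_tsv_line first validates the whole line against a pattern that requires at least one tab, then splits it. It assumes plain tab-separated input with no quoting or escaping.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: validates a tab-separated line with a regex and
// returns its fields, or an empty vector if the line contains no tab.
std::vector<std::string> split_tsv_line(const std::string& line) {
    // Validation: one field, then one or more "<tab> field" groups.
    // The pattern contains literal tab characters ("\t" in the C++ literal).
    static const std::regex valid_line("[^\t]*(\t[^\t]*)+");
    if (!std::regex_match(line, valid_line))
        return {};

    // Extraction: split on every tab, keeping empty fields at either end.
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = line.find('\t', start)) != std::string::npos) {
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    fields.push_back(line.substr(start));
    // The regex guarantees at least one tab, hence at least two fields.
    assert(fields.size() >= 2);
    return fields;
}
```

The validation and the extraction are two separate passes here; a parser combines them into one, which is the main argument for moving to a real parser once the format grows.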

    This is what you could do with Boost Spirit X3:

    template <typename Cont>
    bool parse_columns(std::string_view input, Cont& container,
                       unsigned required = 2) {
        namespace x3 = boost::spirit::x3;
    
        auto valid = [required](auto& ctx) {
            x3::_pass(ctx) = x3::_val(ctx).size() >= required;
        };
    
        auto delim = x3::char_('\t');
        auto field = *(~delim);
        auto rule
            = x3::rule<struct _, Cont, true>{"rule"} 
            = (field % delim)[valid];
    
        return parse(begin(input), end(input), rule, container);
    }
    

    Here's a live demo with test-cases:

    Live On Compiler Explorer

    #include <boost/spirit/home/x3.hpp>
    #include <fmt/ranges.h>
    #include <string>
    #include <string_view>
    #include <vector>
    
    template <typename Cont>
    bool parse_columns(std::string_view input, Cont& container,
                       unsigned required = 2) {
        namespace x3 = boost::spirit::x3;
    
        auto valid = [required](auto& ctx) {
            x3::_pass(ctx) = x3::_val(ctx).size() >= required;
        };
    
        auto delim = x3::char_('\t');
        auto field = *(~delim);
        auto rule
            = x3::rule<struct _, Cont, true>{"rule"} 
            = (field % delim)[valid];
    
        return parse(begin(input), end(input), rule, container);
    }
    
    int main() {
        for (auto input : {
                 "",
                 "\t",
                 "abcd\t",
                 "ab cd\tef",
                 "\tef",
                 "ab\tc\t\tdef",
                 "abcd",
             }) {
            std::vector<std::string> columns;
    
            if (parse_columns(input, columns)) {
                fmt::print("'{}' -> {}\n", input, columns);
            } else {
                fmt::print("'{}' -> not matched\n", input);
            }
        }
    }
    

    Prints

    '' -> not matched
    '   ' -> {"", ""}
    'abcd   ' -> {"abcd", ""}
    'ab cd  ef' -> {"ab cd", "ef"}
    '   ef' -> {"", "ef"}
    'ab c       def' -> {"ab", "c", "", "def"}
    'abcd' -> not matched
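
The output above pins down the contract of parse_columns: split on every tab, keep empty fields, and reject inputs that yield fewer than required fields. Assuming those semantics, a Spirit-free sketch could look like this (parse_columns_std is a hypothetical name, not part of the answer). It appends through insert(end, value), so containers such as std::set also work, and it only touches container when it succeeds.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical Spirit-free counterpart of parse_columns: splits `input` on
// every '\t' (empty fields are kept) and succeeds only if at least
// `required` fields result. On failure `container` is left untouched.
template <typename Cont>
bool parse_columns_std(std::string_view input, Cont& container,
                       unsigned required = 2) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = input.find('\t', start);
        fields.emplace_back(input.substr(start, pos - start));
        if (pos == std::string_view::npos)
            break;
        start = pos + 1;
    }
    if (fields.size() < required)
        return false;  // e.g. "abcd" has one field, so it is rejected
    for (auto& f : fields)
        container.insert(container.end(), std::move(f));
    return true;
}
```

Fed the demo's inputs, it should agree with the output listed above: "" and "abcd" are rejected, and "ab\tc\t\tdef" gives four fields including one empty one. The Spirit version is likely easier to extend once quoting or escaping enters the picture.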
    

    Tweaks

    • To treat repeated \t as a single delimiter, just change field % delim to field % +delim
    • You can easily replace std::vector with another container, like std::set

    Live On Compiler Explorer
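Both tweaks can also be sketched without Spirit. The hypothetical split_compressed below treats a run of tabs as a single separator, assuming that mirrors field % +delim: a leading or trailing run still produces one empty field, but a doubled tab in the middle no longer does. It is generic over the container, so passing a std::set also drops duplicate fields.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: splits on runs of '\t' (one or more tabs count as a
// single separator) and inserts the fields into any container that supports
// insert(end, value). Returns false if there is no tab at all.
template <typename Cont>
bool split_compressed(std::string_view input, Cont& container) {
    if (input.find('\t') == std::string_view::npos)
        return false;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = input.find('\t', start);
        container.insert(container.end(),
                         std::string(input.substr(start, pos - start)));
        if (pos == std::string_view::npos)
            break;
        // Skip the whole run of tabs, not just the first one.
        start = input.find_first_not_of('\t', pos);
        if (start == std::string_view::npos)
            start = input.size();  // trailing run: one final empty field
    }
    return true;
}
```

For example, "b\ta\t\tb" split into a std::set<std::string> yields just {"a", "b"}: the doubled tab produces no empty field and the repeated "b" is stored once.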