c++ string boost tokenize

Boost::Split using whole string as delimiter


Is there a way to use boost::split to split a string on a whole string as the delimiter, rather than on individual characters? For example:

str = "xxaxxxxabcxxxxbxxxcxxx"

Is there a way to split this string using "abc" as the delimiter?

The result should be the two strings "xxaxxxx" and "xxxxbxxxcxxx".

I am aware that boost::split can take the is_any_of predicate. However, is_any_of("abc") splits the string at every single 'a', 'b', or 'c' character, which is not what I want.


Solution

  • split_regex, as suggested by @Mythli, is fine. If you don't want to deal with regex, you can use the ifind_all algorithm, as shown in this [example][1].

    Usage - Split

    Split algorithms are an extension to the find iterator for one common usage scenario. These algorithms use a find iterator and store all matches into the provided container. This container must be able to hold copies (e.g. std::string) or references (e.g. iterator_range) of the extracted substrings.

    Two algorithms are provided. find_all() finds all copies of a string in the input. split() splits the input into parts.

        // Requires <boost/algorithm/string.hpp>; assumes `using namespace std;`
        // and `using namespace boost;` are in effect.
        string str1("hello abc-*-ABC-*-aBc goodbye");

        // #1: Search for separators. Each match is a range into str1, not a copy.
        typedef vector< iterator_range<string::iterator> > find_vector_type;
        find_vector_type FindVec;
        ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }

        // #2: Search for tokens. Each token is copied into its own string.
        typedef vector< string > split_vector_type;
        split_vector_type SplitVec;
        split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
    

    [hello] designates an iterator_range delimiting this substring.

    The first example shows how to construct a container that holds references to all extracted substrings. The ifind_all() algorithm puts into FindVec references to every substring that equals "abc" under case-insensitive comparison.

    You receive an iterator_range (begin/end) for each occurrence of your delimiter. Your tokens are the stretches between those ranges, plus the text before the first occurrence and after the last one.