c++ regex boost-regex

Is the symbol ’ a special character in Boost regex?


Regular expression: “[^”]*“

String: “lips“

Result: match

String: “lips’“

Result: no match

I expect both strings to match.

C++ code:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;

int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    if (regex_search(s1, regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (regex_search(s2, regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}

output: s1 matched

Is the symbol ’ special? Why does the second string not match?


Solution

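  • The Boost regex library does not interpret narrow strings as UTF-8 by default; boost::regex works on individual char values, so each byte of a multi-byte character is treated as a separate character. In UTF-8 the closing quote ” (U+201D) is encoded as E2 80 9D and the apostrophe ’ (U+2019) as E2 80 99, so they share their first two bytes. The class [^”] therefore excludes the bytes E2, 80 and 9D, and it cannot consume the apostrophe. That is why the second string does not match. Code that handles UTF-8 through the ICU support in Boost: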

    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>
    #include <boost/regex/icu.hpp>
    
    using namespace std;
    using namespace boost;
    
    int main()
    {
        const string s1 = "“lips“";
        const string s2 = "“lips’“";
        if (u32regex_search(s1, make_u32regex("“[^”]*“"))) cout << "s1 matched" << endl;
        if (u32regex_search(s2, make_u32regex("“[^”]*“"))) cout << "s2 matched" << endl;
        return 0;
    }
    

    compilation: g++ -std=c++11 ./test.cc -licuuc -lboost_regex

    output:

    s1 matched
    s2 matched
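
The byte overlap described above can be checked without Boost. The following is a minimal sketch using only the standard library; the helper names `utf8_hex` and `contains_excluded_byte` and the `k...` constants are made up for this illustration and are not part of Boost or ICU. The characters are written as hex escapes so the result does not depend on the encoding of the source file.

```cpp
#include <cstdio>
#include <string>

// UTF-8 encodings of the three characters from the question,
// written as hex escapes (names are hypothetical, for illustration only).
const std::string kLeftQuote  = "\xE2\x80\x9C";  // “ U+201C
const std::string kRightQuote = "\xE2\x80\x9D";  // ” U+201D
const std::string kApostrophe = "\xE2\x80\x99";  // ’ U+2019

// Format every byte of s as two uppercase hex digits, separated by spaces.
std::string utf8_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", c);
        out += buf;
    }
    return out;
}

// Return true if any byte of `text` is also a byte of `excluded`.
// A byte-oriented class such as [^”] rejects exactly those bytes.
bool contains_excluded_byte(const std::string& text,
                            const std::string& excluded) {
    return text.find_first_of(excluded) != std::string::npos;
}
```

Calling `utf8_hex(kRightQuote)` and `utf8_hex(kApostrophe)` gives `E2 80 9D` and `E2 80 99`, and `contains_excluded_byte(kApostrophe, kRightQuote)` returns true, while the same check on `"lips"` returns false. That mirrors what the byte-oriented regex sees: the plain letters pass through `[^”]*`, but the first byte of the apostrophe is rejected.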