Tags: c++, regex, c++98, tr1

Remove/replace Multicharacters (ÿû) in a C++ String


I am trying to replace the multi-byte characters in a string using std::tr1::regex, because I could not find any other function that replaces them. The code is below:

// Example program
#include <iostream>
#include <string>
#include <tr1/regex>

void f1()
{
  std::string str = "ÿûABC";
  std::tr1::regex rx("ÿû");
  std::string replacement = "";
  // regex_replace returns a new string; it does not modify str in place
  str = std::tr1::regex_replace(str, rx, replacement);
}

int main()
{
  f1();
  return 0;
}

But I am getting the build errors below (a compiler warning followed by linker errors). Could anyone suggest a way to resolve them, or a better option for replacing these characters using C++98?

In file included from 4:0:
/usr/include/c++/4.9/tr1/regex:2407:5: warning: inline function '_Out_iter std::tr1::regex_replace(_Out_iter, _Bi_iter, _Bi_iter, const std::tr1::basic_regex&, const std::basic_string&, std::tr1::regex_constants::match_flag_type) [with _Out_iter = std::back_insert_iterator >; _Bi_iter = __gnu_cxx::__normal_iterator >; _Rx_traits = std::tr1::regex_traits; _Ch_type = char; std::tr1::regex_constants::match_flag_type = std::bitset]' used but never defined
     regex_replace(_Out_iter __out, _Bi_iter __first, _Bi_iter __last,
     ^
/tmp/ccGJXgKd.o: In function `f1()':
:(.text+0x81): undefined reference to `std::tr1::basic_regex >::_M_compile()'
:(.text+0xc5): undefined reference to `std::back_insert_iterator std::tr1::regex_replace, __gnu_cxx::__normal_iterator, std::tr1::regex_traits, char>(std::back_insert_iterator, __gnu_cxx::__normal_iterator, __gnu_cxx::__normal_iterator, std::tr1::basic_regex > const&, std::basic_string, std::allocator > const&, std::bitset)'
collect2: error: ld returned 1 exit status

Solution

  • To erase a substring from a string, locate it with std::string::find and remove it with std::string::erase.

    Example:

    #include <iostream>
    #include <string>
    
    int main()
    {
        std::string str = "ÿûABC";
        std::string remove = "ÿû";
    
        // length() counts bytes (chars), not user-visible characters
        std::cout << "Length of source string is " << str.length() << " bytes\n";
        std::cout << "Length of string to remove is " << remove.length() << " bytes\n";
    
        std::string::size_type pos = str.find(remove);
        if (pos == std::string::npos)
        {
            std::cout << "Substring \"" << remove << "\" not found\n";
        }
        else
        {
            std::cout << "Found substring \"" << remove << "\" at position " << pos << '\n';
            str.erase(pos, remove.length());
            std::cout << "After erasing: \"" << str << "\"\n";
        }
    }
    

    Output from working example (source file saved as UTF-8):

    Length of source string is 7 bytes
    Length of string to remove is 4 bytes
    Found substring "ÿû" at position 0
    After erasing: "ABC"
    

    The important thing to note here is that the characters 'ÿ' and 'û' are not single bytes! Your editor probably saved them as two bytes each, encoded in UTF-8: 'ÿ' (U+00FF) is stored as 0xC3 0xBF and 'û' (U+00FB) as 0xC3 0xBB.

    By putting the substring to remove in its own std::string object, we get its actual length in bytes for the erase call without having to count the bytes by hand.