std::regex doesn't recognize $


I'm trying to parse the contents of a file with a regex:

ifstream file_stream("commented.cpp",ifstream::binary);

std::string txt((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());

cmatch m;
bool result = regex_search(txt.c_str(), m, regex("^#(\S*)$",regex_constants::basic));

The file is a C source file that begins with the line:

#include <stdio.h>

I'm trying to parse a preprocessor directive. I checked the regex in RegexBuddy and it works there, but with std::regex, regex_search returns false. It seems that the $ character is not being recognized, and neither is ^ in the POSIX syntax. I also tried the ECMAScript syntax, and the regex works only if I remove the $ symbol.

//ecmascript syntax
bool result = regex_search(txt.c_str(), m, regex("^#(\S*)"));

The file is read with the binary flag, so the txt string keeps all the \r\n characters, which I believe are required for $ to match. How can I resolve this issue?


Solution

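  • Note that in most std::regex implementations the $ anchor matches only at the end of the whole input string, not at the end of each line. See this thread. You can match an end-of-line position with a custom boundary pattern based on a positive lookahead, (?=$|\r?\n). Lookaheads are available in the default ECMAScript grammar, but not in the POSIX basic grammar.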

    Another issue is that you are using the \S escape sequence in a regular string literal. \S is not a valid C++ escape sequence, so compilers typically emit a warning and pass a plain S to the regex engine instead of the non-whitespace class. Use a raw string literal, where a single \ is enough to write a regex escape such as \d or \s, or double the backslash (\\S) in a regular string literal.

    Also, @HWalters already noted that ^#\S+$ will not match #include <stdio.h>: you need to account for the space inside. Thus, your regex might look like ^#include[ \t]+(\S+)(?=$|\r?\n). It requires #include, then one or more horizontal whitespace characters, and then captures one or more (+) non-whitespace characters up to the end of the string or a line break (CRLF or LF).

    And here is a snippet:

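    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        std::regex r(R"(^#include[ \t]+(\S+)(?=$|\r?\n))");
        std::string s("#include <stdio.h>\r\n#include <regex>");
        std::smatch m;
        if (std::regex_search(s, m, r)) {
            std::cout << m[1] << std::endl;  // prints <stdio.h>
        }
    }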
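
If every directive in the file is needed rather than just the first, the same lookahead idea can be applied at the start of the line too. The sketch below is an assumption-laden extension of the snippet above, not part of the original answer: list_includes is a hypothetical helper, and it replaces the leading ^ with (?:^|\n) because ^ may also match only at the start of the whole input.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the target of every #include that starts a line.
std::vector<std::string> list_includes(const std::string& text) {
    // (?:^|\n) stands in for a start-of-line anchor,
    // (?=$|\r?\n) stands in for an end-of-line anchor.
    static const std::regex re(R"((?:^|\n)#include[ \t]+(\S+)(?=$|\r?\n))");
    std::vector<std::string> targets;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        targets.push_back((*it)[1].str());  // capture group 1 is the include target
    }
    return targets;
}
```

For the sample string used above, list_includes should return <stdio.h> and <regex>, in that order. The line break before a match is consumed by the pattern, while the one after it is only looked at, so consecutive #include lines are all found.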