I am trying to use regular expressions in my C code to find a string in each line of a text file that I am reading, and the \b boundary does not seem to work. The string must not be part of a bigger string.
After that failure I also tried the following hand-written boundary expression, and could not make it work in my code either (source here):
(?i)(?<=^|[^a-z])MYWORDHERE(?=$|[^a-z])
But when I try something simple like a as the regular expression, it finds what is expected.
Here is my shortened snippet:
#include <regex.h>
#include <stdio.h>
#include <string.h>

void readFromFile(char arr[], char * wordToSearch) {
    regex_t regex;
    int regexi;
    char regexStr[100];
    /* Build the pattern \b(word)\b; assumes the word fits in regexStr. */
    strcpy(regexStr, "\\b(");
    strcat(regexStr, wordToSearch);
    strcat(regexStr, ")\\b");
    regexi = regcomp(&regex, regexStr, 0);
    printf("regexi while compiling: %d\n", regexi);
    if (regexi) {
        fprintf(stderr, "compile error\n");
        return; /* regex is not usable after a failed regcomp */
    }
    FILE* file = fopen(arr, "r");
    if (file == NULL) {
        perror("fopen");
        regfree(&regex);
        return;
    }
    char line[256];
    while (fgets(line, sizeof(line), file)) {
        regexi = regexec(&regex, line, 0, NULL, 0);
        printf("%s\n", line);
        printf("regexi while execing: %d\n", regexi);
        if (!regexi) {
            printf("there is a match.");
        }
    }
    fclose(file);
    regfree(&regex); /* release the compiled pattern */
}
In the regcomp function, I also tried passing REG_EXTENDED as the flag, and that did not work either.
The regular expressions supported by POSIX are documented in the regex(7) manual page on Linux and in re_format(7) on Mac OS X.
Unfortunately, the POSIX standard regular expressions (which come in two standard flavours: the obsolete basic one, and the extended one selected with REG_EXTENDED) support neither \b nor any of the (?...) constructs, both of which I believe originated in Perl.
Mac OS X (and possibly other BSD-derived systems) additionally has the REG_ENHANCED format, which is not portable.
Your best choice would be to use some other regular expression library such as PCRE. While word boundaries themselves can be expressed in a regular language, emulating them takes extra groups, and POSIX doesn't even support non-capturing groups, so each of those groups also becomes a capture. You could use something like (^|[^[:alpha:]])(.*)($|[^[:alpha:]]), with your word in place of the middle group, but it surely would get really messy.
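For what it's worth, here is a minimal sketch of that workaround using only POSIX extended regular expressions. The helper name line_has_word is made up for illustration and is not from the question. It assumes the search word consists only of letters (so nothing needs escaping) and, like the hand-written pattern in the question, it treats digits and underscores as boundaries, which differs from Perl's \b.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `word` occurs in `line` as a whole
 * word, 0 if it does not, and -1 on error.
 * Assumption: `word` is non-empty and contains only letters. */
int line_has_word(const char *line, const char *word)
{
    char pattern[256];
    regex_t regex;
    int n, rc;

    assert(line != NULL && word != NULL);

    /* Refuse empty words and anything that would need regex escaping. */
    if (*word == '\0')
        return -1;
    for (const char *p = word; *p != '\0'; p++) {
        if (!isalpha((unsigned char)*p))
            return -1;
    }

    /* Emulate \b: start of string or a non-letter before the word,
     * end of string or a non-letter after it. */
    n = snprintf(pattern, sizeof pattern,
                 "(^|[^[:alpha:]])%s($|[^[:alpha:]])", word);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1; /* pattern would not fit in the buffer */

    /* REG_EXTENDED enables ( | ) syntax, REG_ICASE ignores case,
     * REG_NOSUB skips recording the capture positions. */
    if (regcomp(&regex, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, line, 0, NULL, 0);
    regfree(&regex);
    return rc == 0 ? 1 : 0;
}
```

Inside a fgets loop you would call it as line_has_word(line, wordToSearch). The trailing newline that fgets keeps is a non-letter, so it satisfies the right-hand boundary. Compiling the pattern once outside the loop would be cheaper; the sketch recompiles on every call only to stay short.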