I am trying to do `strstr` in C++ with Shift-JIS strings. But since the accepted answer here states that there could be false positives if the standard `strstr` is used, I couldn't just use the regular one in the standard library. Apparently Windows provides `_mbsstr` that does what I want, but I am targeting other platforms as well.
I tried to use gnulib as it also provides `mbsstr`, but I couldn't get it to work as it requires autotools, and I am using CMake.
Is there anything else that achieves it?
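You are correct: unlike UTF-8, the Shift-JIS encoding causes false positives for `strstr` and `strchr` for single-byte characters. The second (trail) byte of a double-byte character can fall in the range 0x40-0x7E, which overlaps ASCII, so a byte-wise search can report a match in the middle of a character.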
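To make the false positive concrete, here is a minimal sketch that checks whether a plain `strstr` hit lands on a character boundary. The helper names (`sjis_is_lead_byte`, `sjis_is_boundary`, `strstr_hits_mid_character`) are made up for illustration, and the lead-byte ranges assume the common 0x81-0x9F / 0xE0-0xFC layout. The classic case is ソ, encoded as 0x83 0x5C, whose trail byte is the same byte as a single-byte backslash.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if c can start a double-byte Shift-JIS character
   (assumes lead bytes are 0x81-0x9F and 0xE0-0xFC). */
static int sjis_is_lead_byte(unsigned char c) {
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Hypothetical helper: nonzero if pos (a pointer into s) is the start of a
   character, found by walking s one whole character at a time. */
static int sjis_is_boundary(const char *s, const char *pos) {
    while (*s != '\0' && s < pos) {
        /* a lead byte followed by a trail byte counts as one 2-byte character */
        s += (sjis_is_lead_byte((unsigned char)*s) && s[1] != '\0') ? 2 : 1;
    }
    return s == pos;
}

/* Nonzero when plain strstr reports a hit that starts inside a character. */
static int strstr_hits_mid_character(const char *haystack, const char *needle) {
    const char *hit = strstr(haystack, needle);
    return hit != NULL && !sjis_is_boundary(haystack, hit);
}
```

With these definitions, `strstr_hits_mid_character("\x83\x5C", "\\")` is nonzero: `strstr` finds the backslash byte inside ソ. The same call on `"a\\b"` yields zero, because there the backslash is a character of its own.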
Here is a simplistic custom function for C:
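#include <string.h>

/* Find s2 in s1, testing for a match only at Shift-JIS character boundaries. */
char *sjis_strstr(const char *s1, const char *s2) {
    unsigned char c1, c2 = *s2++;       /* c2: first byte of the needle */
    if (c2 == '\0')
        return (char *)s1;              /* an empty needle matches at the start */
    size_t len2 = strlen(s2);           /* length of the rest of the needle */
    while ((c1 = *s1++) != '\0') {
        if (c1 == c2 && !strncmp(s1, s2, len2))
            return (char *)(s1 - 1);
        if (*s1 == '\0')
            break;
        /* skip the trail byte if c1 is the lead byte of a double-byte character */
        s1 += (c1 >= 0x81 && c1 <= 0x9F) || (c1 >= 0xE0 && c1 <= 0xFC);
    }
    return NULL;
}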
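
Since `strchr` has the same problem, a companion function can follow the same pattern. This is only a sketch: the name `sjis_strchr` is hypothetical, it assumes `c` is a single-byte character (ASCII or half-width katakana), and it uses the same lead-byte ranges as the function above.

```c
#include <stddef.h>

/* Hypothetical companion to sjis_strstr: find the single-byte character c in s,
   ignoring bytes that are the trail byte of a double-byte character. */
char *sjis_strchr(const char *s, int c) {
    unsigned char target = (unsigned char)c;
    for (;;) {
        unsigned char c1 = (unsigned char)*s;
        if (c1 == target)
            return (char *)s;           /* like strchr, c == '\0' finds the terminator */
        if (c1 == '\0')
            return NULL;
        s++;
        /* c1 was a lead byte: step over its trail byte as well */
        if (((c1 >= 0x81 && c1 <= 0x9F) || (c1 >= 0xE0 && c1 <= 0xFC)) && *s != '\0')
            s++;
    }
}
```

For the two bytes 0x83 0x5C (ソ), searching for a backslash returns NULL, whereas plain `strchr` would point at the trail byte.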