First occurrence except escaped chars in C


How can I locate the first unescaped occurrence of a character in a string? In the following code, strchr finds the first quote at position 14, but I'm looking for the one at position 26.

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] = "FOO + pHWAx \\\"bar AER/2.1\" BAZ";
  printf ("%s\n",str);
  char * pch;
  pch=strchr(str,'"');
  printf ("found at %d\n", (int)(pch-str+1)); /* cast: a pointer difference is ptrdiff_t, not int */
  return 0;
}

Solution

  • Use the strpbrk function to look for the first occurrence of any one of several characters at once. You cannot simply ignore the escape character: the scan has to stop at it as well, so that you can check whether it is followed by the character that you're really looking for.

    For example, suppose we want to look for ", which can be escaped as \". This means we must actually look for either " or \. In other words:

    char *ptr = strpbrk(string, "\"\\"); /* look for chars in the set { ", \ } */
    

    But we have to do this in a loop, because we are not interested in escaped quotes and have to keep going:

    char *quote = 0;
    char *string = str; /* initially points to the str array */
    
    while (*string != 0) {
      char *ptr = strpbrk(string, "\"\\");
    

    Next we check whether we found something:

      if (!ptr)
        break;
    

    If we found something, it is necessarily a \ or a ":

      if (*ptr == '"') {
        quote = ptr;
        break;
      }
    

    If it is not a quote, then it must be an escape. We increment to the next character. If it is a terminating null it means we have a backslash at the end of a string: an improper escape.

      if (*++ptr == 0)
        break;
    

    Otherwise, we can skip the next character and continue the loop to scan for the next escape or unescaped quote.

      string = ++ptr;
    }
    

    If an unescaped quote occurs, then quote points to it after the execution of the while loop. Otherwise quote remains null.

    This code assumes that there exist other escapes besides \", but that they are all one character long, e.g. \b or \r. It will not work if there are longer escapes like \xff. Escapes constitute the conventions of a language: you have to know what the language is that you're processing to do it correctly.
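
For reference, the fragments above can be put together into a single function. This is only a sketch under the same assumption as the answer (every escape is a backslash followed by exactly one character). The name find_unescaped and its target parameter are hypothetical and do not appear in the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (the name is not from the original answer).
 * Returns a pointer to the first occurrence of `target` in `s` that is not
 * escaped by a preceding backslash, or NULL if there is none.
 * Assumption: every escape is exactly two characters long ('\' plus one
 * character), so longer escapes such as \xff are NOT handled. */
static const char *find_unescaped(const char *s, char target)
{
    char set[3];

    assert(s != NULL);
    assert(target != '\0' && target != '\\');

    /* Scan for either the target or the escape character. */
    set[0] = target;
    set[1] = '\\';
    set[2] = '\0';

    while (*s != '\0') {
        const char *p = strpbrk(s, set);

        if (p == NULL)
            return NULL;        /* neither the target nor a backslash remains */
        if (*p == target)
            return p;           /* unescaped target found */

        /* *p is a backslash: step onto the character it escapes. */
        if (*++p == '\0')
            return NULL;        /* dangling backslash at the end of the string */

        s = p + 1;              /* skip the escaped character and keep scanning */
    }
    return NULL;
}
```

With the string from the question, find_unescaped(str, '"') - str + 1 evaluates to 26, the position the question asks for.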