Tags: c, xpath, strtok, strsep

Tokenising an XPath expression


I have the program below to tokenise an XPath expression, but it cannot handle expressions like this one:

/employees/employee[secret-code=a/b/c][unicode=d/e/f]/salary

Basically tokenizing by '/' breaks since the predicates themselves contain '/'.

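#define _DEFAULT_SOURCE   /* exposes strsep() on glibc */
#include <stdio.h>
#include <string.h>

const char *g_xpath_node_delim = "/";

/* Splits xpath_str in place: strsep() overwrites each '/' with '\0'. */
static void tokenize_xpath (char *xpath_str)
{
    const char *tok = NULL;

    if (!xpath_str)
        return;

    while ((tok = strsep(&xpath_str, g_xpath_node_delim)) != NULL) {
        /* Skip empty tokens, such as the one before the leading '/'. */
        if (tok[0] == '\0')
            continue;
        fprintf(stdout, "\nToken '%s'\n", tok);
    }
}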

I want to construct a structure of nodes along with their predicates. Any hints?


Solution

  • XPath has a fairly complex grammar. I won't go into the computer science of grammar classes, because I'm hazy on the subject myself, but the grammar is defined recursively: a predicate can contain a full expression, which can itself contain predicates. That means you can't properly analyse the structure of an expression with a single-level tokenizer, whether it splits on a delimiter or uses a regex. You'll need a real XPath parser to build a syntax tree. Depending on what you want to achieve and how much effort you want to put in, you can either take advantage of an existing open-source XPath parser or write your own.
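
If you only need to get past the exact case in the question, a smaller step than a full parser is to track bracket depth while scanning, so that a '/' inside a predicate is not treated as a step separator. The following is a minimal sketch of that idea, not an XPath parser: the names `xpath_step`, `split_xpath`, `print_steps`, `MAX_STEPS` and `MAX_PREDS` are made up for illustration, predicates are kept as raw text, and it assumes no quoted string literals containing brackets and no abbreviations such as `//` (which it would treat like a single '/').

```c
#include <stdio.h>

#define MAX_STEPS 16   /* illustrative limits, not from the question */
#define MAX_PREDS 4

/* One location step: a node name plus the raw text of each [predicate]. */
struct xpath_step {
    char *name;
    char *preds[MAX_PREDS];
    int   npreds;
};

/*
 * Splits xpath in place (like strsep, it writes '\0' into the buffer).
 * Returns the number of steps found, or -1 on unbalanced brackets,
 * a predicate with no preceding node name, or too many steps/predicates.
 */
int split_xpath(char *xpath, struct xpath_step *steps, int max_steps)
{
    struct xpath_step *cur = NULL;   /* step currently being scanned */
    int nsteps = 0, depth = 0;
    char *p;

    for (p = xpath; *p != '\0'; p++) {
        if (*p == '[') {
            if (depth == 0) {        /* start of a top-level predicate */
                if (cur == NULL || cur->npreds == MAX_PREDS)
                    return -1;
                *p = '\0';           /* terminates the node name */
                cur->preds[cur->npreds++] = p + 1;
            }
            depth++;
        } else if (*p == ']') {
            if (--depth < 0)
                return -1;
            if (depth == 0)
                *p = '\0';           /* terminates the predicate text */
        } else if (*p == '/' && depth == 0) {
            *p = '\0';               /* step separator outside predicates */
            cur = NULL;
        } else if (depth == 0 && cur == NULL) {
            if (nsteps == max_steps) /* first character of a new step */
                return -1;
            cur = &steps[nsteps++];
            cur->name = p;
            cur->npreds = 0;
        }
    }
    return depth == 0 ? nsteps : -1;
}

/* Prints each step followed by its predicates. */
void print_steps(const struct xpath_step *steps, int nsteps)
{
    int i, j;

    for (i = 0; i < nsteps; i++) {
        printf("Step '%s'\n", steps[i].name);
        for (j = 0; j < steps[i].npreds; j++)
            printf("  Predicate '%s'\n", steps[i].preds[j]);
    }
}
```

To try it, copy the expression into a writable `char` array (not a string literal, since the buffer is modified), call `split_xpath(buf, steps, MAX_STEPS)` with a `struct xpath_step steps[MAX_STEPS]`, and pass the result to `print_steps`. Anything beyond this simple shape, such as functions, axes, or operators inside predicates, is where a real parser becomes necessary.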