php, xpath, simplexml, domdocument

Does PHP compile XPath expressions?


Can anyone tell me if and when PHP compiles XPath expressions? It would be useful to know for both the SimpleXML and DOMDocument classes. I would also like to know where the compiled XPath is stored.


Solution

  • PHP's XML functionality is all built on top of the libxml2 library, so part of the answer will depend on how that library works, and part on how exactly PHP uses it.

    Starting with SimpleXML, we can find the implementation of SimpleXMLElement->xpath() in ext/simplexml/simplexml.c. Skipping over some internal housekeeping, checking parameter types, etc, the first interesting line we find is this:

    if (!sxe->xpath) {
        sxe->xpath = xmlXPathNewContext((xmlDocPtr) sxe->document->ptr);
    }
    

    So repeated XPath expressions on the same SimpleXMLElement will use the same "XPath context", but it won't be shared with other instances. Further down, we find where this is used:

    retval = xmlXPathEval((xmlChar *)query, sxe->xpath);
    

    So PHP is calling a libxml function, xmlXPathEval, which just takes a string and a context, and evaluates the expression immediately. The libxml manual for xmlXPathEval says:

    Evaluate the XPath Location Path in the given context.

    Returns: the xmlXPathObjectPtr resulting from the evaluation or NULL. the caller has to free the object.

    And indeed, PHP frees that result at the end of the method:

    xmlXPathFreeObject(retval);
    

    So, in SimpleXML at least, there is no separate compilation step, and apart from the context nothing is stored between calls to the method.
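
    Seen from the PHP side, this can be illustrated with a minimal sketch. The sample XML and variable names are made up for illustration; the point is only that each call passes the expression as a plain string, and no compiled form is kept in between:

    ```php
    <?php
    // Hypothetical sample document, used only for illustration.
    $xml = simplexml_load_string('<root><item>a</item><item>b</item></root>');

    // Each call hands the expression string to libxml again; only the
    // XPath context attached to $xml is kept from one call to the next.
    $first  = $xml->xpath('/root/item');
    $second = $xml->xpath('/root/item');

    echo count($first), ' ', count($second), "\n"; // prints "2 2"
    ```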

    The DOM version is a bit more complicated, because it has a user-visible object representing the XPath context, which is defined in ext/dom/xpath.c. Firstly, the constructor sets up the context, as you might expect:

    PHP_METHOD(domxpath, __construct)
    {
        /* ... */
        ctx = xmlXPathNewContext(docp);
    

    It then does some magic to reuse this context where possible, but no XPath has been compiled yet; this is just the context, i.e. the "current node" that expressions are evaluated against. The ->evaluate() and ->query() methods share the same C implementation, php_xpath_eval. This checks that some internal state is correct, and then calls:

    xpathobjp = xmlXPathEvalExpression((xmlChar *) expr, ctxp);
    

    That's a different function, so we can look it up in libxml:

    Function: xmlXPathEvalExpression Alias for xmlXPathEval().

    So, it turns out there's no difference after all. Again, it's passed a string, returns a result, and the PHP function frees that result before returning:

    xmlXPathFreeObject(xpathobjp);
    

    So, as before: no explicit compilation, and the only thing stored between calls is the "context" for XPath expressions to run against.
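
    In PHP terms, a minimal sketch of what that means (the sample document is hypothetical): one DOMXPath object holds one context, which is reused across calls, while each expression string is still evaluated from scratch:

    ```php
    <?php
    // Hypothetical sample document, used only for illustration.
    $doc = new DOMDocument();
    $doc->loadXML('<root><item>a</item><item>b</item></root>');

    // One DOMXPath object wraps one libxml XPath context.
    $xpath = new DOMXPath($doc);

    // Both calls reuse that context; the expressions are passed as strings.
    $items = $xpath->query('/root/item');           // DOMNodeList
    $total = $xpath->evaluate('count(/root/item)'); // float

    echo $items->length, ' ', $total, "\n"; // prints "2 2"
    ```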

    It turns out libxml does support some kind of cache, if enabled via xmlXPathContextSetCache:

    Creates/frees an object cache on the XPath context. If activates XPath objects (xmlXPathObject) will be cached internally to be reused. @options: 0: This will set the XPath object caching: @value: This will set the maximum number of XPath objects to be cached per slot There are 5 slots for: node-set, string, number, boolean, and misc objects. Use <0 for the default number (100). Other values for @options have currently no effect.

    In the case of SimpleXML, this cache would be of limited use, as the context belongs to a single SimpleXMLElement instance and is not shared with any other; for DOM, it would be more relevant, as the context - and therefore the cache - would live as long as the PHP DOMXPath object.

    We can dig into the implementation of xmlXPathNewContext to see if this cache is enabled or disabled by default:

    #ifdef XP_DEFAULT_CACHE_ON
        if (xmlXPathContextSetCache(ret, 1, -1, 0) == -1) {
            xmlXPathFreeContext(ret);
            return(NULL);
        }
    #endif
    

    So this turns out to be a compile-time option: if the libxml that is compiled into or loaded by your PHP was built with this flag set, I believe you will, in the case of DOM XPath, get some degree of caching within a single DOMXPath instance.
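
    As far as I know, PHP does not expose that flag, so one rough way to see whether reusing a DOMXPath instance matters on a given build is to time it. The following is only a sketch: the document size and iteration count are arbitrary assumptions, and any difference also includes the cost of constructing a new context, so it is not a pure measure of the object cache:

    ```php
    <?php
    // Hypothetical test document; the sizes below are arbitrary.
    $doc = new DOMDocument();
    $doc->loadXML('<root>' . str_repeat('<item>x</item>', 1000) . '</root>');

    $iterations = 2000;

    // Case 1: one DOMXPath (one libxml context) reused for every query.
    $start  = hrtime(true);
    $shared = new DOMXPath($doc);
    for ($i = 0; $i < $iterations; $i++) {
        $shared->evaluate('count(//item)');
    }
    $reused = hrtime(true) - $start;

    // Case 2: a new DOMXPath per query, so nothing can carry over.
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        (new DOMXPath($doc))->evaluate('count(//item)');
    }
    $fresh = hrtime(true) - $start;

    printf("reused: %.1f ms, fresh: %.1f ms\n", $reused / 1e6, $fresh / 1e6);
    ```

    hrtime() requires PHP 7.3 or later; on older versions microtime(true) could be substituted.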