XPath element selection containing a text with accents or characters


I want to select an element (a span, for example) with XPath, using the Symfony DomCrawler component:

$element->filterXPath('//span[text() = "SOMEtext"]')->text();

It works fine if the string has no special characters. It fails if the string contains accented or other non-ASCII characters, such as: Prénom, expérience, à toi, etc.

$element->filterXPath('//span[text() = "Référence"]')->text(); gives me an error.

Is there a way to match non-English text?

I tried many ways of encoding the text as a Unicode string, but every one of them fails:

Référence
Référence
R\u00E9f\u00E9rence
R\u{00E9}f\u{00E9}rence
R\00E9 f\00E9 rence
R%C3%A9f%C3%A9rence
RU+00E9fU+00E9rence
R0xE9f0xE9rence

Solution

  • filterXPath is not part of core PHP (it comes from the library you're using), so the first thing I'd check is encoding. Is your PHP script saved in the same encoding that the object expects?

    The second thing I'd try is the XPath implementation that ships with PHP's DOM extension (DOMXPath on a DOMDocument); other implementations exist as well.

    $oDom = (new DOMImplementation())->createDocument(NULL, '');
    // import your DOM here
    $XPath = new DOMXPath($oDom);
    $XPath->query('//span[text() = "Référence"]')->item(0);
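
To make the encoding point concrete, here is a minimal sketch of the DOMXPath approach. It assumes that both the script file and the markup are UTF-8; the sample HTML and the variable names are made up for illustration. DOMDocument::loadHTML() may fall back to ISO-8859-1 when the markup declares no charset, so the sketch prepends an XML encoding hint, a common workaround rather than an official option.

```php
<?php
// Assumption: this file is saved as UTF-8, and so is the HTML below.
// The markup is a made-up sample, not the asker's real document.
$html = '<div><span>Référence</span><span>Prénom</span></div>';

$dom = new DOMDocument();

// Collect parser warnings quietly instead of emitting them.
libxml_use_internal_errors(true);

// Without a declared charset the parser may assume ISO-8859-1,
// so hint UTF-8 explicitly before the markup.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// normalize-space() trims surrounding whitespace before comparing.
$nodes = $xpath->query('//span[normalize-space(text()) = "Référence"]');

$match = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
var_dump($match);
```

If the span is found here but not through the crawler, the likely culprit is how the HTML was handed to the crawler (its charset) rather than the XPath expression itself.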