perl, html-parsing, libxml2

Compilation of pattern failed at LibXML.pm


I'm reading a large HTML file using XML::LibXML. When constructing a pattern like this:

XML::LibXML::Pattern->new('//span[@class="entry"]');

It gives the following error:

Compilation of pattern failed at /Users/chris/perl5/perlbrew/perls/perl-5.31.3/lib/site_perl/5.31.3/darwin-2level/XML/LibXML.pm line 2138.

But this one works fine:

XML::LibXML::Pattern->new('//span');

I'm not sure whether the [@class="entry"] part isn't supported by XML::LibXML::Pattern or I'm just doing it wrong.

Any information is appreciated. Thanks in advance.
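A minimal reproduction of the behaviour described above is sketched below. It assumes only that XML::LibXML is installed. Each expression is passed to XML::LibXML::Pattern->new inside an eval, so a compilation failure is recorded instead of aborting the script.

```perl
use strict;
use warnings;
use XML::LibXML;

# Try to compile each expression as an XML::LibXML::Pattern and record
# whether the constructor succeeded (1) or died (0).
my %compiles;
for my $expr ('//span', '//span[@class="entry"]') {
    my $ok = eval { XML::LibXML::Pattern->new($expr); 1 };
    $compiles{$expr} = $ok ? 1 : 0;
    printf "%-25s %s\n", $expr, $ok ? 'compiles' : "fails: $@";
}
```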


Solution

  • While it's a valid XPath expression, it isn't a valid Pattern. From the XML::LibXML::Pattern documentation:

    Patterns are a small subset of XPath language, which is limited to (disjunctions of) location paths involving the child and descendant axes in abbreviated form as described by the extended BNF given below:

    Selector ::=     Path ( '|' Path )*
    Path     ::=     ('.//' | '//' | '/' )? Step ( '/' Step )*
    Step     ::=     '.' | NameTest
    NameTest ::=     QName | '*' | NCName ':' '*'
    

    For readability, whitespace may be used in selector XPath expressions even though not explicitly allowed by the grammar: whitespace may be freely added within patterns before or after any token, where

    token     ::=     '.' | '/' | '//' | '|' | NameTest
    

    Note that no predicates or attribute tests are allowed.

    (Emphasis mine.)
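Because the attribute test can't go into the Pattern, one possible workaround is to match on the element path only and check the attribute in Perl. The sketch below assumes the input is well-formed enough for XML::LibXML::Reader, which is an XML parser, so real-world HTML may not qualify. The sample markup is hypothetical. It also shows the in-memory alternative, where a normal XPath query with the predicate works fine.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants

# Hypothetical sample input; a real file would be opened with
# XML::LibXML::Reader->new(location => $path) instead.
my $xhtml = '<html><body>'
          . '<span class="entry">first</span>'
          . '<span class="note">skip me</span>'
          . '<span class="entry">second</span>'
          . '</body></html>';

# Streaming: the Pattern selects <span> elements, Perl checks the class.
my $pattern = XML::LibXML::Pattern->new('//span');
my $reader  = XML::LibXML::Reader->new(string => $xhtml)
    or die "Cannot create reader\n";

my @streamed;
while ($reader->nextPatternMatch($pattern) > 0) {
    # The end tag of a <span> may also match, so keep only start tags.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    my $class = $reader->getAttribute('class') // '';
    push @streamed, $reader->readInnerXml if $class eq 'entry';
}

# In-memory: if the whole document can be loaded, XPath handles the predicate.
my $doc = XML::LibXML->load_html(string => $xhtml);
my @dom = map { $_->textContent } $doc->findnodes('//span[@class="entry"]');

print "streamed: @streamed\n";
print "dom:      @dom\n";
```

The streaming version keeps memory use flat on a large file. The findnodes version is simpler but builds the whole tree first.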