
Order of evaluation of Combinators in jsoup select


For example, if I write the selector

* > a, b, c > d

will the result be the union of

* > a
* > b
* > c > d

or the union of

* > a > d
* > b > d
* > c > d

Which one is it, and how do I get the one I want?

I could not find any documentation that covers this.


Solution

  • Closer to the first one, but not exactly. More precisely, the output will be the union of:

    * > a
    b
    c > d
    

    I prepared a very simple HTML document and tested it here: https://try.jsoup.org/~D9QmujE_m7rv9G3MahldfKJurfk

    That's consistent with what I saw while debugging jsoup's code. Your query is parsed by org.jsoup.select.QueryParser into three evaluators:

    • a :ImmediateParent*
    • b
    • d :ImmediateParentc

    The result contains elements matching any of these three. So the comma effectively binds loosest: the query is first split at each comma into independent selectors, and their results are combined with OR.

    If you want to dig into the internals, there is a comment in jsoup's source stating that most combinators are AND but the comma is OR. Don't worry if you don't understand all of it (neither do I); the gist is clear enough.

    It's also consistent with what browsers do: I opened an HTML file containing the same markup I used on try.jsoup.org and ran the following in the browser console:

    document.querySelectorAll('* > a, b, c > d');
    

    Anyway, if what you want is the union of

    * > a > d
    * > b > d
    * > c > d
    

    just use the selector * > a > d, * > b > d, * > c > d or the shorter a > d, b > d, c > d
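The precedence described in the answer (split the query at the commas first, evaluate each piece on its own, then OR the results) can be reproduced with a toy model. The sketch below is plain Java with no jsoup dependency. The class SelectorGroupDemo and all its methods are hypothetical. It supports only tag names, *, > and the comma, and it is not how jsoup's QueryParser is implemented internally. It also assumes a small made-up document: body containing a, b and c, each with one d child.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT jsoup's implementation: only tag names, "*", ">" and ","
// are supported, which is enough to show how the comma groups a query.
class SelectorGroupDemo {

    // Minimal element: tag name, parent link and children.
    static final class Node {
        final String tag;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String tag, Node parent) {
            this.tag = tag;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }

        // Path from the root, e.g. "html>body>c>d", used only for printing.
        @Override
        public String toString() {
            return parent == null ? tag : parent + ">" + tag;
        }
    }

    // A simple selector is either a tag name or the universal selector "*".
    static boolean matchesSimple(Node el, String simple) {
        return simple.equals("*") || simple.equals(el.tag);
    }

    // For a chain like "c > d" the last part must match el itself and each
    // earlier part must match the immediate parent, walking upward.
    static boolean matchesChain(Node el, String[] parts) {
        Node current = el;
        for (int i = parts.length - 1; i >= 0; i--) {
            if (current == null || !matchesSimple(current, parts[i])) return false;
            current = current.parent;
        }
        return true;
    }

    // The comma binds loosest: split the group first, evaluate every piece
    // on its own, then take the union (an OR) of the matches.
    static Set<Node> select(Node root, String group) {
        Set<Node> result = new LinkedHashSet<>(); // Node uses identity equality
        for (String selector : group.split(",")) {
            String[] parts = selector.trim().split("\\s*>\\s*");
            collect(root, parts, result);
        }
        return result;
    }

    // Depth-first walk that adds every node matching the chain.
    private static void collect(Node el, String[] parts, Set<Node> out) {
        if (matchesChain(el, parts)) out.add(el);
        for (Node child : el.children) collect(child, parts, out);
    }

    static List<String> paths(Set<Node> nodes) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes) out.add(n.toString());
        return out;
    }

    // Hypothetical document: <html><body> a, b, c, each with one <d> child.
    static Node demoTree() {
        Node html = new Node("html", null);
        Node body = new Node("body", html);
        for (String tag : new String[] {"a", "b", "c"}) {
            new Node("d", new Node(tag, body));
        }
        return html;
    }

    public static void main(String[] args) {
        Node html = demoTree();
        // prints [html>body>a, html>body>b, html>body>c>d]
        System.out.println(paths(select(html, "* > a, b, c > d")));
        // prints [html>body>a>d, html>body>b>d, html>body>c>d]
        System.out.println(paths(select(html, "a > d, b > d, c > d")));
    }
}
```

With jsoup itself the corresponding call would be Jsoup.parse(html).select(query). The toy model only mirrors the grouping rule, so the try.jsoup.org link above remains the authoritative check for real markup.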
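Because the comma cannot be factored out of a selector group, the child step has to be written out once per parent. If the parent tags come from data rather than being fixed, a small helper can build the string. In the sketch below, distributeChild is a hypothetical name and not part of jsoup. It assumes the parents and the child are plain selector fragments that are already valid, and it performs no escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not a jsoup API: because "a, b, c > d" does not
// distribute "> d" over the comma list, the child step is repeated per parent.
class DistributeChild {

    static String distributeChild(List<String> parents, String child) {
        return parents.stream()
                .map(parent -> parent + " > " + child)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // prints: a > d, b > d, c > d
        System.out.println(distributeChild(List.of("a", "b", "c"), "d"));
    }
}
```

The resulting string can be passed to Element.select(String) like any hand-written query.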