XPath: exclude child element


I scrape content using XPath. What I need is the content of a div with a given id. Inside this div there is another div whose content I don't want to scrape.

I use the following XPath:

//*[@id='Main'][not(contains(div/@id, 'orderform'))]

But with this XPath I don't get an extraction from every URL, as I expected. I only get one from those URLs that have div id="Main" and don't have div id="orderform" inside it.

What XPath should I use instead to scrape the whole div id="Main" while excluding the content of div id="orderform"?


Solution

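  • Selects all elements within the element id="Main" at any level, except div id="orderform" and everything inside it:

    //*[@id='Main']//*[not(ancestor-or-self::div[@id="orderform"])]

    The div is a child of id="Main", so the test has to go down a level: in the question the predicate sits on the id="Main" element itself, which keeps or drops the whole element. Note that a predicate of the form not(div[@id="orderform"]) only checks whether an element has such a child; to exclude the div itself, use the self (or ancestor-or-self) axis.

  • Selects only the immediate children of the element id="Main", except div id="orderform":

    //*[@id='Main']/*[not(self::div[@id="orderform"])]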