google-sheets web-scraping xpath google-sheets-formula

How to use AND operator in Google Sheets IMPORTXML XPath?


I'm trying to write an XPath query for Google Sheets to gather links from a specific type of page.
I thought I could use an "AND" operator (pipe character), but can't quite figure out how to do it.
Here's what I have so far, but it doesn't work:

=IMPORTXML(B2,"//a[not(starts-with(@href, '/'))]/@href | //a[not(contains(@href, 'example.com'))]/@href")

The idea is that I want to gather all links except for ones that contain example.com and ones that begin with a forward slash.

Surprisingly, it still extracts all the links from the page, completely ignoring my conditions.

Any help would be greatly appreciated.


Solution

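  • The | operator does not mean "and" in XPath. It is the union operator: it merges node-sets. So you were merging the results of the first expression with the results of the second, and a link is left out only if it fails both tests, that is, if it starts with '/' and also contains example.com. That is why almost every link comes back.
    To get what you want, put both conditions in a single predicate: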

    =IMPORTXML(B2,"//a[not(starts-with(@href, '/') or contains(@href, 'example.com'))]/@href")
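
    XPath does have a real "and" operator, written in lowercase and used inside the predicate. By De Morgan's law, not(A or B) is the same as (not A) and (not B), so the following should be an equivalent way to write the formula above, and it may read closer to the original intent. It is a sketch that assumes B2 holds the page URL, as in the question:

    =IMPORTXML(B2,"//a[not(starts-with(@href, '/')) and not(contains(@href, 'example.com'))]/@href")

    Either form keeps a link only when it neither starts with a forward slash nor contains example.com.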