Tags: xml, google-sheets, web-scraping, xpath, google-sheets-formula

XPath query does not match HTML source


Using XpathBuilder I can construct a simple search engine query and pull data out of the search results using XPath. I have some simple examples in a Google spreadsheet here, which runs the query "XPath tutorial" on various search engines and attempts to pull out the number of results each search engine returns.

The formulas in that Google spreadsheet are as follows:

=importxml("http://www.google.com/search?q="xpath+tutorial"&num=30&pws=0", 
           "//div[@id='resultStats']")
=importxml("http://www.bing.com/search?q=xpath+tutorial&count=30", 
           "//span[@class='sb_count']")
=importxml("http://search.yahoo.com/search?p=xpath+tutorial&n=30", 
           "//span[@id='resultCount']")

There are some oddities about this that I don't understand. Firstly, the Google search does not return any results, but the XPath query looks OK. Indeed, there are a number of online tutorials which recommend exactly what I have done here.

The Yahoo query returns the correct result; it is the only one that does.

The number of results found by the Bing XPath query does not match the number shown on the Bing web page, even though only one XML node matches the XPath query. More details are on the spreadsheet here.
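
One way to check the "only one node matches" claim offline is to run the same XPath against a saved copy of the markup. The Python sketch below is only an illustration. SAMPLE is a hand-written stand-in, not real Bing HTML, and find_matches is a hypothetical helper name. It relies on the standard library's ElementTree, which supports only a subset of XPath and needs well-formed XML, so a real results page would probably need an HTML-tolerant parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for a saved results page.
# This is NOT real Bing markup; it only mirrors the class name used in the formula.
SAMPLE = """
<html>
  <body>
    <div id="results_area">
      <span class="sb_count">1-30 of 1,234 results</span>
    </div>
  </body>
</html>
"""


def find_matches(markup: str, path: str) -> list[str]:
    """Return the stripped text of every node that matches an ElementTree path."""
    root = ET.fromstring(markup)
    return [(node.text or "").strip() for node in root.findall(path)]


# ElementTree paths are relative to the root element, hence the leading ".".
matches = find_matches(SAMPLE, ".//span[@class='sb_count']")
print(len(matches), matches)
```

If the count printed for a saved copy of the page differs from what the spreadsheet shows, the page fetched by importxml is probably not the same document the browser displayed.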

Where did it all go so wrong?


Solution

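  • Try this. It swaps the inner double quotes for single quotes, because a double quote inside a formula string ends the string early (in the original, right after q=):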

    =importxml("http://www.google.com/search?q='xpath+tutorial&num=30&pws=0'", "//div[@id='resultStats']")
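
An alternative to swapping quote characters is to percent-encode the quoted phrase so the URL contains no literal double quote at all. The Python sketch below shows that encoding with the standard library. build_search_url is a hypothetical helper name, and the parameter values are the ones from the question. It only demonstrates how the URL is built, not whether Google still serves a resultStats element to importxml.

```python
from urllib.parse import urlencode


def build_search_url(base: str, params: dict[str, str]) -> str:
    """Join a base URL and query parameters, percent-encoding each value."""
    # urlencode uses quote_plus by default: '"' becomes %22 and a space becomes '+'.
    return f"{base}?{urlencode(params)}"


url = build_search_url(
    "http://www.google.com/search",
    {"q": '"xpath tutorial"', "num": "30", "pws": "0"},
)
print(url)
```

Because the encoded URL has no double quotes in it, it can be placed inside the formula's string literal without ending the string early.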