What would be the correct XPath query to use to fetch data from Wikipedia into Google Sheets?
Here is an example I would like to test this with:
Wikipedia page: http://en.wikipedia.org/wiki/12_Angry_Men_(1957_film)
Data to pull: the "Running time" value of "96 minutes", located in the table on the right-hand side
Method: using the Google Sheets ImportXML function
I've tried the following, but it returns N/A:
=IMPORTXML("http://en.wikipedia.org/wiki/12_Angry_Men_(1957_film)", "//div[normalize-space() = 'Running time']/following-sibling::td")
Thanks!
There are a couple of problems with your XPath.
The following-sibling axis won't work on the markup of that page because the td that follows the 'Running time' div is not a sibling of the div; it is a sibling of the div's parent th. Instead, use the following axis with a node test: following::td. However, that still returns all td nodes after the selected div, so we additionally need a predicate to select only the first one: [1].
Complete function with XPath:
=IMPORTXML("http://en.wikipedia.org/wiki/12_Angry_Men_%281957_film%29", "//div[normalize-space()='Running time']/following::td[1]")
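To see why the two axes behave differently, here is a minimal Python sketch that emulates them by hand on a simplified, made-up stand-in for the infobox markup (the real page is much larger, and the SAMPLE string and helper names are assumptions for illustration only). The standard library's ElementTree does not support the following or following-sibling axes, so the helpers below walk the tree directly rather than evaluating the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the infobox rows (not the real page source).
SAMPLE = """
<table>
  <tr>
    <th><div>Running time</div></th>
    <td>96 minutes</td>
  </tr>
  <tr>
    <th>Country</th>
    <td>United States</td>
  </tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
# All elements in document order.
doc_order = list(root.iter())


def following_sibling(node, tag):
    """Emulate following-sibling::tag: later children of the same parent."""
    siblings = list(parent_of[node])
    idx = siblings.index(node)
    return [s for s in siblings[idx + 1:] if s.tag == tag]


def following(node, tag):
    """Emulate following::tag: later nodes in document order, excluding descendants."""
    descendants = set(node.iter())  # includes the node itself
    idx = doc_order.index(node)
    return [n for n in doc_order[idx + 1:] if n not in descendants and n.tag == tag]


# Equivalent of //div[normalize-space()='Running time']
div = next(
    d for d in root.iter("div")
    if " ".join("".join(d.itertext()).split()) == "Running time"
)

sibling_hits = following_sibling(div, "td")    # the div has no td siblings
following_hits = following(div, "td")          # every td after the div
first_following = following_hits[0]            # equivalent of following::td[1]

print(len(sibling_hits))
print([td.text for td in following_hits])
print(first_following.text)
```

In this sample the div is the only child of its th, so the sibling search finds nothing, while the document-order search finds every later td and the first of those holds the running time.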