I recently began learning XPath for a Python project, but I can't get the following expression to select the correct piece of data.
//table[@id="yfncsumtab"]//tr/td/a[@rel="first"]
The data is on this page: http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices
(To find the markup I'm trying to target, search the page for "Next" with Command/Control+F and use Inspect Element on the first result.)
I've tried many variations of that code, but none seem to select the proper text. I appreciate any and all help - thanks in advance!
'//a[text()="Next"]'
or:
'//table[@id = "yfncsumtab"]//a[text()="Next"]'
or, to get just the first one:
'//table[@id = "yfncsumtab"]//table[1]/tr/td/a[text()="Next"]'
or:
'//table[@id="yfncsumtab"]/tr[2]/td[1]/table[1]/tr/td/a[1]'
The more specific the expression, the faster the element is found. However, the more specific it is, the more brittle the XPath becomes: if the developers make a small change to the HTML structure surrounding the target element, your code will stop matching it.
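To illustrate the brittleness trade-off, here is a minimal sketch. The markup is a hypothetical, heavily simplified stand-in for the real page (the `page=2` href and the `count_matches` helper are made up for the example), and the "change" is simply wrapping the link in a `<div>`.

```python
from lxml import html

# Hypothetical, simplified markup; the real page is far larger.
ORIGINAL = """
<html><body>
<table id="yfncsumtab">
  <tr><td>Header</td></tr>
  <tr><td>
    <table><tr><td><a rel="next" href="/q/hp?page=2">Next</a></td></tr></table>
  </td></tr>
</table>
</body></html>
"""

# Same content, except the link is now wrapped in a <div>.
CHANGED = ORIGINAL.replace("<td><a", "<td><div><a").replace(
    "</a></td>", "</a></div></td>"
)

LOOSE = '//table[@id="yfncsumtab"]//a[text()="Next"]'
STRICT = '//table[@id="yfncsumtab"]/tr[2]/td[1]/table[1]/tr/td/a[1]'


def count_matches(markup, xpath):
    """Parse the markup and return how many nodes the XPath selects."""
    return len(html.fromstring(markup).xpath(xpath))


for label, markup in (("original", ORIGINAL), ("changed", CHANGED)):
    print(label, "loose:", count_matches(markup, LOOSE),
          "strict:", count_matches(markup, STRICT))
```

In this sketch the loose expression still finds the link after the change, while the fully spelled-out path no longer matches anything.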
from lxml import html

doc = html.parse("http://finance.yahoo.com/q/hp?s=QQQX+Historical+Prices")
my_xpath = '//a[text()="Next"]'

for element in doc.xpath(my_xpath):
    print("<{}>".format(element.tag))
    print(" text = {}".format(element.text))
    for attr, val in element.items():
        print(" {} = {}".format(attr, val))
--output:--
<a>
 text = Next
 rel = next
 href = /q/hp?s=QQQX&d=11&e=28&f=2014&g=d&a=1&b=1&c=2007&z=66&y=66
<a>
 text = Next
 rel = next
 href = /q/hp?s=QQQX&d=11&e=28&f=2014&g=d&a=1&b=1&c=2007&z=66&y=66
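Note that the output reports rel = next for both links, so a predicate of @rel="first" would not match them, which may be why the expression in the question selects nothing. The sketch below checks this against a hypothetical, simplified stand-in for the page (two identical "Next" links and a made-up `page=2` href), so it runs without fetching anything.

```python
from lxml import html

# Hypothetical stand-in for the page: one "Next" link above and one below the data.
SAMPLE = """
<html><body>
<table id="yfncsumtab">
  <tr><td><a rel="next" href="/q/hp?page=2">Next</a></td></tr>
  <tr><td>price rows would go here</td></tr>
  <tr><td><a rel="next" href="/q/hp?page=2">Next</a></td></tr>
</table>
</body></html>
"""

doc = html.fromstring(SAMPLE)

# The expression from the question: no link here carries rel="first".
by_first = doc.xpath('//table[@id="yfncsumtab"]//tr/td/a[@rel="first"]')

# Same path, but testing for the attribute value the links actually have.
by_next = doc.xpath('//table[@id="yfncsumtab"]//tr/td/a[@rel="next"]')

# Parenthesising the expression and indexing keeps only the first match.
first_only = doc.xpath('(//table[@id="yfncsumtab"]//a[@rel="next"])[1]')

print(len(by_first))
print(len(by_next))
print(first_only[0].get("href"))
```

Selecting on the attribute avoids depending on the link text, though it is just as sensitive to the site changing that attribute.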