I am scraping in Python using Selenium and Firefox.
I am able to get my hrefs into an object using the following:
HREF = node.find_elements_by_xpath(".//a") # Get the href's under the current node
Which returns a bunch of <a> tags that look like this:
<a href="http://example.com" class="" title="The Link" data-ipshover="" data-ipshover-target="http://example.com/?preview=1" data-ipshover-timeout="1.5" id="ips_uid_1234_9">
<span>The Link</span>
There are multiple links returned, but if I just focus on the first one:
print dir(HREF[0])
print "#########"
print HREF[0].text
print HREF[0].id
print HREF[0].get_attribute("title")
print HREF[0].get_attribute("href")
print HREF[0].get_attribute("data-ipshover-timeout")
print HREF[0].get_attribute("id")
print "#########"
Outputs this:
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__',
'__format__', '__getattribute__', '__hash__', '__init__', '__module__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_execute', '_id', '_parent', '_upload', '_w3c',
'anonymous_children', 'clear', 'click',
'find_anonymous_element_by_attribute', 'find_element',
'find_element_by_class_name', 'find_element_by_css_selector',
'find_element_by_id', 'find_element_by_link_text',
'find_element_by_name', 'find_element_by_partial_link_text',
'find_element_by_tag_name', 'find_element_by_xpath', 'find_elements',
'find_elements_by_class_name', 'find_elements_by_css_selector',
'find_elements_by_id', 'find_elements_by_link_text',
'find_elements_by_name', 'find_elements_by_partial_link_text',
'find_elements_by_tag_name', 'find_elements_by_xpath', 'get_attribute',
'get_property', 'id', 'is_displayed', 'is_enabled', 'is_selected',
'location', 'location_once_scrolled_into_view', 'parent', 'rect',
'screenshot', 'screenshot_as_base64', 'screenshot_as_png', 'send_keys',
'size', 'submit', 'tag_name', 'text', 'value_of_css_property']
The Link
The Link
Note that the last attribute print is blank, when it should return ips_uid_1234_9. Every other attribute returns fine, so I'm not sure why "id" won't return correctly.
I'm a knucklehead. Next time I need to use the same browser for scraping and for viewing the source code... The id attribute doesn't load in Firefox, but it does load in Chrome.
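For anyone hitting the same thing, one way to catch this kind of mismatch is to look at the markup as the browser Selenium is driving actually has it, instead of the page source from a different browser. This is only a rough sketch: describe_element is a hypothetical helper name (not part of Selenium), and the attribute names are just the ones from the question. It relies on get_attribute("outerHTML") returning the element's markup.

```python
def describe_element(element, names=("id", "title", "href")):
    """Collect what the driving browser actually reports for one element.

    `element` is assumed to be a Selenium WebElement (anything with a
    get_attribute method works). describe_element is a hypothetical
    helper, not a Selenium API.
    """
    # Attributes that the browser did not set come back empty or None.
    info = {name: element.get_attribute(name) for name in names}
    # outerHTML is the element's markup as seen by the browser Selenium
    # controls, which may differ from "view source" in another browser.
    info["outerHTML"] = element.get_attribute("outerHTML")
    return info
```

Calling describe_element(HREF[0]) and printing the "outerHTML" entry should show whether the id is really there in Firefox before assuming get_attribute is at fault.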