I've been working on this for hours, but just can't seem to put all the parts together... So given:
<a href="link1">link</a>
<span class="class_name">00A</span>
...
<a href="link2">link</a>
<span class="class_name">00B</span>
...
<a href="link3">link</a>
<span class="class_name">01B</span>
...
<a href="link4">link</a>
<span class="class_name">01A</span>
I'm trying to get the link depending on the inner text of the span. I know I can get all the links with:
links = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(@class, 'class_name')]//preceding-sibling::a[@href]")))]
I can get the text on a single span with:
print(driver.find_element(By.XPATH, "//span[contains(@class, 'class_name')]").text)
But I can't use find_elements to get the text of all of them to test, since that asks for the text of a list. I should be able to use:
[contains(text(), '\\d+[A]')]")
But I don't know how to combine it with the code for all the links. I feel like I'm overlooking something really stupid, but it's 6:30am and I started working on this project yesterday evening, so I'm giving up and asking someone more intelligent. Thank you in advance for any help.
Note that the second parameter of the contains() function is not a regular expression; it's a plain string which is to be sought within the first string parameter. I believe with Selenium you are stuck with XPath 1.0, which does not have any regular expression functions.
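If a regular expression really is what you want, one option is to do the matching in Python instead of in XPath. The following is only a sketch under some assumptions: hrefs_for_labels is a hypothetical helper name, each span is assumed to be a sibling of the link it labels, and the spans argument is assumed to be a list of Selenium WebElements (for example from driver.find_elements). The string "xpath" is the value of Selenium's By.XPATH constant.

```python
import re


def hrefs_for_labels(spans, pattern=r"\d+A"):
    """Return the href of the link just before each span whose text matches.

    `spans` is any iterable of objects that behave like Selenium WebElements:
    they need a `.text` attribute and a `find_element(by, value)` method.
    """
    label = re.compile(pattern)
    hrefs = []
    for span in spans:
        # fullmatch: the whole label must be digits followed by a single 'A'
        if label.fullmatch(span.text.strip()):
            # "xpath" is the string value of selenium's By.XPATH
            link = span.find_element("xpath", "./preceding-sibling::a[@href][1]")
            hrefs.append(link.get_attribute("href"))
    return hrefs
```

With a real driver this would be called along the lines of hrefs_for_labels(driver.find_elements(By.XPATH, "//span[contains(@class, 'class_name')]")). It costs one extra lookup per matching span, which is usually acceptable for a short list.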
Without using a regular expression, if you wanted to filter a set of span elements to include only those whose text content consisted of a string of digits followed by a single A, you would need to use a more complicated expression which combines a bunch of string functions, e.g. something like:
span[
contains(., 'A') and
contains('0123456789', substring(., 1, 1)) and
translate(substring-before(., 'A'), '0123456789', '') = '' and
substring-after(., 'A') = ''
]
NB: the . (dot) is a reference to the "context node", which in the predicate expression means one of the span elements.
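To make the individual clauses of that predicate easier to follow, here is a rough Python mirror of it. This is only an illustration of the logic, not something Selenium runs; the function names is_digits_then_a and xpath_translate_remove are made up for this sketch.

```python
DIGITS = "0123456789"


def xpath_translate_remove(s, chars):
    """Behaves like XPath translate(s, chars, ''): delete every char in chars."""
    return s.translate(str.maketrans("", "", chars))


def is_digits_then_a(text):
    # Split at the first 'A', like substring-before / substring-after.
    before, sep, after = text.partition("A")
    return (
        sep == "A"                                        # contains(., 'A')
        and text[:1] in DIGITS                            # contains('0123456789', substring(., 1, 1))
        and xpath_translate_remove(before, DIGITS) == ""  # translate(substring-before(., 'A'), ...) = ''
        and after == ""                                   # substring-after(., 'A') = ''
    )


for label in ["00A", "00B", "01B", "01A", "A", "0A1", "0AA"]:
    print(label, is_digits_then_a(label))
```

Of the labels in the loop, only 00A and 01A pass all four tests.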
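This expression means: select the span elements whose text:
- contains an A character somewhere; and
- starts with a digit; and
- has text before the A that consists entirely of digits; and
- has nothing after the A (i.e. there's just one A, at the end)
BTW, I'm not sure this expression does what you think it does: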
//span[contains(@class, 'class_name')]//preceding-sibling::a[@href]
To clarify: the // in XPath is an abbreviation for the expression /descendant-or-self::node()/. So your expression could be written as:
//span[contains(@class, 'class_name')]
/descendant-or-self::node()/preceding-sibling::a[@href]
This will return every a element (with an href attribute) which is followed by a sibling node which is either:
- a span element whose class attribute contains 'class_name'; or
- a descendant of a span element whose class attribute contains 'class_name'.
If you know that the span and a are actually siblings, then you can replace that // with the simpler / (in your expression, and likewise in my suggestion below).
The other thing to note here is that unless each pair of span (or span descendant) and a is contained within its own parent element, the preceding-sibling::a[@href] step will return all the a elements that precede the span, not just the nearest one (which, I suspect, is what you want, since I take it that each span provides a label for the link immediately before it). You can apply the predicate [1] to the set of a[@href] elements to get just the first (in preceding-sibling order).
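The difference between "all preceding siblings" and "the first in preceding-sibling order" can be seen in a small standard-library sketch. ElementTree does not support the preceding-sibling axis, so the helper preceding_sibling_links (a name invented here) walks the parent's children by hand; the sample markup is a condensed copy of the question's, with the span tags closed.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
<a href="link1">link</a><span class="class_name">00A</span>
<a href="link2">link</a><span class="class_name">00B</span>
<a href="link3">link</a><span class="class_name">01B</span>
<a href="link4">link</a><span class="class_name">01A</span>
</body>"""


def preceding_sibling_links(parent, span):
    """Emulate preceding-sibling::a[@href], nearest sibling first."""
    children = list(parent)
    before = children[:children.index(span)]
    # preceding-sibling is a reverse axis, so position 1 is the closest node
    return [el for el in reversed(before) if el.tag == "a" and "href" in el.attrib]


body = ET.fromstring(SAMPLE)
last_span = body.findall("span")[-1]          # the span labelled 01A

all_links = [a.get("href") for a in preceding_sibling_links(body, last_span)]
nearest = all_links[0]                         # what the [1] predicate keeps

print(all_links)  # ['link4', 'link3', 'link2', 'link1']
print(nearest)    # link4
```

Without the [1], the span labelled 01A is "preceded" by all four links; with it, only link4 remains.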
So to combine these ideas, here's my suggestion:
//span
[
contains(@class, 'class_name') and
contains(., 'A') and
contains('0123456789', substring(., 1, 1)) and
translate(substring-before(., 'A'), '0123456789', '') = '' and
substring-after(., 'A') = ''
]
//preceding-sibling::a[@href][1]
Applied to this input:
<body>
<a href="link1">link</a>
<span class="class_name">00A</span>
...
<a href="link2">link</a>
<span class="class_name">00B</span>
...
<a href="link3">link</a>
<span class="class_name">01B</span>
...
<a href="link4">link</a>
<span class="class_name">01A</span>
</body>
... it yields:
<a href="link1">link</a>
<a href="link4">link</a>
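To plug this back into the Selenium code from the question, something like the following might work. It is a sketch rather than a tested recipe: A_LABEL_LINKS_XPATH and matching_hrefs are names invented here, the driver argument is assumed to be a Selenium WebDriver, and "xpath" is the string value of By.XPATH.

```python
# The suggested expression, joined into a single string.
A_LABEL_LINKS_XPATH = (
    "//span["
    "contains(@class, 'class_name') and "
    "contains(., 'A') and "
    "contains('0123456789', substring(., 1, 1)) and "
    "translate(substring-before(., 'A'), '0123456789', '') = '' and "
    "substring-after(., 'A') = ''"
    "]//preceding-sibling::a[@href][1]"
)


def matching_hrefs(driver):
    """Collect the href of every link whose label span is digits followed by 'A'."""
    # "xpath" is the string value of selenium's By.XPATH
    anchors = driver.find_elements("xpath", A_LABEL_LINKS_XPATH)
    return [a.get_attribute("href") for a in anchors]
```

If the page loads its content late, the same XPath string can be passed to the WebDriverWait / EC.visibility_of_all_elements_located call used in the question instead of calling find_elements directly. Note that a browser will typically report href values resolved to absolute URLs.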