Python Selenium - Get sibling link depending on inner text of span

I've been working on this for hours, but just can't seem to put all the parts together... So given:

<a href="link1">link</a>
<span class="class_name">00A</span>
...
<a href="link2">link</a>
<span class="class_name">00B</span>
...
<a href="link3">link</a>
<span class="class_name">01B</span>
...
<a href="link4">link</a>
<span class="class_name">01A</span>

I'm trying to get the link depending on the inner text of the span. So I know... I can get all the links with:

links = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(@class, 'class_name')]//preceding-sibling::a[@href]")))]

I can get the text on a single span with:

print(driver.find_element(By.XPATH, "//span[contains(@class, 'class_name')]").text)

But I can't use find_elements to get all of their text to test, since that would be asking for the text of a list. I should be able to use:

[contains(text(), '\\d+[A]')]")

But I don't know how to combine it with the code for all the links. I feel like I'm overlooking something really stupid, but it's 6:30am and I started working on this project yesterday evening, so I'm giving up and asking someone more intelligent. Thank you in advance for any help.

Solution

Note that the second parameter of the contains() function is not a regular expression; it's a plain string which is sought within the first string parameter, so contains(text(), '\\d+[A]') only matches text that literally contains that pattern string. I believe with Selenium you are stuck with XPath 1.0, which does not have any regular expression functions.
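If you'd rather keep the regular expression, one alternative (a different technique from the pure-XPath approach in this answer) is to do the matching in Python with the re module after fetching the span texts. This is a minimal sketch assuming you have already collected (href, label) pairs; links_for_labels and the sample pairs are hypothetical.

```python
import re


def links_for_labels(pairs, pattern=r"\d+A"):
    """Return the hrefs whose label fully matches pattern (by default: digits, then one A)."""
    # In Python 3, \d also matches non-ASCII digits; use r"[0-9]+A" to be strict.
    regex = re.compile(pattern)
    # fullmatch anchors at both ends, so labels like "0A1" or "x00A" are rejected.
    return [href for href, label in pairs if regex.fullmatch(label.strip())]


# Hypothetical (href, label) pairs mirroring the question's markup.
pairs = [("link1", "00A"), ("link2", "00B"), ("link3", "01B"), ("link4", "01A")]
print(links_for_labels(pairs))  # ['link1', 'link4']
```

With Selenium, the labels could come from iterating over driver.find_elements(By.XPATH, "//span[contains(@class, 'class_name')]") and reading .text on each element (rather than on the list), and each matching link from span.find_element(By.XPATH, "preceding-sibling::a[@href][1]").get_attribute("href"); the role of [1] is discussed further down.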

Without using a regular expression, if you wanted to filter a set of span elements to include only those whose text content consisted of a string of digits followed by a single A, you would need to use a more complicated expression which combines a bunch of string functions, e.g. something like:

span[
   contains(., 'A') and
   contains('0123456789', substring(., 1, 1)) and 
   translate(substring-before(., 'A'), '0123456789', '') = '' and
   substring-after(., 'A') = ''
]

NB: the . is a reference to the "context node", which within the predicate means the span element currently being tested (or rather its string value, i.e. its text content, when passed to a string function).

This expression means:

span elements

   which contain an A character somewhere; and
   whose first character is a digit; and
   whose text before the first A consists entirely of digits; and
   which have no text after the first A (i.e. there's just one A, at the end)
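To see exactly what each clause checks, here is a plain-Python mirror of the predicate working on a span's text as a string. It only illustrates how the XPath 1.0 string functions behave; substring_before, substring_after and is_digits_then_a are hypothetical names, not something Selenium runs.

```python
DIGITS = "0123456789"
DELETE_DIGITS = str.maketrans("", "", DIGITS)  # like translate(s, '0123456789', '')


def substring_before(s, sub):
    """XPath 1.0 substring-before(): text before the first sub, or '' if absent."""
    i = s.find(sub)
    return s[:i] if i >= 0 else ""


def substring_after(s, sub):
    """XPath 1.0 substring-after(): text after the first sub, or '' if absent."""
    i = s.find(sub)
    return s[i + len(sub):] if i >= 0 else ""


def is_digits_then_a(text):
    """Mirror of the span[...] predicate, applied to a span's string value."""
    return (
        "A" in text  # contains(., 'A')
        and text[:1] in DIGITS  # contains('0123456789', substring(., 1, 1))
        and substring_before(text, "A").translate(DELETE_DIGITS) == ""  # only digits before the A
        and substring_after(text, "A") == ""  # nothing after the first A
    )


labels = ["00A", "00B", "01B", "01A", "A", "0A1", "01AA"]
print([t for t in labels if is_digits_then_a(t)])  # ['00A', '01A']
```

The first-character test is what rules out a bare A: for "A" every other clause holds, because the text before the A is empty. Leading or trailing whitespace in the span text would also make the predicate fail.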

BTW, I'm not sure this expression does what you think it does:

//span[contains(@class, 'class_name')]//preceding-sibling::a[@href]

To clarify: the // in XPath is an abbreviation for the expression /descendant-or-self::node()/. So your expression could be written as:

//span[contains(@class, 'class_name')]
   /descendant-or-self::node()/preceding-sibling::a[@href]

This will return every a element (with an href attribute) which is followed by a sibling node which is either:

   a span element whose class attribute contains 'class_name'; or
   a descendant of such a span element (in which case the a is itself inside the span).
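A stdlib-only sketch with xml.etree.ElementTree can make the difference between // and / concrete. ElementTree has no preceding-sibling axis, so the walk is emulated by hand, and only element nodes are used as context nodes (XPath's node() also includes text nodes, which this ignores). The markup, with an a nested inside the span, is a hypothetical example, as are the function names.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: link1 is a sibling of the span, "inner" is nested inside it.
HTML = """<body>
<a href="link1">link</a>
<span class="class_name"><a href="inner">x</a><b>00A</b></span>
</body>"""


def preceding_a_siblings(node, parent_of):
    """Emulate preceding-sibling::a[@href] from node (element nodes only)."""
    parent = parent_of.get(node)
    if parent is None:  # the root element has no siblings
        return []
    siblings = list(parent)
    before = siblings[: siblings.index(node)]
    return [s for s in before if s.tag == "a" and "href" in s.attrib]


def matching_hrefs(root, double_slash):
    """hrefs selected by //span[...]//preceding-sibling::a[@href] (or with '/')."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hrefs = set()
    for span in root.iter("span"):
        if "class_name" not in span.get("class", ""):  # contains(@class, 'class_name')
            continue
        # '//' expands to descendant-or-self::node(): the span and everything in it.
        # '/' keeps only the span itself as the context node.
        context_nodes = span.iter() if double_slash else [span]
        for node in context_nodes:
            hrefs.update(a.get("href") for a in preceding_a_siblings(node, parent_of))
    return hrefs


root = ET.fromstring(HTML)
print(sorted(matching_hrefs(root, double_slash=True)))   # ['inner', 'link1']
print(sorted(matching_hrefs(root, double_slash=False)))  # ['link1']
```

The inner link is picked up with // only because it precedes the b element, a descendant of the span.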

If you know that the span and a are actually siblings, then you can replace that // with the simpler / (here and in my suggestion below).

The other thing to note here is that unless each pair of span (or span descendant) and a is contained within its own parent element, the preceding-sibling::a[@href] step will return all the a elements that precede the span, not just the nearest one. I suspect you only want the nearest one, since I take it that each span provides a label for the link immediately before it. You can apply the predicate [1] to the set of a[@href] elements to get just the first one in preceding-sibling order; because preceding-sibling is a reverse axis, that is the closest one.
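The effect of [1] can be checked without a browser. This sketch walks the siblings of each span in the question's markup (with the closing span tags fixed) using xml.etree.ElementTree; preceding_links is a hypothetical helper, and it assumes each span is a direct child of the parent passed in.

```python
import xml.etree.ElementTree as ET

# The question's markup, with the closing </span> tags fixed.
HTML = """<body>
<a href="link1">link</a>
<span class="class_name">00A</span>
...
<a href="link2">link</a>
<span class="class_name">00B</span>
...
<a href="link3">link</a>
<span class="class_name">01B</span>
...
<a href="link4">link</a>
<span class="class_name">01A</span>
</body>"""


def preceding_links(parent, span):
    """hrefs that preceding-sibling::a[@href] selects from span, nearest first."""
    siblings = list(parent)
    before = siblings[: siblings.index(span)]
    # preceding-sibling is a reverse axis, so position [1] is the closest sibling.
    return [el.get("href") for el in reversed(before) if el.tag == "a" and "href" in el.attrib]


root = ET.fromstring(HTML)
for span in root.findall("span"):  # spans that are direct children of <body>
    links = preceding_links(root, span)
    # Without [1] you get the whole list; [1] keeps only links[0].
    print(span.text, "all:", links, "with [1]:", links[0])
```

For the 01A span this prints all four links, nearest first, while [1] keeps only link4.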

So to combine these ideas, here's my suggestion:

//span
   [
      contains(@class, 'class_name') and
      contains(., 'A') and
      contains('0123456789', substring(., 1, 1)) and 
      translate(substring-before(., 'A'), '0123456789', '') = '' and
      substring-after(., 'A') = ''
   ]
   //preceding-sibling::a[@href][1]

Applied to this input:

<body>
  
<a href="link1">link</a>
<span class="class_name">00A</span>
...
<a href="link2">link</a>
<span class="class_name">00B</span>
...
<a href="link3">link</a>
<span class="class_name">01B</span>
...
<a href="link4">link</a>
<span class="class_name">01A</span>

</body>

... it yields:

<a href="link1">link</a>
<a href="link4">link</a>
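To use the suggested expression from Python, one option is to build the string in code so the letter can vary. label_xpath is a hypothetical helper that just reassembles the expression above; it assumes the letter and class name contain no single quotes, since they are pasted into XPath string literals.

```python
DIGITS = "0123456789"


def label_xpath(letter, class_name="class_name"):
    """Rebuild the suggested XPath for span labels of the form <digits><letter>."""
    # Both values go inside XPath 1.0 string literals, so a single quote would break it.
    if "'" in letter or "'" in class_name:
        raise ValueError("letter and class_name must not contain single quotes")
    predicate = " and ".join([
        f"contains(@class, '{class_name}')",
        f"contains(., '{letter}')",
        f"contains('{DIGITS}', substring(., 1, 1))",
        f"translate(substring-before(., '{letter}'), '{DIGITS}', '') = ''",
        f"substring-after(., '{letter}') = ''",
    ])
    # '//' as in the suggestion above; '/' is enough if span and a are known to be siblings.
    return f"//span[{predicate}]//preceding-sibling::a[@href][1]"


print(label_xpath("A"))
```

The result can go straight into the call from the question, e.g. WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, label_xpath("A")))), followed by get_attribute("href") on each returned element.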