Tags: python, regex, selenium, xpath, splinter

Splinter find_by_xpath: using regex for element text()


I am running a browser test with splinter and have a page with a large table. I want to locate all <tr> elements that contain a <td> with some nicely-formatted date in their text, like the one highlighted here:

[Image: screenshot of the table with a date cell highlighted]

It's easy to find the rows with specific text, e.g., via:

browser.find_by_xpath('//tr[.//td[contains(text(), "September")]]')

So then I tried something like the suggestions here to find text() with the general date pattern (help with simplifying my regex is welcome, too):

exp = '[A-Z][a-z]+\\s[1-9]{1,2},\\s[0-9]{4}'
browser.find_by_xpath('//tr[.//td[matches(text(), "{0}")]]'.format(exp))

This doesn't work (and I did verify that the regex works in isolation). Nor does:

browser.find_by_xpath('//tr[.//td[matches(., "{0}")]]'.format(exp))

Provided my browser allows XPath 2.0, how can I find the elements correctly?


Solution

  • Neither the latest Firefox nor the latest Chrome supports XPath 2.0, and matches() is an XPath 2.0 function, so both expressions fail. Both browsers have open issues tracking this.

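    You have to approach it without using matches(). For instance, you can find all tr elements and filter them in Python, taking the EAFP ("easier to ask forgiveness than permission") approach: try to parse the cell text with datetime.strptime() and skip the row if parsing fails. Sample: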

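    from datetime import datetime

    matching_rows = []
    for tr in browser.find_by_tag("tr"):
        # ".sorted-on" is the class of the date cell here; adjust it to your markup
        sorted_on = tr.find_by_css(".sorted-on")

        try:
            # strptime raises ValueError unless the text looks like "September 5, 2016"
            datetime.strptime(sorted_on.text, "%B %d, %Y")
        except ValueError:
            continue
        # Parsing succeeded, so this row contains a formatted date
        matching_rows.append(tr)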
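
Since the question also asked about the regex: `[1-9]{1,2}` cannot match days that contain a zero (10, 20, 30), so it would miss "September 10, 2016". Below is a minimal sketch of both checks as plain functions that work on strings, so they can be tried without a browser. The names `DATE_RE`, `looks_like_date` and `is_real_date` are made up for this example, and the pattern assumes English month names and days written without a leading zero.

```python
import re
from datetime import datetime

# Assumed shape: capitalized word, day 1-31 (no leading zero), comma, 4-digit year.
DATE_RE = re.compile(r"[A-Z][a-z]+\s(?:[12][0-9]|3[01]|[1-9]),\s[0-9]{4}")


def looks_like_date(text):
    """Regex check: True if the text contains something shaped like a date."""
    return DATE_RE.search(text) is not None


def is_real_date(text, fmt="%B %d, %Y"):
    """EAFP check: True only if the whole (stripped) text parses as a date."""
    try:
        datetime.strptime(text.strip(), fmt)
    except ValueError:
        return False
    return True
```

The regex only checks the shape of the text, so it accepts strings such as "Octember 5, 2016" or "February 30, 2016", while the strptime() version rejects them. With splinter, either function could be applied to the text of each cell, e.g. `any(is_real_date(td.text) for td in tr.find_by_tag("td"))`, assuming each date sits alone in its own td.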