Tags: python, xml, selenium-webdriver, web-scraping, xpath

Looking for the closest element in XPath


I'm trying to scrape/select all the tables that have 'Consolidated Schedule of Investments' in their title, but the problem is that on each of these pages the title sits at a different position, inside a different HTML structure:

https://www.sec.gov/Archives/edgar/data/1287750/000128775023000021/arcc-20230331.htm
https://www.sec.gov/Archives/edgar/data/1633336/000095017023020540/ccap-20230331.htm
https://www.sec.gov/Archives/edgar/data/1534254/000153425423000008/cion-20230331.htm

This expression selects the title element, but the next step is to select the table closest to it, whether that table is a sibling, an ancestor or a descendant:

//span[contains(translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'CONSOLIDATED SCHEDULE OF INVESTMENTS')]
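The translate() call in that expression is the usual XPath 1.0 workaround for the missing upper-case() function: it maps each ASCII lowercase letter to its uppercase counterpart, so contains() becomes a case-insensitive test against the span's full string value. A minimal standard-library sketch of the same test, assuming the page has already been parsed as well-formed XML without namespaces (the helper name heading_spans is hypothetical):

```python
import xml.etree.ElementTree as ET

# Same mapping as translate(., 'abc...z', 'ABC...Z'): ASCII letters only,
# so non-ASCII characters are left unchanged, just as in the XPath version.
TO_UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def heading_spans(root, phrase="CONSOLIDATED SCHEDULE OF INVESTMENTS"):
    """Return every <span> whose upper-cased string value contains phrase.

    Assumes un-namespaced tags; namespaced XHTML would need '{ns}span'.
    """
    matches = []
    for span in root.iter("span"):
        # itertext() concatenates all descendant text, like XPath's string value of '.'
        text = "".join(span.itertext()).translate(TO_UPPER)
        if phrase in text:
            matches.append(span)
    return matches


doc = ET.fromstring(
    "<div><span>Consolidated Schedule of <b>Investments</b></span>"
    "<span>Balance Sheet</span></div>"
)
print(len(heading_spans(doc)))  # 1
```

Note that the match still succeeds when the title is split across child elements (the <b> above), because the whole string value is tested, not just the span's first text node.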


Solution

  • I think you want to combine the ancestor, descendant and following axes, e.g.

    //span[contains(translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'CONSOLIDATED SCHEDULE OF INVESTMENTS')]/(ancestor::table[1] | descendant::table[1] | following::table[1])[1]
    

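    The (ancestor::table[1] | descendant::table[1] | following::table[1])[1] part should take care of "whether it's a sibling, an ancestor or a descendant": ancestor::table[1] is the nearest enclosing table, descendant::table[1] is the first table inside the span, and following::table[1] is the first table after the span (which covers following siblings). A union is always returned in document order, so the final [1] prefers an enclosing table, then a nested one, then the next table in the document.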

    Note: using a parenthesized expression as a step after / is only supported in XPath 2.0 and later (it is not valid XPath 1.0), and browsers (and therefore Selenium) as well as lxml only implement XPath 1.0, so I am not quite sure you can use it. In the Python world there are at least two options for using the current version of XPath, 3.1: ElementPath https://pypi.org/project/elementpath/ and SaxonC-HE https://pypi.org/project/saxonche/.
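If you are limited to XPath 1.0, the same "first of ancestor, descendant, following" choice can also be made in plain Python. This is a swapped-in technique, not the XPath answer itself: a sketch using only the standard library, assuming the page parses as XML without namespaces; the function names closest_table and schedule_tables are hypothetical.

```python
import xml.etree.ElementTree as ET

TO_UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
PHRASE = "CONSOLIDATED SCHEDULE OF INVESTMENTS"


def closest_table(el, order, pos, parent):
    """Mimic (ancestor::table[1] | descendant::table[1] | following::table[1])[1]."""
    # ancestor::table[1]: the nearest enclosing table (first in document order)
    p = parent.get(el)
    while p is not None:
        if p.tag == "table":
            return p
        p = parent.get(p)
    # descendant::table[1]: the first table nested inside the heading itself
    for d in el.iter("table"):
        if d is not el:
            return d
    # following::table[1]: the first table starting after the heading's subtree;
    # root.iter() is a pre-order walk, so a subtree is a contiguous slice of `order`
    end = pos[el] + sum(1 for _ in el.iter())
    for other in order[end:]:
        if other.tag == "table":
            return other
    return None


def schedule_tables(root, phrase=PHRASE):
    """Return the distinct closest tables of all matching spans, in document order."""
    order = list(root.iter())                    # every element in document order
    pos = {el: i for i, el in enumerate(order)}  # element -> document position
    parent = {child: p for p in order for child in p}
    found = {}
    for span in root.iter("span"):
        if phrase in "".join(span.itertext()).translate(TO_UPPER):
            table = closest_table(span, order, pos, parent)
            if table is not None:
                found[pos[table]] = table        # keyed by position: deduplicates
    return [found[i] for i in sorted(found)]


SAMPLE = """<html><body>
<table id="t1"><tr><td><span>Consolidated Schedule of Investments</span></td></tr></table>
<div><span>Consolidated Schedule of Investments (continued)</span></div>
<table id="t2"><tr><td>...</td></tr></table>
<span>Balance Sheet</span>
<table id="t3"><tr><td>...</td></tr></table>
</body></html>"""

root = ET.fromstring(SAMPLE)
print([t.get("id") for t in schedule_tables(root)])  # ['t1', 't2']
```

Like the XPath version, a preceding sibling table is never chosen. If the pages are parsed as namespaced XHTML, ElementTree reports tags as '{http://www.w3.org/1999/xhtml}span' and so on, so the tag comparisons would have to include that namespace.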