
Find all tr in a table element with XPath?


from lxml import html

def parse_header(table):
    ths = table.xpath('//tr/th')
    if not ths:
        ths = table.xpath('//tr[1]/td')  # here is the problem: this finds tr[1]/td in the whole HTML file instead of just this table

    # ... other processing (omitted)

doc = html.fromstring(html_string)
table = doc.xpath("//div[@id='divGridData']/div[2]/table")[0]
parse_header(table)

I want to find all tr[1]/td in my table, but table.xpath("//tr[1]/td") still finds them in the whole HTML file. How can I search only within this element instead of the whole document?


EDIT:

from lxml import etree

content = '''

<root>
    <table id="table-one">
        <tr>
            <td>content from table 1</td>
        </tr>
        <table>
            <tr>
                <!-- this is content I do not want to get -->
                <td>content from embedded table</td>
            </tr>
        </table>
    </table>
</root>'''

root = etree.fromstring(content)
table_one = root.xpath('table[@id="table-one"]')[0]  # xpath() returns a list
all_td_elements = table_one.xpath('//td')  # this gives me too much!

Now I do not want the content of the embedded table. How can I exclude it?
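A minimal sketch for the nested-table case, assuming the document is parsed as XML with etree.fromstring (so the parser does not add tbody elements) and using the edit's XML with the row closing tags corrected to </tr>. The path ./tr/td follows only direct children (table, then tr, then td), so the inner table is never entered. The list comprehension is an alternative that keeps a td only if its nearest table ancestor is table_one; the names direct_tds and own_tds are illustrative, not from the question.

```python
from lxml import etree

content = '''<root>
    <table id="table-one">
        <tr>
            <td>content from table 1</td>
        </tr>
        <table>
            <tr>
                <!-- this is content I do not want to get -->
                <td>content from embedded table</td>
            </tr>
        </table>
    </table>
</root>'''

root = etree.fromstring(content)
table_one = root.xpath('table[@id="table-one"]')[0]

# './tr/td' walks only direct children: table -> tr -> td.
# The nested <table> is a child of table_one, not a <tr>, so it is skipped.
direct_tds = table_one.xpath('./tr/td')

# Alternative: look at every <td> below table_one (including the nested
# table's) and keep it only if its closest <table> ancestor is table_one.
own_tds = [td for td in table_one.iter('td')
           if next(td.iterancestors('table')) is table_one]

print([td.text for td in direct_tds])  # prints ['content from table 1']
print([td.text for td in own_tds])     # prints ['content from table 1']
```

The child-only path is the simplest choice when the markup is known; the ancestor check is more forgiving when rows may be wrapped in tbody or thead, at the cost of visiting every td below the table.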


Solution

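  • A path that starts with // is absolute: it searches from the document root even when you call .xpath() on an element. To find only the elements below your context node, prefix the path with a period (.), which stands for the context node itself. So, I think the XPath you are looking for is: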

    .//tr[1]/td
    

    This selects td elements that are descendants of the current table, rather than td elements anywhere in the HTML file.

    As an example:

    from lxml import etree
    
    content = '''
    
    <root>
        <table id="table-one">
            <tr>
                <td>content from table 1</td>
            </tr>
        </table>
        <table id="table-two">
            <tr>
                <td>content from table 2</td>
            </tr>
        </table>
    </root>'''
    
    root = etree.fromstring(content)
    # xpath() always returns a list, so take the first (only) match
    table_one = root.xpath('table[@id="table-one"]')[0]
    
    # this will select all td elements in the entire XML document (so two elements)
    all_td_elements = table_one.xpath('//td')
    
    # this will select just the single td inside table_one, because of the period
    just_sub_td_elements = table_one.xpath('.//td')
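Applied to the question's parse_header, a sketch might look like the following. The sample html_string and the return of the header texts are assumptions for illustration, since the rest of the original function is not shown. Besides the leading period, it uses (.//tr)[1] rather than .//tr[1]: in XPath, //tr[1] means "every tr that is the first tr child of its parent", so a table containing nested tables (or separate thead and tbody sections) can yield several "first rows", whereas the parenthesized form picks the single first tr in document order.

```python
from lxml import html

# Hypothetical sample page: one unrelated table, then the grid table.
html_string = '''
<html><body>
  <table><tr><td>outside</td></tr></table>
  <div id="divGridData">
    <div>first</div>
    <div>
      <table>
        <tr><td>Name</td><td>Price</td></tr>
        <tr><td>apple</td><td>1</td></tr>
      </table>
    </div>
  </div>
</body></html>'''

def parse_header(table):
    # The leading '.' anchors both searches at `table`, not the document root.
    ths = table.xpath('.//tr/th')
    if not ths:
        # '(.//tr)[1]' is the first <tr> inside `table` in document order;
        # './/tr[1]' would match the first <tr> child of *every* parent.
        ths = table.xpath('(.//tr)[1]/td')
    # Returning the cell texts is an assumption made for this sketch.
    return [cell.text_content() for cell in ths]

doc = html.fromstring(html_string)
table = doc.xpath("//div[@id='divGridData']/div[2]/table")[0]
print(parse_header(table))  # prints ['Name', 'Price']
```

With this sample, the absolute '//tr[1]/td' from the question would also return the "outside" cell from the unrelated table, while the relative version stays inside the grid table.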