python beautifulsoup html-parsing

BeautifulSoup. Wrong element index


I've been parsing an HTML ol element and ran into a problem with the indexing of its elements.

Let's assume we have the following element:

html_document = """
<ol>
    <li>Test lists</li>
    <li>Second option</li>
    <li>Third option</li>
</ol>
"""

So, let's parse it:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_document, 'html.parser')
all_li = tuple(soup.find_all('li'))
result = [el.parent.index(el) for el in all_li]
print(result)  # [1, 3, 5]

Why 1, 3, 5? Have I missed something?


Solution

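  • el.parent.index(el) returns the position of el among all children of its parent, and that includes the whitespace text nodes between the li tags, hence 1, 3, 5. To get the position among the li elements only, take the index in the sequence of tags returned by find_all instead: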

    html_document = """
    <ol>
        <li>Test lists</li>
        <li>Second option</li>
        <li>Third option</li>
    </ol>
    """
    
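    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_document, 'lxml')  # requires the lxml package
    all_li = tuple(soup.find_all('li'))
    # Index within the tuple of li tags, not within the parent's children
    result = [all_li.index(el) for el in all_li]
    print(result)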
    

    Output:

    [0, 1, 2]
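
To see where 1, 3, 5 comes from, it may help to look at the parent's children directly. The sketch below is only an illustration: it assumes the built-in html.parser, and tag_position is a hypothetical helper name, not part of BeautifulSoup. It prints the node types inside the ol and then computes each li's position among its sibling tags only.

```python
from bs4 import BeautifulSoup, Tag

html_document = """
<ol>
    <li>Test lists</li>
    <li>Second option</li>
    <li>Third option</li>
</ol>
"""

soup = BeautifulSoup(html_document, "html.parser")
ol = soup.find("ol")

# .contents holds every child node, including the whitespace strings
# between the <li> tags, so the tags sit at positions 1, 3 and 5.
print([type(child).__name__ for child in ol.contents])


def tag_position(el):
    """Hypothetical helper: position of el among its sibling tags only."""
    siblings = [child for child in el.parent.contents if isinstance(child, Tag)]
    # Compare by identity so two identical-looking tags are not confused.
    return next(i for i, sib in enumerate(siblings) if sib is el)


positions = [tag_position(li) for li in ol.find_all("li", recursive=False)]
print(positions)  # [0, 1, 2]
```

Unlike all_li.index(el), which compares tags by equality and returns the first match, this compares by identity, so it should still give distinct positions when two li elements have identical content.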