BeautifulSoup: the data I need sits at a different index in some search results

I am having an issue with a website's format causing certain information within a container to have different index numbers from one search result to the next.

I am scraping pieces of data from search results. In a few cases the index of the element I need is different.

Basically, the exact text I need scraped from the html below is "7XB21".

<dl class="last">
    <dt>Part Code:</dt>
    <dd>
        7XB21
        <span class="separator">,</span>
    </dd>

This is easy to do with the Python code below, which gets me the result I need, "7XB21":

modelcode_container = container.find_all("dd")
modelcode = modelcode_container[5].text

HOWEVER! Some of the HTML code scraped, while being structured the same, lacks some information which the above example shows. Here is an example of the troublesome HTML:

<dl class="last">
    <dt>Stock id:</dt>
        <span class="separator">,</span>
    <dt>Part Code:</dt>
        <span class="separator">,</span>

Do you see the difference? I would need to specify a different index number to capture the proper data which is "8B727" in this case.

I am not sure how to set that up. Any help would be appreciated. Thank you!


  • If you are certain that <dt>Part Code:</dt> always comes directly before the <dd> you need, you can use find_next_sibling() to get the dd tag next to it, instead of relying on a fixed index.

    soup.find('dt',text="Part Code:").find_next_sibling('dd')
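A slightly fuller sketch of the same idea, in case it helps. The sample markup is an assumption modelled on the question (the Stock id value "12345" is invented), and get_field is a hypothetical helper name, not part of BeautifulSoup. It looks up the dt by its label, takes the following dd, and removes the separator span so the trailing comma does not end up in the result:

```python
from bs4 import BeautifulSoup

# Assumed markup modelled on the question; the Stock id value is made up.
SAMPLE_HTML = """
<dl class="last">
    <dt>Stock id:</dt>
    <dd>12345<span class="separator">,</span></dd>
    <dt>Part Code:</dt>
    <dd>8B727<span class="separator">,</span></dd>
</dl>
"""


def get_field(container, label):
    """Return the text of the <dd> following the <dt> whose text is `label`.

    Returns None if the label or its <dd> is missing.
    """
    dt = container.find("dt", string=label)
    if dt is None:
        return None
    dd = dt.find_next_sibling("dd")
    if dd is None:
        return None
    # Remove the separator span so its comma is not part of the text.
    for sep in dd.find_all("span", class_="separator"):
        sep.decompose()
    return dd.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(get_field(soup, "Part Code:"))
```

Because the lookup is keyed on the label text rather than a position, it should not matter how many other dt/dd pairs appear before "Part Code:" in a given result. Note that decompose() modifies the parsed tree, so call it on a copy if you need the separators later.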