I am scraping pieces of data from a website's search results, and the site's markup is inconsistent: the same piece of information inside a container ends up at a different index from one search result to the next.
In a few cases the position (index number) of the element I need is different.
Basically, the exact text I need scraped from the html below is "7XB21".
<dl class="last">
::before
<dt>Part Code:</dt>
<dd>
"7XB21"
<span class="separator">,</span>
</dd>
<dt>Weight:</dt>
<dd>97</dd>
</dl>
This is easy to do with the Python code below, which gets me the result I need, "7XB21":
modelcode_container = container.find_all("dd")
modelcode = modelcode_container[5].text
HOWEVER! Some of the scraped HTML, while structured the same way, does not contain the same set of fields as the example above. Here is an example of the troublesome HTML:
<dl class="last">
<dt>Stock id:</dt>
<dd>c12
<span class="separator">,</span>
</dd>
<dt>Part Code:</dt>
<dd>
"8B727"
<span class="separator">,</span>
</dd>
<dt>Weight:</dt>
<dd>102</dd>
</dl>
Do you see the difference? I would need to specify a different index number to capture the proper data which is "8B727" in this case.
I am not sure how to go about setting that up. Any help would be appreciated. Thank you!
If you are certain that <dt>Part Code:</dt> is always present, you can locate that dt by its text instead of by index,
and then use find_next_sibling() to get the dd tag that follows it.
# string= is the current name for the older text= argument in Beautiful Soup
soup.find('dt', string="Part Code:").find_next_sibling('dd')
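For a fuller picture, here is a minimal sketch of that approach run against the troublesome HTML from the question. The helper name part_code is made up for illustration, and the sketch assumes the quotes around the code in the question come from the browser's inspector rather than from the page itself (it strips them either way). It also takes only the first text node of the dd, since .text on the whole tag would include the comma from the separator span.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (the variant with the extra "Stock id" field).
html = """
<dl class="last">
<dt>Stock id:</dt>
<dd>c12
<span class="separator">,</span>
</dd>
<dt>Part Code:</dt>
<dd>
8B727
<span class="separator">,</span>
</dd>
<dt>Weight:</dt>
<dd>102</dd>
</dl>
"""


def part_code(container):
    """Hypothetical helper: return the part code text, or None if it is missing."""
    # Find the label by its text rather than by position.
    label = container.find("dt", string="Part Code:")
    if label is None:
        return None
    # The value lives in the dd that follows the label.
    dd = label.find_next_sibling("dd")
    if dd is None:
        return None
    # Take the first non-empty text node so the separator comma is skipped.
    first = next(dd.stripped_strings, None)
    if first is None:
        return None
    # Assumption: surrounding double quotes, if present, are not part of the code.
    return first.strip('"')


soup = BeautifulSoup(html, "html.parser")
print(part_code(soup))
```

Because the lookup is keyed on the label text, the same function should work whether or not the "Stock id" entry is present, and it returns None instead of raising when a result has no part code at all.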