I would like to scrape data for one particular result from a page that contains multiple results.
For example, below is an excerpt of the source HTML of a page that has two results for a business search in a business directory. Both have business items such as Status. However, I only want the business items associated with the street address 311 South Swall Drive.
</section><section itemscope itemtype="http://schema.org/Organization" class="org">
<div class="b-business-item">
<div class='b-business-item_header-wrap '>
<div class='b-business-item_title-wrap'>
<h2 class="b-business-item_header uppercase"><a itemprop="url" href="/p/kash+apparel+lp-12645872"><font itemprop="name">Kash Apparel, Lp</font></a></h2>
<p class="b-business-item_sub-header"><span class="addr-cont" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress"><span itemprop="streetAddress">2615 Fruitland Ave</span>, <span><span itemprop="addressLocality">Los Angeles</span>, <span itemprop="addressRegion">CA</span> <span itemprop="postalCode">90058</span></span></span></p>
</div>
</div>
<p class="b-business-item_props"><span class="b-business-item_title">Status:</span><span class="b-business-item_value">Inactive</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Industry:</span><span class="b-business-item_value">Mfg Women's/Misses' Outerwear</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Members (3):</span><span class="b-business-item_value">Mel Salde <span class='gray-text'>(Accountant, inactive)</span><br/>Edir Haroni <span class='gray-text'>(Limited Partner, inactive)</span><br/>Stephanie Kleinjan <span class='gray-text'>(General Partner, inactive)</span></span></p>
</div>
</section><section itemscope itemtype="http://schema.org/Organization" class="org">
<div class="b-business-item">
<div class='b-business-item_header-wrap '>
<div class='b-business-item_title-wrap'>
<h2 class="b-business-item_header uppercase"><a itemprop="url" href="/p/kash+inc-178509132"><font itemprop="name">KASH INC</font></a></h2>
<p class="b-business-item_sub-header"><span class="addr-cont" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress"><span itemprop="streetAddress">311 South Swall Drive</span>, <span><span itemprop="addressLocality">Los Angeles</span>, <span itemprop="addressRegion">CA</span> <span itemprop="postalCode">90048</span></span></span></p>
</div>
</div>
<p class="b-business-item_props"><span class="b-business-item_title">Status:</span><span class="b-business-item_value">Inactive</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Registration:</span><span class="b-business-item_value">Sep 26, 2006</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">State ID:</span><span class="b-business-item_value">C2904860</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Business type:</span><span class="b-business-item_value">Articles of Incorporation</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Member:</span><span class="b-business-item_value">Ashwant Venkatram <span class='gray-text'>(President, inactive)</span></span></p>
I am trying to scrape Status, Registration, State ID, Business type, and Member for 311 South Swall Drive, and not similar fields for the other result. Unfortunately the business directory doesn't have any way to enter the address to narrow the search to one result.
I think this is what you're looking for:
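from bs4 import BeautifulSoup

# html is assumed to hold the page source shown in the question
soup = BeautifulSoup(html, 'html.parser')
for sect in soup.find_all('section'):
    for adrs in sect.select('span[itemprop="streetAddress"]'):
        if adrs.text == '311 South Swall Drive':
            # print every <p> in the matching section only
            for item in sect.select('p'):
                print(item.text)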
Output:
311 South Swall Drive, Los Angeles, CA 90048
Status:Inactive
Registration:Sep 26, 2006
State ID:C2904860
Business type:Articles of Incorporation
Member:Ashwant Venkatram (President, inactive)
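If you would rather have the fields as data than as printed lines, the same idea can be sketched as a function that returns a dict. This is only a rough sketch: `fields_for_address` and `SAMPLE_HTML` are names I made up for illustration, the sample is a trimmed-down version of the markup in the question, and it assumes each result sits in its own `section` with the title/value spans using the class names shown there.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample modelled on the markup in the question (hypothetical test data).
SAMPLE_HTML = """
<section class="org">
<p class="b-business-item_sub-header"><span itemprop="streetAddress">2615 Fruitland Ave</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Status:</span><span class="b-business-item_value">Inactive</span></p>
</section>
<section class="org">
<p class="b-business-item_sub-header"><span itemprop="streetAddress">311 South Swall Drive</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Status:</span><span class="b-business-item_value">Inactive</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">State ID:</span><span class="b-business-item_value">C2904860</span></p>
<p class="b-business-item_props"><span class="b-business-item_title">Member:</span><span class="b-business-item_value">Ashwant Venkatram <span class='gray-text'>(President, inactive)</span></span></p>
</section>
"""


def fields_for_address(html, street_address):
    """Return {title: value} for the first result whose street address matches."""
    soup = BeautifulSoup(html, "html.parser")
    for section in soup.find_all("section"):
        street = section.select_one('span[itemprop="streetAddress"]')
        if street is None or street.get_text(strip=True) != street_address:
            continue  # not the result we want
        fields = {}
        # Only the property rows, so the address line itself is skipped.
        for row in section.select("p.b-business-item_props"):
            title = row.select_one("span.b-business-item_title")
            value = row.select_one("span.b-business-item_value")
            if title is not None and value is not None:
                key = title.get_text(strip=True).rstrip(":")
                fields[key] = value.get_text(" ", strip=True)
        return fields
    return {}  # no result with that address


if __name__ == "__main__":
    for key, value in fields_for_address(SAMPLE_HTML, "311 South Swall Drive").items():
        print(key, "->", value)
```

Selecting `p.b-business-item_props` instead of every `p` leaves out the address line, and stripping the trailing colon from each title makes the keys easier to look up, e.g. `fields["State ID"]`.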