I am trying to scrape a that contains the following HTML.
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
As you can see, "FeedCard " is something they have in common. Therefore, I am trying to use a regular expression in conjunction with BeautifulSoup. Here is the code I've tried.
pattern = r"\AFeedCard"
for card in soup.find('div', 'class'==re.compile(pattern)):
print(card)
print('**********')
I'm expecting it to give me each on of the divs from above, with the asterisks separating them. Instead it is giving me the entire HTML of the page in a single instance
Thank you,
No need to use regular expression here. Just use CSS selector or BS4 Api:
from bs4 import BeautifulSoup
html = """\
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
Item 1
</div>
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
Item 2
</div>
"""
soup = BeautifulSoup(html, "html.parser")
for card in soup.select(".FeedCard"):
print(card.text.strip())
Prints:
Item 1
Item 2