How do I search for an attribute using BeautifulSoup?


I am trying to scrape a page that contains the following HTML.


<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
      
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
      

As you can see, both divs have the class "FeedCard" in common, so I am trying to use a regular expression together with BeautifulSoup. Here is the code I've tried:

pattern = r"\AFeedCard"

for card in soup.find('div', 'class'==re.compile(pattern)):
    print(card)
    print('**********')

I'm expecting it to give me each one of the divs above, with the asterisks separating them. Instead, it gives me the entire HTML of the page in a single instance.

Thank you,
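For reference, the loop in the question misbehaves for two reasons. `'class'==re.compile(pattern)` is an ordinary Python comparison between a string and a compiled pattern, so it evaluates to `False` before `find()` is called and the regex never reaches BeautifulSoup. Also, `find()` returns only the first matching tag, and iterating over a single `Tag` walks its children rather than a list of matches. A minimal corrected sketch of the regex approach passes the pattern as the `class_` keyword and uses `find_all()`. It assumes a sample `html` string modelled on the snippet above; the `Item 1`/`Item 2` text is placeholder content.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modelled on the question; the inner text is placeholder content.
html = """
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image">Item 1</div>
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image">Item 2</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class is a multi-valued attribute, so BeautifulSoup tests the regex against
# each individual class name; \A anchors the match at the start of the string.
pattern = re.compile(r"\AFeedCard")

# find_all() returns a list of every matching tag (find() returns only the first).
cards = soup.find_all("div", class_=pattern)

for card in cards:
    print(card.get_text(strip=True))
    print("**********")
```

Each card's text is printed on its own line, followed by the asterisk separator the question was aiming for.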


Solution

  • There's no need for a regular expression here. Just use a CSS selector or the BS4 API:

    from bs4 import BeautifulSoup
    
    
    html = """\
    <div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
    Item 1
    </div>
    
    <div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image" data-tb-region-item="true">
    Item 2
    </div>
    """
    
    soup = BeautifulSoup(html, "html.parser")
    
    for card in soup.select(".FeedCard"):
        print(card.text.strip())
    

    Prints:

    Item 1
    Item 2
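The BS4 API route mentioned above can be sketched with `find_all()` and the `class_` keyword. A plain string passed as `class_` matches if it equals any one of the tag's classes, so no regex is needed. Because the title asks about attributes in general, the sketch also matches on the shared `data-key` attribute, both through the `attrs` dictionary and through a CSS attribute selector. It assumes a shortened version of the sample markup above.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample markup (placeholder inner text).
html = """
<div class="FeedCard urn:publicid:ap.org:db2b278b7e4f9fea9a2df48b8508ed14 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image">Item 1</div>
<div class="FeedCard urn:publicid:ap.org:2f23aa3df0f2f6916ad458785dd52c59 Component-wireStory-0-2-116 card-0-2-117" data-key="feed-card-wire-story-with-image">Item 2</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A string passed as class_ matches if it equals any one of the tag's classes.
by_class = soup.find_all("div", class_="FeedCard")

# Attribute names that aren't valid Python identifiers (like data-key)
# go in the attrs dictionary instead of being passed as keyword arguments.
by_data_key = soup.find_all("div", attrs={"data-key": "feed-card-wire-story-with-image"})

# The same attribute match written as a CSS attribute selector.
by_selector = soup.select('div[data-key="feed-card-wire-story-with-image"]')

for card in by_class:
    print(card.get_text(strip=True))
```

All three lookups return the same two divs here; which one to use is mostly a matter of taste, though the `data-key` match may be more robust if the site's generated class names change.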