python-3.x, beautifulsoup, python-requests, jpeg, gif

BeautifulSoup: filter "find_all" results to .jpeg files via regex


I would like to download some pictures from a forum. The find_all results give me mostly what I want, which are JPEG files. However, they also include a few GIF files that I do not want. Another problem is that the GIF file is an attachment, not a valid link, so it causes trouble when I save the files.

# soup is the BeautifulSoup object parsed from the forum page
soup_imgs = soup.find(name='div', attrs={'class': 't_msgfont'}).find_all('img', alt="")
for i in soup_imgs:
    src = i['src']
    print(src)

I tried to exclude the GIF files in my find_all search, but it didn't work, because the JPEG and GIF files are in the same section. How should I filter my results? Please give me some help, chief. I am pretty much an amateur at coding; playing with Python is just a hobby of mine.


Solution

  • You can filter the results with a regular expression on the src attribute. Please refer to the following example. Hope this helps.

    import re
    from bs4 import BeautifulSoup
    
    data = '''<html>
    <body>
    
    <h2>List of images</h2>
    
    <div class="t_msgfont">
    <img src="img_chania.jpeg" alt="" width="460" height="345">
    <img src="wrongname.gif" alt="">
    <img src="img_girl.jpeg" alt="" width="500" height="600">
    </div>
    </body>
    </html>'''
    
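    soup = BeautifulSoup(data, "html.parser")
    # Keep only <img> tags whose src ends in .jpg or .jpeg (case-insensitive).
    # The escaped "\." matches a literal dot; "$" anchors at the end of the string.
    jpeg_pattern = re.compile(r"\.jpe?g$", re.IGNORECASE)
    soup_imgs = soup.find('div', attrs={'class': 't_msgfont'}).find_all('img', alt="", src=jpeg_pattern)
    for i in soup_imgs:
        src = i['src']
        print(src)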
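A regex on the raw src string can trip over URLs that carry a query string (for example photo.JPG?size=large). Below is a sketch of two alternatives, assuming markup similar to the sample above: a CSS attribute selector via select(), and a function passed as the src filter that checks only the URL path. The extra sample URLs, the attachment link and the page URL https://forum.example.com/thread/42 are hypothetical; urljoin turns relative src values into absolute URLs you could then download.

```python
from urllib.parse import urljoin, urlsplit

from bs4 import BeautifulSoup

# Sample markup modelled on the question; the query-string image and the
# attachment link are hypothetical, added only to show the edge cases.
data = '''
<div class="t_msgfont">
<img src="img_chania.jpeg" alt="">
<img src="wrongname.gif" alt="">
<img src="photos/img_girl.JPG?size=large" alt="">
<img src="attachment.php?aid=123" alt="">
</div>
'''

# Hypothetical page URL, used only to resolve relative src values.
PAGE_URL = "https://forum.example.com/thread/42"


def is_jpeg(src):
    """Return True if the URL path (ignoring any query string) ends in .jpg/.jpeg."""
    if src is None:  # BeautifulSoup passes None for <img> tags without a src
        return False
    path = urlsplit(src).path
    return path.lower().endswith((".jpg", ".jpeg"))


soup = BeautifulSoup(data, "html.parser")
post = soup.find("div", class_="t_msgfont")

# Option 1: CSS attribute selector; [src$=".jpeg"] means "src ends with .jpeg".
# It compares the raw string and is case-sensitive.
css_hits = [img["src"] for img in post.select('img[src$=".jpeg"]')]

# Option 2: pass a function as the src filter; BeautifulSoup calls it with
# each tag's src value and keeps the tags for which it returns True.
func_hits = [img["src"] for img in post.find_all("img", src=is_jpeg)]

# Resolve relative src values against the page URL before downloading them.
jpeg_urls = [urljoin(PAGE_URL, src) for src in func_hits]

print(css_hits)
print(func_hits)
print(jpeg_urls)
```

The CSS selector is the shortest option but only finds img_chania.jpeg here, because it matches the raw src case-sensitively. The function filter also keeps photos/img_girl.JPG?size=large while still skipping the GIF and the attachment link.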