python beautifulsoup html-parsing

How to match elements containing string from BeautifulSoup list?


I have the input.html below:

Input.html https://jsfiddle.net/f86q7ubm/

I'm trying to match all elements in the list allList that have size=5, but when I run the following code, matching comes back empty.

from bs4 import BeautifulSoup

fp = open("file.html", "rb")                 
soup = BeautifulSoup(fp,"html5lib")

allList = soup.find_all(True)

matching = [s for s in allList if 'size="5"' in s]  

What am I doing wrong?


Solution

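  • There may be (and probably is) a better way to do this, but you can just use str(s). You were testing for a substring in a non-string object: each s is a bs4 Tag, and the in operator on a Tag checks its child nodes, not its markup: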

    from bs4 import BeautifulSoup
    
    fp = open("file.html", "rb")                 
    soup = BeautifulSoup(fp,"html5lib")
    
    allList = soup.find_all(True)
    
    matching = [s for s in allList if 'size="5"' in str(s)] 
    

    Not sure if this is what you want, but a better way could be:

    allList = soup.find_all("font", {"size": "5"}) # you already have the matching elements here
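To make the difference between the two approaches concrete, here is a minimal sketch. The inline HTML is a made-up stand-in (the contents of the original input.html are not shown here), and it uses the built-in "html.parser" instead of "html5lib" only so that it runs without an extra install. It also searches by attribute alone, which is one way to match any tag with size=5 rather than only font tags.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for input.html; the real file is not reproduced here.
html = """
<html><body>
<font size="5">Title</font>
<p><font size="3">small</font></p>
<span size="5">also size 5</span>
</body></html>
"""

# Assumption: "html.parser" is used so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

first = soup.find("font")
# `in` on a Tag looks at its child nodes, so the attribute text is never found.
in_tag = 'size="5"' in first
# Converting to str gives the markup, so the substring test can succeed.
in_markup = 'size="5"' in str(first)

# Substring matching on markup also matches every ancestor of a matching tag.
by_substring = [t for t in soup.find_all(True) if 'size="5"' in str(t)]

# Attribute filter: any tag whose size attribute is exactly "5".
by_attr = soup.find_all(attrs={"size": "5"})

print(in_tag, in_markup)
print([t.name for t in by_substring])
print([t.name for t in by_attr])
```

Note that the str(s) version also returns html and body here, because their markup contains the matching children. Filtering on the attribute avoids that.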