Parse HTML to get specific tags in Python


I'm trying to parse an HTML source with Python, using BeautifulSoup. I need to get all td tags whose ids have the form nameX, where X is a number starting from 1: name1, name2, and so on, for as many as there are.

How can I achieve this? My simple code, which tries a wildcard pattern, doesn't work.

soup = BeautifulSoup(response.text,"lxml")
resp=soup.find_all("td",{"id":'name*'})

Error:

IndexError: list index out of range

Solution

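  find_all treats the string 'name*' as a literal id to match exactly, not as a pattern, so it returns an empty list, and indexing into that list raises the IndexError. Two ways to match on the prefix instead:

  • Use a lambda with startswith. The "x and" guard skips td tags that have no id, for which x is None:

    soup.find_all('td', id=lambda x: x and x.startswith('name'))

  • Or use a compiled regex (this needs import re):

    import re
    soup.find_all('td', id=re.compile('^name'))

  Both match any id that begins with "name". To accept only digits after the prefix, a stricter pattern such as r'^name\d+$' can be used.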
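
To compare the filters side by side, here is a minimal self-contained sketch. The sample markup, the variable names, and the choice of the built-in "html.parser" (instead of "lxml", to avoid an extra dependency) are assumptions for illustration and are not from the question. The stricter pattern is one possible reading of "X starts from 1".

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for response.text
html = """
<table><tr>
  <td id="name1">Alice</td>
  <td id="name2">Bob</td>
  <td id="names">not numbered</td>
  <td id="other1">skip</td>
  <td>no id</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# The original attempt: 'name*' is compared as a literal string
literal = soup.find_all("td", {"id": "name*"})

# Prefix match; "x and" guards against tags without an id (x is None)
prefix = soup.find_all("td", id=lambda x: x and x.startswith("name"))

# Stricter match: "name" followed by a positive integer with no leading zero
numbered = soup.find_all("td", id=re.compile(r"^name[1-9]\d*$"))

literal_ids = [td["id"] for td in literal]
prefix_ids = [td["id"] for td in prefix]
numbered_ids = [td["id"] for td in numbered]

print(literal_ids)
print(prefix_ids)
print(numbered_ids)
```

With this sample, the literal filter finds nothing, the prefix filter also picks up the id "names", and the stricter regex keeps only name1 and name2. Checking that the result list is non-empty before indexing into it avoids the IndexError from the question.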