
AttributeError: 'ResultSet' object has no attribute 'find_all'


Where is the error? I want to extract the article text without the tags.

from bs4 import BeautifulSoup
import re
import urllib.request
f = urllib.request.urlopen("http://www.championat.com/football/news-2442480-orlov-zenit-obespokoen---pole-na-novom-stadione-mozhet-byt-nekachestvennym.html")
soup = BeautifulSoup(f, 'html.parser')
soup = soup.find_all('div', class_="text-decor article__contain")
invalid_tags = ['b', 'i', 'u', 'br', 'a']
for tag in invalid_tags:
    for match in soup.find_all(tag):
        match.replaceWithChildren()
soup = ''.join(map(str, soup.contents))
print(soup)

Error:

Traceback (most recent call last):
  File "1.py", line 9, in <module>
    for match in soup.find_all(tag):
AttributeError: 'ResultSet' object has no attribute 'find_all'

Solution

  • soup=soup.find_all('div', class_="text-decor article__contain")

    On this line, soup becomes a ResultSet instance, which is basically a list of Tag instances. You get the 'ResultSet' object has no attribute 'find_all' error because a ResultSet does not have a find_all() method. This problem is described in the troubleshooting section of the docs:

    AttributeError: 'ResultSet' object has no attribute 'foo' - This usually happens because you expected find_all() to return a single tag or string. But find_all() returns a list of tags and strings–a ResultSet object. You need to iterate over the list and look at the .foo of each one. Or, if you really only want one result, you need to use find() instead of find_all().

    And you do want a single result here, since there is only one article on the page:

    soup = soup.find('div', class_="text-decor article__contain")
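
The difference between the two calls can be seen in a minimal sketch. The HTML string below is a made-up stand-in for the real page, and the variable names (results, first, bold_per_div, bold_in_first) are only illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real article page.
html = """
<div class="text-decor article__contain"><p>First <b>bold</b> paragraph.</p></div>
<div class="sidebar"><p>Unrelated</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list subclass), so each Tag in it
# has to be searched individually.
results = soup.find_all("div", class_="text-decor article__contain")
bold_per_div = [[b.get_text() for b in div.find_all("b")] for div in results]

# find() returns a single Tag (or None), which can be searched directly.
first = soup.find("div", class_="text-decor article__contain")
bold_in_first = [b.get_text() for b in first.find_all("b")]

print(type(results).__name__, len(results))
print(bold_per_div)
print(bold_in_first)
```

If find() may come back empty on a given page, it is worth checking the result for None before calling find_all() on it.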
    

    Note, though, that there is no need to search for the tags one by one. You can pass a list of tag names directly to find_all(); BeautifulSoup is quite flexible in locating elements:

    article = soup.find('div', class_="text-decor article__contain")
    
    invalid_tags = ['b', 'i', 'u', 'br', 'a']
    for match in article.find_all(invalid_tags):
        match.unwrap()  # unwrap() is the bs4 name for replaceWithChildren()
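
Putting it together, here is a rough end-to-end sketch that runs without a network request. The inline HTML is an invented sample, not the real page, and the names inner_html and plain_text are only illustrative. It shows both the original approach of joining the remaining contents and get_text(), which may be closer to "text without tags":

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page would come from urllib.request.urlopen().
html = (
    '<div class="text-decor article__contain">'
    '<p>Zenit is <b>worried</b> about the <a href="#">pitch</a>.<br/>'
    'More <i>text</i>.</p>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
article = soup.find("div", class_="text-decor article__contain")

invalid_tags = ["b", "i", "u", "br", "a"]
# find_all() accepts a list of tag names; unwrap() removes each tag
# but keeps its children in place.
for match in article.find_all(invalid_tags):
    match.unwrap()

# As in the question: the remaining markup inside the article div.
inner_html = "".join(map(str, article.contents))

# If only the text is needed, get_text() drops every remaining tag too.
plain_text = article.get_text()

print(inner_html)
print(plain_text)
```

Tags that are not in invalid_tags (here the p element) survive in inner_html, so get_text() is the simpler choice when no markup should remain at all.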