Where is the error? I want to extract the article text without the tags.
from bs4 import BeautifulSoup
import re
import urllib.request
f = urllib.request.urlopen("http://www.championat.com/football/news-2442480-orlov-zenit-obespokoen---pole-na-novom-stadione-mozhet-byt-nekachestvennym.html")
soup = BeautifulSoup(f, 'html.parser')
soup=soup.find_all('div', class_="text-decor article__contain")
invalid_tags = ['b', 'i', 'u', 'br', 'a']
for tag in invalid_tags:
    for match in soup.find_all(tag):
        match.replaceWithChildren()
soup = ''.join(map(str, soup.contents))
print(soup)
Error:
Traceback (most recent call last):
  File "1.py", line 9, in <module>
    for match in soup.find_all(tag):
AttributeError: 'ResultSet' object has no attribute 'find_all'
soup=soup.find_all('div', class_="text-decor article__contain")
On this line soup becomes a ResultSet instance, which is basically a list of Tag instances. You are getting the 'ResultSet' object has no attribute 'find_all' error because a ResultSet instance does not have a find_all() method. FYI, this problem is described in the troubleshooting section of the docs:
AttributeError: 'ResultSet' object has no attribute 'foo' - This usually happens because you expected find_all() to return a single tag or string. But find_all() returns a list of tags and strings–a ResultSet object. You need to iterate over the list and look at the .foo of each one. Or, if you really only want one result, you need to use find() instead of find_all().
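Both options from that quote can be sketched on a small inline document. The HTML and the "item" class below are made up for illustration and are not taken from the page in the question:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not the real page markup.
html = """
<div class="item">first <b>bold</b></div>
<div class="item">second <i>italic</i></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list subclass), so iterate over it
# and call find_all() on each Tag rather than on the ResultSet itself.
items = soup.find_all("div", class_="item")
bold_counts = [len(div.find_all("b")) for div in items]

# find() returns the first matching Tag (or None if nothing matches),
# and a Tag does have find_all().
first = soup.find("div", class_="item")
first_bold = first.find_all("b")
```

Calling items.find_all("b") here would raise the same AttributeError as in the question.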
And you really want one result, since there is a single article on the page:
soup = soup.find('div', class_="text-decor article__contain")
Note, though, that there is no need to find the tags one by one. You can pass a list of tag names directly to find_all(), since BeautifulSoup is quite flexible in locating elements:
article = soup.find('div', class_="text-decor article__contain")
invalid_tags = ['b', 'i', 'u', 'br', 'a']
for match in article.find_all(invalid_tags):
    match.unwrap()  # unwrap() is the bs4 name for replaceWithChildren()
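Putting it together, here is a minimal self-contained sketch. It assumes a made-up HTML string in place of the downloaded page (the real markup may differ), and the get_text() call at the end is an extra suggestion that goes beyond the code above:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real markup may differ.
html = (
    '<div class="text-decor article__contain">'
    'Hello <b>bold</b> and <a href="#">a link</a><br/>done'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# find() gives a single Tag, so find_all() can be called on it.
article = soup.find("div", class_="text-decor article__contain")
for match in article.find_all(["b", "i", "u", "br", "a"]):
    match.unwrap()  # keep the children, drop the tag itself

# Markup of the article body with the listed tags stripped.
cleaned = "".join(str(node) for node in article.contents)

# If only the plain text is needed, get_text() returns it directly,
# with or without the unwrapping step.
plain = article.get_text()
```

If the div is not found, find() returns None, so a real script would probably want to check article before looping.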