sentences.find_all(['p','h2'],attrs={['class':None,'class':Not None]})
This is invalid syntax, but is there an alternative that does the same thing? I want p tags matched on one attribute condition and h2 tags matched on another, and I need the results together in document order rather than from two separate searches, i.e. I don't want to do
sentences.find_all('p', attrs={'class': None})
sentences.find_all('h2', attrs={'class': True})
You can use a CSS selector list, separating the individual selectors with a comma (CSS reference):
from bs4 import BeautifulSoup
html_doc = """
<p class="cls1">Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for tag in soup.select("p.cls1, h2.cls3"):
    print(tag)
Prints:
<p class="cls1">Select this</p>
<h2 class="cls3">Select this</h2>
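Since the question asks for the tags sequentially, here is a minimal sketch showing that a single select() call returns the matches from both selectors in document order. The interleaved markup below is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with p and h2 tags interleaved
html_doc = """
<h2 class="cls3">First</h2>
<p class="cls1">Second</p>
<h2 class="cls3">Third</h2>
<p class="cls2">Skipped</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# One call, one result list, ordered as the tags appear in the document
texts = [tag.get_text() for tag in soup.select("p.cls1, h2.cls3")]
print(texts)
```

The h2 and p results come back mixed in the order they occur, not grouped by selector.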
EDIT: To select multiple tags where one of them must have no attributes:
from bs4 import BeautifulSoup
html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for tag in soup.select("p, h2.cls3"):
    # Skip any p tag that carries attributes
    if tag.name == "p" and len(tag.attrs) != 0:
        continue
    print(tag)
Prints:
<p>Select this</p>
<h2 class="cls3">Select this</h2>
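If the condition is specifically "p without a class, h2 with a class" as in the question, the filtering can probably be done without the extra loop check. The sketch below shows two options under that assumption: a CSS selector using :not([class]), and find_all() with a filter function (the function name wanted and the sample markup are made up for illustration). Note that both test only the class attribute, whereas the loop above rejects a p tag carrying any attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this too</h2>
<h2>Don't select this either</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Option 1: CSS only. p without a class attribute, h2 with one.
by_css = soup.select("p:not([class]), h2[class]")


# Option 2: find_all() with a filter function that is called on every tag
def wanted(tag):
    if tag.name == "p":
        return not tag.has_attr("class")
    if tag.name == "h2":
        return tag.has_attr("class")
    return False


by_function = soup.find_all(wanted)

for tag in by_css:
    print(tag)
```

Both options walk the tree once and return the matches in document order, so the p and h2 results stay interleaved as they appear in the page.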