I have not found a solution here on Stack Overflow. My HTML snippet is:
<dl>
<dt class="abc">Test</dt><dd><dl>
<dt>Part1</dt><dd><p>THISISWHATINEED<br /><a href="anyurl" target="">12334</a><br /><a href="anyurl" target="">abcdef</a></p></dd>
<dt>Part2</dt><dd><p>THISISWHATINEED2<br /><a href="anyurl" target="">12334</a><br /><a href="anyurl" target="">abcdef</a></p></dd>
<dt class="abc">Test2</dt><dd><dl>
<dt>Part3</dt><dd><p>THISISWHATINEED3<br /><a href="anyurl" target="">12334</a><br /><a href="anyurl" target="">abcdef</a></p></dd>
<dt>Part4</dt><dd><p>THISISWHATINEED4<br /><a href="anyurl" target="">12334</a><br /><a href="anyurl" target="">abcdef</a></p></dd>
So how do I get all the <p> elements that belong to, for example, <dt class="abc">Test</dt><dd><dl>? I tried d1.find_all("dt"), but then I am missing the <p> elements. I really don't understand how to get the children. Ideally I would iterate over the <dt> elements and then, inside each one, over the <p> elements of, for example, "Test" (the first part). But how do I do that? Do you have any tips or ideas?
What I already tried:
d1 = soup.find_all("dl")
for child in d1.children:
    print(child)
And a lot of other things that I can't remember anymore.
Another approach that works quite well:
for child in d1.children:
    if child.string is not None:
        continue
    if child.string is None:
        xx = len(child.find_all("p"))
Thanks!
Greetings Nick
Try using the adjacent sibling (+) CSS selector, which selects an element that immediately follows another one.
To use a CSS selector, call the .select() method instead of find_all().
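As a rough illustration of what + matches, here is a minimal sketch on a made-up fragment (the HTML and the variable names are assumptions for the demo, not taken from the question). Only a <dd> that comes directly after a <dt> is selected.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, used only to show the adjacent sibling combinator.
html = "<dl><dt>A</dt><dd>first</dd><dd>second</dd><dt>B</dt><dd>third</dd></dl>"
soup = BeautifulSoup(html, "html.parser")

# "dt + dd" matches a <dd> only when it immediately follows a <dt>,
# so the <dd> containing "second" is skipped.
adjacent = [dd.get_text() for dd in soup.select("dt + dd")]
print(adjacent)
```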
In your example:
for tag in soup.select(".abc +dd dt +dd p"):
    print(tag.contents[0])
.abc is the class name, so replace abc with the actual class.
The text you need is the first item inside each <p> tag, so use .contents[0] to get the desired element.
Output:
THISISWHATINEED1
THISISWHATINEED2
THISISWHATINEED3
THISISWHATINEED4
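If you also want to know which heading ("Test", "Test2") each <p> belongs to, one possible approach is to loop over the dt.abc headings and look inside the <dd> that follows each one. The sketch below assumes a well-formed version of the snippet with the closing tags added and shortened links. The groups dictionary is just a name chosen for the example.

```python
from bs4 import BeautifulSoup

# Assumption: the question's snippet with closing tags added and links shortened.
html = """
<dl>
  <dt class="abc">Test</dt>
  <dd><dl>
    <dt>Part1</dt><dd><p>THISISWHATINEED1<br /><a href="anyurl">12334</a></p></dd>
    <dt>Part2</dt><dd><p>THISISWHATINEED2<br /><a href="anyurl">12334</a></p></dd>
  </dl></dd>
  <dt class="abc">Test2</dt>
  <dd><dl>
    <dt>Part3</dt><dd><p>THISISWHATINEED3<br /><a href="anyurl">12334</a></p></dd>
    <dt>Part4</dt><dd><p>THISISWHATINEED4<br /><a href="anyurl">12334</a></p></dd>
  </dl></dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")

groups = {}
for heading in soup.select("dt.abc"):
    # The <dd> right after the heading holds the nested <dl> with the parts.
    body = heading.find_next_sibling("dd")
    if body is None:
        continue
    # The wanted text is the first child of each <p>, before the first <br />.
    groups[heading.get_text(strip=True)] = [
        p.contents[0].strip() for p in body.select("p")
    ]

for name, texts in groups.items():
    print(name, texts)
```

With markup that is not properly closed, the parser may nest the second heading inside the first one's <dd>, so the groups could overlap. In that case it is worth checking the parsed tree with soup.prettify() first.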