xml = """<f transform="translate(7,7)" class="SoccerPlayer SoccerPlayer-11 Team-Away Outcome-Complete" data-id="8">
<rect x="-15" y="-15" width="30" height="30" transform="rotate(0)" class="SoccerShape"></rect>
<text x="0" y="7" text-anchor="middle" transform="translate(0,0)rotate(0)">11</text>
<text class="Soccer-Hidden">
<div>
<h3>
<span class="Soccer-Key">
Suc passes
</span>
<span class="Soccer-Value">
82
</span>
</h3>
<p>
Ronaldo
</p>
</div>
</text>
</f>"""
I'm trying to scrape the XML above with BeautifulSoup. Specifically:
from bs4 import BeautifulSoup as bs

soup = bs(xml, "xml")
for pr in soup.find_all("f"):
    try:
        player = pr["class"]
        time = pr["data-id"]
    except KeyError:
        # Skip elements that lack either attribute
        continue
    print(player, time)
This is working as intended.
I am having difficulty scraping the nested information in the <text class="Soccer-Hidden"> tag.
I'm trying to scrape the <span class="Soccer-Key"> and <span class="Soccer-Value"> elements, and also the value between the <p> tags (the Ronaldo text).
What can I add to my code to get these? Thanks
Try the findChildren method (an older alias of find_all), passing the class filter as a dictionary:
for pr in soup.find_all("f"):
    soc_key = pr.findChildren("span", {"class": "Soccer-Key"})[0].text
    soc_value = pr.findChildren("span", {"class": "Soccer-Value"})[0].text
    name = pr.findChildren("p")[0].text
    print(soc_key, soc_value, name)
This prints Suc passes, 82 and Ronaldo, each padded with extra whitespace (the line breaks inside the tags), which you can remove with strip(), e.g. soc_key.strip().
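If you want the whitespace stripped in one go, here is a minimal sketch of the same idea using find and get_text(strip=True). The function name extract_players is made up for illustration, the markup is a trimmed copy of the one in the question, and I'm assuming the built-in "html.parser" so it runs without lxml installed; passing "xml" as in the question should behave the same for these lookups.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (shape elements omitted).
xml = """<f class="SoccerPlayer Team-Away" data-id="8">
<text class="Soccer-Hidden">
<div>
<h3>
<span class="Soccer-Key">
Suc passes
</span>
<span class="Soccer-Value">
82
</span>
</h3>
<p>
Ronaldo
</p>
</div>
</text>
</f>"""


def extract_players(markup):
    """Return a list of (key, value, name) tuples, one per <f> element.

    Hypothetical helper, not part of BeautifulSoup.
    """
    # Assumption: "html.parser" is used here to avoid the lxml dependency.
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for pr in soup.find_all("f"):
        key = pr.find("span", class_="Soccer-Key")
        value = pr.find("span", class_="Soccer-Value")
        name = pr.find("p")
        # Skip players whose hidden block is missing any of the three parts.
        if key is None or value is None or name is None:
            continue
        # get_text(strip=True) drops the surrounding line breaks.
        rows.append((
            key.get_text(strip=True),
            value.get_text(strip=True),
            name.get_text(strip=True),
        ))
    return rows


print(extract_players(xml))
```

Using find instead of indexing [0] means a player without the hidden block is skipped rather than raising an IndexError.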