I have HTML like this:
<span class="age">
Ages 15
<span class="loc" id="loc_loads1">
</span>
<script>
getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
</script>
</span>
I am trying to extract only the text "Ages 15" using BeautifulSoup, so I wrote the following Python code.
Code:
from bs4 import BeautifulSoup as bs
import urllib3
URL = 'html file'
http = urllib3.PoolManager()
page = http.request('GET', URL)
soup = bs(page.data, 'html.parser')
age = soup.find("span", {"class": "age"})
print(age.text)
output:
Ages 15 getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
I want only "Ages 15", not the function call inside the script tag. Is there any way to get only the text "Ages 15", or to exclude the contents of script tags?
PS: There are many script tags spread across many different URLs, so I would prefer not to remove the unwanted text from the output with string replacement.
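Use .find(text=True), which returns the first text node inside the matched tag. In Beautiful Soup 4.4.0 and later the same argument is also available as string=True.
Example: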
from bs4 import BeautifulSoup
html = """<span class="age">
Ages 15
<span class="loc" id="loc_loads1">
</span>
<script>
getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
</script>
</span>"""
soup = BeautifulSoup(html, "html.parser")
print(soup.find("span", {"class": "age"}).find(text=True).strip())
Output:
Ages 15
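Note that .find(text=True) returns only the first text node, so it would miss any visible text that appears after a nested tag. If that matters, one alternative is to remove the script tags from the parsed tree and then call get_text(). The following is a minimal sketch under that assumption; the helper name text_without_scripts is hypothetical and not part of Beautiful Soup, and dropping style tags as well is my own addition.

```python
from bs4 import BeautifulSoup

html = """<span class="age">
Ages 15
<span class="loc" id="loc_loads1">
</span>
<script>
getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
</script>
</span>"""


def text_without_scripts(markup, name, attrs):
    """Return the visible text of the first matching tag, or None if absent.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find(name, attrs)
    if tag is None:
        return None
    # Remove every <script> and <style> descendant from the parsed tree.
    for unwanted in tag.find_all(["script", "style"]):
        unwanted.decompose()
    # Strip each remaining text node and join the non-empty ones with a space.
    return tag.get_text(" ", strip=True)


print(text_without_scripts(html, "span", {"class": "age"}))
```

This works on the parsed tree rather than on the output string, so it should behave the same way regardless of how many script tags a page contains.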