I need to get the first line of text inside a tag using Python for web scraping.
Expected output: 22 September 1995
The HTML looks like this:
<div class="txt-block">
<h4 class="inline">Release Date:</h4> 22 September 1995 (USA)
<span class="see-more inline">
<a href="releaseinfo?ref_=tt_dt_dt">See more</a> »
</span></div>
My code to get the data is:
soup.find('div', {"class": "txt-block"}).text
The output is: Release Date: 22 September 1995 (USA) See more
I would do it this way:
text = soup.find('h4').next_sibling
text = text.replace('(USA)', '').strip()  # replace() returns a new string, so reassign it
or
text = soup.find('h4', {'class': 'inline'}).next_sibling  # attrs must be a dict, not a set
text = text.replace('(USA)', '').strip()
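A minimal, self-contained sketch of the `next_sibling` approach, assuming the HTML from the question is available as a string (the variable names `html` and `release_date` and the choice of the built-in `html.parser` are assumptions, not part of the original answer). The text node right after the `<h4>` still carries a leading space and a trailing newline, which is why `.strip()` is applied after the replacement:

```python
from bs4 import BeautifulSoup

# Hypothetical input: the snippet from the question, stored as a string.
html = """<div class="txt-block">
<h4 class="inline">Release Date:</h4> 22 September 1995 (USA)
<span class="see-more inline">
<a href="releaseinfo?ref_=tt_dt_dt">See more</a> »
</span></div>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the <h4> label; the date is the text node that follows it.
h4 = soup.find("h4", {"class": "inline"})
raw = h4.next_sibling  # ' 22 September 1995 (USA)\n'

# str.replace returns a new string, so the result must be reassigned.
release_date = raw.replace("(USA)", "").strip()
print(release_date)  # 22 September 1995
```

Because `next_sibling` only returns the single node after the `<h4>`, the "See more" link inside the `<span>` never ends up in the result, unlike with `.text` on the whole `<div>`.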
Then you can use a regex to strip a parenthesized word like (USA) from the text.
(Related question: "using regex to remove a specific word from a string".)
import re
text = soup.find('h4', {'class': 'inline'}).next_sibling
# Drop every " (...)" group, then trim the surrounding whitespace.
text = re.sub(r'\s\([^)]*\)', '', text).strip()
This removes any parenthesized word from the string, not just (USA).
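A small sketch of why the character class matters, using only the standard library; the helper name `strip_parenthesized` and the sample strings are hypothetical. A greedy pattern such as `\s\(.+\)` consumes everything from the first `(` to the last `)`, while `[^)]*` stops at the first closing parenthesis, so each group is removed on its own:

```python
import re


def strip_parenthesized(text: str) -> str:
    """Remove every '(...)' group (with the whitespace before it), then trim."""
    # [^)]* cannot cross a ')', so separate groups are removed one at a time.
    return re.sub(r"\s*\([^)]*\)", "", text).strip()


sample = "1995 (USA) and 1996 (UK)"

print(strip_parenthesized(" 22 September 1995 (USA)\n"))  # 22 September 1995
print(strip_parenthesized(sample))                         # 1995 and 1996
# Greedy version: '.+' runs to the last ')', swallowing the text in between.
print(re.sub(r"\s\(.+\)", "", sample))                     # 1995
```

For the single "(USA)" in the question both patterns give the same result; the difference only shows up when a string contains more than one parenthesized group.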