python html beautifulsoup substring rss-reader

I want to get the image link inside an RSS feed description tag


Using feedparser, I can get the value of the description tag, but I want to extract the image link inside that tag.

<description><![CDATA[<div class="K2FeedImage"><img src="https://srilankamirror.com/media/k2/items/cache/25a3bb259efa21fc96901ad625f3a85d_S.jpg" alt="MP Piyasena sentenced to 4 years in prison" /></div><div class="K2FeedIntroText"><p>Former Tamil National Alliance (TNA) parliamentarian, P. Piyasena has been sentenced to 4 years in prison and fined Rs.</p>
</div><div class="K2FeedFullText">
<p>5.4 million for using state-owned vehicle for an year after losing his parliamentary seat.</p></div>]]></description>

Then I tried it this way, using a regular expression in Python:

import re

text =  "<![CDATA[<img src='https://adaderanaenglish.s3.amazonaws.com/' width='60' align='left' hspace='5'/>Former Tamil National Alliance (TNA) MP P. Piyasena had been sentenced to 4 years in prison over a case of misusing a state vehicle after losing his MP post. MORE..]]>"

match = re.search("<img src=\"(.+?) \"", text, flags=re.IGNORECASE)

try:
    result = match.group(1)
except:
    result = "no match found"

print(result)

Output:

no match found
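
The pattern above cannot match this text for two reasons: it expects double quotes around the src value, while the sample uses single quotes, and it also requires a space before the closing quote. A minimal sketch of a pattern that accepts either quote style follows. The name IMG_SRC is only illustrative, and a regular expression remains fragile for general HTML.

```python
import re

# Sample text from the question: the src attribute is wrapped in single quotes.
text = "<![CDATA[<img src='https://adaderanaenglish.s3.amazonaws.com/' width='60' align='left' hspace='5'/>Former Tamil National Alliance (TNA) MP P. Piyasena had been sentenced to 4 years in prison over a case of misusing a state vehicle after losing his MP post. MORE..]]>"

# Accept either quote style; \1 requires the closing quote to match the opening one.
IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)

match = IMG_SRC.search(text)
# Group 1 is the quote character, group 2 is the URL between the quotes.
result = match.group(2) if match else "no match found"
print(result)  # prints https://adaderanaenglish.s3.amazonaws.com/
```

Checking `match` before calling `group` also avoids the bare `except`, which would hide unrelated errors.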


Solution

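  • You can get the image link without a regular expression. Try the code below. It first finds the description tag, takes its next_element (the CDATA content that holds the embedded HTML), and then parses that content with BeautifulSoup again to get the image link.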

    from bs4 import BeautifulSoup
    
    data='''<description><![CDATA[<div class="K2FeedImage"><img src="https://srilankamirror.com/media/k2/items/cache/25a3bb259efa21fc96901ad625f3a85d_S.jpg" alt="MP Piyasena sentenced to 4 years in prison" /></div><div class="K2FeedIntroText"><p>Former Tamil National Alliance (TNA) parliamentarian, P. Piyasena has been sentenced to 4 years in prison and fined Rs.</p>
    </div><div class="K2FeedFullText">
    <p>5.4 million for using state-owned vehicle for an year after losing his parliamentary seat.</p></div>]]></description>'''
    
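    # Parse the outer snippet and locate the <description> tag
    soup = BeautifulSoup(data, 'html.parser')
    item = soup.find('description')
    # The tag's next element is the CDATA string holding the embedded HTML
    cdata = item.next_element
    # Parse the embedded HTML separately and read the src of its <img> tag
    inner = BeautifulSoup(cdata, 'html.parser')
    print(inner.find('img')['src'])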
    

    Output:

    https://srilankamirror.com/media/k2/items/cache/25a3bb259efa21fc96901ad625f3a85d_S.jpg
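
If the description value obtained through feedparser is already a plain HTML string (an assumption here, since the question does not show that code), the same idea can be sketched with only the standard library. The names ImgSrcCollector, first_image_src and description_html are hypothetical and not part of feedparser or BeautifulSoup.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def first_image_src(description_html):
    """Return the first <img> src in an HTML fragment, or None if there is none."""
    collector = ImgSrcCollector()
    collector.feed(description_html)
    collector.close()
    return collector.sources[0] if collector.sources else None


# Hypothetical stand-in for the description string, taken from the question's sample.
description_html = (
    '<div class="K2FeedImage"><img src="https://srilankamirror.com/media/k2/items/'
    'cache/25a3bb259efa21fc96901ad625f3a85d_S.jpg" '
    'alt="MP Piyasena sentenced to 4 years in prison" /></div>'
    '<div class="K2FeedIntroText"><p>Former Tamil National Alliance (TNA) '
    'parliamentarian, P. Piyasena has been sentenced to 4 years in prison '
    'and fined Rs.</p></div>'
)

print(first_image_src(description_html))
```

Returning None when no image is present lets the caller decide how to handle feed entries without a picture, instead of raising on a missing tag.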