Tags: python, python-2.7, beautifulsoup

Getting attribute's value using BeautifulSoup


I'm writing a Python script that parses a webpage and extracts the locations of its scripts. Let's say there are two scenarios:

<script type="text/javascript" src="http://example.com/something.js"></script>

and

<script>some JS</script>

I'm able to get the JS in the second scenario, that is, when the JS is written inline between the tags.

But is there any way to get the value of src in the first scenario (i.e. extract the src attribute of every script tag, such as http://example.com/something.js)?

Here's my code

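#!/usr/bin/python

import requests
from bs4 import BeautifulSoup

r = requests.get("http://rediff.com/")
data = r.text
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
# This prints each <script> tag in full, not just its src attribute
for n in soup.find_all('script'):
    print n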

Output: Some JS

Output needed: http://example.com/something.js


Solution

  • This gets the src value of every <script> tag that has one. Any <script> tag without a src attribute is skipped.

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://rediff.com/"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read(), "html.parser")
    # {"src": True} matches only <script> tags that have a src attribute
    sources = soup.find_all('script', {"src": True})
    for source in sources:
        print source['src']
    

    I get the following two src values as the result:

    http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js
    http://im.rediff.com/uim/common/realmedia_banner_1_5.js
    

    I guess this is what you want. Hope this is useful.
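
If you want to try the same idea without hitting the network, here is a minimal sketch that runs on an inline HTML string. SAMPLE_HTML and the helper names script_sources and inline_scripts are made up for illustration; only the BeautifulSoup calls (find_all, tag["src"], tag.get, tag.string) come from the library. It is written so that it should also run on Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>var inline = 1;</script>
</head></html>
"""


def script_sources(html):
    """Return the src value of every <script> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only tags that actually carry a src attribute.
    return [tag["src"] for tag in soup.find_all("script", src=True)]


def inline_scripts(html):
    """Return the text of every <script> tag that has no src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # tag.get("src") returns None when the attribute is missing,
    # whereas tag["src"] would raise a KeyError.
    return [tag.string for tag in soup.find_all("script")
            if tag.get("src") is None]


if __name__ == "__main__":
    for src in script_sources(SAMPLE_HTML):
        print(src)
```

Use tag["src"] when you have already filtered on the attribute, and tag.get("src") when you are looping over all script tags and some of them may be inline.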