I'm writing a Python script that parses a web page and extracts the locations of its scripts. Let's say there are two scenarios:
<script type="text/javascript" src="http://example.com/something.js"></script>
and
<script>some JS</script>
I'm able to get the JS in the second scenario, that is, when the JS is written inline between the tags.
But is there any way to get the value of src in the first scenario (i.e. extract the src attribute value of every script tag, such as http://example.com/something.js)?
Here's my code:
#!/usr/bin/python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://rediff.com/")
data = r.text
soup = BeautifulSoup(data)
# Prints every <script> tag in full, including its inline JS
for n in soup.find_all('script'):
    print(n)
Output: Some JS
Output needed: http://example.com/something.js
The code below gets the src value of every <script> tag that has one. Tags without a src attribute (inline scripts) are skipped.
from bs4 import BeautifulSoup
import urllib2

url = "http://rediff.com/"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
# {"src": True} matches only <script> tags that have a src attribute
sources = soup.findAll('script', {"src": True})
for source in sources:
    print(source['src'])
I get the following two src values as the result:
http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js
http://im.rediff.com/uim/common/realmedia_banner_1_5.js
I guess this is what you want. Hope this is useful.
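If you are on Python 3, where urllib2 is not available, the same filter could be sketched roughly as below. This is only an illustration: SAMPLE_HTML and script_sources are made-up names, and the markup is a stand-in for a downloaded page so the snippet runs without network access. It assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page;
# it contains one external script and one inline script.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>var inline = 1;</script>
</head><body></body></html>
"""


def script_sources(html):
    """Return the src value of every <script> tag that has one."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    # src=True matches only tags that carry a src attribute,
    # so inline scripts are skipped.
    return [tag["src"] for tag in soup.find_all("script", src=True)]


print(script_sources(SAMPLE_HTML))
```

To run it against a live page, you could pass in the text of a response instead, e.g. script_sources(requests.get(url).text). Note that src values are returned exactly as written in the page, so they may be relative URLs.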