BeautifulSoup: get the URL from the result of find_all


import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.xxx'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

s1 = soup.find_all(id="contents")
print(s1, "\n")

The output of the find_all:

[<div id="contents" style="width:1000px;padding:10px 0;overflow:hidden;"><table style="margin:0;width:1000px;overflow:hidden;" width="980">
<tr><td style="text-align:center;">
<img src="http://xxx/shop/data/editor/2020090302-01.jpg"/></td></tr></table>
</div>] 

How can I get the src of the img tag from this result?
Is there a way to get the URL other than searching by id="contents"?
All I want is the URL from the result.


Solution

  • You can get the src of the img in the div like this:

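    from bs4 import BeautifulSoup as bs
    import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
    
    url = 'http://www.cobaro.co.kr/shop/goods/goods_view.php?goodsno=8719&category=003004'
    html = urllib.request.urlopen(url).read()
    soup = bs(html, 'html.parser')
    
    # find_all returns a list of matching tags, so iterate over it
    divs = soup.find_all(id="contents")
    
    for div in divs:
        # find returns the first <img> inside the div, or None if there is none
        img_tag = div.find('img')
        if img_tag is not None:
            print(img_tag['src'])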
    
    Output:
    
    http://cobaro.co.kr/shop/data/editor/2020090302-01.jpg
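
On the second part of the question (getting the URL without going through find_all(id="contents") first), here is a minimal sketch of two alternatives. It assumes the markup looks like the output shown in the question; the HTML is pasted inline so it runs without a network request, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the question (assumed representative).
html = """
<div id="contents" style="width:1000px;padding:10px 0;overflow:hidden;"><table style="margin:0;width:1000px;overflow:hidden;" width="980">
<tr><td style="text-align:center;">
<img src="http://xxx/shop/data/editor/2020090302-01.jpg"/></td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: a CSS selector that goes straight to the <img> tags
# inside the element with id="contents" and skips images without a src.
srcs_in_contents = [img["src"] for img in soup.select("#contents img[src]")]

# Option 2: ignore the id entirely and collect the src of every <img>
# in the document that has a src attribute.
all_srcs = [img["src"] for img in soup.find_all("img", src=True)]

print(srcs_in_contents)
print(all_srcs)
```

Option 1 is probably the safer choice on a real page, since a full product page is likely to contain other images (logos, banners) that Option 2 would also return.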