Tags: python-2.7, beautifulsoup, web-crawler, python-unicode

Working with hrefs extracted from BeautifulSoup


I am a Python beginner learning web crawling.

For one project, I need to retrieve some hrefs and then print the text content of the page behind each of these links. Here is my code so far:

import requests, bs4, os, webbrowser
url = 'http://www.constructeursdefrance.com/resultat/?dpt=53'
res = requests.get(url)
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text,'html.parser')
for a in soup.select('.link'):
    links = a.find('a').attrs['href']

I tried many things with the links, but I keep getting the error "unicode is not callable". How can I work with these links and eventually iterate over them to extract the text from each page?

Thanks


Solution

  • Your code is almost done; it needs just a small change:

    import requests, bs4

    url = 'http://www.constructeursdefrance.com/resultat/?dpt=53'
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    links = []  # collect every href instead of overwriting one variable
    for div in soup.select('.link'):
        link = div.a.get('href')  # href of the <a> inside each .link element
        links.append(link)
    print(links)
    

    Output:

    ['http://www.constructeursdefrance.com/concept-habitat/',
     'http://www.constructeursdefrance.com/maisons-bois-cruard/',
     'http://www.constructeursdefrance.com/passiva-concept/',
     'http://www.constructeursdefrance.com/les-constructions-de-la-mayenne/',
     'http://www.constructeursdefrance.com/maisonsdenfrance53/',
     'http://www.constructeursdefrance.com/lemasson53/',
     'http://www.constructeursdefrance.com/ecb53/',
     'http://www.constructeursdefrance.com/villadeale-53/',
     'http://www.constructeursdefrance.com/habitat-plus-53/']
    

    select('.link') returns a list of the div tags with class "link", each of which has a child a tag. You can reach that a tag with div.a and then read its href with div.a.get('href').
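
The question also asks how to iterate over the links and extract the text of each page, which the answer above does not cover. In the original loop, links is reassigned on every pass, so after the loop it holds a single unicode string rather than a list, and calling a string like a function is what raises the "not callable" error. Below is a minimal sketch of one way to follow the links, assuming each linked page is plain HTML that requests can fetch. The function names extract_links, page_text and crawl are hypothetical helpers made up for this example, and the 10-second timeout is an arbitrary choice.

```python
import bs4
import requests


def extract_links(html):
    """Return the href of the first <a> inside every element with class 'link'.

    Hypothetical helper: same logic as the answer's loop, with a guard added.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    links = []
    for div in soup.select('.link'):
        # div.a is None when the element has no <a> child, so check first.
        if div.a is not None and div.a.get('href'):
            links.append(div.a.get('href'))
    return links


def page_text(html):
    """Return the text of a page, with script and style contents removed."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    for tag in soup(['script', 'style']):
        tag.decompose()
    # strip=True trims each text fragment and skips the empty ones.
    return soup.get_text(separator='\n', strip=True)


def crawl(start_url):
    """Fetch start_url, follow each extracted link, and map link -> page text."""
    res = requests.get(start_url, timeout=10)  # timeout value is an assumption
    res.raise_for_status()
    texts = {}
    for link in extract_links(res.text):
        page = requests.get(link, timeout=10)
        page.raise_for_status()
        texts[link] = page_text(page.text)
    return texts
```

Calling crawl('http://www.constructeursdefrance.com/resultat/?dpt=53') would then return a dict keyed by link, which you can loop over to print each page's text. Note that this assumes the hrefs are absolute URLs, as they are in the output above; relative hrefs would first need to be joined with the start URL.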