
Get every href from the same div in Python


I have this soup:

[Screenshot of the parsed HTML, not available]

The webpage lists company references in a grid (16 rows x 5 columns), and I want to retrieve each reference's URL and title. The problem is that all 5 references in a row sit inside a single div with class row, and when I scrape the page I only get the first reference of each row instead of all 5. Here is my code so far:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'http://www.slimstock.com/nl/referenties/'
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")

info_block = soup.find_all("div", attrs={"class": "row"})
references = pd.DataFrame(columns=['Company Name', 'Web Page'])

for entry in info_block:
    try:
        title = entry.find('img').get('title')
        url = entry.a['href']
        urlcontent = BeautifulSoup(requests.get(url).content, "lxml")

        row = [{'Company Name': title, 'Web Page': url}]
        references = references.append(row, ignore_index=True)
    except:
        pass

Is there a way to fix this?


Solution

  • I think you should iterate over every "a" (or every "img") inside each row, because entry.a and entry.find('img') return only the first match in the div. You can write something like this:

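    rows = []
    for entry in info_block:
        for a in entry.find_all("a", href=True):
            img = a.find('img')
            if img is None:
                continue  # skip links that do not wrap a logo image
            url = a['href']
            # Fetch the linked page here if you need it, as in the question:
            # urlcontent = BeautifulSoup(requests.get(url).content, "lxml")
            rows.append({'Company Name': img.get('title'), 'Web Page': url})
    # DataFrame.append was removed in pandas 2.0, so build the frame in one go
    references = pd.DataFrame(rows, columns=['Company Name', 'Web Page'])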
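
To check the idea without hitting the live site, here is a minimal offline sketch. The HTML string, the example.com URLs, the company names and the helper name extract_references are all made up for illustration; the real page's markup may differ. It uses the built-in "html.parser" so lxml is not required. It shows that entry.a yields only the first link of a row, while find_all("a") yields all of them.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one grid row of the page: three links in one div.row.
SAMPLE_HTML = """
<div class="row">
  <a href="http://example.com/alpha"><img title="Alpha" src="a.png"></a>
  <a href="http://example.com/beta"><img title="Beta" src="b.png"></a>
  <a href="http://example.com/gamma"><img title="Gamma" src="c.png"></a>
</div>
"""


def extract_references(html):
    """Return one dict per linked image found inside each div with class row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for entry in soup.find_all("div", attrs={"class": "row"}):
        # find_all returns every matching <a>, not just the first one
        for a in entry.find_all("a", href=True):
            img = a.find("img")
            if img is None:
                continue  # a link without an image is not a reference tile
            rows.append({"Company Name": img.get("title"), "Web Page": a["href"]})
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# entry.a is shorthand for entry.find("a"): it stops at the first <a>
first_only = soup.find("div", attrs={"class": "row"}).a["href"]
all_rows = extract_references(SAMPLE_HTML)

print(first_only)
for row in all_rows:
    print(row["Company Name"], row["Web Page"])
```

If you want a table, pd.DataFrame(all_rows) should turn the list of dicts into one in a single call.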