python loops csv beautifulsoup export-to-csv

Looping in BeautifulSoup


I am using Python to loop over a list of "keys" that are inserted into a known URL, and I want to extract the links from each page as output. To do this I define a get_urls(key) function and then loop through the keys. You can see my example code here:

import urllib3
import requests
urllib3.disable_warnings()
from bs4 import BeautifulSoup
import pandas as pd

def get_urls(key):
    url = f'https://aurl.com/{key}#ltr-{key}'
    r = requests.get(url,proxies=proxies, verify=False)
    soup = BeautifulSoup(r.content, "html.parser")

    for a in soup.find_all('a', href=True):
        z=print(a['href'])
    return z

key = ['C','B']

urllist = []
for key in key:
    urllist.append(get_urls(key))

dflinks = pd.DataFrame(urllist) 
path = 'D://mycsv.csv'
dflinks.to_csv(path,index=False)

The first part of the code seems to be doing the job, as I see the desired URLs in the output. However, I must have an error when saving these URLs to a CSV, because when I open the resulting file it turns out to be empty.

I know I must be making a very basic mistake here. I am learning Python and would really appreciate your feedback. I am sure you will spot it fast :)

Edit: something else that does not work:

Another strategy that does not work is to replace the loop at the end of the function with:

    linklist = []
    for a in soup.find_all('a', href=True):
        z=linklist.append(a['href'])
    return z

Solution

  • You are not returning what you want:

    z = linklist.append(a['href'])
    

    list.append() modifies the list in place and returns None, so z is None on every iteration and the function returns None. The same applies to z = print(a['href']) in your original code, since print() also returns None. Return the linklist you built instead:

    linklist = []
    for a in soup.find_all('a', href=True):
        linklist.append(a['href'])
    return linklist
    

    You can also write this more idiomatically as a list comprehension, but that is optional:

    return [ a['href'] for a in soup.find_all('a', href=True) ]
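
One further thing to watch once get_urls returns a list: urllist.append(get_urls(key)) would store one nested list per key, so the DataFrame would end up with one row per key rather than one row per link. Below is a minimal sketch of the whole flow, assuming one URL per CSV row is what you want. To keep it runnable offline, get_urls is replaced by a hypothetical stub that reads from a made-up FAKE_PAGES dictionary, and the standard csv module writes to an in-memory buffer instead of pandas writing to D://mycsv.csv. Neither stand-in comes from the original code.

```python
import csv
import io

# list.append() returns None, so assigning its result loses nothing useful
# but also gives you nothing to return.
links = []
result = links.append("https://aurl.com/C/1")
print(result)  # None
print(links)   # ['https://aurl.com/C/1']

# Hypothetical data standing in for the scraped pages (not real URLs).
FAKE_PAGES = {
    "C": ["https://aurl.com/C/1", "https://aurl.com/C/2"],
    "B": ["https://aurl.com/B/1"],
}


def get_urls(key):
    """Stub for the fixed get_urls: returns a list of hrefs for one key."""
    return list(FAKE_PAGES[key])


keys = ["C", "B"]

urllist = []
for key in keys:
    # extend() adds each link individually; append() would nest one list per key.
    urllist.extend(get_urls(key))

# Write a header plus one URL per row to an in-memory CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["url"])
writer.writerows([url] for url in urllist)

print(buffer.getvalue())
```

With pandas, the equivalent last step would be pd.DataFrame(urllist, columns=['url']).to_csv(path, index=False). The sketch also names the list keys rather than key, so the loop variable no longer overwrites the list it is iterating over.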