I am using Python to loop over a list of "keys" inside a known URL and extract the links as output. To do this I define a get_urls(key) function and then loop through the keys. You can see my example code here:
import urllib3
import requests
from bs4 import BeautifulSoup
import pandas as pd

urllib3.disable_warnings()

def get_urls(key):
    url = f'https://aurl.com/{key}#ltr-{key}'
    r = requests.get(url, proxies=proxies, verify=False)
    soup = BeautifulSoup(r.content, "html.parser")
    for a in soup.find_all('a', href=True):
        z = print(a['href'])
    return z

keys = ['C', 'B']
urllist = []
for key in keys:
    urllist.append(get_urls(key))

dflinks = pd.DataFrame(urllist)
path = 'D://mycsv.csv'
dflinks.to_csv(path, index=False)
The first part of the code seems to be doing the job, as I see the desired URLs in the output. However, I must have an error when saving these URLs to a CSV, because when I open the file it turns out to be empty.
I know I must be making a very basic mistake here. I am learning Python and would really appreciate your feedback. I am sure you will spot it fast :)
Edit: something else that does not work. Another strategy that fails is to replace the loop inside get_urls with:

linklist = []
for a in soup.find_all('a', href=True):
    z = linklist.append(a['href'])
return z
You are not returning what you want. In

z = linklist.append(a['href'])

.append() returns None: it is a method you call to add a value to a list in place, so z never holds your links.
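A quick standalone snippet (not from your code) makes this visible:

```python
linklist = []
result = linklist.append('https://example.com')  # append mutates the list in place

print(result)    # None -- append has no return value
print(linklist)  # ['https://example.com'] -- the value went into the list itself
```
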
Also, you are returning z. Return the linklist that you built instead:
linklist = []
for a in soup.find_all('a', href=True):
    linklist.append(a['href'])
return linklist
You can also write this more Pythonically with a list comprehension, but that is optional:

return [a['href'] for a in soup.find_all('a', href=True)]
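One more thing to watch once get_urls returns a list: appending each per-key list to urllist gives you a list of lists, so pd.DataFrame(urllist) would spread each key's URLs across columns. If you want one URL per row, use extend instead of append. A minimal sketch with a stubbed get_urls (the stub and its URLs are made up to stand in for the real request):

```python
def get_urls(key):
    # Stand-in for the real function, which would fetch and parse the page.
    return [f'https://aurl.com/{key}/page1', f'https://aurl.com/{key}/page2']

keys = ['C', 'B']
urllist = []
for key in keys:
    urllist.extend(get_urls(key))  # extend flattens the per-key lists into one list

print(urllist)
# ['https://aurl.com/C/page1', 'https://aurl.com/C/page2',
#  'https://aurl.com/B/page1', 'https://aurl.com/B/page2']
```

With a flat list, pd.DataFrame(urllist) gives a single column with one URL per row, which saves cleanly with to_csv.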