I'm having a bit of an issue: I would like to take this data,
for item in g_data:
    print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[0]["href"]
    print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[1]["href"]
    print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[2]["href"]
    print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[3]["href"]
and use the results in another process.
The code currently prints the URLs from the first page of an Amazon search. I would like to take those URLs and then scrape the data on each page. How would I go about making it work something like this:
If the loop over g_data returns a URL, take url[1:15] and do 'x' with it. If it does not return a URL, say "No urls to work with".
Any help or leads you could give would be great. Thanks once again.
If you want to take each item in g_data, find all the URLs in the item, do x with them if there are any, and just print a message if there are none, then this should work:
def do_x(url):
    """ Does x with the given url. """
    short = url[1:15]
    # do x with short
    # ...

# process all items in g_data
for item in g_data:
    # find all links in the item
    links = item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})
    if not links:
        # no links in this item -> skip it
        print("No urls to work with.")
        continue
    # process every link; link["href"] is a single url string,
    # so pass it to do_x directly (iterating over it would loop
    # over its characters, not over urls)
    for link in links:
        do_x(link["href"])
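If "do x" ultimately means fetching each URL and scraping the detail page behind it, a minimal sketch of do_x could look like the one below. It assumes the scraped hrefs are relative to www.amazon.com and that you have requests installed; the productTitle selector is just a placeholder to swap for whatever data you actually want:

import requests
from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

BASE_URL = "http://www.amazon.com"  # assumption: hrefs are relative to this

def do_x(url):
    """ Fetch the page behind the given url and scrape something from it. """
    response = requests.get(urljoin(BASE_URL, url))
    soup = BeautifulSoup(response.text, "html.parser")
    # placeholder: grab the product title; adjust the tag/attrs as needed
    title = soup.find("span", {"id": "productTitle"})
    if title is not None:
        print(title.get_text(strip=True))
    else:
        print("No title found on", url)

With that in place, the loop above stays exactly the same; only the body of do_x changes.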
Is this what you wanted?