
How to take data from a variable and put it into another


I'm having a bit of an issue: I would like to take this data,

for item in g_data:
    links = item.contents[1].find_all("a", {"class": "a-link-normal s-access-detail-page a-text-normal"})
    for link in links[:4]:
        print(link["href"])

and use the results in another process.

The code currently prints the URLs from the first page of an Amazon search. I would like to take those URLs and then scrape the data on each page. How would I go about making it work something like this:

If for item in g_data returns a url, take url[1:15] and do 'x' with it.

If for item in g_data does not return a url, say "No urls to work with".

Any help or leads you could give would be great, thanks once again.


Solution

  • If you want to take each item in g_data, find all the URLs in it, do x with any that are found, and print a message when there are none, then this should work:

    def do_x(url):
        """ Does x with the given url. """
        short = url[1:15]
        # do x with short
        # ...
    
    # process all items in g_data
    for item in g_data:
        # find all links in the item
        links = item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})
    
        if not links:
            # no links in this item -> skip
            print("No urls to work with.")
            continue
    
        # process each link's url
        for link in links:
            url = link["href"]
            do_x(url)
    

    Is this what you wanted?
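
    One more thing worth noting before you fetch those pages: the hrefs in Amazon search results are often relative paths, so you would typically join them against the site root first. Here is a minimal sketch; the BASE value and the `do_x` body are assumptions for illustration, and the actual fetching/parsing calls are left as comments since they depend on your setup:

    ```python
    from urllib.parse import urljoin

    # Assumed base URL -- adjust to the Amazon site you are scraping.
    BASE = "https://www.amazon.com"

    def absolutize(href):
        """Return an absolute URL for a possibly relative href."""
        return urljoin(BASE, href)

    def do_x(url):
        """Example 'x': build the full URL for a result link.

        Fetching and parsing are sketched as comments; any HTTP
        client plus BeautifulSoup would work here.
        """
        full = absolutize(url)
        # page = requests.get(full, headers={"User-Agent": "..."})
        # soup = BeautifulSoup(page.text, "html.parser")
        # ... extract whatever fields you need from soup ...
        return full
    ```

    `urljoin` leaves already-absolute hrefs untouched, so it is safe to call on every link regardless of which form Amazon returned.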