How can I get this href with BeautifulSoup?


I want to get a product URL from this page: https://stockx.com/search?s=555088-105

(Image: the URL I want to get)

But when I try this code:

link = soup.find("div", class_ = 'browse-grid loading undefined')
print(link)

It just returns:

<div class="browse-grid loading undefined"><div class="back-to-top"><div class="back-to-top-container"><img alt="back to top" src="https://stockx-assets.imgix.net/svg/icons/back-to-top.svg?auto=compress,format"/><span>TOP</span></div></div><div class="browse-grid"><div class="no-results">NOTHING TO SEE HERE! PLEASE CHANGE YOUR FILTERS OR <a href="/product-suggestion">Suggest a Product</a></div></div></div>

I also tried this, but it prints every URL on the page except the one I want:

a_tags = soup.find_all('a')
for tag in a_tags:
  print(tag.get('href'))

How can I get the URL shown in my picture?


Solution

  • The URL you see on the page is loaded from an external source via JavaScript, so BeautifulSoup never sees it in the downloaded HTML. You can simulate the Ajax request with the requests module:

    import re
    import json
    import requests
    
    url = "https://stockx.com/search?s=555088-105"
    api_url = "https://stockx.com/api/browse"
    
    id_ = re.search(r"s=([\d-]+)", url).group(1)
    params = {
        "": "",
        "currency": "EUR",
        "_search": id_,
        "dataType": "product",
    }
    
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
        "Referer": url,
    }
    
    data = requests.get(api_url, params=params, headers=headers).json()
    
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    
    for product in data["Products"]:
        print("https://stockx.com/" + product["urlKey"])
    

    Prints:

    https://stockx.com/air-jordan-1-retro-high-dark-mocha
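
If you want to reuse the two parsing steps, they can be pulled out into small helpers and checked without touching the network. This is only a sketch: `search_term` and `product_urls` are hypothetical names, and the code assumes the JSON has the same shape as in the answer above (a "Products" list whose items carry a "urlKey"). It uses `urllib.parse.parse_qs` in place of the regex, so the search term no longer has to be digits and dashes only. The sample payload is written by hand, not real API output.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://stockx.com/"


def search_term(page_url):
    """Return the value of the `s` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(page_url).query)
    values = query.get("s")
    return values[0] if values else None


def product_urls(data, base_url=BASE_URL):
    """Build full product URLs from a decoded JSON payload.

    Assumes the shape used in the answer: a "Products" list whose items
    carry a "urlKey" string. Items without a "urlKey" are skipped.
    """
    return [
        urljoin(base_url, product["urlKey"])
        for product in data.get("Products", [])
        if product.get("urlKey")
    ]


# Hand-written sample payload (an assumption, not real API output).
sample = {
    "Products": [
        {"urlKey": "air-jordan-1-retro-high-dark-mocha"},
        {"title": "entry without a urlKey"},
    ]
}

print(search_term("https://stockx.com/search?s=555088-105"))
print(product_urls(sample))
```

In the answer's script you would then pass `search_term(url)` as the `_search` parameter and call `product_urls(data)` on the decoded response.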