Search code examples
pythonweb-scrapingbeautifulsoup

Amazon Scrape Product Detail Page


import requests
URL = "https://www.amazon.com/TRESemm%C3%A9-Botanique-Shampoo-Nourish-Replenish/dp/B0199WNJE8/ref=sxin_14_pa_sp_search_thematic_sspa?content-id=amzn1.sym.a15c61b7-4b93-404d-bb70-88600dfb718d%3Aamzn1.sym.a15c61b7-4b93-404d-bb70-88600dfb718d&crid=2HG5WSUDCJBMZ&cv_ct_cx=hair%2Btresemme&keywords=hair%2Btresemme&pd_rd_i=B0199WNJE8&pd_rd_r=28d72361-7f35-4b1a-be43-98e7103da70c&pd_rd_w=6UL4P&pd_rd_wg=JtUqB&pf_rd_p=a15c61b7-4b93-404d-bb70-88600dfb718d&pf_rd_r=DFPZNAG391M5JS55R6HP&qid=1660432925&sprefix=hair%2Btresemme%2Caps%2C116&sr=1-3-a73d1c8c-2fd2-4f19-aa41-2df022bcb241-spons&smid=A3DEFW12560V8M&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExQlM3VFpGRVM5Tk8wJmVuY3J5cHRlZElkPUEwNjE5MjQwM01JV0FNN1pOMlRHSSZlbmNyeXB0ZWRBZElkPUEwNTA1MDQyMlQ5RjhRQUxIWEdaUiZ3aWRnZXROYW1lPXNwX3NlYXJjaF90aGVtYXRpYyZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU&th=1"
webpage = requests.get(URL, headers=HEADERS)

from bs4 import BeautifulSoup
soup = BeautifulSoup(webpage.content, "lxml")

# Outer Tag Object
title = soup.find("span", attrs={"class":'a-list-item'}).text.strip()
print(title)

This is my code:

I am trying to scrape the product detail page information best seller rank and sub-category best seller rank if you scroll down the page you will see it. But I am only getting the category Beauty Personal Care. I need the rank. Please help.


Solution

  • Select your element more specific for example with css selectors with focus of bestseller link:

    soup.select_one('#detailBulletsWrapper_feature_div a[href*="bestsellers"]').previous.strip().split()[0]
    

    Alternative could be selecting the span that contains Best Seller in the section with id detailBulletsWrapper_feature_div :

    soup.select_one('#detailBulletsWrapper_feature_div span:-soup-contains("Best Seller")').contents[2].strip().split()[0]
    

    Be aware that you always should check if the element is available.

    Example

    import requests
    from bs4 import BeautifulSoup
    
    headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
        'Accept-Language': 'en-US, en;q=0.5'}
    
    URL = "https://www.amazon.com/TRESemm%C3%A9-Botanique-Shampoo-Nourish-Replenish/dp/B0199WNJE8/ref=sxin_14_pa_sp_search_thematic_sspa?content-id=amzn1.sym.a15c61b7-4b93-404d-bb70-88600dfb718d%3Aamzn1.sym.a15c61b7-4b93-404d-bb70-88600dfb718d&crid=2HG5WSUDCJBMZ&cv_ct_cx=hair%2Btresemme&keywords=hair%2Btresemme&pd_rd_i=B0199WNJE8&pd_rd_r=28d72361-7f35-4b1a-be43-98e7103da70c&pd_rd_w=6UL4P&pd_rd_wg=JtUqB&pf_rd_p=a15c61b7-4b93-404d-bb70-88600dfb718d&pf_rd_r=DFPZNAG391M5JS55R6HP&qid=1660432925&sprefix=hair%2Btresemme%2Caps%2C116&sr=1-3-a73d1c8c-2fd2-4f19-aa41-2df022bcb241-spons&smid=A3DEFW12560V8M&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExQlM3VFpGRVM5Tk8wJmVuY3J5cHRlZElkPUEwNjE5MjQwM01JV0FNN1pOMlRHSSZlbmNyeXB0ZWRBZElkPUEwNTA1MDQyMlQ5RjhRQUxIWEdaUiZ3aWRnZXROYW1lPXNwX3NlYXJjaF90aGVtYXRpYyZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU&th=1"
    webpage = requests.get(URL, headers=headers)
    soup = BeautifulSoup(webpage.content)
    soup.select_one('#detailBulletsWrapper_feature_div span:-soup-contains("Best Seller")').contents[2].strip().split()[0]
    

    Output

    #224,708