How do I find the 'a' tags to scrape the data?


I need to scrape the data from this website: https://shop.freedompop.com/products?page=1

I use BeautifulSoup to parse the HTML and found that I need all the 'a' tags with class_="product-results-item-link layout-row flex-gt-sm-33 flex-50"

I tried containers = html_soup.find_all('a', class_="product-results-item-link layout-row flex-gt-sm-33 flex-50"), but nothing is found.

    from requests import get
    from bs4 import BeautifulSoup
    from time import sleep
    from random import randint
    import pandas as pd

    product_names = []
    status = []
    ori_prices = []
    sale_prices = []

    headers = {"Accept-Language": "en-US, en;q=0.5"}

    pages = [str(i) for i in range(1,2)]
    #pages = [str(i) for i in range(1,24)]

    for page in pages:

        response = get('https://shop.freedompop.com/products?page=' + page, headers=headers)
        sleep(2)

        html_soup = BeautifulSoup(response.text, 'html.parser')

        containers = html_soup.find_all('a', class_="product-results-item-link layout-row flex-gt-sm-33 flex-50")

        print(containers)

I expect the output to contain 18 items, but the actual output is []


Solution

  • The website loads all of its product entries dynamically through an API, so they are not in the HTML that requests downloads. You can call that API directly and get the data:

    https://shop.freedompop.com/api/shop/store/555/item?page=1&pageSize=500&sort=RELEVANCE
    
    from requests import get

    # The endpoint returns JSON, so there is no HTML to parse with BeautifulSoup.
    response = get('https://shop.freedompop.com/api/shop/store/555/item?pageSize=410&sort=RELEVANCE')
    parsed_response = response.json()

    # Each entry under 'results' is one product.
    for index, value in enumerate(parsed_response.get('results')):
        print(index, value)
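
To get from the raw JSON to columns like the ones in the question, something along these lines might work. This is only a sketch: `build_url` and `extract_rows` are helper names made up for illustration, the query parameters are the ones visible in the API URL above, and the `name` and `price` keys in the sample payload are invented, so check the real response for the actual field names.

```python
import json
from urllib.parse import urlencode

# Base endpoint taken from the API URL in the answer above.
API_URL = 'https://shop.freedompop.com/api/shop/store/555/item'


def build_url(page, page_size=500, sort='RELEVANCE'):
    """Build the API URL for one page (hypothetical helper)."""
    query = urlencode({'page': page, 'pageSize': page_size, 'sort': sort})
    return API_URL + '?' + query


def extract_rows(parsed_response, fields):
    """Pick the given keys from every entry under 'results' (hypothetical helper).

    Keys that an entry does not have come back as None.
    """
    return [{field: item.get(field) for field in fields}
            for item in parsed_response.get('results', [])]


# Made-up payload for illustration only; the real field names may differ.
sample = json.loads('{"results": [{"name": "Phone A", "price": 99.0}, {"name": "Phone B"}]}')

rows = extract_rows(sample, ['name', 'price'])
print(build_url(1))
print(rows)
```

With the real data you would pass `response.json()` instead of `sample`, and the resulting list of dicts can be handed to `pd.DataFrame(rows)` if you want a table.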