Tags: python, web-scraping, playwright, playwright-python

Can't find the correct 'select' HTML tag value while trying to wait for a select option to load (Playwright, Python)


I have an issue with a category URL such as the T-shirts page (https://www.alphabroder.com/category/t-shirts).

I am trying to scrape the product links off these pages. I have been trying for some time now, but nothing has worked yet. This is my current attempt after some Googling and reading the Playwright docs:

Website html:

<select id="prodPerPageSelTop">
    <option value="24">
    <option value="48">
    <option value="72">
    <option value="96">
    <option value="All">
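</select>

My code:

import time

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def playwright_get_soup(url, wait_after_page_load=None):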
    with sync_playwright() as this_playwright:
        browser = this_playwright.chromium.launch()
        page = browser.new_page()
        start = time.perf_counter()
        page.goto(url)
        try:
            page.wait_for_load_state("load")
            if wait_after_page_load:
                time.sleep(wait_after_page_load)
            products_on_page = page.querySelector('#prodPerPageSelTop').innerText()
            page.waitForFunction("document.querySelector('#prodPerPageSelTop').innerText() !== '" + products_on_page + "'")

            # attempt 1
            page.click('#prodperpageselect').select_option('96')

            # attempt 2
            # products_on_page = page.querySelector('#prodperpageselect ').innerText()
            # page.waitForFunction("document.querySelector('#prodPerPageSelTop').innerText() !== '" + products_on_page + "'")

            # attempt 3
            # new_selector = 'id=prodPerPageSelTop'
            # page.waitForSelector(new_selector)
            # handle = page.querySelector(new_selector)
            # handle.selectOption({"value": "96"})
   
            # attempt 4
            # page.select_option('select#prodperpageselect', value='96')

            time.sleep(15) 

            # try to wait 
            page.wait_for_selector('select#prodperpageselect option[value="96"]')
        except:
            pass

        soup = BeautifulSoup(page.content(), "html.parser")
        browser.close()
        return soup

soup = playwright_get_soup("https://www.alphabroder.com/category/t-shirts")

def get_links(page_soup):
    these_links = []
    all_product_thumbnails = page_soup.find_all("div", class_="thumbnail")
    for thumbnail in all_product_thumbnails:
        a_tag = thumbnail.find("a")
        link = a_tag["href"]
        these_links.append(link)
    return these_links

page_links = get_links(soup)

assert len(page_links) == 96

As the page loads, it starts on 24 items, continues loading for 4-5 seconds, then flickers, and the select option changes from, say, 24 items to 96 items.

I was expecting wait_for_selector to work. I also wait 15 seconds after the page loads, yet I still get 24 items back, not 96.

So far, I've also tried clicking the select option tag four different ways myself, and none of them has worked.

I did review similar questions that use Playwright. I'm trying to be more respectful on this site than I was when I was younger.

Any help is appreciated, thank you.


Solution
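
On the Playwright side, two things in the question stand out. The HTML shows the id prodPerPageSelTop, while most of the attempts target #prodperpageselect, and the bare except hides whatever error that raises. The camelCase methods (querySelector, waitForFunction) are also not what current playwright-python exposes; it uses snake_case names. Below is a minimal sketch under those observations. It has not been run against the live site, and it assumes that choosing 96 re-renders the grid with at least 96 div.thumbnail elements (a category with fewer products would time out). The names thumbnails_loaded_js and get_html_with_96_products are made up for illustration.

```python
SELECT_ID = "#prodPerPageSelTop"  # id copied from the HTML in the question
THUMBNAIL = "div.thumbnail"       # class used by get_links() in the question


def thumbnails_loaded_js(minimum):
    """Build a JS expression that is truthy once enough thumbnails exist."""
    return f"document.querySelectorAll('{THUMBNAIL}').length >= {int(minimum)}"


def get_html_with_96_products(url, per_page=96):
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # select_option waits for the <select> itself, then picks the value.
            page.select_option(SELECT_ID, str(per_page))
            # Poll inside the browser until the grid has been re-rendered.
            page.wait_for_function(thumbnails_loaded_js(per_page))
            return page.content()
        finally:
            browser.close()
```

The returned HTML can be passed to BeautifulSoup and get_links() exactly as in the question. No exception is swallowed here, so a wrong selector or a timeout shows up as an error instead of a silent 24-item result.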

  • Even if your focus is on getting the information with Playwright, I would like to point out that the scraping can also be done quite simply with requests and the endpoint through which the product data is loaded:

    import requests
    
    page_num = 1
    data = []

    while True:
        # Endpoint through which the page loads its products; prodPerPage=96 asks for 96 items per page.
        json_data = requests.get(f'https://www.alphabroder.com/cgi-bin/livewamus/wam_tmpl/catalog_browse.p?action=getProduct&content=json&page=catalog_browse&startpath=1017&getNumProd=true&sort=pl&sortdir=asc&pageNum={page_num}&prodPerPage=96&site=ABLive&layout=Responsive&nocache=62059').json()
        data.extend(json_data.get('browseProd'))

        # 'paging' reports the total number of pages; stop after the last one.
        if page_num < json_data.get('paging')[0].get('pgTotal'):
            page_num += 1
        else:
            break
    
    data
    
    [{'productID': 'G500', 'colorCode': '93', 'description': 'Gildan Adult Heavy Cotton\x99 T-Shirt', 'division': 'AB', 'prodCat': '130', 'mill': '07', 'prodImg': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_93_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_93_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'prodURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html', 'regPriceDisp': '$0.00', 'onSale': True, 'salePrice': 2.32512, 'salePriceDisp': '$2.33', 'colorCount': 75, 'colorCountLabel': 'Colors', 'sizeCount': 8, 'primePlus': 'primeplus', 'primePlusLogo': True, 'sizeLabel': ' S - 5XL', 'sizeLabelDesc': 'Sizes:', 'msrpPriceDesp': 'Starting At: Pricing upon request', 'salesRank': 1, 'primePlusHTML': "<img src='/img/primeplus_logo.png' alt='Prime Plus Logo' title='Prime Plus' border='0' height='24' class='primelogo'>", 'showPriceHTML': "<span class='browseSalePrice'> $2.33</span>", 'gaMktgMill': 'Gildan', 'gaMktgCategory': 'T-Shirts', 'gaCurrency': 'USD', 'gaList': 'Results from Search List', 'sustainLogo': True, 'sustainLogoHTML': "<img src='/img/leaf_logo.png' alt='Sustain Logo' title='Sustain' border='0' height='20' class='sustainlogo'>", 'colorURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=93', 'colorSwatch': [{'productID': 'G500', 'colorCode': '00', 'colorXref': 'White ', 'description': 'WHITE', 'hexColor': 'FFFFFF', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_00_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_00_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 
'https://www.alphabroder.com//prodimg/small/g500_00_g.jpg', 'sortOrder': 1, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=00', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '01', 'colorXref': 'Pink', 'description': 'AZALEA', 'hexColor': 'FF76A0', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_01_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_01_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_01_g.jpg', 'sortOrder': 2, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=01', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '05', 'colorXref': 'Yellow', 'description': 'YELLOW HAZE', 'hexColor': 'EEE8A0', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_05_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_05_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_05_g.jpg', 'sortOrder': 3, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=05', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '08', 'colorXref': 'Light Blue', 'description': 'INDIGO BLUE', 'hexColor': '34657f', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_08_g.jpg\' 
alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_08_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_08_g.jpg', 'sortOrder': 4, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=08', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '11', 'colorXref': 'Pink', 'description': 'LIGHT PINK', 'hexColor': 'FFE4E4', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_11_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_11_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_11_g.jpg', 'sortOrder': 5, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=11', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '12', 'colorXref': 'Orange', 'description': 'TANGERINE', 'hexColor': 'FF8A3D', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_12_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_12_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_12_g.jpg', 'sortOrder': 6, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=12', 'mainProdURL': 
'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '18', 'colorXref': 'Tan', 'description': 'SAND', 'hexColor': 'c5b9ac', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_18_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_18_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_18_g.jpg', 'sortOrder': 7, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=18', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '20', 'colorXref': 'Tan', 'description': 'NATURAL', 'hexColor': 'F3E4C4', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_20_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_20_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_20_g.jpg', 'sortOrder': 8, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=20', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '21', 'colorXref': 'Yellow', 'description': 'DAISY', 'hexColor': 'F9F46F', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_21_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_21_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' 
onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_21_g.jpg', 'sortOrder': 9, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=21', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '25', 'colorXref': 'Orange', 'description': 'TEXAS ORANGE', 'hexColor': 'af5c37', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_25_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_25_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_25_g.jpg', 'sortOrder': 10, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=25', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '27', 'colorXref': 'Red', 'description': 'GARNET', 'hexColor': '8B0000', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_27_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_27_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_27_g.jpg', 'sortOrder': 11, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=27', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '29', 'colorXref': 'Yellow', 'description': 'OLD GOLD', 'hexColor': 'e0b06e', 'image': '<noscript><img 
src=\'https://www.alphabroder.com//prodimg/small/g500_29_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_29_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_29_g.jpg', 'sortOrder': 12, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=29', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '30', 'colorXref': 'Pink', 'description': 'HELICONIA', 'hexColor': 'FF00FF', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_30_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_30_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_30_g.jpg', 'sortOrder': 13, 'productURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=30', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}, {'productID': 'G500', 'colorCode': '31', 'colorXref': 'Orange', 'description': 'TENNESSEE ORANGE', 'hexColor': 'EB9501', 'image': '<noscript><img src=\'https://www.alphabroder.com//prodimg/small/g500_31_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\'></noscript><img src=\'/img//lazy.png\' data-lazyload data-src=\'https://www.alphabroder.com//prodimg/small/g500_31_g.jpg\' alt=\'Gildan Adult Heavy Cotton\x99 T-Shirt\' onerror=\'$.wam.imgError(this,"small")\'>', 'imageURL': 'https://www.alphabroder.com//prodimg/small/g500_31_g.jpg', 'showMoreColors': True, 'sortOrder': 14, 'productURL': 
'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html?color=31', 'mainProdURL': 'https://www.alphabroder.com/product/g500/gildan-adult-heavy-cotton-t-shirt.html'}]},...]
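
Since the question asks for the product links, the same paging logic can be reduced to just the 'prodURL' values. The following is a sketch only: collect_product_urls and its fetch_page parameter are hypothetical names, and the two-page payload is hand-made stand-in data (not real site output) so the loop can be tried without network access. The keys 'browseProd', 'prodURL', 'paging' and 'pgTotal' are the ones visible in the response above.

```python
def collect_product_urls(fetch_page):
    """Walk every page and return the unique 'prodURL' values in order.

    fetch_page(page_num) must return the decoded JSON dict for that page.
    """
    urls, seen, page_num = [], set(), 1
    while True:
        payload = fetch_page(page_num)
        for product in payload.get("browseProd") or []:
            url = product.get("prodURL")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
        # Stop once the current page is the last one reported by the server.
        paging = payload.get("paging") or [{}]
        if page_num >= paging[0].get("pgTotal", 1):
            return urls
        page_num += 1


# Offline demonstration with a hand-made two-page payload.
fake_pages = {
    1: {"browseProd": [{"prodURL": "https://example.com/a"}],
        "paging": [{"pgTotal": 2}]},
    2: {"browseProd": [{"prodURL": "https://example.com/b"},
                       {"prodURL": "https://example.com/a"}],
        "paging": [{"pgTotal": 2}]},
}
product_urls = collect_product_urls(fake_pages.__getitem__)
print(product_urls)  # ['https://example.com/a', 'https://example.com/b']
```

For the real site, fetch_page would be a small function that calls requests.get on the endpoint above with the given pageNum and returns .json(). Keeping the fetching separate from the looping makes the paging logic easy to check on its own.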