python, web-scraping, beautifulsoup, google-colaboratory

Deep web data scraping with Python in Google Colaboratory


I have the code below, from an answer to another question of mine.

It can pull the data on each listing page. My next problem is how to pull the details for each dress, such as the model's name, the model's size, and the features.

On top of that, there is more than one model for each dress (for example, the BOHO BIRD Amore Wrap Dress has three models, wearing sizes 10, 14, and 16).

    import json
        
    import requests
    from bs4 import BeautifulSoup

    cookies = {
        "ServerID": "1033",
        "__zlcmid": "10tjXhWpDJVkUQL",
    }
    
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
    }

    def extract_info(bs: BeautifulSoup, tag: str, attr_value: str) -> list:
        # Collect the stripped text of every <tag> element whose itemprop matches attr_value.
        return [i.text.strip() for i in bs.find_all(tag, {"itemprop": attr_value})]

    all_pages = []
    for page in range(1, 29):
        # Progress indicator; the total is reported after the loop.
        print(f"Scraping page {page}...")
    
        current_page = f"https://www.birdsnest.com.au/womens/dresses?page={page}"
        source = requests.get(current_page, headers=headers, cookies=cookies)
        soup = BeautifulSoup(source.content, 'html.parser')
    
        brand = extract_info(soup, tag="strong", attr_value="brand")
        name = extract_info(soup, tag="h2", attr_value="name")
        price = extract_info(soup, tag="span", attr_value="price")
    
        all_pages.extend(
            [
                {
                    "brand": b,
                    "name": n,
                    "price": p,
                } for b, n, p in zip(brand, name, price)
            ]
        )
    
    print(f"Found: {len(all_pages)} dresses.")
    with open("all_the_dresses2.json", "w") as jf:
        json.dump(all_pages, jf, indent=4)
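
My rough idea for also grabbing each product's link (so that I could then open the detail pages one by one) is sketched below. It assumes that each product card on the listing page wraps its link in an <a> tag with an itemprop="url" attribute, but I have not confirmed that selector against the real markup:

    def extract_links(bs: BeautifulSoup, base: str = "https://www.birdsnest.com.au") -> list:
        # Assumed selector: <a itemprop="url" href="...">; needs checking in the page source.
        return [
            base + a["href"] if a["href"].startswith("/") else a["href"]
            for a in bs.find_all("a", {"itemprop": "url"})
            if a.get("href")
        ]

    # inside the page loop, next to brand/name/price:
    # links = extract_links(soup)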

Solution

  • The information that you want is generated dynamically, so you won't get it with requests. I suggest you use selenium for that.

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    import time
    
    
    link = 'https://www.birdsnest.com.au/brands/boho-bird/73067-amore-wrap-dress'
    options = Options()
    options.add_argument('--headless')      # run Chrome without opening a window
    options.add_argument('--disable-gpu')
    # Point this at your local chromedriver executable.
    driver = webdriver.Chrome('C:/Users/../Downloads/../chromedriver.exe', options=options)
    driver.get(link)
    time.sleep(3)  # give the page's JavaScript a few seconds to render the model info

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()  # end the browser session now that we have the rendered HTML
    page_new = soup.find('div', class_='model-info clearfix')
    results = page_new.find_all('p')
    for result in results:
        print(result.text)
    

    Output

    Marnee usually wears a size 8.
                    She is wearing a size 10 in this style.
                  
    Her height is 178 cm.
    
    Show Marnee’s body measurements
    
    Marnee’s body measurements are:
    Bust 81 cm
    Waist 64 cm
    Hips 89 cm
    

    <div class="model-info-header">
                  <p>
                    <strong><span class="model-info__name">Marnee</span></strong> usually wears a size <strong><span class="model-info__standard-size">8</span></strong>.
                    She is wearing a size <strong><span class="model-info__wears-size">10</span></strong> in this style.
                  </p>
                  <p class="model-info-header__height">Her height is <strong><span class="model-info__height">178 cm</span></strong>.</p>
                  <p>
                    <span class="js-model-info-more model-info__link model-info-header__more">Show <span class="model-info__name">Marnee</span>’s body measurements</span>
                  </p>
                </div>

    With requests alone you will miss all of the data in bold (the values inside the <strong>/<span> tags), which is exactly what you want.
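
    Once the page has been rendered by selenium, the class names in that snippet can also be used to pull the bold values out as structured fields, which covers dresses that show more than one model. Here is a minimal sketch, assuming each model gets its own div carrying the model-info class (if the real page groups several models inside a single block, the outer selector will need adjusting):

    def extract_models(soup: BeautifulSoup) -> list:
        models = []
        # Assumption: one "model-info" div per model; verify against the real page.
        for block in soup.find_all('div', class_='model-info'):
            def grab(css_class):
                el = block.find('span', class_=css_class)
                return el.text.strip() if el else None
            models.append({
                "name": grab('model-info__name'),
                "usual_size": grab('model-info__standard-size'),
                "wears_size": grab('model-info__wears-size'),
                "height": grab('model-info__height'),
            })
        return models

    # e.g. with the soup built from driver.page_source above:
    # print(extract_models(soup))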