python image selenium screen-scraping

Scraping images from a web page using Selenium and Python?


Someone on another platform asked for help scraping images from a website. The images all load on the same page. The only approach I could find was to load every image on the page with Selenium, extract each image URL, then open each image in a new tab and download it. This is very resource-intensive, and in some cases there are more than 200003 images. I am new to scraping and have only a little web design background. Is there a better technique for scraping the images? Note: I am not doing this for money; I am only practicing new techniques.

https://generated.photos/faces/natural/front-facing/young-adult/white-race/brown-hair/short/joy/female/brown-eyes


Solution

  • AOA Muhammad, here is code you can follow to extract all the images. It requests the site's JSON API directly with requests, so Selenium is not needed.

    #import modules (BeautifulSoup is not needed: the API returns JSON, not HTML)
    import requests
    import json
    
    #define headers
    headers = {
        'authority': 'api.generated.photos',
        'sec-ch-ua': '^\\^Google',
        'accept': 'application/json, text/plain, */*',
        'authorization': 'API-Key Cph30qkLrdJDkjW-THCeyA',
        'sec-ch-ua-mobile': '?0',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36',
        'origin': 'https://generated.photos',
        'sec-fetch-site': 'same-site',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://generated.photos/',
        'accept-language': 'en-PK,en-US;q=0.9,en;q=0.8',
        'cookie': 'gp_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFVG86HVJhY2s6OlNlc3Npb246OlNlc3Npb25JZAY6D0BwdWJsaWNfaWRJIkViMzUzYjQ3MTYyOTNjMzdkOTE2OTU4MzZkNzAxNjUyODY1MjU3NTExOTNlNzhmYjY2NDMyOTY1MDEyNjkxMDZiBjsARkkiDGNhcnRfaWQGOwBGSSIdNjA3YTg0YTdjN2VjMzEwMDBjZDY3ZGU3BjsAVA%3D%3D--038eee55b343dcdd77021c6b3494a8111809032d; _ga=GA1.2.1963701744.1618642096; _gid=GA1.2.180857723.1618642096; _gat=1',
    }
    
    #define the filters
    filters = {
        'order_by': 'latest',
        'page': '1',
        'per_page': '30',
        'face': 'natural',
        'head_pose': 'front-facing',
        'age': 'young-adult',
        'ethnicity': 'white',
        'hair_color': 'brown',
        'hair_length': 'short',
        'emotion':'joy',
        'gender':'female',
        'eye_color': 'brown',
    }
    
    #Now send the requests to the website's API
    api = "https://api.generated.photos/api/frontend/v1/images"
    image_url = []
    #loop over the result pages (30 images per page)
    for i in range(1,687):
        #reuse the filters defined above, changing only the page number
        params = {**filters, 'page': str(i)}
        response = requests.get(api, headers=headers, params=params)
        #parse the JSON response
        json_res = json.loads(response.content)
        for image in json_res['images']:
            image_url.append(image['thumb_url'])
    
    
    #Download the images, giving each one its own file name
    for n, url in enumerate(image_url, start=1):
        img_content = requests.get(url).content
        #a single fixed name such as 'Image.jpg' would be overwritten on every pass
        with open(f'Image_{n}.jpg','wb') as fh:
            fh.write(img_content)
    

    P.S. Keep in mind this will take a long time, so you can shorten the range, e.g. range(1,4), if you're only doing it for practice.
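
If you would rather not hard-code the page count, a minimal sketch like the one below keeps requesting pages until one comes back with no images, and builds a distinct file name for each download from its URL. The names `collect_thumb_urls`, `fetch_page`, and `filename_from_url` are hypothetical helpers, not part of the code above. The sketch also assumes the API returns an empty `images` list once you go past the last page, which I have not verified for this site.

```python
import posixpath
from urllib.parse import urlparse


def collect_thumb_urls(fetch_page, max_pages=1000):
    """Collect 'thumb_url' values page by page until a page has no images.

    fetch_page is a hypothetical callable: it takes a page number and returns
    the decoded JSON dict for that page (for example, a thin wrapper around
    requests.get). max_pages is a safety cap so the loop always terminates.
    """
    urls = []
    for page in range(1, max_pages + 1):
        images = fetch_page(page).get('images', [])
        if not images:  # assumed signal that we are past the last page
            break
        urls.extend(img['thumb_url'] for img in images)
    return urls


def filename_from_url(url, index):
    """Build a unique local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(url).path) or 'image.jpg'
    # The zero-padded index keeps names unique even if two URLs end the same way.
    return f'{index:05d}_{name}'
```

With the answer's code, `fetch_page` could be a small function that copies the filters, sets `'page'`, calls `requests.get` with the same headers, and returns the parsed JSON. Passing the fetcher in as an argument also lets you try the loop on fake data before hitting the real site.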