python selenium-webdriver webdriver google-chrome-devtools

How to get an image response from Chrome DevTools using Python WebDriver

I am trying to write Python code that automatically downloads images from web pages.

My approach is to open a specific web page with Selenium and copy the image's response data from the Network panel of Chrome DevTools.

I need this because some sites are protected by Cloudflare, and common approaches such as requests or urllib.request fail with 403 errors.

I can copy the image data manually with 'Copy response', as in the screenshot below, but I want to get it through the Chrome WebDriver in Python.

Copy response in Chrome devtools

import time

from selenium import webdriver

option = webdriver.ChromeOptions()
option.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
option.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

browser = webdriver.Chrome(options=option)

url = 'site_url'  # placeholder for the page to load
browser.get(url)
time.sleep(5)

log_entries = browser.get_log("performance")

The code above gives me the response headers, but I want the full response body of the images.

Solution

To get the responses, loop through the performance log entries and keep only the messages that contain the Network.responseReceived event.

Then read the params object from each message and check whether target_url is part of the response URL.

Once you have a match, execute the CDP command Network.getResponseBody with the requestId from params.

Depending on the response body, you can then process it further, for example read a JSON field or convert it into an image.

Similar question answer reference

from selenium import webdriver
import json
import time

option = webdriver.ChromeOptions()
option.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
option.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

browser = webdriver.Chrome(options=option)
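url = 'site_url'
target_url = "your_request_url_part"

browser.get(url)
time.sleep(5)  # crude wait for the page's network requests to finish

# Read the performance log only after the page has loaded; entries are
# collected as the page makes requests, so reading earlier misses them.
log_entries = browser.get_log("performance")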

for log in log_entries:
    message = log["message"]
    if "Network.responseReceived" not in message:
        continue
    params = json.loads(message)["message"].get("params")
    if params is None:
        continue
    response = params.get("response")
    if response is None or target_url not in response["url"]:
        continue
    # The result is a dict with a 'body' string and a 'base64Encoded' flag.
    body = browser.execute_cdp_cmd('Network.getResponseBody', {'requestId': params["requestId"]})
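
To turn the result into an image file, note that Network.getResponseBody returns a dictionary with a body string and a base64Encoded flag, and binary payloads such as images normally arrive base64-encoded. The following is a minimal sketch of the decoding step, not part of the original answer; the helper names response_body_to_bytes and save_response_body and the file name image.png are made up for illustration.

```python
import base64


def response_body_to_bytes(result):
    """Return the raw bytes of a Network.getResponseBody result.

    `result` is the dict returned by execute_cdp_cmd: a 'body' string plus a
    'base64Encoded' flag that is true for binary payloads such as images.
    """
    body = result.get("body", "")
    if result.get("base64Encoded"):
        return base64.b64decode(body)
    # Text payloads (JSON, SVG, HTML) arrive as a plain string.
    return body.encode("utf-8")


def save_response_body(result, path):
    """Write the decoded body to `path` and return the number of bytes written."""
    data = response_body_to_bytes(result)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Inside the loop above, this could be called as save_response_body(body, "image.png") right after the Network.getResponseBody call. Choosing the file extension from the response's mimeType field would be a reasonable refinement, but that is left out here.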