python · web-scraping · proxy · playwright · playwright-python

Using proxies with Playwright in Python


I'm using Playwright to extract data from a website, and I want to use proxies that I get from this website: https://www.proxy-list.download/HTTPS. It doesn't work, and I'm wondering if this is because the proxies are free. If that's the reason, does anyone know where I can find proxies that will work?

This is my code :

from playwright.sync_api import sync_playwright
import time


url = "https://www.momox-shop.fr/livres-romans-et-litterature-C055/"
with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=False,
        proxy={
            'server': '209.166.175.201:3128'
        })
    page = browser.new_page()
    page.goto(url)
    time.sleep(5)
    browser.close()

Thank you!


Solution

  • Yes, according to your link, all of those proxies are "dead".

    Before using proxies, check that they actually work. Here is one possible solution:

    import json
    import requests
    from pythonping import ping
    from concurrent.futures import ThreadPoolExecutor
    
    
    check_proxies_url = "https://httpbin.org/ip"
    good_proxy = set()
    
    # proxy_lst = requests.get("https://www.proxy-list.download/api/v1/get", params={"type": "https"})
    # proxies = [proxy for proxy in proxy_lst.text.split('\r\n') if proxy]
    proxy_lst = requests.get("http://proxylist.fatezero.org/proxy.list")
    # Each non-empty line of the response is a JSON object with "host" and "port" fields
    proxies = (
        f"{record['host']}:{record['port']}"
        for record in map(json.loads, filter(None, proxy_lst.text.split('\n')))
    )
    
    def check_proxy(proxy):
        """Keep the proxy if it answers through httpbin and pings under 150 ms."""
        proxy_map = {
            "https": proxy,
            "http": proxy
        }
        try:
            response = requests.get(url=check_proxies_url, proxies=proxy_map, timeout=2)
            response.raise_for_status()
            # Ping the proxy host (the part before the port) once
            if ping(target=proxy.split(':')[0], count=1, timeout=2).rtt_avg_ms < 150:
                good_proxy.add(proxy)
                print(f"Good proxy: {proxy}")
        except Exception:
            print(f"Bad proxy: {proxy}")
    
    with ThreadPoolExecutor() as executor:
        executor.map(check_proxy, proxies)
    
    print(good_proxy)
    

    This yields the set of working proxies with an average ping under 150 ms.

    Output:

    {'209.166.175.201:8080', '170.39.194.156:3128', '20.111.54.16:80', '20.111.54.16:8123'}
    

    But in any case, these are shared proxies, and their performance is not guaranteed. If you want to be sure your scraper keeps working, it is better to buy private proxies.

    I ran your code with the verified proxy '170.39.194.156:3128', and for now it works.
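
    To wire a verified proxy back into your original script, pass the checked address to Playwright's `proxy` option with an explicit scheme. A minimal sketch, assuming a `good_proxy` set like the one produced by the checker above; `to_playwright_proxy` is a hypothetical helper name:

    ```python
    import random

    def to_playwright_proxy(proxy: str, scheme: str = "http") -> dict:
        # Playwright expects a dict with a "server" key, e.g. {"server": "http://host:port"}
        return {"server": f"{scheme}://{proxy}"}

    # Assume good_proxy was filled by the checker above; pick one per launch.
    good_proxy = {"170.39.194.156:3128", "209.166.175.201:8080"}
    proxy_cfg = to_playwright_proxy(random.choice(sorted(good_proxy)))
    print(proxy_cfg)

    # Usage with Playwright (requires playwright and its browsers installed):
    # from playwright.sync_api import sync_playwright
    # with sync_playwright() as p:
    #     browser = p.firefox.launch(headless=False, proxy=proxy_cfg)
    ```

    Passing the proxy at `launch` applies it to every page the browser opens; Playwright also accepts a per-context proxy via `browser.new_context(proxy=...)`, which lets you rotate through `good_proxy` without relaunching the browser.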