python, error-handling, try-catch

Try/except inside a for loop not behaving as expected


CODE :

def ValidateProxy(LIST_PROXIES):
    '''
    Checks if scraped proxies allow HTTPS connection
    '''

    for proxy in LIST_PROXIES:

        print('using', proxy)

        host, port = str(proxy).split(":")

        try:
            resp = requests.get('https://amazon.com', 
                                proxies=dict(https=f'socks5://{host}:{port}'),
                                timeout=6)

        except ConnectionError:
            print(proxy, 'REMOVED')
            LIST_PROXIES.remove(proxy)


    print(len(LIST_PROXIES), 'PROXIES GATHERED')

    if len(LIST_PROXIES) != 0:
        return LIST_PROXIES
    else:
        return None

INPUT :

['46.4.96.137:1080', '138.197.157.32:1080', '138.68.240.218:1080'.....] #15 proxies

OUTPUT :

using 46.4.96.137:1080
46.4.96.137:1080 REMOVED
using 138.68.240.218:1080
138.68.240.218:1080 REMOVED
using 207.154.231.213:1080
207.154.231.213:1080 REMOVED
using 198.199.120.102:1080
198.199.120.102:1080 REMOVED
using 88.198.24.108:1080
88.198.24.108:1080 REMOVED
using 188.226.141.211:1080
188.226.141.211:1080 REMOVED
using 92.222.180.156:1080
92.222.180.156:1080 REMOVED
using 183.233.183.70:1081
183.233.183.70:1081 REMOVED
7 PROXIES GATHERED # len(LIST_PROXIES) == 7, so 8 are removed which are printed above

MY DOUBTS :

  1. Why is print('using', proxy) not executed every time? (The input list has 15 items, but this line is printed only 8 times.)

  2. Are both the try and except blocks executed every time? REMOVED is printed to the console on every iteration.

  3. I want it to run print('using', proxy) for every proxy and, if a ConnectionError occurs, print(proxy, 'REMOVED') and remove that proxy from the list.

EDIT : FULL INPUT

['46.4.96.137:1080', '138.197.157.32:1080', '138.68.240.218:1080', '162.243.108.129:1080', '207.154.231.213:1080', '176.9.119.170:1080', '198.199.120.102:1080', '176.9.75.42:1080', '88.198.24.108:1080', '188.226.141.61:1080', '188.226.141.211:1080', '125.124.185.167:38801', '92.222.180.156:1080', '188.166.83.17:1080', '183.233.183.70:1081']

Solution

  • Edit 2022-08-09

    I would separate the logic into two functions. Also, please follow PEP 8 (I did not point that out in the original answer).

    from typing import Iterable
    
    import requests
    
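    def is_valid_proxy(proxy: str) -> bool:
        try:
            requests.get(
                'https://amazon.com',
                proxies={'https': f'socks5://{proxy}'},
                timeout=6,
            )
            return True
        # requests raises its own ConnectionError, which is not a subclass
        # of the built-in ConnectionError, so name it explicitly.
        except requests.exceptions.ConnectionError:
            return False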
    
    
    def get_valid_proxies(proxies: Iterable[str]) -> list[str]:
        return [proxy for proxy in proxies if is_valid_proxy(proxy)]
    

    Instead of printing to stdout, you could use the logging module.
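
As a rough sketch of the logging suggestion, the progress messages from the question could be emitted through a logger instead of print. The names `filter_proxies` and `check` are hypothetical and not part of the original answer; `check` stands for any callable that returns True for a working proxy, such as `is_valid_proxy`.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def filter_proxies(proxies: Iterable[str], check: Callable[[str], bool]) -> List[str]:
    """Return the proxies accepted by `check`, logging each decision."""
    valid = []
    for proxy in proxies:
        logger.info('using %s', proxy)
        if check(proxy):
            valid.append(proxy)
        else:
            logger.warning('%s REMOVED', proxy)
    logger.info('%d PROXIES GATHERED', len(valid))
    return valid
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup makes the messages visible. Passing the check in as a parameter is only a convenience here, so the function can be tried without network access.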

    Original Answer

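    The problem is that you are iterating over LIST_PROXIES and removing elements from it at the same time. Each removal shifts the remaining items one position to the left, so the loop skips the item that follows the removed one.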
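
The skipping can be reproduced without any network access. In this small illustration the letters are hypothetical stand-ins for proxies, and every visited item is removed, which is consistent with the output in the question (15 items in, 8 visited and removed, 7 left):

```python
# Hypothetical stand-in data: each letter plays the role of one proxy.
items = ['a', 'b', 'c', 'd', 'e', 'f']
visited = []

for item in items:
    visited.append(item)
    # Removing the current item shifts the rest left, so the loop's
    # internal position now points past the next element.
    items.remove(item)

print(visited)  # ['a', 'c', 'e']
print(items)    # ['b', 'd', 'f']
```

The except block does not run for proxies that connect successfully. The skipped proxies are never tested at all, which is why every "using" line in the output is followed by REMOVED.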

    If you only want to iterate over LIST_PROXIES once, something like this could work:

    import requests

    def ValidateProxy(LIST_PROXIES):
        index = 0
        # The number of iterations is fixed up front; index only advances
        # when the current proxy is kept.
        for _ in range(len(LIST_PROXIES)):
            proxy = LIST_PROXIES[index]
            print('using', proxy)
            host, port = str(proxy).split(":")
            try:
                requests.get('https://amazon.com',
                             proxies=dict(https=f'socks5://{host}:{port}'),
                             timeout=6)
                index += 1
            # requests' ConnectionError is not the built-in ConnectionError
            except requests.exceptions.ConnectionError:
                print(proxy, 'REMOVED')
                LIST_PROXIES.pop(index)  # Index is not incremented
        print(len(LIST_PROXIES), 'PROXIES GATHERED')
        if len(LIST_PROXIES) != 0:
            return LIST_PROXIES
        else:
            return None
    

    However, if iterating over the list twice is not a problem, you can just make a copy of the list, as Sy Ker pointed out.
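
A minimal sketch of the copy-based approach, under the assumption that the connection test is factored out into a hypothetical `check` callable (for example a function that wraps the requests.get call and returns False on a connection error). The function name `validate_proxies` is also illustrative:

```python
from typing import Callable, List, Optional


def validate_proxies(list_proxies: List[str],
                     check: Callable[[str], bool]) -> Optional[List[str]]:
    # list(...) takes a snapshot, so removing from the original list
    # does not disturb the iteration.
    for proxy in list(list_proxies):
        print('using', proxy)
        if not check(proxy):
            print(proxy, 'REMOVED')
            list_proxies.remove(proxy)
    print(len(list_proxies), 'PROXIES GATHERED')
    # Mirror the question's return convention: None when nothing is left.
    return list_proxies or None
```

Every proxy is visited exactly once, at the cost of one extra list the same size as the input.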