CODE :
def ValidateProxy(LIST_PROXIES):
    '''
    Checks if scraped proxies allow HTTPS connection
    '''
    for proxy in LIST_PROXIES:
        print('using', proxy)
        host, port = str(proxy).split(":")
        try:
            resp = requests.get('https://amazon.com',
                                proxies=dict(https=f'socks5://{host}:{port}'),
                                timeout=6)
        except ConnectionError:
            print(proxy, 'REMOVED')
            LIST_PROXIES.remove(proxy)
    print(len(LIST_PROXIES), 'PROXIES GATHERED')
    if len(LIST_PROXIES) != 0:
        return LIST_PROXIES
    else:
        return None
INPUT :
['46.4.96.137:1080', '138.197.157.32:1080', '138.68.240.218:1080'.....] #15 proxies
OUTPUT :
using 46.4.96.137:1080
46.4.96.137:1080 REMOVED
using 138.68.240.218:1080
138.68.240.218:1080 REMOVED
using 207.154.231.213:1080
207.154.231.213:1080 REMOVED
using 198.199.120.102:1080
198.199.120.102:1080 REMOVED
using 88.198.24.108:1080
88.198.24.108:1080 REMOVED
using 188.226.141.211:1080
188.226.141.211:1080 REMOVED
using 92.222.180.156:1080
92.222.180.156:1080 REMOVED
using 183.233.183.70:1081
183.233.183.70:1081 REMOVED
7 PROXIES GATHERED # len(LIST_PROXIES) == 7, so 8 are removed which are printed above
MY DOUBTS :
Why is print('using', proxy) not executed every time? (The input list has 15 items, but this line is printed only 8 times.)
Are the try and except blocks both executed every time? I ask because REMOVED is printed on the console every time.
I want the function to call print('using', proxy) for every proxy and, if a ConnectionError occurs, to call print(proxy, 'REMOVED') and remove that proxy from the list.
EDIT : FULL INPUT
['46.4.96.137:1080', '138.197.157.32:1080', '138.68.240.218:1080', '162.243.108.129:1080', '207.154.231.213:1080', '176.9.119.170:1080', '198.199.120.102:1080', '176.9.75.42:1080', '88.198.24.108:1080', '188.226.141.61:1080', '188.226.141.211:1080', '125.124.185.167:38801', '92.222.180.156:1080', '188.166.83.17:1080', '183.233.183.70:1081']
I would separate the logic into two functions. Also, please follow PEP 8 (I did not point that out in the original answer).
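from typing import Iterable

import requests


def is_valid_proxy(proxy: str) -> bool:
    """Return True if an HTTPS request through the SOCKS5 proxy succeeds."""
    try:
        requests.get(
            'https://amazon.com',
            proxies={'https': f'socks5://{proxy}'},
            timeout=6,
        )
        return True
    # requests raises its own ConnectionError, which is not a subclass of
    # the built-in ConnectionError, so it is named explicitly here.
    except requests.exceptions.ConnectionError:
        return False


def get_valid_proxies(proxies: Iterable[str]) -> list[str]:
    """Return the proxies that passed the check; the input is not modified."""
    return [proxy for proxy in proxies if is_valid_proxy(proxy)]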
Instead of printing to stdout, you could use the logging module.
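As a rough sketch of what that could look like, the filter below logs the same messages the original function printed. The name filter_proxies_logged and the is_valid parameter are made up for this illustration; the check is passed in as an argument so the example runs without network access, and a stand-in lambda replaces the real proxy test.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def filter_proxies_logged(
    proxies: Iterable[str],
    is_valid: Callable[[str], bool],
) -> List[str]:
    """Keep the proxies accepted by is_valid, logging each decision."""
    valid = []
    for proxy in proxies:
        logger.info('using %s', proxy)
        if is_valid(proxy):
            valid.append(proxy)
        else:
            logger.warning('%s REMOVED', proxy)
    logger.info('%d PROXIES GATHERED', len(valid))
    return valid


logging.basicConfig(level=logging.INFO)

# Stand-in check (hypothetical) so the sketch runs offline; pass a real
# checker such as is_valid_proxy to test actual proxies.
kept = filter_proxies_logged(
    ['10.0.0.1:1080', '10.0.0.2:1081'],
    lambda proxy: proxy.endswith(':1080'),
)
```

With logging, the verbosity can be changed through the log level instead of editing print calls.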
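The problem is that you are iterating over LIST_PROXIES and removing elements from it at the same time. Each removal shifts the remaining elements one position to the left, so the loop skips the element that follows every removed proxy. In your run every tested proxy failed, so every second proxy was skipped: 8 of the 15 were tested and removed, and the other 7 were never checked. The except block only runs when the request raises; it ran every time here because every tested proxy failed.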
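A small network-free sketch of the effect, assuming every check fails (as it did for the proxies printed in the question). The list items are placeholders, not real proxies:

```python
items = ['a', 'b', 'c', 'd', 'e']
visited = []

for item in items:
    visited.append(item)
    # Removing the current element shifts the rest one slot to the left,
    # so the loop's internal index now points past the next element.
    items.remove(item)

print(visited)  # ['a', 'c', 'e']
print(items)    # ['b', 'd']
```

Only every second element is visited, which is the same pattern as 8 of 15 proxies being printed.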
If you only want to iterate over LIST_PROXIES once, something like this could work:
def ValidateProxy(LIST_PROXIES):
    index = 0
    for i in range(len(LIST_PROXIES)):
        proxy = LIST_PROXIES[index]
        print('using', proxy)
        host, port = str(proxy).split(":")
        try:
            resp = requests.get('https://amazon.com',
                                proxies=dict(https=f'socks5://{host}:{port}'),
                                timeout=6)
            index += 1
        except ConnectionError:
            print(proxy, 'REMOVED')
            LIST_PROXIES.pop(index)  # Index is not incremented
    print(len(LIST_PROXIES), 'PROXIES GATHERED')
    if len(LIST_PROXIES) != 0:
        return LIST_PROXIES
    else:
        return None
However, if iterating over the list twice is not a problem, you can just make a copy of the list, as Sy Ker pointed out.
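A minimal sketch of the copy approach might look like the following. The function name remove_invalid_in_place and the is_valid parameter are hypothetical; the check is injected so the example runs offline, with a stand-in lambda in place of the real request.

```python
from typing import Callable, List


def remove_invalid_in_place(
    proxies: List[str],
    is_valid: Callable[[str], bool],
) -> None:
    """Remove every proxy rejected by is_valid from the given list."""
    # list(proxies) is a shallow copy; iterating over it is safe even
    # though the original list shrinks inside the loop.
    for proxy in list(proxies):
        if not is_valid(proxy):
            proxies.remove(proxy)


# Placeholder data and a stand-in check (assumptions for this sketch).
candidates = ['a:1080', 'b:1080', 'c:1080', 'd:1080']
remove_invalid_in_place(candidates, lambda proxy: proxy == 'b:1080')
print(candidates)  # ['b:1080']
```

Every element is visited exactly once because the loop walks the copy, while the removals only touch the original list.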