I'm using Sanic, and we're using proxies to make outbound connections for web scraping.
I want to create a Python dict holding a list of proxies, where each proxy maps to a status value such as 0 or 1 (1 = the proxy failed to connect).
I'm looking to access the list evenly, so our proxies are used in a predictable pattern over time, instead of selecting them randomly, which could yield heavy usage of one proxy over the others.
However, since Sanic has a number of workers, I'm trying to figure out how to handle this.
Visually, my thought is a line of proxies: each time one is requested, the proxy at the front of the line handles the request, and once it's been used it goes to the back of the line again.
Something like https://docs.python.org/2/library/itertools.html#itertools.cycle seems like a great option.
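For context, here is a sketch of what I mean, combining the status dict with itertools.cycle (the proxy addresses are made up):

```python
from itertools import cycle

# Hypothetical proxies; status 0 = healthy, 1 = failed connection.
proxies = {
    "http://10.0.0.1:8080": 0,
    "http://10.0.0.2:8080": 0,
    "http://10.0.0.3:8080": 0,
}

rotation = cycle(proxies)  # cycles over the dict's keys forever

def next_proxy():
    """Return the next healthy proxy in round-robin order, skipping failures."""
    for _ in range(len(proxies)):
        proxy = next(rotation)
        if proxies[proxy] == 0:
            return proxy
    raise RuntimeError("no healthy proxies left")
```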
However, my question is: how can this happen async and non-blocking? Multiple workers can serve requests at the same time, so how does this get resolved if 2-50 requests happen simultaneously?
Your best bet might be to look at something like aredis. Workers are essentially separate processes, so a shared in-memory dict would not work.
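One way to get an even rotation across processes is a shared atomic counter: Redis's INCR is atomic, so every worker sees a distinct, ever-increasing value that you map onto the proxy list. This is only a sketch; with aredis the counter would come from something like `client.incr("proxy:rr")` (the key name is made up), but the counter is injected here so the logic can run without a Redis server:

```python
import asyncio

# Hypothetical proxy list, configured identically in every worker.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

async def next_proxy(incr):
    """Pick the next proxy from a shared, ever-increasing atomic counter.

    `incr` is any coroutine factory returning the counter's new value;
    with aredis it would be roughly `lambda: client.incr("proxy:rr")`.
    """
    n = await incr()
    # INCR starts at 1, so shift to a 0-based index.
    return PROXIES[(n - 1) % len(PROXIES)]
```

Because the increment happens in Redis, no two workers can receive the same counter value, so the proxies are handed out in strict round-robin order no matter how many requests land at once.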
If you look in the source code where the workers setting is used, further down the line there is a method called serve_multiple:
def serve_multiple(server_settings, workers):
    """Start multiple server processes simultaneously. Stop on interrupt
    and terminate signals, and drain connections when complete.

    :param server_settings: kw arguments to be passed to the serve function
    :param workers: number of workers to launch
    :param stop_event: if provided, is used as a stop signal
    :return:
    """
    server_settings['reuse_port'] = True

    # Handling when custom socket is not provided.
    if server_settings.get('sock') is None:
        sock = socket()
        sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        sock.bind((server_settings['host'], server_settings['port']))
        sock.set_inheritable(True)
        server_settings['sock'] = sock
        server_settings['host'] = None
        server_settings['port'] = None

    def sig_handler(signal, frame):
        logger.info("Received signal %s. Shutting down.", Signals(signal).name)
        for process in processes:
            os.kill(process.pid, SIGINT)

    signal_func(SIGINT, lambda s, f: sig_handler(s, f))
    signal_func(SIGTERM, lambda s, f: sig_handler(s, f))

    processes = []

    for _ in range(workers):
        process = Process(target=serve, kwargs=server_settings)
        process.daemon = True
        process.start()
        processes.append(process)

    for process in processes:
        process.join()

    # the above processes will block this until they're stopped
    for process in processes:
        process.terminate()
    server_settings.get('sock').close()
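Note that each of those worker processes runs its own event loop. Within a single worker, requests are interleaved on one loop rather than running in parallel threads, so a per-worker rotation only needs an asyncio.Lock at most. A sketch (proxy addresses are hypothetical):

```python
import asyncio
from itertools import cycle

# Hypothetical proxies; each worker process holds its own copy.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
rotation = cycle(PROXIES)
rotation_lock = asyncio.Lock()

async def acquire_proxy():
    # next() never awaits, so on a single event loop this is already
    # race-free; the lock simply future-proofs the section in case an
    # await (e.g. an async health check) is added later.
    async with rotation_lock:
        return next(rotation)
```

The catch is that each worker rotates independently, so this alone does not give you one global line of proxies; for that you need shared state like Redis.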
Redis has queue-like lists, so you could take a proxy off the queue and then push it back on if need be.
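That "front of the line goes to the back" behavior maps to a single Redis list: RPOPLPUSH with the same key as source and destination rotates the list in one atomic command, safe across all workers. The sketch below stands in a local deque for the Redis list so it can run standalone (proxy addresses are made up):

```python
from collections import deque

# Local stand-in for the Redis list; with Redis, rotate() would be one
# atomic RPOPLPUSH (or LMOVE) call shared by every worker process.
queue = deque(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])

def rotate():
    proxy = queue.popleft()  # take the proxy at the front of the line
    queue.append(proxy)      # send it to the back of the line
    return proxy
```

A failed proxy could simply not be pushed back (or be pushed onto a separate "failed" list), which covers your 0/1 status tracking too.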
As for the proxying itself, I imagine you could achieve that with nginx?