Tags: python, asynchronous, websocket, generator, coroutine

Using a Python websocket server as an async generator


I have a scraper that requires the use of a websocket server (can't go into too much detail on why because of company policy) that I'm trying to turn into a template/module for easier use on other websites.

I have one main function that runs the loop of the server (e.g. ping-pongs to keep the connection alive and send work and stop commands when necessary) that I'm trying to turn into a generator that yields the HTML of scraped pages (asynchronously, of course). However, I can't figure out a way to turn the server into a generator.

This is essentially the code I would want (simplified to just show the main idea, of course):

import asyncio, websockets

needsToStart = False  # Setting this to true gets handled somewhere else in the script

async def run(ws):
    global needsToStart

    while True:
        data = await ws.recv()
        
        if data == "ping":
            await ws.send("pong")
        elif "<html" in data:
            yield data  # Yielding the page data

        if needsToStart:
            await ws.send("work")  # Starts the next scraping session
            needsToStart = False

generator = websockets.serve(run, 'localhost', 9999)

while True:
    html = await anext(generator)

    # Do whatever with html

This, of course, doesn't work, giving the error "TypeError: 'Serve' object is not callable". But is there any way to set up something along these lines? An alternative I could try is creating an intermediate object that holds the data and that the end loop awaits, but that seems messier to me than figuring out a way to get this idea to work.

Thanks in advance.


Solution

  • I found a solution that essentially works backwards, for those in need of the same functionality: instead of yielding the data, I pass along the function that processes said data. Here's the updated example case:

    import asyncio, websockets
    from functools import partial
    
    needsToStart = False  # Setting this to true gets handled somewhere else in the script
    
    
    def process(html):
        pass
    
    
    async def run(ws, htmlFunc):
        global needsToStart
    
        while True:
            data = await ws.recv()
            
            if data == "ping":
                await ws.send("pong")
            elif "<html" in data:
                htmlFunc(data)  # Processing the page data
    
            if needsToStart:
                await ws.send("work")  # Starts the next scraping session
                needsToStart = False
    
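    func = partial(run, htmlFunc=process)
    
    
    async def main():
        # serve() only starts listening once it is awaited/entered inside a running event loop
        async with websockets.serve(func, 'localhost', 9999):
            await asyncio.Future()  # Keeps the server running until cancelled
    
    
    asyncio.run(main())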
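
For readers who still want the generator-style interface from the question, the "intermediate object" idea can be fairly small. The following is only a sketch, not part of the original solution: it assumes a shared `asyncio.Queue` (called `pages` here) that the handler fills and a hypothetical helper `iter_pages` that wraps the queue in an async generator. The `needsToStart` logic is left out for brevity, and every connection feeds the same queue.

```python
import asyncio
from functools import partial


async def run(ws, pages):
    """Connection handler: answers pings and queues scraped pages."""
    while True:
        data = await ws.recv()

        if data == "ping":
            await ws.send("pong")
        elif "<html" in data:
            await pages.put(data)  # Hand the page over to the consumer


async def iter_pages(pages):
    """Hypothetical helper: async generator yielding pages as they are queued."""
    while True:
        yield await pages.get()


async def main():
    import websockets  # Imported here so the helpers above depend only on asyncio

    pages = asyncio.Queue()
    handler = partial(run, pages=pages)

    async with websockets.serve(handler, "localhost", 9999):
        async for html in iter_pages(pages):
            ...  # Do whatever with html


# To start the server: asyncio.run(main())
```

The handler and the consumer stay decoupled this way: the server keeps running inside `async with`, while the `async for` loop reads pages in the order they arrive.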