python python-requests qthread

How can I exit a Python requests GET loop when stream=True but data is not always flowing in?


I am using requests to issue a GET request to a webpage where new data is appended as events occur in the real world. I want to keep receiving this data as long as the window is open, so I set stream=True and iterate over the data line by line as it streams in.

import requests

page = requests.get(url, headers=headers, stream=True)
# Process the LiveLog data until stopped from an exterior source
for html_line in page.iter_lines(chunk_size=1):
    # Do other work here
    pass

This part works fine, but exiting the loop is a problem. From other StackOverflow threads I understand that I can't catch any signals because my for loop is blocking. Instead, I tried the following check inside the loop, which works but has one big problem:

for html_line in page.iter_lines(chunk_size=1):
    if QThread.currentThread().isInterruptionRequested():
        break

This gets me out of the loop, but the for loop only iterates when new data arrives on the response, and in my situation data does not arrive continuously. I could go minutes or longer without any new data, and I don't want to wait for that data to arrive before the loop checks again whether an interruption has been requested.
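One way to keep that check responsive while still using requests is sketched below; it is an assumption, not something from the original post. The blocking iteration moves into a background daemon thread that feeds a queue.Queue, and the processing loop polls the queue with a timeout, so the stop check runs at least every `poll` seconds whether or not data arrives. The names `_pump`, `consume`, `handle`, `should_stop` and `poll` are hypothetical. Caveat: the background thread itself stays blocked until the next line arrives or the connection drops; only the processing loop exits promptly.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable

_END = object()  # sentinel: the source iterator is exhausted


def _pump(lines: Iterable[bytes], out: queue.Queue) -> None:
    """Background thread: copy lines from a blocking iterator into a queue."""
    try:
        for line in lines:
            out.put(line)
    finally:
        out.put(_END)  # always signal the end, even if the iterator raised


def consume(
    lines: Iterable[bytes],
    handle: Callable[[bytes], None],
    should_stop: Callable[[], bool],
    poll: float = 0.1,
) -> None:
    """Call handle(line) for each line until the stream ends or should_stop() is True.

    should_stop() is re-checked at least every `poll` seconds, even when no
    data is arriving, because q.get() times out instead of blocking forever.
    """
    q: queue.Queue = queue.Queue()
    # Daemon thread: it may remain blocked in the iterator after we return,
    # but it will not keep the interpreter alive at exit.
    threading.Thread(target=_pump, args=(lines, q), daemon=True).start()
    while not should_stop():
        try:
            item = q.get(timeout=poll)
        except queue.Empty:
            continue  # no data yet; loop back and re-check should_stop()
        if item is _END:
            return
        handle(item)  # "Do other work here"
```

Inside the worker thread the call would presumably look like `consume(page.iter_lines(chunk_size=1), process_line, QThread.currentThread().isInterruptionRequested)`, where `process_line` is a hypothetical per-line handler. Passing the stop check as a callable keeps `consume` independent of Qt, so a `threading.Event().is_set` works just as well.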

How can I exit my loop immediately after a user action?


Solution

  • You might try the aiohttp library (https://github.com/aio-libs/aiohttp), specifically its asynchronous iteration support for streams (https://aiohttp.readthedocs.io/en/stable/streams.html#asynchronous-iteration-support). It would look something like:

    import asyncio
    import aiohttp
    
    async def main():
        url = 'https://httpbin.org/stream/20'
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                while True:
                    data = await resp.content.readline()
                    if not data:  # b'' means the server closed the stream
                        break
                    print(data)  # do work here
    
    if __name__ == "__main__":
        asyncio.run(main())
    

    Note that resp.content is an aiohttp StreamReader, so you can also use its other methods, such as read(), readany() and iter_chunked(): https://aiohttp.readthedocs.io/en/stable/streams.html#aiohttp.StreamReader
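Switching to asyncio helps with the original problem because a task suspended at an `await` can be cancelled: `task.cancel()` raises `asyncio.CancelledError` at the pending `readline()`, so the loop stops even while no data is arriving. The sketch below shows that mechanism under some assumptions. To stay runnable offline it uses asyncio's own `StreamReader` and a throwaway local server in place of aiohttp and httpbin, and `read_lines` and `demo` are hypothetical names. With aiohttp, the awaited call would be `resp.content.readline()` instead.

```python
import asyncio


async def read_lines(reader: asyncio.StreamReader, received: list) -> None:
    """Same shape as the aiohttp loop: await one line at a time until EOF."""
    while True:
        line = await reader.readline()  # suspends here while no data arrives
        if not line:  # b'' means the server closed the stream
            break
        received.append(line)


async def demo() -> tuple:
    done = asyncio.Event()

    async def serve(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        writer.write(b"first event\n")
        await writer.drain()
        await done.wait()  # then go silent, keeping the connection open
        writer.close()

    # Local stand-in for the live-log endpoint (port 0 = any free port).
    server = await asyncio.start_server(serve, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    received: list = []
    task = asyncio.create_task(read_lines(reader, received))
    await asyncio.sleep(0.2)  # the "user action" arrives while no data is flowing
    task.cancel()  # raises CancelledError inside the pending readline()
    try:
        await task
    except asyncio.CancelledError:
        pass

    done.set()
    writer.close()
    await writer.wait_closed()
    server.close()
    return received, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([b'first event\n'], True)
```

If the user action comes from another thread (for example a Qt GUI thread), the cancellation has to be scheduled on the event loop's own thread, e.g. with `loop.call_soon_threadsafe(task.cancel)`. How the event loop is hosted alongside Qt is left open here.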