Tags: python, async-await, python-asyncio

Does a Semaphore object count all async processes, or just the processes that run under its context manager?


For web scraping, I use a Semaphore object to limit async requests. I have another async process that simply prints the total number of objects scraped from the website. Does the semaphore count all async processes, or just the ones under its context manager?

    async def __get_functions(self):
        # some code here...

        semaphore = asyncio.Semaphore(5)
        async with aiohttp.ClientSession() as session:
            tasks = [self.add_data(session, li, semaphore) for li in li_items]
            await asyncio.gather(*tasks)


    async def add_data(self, s, li, semaphore):
        # some code here ...
        async with semaphore:
            info = await self.__get_function_data(s, href)
        self.data.append({"Name": name, **info})
        self.count += 1

Entry point of the program:

    # as you can see, other async functions run here as well
    async def __start_application(self):
        await asyncio.gather(self.__print_total_count(), self.__get_functions())

    # I will call this method from the class object
    def start(self):
        asyncio.run(self.__start_application())


Solution

  • First I would like to correct your terminology. In an asyncio program, the execution unit is a Task, not a Process. The two words have very different meanings in Python and it is important to use the correct one.

    When you use a semaphore in an async with statement, the block (which you say is "under" the semaphore) contains neither a Task nor a Process but just some code. The semaphore counts calls to its two methods, acquire and release: acquire decrements the count and release increments it. The async with statement ensures that those calls are always paired: entering the block calls acquire, and leaving it calls release. The point of a Semaphore is that the count is never less than zero. If a task calls acquire when the count is already zero, the task is suspended at that point and does not go forward until another task calls release. That's pretty much the whole story for a Semaphore. That's all it does.

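    So in your case, if you have 6 tasks that all try to acquire the semaphore, five tasks will go through and begin to execute the code inside the async with block. The sixth task will wait until one of the other five leaves the block. A task that never acquires the semaphore, such as the one that prints your total count, is neither counted nor limited by it.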

    https://docs.python.org/3/library/asyncio-sync.html?highlight=asyncio%20semaphore#asyncio.Semaphore
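
To make this concrete, here is a minimal sketch rather than your actual scraper. The names worker, reporter, and state are made up for illustration, and asyncio.sleep stands in for the real HTTP request. Six tasks share a Semaphore(5), while a separate reporter task never touches the semaphore at all.

```python
import asyncio


async def worker(sem, state):
    # Only the code inside this block is limited by the semaphore.
    async with sem:  # calls sem.acquire() on entry, sem.release() on exit
        state["inside"] += 1
        state["peak"] = max(state["peak"], state["inside"])
        await asyncio.sleep(0.01)  # hypothetical stand-in for a request
        state["inside"] -= 1


async def reporter(state):
    # Never acquires the semaphore, so the semaphore never delays it.
    for _ in range(3):
        state["reports"] += 1
        await asyncio.sleep(0.005)


async def main():
    sem = asyncio.Semaphore(5)
    state = {"inside": 0, "peak": 0, "reports": 0}
    workers = (worker(sem, state) for _ in range(6))
    await asyncio.gather(reporter(state), *workers)
    return state


state = asyncio.run(main())
# At most 5 workers were inside the block at once; the reporter ran 3 times.
print(state["peak"], state["reports"])
```

With six workers and a limit of five, the recorded peak is 5: the sixth worker waits at the async with line until one of the others releases. The reporter finishes all of its iterations regardless, because it never calls acquire.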