gevent

Why does gevent need synchronization if it runs in a single thread?


From the gevent docs:

The greenlets all run in the same OS thread and are scheduled cooperatively.

Is it then still necessary to use gevent lock primitives or gevent.Queue to avoid race conditions among multiple greenlets in a single thread? An example demonstrating such a race condition would be much appreciated. As I understand it, those synchronization primitives seem to be just a way to switch execution flow among greenlets.


Solution

  • Yes, in general, it is still necessary to use locking and synchronization constructs in gevent.

    Locking and synchronization constructs such as RLock, Semaphore and Queue exist, under both threads and gevent, to keep the program state internally consistent by guarding access to critical data or critical code sections. In effect, they let a critical section run as if nothing else were running at the same time.

    The difference between greenlets and threads is that a thread context switch can, in theory, happen at any time and is completely out of your control, while a greenlet context switch can only happen at specific, well-defined points. So in theory, if you are very careful and have complete control over how the critical section or data is used, you can avoid switches inside it entirely and eliminate the need for locks. Sometimes this is easy to do and sometimes it is not, depending on the program. In gevent, I/O, gevent.sleep() (or time.sleep() when monkey-patched), and similar calls can all cause switches, so once the code has any real complexity it is hard to be sure that no switch will occur. The standard rules about synchronization and locking are therefore the safest choice.

    Here's an example. Let's say we want to write some messages (structured data) to a file or file-like object. Let's imagine that each message is put together in a streaming fashion, one chunk at a time, but the recipient needs to read each message in one piece: interleaving chunks of two different messages results in a garbled mess.

    import gevent

    def generate_data(chunks):
        # This task generates the data that makes up a message
        # in chunks.
        # Imagine that each chunk takes some time to generate.
        # Maybe we're pulling data from a database.
        for chunk in chunks:
            yield chunk

    def worker_one(file):
        file.write("begin message")
        for chunk in generate_data('abcde'):
            file.write(chunk)
        file.write("end message")

    def worker_two(file):
        file.write("begin message")
        for chunk in generate_data('123456'):
            file.write(chunk)
        file.write("end message")

    # get_output_file() is a placeholder; see the cases discussed below.
    output_file = get_output_file()

    # Both workers share the same output object.
    workers = [gevent.spawn(worker_one, output_file),
               gevent.spawn(worker_two, output_file)]

    gevent.joinall(workers)
    

    If get_output_file simply returns open('/some/file', 'w'), this will work fine: a regular file object does not cooperate with the gevent loop, so each worker runs to completion without ever yielding and the messages stay intact.

    However, if it returned socket.create_connection(("some.host", 80)).makefile('w'), using gevent's socket module (or the standard one after monkey-patching), this would fail and the messages would be fragmented. Each write to the socket from one worker could let that greenlet yield and the other greenlet run, resulting in garbled data.

    If generate_data were more complex, maybe communicating with a server or database over a gevent socket, then even when writing to a regular file the messages could be garbled, because the greenlets could switch in the middle of generating the data.

    This is an example of why shared state (in this case the socket) may need to be protected with synchronization constructs.
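
To make the race above reproducible without a network, here is a minimal sketch, not the answer's original code. It assumes a hypothetical YieldingSink class whose write() calls gevent.sleep(0) to stand in for a socket write that yields, and a hypothetical send_message helper that frames each message with "<" and ">". The lock is gevent.lock.BoundedSemaphore, held for the duration of one whole message.

```python
import contextlib

import gevent
from gevent.lock import BoundedSemaphore


class YieldingSink:
    """Hypothetical stand-in for a gevent socket file: every write yields."""

    def __init__(self):
        self.parts = []

    def write(self, data):
        self.parts.append(data)
        # Cooperative switch point, as a write to a gevent socket could be.
        gevent.sleep(0)


def send_message(sink, chunks, lock):
    # Hold the lock (or a no-op context) for the whole message,
    # so chunks of different messages cannot interleave.
    with lock:
        sink.write("<")
        for chunk in chunks:
            sink.write(chunk)
        sink.write(">")


def run(use_lock):
    sink = YieldingSink()
    lock = BoundedSemaphore(1) if use_lock else contextlib.nullcontext()
    workers = [gevent.spawn(send_message, sink, "abcde", lock),
               gevent.spawn(send_message, sink, "123456", lock)]
    gevent.joinall(workers)
    return "".join(sink.parts)


unlocked = run(use_lock=False)  # chunks of the two messages interleave
locked = run(use_lock=True)     # each message is written in one piece
print("without lock:", unlocked)
print("with lock:   ", locked)
```

Without the lock, each write hands control to the other greenlet, so the two messages end up mixed together. With the lock, the second greenlet blocks in acquire until the first has written its closing ">", which is the same discipline you would apply with threads.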