Search code examples
pythonpython-3.xpandasmultiprocessinginterprocess

1 producer, 1 consumer, only 1 piece of data to communicate, is queue an overkill?


This question is related to Python Multiprocessing. I am asking for a suitable interprocess communication data-structure for my specific scenario:

My scenario

I have one producer and one consumer.

  1. The producer produces a single fairly small panda Dataframe every 10-ish secs, then the producer puts it on a python.multiprocess.queue.
  2. The consumer is a GUI polling that python.multiprocess.queue every 100ms. It is VERY CRITICAL that the consumer catches every single DataFrame the producer produces.

My thinking

python.multiprocess.queue is serving the purpose (I think), and amazingly simple to use! (praise the green slithereen lord!). But I am clearly not utilizing queue's full potential with only one producer one consumer and a max of one item on the queue. That leads me to believe that there is simpler thing than queue. I tried to search for it, I got overwhelmed by options listed in: python 3.5 documentation: 18. Interprocess Communication and Networking. I am also suspecting there may be a way not involving interprocess communication data-structure at all for my need.

Please Note

  1. Performance is not very important
  2. I will stick with multiprocessing for now, instead of multithreading.

My Question

Should I be content with queue? or is there a more recommended way? I am not a professional programmer, so I insist on doing things the tried and tested way. I also welcome any suggestions of alternative ways of approaching my problem.

Thanks


Solution

  • To me, the most important thing you mentioned is this:

    It is VERY CRITICAL that the consumer catches every single DataFrame the producer produces.

    So, let's suppose you used a variable to store the DataFrame. The producer would set it to the produced value, and the consumer would just read it. That would work very fine, I guess.

    But what would happen if somehow the consumer got blocked by more than one producing cycle? Then some old value would be overwritten before reading. And that's why I think a (thread-safe) queue is the way to go almost "by definition".

    Besides, beware of premature optimization. If it works for your case, excellent. If some day, for some other case, performance comes to be a problem, only then you should spend the extra work, IMO.