python multithreading numpy matplotlib

Thread safeness and slow pyplot.hist


The hist function of matplotlib.pyplot runs very slowly, which seems to be related to the structure I have chosen. I built a front panel in Tkinter which starts a control loop for a camera. To keep the control loop responsive, I created an ImageProcessor class which collects, processes and displays the images with cv2. The ImageProcessor object runs in its own thread. This works up to the point where I try to plot the histogram of the image.

Since Tkinter is not thread-safe, I use Agg as the backend and display the drawn canvas of the pyplot figure with cv2. Calculating the histogram of the image using pyplot.hist takes more than 20 seconds. Calculating the histogram on its own takes only 0.5 seconds.

Why does this happen? Does Matplotlib have to be run from the main thread, or is it sufficient if only a single thread interacts with it (as in my case)? Or is there another misunderstanding in my code?

import threading
import time
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from timeit import default_timer as timer
from datetime import timedelta
import queue


class ImageProcessor(threading.Thread):
    def __init__(self):
        matplotlib.use('Agg')
        threading.Thread.__init__(self)

        # initialize plot for histograms
        self.hist_fig = plt.figure()

        self.loop = True
        self.continuous_acquisition_var = False
        self.a = None

    def run(self):

        while self.loop:

            self.a = np.random.uniform(low=0, high=16384, size=12320768).reshape((4096, 3008))

            self.hist_fig.clf()  # clear histogram plot

            start = timer()
            plt.hist(self.a.flatten(), bins=256, range=(0.0, 16384), fc='r', ec='r')
            end = timer()
            print(timedelta(seconds=end - start))

    def stop(self):
        self.loop = False


def ctl_loop(command):
    ctl_loop_var = True

    img_proc = ImageProcessor()
    img_proc.daemon = True
    img_proc.start()

    while ctl_loop_var:  # main loop

        while not command.empty():
            q_element = command.get()
            task = q_element[0]
            data = q_element[1]

            func = getattr(img_proc, task)
            func(data)

            if task == "stop":
                ctl_loop_var = False


if __name__ == '__main__':
    cmd_queue = queue.Queue()

    ctl = threading.Thread(target=ctl_loop, args=(cmd_queue, ))
    ctl.daemon = True
    ctl.start()

    time.sleep(40)

    cmd_queue.put(('stop', ''))

Solution

  • The solution is straightforward and has nothing to do with plt.hist. Simply add the line time.sleep(0.01) to your control loop. The reason is that threading is not the same as multiprocessing. All threads run inside the same process, and in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. In your case, the control loop (while ctl_loop_var, marked # main loop) checks as quickly as possible whether ctl_loop_var is still True and whether the queue is empty, so it keeps competing for the GIL and leaves the image-processing thread very little time to run. Therefore, make sure you are not creating unnecessary CPU load. A busy-wait loop wastes CPU time under multiprocessing as well, though the impact may be less noticeable there.

    def ctl_loop(command):
        ctl_loop_var = True
    
        img_proc = ImageProcessor()
        img_proc.daemon = True
        img_proc.start()
    
        while ctl_loop_var:  # main loop
    
            while not command.empty():
                q_element = command.get()
                task = q_element[0]
                data = q_element[1]
    
                if task == "stop":
                    img_proc.stop()
                    ctl_loop_var = False
    
                else:
                    func = getattr(img_proc, task)
                    func(data)
        
            time.sleep(.01)  # give the other thread time to process
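
The effect of a busy-wait loop on another thread can be reproduced without Matplotlib. The following is a minimal sketch, not a benchmark: cpu_work and timed_work are hypothetical names, the loop size is arbitrary, and the measured slowdown depends on the Python version and the machine.

```python
import threading
import time


def cpu_work(n=2_000_000):
    """Pure-Python CPU-bound work that holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_work(poll_sleep):
    """Time cpu_work() while a polling thread runs alongside it.

    poll_sleep=None: the poller spins without pausing (busy-wait).
    Otherwise the poller sleeps poll_sleep seconds per iteration.
    """
    done = threading.Event()

    def poller():
        while not done.is_set():
            if poll_sleep is not None:
                time.sleep(poll_sleep)  # sleeping releases the GIL

    t = threading.Thread(target=poller, daemon=True)
    t.start()
    start = time.perf_counter()
    cpu_work()
    elapsed = time.perf_counter() - start
    done.set()
    t.join()
    return elapsed


if __name__ == "__main__":
    print("busy-wait poller:", timed_work(None))
    print("sleeping poller: ", timed_work(0.01))
```

On CPython with the GIL, the first timing is typically noticeably larger than the second, because the spinning poller keeps requesting the GIL while cpu_work needs it.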
    

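    Furthermore, the revised ctl_loop also fixes two bugs in the original code:

    1. ImageProcessor.stop takes no argument besides self, so the original call func(data) raised a TypeError for the "stop" task. It is now called as img_proc.stop().
    2. ImageProcessor.stop wasn't stopping the thread properly when the main thread was overloaded.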
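
As an alternative to polling with a fixed sleep, the control loop can block on the queue itself. This is a minimal sketch under stated assumptions: Worker is a hypothetical stand-in for ImageProcessor, the 0.1 s timeout is an arbitrary choice, and a threading.Event replaces the plain boolean flag.

```python
import queue
import threading
import time


class Worker(threading.Thread):
    """Hypothetical stand-in for ImageProcessor."""

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()
        self.iterations = 0

    def run(self):
        while not self._stop_event.is_set():
            self.iterations += 1   # placeholder for image processing
            time.sleep(0.001)

    def stop(self):
        self._stop_event.set()


def ctl_loop(command, worker):
    worker.start()
    while True:
        try:
            # Blocks until a command arrives or the timeout expires,
            # so this thread does not spin while it waits.
            task, data = command.get(timeout=0.1)
        except queue.Empty:
            continue
        if task == "stop":
            worker.stop()
            worker.join()          # wait for the worker to finish
            break
        getattr(worker, task)(data)


if __name__ == "__main__":
    cmd_queue = queue.Queue()
    ctl = threading.Thread(target=ctl_loop, args=(cmd_queue, Worker()))
    ctl.start()
    time.sleep(0.2)
    cmd_queue.put(("stop", ""))
    ctl.join()
```

With a blocking get, a command is handled as soon as it arrives instead of after the next sleep interval.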

    I also observed that your call plt.hist(..., bins=256, range=(0.0, 16384)) is about 7 times faster than plt.hist(..., bins=list_of_bins). Don't change it ;).
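
If the histogram should be computed separately from the drawing step, one possible arrangement is sketched below. It is an assumption, not something measured against the setup in the question: histogram_image is a hypothetical helper that counts with np.histogram and draws the bars on a Figure with an Agg canvas, without going through pyplot's global state.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def histogram_image(a, bins=256, value_range=(0.0, 16384.0)):
    """Return (counts, RGBA image array) for the histogram of a."""
    # Count the values without creating any plot objects.
    counts, edges = np.histogram(a.ravel(), bins=bins, range=value_range)

    # Draw the bars on a standalone figure rendered by Agg.
    fig = Figure()
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot(111)
    ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge", color="r")
    canvas.draw()

    # Copy the rendered pixels into an array of shape (height, width, 4).
    img = np.array(canvas.buffer_rgba())
    return counts, img


if __name__ == "__main__":
    a = np.random.uniform(low=0, high=16384, size=(512, 512))
    counts, img = histogram_image(a)
    print(counts.sum(), img.shape)
```

The returned array is in RGBA order. To show it with cv2 it would first need converting, for example with cv2.cvtColor(img, cv2.COLOR_RGBA2BGR).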