I'm writing a program for acquiring data from an sCMOS (scientific CMOS) camera. Since the plan is to acquire at high frame rates, I would like to save to disk while I acquire, thus increasing the total amount of time I can record before running out of memory.
Is there a way to continuously save to the same file in binary format? Ideally, I would like to avoid making one file per frame.
After tinkering for a while I found a solution to this problem using the multiprocessing module. The idea is to have two processes running: the main one acquiring data and a worker continuously saving it to disk. To achieve this, you define a Queue, which shares data safely between the processes. Once a frame is saved, its memory is freed. It is important to use multiprocessing and not threading: multiprocessing really runs the worker in a separate Python interpreter, while threading shares the same interpreter (and its GIL). Therefore, if one of your tasks saturates the core on which your script is running, everything stalls. In my application this is crucial, since it would alter the frame rate significantly.
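To see why this matters, here is a minimal, self-contained timing sketch (not part of the acquisition code; burn_cpu just stands in for any CPU-bound task). On CPython, the threaded version takes roughly twice as long because both calls contend for the GIL, while the process version runs them on separate cores:

import time
from threading import Thread
from multiprocessing import Process

def burn_cpu(n=10_000_000):
    # Stand-in for any CPU-bound task (e.g. compressing or writing frames).
    s = 0
    for k in range(n):
        s += k

if __name__ == '__main__':
    for worker_cls in (Thread, Process):
        w = worker_cls(target=burn_cpu)
        t0 = time.time()
        w.start()
        burn_cpu()  # "Main" work running concurrently with the worker.
        w.join()
        print(worker_cls.__name__, round(time.time() - t0, 2))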
Beware: I'm using h5py to save the files in HDF5 format, but you can easily adapt the code to save plain binary with numpy, to a text file, etc. (a minimal numpy variant is sketched right after the worker code below).
First I define the worker function that will later be sent to a different process. Its inputs are the path of the file where data will be saved and the queue holding the data. The loop runs indefinitely because I don't want the function to exit before I decide, even if the queue is empty. The exit flag is just a string passed through the queue.
import h5py
from multiprocessing import Process, Queue

def workerSaver(fileData, q):
    """Function that can be run in a separate process to continuously save data to disk.

    fileData -- STRING with the path to the file to use.
    q -- Queue that will store all the images to be saved to disk.
    """
    f = h5py.File(fileData, "w")  # This will overwrite the file. Be sure to supply a new file path.
    allocate = 100  # Number of frames to allocate along the z-axis.
    keep_saving = True  # Flag that stops the worker when a string (e.g. 'exit') arrives through the queue.
    dset = None
    i = 0
    while keep_saving:
        while not q.empty():
            img = q.get()
            if isinstance(img, str):  # The exit flag: any string stops the worker.
                keep_saving = False
            else:
                if i == 0:  # First frame: create the dataset.
                    x = img.shape[0]
                    y = img.shape[1]
                    # Images are stacked along the z-axis; the z extent is grown
                    # as the number of images increases.
                    dset = f.create_dataset('image', (x, y, allocate), maxshape=(x, y, None))
                if i == dset.shape[2]:  # Out of room: extend along the z-axis.
                    dset.resize(i + allocate, axis=2)
                dset[:, :, i] = img
                i += 1
    if dset is not None:
        dset.resize(i, axis=2)  # Trim the unused pre-allocated frames before closing.
    f.close()
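As noted above, the same pattern works for plain binary with numpy instead of HDF5. This is a minimal sketch (workerSaverBinary is a hypothetical name; it assumes every frame has the same shape and dtype, which you must record yourself, since a raw file stores no metadata):

import numpy as np

def workerSaverBinary(fileData, q):
    """Append raw frames to a single binary file.
    The file stores no shape or dtype information; keep track of them
    separately so the data can be recovered later."""
    with open(fileData, 'wb') as f:
        while True:
            img = q.get()  # Blocks until a frame (or the exit string) arrives.
            if isinstance(img, str):  # Exit flag, as in the HDF5 version.
                break
            img.tofile(f)  # Append the raw bytes of this frame.

Reading it back is then a matter of np.fromfile(fileData, dtype=the_dtype).reshape(-1, x, y), with the_dtype, x and y being whatever you recorded at acquisition time.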
And now the important part of the code, where we launch the worker and feed it data from the main process.
import numpy as np
import time

fileData = 'path-to-file.dat'
# Queue of images. multiprocessing takes care of moving the data in and out
# and of the sharing between parent and child processes.
q = Queue(0)  # 0 means an unbounded queue.
# Child process that saves the data. It runs continuously until an exit flag
# is passed through the Queue (q.put('exit')).
# On platforms that spawn processes (e.g. Windows), wrap this part
# in an `if __name__ == '__main__':` guard.
p = Process(target=workerSaver, args=(fileData, q))
p.start()

example_image = np.ones((50, 50))
for i in range(10000):
    q.put(example_image)
    print(q.qsize())
    time.sleep(0.01)  # Sleep 10 ms.
q.put('Exit')  # Any string stops the worker.
p.join()
Check that the process p is started and running before we start filling the queue q. There are surely cleverer ways to store the data (for example in chunks rather than one image at a time), but I have checked and the disk is already at its full writing speed, so I'm not sure there is much to improve on that side. Knowing exactly the type of data you are going to save also helps speed things up, especially with HDF5 (storing 8-bit integers is not the same as storing 32-bit ones).
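For example, if the camera delivers 16-bit frames, declaring the dtype (and, optionally, one-frame chunks) when creating the dataset keeps each pixel at two bytes instead of h5py's float default. A hypothetical variant of the create_dataset call above:

# Hypothetical variant: 16-bit pixels, chunked one frame at a time.
dset = f.create_dataset('image', (x, y, allocate), maxshape=(x, y, None),
                        dtype='uint16', chunks=(x, y, 1))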