Tags: python, tensorflow, deep-learning, robotics

How to design a real-time deep learning application for robotics using Python?


I have created a machine learning application that detects objects (duh!), processes them based on some computer vision parameters, and then triggers some hardware that puts each object in the respective bin. The objects are placed on a conveyor belt, and a camera mounted above it snaps a picture of each object (one object at a time) as it passes beneath the camera. I don't have control over the speed of the belt.

Now, the challenge is that I have to configure a ton of things to make the machine work properly.

The first problem is the time the model takes to create segmentation masks: it varies from one object to another.

Another issue is how to manage the signals that are generated after the computer vision processing and send them to the actuators in a way that won't get misaligned with the computer-vision-based inferencing.

My initial design creates one process per task and makes the processes communicate with one another as necessary. However, the problem of synchronization still persists.

As of now, I am thinking of treating the software stack as a group of services, as we usually do in backend development, and making them communicate using something like Celery and a Redis queue.

I am a bit of a noob in system design and come from a data science background. I have explored Python's threading module and found it unusable for my purpose (because of the GIL, all threads run on a single core). I am concerned that if I use multiprocessing, messaging between the individual processes could introduce additional delays, and that would add another source of uncertainty to the program. A toy sketch of the design I have in mind is below.
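To make the process-per-stage idea concrete, here is a minimal sketch (all functions are stand-ins for my real camera, model, and actuator code, not the actual implementation): one process per stage, bounded queues in between, and a sequence number on every frame so misalignment can at least be detected.

```python
import multiprocessing as mp
import random
import time

def grab_frame():
    time.sleep(1 / 3)                       # stand-in for a camera read at ~3 objects/sec
    return "frame"

def run_model(frame):
    time.sleep(random.uniform(0.05, 0.3))   # stand-in for variable-latency segmentation
    return random.randrange(3)              # pretend bin id

def capture(out_q):
    seq = 0
    while True:
        out_q.put((seq, grab_frame()))      # tag every frame with a sequence number
        seq += 1

def infer(in_q, out_q):
    while True:
        seq, frame = in_q.get()
        out_q.put((seq, run_model(frame)))

def actuate(in_q):
    expected = 0
    while True:
        seq, bin_id = in_q.get()
        assert seq == expected, "result arrived out of order"
        print(f"object {seq} -> bin {bin_id}")  # stand-in for the hardware trigger
        expected += 1

if __name__ == "__main__":
    q1, q2 = mp.Queue(maxsize=8), mp.Queue(maxsize=8)  # bounded queues surface backpressure
    procs = [mp.Process(target=capture, args=(q1,)),
             mp.Process(target=infer, args=(q1, q2)),
             mp.Process(target=actuate, args=(q2,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```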

Additional Details:

  1. Programming Frameworks and Libraries: TensorFlow, OpenCV, and Python
  2. Camera Resolution: 1920P
  3. Maximum Actuation Speed: 3 triggers/second
  4. Deep Learning Models: Mask R-CNN / U-Net

P.S.: You can also comment on the technologies or the keywords I should search for, because a vanilla search yields nothing good.


Solution

  • Let me summarize everything first.

    • What you want to do

      1. The "object" is on the conveyer belt
      2. The camera will take pictures of the object
      3. MaskRCNN will run to do the analyzing
    • Here are some problems you're facing

      1. "The first problem is the time model takes to create segmentation masks, it varies from one object to another."

      -> If you want to reduce the processing time for each image, then an accelerator (FPGA, dedicated inference chip, etc.) or some acceleration technique is needed. Intel OpenVINO and the Intel Neural Compute Stick are a good start.
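      For instance, a minimal sketch of running an already-converted model with OpenVINO's Python runtime (API as of openvino 2022.x; the model path, device name, and input shape below are placeholders, not your actual model):

      ```python
      import numpy as np
      from openvino.runtime import Core

      core = Core()
      model = core.read_model("maskrcnn.xml")       # IR files produced by OpenVINO's model converter
      compiled = core.compile_model(model, "CPU")   # "MYRIAD" would target the Neural Compute Stick

      frame = np.zeros((1, 3, 1024, 1024), dtype=np.float32)  # placeholder preprocessed frame
      results = compiled([frame])                   # dict-like result, keyed by output nodes
      ```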

      -> If there are too many pictures to process, then you'll have two choices: 1) add more machines so all the jobs can be done, or 2) select only the important jobs and discard the others. The fact that you set the maximum actuation speed to a fixed number (3/sec) makes me think this is the problem you're facing. A background subtractor is a good start for creating image-capture triggers (see the sketch below).
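      A rough sketch of such a trigger with OpenCV's MOG2 background subtractor (the camera index and the "object present" area threshold are placeholders to tune for your belt):

      ```python
      import cv2

      cap = cv2.VideoCapture(0)                      # placeholder camera index
      subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=32)

      while True:
          ok, frame = cap.read()
          if not ok:
              break
          mask = subtractor.apply(frame)             # white pixels = moving foreground
          if cv2.countNonZero(mask) > 5000:          # placeholder area threshold
              print("trigger: send this frame to the segmentation model")
      ```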

      1. "Another issue is how do I maintain signals that are generated after computer vision processing, send them to actuators in a manner that it won't get misaligned with the computer vision-based inferencing."

      -> a "job distributor" like Celery is good choice here. If the message is stacked inside the broker (Redis), then some tasks will have to wait. But this can easily by scaling up your computer.

    Just a few pieces of advice here:

    1. A vision system also includes the hardware parts, so a hardware specification is a must.
    2. Clarify the requirements.
    3. Impossible things do exist, so sometimes you may have to relax some factors (reliability, cost) of your project.