FFmpeg and video4linux2 parameters - how to capture still images faster?

Problem Summary

I have built an 18-camera array of USB webcams, attached to a Raspberry Pi 400 as the controller. My Python 3.8 code for capturing an image from each webcam is slow, and I am trying to find ways to speed it up.

The FFMPEG and video4linux2 command line options are confusing to me, so I'm not sure whether the delays come from a poor choice of parameters that a better set of options would fix.

The Goal

I am trying to capture one image from each camera as quickly as possible.

I call FFMPEG with video4linux2 command line options to capture each image, looping over all the cameras as shown below.

Expected results

I just want a single frame from each camera. The frame rate is 30 fps, so I was expecting that capture time would be on the order of 1/30th to 1/10th of a second worst case. But the performance timer is telling me that each capture is taking 2-3 seconds.

Additionally, I don't really understand the ffmpeg output, but this output worries me:

frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    1 fps=0.5 q=8.3 Lsize=N/A time=00:00:00.06 bitrate=N/A speed=0.0318x    
video:149kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

I don't understand why the "frame=" line is repeated 4 times. And in the 4th repetition, fps says 0.5, which I would interpret as one frame every 2 seconds, not the 30 fps that I specified.
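A small sketch that checks the arithmetic behind that final progress line, under stated assumptions. As far as I know, FFmpeg of this vintage (Lavf58/Lavc58) computes the fps field as frames written divided by wall-clock seconds since processing started, computes speed as the output timestamp divided by that same wall-clock time, and refreshes the progress line roughly every 0.5 s, ending each interim line with a carriage return. Read that way, the three frame=0 lines are half-second updates while ffmpeg waits for the first frame (with universal_newlines=True, Python turns each '\r' into a line break, which is why they show up as separate lines), and both fields point to about 2 s of elapsed time. The function name wall_clock_estimates and its regexes are hypothetical, written only for this line format.

```python
import re
from typing import Tuple

# The last progress line copied from the ffmpeg log above.
STATS = "frame=    1 fps=0.5 q=8.3 Lsize=N/A time=00:00:00.06 bitrate=N/A speed=0.0318x"

def wall_clock_estimates(stats_line: str) -> Tuple[float, float]:
    """Estimate elapsed wall-clock seconds from a final ffmpeg progress line.

    Assumes fps = frames / elapsed and speed = output_time / elapsed,
    with elapsed measured in wall-clock seconds (fps and speed must be > 0).
    """
    frames = int(re.search(r"frame=\s*(\d+)", stats_line).group(1))
    fps = float(re.search(r"fps=\s*([\d.]+)", stats_line).group(1))
    h, m, s = re.search(r"time=(\d+):(\d+):([\d.]+)", stats_line).groups()
    output_time = int(h) * 3600 + int(m) * 60 + float(s)
    speed = float(re.search(r"speed=\s*([\d.]+)x", stats_line).group(1))
    return frames / fps, output_time / speed

from_fps, from_speed = wall_clock_estimates(STATS)
print(f"elapsed from fps:   {from_fps:.2f} s")    # 1 / 0.5
print(f"elapsed from speed: {from_speed:.2f} s")  # 0.06 / 0.0318 (time is rounded)
```

Both estimates land near 2 seconds, so the 0.5 fps figure reflects the wait for the camera to deliver its first frame rather than the camera's 30 fps rate.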

Specific Questions:

Can anyone explain to me what this ffmpeg output means, and why each image takes 2 seconds to capture rather than something closer to 1/30th of a second?

Can anyone explain to me how to capture the images in less time per capture?

Should I spawn a separate thread for each ffmpeg call so they run concurrently instead of serially? Or would that not save much time in practice?

Actual results

Input #0, video4linux2,v4l2, from '/dev/video0':
  Duration: N/A, start: 6004.168748, bitrate: N/A
    Stream #0:0: Video: mjpeg, yuvj422p(pc, bt470bg/unknown/unknown), 1920x1080, 30 fps, 30 tbr, 1000k tbn, 1000k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> mjpeg (native))
Press [q] to stop, [?] for help
Output #0, image2, to '/tmp/video1.jpg':
  Metadata:
    encoder         : Lavf58.20.100
    Stream #0:0: Video: mjpeg, yuvj422p(pc), 1920x1080, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc
    Metadata:
      encoder         : Lavc58.35.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    1 fps=0.5 q=8.3 Lsize=N/A time=00:00:00.06 bitrate=N/A speed=0.0318x    
video:149kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Captured /dev/video0 image in: 3 seconds
Input #0, video4linux2,v4l2, from '/dev/video2':
  Duration: N/A, start: 6007.240871, bitrate: N/A
    Stream #0:0: Video: mjpeg, yuvj422p(pc, bt470bg/unknown/unknown), 1920x1080, 30 fps, 30 tbr, 1000k tbn, 1000k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> mjpeg (native))
Press [q] to stop, [?] for help
Output #0, image2, to '/tmp/video2.jpg':
  Metadata:
    encoder         : Lavf58.20.100
    Stream #0:0: Video: mjpeg, yuvj422p(pc), 1920x1080, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc
    Metadata:
      encoder         : Lavc58.35.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed=   0x    
frame=    1 fps=0.5 q=8.3 Lsize=N/A time=00:00:00.06 bitrate=N/A speed=0.0318x    
video:133kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Captured /dev/video2 image in: 3 seconds
...

The code:

import os
import subprocess
import time

list_of_camera_ids = ["/dev/video1","/dev/video2", "/dev/video3", "/dev/video4",
                      "/dev/video5","/dev/video6", "/dev/video7", "/dev/video8",
                      "/dev/video9","/dev/video10", "/dev/video11", "/dev/video12",
                      "/dev/video13","/dev/video14", "/dev/video15", "/dev/video16",
                      "/dev/video17","/dev/video18"
                     ]
for this_camera_id in list_of_camera_ids:
    # e.g. /dev/video1 -> /tmp/video1.jpg
    full_image_file_name = '/tmp/' + os.path.basename(this_camera_id) + '.jpg'
    image_capture_tic = time.perf_counter()

    # Ask the camera for MJPEG at 30 fps and write exactly one frame as a JPEG
    run_cmd = subprocess.run([
                              '/usr/bin/ffmpeg', '-y', '-hide_banner',
                              '-f', 'video4linux2',
                              '-input_format', 'mjpeg',
                              '-framerate', '30',
                              '-i', this_camera_id,
                              '-frames', '1',
                              '-f', 'image2',
                              full_image_file_name
                             ],
                             universal_newlines=True,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE
                            )
    print(run_cmd.stderr)  # ffmpeg writes its log and progress lines to stderr
    image_capture_toc = time.perf_counter()
    print(f"Captured {this_camera_id} image in: {image_capture_toc - image_capture_tic:0.0f} seconds")

ADDITIONAL DATA: In response to an answer by Mark Setchell that said more information is needed to answer this question, I now elaborate the requested information here:

cameras: Cameras are USB-3 cameras that identify themselves as:

idVendor           0x0bda Realtek Semiconductor Corp.
idProduct          0x5829

I tried to add the lengthy lsusb dump for one of the cameras, but it pushed this post over the 30,000-character limit.

How the cameras are attached: the Pi's USB 3 port connects to a master 7-port USB 3 hub, which feeds three 7-port spur hubs (not all ports on the spur hubs are occupied).

Camera resolution: HD Format 1920x1080

Why am I setting a frame rate if I only want 1 image?

Setting a frame rate may seem odd, since it specifies the time between frames and I only want a single frame. I did it because I don't know how to get a single image from FFMPEG otherwise: this was the one example of FFMPEG command options I found on the web that I could get to capture a single image successfully. I've discovered innumerable sets of options that don't work! I wrote this post because my web searches did not yield an example that works for me, and I am hoping that someone much better informed than I am will show me a way that works!

Why am I scanning the cameras sequentially rather than in parallel?

I did this just to keep things simple at first; a loop over the list seemed easy and Pythonic. I expected that I might later be able to spawn a separate thread for each FFMPEG call and get a parallel speedup that way. Indeed, I would welcome an example of how to do that.

But in any case, a single image capture taking 3 seconds seems far too long.
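One way to run the calls concurrently, sketched under assumptions: a concurrent.futures.ThreadPoolExecutor whose worker threads each run one subprocess.run call. The ffmpeg arguments are copied from the loop above; ffmpeg_cmd, capture_one, capture_all and the default max_workers=6 are hypothetical. Each worker spends nearly all its time blocked waiting on its ffmpeg child process, so Python's GIL is not a bottleneck, and the Linux scheduler places the ffmpeg processes on whichever of the Pi's 4 cores are free without any manual core assignment. Whether the hubs can deliver that many 1080p MJPEG streams at once is untested here; max_workers caps how many captures start together.

```python
import os
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

def ffmpeg_cmd(device: str) -> List[str]:
    """Single-frame ffmpeg command from the question, e.g. /dev/video1 -> /tmp/video1.jpg."""
    image_file = "/tmp/" + os.path.basename(device) + ".jpg"
    return ["/usr/bin/ffmpeg", "-y", "-hide_banner",
            "-f", "video4linux2", "-input_format", "mjpeg", "-framerate", "30",
            "-i", device, "-frames", "1", "-f", "image2", image_file]

def capture_one(device: str,
                build_cmd: Callable[[str], List[str]]) -> Tuple[str, int, float]:
    """Run one capture command; return (device, exit code, elapsed seconds)."""
    tic = time.perf_counter()
    # Blocks only this worker thread. While it waits on the child process the
    # GIL is released, so the other workers' captures run at the same time.
    result = subprocess.run(build_cmd(device),
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return device, result.returncode, time.perf_counter() - tic

def capture_all(devices: List[str],
                build_cmd: Callable[[str], List[str]] = ffmpeg_cmd,
                max_workers: int = 6) -> List[Tuple[str, int, float]]:
    """Capture from every device using at most max_workers concurrent threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `devices`.
        return list(pool.map(lambda device: capture_one(device, build_cmd), devices))
```

Usage, assuming list_of_camera_ids from the question: `results = capture_all(list_of_camera_ids)` returns one (device, exit code, seconds) tuple per camera, in input order. Since the program already captures in a child thread, capture_all can be called from that thread unchanged; passing a different build_cmd swaps in another capture command.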

Why am I only using 1 of the 4 cores on my Raspberry Pi?

The sample code I posted is just a snippet from my entire program. Image capture currently takes place in a child thread, while a GUI window with an event loop runs in the main thread, so that user input isn't blocked during imaging.

I am not knowledgeable enough about the cores of the Raspberry Pi 400, nor about how the Raspberry Pi OS (aka Raspbian) allocates threads to cores, nor about whether Python can or should explicitly assign threads to specific cores.
I would welcome the suggestions of Mark Setchell (or anyone else knowledgeable about these issues) to recommend a best practice and include example code.

Solution

First off, thanks to https://stackoverflow.com/users/1109017/llogan who provided me the clue I needed in the comments below.

I am recording this solution here for easy discovery by others who might not read comments.

Here is my revised program:

import os
import subprocess
import time

list_of_camera_ids = ["/dev/video1","/dev/video2", "/dev/video3", "/dev/video4",
                      "/dev/video5","/dev/video6", "/dev/video7", "/dev/video8",
                      "/dev/video9","/dev/video10", "/dev/video11", "/dev/video12",
                      "/dev/video13","/dev/video14", "/dev/video15", "/dev/video16",
                      "/dev/video17","/dev/video18"
                     ]
for this_camera_id in list_of_camera_ids:
    full_image_file_name = '/tmp/' + os.path.basename(this_camera_id) + '.jpg'
    image_capture_tic = time.perf_counter()

    # Stream one buffer from the camera (memory-mapped I/O) and write it to a file.
    # v4l2-ctl writes the raw buffer, so the file is only a JPEG if the device's
    # current pixel format is MJPEG.
    # subprocess.run() waits for v4l2-ctl to exit; no shell is involved, so a
    # trailing "&" in this list would not run it in the background.
    run_cmd = subprocess.run([
                              'v4l2-ctl', '-d', this_camera_id,
                              '--stream-mmap',
                              '--stream-count=1',
                              '--stream-to=' + full_image_file_name
                             ],
                             universal_newlines=True,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE
                            )
    print(run_cmd.stderr)
    image_capture_toc = time.perf_counter()
    print(f"Captured {this_camera_id} image in: {image_capture_toc - image_capture_tic:0.2f} seconds")

Additional notes: This code is a substantial speedup!

With my previous method, each image took 3-4 seconds to capture. In the serialized loop shown in the original post, 18 images would typically take between 45 and 60 seconds to complete.

With my modified code, using llogan's suggestion, capture time is now less than 1 second per camera. The loop still captures one camera at a time, because subprocess.run waits for each v4l2-ctl process to finish. Total time for 18 cameras is now about 10 seconds, so the average time per camera is about 0.55 seconds on the Raspberry Pi 400.

Running the captures concurrently might reduce the total further. Each capture is an external v4l2-ctl process either way, so Python threads could only take over the waiting; they would not replace the processes themselves. But that's a level of performance tuning that I don't have experience with yet.
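Threads are not strictly required to get that concurrency. A minimal sketch, assuming the v4l2-ctl arguments from the solution above: subprocess.Popen returns as soon as each child process has started, so the parent can launch every capture first and only then wait on each one, leaving the kernel to run the v4l2-ctl processes in parallel across the cores. v4l2_cmd and capture_concurrently are hypothetical names. Starting all 18 cameras at once may fail if the USB hubs cannot carry that many simultaneous streams, so the exit codes are worth checking.

```python
import os
import subprocess
from typing import Dict, List

def v4l2_cmd(device: str) -> List[str]:
    """Single-frame v4l2-ctl command from the solution, e.g. /dev/video3 -> /tmp/video3.jpg."""
    image_file = "/tmp/" + os.path.basename(device) + ".jpg"
    return ["v4l2-ctl", "-d", device, "--stream-mmap",
            "--stream-count=1", "--stream-to=" + image_file]

def capture_concurrently(commands: Dict[str, List[str]]) -> Dict[str, int]:
    """Start every command at once, then wait for all; return exit codes by key.

    No Python threads are involved: Popen does not wait for the child, so all
    processes are running before the first wait() call.
    """
    procs = {key: subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
             for key, cmd in commands.items()}
    # wait() blocks until that particular child exits and returns its exit code.
    return {key: proc.wait() for key, proc in procs.items()}
```

Usage, assuming list_of_camera_ids from the code above: `codes = capture_concurrently({d: v4l2_cmd(d) for d in list_of_camera_ids})`. A nonzero code marks a camera whose capture failed and could be retried serially.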