
DPDK multi-process: killing the primary process and restarting it as a secondary doesn't work


I'm already running up to 4 DPDK processes side by side without any issues, and I can also restart secondary processes successfully. I read at the end of the symmetric multi-process section of the DPDK documentation that you can kill the primary process and restart it as a secondary.

But when I try to restart the primary process, I run into problems.

For example: I run 2 processes. Each of them streams data from its own dedicated port to queue 0 of the virtual function. The goal is now to restart the first process as a secondary.

After initializing the EAL, mbufs, and rings, each process calls rte_eal_remote_launch() on its own dedicated lcore, which launches a function that does some packet processing.

Start primary:

$ sudo mp_dpdk_app -l 0-4 -n 2 --proc-type=primary -- -p 3 --num-procs=2 --proc-id=0

Output: 

EAL init start.
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.0 (socket 0)
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
EAL Process Type: PRIMARY

Start the secondary:

$ sudo mp_dpdk_app -l 0-4 -n 2 --proc-type=secondary -- -p 3 --num-procs=2 --proc-id=1

Output:

EAL init start.
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_13330_2fd6664d78de
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.0 (socket 0)
eth_ixgbevf_dev_init(): No TX queues configured yet. Using default TX function.
EAL: Probe PCI driver: net_ixgbe_vf (8086:10ed) device: 0000:19:10.1 (socket 0)
eth_ixgbevf_dev_init(): No TX queues configured yet. Using default TX function.
EAL Process Type: SECONDARY

Kill the primary and restart it with:

$ sudo mp_dpdk_app -l 0-4 -n 2 --proc-type=secondary -- -p 3 --num-procs=2 --proc-id=0

But the init fails with the following output: 

EAL init start.
EAL: Detected CPU lcores: 64
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_13473_2fda4aa02c52
EAL: failed to send to (/var/run/dpdk/rte/mp_socket) due to Connection refused
EAL: Fail to send request /var/run/dpdk/rte/mp_socket:bus_vdev_mp
vdev_scan(): Failed to request vdev from primary
EAL: Selected IOVA mode 'PA'
EAL: failed to send to (/var/run/dpdk/rte/mp_socket) due to Connection refused
EAL: Fail to send request /var/run/dpdk/rte/mp_socket:eal_vfio_mp_sync
EAL: Cannot request default VFIO container fd
EAL: VFIO support could not be initialized
EAL: Requested device 0000:19:10.0 cannot be used
EAL: Requested device 0000:19:10.1 cannot be used
EAL: Error - exiting with code: 1
Cause: :: no Ethernet ports found

I noticed that a new mp socket is created (mp_socket_13473_2fda4aa02c52).

But the EAL then somehow tries to connect to rte/mp_socket, which was created by the primary process at the beginning, instead of using the new one. If I exit the primary process with rte_eal_cleanup(), /rte/mp_socket is removed and I still can't start a new secondary process, because of the error that /rte/mp_process does not exist.

My hardware setup:

Network devices using DPDK-compatible driver
============================================
0000:19:10.0 '82599 Ethernet Controller Virtual Function 10ed' drv=vfio-pci unused=ixgbevf
0000:19:10.1 '82599 Ethernet Controller Virtual Function 10ed' drv=vfio-pci unused=ixgbevf

The processes don't have to communicate with each other. Can anybody give me a clue about this issue? Any help will be appreciated.


Solution

  • The short answer to the question "In a DPDK multi-process scenario (a primary and one or more secondary processes), if the primary process is stopped or killed, can it be restarted so that it communicates with the secondary processes again?" is no, it cannot.

    Let me explain below.

    1. Please refer to the DPDK documentation on multi-process support, which clarifies that the primary process is the one that initializes the hugepages and creates the MP_HANDLE used to communicate with the secondary processes. One or more secondaries, in contrast, attach to the primary using MP_HANDLE.
    2. The primary process takes the PCI devices, virtual devices, hugepages, and cores, and creates a unique file-id in the default path /var/run/dpdk/{file-prefix}-appname.
    3. Configuration settings are saved in this path, which let a secondary mmap the hugepage addresses in shared mode.
    4. So once a primary has come up with all the required resources, a secondary DPDK process uses this pre-initialized environment to run a certain subset of the DPDK API.
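
The create-then-attach relationship in points 1 to 4 can be illustrated with a minimal sketch in plain POSIX C. This is not DPDK code: `struct shared_cfg`, `CFG_MAGIC`, `cfg_create()` and `cfg_attach()` are hypothetical names invented for illustration, and an ordinary file stands in for the configuration that a primary publishes. It only shows the general pattern of one process creating a shared mapping and another attaching to it, and of the attach failing when nothing has been published.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the configuration a primary publishes. */
struct shared_cfg {
    uint32_t magic;     /* set last, marks the file as initialized */
    uint32_t nb_procs;  /* illustrative field, not a real DPDK field */
};

#define CFG_MAGIC 0x43464731u /* arbitrary marker chosen for this sketch */

/* "Primary" side: create the file, size it, map it shared, fill it in. */
static struct shared_cfg *cfg_create(const char *path, uint32_t nb_procs)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct shared_cfg)) < 0) {
        close(fd);
        return NULL;
    }
    struct shared_cfg *cfg = mmap(NULL, sizeof(*cfg), PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (cfg == MAP_FAILED)
        return NULL;
    cfg->nb_procs = nb_procs;
    cfg->magic = CFG_MAGIC;
    return cfg;
}

/* "Secondary" side: attach to an existing file, never create it. */
static struct shared_cfg *cfg_attach(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL; /* nothing published at this path */
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(struct shared_cfg)) {
        close(fd);
        return NULL; /* file exists but is too small to hold the config */
    }
    struct shared_cfg *cfg = mmap(NULL, sizeof(*cfg), PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    close(fd);
    if (cfg == MAP_FAILED)
        return NULL;
    if (cfg->magic != CFG_MAGIC) {
        munmap(cfg, sizeof(*cfg));
        return NULL; /* file exists but was never initialized */
    }
    return cfg;
}
```

Calling `cfg_attach()` before any `cfg_create()` returns NULL, while after `cfg_create()` both mappings see the same memory, so a write through one pointer is visible through the other.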


    So when a primary process is stopped or killed, the resource mappings and configuration are not cleaned up by default, which allows the existing secondary processes to continue working. With the EAL argument --no-shconf, the folder /var/run/dpdk/{file-prefix}-dpdk is cleaned up immediately (this option also does not support secondary processes). Consequently, starting or restarting a new primary will fail, because the PCI resources, hugepages, cores, and file path (/var/run/dpdk/{file-prefix}-appname) are already in use.
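
The "Connection refused" lines in the question's log are consistent with a socket file that was left behind after its owner died. The following is a minimal sketch in plain POSIX C, not DPDK code, and it assumes the multi-process channel is an AF_UNIX datagram socket bound to a filesystem path. The helper names `bind_dgram()` and `probe_dgram()` are hypothetical. On Linux, sending to a path whose socket file still exists but has no process bound to it fails with ECONNREFUSED, while sending to a path that has been removed fails with ENOENT.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for a filesystem path (truncated if too long). */
static void fill_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
}

/* Stand-in for the primary: bind a datagram socket at `path`.
 * Returns the descriptor, or -1 on failure. */
static int bind_dgram(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    fill_addr(&addr, path);
    unlink(path); /* drop any stale file so bind() can succeed */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Stand-in for a secondary: send one byte to `path`.
 * Returns 0 on success, or the errno value of the failure. */
static int probe_dgram(const char *path)
{
    struct sockaddr_un addr;
    int err = 0;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return errno;
    fill_addr(&addr, path);
    if (sendto(fd, "x", 1, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}
```

In this sketch, closing the descriptor returned by `bind_dgram()` without unlinking the path mimics a killed owner, and unlinking the path afterwards mimics an orderly cleanup. The two cases give different errors from `probe_dgram()`, which matches the two failure modes described in the question.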

    Hence the expectation that a restarted primary can connect to a running secondary does not hold. There are internal checks in the rte_eal_init routine in the standard DPDK releases.