
DPDK Multi-process HW Timestamping issue


I am trying to get HW timestamping to work in a DPDK multi-process environment. To do so, I followed the rxtx_callbacks example, which uses rte_eth_read_clock(). The description of this function says:

E.g, a simple heuristic to derivate the frequency would be: 
uint64_t start, end; 
rte_eth_read_clock(port, start); 
rte_delay_ms(100); 
rte_eth_read_clock(port, end); 
double freq = (end - start) * 10;

Compute a common reference with: 
uint64_t base_time_sec = current_time(); 
uint64_t base_clock; 
rte_eth_read_clock(port, base_clock);

Then, convert the raw mbuf timestamp with: 
base_time_sec + (double)(*timestamp_dynfield(mbuf) - base_clock) / freq;
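
The arithmetic in the quoted description can be sketched in plain C without any DPDK calls. This is only an illustration under stated assumptions: the helper names estimate_freq_hz and raw_to_sec are hypothetical, the frequency is assumed to be in Hz, and the tick difference is deliberately taken as a signed value so that a timestamp older than the base clock gives a time before the base rather than a huge positive number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: estimate the clock frequency in Hz from two raw
 * clock reads taken delay_ms milliseconds apart. For delay_ms == 100
 * this reduces to (end - start) * 10, as in the quoted description. */
static double estimate_freq_hz(uint64_t start, uint64_t end, uint32_t delay_ms)
{
    assert(delay_ms > 0);
    return (double)(end - start) * 1000.0 / (double)delay_ms;
}

/* Hypothetical helper: convert a raw timestamp to seconds on the common
 * reference. The difference is computed modulo 2^64 and then reinterpreted
 * as signed (two's complement), so raw_ts < base_clock gives a negative
 * offset from base_time_sec. */
static double raw_to_sec(double base_time_sec, uint64_t base_clock,
                         uint64_t raw_ts, double freq_hz)
{
    assert(freq_hz > 0.0);
    int64_t delta = (int64_t)(raw_ts - base_clock);
    return base_time_sec + (double)delta / freq_hz;
}
```

For example, with a clock that advances 100,000,000 ticks in 100 ms, estimate_freq_hz gives 1e9 Hz, and a timestamp 500,000,000 ticks after the base clock maps to base_time_sec + 0.5.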

Following this approach, I got it working in a single process. I then modified the symmetric multi-process example so that each instance calculates these parameters (freq, base_time, base_clock). You can find it here. Apparently the secondary process cannot call rte_eth_read_clock(); it gives this error:

EAL: Detected CPU lcores: 128
EAL: Detected NUMA nodes: 2
EAL: Auto-detected process type: SECONDARY
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_2429753_54915832f7e13
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:ca:00.0 (socket 1)
EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:ca:00.1 (socket 1)
APP: Finished Process Init.
MLNX_DPDK: Cannot read device clock, err: Unknown error -95
EAL: Error - exiting with code: 1
  Cause: Cannot configure dpdk system
EAL: failed to send to (/var/run/dpdk/rte/mp_socket) due to Bad file descriptor
EAL: Cannot send message to primary

I then changed it so that the primary process shares nic_freq, base_time and base_clock with the secondary process. I then calculate the actual timestamp for each packet as described above:

pkt_timestamp_ns = base_time_ns + (double)(*hwts_field(bufs[i]) - base_nic_clock) / nic_freq;

Here, hwts_field is the same as in rxtx_callbacks:

static inline rte_mbuf_timestamp_t *
hwts_field(struct rte_mbuf *mbuf)
{
    return RTE_MBUF_DYNFIELD(mbuf, hwts_dynfield_offset, rte_mbuf_timestamp_t *);
}

The primary process gets correct timestamps, but the secondary process gets a wrong value from hwts_field(). Here is the output of the secondary process:

Received 1 packets on port 1 | q 0 in this burst
ts: 18446744073709551615
rx_ts: 1617275617280, base_clock: 533084928073076
Skipping
Received 1 packets on port 1 | q 0 in this burst
ts: 18446744073709551615
rx_ts: 1617275011072, base_clock: 533084928073076
Skipping
Received 1 packets on port 1 | q 0 in this burst
ts: 18446744073709551615
rx_ts: 1617274404864, base_clock: 533084928073076
Skipping
Received 1 packets on port 0 | q 0 in this burst
ts: 18446744073709551615
rx_ts: 1616080175104, base_clock: 533084928073076
Skipping
Received 1 packets on port 1 | q 0 in this burst
ts: 18446744073709551615
rx_ts: 1617273798656, base_clock: 533084928073076
Skipping
Received 1 packets on port 1 | q 0 in this burst
ts: 18446744073709551615
rx_ts: 1617273192448, base_clock: 533084928073076
Skipping
ts: calculated timestamp in ns
rx_ts: clock cycle at packet rx
base_clock: base clock cycle calculated at primary proc start.

The rx_ts is lower than base_clock, and ts equals 2^64-1.
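
To make the symptom concrete, here is a small plain-C sketch using the numbers from the log. With unsigned 64-bit arithmetic, rx_ts - base_clock wraps around to a value close to 2^64 when rx_ts is smaller, which may be part of why the computed ts ends up at the maximum. Note also that ticks divided by a frequency in Hz gives seconds, so a nanosecond base needs a factor of 1e9. The helper names tick_delta and ticks_to_ns are hypothetical, and the frequency is assumed to be in Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed tick difference between a packet timestamp
 * and the base clock. The subtraction wraps modulo 2^64; reinterpreting
 * the result as signed (two's complement) makes an older timestamp
 * show up as a negative delta. */
static int64_t tick_delta(uint64_t rx_ts, uint64_t base_clock)
{
    return (int64_t)(rx_ts - base_clock);
}

/* Hypothetical helper: convert a raw timestamp to nanoseconds.
 * Returns 0 on success, or -1 when rx_ts precedes base_clock, which
 * suggests the two values do not come from the same clock or field. */
static int ticks_to_ns(uint64_t base_time_ns, uint64_t base_clock,
                       uint64_t rx_ts, double freq_hz, uint64_t *out_ns)
{
    assert(freq_hz > 0.0);
    int64_t delta = tick_delta(rx_ts, base_clock);
    if (delta < 0)
        return -1;
    /* ticks / Hz = seconds, so scale by 1e9 to get nanoseconds */
    *out_ns = base_time_ns + (uint64_t)((double)delta * 1e9 / freq_hz);
    return 0;
}
```

With rx_ts = 1617275617280 and base_clock = 533084928073076 from the first log entry, tick_delta is negative, so ticks_to_ns rejects the sample instead of producing a wrapped value.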

So, my two questions are:

  1. Is there a better/different way to get packet timestamps in ns?
  2. Why am I getting a lower clock value after I have read the base clock cycle?

Solution

  • UPDATE

    For the timestamp field to be accessible in the secondary process, revise the code so that rte_mbuf_dynfield_lookup is invoked in both the primary and the secondary process:

    for (i = 0; i < num_ports; i++)
    {
        if (proc_type == RTE_PROC_PRIMARY) {
            if (smp_port_init(ports[i], mp, (uint16_t)num_procs) < 0)
                rte_exit(EXIT_FAILURE, "Error initialising ports\n");
        }
    
        /*
         * Now this is not hidden inside "smp_port_init" and
         * occurs both in primary and secondary processes.
         */
        hwts_dynfield_offset =
            rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
                                     NULL);
    }
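
Why the offset matters can be illustrated with a toy model that does not use DPDK at all. The struct fake_mbuf, the FAKE_DYNFIELD macro (modelled on the base-address-plus-offset idea behind RTE_MBUF_DYNFIELD) and both helper functions are hypothetical stand-ins. The point of the sketch is only that a reader holding a different offset than the writer used gets unrelated bytes back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a flat buffer with room for
 * dynamic fields. */
struct fake_mbuf {
    uint8_t bytes[64];
};

/* Toy accessor: base address of the buffer plus a byte offset. */
#define FAKE_DYNFIELD(m, offset, type) \
    ((type)((uintptr_t)(m) + (size_t)(offset)))

/* "Driver" side: store the timestamp at the offset it was given. */
static void driver_set_ts(struct fake_mbuf *m, int offset, uint64_t ts)
{
    assert(offset >= 0 && (size_t)offset + sizeof(ts) <= sizeof(m->bytes));
    memcpy(FAKE_DYNFIELD(m, offset, void *), &ts, sizeof(ts));
}

/* "Application" side: read 64 bits at whatever offset this process holds. */
static uint64_t app_get_ts(const struct fake_mbuf *m, int offset)
{
    uint64_t ts;
    assert(offset >= 0 && (size_t)offset + sizeof(ts) <= sizeof(m->bytes));
    memcpy(&ts, FAKE_DYNFIELD(m, offset, const void *), sizeof(ts));
    return ts;
}
```

If the writer stores a timestamp at offset 24, a reader using offset 24 sees it, while a reader using any other offset sees whatever happens to be stored there. In the real program, each process keeps its own copy of hwts_dynfield_offset, so a process that never performs the lookup would be in the second situation.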
    

    ORIGINAL ANSWER

    The timestamps being UINT64_MAX is likely due to misuse of rte_mbuf_dyn_rx_timestamp_register in the provided example program. According to the source code, the program registers a new instance of the timestamp field, whereas the ethdev driver in question registers its own. Apparently, the driver uses its own dynfield offset, not the application's, when setting mbuf timestamps.

    To fix the issue, the example program should be revised as follows:

    • drop rte_mbuf_dyn_rx_timestamp_register invocation from the code
    • preserve the existing RTE_ETH_RX_OFFLOAD_TIMESTAMP request
    • introduce TS field lookup somewhere after rte_eth_dev_start:
    hwts_dynfield_offset =
            rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
    

    For an additional example, one may refer to the use of TS field lookup in test-pmd.

    As for the rte_eth_read_clock failure in the secondary process, the error code suggests that the operation is probably unsupported. Nevertheless, the driver in question does provide the necessary callback in secondary mode. It may pay to insert debug printouts into both the rte_eth_read_clock implementation and the driver part to figure out where the issue comes from.
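
As a side note on reading the log, "Unknown error -95" looks like what strerror() prints when it is handed the negative return value directly. On Linux, errno 95 is EOPNOTSUPP (ENOTSUP has the same value), which matches the -ENOTSUP that rte_eth_read_clock is documented to return when the driver does not support the operation. A minimal sketch of printing such return codes readably, with ret_to_text being a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: APIs in this style return negative errno values,
 * while strerror() expects the positive code, so flip the sign first. */
static const char *ret_to_text(int ret)
{
    assert(ret != INT_MIN); /* negating INT_MIN would overflow */
    return strerror(ret < 0 ? -ret : ret);
}
```

Calling ret_to_text(-ENOTSUP) then yields the platform's text for "not supported" instead of an "Unknown error" string.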

    Also, one may try enabling the device-specific tx_pp parameter when attaching the device(s) to EAL:

    -a 0000:ca:00.0,tx_pp=100000 -a 0000:ca:00.1,tx_pp=100000