XDP_TX doesn't work on veth for the simplest L2 forwarding


I created ns3 as the router namespace, and ns1 and ns2 as the clients. ns3 and ns1 are connected by the veth pair veth3_1/veth1_3; ns3 and ns2 are connected by the veth pair veth3_2/veth2_3. A UDP packet sent from ns1 to ns2 is received by the following XDP program, which is attached to veth2_3:

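#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp_ingress")
int xdp_ingress_func(struct xdp_md* ctx) {
    void* data_end = (void*)(long)ctx->data_end;
    void* data = (void*)(long)ctx->data;
    struct ethhdr* eth = data;
    /* Bounds check required by the verifier before reading the header. */
    if ((void*)(eth + 1) > data_end) {
        return XDP_PASS;
    }
    /* Only reflect IPv4 frames; h_proto is in network byte order. */
    if (eth->h_proto != bpf_htons(ETH_P_IP)) {
        return XDP_PASS;
    }
    /* Swap the source and destination MAC addresses. */
    unsigned char tmp_mac[ETH_ALEN];
    __builtin_memcpy(tmp_mac, eth->h_dest, ETH_ALEN);
    __builtin_memcpy(eth->h_dest, eth->h_source, ETH_ALEN);
    __builtin_memcpy(eth->h_source, tmp_mac, ETH_ALEN);
    /* Send the frame back out of the interface it arrived on. */
    return XDP_TX;
}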

However, when I run tcpdump on veth3_2, I only see the packet going towards ns2. I never see the packet being sent back. Here is the shell script that sets up the environment:

# create the namespaces used below
ip netns add ns1
ip netns add ns2
ip netns add ns3

ip link add veth1_3 type veth peer name veth3_1
ip link add veth2_3 type veth peer name veth3_2

# ns3
ip link set veth3_1 netns ns3
ip link set veth3_2 netns ns3
ip netns exec ns3 sysctl -w net.ipv4.ip_forward=1
ip netns exec ns3 ip link add name br0 type bridge
ip netns exec ns3 ip link set br0 up
ip netns exec ns3 ip link set veth3_1 master br0
ip netns exec ns3 ip link set veth3_2 master br0
ip netns exec ns3 ip link set veth3_1 up
ip netns exec ns3 ip link set veth3_2 up
ip netns exec ns3 ip addr add 10.0.0.1/8 dev br0

# ns1
ip link set veth1_3 netns ns1
ip netns exec ns1 ip addr add 10.0.0.2/8 dev veth1_3
ip netns exec ns1 ip link set veth1_3 up
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip route add default via 10.0.0.1

# ns2
ip link set veth2_3 netns ns2
ip netns exec ns2 ip addr add 10.0.0.3/8 dev veth2_3
ip netns exec ns2 ip link set veth2_3 up
ip netns exec ns2 ip link set lo up
ip netns exec ns2 ip route add default via 10.0.0.1

The above is a simplified version of the problem.

In the actual setup, a tc egress program on veth1_3 pushes an additional IP and UDP header in front of the original L3 header, turning |IP|TCP| into |IP|UDP|IP|TCP|.

I have checked that the L2 addresses are correct.

I confirmed with bpf_printk that the XDP program receives the packet and reaches the return XDP_TX; statement.


Solution

  • I found the cause in the veth driver's XDP implementation. The peer device (veth3_2 here) must also have an XDP program attached, even one that does nothing but return XDP_PASS;. Once it does, XDP_TX works normally.

    This is because XDP_TX transmits frames through the xdp_ring, and the xdp_ring only works when both ends of the veth pair have an XDP program attached.

    I noticed that older versions of the veth driver had a fallback for the case where the peer is not using XDP: the packet was transmitted the normal way. That fallback has been removed in current kernel versions.
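
As a minimal sketch of the do-nothing program described above, something like the following could be attached to the peer (veth3_2 in this setup). The section name xdp, the function name xdp_pass_func, and the fallback SEC macro are assumptions made for illustration; with libbpf installed, SEC would normally come from <bpf/bpf_helpers.h>.

```c
#include <linux/bpf.h>   /* struct xdp_md, XDP_PASS */

/* Fallback definition (assumption): normally provided by <bpf/bpf_helpers.h>. */
#ifndef SEC
#define SEC(name) __attribute__((section(name), used))
#endif

/* Hypothetical no-op program: hands every frame to the normal network stack. */
SEC("xdp")
int xdp_pass_func(struct xdp_md *ctx)
{
    (void)ctx;   /* the packet is not inspected */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

Compiled with something like clang -O2 -target bpf -c xdp_pass.c -o xdp_pass.o (the file names are placeholders), the object could then be attached with ip netns exec ns3 ip link set dev veth3_2 xdp obj xdp_pass.o sec xdp. The program on veth2_3 stays unchanged; the only purpose of this one is to make the peer side XDP-enabled so that XDP_TX has somewhere to deliver the frame.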