I have deployed OpenStack and configured OVS-DPDK on the compute nodes for high-performance networking. My workload is general purpose: haproxy, MySQL, Apache, XMPP, and so on.
When I ran load tests I found performance was mediocre, and above a 200 kpps packet rate I saw packet drops. I have heard and read that DPDK can handle millions of packets per second, but that is not what I am seeing here. The guests use virtio-net, which processes packets in the kernel, so I suspect the bottleneck is the guest VM itself.
I am not running any guest-based DPDK application such as testpmd. Does that mean OVS+DPDK isn't useful for my cloud? How do I take advantage of OVS+DPDK with a general-purpose workload?
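One thing I have not yet tried is enabling virtio multiqueue for the guests, so that more than one guest vCPU can service the virtio-net queues. A sketch of what I believe that would look like (the image property name is the documented Nova one; the interface name and queue count inside the guest are illustrative):

# Request multiqueue vNICs for instances booted from this image
$ openstack image set --property hw_vif_multiqueue_enabled=true <image-uuid>

# Inside the guest, check and enable the extra combined channels on the virtio NIC
$ ethtool -l eth0
$ ethtool -L eth0 combined 4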
We have our own load-testing tool that generates RTP audio traffic: pure UDP, 150-byte packets. After roughly 200 kpps the audio becomes choppy and quality degrades. In short, the DPDK host shows high PMD CPU usage and the load test reports bad audio quality. When I run the same test against an SR-IOV based VM, performance is very good.
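Our load-testing tool is in-house, but the traffic profile is roughly equivalent to a small-packet UDP stream; something like the following iperf3 invocation approximates it (addresses, bitrate, and duration are illustrative, not what our tool actually sends):

# ~150-byte UDP datagrams, ~250 kpps at 300 Mbit/s
$ iperf3 -s                                        # on the receiver VM
$ iperf3 -c <receiver-ip> -u -l 150 -b 300M -t 60  # on the sender VM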
$ ovs-vswitchd -V
ovs-vswitchd (Open vSwitch) 2.13.3
DPDK 19.11.7
Intel NIC X550T
# ethtool -i ext0
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x80000d63, 18.8.9
expansion-rom-version:
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
In the following output, what do queue-id 0 through 7 mean, and why is only the first queue in use while the others stay at 0%? What does this indicate? (A sketch of how I understand queue placement can be tuned follows the output.)
ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 2:
isolated : false
port: vhu1c3bf17a-01 queue-id: 0 (enabled) pmd usage: 0 %
port: vhu1c3bf17a-01 queue-id: 1 (enabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 2 (disabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 3 (disabled) pmd usage: 0 %
pmd thread numa_id 1 core_id 3:
isolated : false
pmd thread numa_id 0 core_id 22:
isolated : false
port: vhu1c3bf17a-01 queue-id: 3 (enabled) pmd usage: 0 %
port: vhu1c3bf17a-01 queue-id: 6 (enabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 0 (enabled) pmd usage: 54 %
port: vhu6b7daba9-1a queue-id: 5 (disabled) pmd usage: 0 %
pmd thread numa_id 1 core_id 23:
isolated : false
port: dpdk1 queue-id: 0 (enabled) pmd usage: 3 %
pmd thread numa_id 0 core_id 26:
isolated : false
port: vhu1c3bf17a-01 queue-id: 2 (enabled) pmd usage: 0 %
port: vhu1c3bf17a-01 queue-id: 7 (enabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 1 (disabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 4 (disabled) pmd usage: 0 %
pmd thread numa_id 1 core_id 27:
isolated : false
pmd thread numa_id 0 core_id 46:
isolated : false
port: dpdk0 queue-id: 0 (enabled) pmd usage: 27 %
port: vhu1c3bf17a-01 queue-id: 4 (enabled) pmd usage: 0 %
port: vhu1c3bf17a-01 queue-id: 5 (enabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 6 (disabled) pmd usage: 0 %
port: vhu6b7daba9-1a queue-id: 7 (disabled) pmd usage: 0 %
pmd thread numa_id 1 core_id 47:
isolated : false
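For reference, this is how I understand the rx-queue placement above can be influenced, in case the distribution is part of the problem (the port names are taken from the output above; the core IDs in the mappings are illustrative):

# Pin specific rx queues of a port to specific PMD cores
$ ovs-vsctl set Interface vhu6b7daba9-1a other_config:pmd-rxq-affinity="0:2,1:26"
$ ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:22"

# Or let OVS redistribute rx queues across PMDs based on measured cycles
$ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles
$ ovs-appctl dpif-netdev/pmd-rxq-rebalance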
$ ovs-appctl dpif-netdev/pmd-stats-clear && sleep 10 && ovs-appctl dpif-netdev/pmd-stats-show | grep "processing cycles:"
processing cycles: 1697952 (0.01%)
processing cycles: 12726856558 (74.96%)
processing cycles: 4259431602 (19.40%)
processing cycles: 512666 (0.00%)
processing cycles: 6324848608 (37.81%)
Do these processing cycles mean my PMDs are under stress, even though I am only pushing about 200 kpps?
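For context, this is how I am sampling the counters; the grep above hides the idle cycles, so the full per-PMD output may be more meaningful (the 10-second window is arbitrary):

$ ovs-appctl dpif-netdev/pmd-stats-clear
$ sleep 10
# Idle vs. processing cycles and packets per iteration, per PMD thread
$ ovs-appctl dpif-netdev/pmd-stats-show
# More detailed per-PMD iteration histograms, if the build provides it
$ ovs-appctl dpif-netdev/pmd-perf-show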
These are the dpdk0 and dpdk1 port statistics:
sudo ovs-vsctl get Interface dpdk0 statistics
{flow_director_filter_add_errors=153605,
flow_director_filter_remove_errors=30829, mac_local_errors=0,
mac_remote_errors=0, ovs_rx_qos_drops=0, ovs_tx_failure_drops=0,
ovs_tx_invalid_hwol_drops=0, ovs_tx_mtu_exceeded_drops=0,
ovs_tx_qos_drops=0, rx_128_to_255_packets=64338613,
rx_1_to_64_packets=367, rx_256_to_511_packets=116298,
rx_512_to_1023_packets=31264, rx_65_to_127_packets=6990079,
rx_broadcast_packets=0, rx_bytes=12124930385, rx_crc_errors=0,
rx_dropped=0, rx_errors=12, rx_fcoe_crc_errors=0, rx_fcoe_dropped=12,
rx_fcoe_mbuf_allocation_errors=0, rx_fragment_errors=367,
rx_illegal_byte_errors=0, rx_jabber_errors=0, rx_length_errors=0,
rx_mac_short_packet_dropped=128, rx_management_dropped=35741,
rx_management_packets=31264, rx_mbuf_allocation_errors=0,
rx_missed_errors=0, rx_oversize_errors=0, rx_packets=71512362,
rx_priority0_dropped=0, rx_priority0_mbuf_allocation_errors=1096,
rx_priority1_dropped=0, rx_priority1_mbuf_allocation_errors=0,
rx_priority2_dropped=0, rx_priority2_mbuf_allocation_errors=0,
rx_priority3_dropped=0, rx_priority3_mbuf_allocation_errors=0,
rx_priority4_dropped=0, rx_priority4_mbuf_allocation_errors=0,
rx_priority5_dropped=0, rx_priority5_mbuf_allocation_errors=0,
rx_priority6_dropped=0, rx_priority6_mbuf_allocation_errors=0,
rx_priority7_dropped=0, rx_priority7_mbuf_allocation_errors=0,
rx_undersize_errors=6990079, tx_128_to_255_packets=64273778,
tx_1_to_64_packets=128, tx_256_to_511_packets=43670294,
tx_512_to_1023_packets=153605, tx_65_to_127_packets=881272,
tx_broadcast_packets=10, tx_bytes=25935295292, tx_dropped=0,
tx_errors=0, tx_management_packets=0, tx_multicast_packets=153,
tx_packets=109009906}
sudo ovs-vsctl get Interface dpdk1 statistics
{flow_director_filter_add_errors=126793,
flow_director_filter_remove_errors=37969, mac_local_errors=0,
mac_remote_errors=0, ovs_rx_qos_drops=0, ovs_tx_failure_drops=0,
ovs_tx_invalid_hwol_drops=0, ovs_tx_mtu_exceeded_drops=0,
ovs_tx_qos_drops=0, rx_128_to_255_packets=64435459,
rx_1_to_64_packets=107843, rx_256_to_511_packets=230,
rx_512_to_1023_packets=13, rx_65_to_127_packets=7049788,
rx_broadcast_packets=199058, rx_bytes=12024342488, rx_crc_errors=0,
rx_dropped=0, rx_errors=11, rx_fcoe_crc_errors=0, rx_fcoe_dropped=11,
rx_fcoe_mbuf_allocation_errors=0, rx_fragment_errors=107843,
rx_illegal_byte_errors=0, rx_jabber_errors=0, rx_length_errors=0,
rx_mac_short_packet_dropped=1906, rx_management_dropped=0,
rx_management_packets=13, rx_mbuf_allocation_errors=0,
rx_missed_errors=0, rx_oversize_errors=0, rx_packets=71593333,
rx_priority0_dropped=0, rx_priority0_mbuf_allocation_errors=1131,
rx_priority1_dropped=0, rx_priority1_mbuf_allocation_errors=0,
rx_priority2_dropped=0, rx_priority2_mbuf_allocation_errors=0,
rx_priority3_dropped=0, rx_priority3_mbuf_allocation_errors=0,
rx_priority4_dropped=0, rx_priority4_mbuf_allocation_errors=0,
rx_priority5_dropped=0, rx_priority5_mbuf_allocation_errors=0,
rx_priority6_dropped=0, rx_priority6_mbuf_allocation_errors=0,
rx_priority7_dropped=0, rx_priority7_mbuf_allocation_errors=0,
rx_undersize_errors=7049788, tx_128_to_255_packets=102664472,
tx_1_to_64_packets=1906, tx_256_to_511_packets=68008814,
tx_512_to_1023_packets=126793, tx_65_to_127_packets=1412435,
tx_broadcast_packets=1464, tx_bytes=40693963125, tx_dropped=0,
tx_errors=0, tx_management_packets=199058, tx_multicast_packets=146,
tx_packets=172252389}
DPDK interface bindings:
# dpdk-devbind.py -s
Network devices using DPDK-compatible driver
============================================
0000:3b:00.1 'Ethernet Controller 10G X550T 1563' drv=vfio-pci unused=ixgbe
0000:af:00.1 'Ethernet Controller 10G X550T 1563' drv=vfio-pci unused=ixgbe
Network devices using kernel driver
===================================
0000:04:00.0 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if=eno1 drv=tg3 unused=vfio-pci
0000:04:00.1 'NetXtreme BCM5720 2-port Gigabit Ethernet PCIe 165f' if=eno2 drv=tg3 unused=vfio-pci
0000:3b:00.0 'Ethernet Controller 10G X550T 1563' if=int0 drv=ixgbe unused=vfio-pci
0000:af:00.0 'Ethernet Controller 10G X550T 1563' if=int1 drv=ixgbe unused=vfio-pci
OVS configuration:
# ovs-vsctl show
595103ef-55a1-4f71-b299-a14942965e75
Manager "ptcp:6640:127.0.0.1"
is_connected: true
Bridge br-tun
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
datapath_type: netdev
Port br-tun
Interface br-tun
type: internal
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port vxlan-0a48042b
Interface vxlan-0a48042b
type: vxlan
options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.43"}
Port vxlan-0a480429
Interface vxlan-0a480429
type: vxlan
options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.41"}
Port vxlan-0a48041f
Interface vxlan-0a48041f
type: vxlan
options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.31"}
Port vxlan-0a48042a
Interface vxlan-0a48042a
type: vxlan
options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="10.72.4.44", out_key=flow, remote_ip="10.72.4.42"}
Bridge br-vlan
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
datapath_type: netdev
Port br-vlan
Interface br-vlan
type: internal
Port dpdkbond
Interface dpdk1
type: dpdk
options: {dpdk-devargs="0000:af:00.1", n_txq_desc="2048"}
Interface dpdk0
type: dpdk
options: {dpdk-devargs="0000:3b:00.1", n_txq_desc="2048"}
Port phy-br-vlan
Interface phy-br-vlan
type: patch
options: {peer=int-br-vlan}
Bridge br-int
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
datapath_type: netdev
Port vhu87cf49d2-5b
tag: 7
Interface vhu87cf49d2-5b
type: dpdkvhostuserclient
options: {vhost-server-path="/var/lib/vhost_socket/vhu87cf49d2-5b"}
Port vhub607c1fa-ec
tag: 7
Interface vhub607c1fa-ec
type: dpdkvhostuserclient
options: {vhost-server-path="/var/lib/vhost_socket/vhub607c1fa-ec"}
Port vhu9a035444-83
tag: 8
Interface vhu9a035444-83
type: dpdkvhostuserclient
options: {vhost-server-path="/var/lib/vhost_socket/vhu9a035444-83"}
Port br-int
Interface br-int
type: internal
Port int-br-vlan
Interface int-br-vlan
type: patch
options: {peer=phy-br-vlan}
Port vhue00471df-d8
tag: 8
Interface vhue00471df-d8
type: dpdkvhostuserclient
options: {vhost-server-path="/var/lib/vhost_socket/vhue00471df-d8"}
Port vhu683fdd35-91
tag: 7
Interface vhu683fdd35-91
type: dpdkvhostuserclient
options: {vhost-server-path="/var/lib/vhost_socket/vhu683fdd35-91"}
Port vhuf04fb2ec-ec
tag: 8
Interface vhuf04fb2ec-ec
type: dpdkvhostuserclient
options: {vhost-server-path="/var/lib/vhost_socket/vhuf04fb2ec-ec"}
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
ovs_version: "2.13.3"
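For completeness, the dpdk0/dpdk1 ports above only set n_txq_desc; if more NIC rx queues or deeper rx rings were needed, my understanding is they would be configured like this (the values are illustrative):

$ ovs-vsctl set Interface dpdk0 options:n_rxq=2 options:n_rxq_desc=2048
$ ovs-vsctl set Interface dpdk1 options:n_rxq=2 options:n_rxq_desc=2048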
I have created the guest VMs using OpenStack, and I can see they are connected via vhost-user sockets (e.g. /var/lib/vhost_socket/vhuf04fb2ec-ec).
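To check how many queue pairs a vhost-user interface was actually created with, I look at the libvirt XML of the instance (the instance name below is illustrative):

# The <driver queues='N'/> element under the vhostuser interface shows the
# number of queue pairs QEMU exposes to the guest
$ virsh dumpxml instance-0000abcd | grep -A 4 "type='vhostuser'"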
"When I did load testing, I found performance was average, and after a 200 kpps packet rate I noticed packet drops. In short, the DPDK host hit high PMD CPU usage and the load test showed bad audio quality; when I do the same test with an SR-IOV based VM, performance is really good."
[Answer] This observation does not hold up based on the live debugging done so far, for the reasons stated below.
To make a fair comparison against the SR-IOV approach, the following changes were made relative to the similar question:
External Port <==> DPDK Port0 (L2fwd) <==> DPDK net_vhost <--> QEMU (virtio-pci)
The throughput achieved with iperf3 (bidirectional) is around 10 Gbps.
Note: the reporter has been asked to run TRex or pktgen to measure Mpps; the expectation is to reach at least 8 Mpps with the current setup.
Hence this is not a DPDK, virtio, QEMU-KVM, or SR-IOV related issue, but rather a configuration or platform setup issue.
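A minimal sketch of the host-side forwarder used for the comparison, with testpmd in io mode standing in for L2fwd (the core list, socket path, and queue count are assumptions; the PCI address matches dpdk0 from the question):

# Forward packets between the physical port and a vhost-user socket
$ ./testpmd -l 1-2 -n 4 -w 0000:3b:00.1 \
      --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=1' \
      -- -i --forward-mode=io --auto-start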