Commit 224e2dc3ab67311b4bad94bf891d1b5c5ca6128f
by Pau Espin Pedrol

testsuites: upf: Support setting net iface irqs in c240 host
The Cisco C240 machines consist of 2 CPU packages, each with 20 cores
(hyperthreading disabled). Those 2 CPU packages hence sit in 2 different
NUMA zones.
By default, the mlx5 driver creates one rx-queue per core; in this case
it creates 40 rx-queues with 40 IRQs (one per rx-queue/core).
This means the driver does not take into account that the network card
is plugged into a PCIe bus belonging to only one of the 2 available
NUMA zones (see the discovery sketch below).
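
For illustration only, a minimal Python sketch for discovering that
locality via sysfs; the interface name is a hypothetical placeholder
and the commit's actual tooling may differ:

    #!/usr/bin/env python3
    # Sketch: find which NUMA node a NIC sits on and which CPUs are local to it.
    from pathlib import Path

    IFACE = "enp1s0f0"  # hypothetical interface name; adjust for the host

    # The PCIe device behind the netdev exposes its NUMA node in sysfs.
    node = Path(f"/sys/class/net/{IFACE}/device/numa_node").read_text().strip()

    # CPUs belonging to that node, as a range list (e.g. "0-19").
    cpus = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()

    print(f"{IFACE} is local to NUMA node {node}, CPUs {cpus}")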
As a result, when a packet is received and put into an rx-queue+IRQ
belonging to the other NUMA zone, a memory-bandwidth penalty is
incurred, which translates into a throughput penalty.
With 100 flows at ~70 Gbps being sent to the host, the default setup,
spreading load among all 40 cores, can reach ~55 Gbps (4.62 Mpps).
By decreasing the rx-queues to 20 and pinning their IRQs to the cores
in the NIC's NUMA node, using only those 20 CPUs we get a small gain,
reaching ~62 Gbps (5.22 Mpps). A sketch of this tuning follows below.
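
As an illustrative sketch only (the commit's actual implementation
lives in the testsuite host configuration and is not reproduced here),
the tuning could look like this in Python, assuming root privileges and
the hypothetical interface name from the previous sketch:

    #!/usr/bin/env python3
    # Sketch: shrink the NIC to 20 combined channels and pin its IRQs to
    # NUMA-local CPUs. Requires root; IFACE and LOCAL_CPUS are assumptions.
    import subprocess
    from pathlib import Path

    IFACE = "enp1s0f0"            # hypothetical interface name
    LOCAL_CPUS = list(range(20))  # CPUs of the NIC's NUMA node

    # One rx/tx queue pair per NUMA-local core.
    subprocess.run(["ethtool", "-L", IFACE, "combined", str(len(LOCAL_CPUS))],
                   check=True)

    # Pin the device's MSI-X IRQs round-robin onto the local CPUs. Note that
    # msi_irqs also lists non-queue IRQs (e.g. mlx5 async/command IRQs), so a
    # real implementation would likely filter by IRQ name in /proc/interrupts.
    irqs = sorted(int(p.name)
                  for p in Path(f"/sys/class/net/{IFACE}/device/msi_irqs").iterdir())
    for i, irq in enumerate(irqs):
        cpu = LOCAL_CPUS[i % len(LOCAL_CPUS)]
        Path(f"/proc/irq/{irq}/smp_affinity_list").write_text(str(cpu))

Pinning one queue IRQ per local core keeps both the rx-queue handling
and the interrupt processing on the NIC's own NUMA node, which is where
the measured gain comes from.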
Change-Id: I26fce50c04b043b61ba418d7090b2573e7807b08