Fixing memif's poor performance in a VPP tutorial


Problem statement

I’m going through the Progressive VPP Tutorial and I’ve noticed that in the “Connecting Two FD.io VPP Instances” exercise the ping latency is rather atrocious:

vpp# ping 10.10.2.2
116 bytes from 10.10.2.2: icmp_seq=1 ttl=64 time=21.2216 ms
116 bytes from 10.10.2.2: icmp_seq=2 ttl=64 time=20.0083 ms
116 bytes from 10.10.2.2: icmp_seq=3 ttl=64 time=20.0124 ms
116 bytes from 10.10.2.2: icmp_seq=4 ttl=64 time=12.8354 ms
116 bytes from 10.10.2.2: icmp_seq=5 ttl=64 time=11.0053 ms

Statistics: 5 sent, 5 received, 0% packet loss

Because 10-20 ms on what is effectively a “loopback” interface is tragic.

What is also weird is that one core of the VM is pegged at 100%.

I’ll look into that.
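
For reference, a quick way to see the spinning thread from the host side (a sketch using standard procps tools, nothing VPP-specific):

# Per-thread CPU view of the running vpp processes; the busy
# instance’s main thread should show up near 100%.
$ top -H -p "$(pgrep -d, -x vpp)"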

Answers

I dug around and found the “VPP memif ping taking ~20ms” thread, which matches my experience.

I experimented a bit:

$ cat startup1.conf
unix {cli-listen /run/vpp/cli-vpp1.sock}
api-segment { prefix vpp1 }
plugins { plugin dpdk_plugin.so { disable } }
# here be the fix:
cpu { main-core 0 }

$ cat startup2.conf
unix {cli-listen /run/vpp/cli-vpp2.sock}
api-segment { prefix vpp2 }
plugins { plugin dpdk_plugin.so { disable } }
# here be the fix:
cpu { main-core 1 }
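
For completeness, the two instances get started with these configs along these lines (a sketch; the vpp binary path follows the tutorial and may differ on your install):

$ sudo /usr/bin/vpp -c startup1.conf
$ sudo /usr/bin/vpp -c startup2.conf

# Verify the pinning took: PSR is the core each thread currently runs on.
$ ps -eLo pid,tid,psr,comm | grep vpp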

Presumably the two busy-polling main threads were time-sharing a single core before, with each packet waiting out the other instance’s scheduler slice. With this setup (pinning one instance to core 0 and the other to core 1) I apply the following config, fed to each instance through its own CLI socket as shown below:

# on vpp1
create interface memif id 0 master
set int state memif0/0 up
set int ip address memif0/0 10.10.2.1/24


# on vpp2
create interface memif id 0 slave
set int state memif0/0 up
set int ip address memif0/0 10.10.2.2/24
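
The CLI sockets come straight from the startup configs above, so feeding the commands in looks roughly like this (a sketch; same commands, just prefixed per instance):

$ sudo vppctl -s /run/vpp/cli-vpp1.sock create interface memif id 0 master
$ sudo vppctl -s /run/vpp/cli-vpp1.sock set int state memif0/0 up
$ sudo vppctl -s /run/vpp/cli-vpp1.sock set int ip address memif0/0 10.10.2.1/24
$ # ...and the same against /run/vpp/cli-vpp2.sock for the slave side.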

The ping now drops to a much more reasonable result:

vpp# sh int addr        
local0 (dn):
memif0/0 (up):
  L3 10.10.2.1/24

vpp# ping 10.10.2.2
116 bytes from 10.10.2.2: icmp_seq=1 ttl=64 time=.0289 ms
116 bytes from 10.10.2.2: icmp_seq=2 ttl=64 time=.0262 ms
116 bytes from 10.10.2.2: icmp_seq=3 ttl=64 time=.0293 ms
116 bytes from 10.10.2.2: icmp_seq=4 ttl=64 time=.0262 ms
116 bytes from 10.10.2.2: icmp_seq=5 ttl=64 time=.0314 ms

Statistics: 5 sent, 5 received, 0% packet loss

The 100% peg on core 1, however, remains.

Given that the rx-placement shows the memif queue in polling mode:

vpp# sh int rx-placement
Thread 0 (vpp_main):
 node memif-input:
    memif0/0 queue 0 (polling)

and given the “TNSR CPU utilization on SG-5100” thread, I’m led to believe the 100% CPU peg is normal. Which is strange but probably not unreasonable.

Edited to add: the VPP CPU Load section of the Troubleshooting documentation page says it clearly:

With at least one interface in polling mode, the VPP CPU utilization is always 100%.
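
If the constant 100% is not to your taste, the queue does not have to poll: memif also supports interrupt and adaptive rx-modes. Something along these lines should quiet the core down, at the cost of some latency (a sketch, untested here):

vpp# set int rx-mode memif0/0 interrupt

After that, sh int rx-placement should report the queue as interrupt rather than polling.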