Loadtesting EdgeRouter 4 in more depth
Previously I have load-tested a bunch of routers. A reader wrote me a nice email, complimenting on the post… and asking whether I considered tuning the EdgeRouter 4 setup:
One question I had was around the ER4 results, you commented that there was a major falloff when a threshold was hit. Your commentary actually is spot on for how the Canvium offload works - there is a tracking table for flows that can be overloaded at the default levels. Did you try increasing the size of “System → Offload → Flow Table Size” or reducing size of the Flow Lifetime?
I would imagine that is why you were able to blow out performance on the imix results as the small streams eventually starved out the Flow Table as they were shorter than the reap interval of Flow Lifetime.
More info is here: Hardware-Offloading – Section: Optional Offloading Optimizations and Testing
Answering that question with flat “no” would not be very useful, I think.
Therefore in this post I’ll try and answer the question: What happens with the loadtest when I tweak the flow lifetime and flow table size?
I resurrected the same setup from the Loadtesting bunch of routers article1. In the interest of saving time I’ll be only using imix.py profile for all the tests.
After getting a baseline, I will use the following config command to tweak flow lifetime:
configure # Set flow lifetime to 12s (default). # For the test I will try 6, 1, 0. set system offload flow-lifetime 12 commit; exit
And then I’ll use the following config command to tweak the flow table size:
configure # Set flow table size to 8192 for mem consumption of 1-4MB (default). # For the test I will try 65536, 524288, 4194304, 16777216. set system offload ipv4 table-size 8192 commit; exit
Issues during the test run
Below (each per section) are issues I’ve encountered as I was running the test.
The Unifi documentation page (linked above) says that
0-4294967295. That is a lie:
ubnt@EdgeRouter-4# set system offload flow-lifetime 0 must be between 1-300
0 value for us. Given the lie, I have also run a test with explicit
flow lifetime of
12, to make sure that unspecified default is indeed
I might be reading the output of
show ubnt offload statistics wrong:
ubnt@EdgeRouter-4:~$ show ubnt offload statistics [...] Forwarding cache size (IPv4) ============================= table_size (buckets) 16384 [...] Flow cache table size (IPv6) ============================= table_size (buckets) 8192 [...]
but the default doesn’t seem to be
8196 for IPv4 – like the documentation says
– but rather
8196 for IPv6.
Are the table sizes applied?
During the testing I was overcome by a weird feeling that the
could be only adjusted during reboot2. So for one of the values (
table-size = 512k)
I also rebooted the device and re-ran the test. That is marked in the results
But spoiler alert: no, it is likely applied at runtime, at
overall results; full details on loadtest results
ftoX suffix signifies results for
flow-timeout value of
X, and the
tblY suffix for
table-size value of
I was expecting significant delta for at least the increased table sizes. Unfortunately – as depicted above – there’s hardly any difference.
In desperation I also tried
flow-lifetime = 1 and
table-size = 16777216.
tbl16Mfto1. Still no luck boosting the performance significantly.
Scaled down imix
Therefore I also edited down the
imix.py profile to only use smaller amount
of source IP, destination IP combos. In the graphs below
signifies 1 src_ip + 1 dst_ip. Similarly for
result with less flows; full details in flows – results
Even though it’s very surprising, neither the
table-size nor the
made significant difference for
imix performance on EdgeRouter 4.
The only factor making a significant difference was downscaling number of flows.
Maybe there’s something more to be tweaked in the settings, but I don’t know what that would be.
In any case, there’s some opaque bottleneck that doesn’t seem to be directly influenced by the flow table size (or the flow lifetime)4.
And if I were reading my own notes thoroughly, I would saved myself some initial debugging due to the missing
ip neigh add ...on the DUT. Because baseline like this is no joy. ↩
Which is what happens to me when I’m tweaking a value of (IMO) fundamental importance and I’m getting unexpectedly negligible delta. ↩
As my wife’s shirt says: “There are two kinds of people in this world: Those who can extrapolate from incomplete data” ↩
I think that similar issue would come up when using stateful firewall on Linux. Which would be direct analog of what the offload does. That said, I would expect netfilter to be highly responsive to