
I’ve done a lot of posts featuring ZeroTier on this blog. There are plenty of overlay solutions I could feature: ZeroTier, Tailscale, NetBird, Nebula, TwinGate…just to name a few. The reason I feature ZeroTier so heavily is that it’s uniquely suited to site-to-site networking compared to the other offerings, because it creates a virtual LAN rather than a collection of point-to-point links like the other solutions do. Next hops on a multi-access Ethernet network just need to point to an IP, and ARP resolves that IP so the MAC address of the next hop can be written into the frame.
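For example, on a VyOS router, a static route for a remote site’s LAN can simply point at the far router’s ZeroTier IP, exactly as it would on any Ethernet segment (the addresses below are placeholders, not from a real deployment):
# reach the remote site's LAN via the far router's ZeroTier-assigned IP
set protocols static route 10.100.0.0/24 next-hop 172.30.0.2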
The simplicity of that solution means network engineers don’t have to treat ZeroTier as some additional thing to integrate into an already complex environment; they can treat it like any LAN segment they’re accustomed to working with. You can run anything over ZeroTier that you could run over a traditional Ethernet network.
Performance in ZeroTier
If you scour the internet, you’ll find a lot of people praising ZeroTier, but a frequent complaint is its performance compared to WireGuard, and sometimes IPsec. While I certainly wouldn’t categorize ZeroTier’s typical performance as bad, it usually can’t beat WireGuard, which can max out 1Gbps circuits even on cheap, low-powered hardware.
In this post, we’re going to aim to bring the performance up to that of WireGuard; maybe even surpass it.
NOTE: In this post, we’re going to compare speeds between ZeroTier, WireGuard, OpenVPN, and IPsec VTI tunnels. It’s important to note that this entire post is aimed at improving the overall throughput of ZeroTier between 2 VyOS hosts. It is not a statement on the superiority of any protocol’s raw speed. Most of what is done in this article to improve ZeroTier’s throughput can also be applied to WireGuard and OpenVPN.
What causes the lower performance of ZeroTier?
ZeroTier has a lot of great functionality, but as a product, it’s still relatively young compared to WireGuard. The ZeroTier developers chose to focus on functionality over performance: provide solid functionality first, then improve it.
ZeroTier is generally single-threaded, whereas WireGuard is optimized for multi-threading. This means that with more cores, WireGuard should generally beat ZeroTier as it can leverage all of the cores in the system.
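You can observe this yourself by watching per-core utilization while a transfer runs. One quick way (assuming the sysstat package, which provides mpstat, is installed) is:
# print per-core CPU utilization once per second while iPerf runs
mpstat -P ALL 1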
Look at the graphics below showing the CPU utilization when running iPerf between 2 devices. You can see the much better utilization of all cores with WireGuard.
WireGuard:

ZeroTier:

Developing for good multi-threaded scaling can be difficult, and I’m sure it’s on ZeroTier’s roadmap. But what if we had another way to use more cores with ZeroTier? Could that get us to an even footing with WireGuard? Let’s explore that idea.
Building a Proof of Concept – The Hardware
I wanted to have 2 systems with identical specs to hopefully remove some variability. I didn’t want to spend a lot of money on a proof of concept, so I tried to find the cheapest mini PCs available that had at least 4 cores and 2 network interfaces. After some searching, I found the Minisforum GK41. It has a quad-core Celeron (J4125), 8GB of RAM, and 2x 1Gbps interfaces. I was able to get 2 of them on sale for around $120 USD each.
The GK41 can be purchased on Amazon, but I can’t find a product page on the Minisforum website. They do have an old presentation for it which can be found here: https://www.minisforum.com/ueditor/file/20200820/1597895064757677.pdf
I’ve received mine; let’s start setting up a proof of concept.

Building a Proof of Concept – The Software
I’m going to use VyOS as the operating system for each system. That’s been another staple of my blog posts, as I love the direction VyOS takes their product. They have developed a feature-rich software router/firewall that still feels familiar to network engineers coming from the enterprise.
For our VPN solutions, I’m going to build WireGuard, VTI IPsec, and OpenVPN directly within VyOS, as those are already fully integrated. For ZeroTier, I’m going to install it in containers. If you need to see how to do this, check out this previous blog post:
https://lev-0.com/2024/01/16/dynamic-multipoint-vpn-with-zerotier-and-vyos-part-4-more-zerotier/
Initial Testing
Let’s get a baseline of these mini PCs to see what their max unencrypted throughput is.
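The tests throughout this post follow the usual iPerf3 pattern: a server on one router and a client on the other (the address below is a placeholder for the directly connected interface):
# on Router2: run the iPerf3 server
iperf3 -s
# on Router1: run a 10-second test toward Router2
iperf3 -c <router2-address>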
[ 5] 0.00-10.00 sec 1.09 GBytes 941 Mbits/sec receiver
That’s a good sign: we can max out the 1Gbps connection (941Mbps is the payload throughput after overhead).
Let’s see what we get with our different VPN solutions:
WireGuard:
[ 5] 0.00-10.00 sec 1021 MBytes 856 Mbits/sec receiver
IPsec:
[ 5] 0.00-10.00 sec 958 MBytes 803 Mbits/sec receiver
OpenVPN:
[ 5] 0.00-10.00 sec 327 MBytes 274 Mbits/sec receiver
ZeroTier:
[ 5] 0.00-10.00 sec 645 MBytes 541 Mbits/sec receiver
Honestly, I’m quite surprised by the performance of these mini PCs. I wasn’t expecting to get this close to line rate, not just with WireGuard but with plain IPsec as well. OpenVPN and ZeroTier were along the lines of what I expected, and it’s actually good that ZeroTier didn’t max out the circuit, since that leaves us room to test our proof of concept.
Using more cores
We talked previously about how WireGuard sees better performance due to its ability to scale across additional cores in the system. Rewriting ZeroTier to be multi-threaded would take a large effort, but one quick way to use more cores with ZeroTier is to simply install it multiple times.
Normally in Linux, this would be difficult, but with containers, it becomes quite easy. We have 4 cores, so let’s go ahead and configure 4 instances of ZeroTier.
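As a sketch of what one instance might look like in VyOS (the image name, container name, and directory paths are my assumptions; the post linked above has the full walkthrough), with zt2 through zt4 repeating the same pattern against their own state directories:
# one ZeroTier instance as a VyOS container (names/paths are examples)
set container name zt1 image 'zerotier/zerotier:latest'
set container name zt1 allow-host-networks
set container name zt1 capability 'net-admin'
set container name zt1 device tun source '/dev/net/tun'
set container name zt1 device tun destination '/dev/net/tun'
set container name zt1 volume ztdata source '/config/zt1'
set container name zt1 volume ztdata destination '/var/lib/zerotier-one'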
Each of the containers is going to try to listen on UDP 9993, which obviously won’t work. We also want to make sure that we don’t build ZeroTier over any of the other VPN solutions. So we need to modify the local.conf file for each ZeroTier instance so that it blacklists the necessary interfaces and listens on a unique port.
{
  "physical": {},
  "virtual": {},
  "settings": {
    "primaryPort": 9994,
    "interfacePrefixBlacklist": [
      "eth10",
      "eth12",
      "eth13",
      "eth14",
      "dum0",
      "vti0",
      "vtun1",
      "wg0"
    ]
  }
}
I have set up the ZeroTier interfaces as eth10–eth14; each instance’s local.conf blacklists the other instances’ interfaces (the example above is from the instance using eth11) and gets its own primaryPort. The local.conf file should be placed in whatever folder you mapped to ‘zerotier-one’, and the container restarted.
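Since VyOS runs its containers via podman under the hood, a quick way to drop the file in place and bounce an instance might look like this (the directory and container names carry over from my earlier sketch):
# place the per-instance config, then restart that container
sudo cp local.conf /config/zt1/local.conf
sudo podman restart zt1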
I’m going to assign each pair of nodes IPs in their own ‘/30’ subnet. This ensures that each node talks only to the single node on the opposite end.
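For reference, here’s how the four point-to-point pairs line up (inferred from the static routes configured below; the instance names match my earlier sketch):
# instance   Router1 side    Router2 side    /30 network
# zt1        10.14.0.1       10.14.0.2       10.14.0.0/30
# zt2        10.14.0.5       10.14.0.6       10.14.0.4/30
# zt3        10.14.0.9       10.14.0.10      10.14.0.8/30
# zt4        10.14.0.13      10.14.0.14      10.14.0.12/30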


Creating a floating IP
We’ll need to create an IP that we can build ECMP routes toward, so that all of the ZeroTier containers can be used at the same time.
Router1:
set interfaces dummy dum0 address '10.0.55.1/32'
Router2:
set interfaces dummy dum0 address '10.0.55.2/32'
We then need to create routing between the 2 routers for that dummy interface. Notice that we disable 3 of the routes so we can test performance with 1, 2, 3, and 4 cores separately.
Router1:
set protocols static route 10.0.55.2/32 next-hop 10.14.0.2
set protocols static route 10.0.55.2/32 next-hop 10.14.0.6 disable
set protocols static route 10.0.55.2/32 next-hop 10.14.0.10 disable
set protocols static route 10.0.55.2/32 next-hop 10.14.0.14 disable
Router2:
set protocols static route 10.0.55.1/32 next-hop 10.14.0.1
set protocols static route 10.0.55.1/32 next-hop 10.14.0.5 disable
set protocols static route 10.0.55.1/32 next-hop 10.14.0.9 disable
set protocols static route 10.0.55.1/32 next-hop 10.14.0.13 disable
We also need to enable layer-4 hashing for ECMP in VyOS. This allows traffic to be load-balanced based on the source/destination ports in the packet. Running iPerf with multiple parallel streams will create unique source/destination port pairings, letting ECMP spread the streams across paths.
set system ip multipath layer4-hashing
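Once routes are in place, VyOS can show the equal-cost next hops it will balance across (run from operational mode; with all 4 routes enabled, this should list four next hops):
show ip route 10.0.55.2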
We should see a slight performance penalty with ZeroTier from using the floating IP, but it should be small. Let’s try it with just one core to see what penalty we get.
We need to make sure our iPerf tests now use that floating IP. We can do that by binding iPerf to it, like this:
iperf3 -c 10.0.55.1 -B 10.0.55.2
Here are the results of iPerf:
[ 5] 0.00-10.00 sec 526 MBytes 441 Mbits/sec receiver
You can see we lost about 100Mbps of throughput by doing this. Let’s now try 2 cores and see if our change was even worth it. We’re going to re-enable one of the static routes we disabled earlier.
Router1:
delete protocols static route 10.0.55.2/32 next-hop 10.14.0.6 disable
Router2:
delete protocols static route 10.0.55.1/32 next-hop 10.14.0.5 disable
We’ll need to run iPerf with multiple parallel streams to use multiple ZeroTier instances.
iperf3 -c 10.0.55.1 -B 10.0.55.2 -P 16
iPerf Results (ZeroTier ECMP with 2 cores):
[SUM] 0.00-10.01 sec 1.01 GBytes 864 Mbits/sec receiver
Perfect, that worked. But we now have a problem: with only 2 cores, we’ve already run into the limit of the 1Gbps interfaces on this box. I guess that’s a good problem to have if you intend to use this mini PC in production. I don’t want to spend more money on mini PCs for this test, so USB 2.5Gbps adapters to the (hopefully) rescue.
Alright, I’ve received 2 of them; now back to testing.

I guess we need new baselines now.
Unencrypted:
[SUM] 0.00-10.00 sec 2.60 GBytes 2.34 Gbits/sec receiver
WireGuard:
[ 5] 0.00-10.00 sec 1.83 GBytes 1.57 Gbits/sec receiver
IPsec:
[ 5] 0.00-10.00 sec 1.58 GBytes 1.36 Gbits/sec receiver
OpenVPN:
[ 5] 0.00-10.00 sec 270 MBytes 226 Mbits/sec receiver
ZeroTier:
[ 5] 0.00-10.00 sec 469 MBytes 393 Mbits/sec receiver
We were actually able to get quite a bit more throughput out of both WireGuard and IPsec. As we weren’t able to max out ZeroTier or OpenVPN before with a single core, their numbers didn’t really change much. Now back to the testing: let’s try ZeroTier with 2 cores again.
iPerf Results (ZeroTier ECMP with 2 cores):
[SUM] 0.00-10.00 sec 1.19 GBytes 1.02 Gbits/sec receiver
That’s looking great: we’re seeing near-linear scaling when using ECMP with ZeroTier. Let’s see what we get with 3 cores.
Router1:
delete protocols static route 10.0.55.2/32 next-hop 10.14.0.10 disable
Router2:
delete protocols static route 10.0.55.1/32 next-hop 10.14.0.9 disable
iPerf Results (ZeroTier ECMP with 3 cores):
[SUM] 0.00-10.01 sec 1.69 GBytes 1.45 Gbits/sec receiver
And finally, with all 4 cores.
Router1:
delete protocols static route 10.0.55.2/32 next-hop 10.14.0.14 disable
Router2:
delete protocols static route 10.0.55.1/32 next-hop 10.14.0.13 disable
iPerf Results (ZeroTier ECMP with 4 cores):
[SUM] 0.00-10.00 sec 2.25 GBytes 1.93 Gbits/sec receiver
It took all 4 cores, but we were finally able to beat both WireGuard and IPsec on this cheap mini PC. Getting almost 2Gbps of encrypted throughput with ZeroTier for a little over $100 USD per PC is kind of amazing.
Here was the CPU utilization during that test:

Part of me thought about stopping this post here, but seeing the near-linear scaling made me very curious. I have some other PCs that can max out a 2.5Gbps interface with a single ZeroTier instance; what could a more capable box reach as a maximum aggregate throughput?
Building a (bigger) Proof of Concept – The Hardware
As I was starting to plan all of this, Minisforum announced a new mini PC with 10G networking and a PCIe slot, which could allow for even greater network speeds. I went back and forth on whether or not to get a couple of them, but I ultimately figured I could find a use for them later as network-attached storage boxes or something.
The PC is called the MS-01, and it comes with a 14-core Intel processor (6 P-cores; 8 E-cores). The max clock speed is 5.4GHz for the P-cores and 4GHz for the E-cores. I spec’ed them out with 96GB of RAM since I plan on turning them into servers later.
You can check out the MS-01 here:
https://store.minisforum.com/products/minisforum-ms-01?_pos=1&_sid=83a4251da&_ss=r
I’ve received them; let’s do some baseline testing. I’m using the 10G interfaces between the boxes.

Further Testing
We’re going to port our config from the GK41s directly to the MS-01s, including the config files for our containers. This keeps everything the same. Additionally, since we have 6 P-cores, I’m going to go ahead and configure 6 instances of ZeroTier on each system; the two extra instances just continue the earlier pattern, as shown below.
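Assuming the next two /30s in sequence (my extrapolation, not from the original configs), the additional ECMP routes on Router1 would look something like this, with Router2 mirrored:
# Router1: ECMP paths for ZeroTier instances 5 and 6 (extrapolated /30s)
set protocols static route 10.0.55.2/32 next-hop 10.14.0.18
set protocols static route 10.0.55.2/32 next-hop 10.14.0.22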
Let’s get some baselines for this system.
Unencrypted:
[SUM] 0.00-10.00 sec 10.9 GBytes 9.35 Gbits/sec receiver
WireGuard:
[ 5] 0.00-10.00 sec 7.63 GBytes 6.55 Gbits/sec receiver
IPsec:
[ 5] 0.00-10.00 sec 6.66 GBytes 5.72 Gbits/sec receiver
OpenVPN:
[ 5] 0.00-10.00 sec 2.54 GBytes 2.18 Gbits/sec receiver
ZeroTier:
[SUM] 0.00-10.00 sec 5.23 GBytes 4.49 Gbits/sec receiver
Even without touching anything, that’s pretty impressive: nearly triple the performance of the GK41 across the board. That would probably be plenty for most people, but I want to see if we can get 10Gbps of ZeroTier. Let’s try 2 cores.
iPerf Results (ZeroTier ECMP with 2 cores):
[SUM] 0.00-10.00 sec 9.76 GBytes 8.38 Gbits/sec receiver
Well, I can already see a problem…we’re again seeing near-linear scaling, which means we’d only need 3 cores to max out 10G.
Remember how I said this mini PC has a PCIe slot in it? Let’s order some 25Gb cards.

Now that they’re installed, let’s start ramping up our core count. Remember, we have 6 P-cores.
iPerf Results (ZeroTier ECMP with 3 cores):
[SUM] 0.00-10.00 sec 13.5 GBytes 11.6 Gbits/sec receiver
We took down that 10G target pretty easily.
iPerf Results (ZeroTier ECMP with 4 cores):
[SUM] 0.00-10.00 sec 16.4 GBytes 14.1 Gbits/sec receiver
iPerf Results (ZeroTier ECMP with 5 cores):
[SUM] 0.00-10.01 sec 19.2 GBytes 16.5 Gbits/sec receiver
We’re getting close, but the linear scaling is starting to drop off (a possible reason for that in a bit). Let’s see if 6 P-cores are enough to max out the 25Gb connection.
iPerf Results (ZeroTier ECMP with 6 cores):
[SUM] 0.00-10.00 sec 22.0 GBytes 18.9 Gbits/sec receiver
Sadly, 6 cores fell a little short. It’s very possible that I could have maxed it out with just 6 cores: I’m not pinning any of the containers to the P-cores, so some of those smaller jumps when adding more cores may have been the CPU scheduler placing the containers on the E-cores. If you wanted to rule that out, you could pin the ZeroTier processes to the P-cores, as sketched below.
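I didn’t test this here, but as a rough sketch, pinning could be done from the shell with taskset (assuming the P-cores’ hyperthreads occupy logical CPUs 0–11, which is typical for this hybrid layout; this won’t persist across reboots):
# pin every zerotier-one process (and all its threads) to logical CPUs 0-11
for pid in $(pgrep -f zerotier-one); do
  sudo taskset -a -pc 0-11 "$pid"
done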
We’ve come this far, so we might as well push it over the edge…we do still have 8 E-cores, after all.
I added 2 more containers to finish the test. Let’s see what we get with 8 ECMP paths for ZeroTier:
iPerf Results (ZeroTier ECMP with 8 cores):
[SUM] 0.00-10.00 sec 24.8 GBytes 21.3 Gbits/sec receiver
WireGuard who?
That’s pretty much the max we should expect given the per-packet overhead. And in theory, we still have 6 E-cores remaining. I know what you’re thinking…but I’m not going to buy some 40Gb cards.
Conclusion
You may wonder how practical a solution like this is. Anyone who has configured VPNs to an AWS Virtual Private Gateway already knows that overcoming its encrypted throughput limit (1.25Gbps per tunnel) means simply adding more tunnels and running ECMP over them. While no single flow can exceed that limit, the aggregate throughput of all traffic can reach pretty impressive speeds. With our MS-01s, we can have single flows of 4.5Gbps. To put that in perspective, that’s enough to saturate a SATA3 SSD; in aggregate, we can almost saturate a PCIe 3.0 NVMe SSD.
For ZeroTier, this speaks to its promise as a solution going forward. As the developers continue to improve the product and add native multi-threading, you may start to see ZeroTier become the best-performing solution without needing to design ECMP.