VSS, vPC, StackWise, MLAG, MC-LAG... these are just a few of the common switch virtualization technologies we’ve come to know and love (or despise, depending on your experience). One common thread among all of these is that they are vendor specific, so I can’t use StackWise on a device without a StackWise port, or pair an Arista MLAG switch with a Cisco vPC switch. Wouldn’t it be nice if there was a vendor-agnostic solution for stacking switches into a logical virtual switch? That is where ESI-Multihoming (sometimes referred to as EVPN-MH) enters the ring.
ESI-Multihoming uses open standard protocols to allow multiple switches (or routers in more of a service provider solution) to behave as a single logical device. It gives us common concepts when configuring switch stacking regardless of the vendor in use, and also allows us to stack a switch from one vendor with a switch from another (like Cisco with Arista). Here are some summary benefits of ESI-Multihoming:
Interoperability between vendors. We can stack switches between different vendors (like Cisco and Arista), or even between switch models from the same vendor (like Catalyst and Nexus switches from Cisco).
vPC (Cisco), MLAG (Arista), MC-LAG (Juniper)... all of these technologies have a major limitation: you can only have 2 switches in the group. ESI-Multihoming allows more than 2 switches within a group.
Lab Environment
We’re going to use VyOS as PE routers that will serve as our ESI nodes. Additionally we’re going to use Nexus 9K switches for our layer 2 environment. Finally, we’ll use another VyOS instance as the client that needs to reach a service (the internet in our lab).
NOTE: I just mentioned a lot about stacking switches, but we’re going to use routers as the PEs, which is also perfectly valid.
Devices
| Role | Device | Version |
| --- | --- | --- |
| PEs | VyOS | 2025.11.14-0020-rolling |
| Core Switches | Cisco Nexus | version 10.3(7) |
| Client | VyOS | 2025.11.14-0020-rolling |
Interfaces:
| Device | Interface | Description |
| --- | --- | --- |
| PE1 and PE2 | eth0 | To Internet |
| PE1 and PE2 | eth1 | VxLAN Fabric connection between PEs |
| PE1 and PE2 | eth2 | Connection from PEs to Nexus Switches |
| SW1 and SW2 | mgmt0 | vPC Keepalive |
| SW1 and SW2 | Eth1/9 | vPC Peer Link |
| SW1 and SW2 | Eth1/1 | Connection to PE (port-channel 10) |
| SW1 and SW2 | Eth1/8 | Connection to Client (port-channel 20) |
Topology
NOTE: This topology is simple for the sake of testing, but in production, you’d want redundancy for your interfaces like this:
Initial Switch Configuration
The configuration of these switches will be out of scope of this article, but it is a fairly simple configuration. You don’t actually need to use switches that support stacking to complete this lab, but if you have access to switch images that support something like vPC or MLAG, I recommend you use them, since there are some nuances when using them.
Switch-1:
hostname SW1
feature vpc
feature lacp
spanning-tree mode mst
spanning-tree mst configuration
  name ESI
  revision 1
  instance 1 vlan 1-4094
spanning-tree mst 1 root primary
int mgmt 0
  ip address 10.1.2.1/24
  vrf member management
  no shut
vpc domain 1
  role priority 1000
  peer-keepalive destination 10.1.2.2
int eth1/9
  switchport
  channel-group 1 mode active
  lacp rate fast
int po1
  vpc peer-link
  switchport mode trunk
vlan 10
Switch-2:
hostname SW2
feature vpc
feature lacp
spanning-tree mode mst
spanning-tree mst configuration
  name ESI
  revision 1
  instance 1 vlan 1-4094
spanning-tree mst 1 root secondary
int mgmt 0
  ip address 10.1.2.2/24
  vrf member management
  no shut
vpc domain 1
  role priority 1000
  peer-keepalive destination 10.1.2.1
int eth1/9
  switchport
  lacp rate fast
  channel-group 1 mode active
int po1
  vpc peer-link
  switchport mode trunk
vlan 10
Initial PE Configuration
PE1:
set interfaces ethernet eth0 address 'dhcp'
set interfaces ethernet eth0 dhcp-options default-route-distance '150'
set nat source rule 10 outbound-interface name 'eth0'
set nat source rule 10 translation address 'masquerade'
set system host-name 'PE1'
PE2:
set interfaces ethernet eth0 address 'dhcp'
set interfaces ethernet eth0 dhcp-options default-route-distance '150'
set nat source rule 10 outbound-interface name 'eth0'
set nat source rule 10 translation address 'masquerade'
set system host-name 'PE2'
This is all pretty simple so far. We configured Spanning-Tree and vPC on the Switches:
SW1# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link
vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : secondary
Number of vPCs configured         : 0
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout = 10s)
Delay-restore Orphan-port status  : Timer is off.(timeout = 0s)
Operational Layer3 Peer-router    : Disabled
Virtual-peerlink mode             : Disabled
vPC Peer-link status
---------------------------------------------------------------------
id    Port   Status Active vlans
--    ----   ------ -------------------------------------------------
1     Po1    up     1,10
SW1# show spanning-tree vlan 10
MST0001
  Spanning tree enabled protocol mstp
  Root ID    Priority    1
             Address     0ca8.0000.1b08
             This bridge is the root
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    1      (priority 0 sys-id-ext 1)
             Address     0ca8.0000.1b08
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
And we configured our PEs so they will have access to the internet:
vyos@vyos:~$ show interfaces ethernet
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address       S/L    Description
---------    ----------       ---    -----------
eth0         10.0.101.74/24   u/u
eth1         -                u/u
eth2         -                u/u
eth3         -                u/D
eth4         -                u/D
eth5         -                u/D
vyos@vyos:~$ show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure
S>* 0.0.0.0/0 [150/0] via 10.0.101.1, eth0, weight 1, 00:00:08
C>* 10.0.101.0/24 is directly connected, eth0, weight 1, 00:00:09
K * 10.0.101.0/24 [0/0] is directly connected, eth0, weight 1, 00:00:09
L>* 10.0.101.74/32 is directly connected, eth0, weight 1, 00:00:09
Before we move on with the configuration, let’s actually talk about ESI-Multihoming a little bit.
What is Ethernet Segment Identifier Multihoming?
ESI-Multihoming leverages EVPN to unify the control-planes of multiple devices into a single logical device (like all of the technologies I mentioned at the start of this article). This allows for the forwarding of Layer 2 traffic across a fabric, where the data-plane is typically VxLAN, or MPLS in SP environments where the device supports it. L2VPNs using MPLS as the data-plane are not supported on Linux, so this lab will use VxLAN.
EVPN routes will be shared between the PEs using BGP. We need to know about the different EVPN route types to help understand how this solution works. Here is a refresher for the different route types:
| EVPN Route Type | Purpose |
| --- | --- |
| Type-1: Ethernet Auto-Discovery (A-D) Route | Advertises that a PE is attached to an Ethernet Segment and is participating in an EVPN service. Other PEs use it for fast convergence (mass withdrawal) and for load-balancing towards all PEs on the segment (aliasing). |
| Type-2: MAC/IP Advertisement Route | When a MAC address (or ARP entry) is learned on the PE, the PE can advertise it into BGP as a type-2 route, which greatly limits flooding in the EVPN environment, since remote PEs won’t also need to flood-and-learn to receive it. |
| Type-3: Inclusive Multicast Ethernet Tag Route (IMET) | Signals that a PE wants to receive BUM (broadcast, unknown unicast, multicast) traffic for a given EVI. |
| Type-4: Ethernet Segment (ES) Route | This is the route type that will do a lot of the heavy lifting in an ESI solution. It signals to other PEs that it belongs to a specific ESI. If another PE also belongs to the same ESI, it is effectively stacked. |
| Type-5: IP Prefix Route | This is an IPv4/IPv6 route just like you’d normally see in MP-BGP, only advertised using EVPN. This won’t be used in this lab, but it can be very useful if you want to use EVPN not only for L2VPNs, but L3VPNs as well. |
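To make the route types a little more concrete, here is a minimal Python sketch that classifies an EVPN prefix written in the bracketed form that FRR/VyOS prints (for example `[3]:[0]:[32]:[10.1.2.1]`). The function name `classify_evpn_prefix` is my own and purely illustrative; the type names come from the list above.

```python
import re

# Route-type names as listed above, keyed by EVPN route-type number.
EVPN_ROUTE_TYPES = {
    1: "Ethernet Auto-Discovery (A-D)",
    2: "MAC/IP Advertisement",
    3: "Inclusive Multicast Ethernet Tag (IMET)",
    4: "Ethernet Segment (ES)",
    5: "IP Prefix",
}


def classify_evpn_prefix(prefix: str) -> str:
    """Return the route-type name for a prefix like '[3]:[0]:[32]:[10.1.2.1]'."""
    # The first bracketed field of the prefix is the route type.
    match = re.match(r"\[(\d+)\]", prefix.strip())
    if match is None:
        raise ValueError(f"not a bracketed EVPN prefix: {prefix!r}")
    route_type = int(match.group(1))
    return EVPN_ROUTE_TYPES.get(route_type, f"unknown type {route_type}")


if __name__ == "__main__":
    for p in ("[3]:[0]:[32]:[10.1.2.1]",
              "[4]:[03:aa:bb:cc:dd:ee:f0:00:00:64]:[32]:[10.1.2.1]"):
        print(p, "->", classify_evpn_prefix(p))
```

This is only a reading aid for the `show bgp l2vpn evpn` output used throughout the lab, not something the PEs need.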
Configuring the VxLAN Interface:
PE1:
set interfaces ethernet eth1 address '10.1.2.1/24'
set interfaces vxlan vxlan0 parameters neighbor-suppress
set interfaces vxlan vxlan0 port '4789'
set interfaces vxlan vxlan0 vni '100'
set interfaces vxlan vxlan0 source-address '10.1.2.1'
PE2:
set interfaces ethernet eth1 address '10.1.2.2/24'
set interfaces vxlan vxlan0 parameters neighbor-suppress
set interfaces vxlan vxlan0 port '4789'
set interfaces vxlan vxlan0 vni '100'
set interfaces vxlan vxlan0 source-address '10.1.2.2'
Let’s break that down a little:
set interfaces ethernet eth1 address '10.1.2.2/24'
This configures the interface that connects our 2 PE routers.
set interfaces vxlan vxlan0 parameters neighbor-suppress
Neighbor Suppression allows the PEs to respond to ARP requests on behalf of clients, limiting the flooding of ARP and ND traffic.
set interfaces vxlan vxlan0 port '4789'
This is the default port for a VxLAN interface, but hardcoding it is useful in case we need to add additional interfaces in the future.
set interfaces vxlan vxlan0 vni '100'
This is used to scope the bridge-domain of our layer 2 traffic.
set interfaces vxlan vxlan0 source-address '10.1.2.1'
VxLAN traffic from this VTEP will use 10.1.2.1 as the address for its underlay.
NOTE: Trying to commit this VxLAN interface config will fail until it’s added to a bridge, which we’ll do next.
Configuring the Bridge Interface:
PE1:
set interfaces bridge br0 address '10.0.0.2/24'
set interfaces bridge br0 member interface vxlan0
PE2:
set interfaces bridge br0 address '10.0.0.3/24'
set interfaces bridge br0 member interface vxlan0
Let’s try to ping across our VxLAN interface:
vyos@vyos:~$ ping 10.0.0.3 count 1
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
From 10.0.0.2 icmp_seq=1 Destination Host Unreachable
That doesn’t look too good. Don’t worry though, we expect that. We’re not doing flood and learn, so we need a control-plane for MAC learning. Let’s configure BGP.
Configuring BGP:
PE1:
set protocols bgp address-family l2vpn-evpn advertise-all-vni
set protocols bgp peer-group ESI_PEERS address-family ipv4-unicast default-originate
set protocols bgp peer-group ESI_PEERS address-family ipv4-unicast nexthop-self
set protocols bgp peer-group ESI_PEERS address-family l2vpn-evpn
set protocols bgp peer-group ESI_PEERS remote-as '65000'
set protocols bgp system-as '65000'
set protocols bgp neighbor 10.1.2.2 peer-group ESI_PEERS
PE2:
set protocols bgp address-family l2vpn-evpn advertise-all-vni
set protocols bgp peer-group ESI_PEERS address-family ipv4-unicast default-originate
set protocols bgp peer-group ESI_PEERS address-family ipv4-unicast nexthop-self
set protocols bgp peer-group ESI_PEERS address-family l2vpn-evpn
set protocols bgp peer-group ESI_PEERS remote-as '65000'
set protocols bgp system-as '65000'
set protocols bgp neighbor 10.1.2.1 peer-group ESI_PEERS
Let’s break that down a little bit:
set protocols bgp address-family l2vpn-evpn advertise-all-vni
This tells BGP to advertise all locally configured VNIs, and the EVPN routes generated and learned for them, to other PEs.
set protocols bgp peer-group ESI_PEERS address-family ipv4-unicast default-originate
We want to be able to route around failures, so this will send a default route to the other PE. We’ll use this later in testing.
set protocols bgp peer-group ESI_PEERS address-family ipv4-unicast nexthop-self
We have directly connected peers, so we don’t need next-hop-unchanged behavior for our routes. This makes IPv4 reachability much easier.
set protocols bgp peer-group ESI_PEERS address-family l2vpn-evpn
The l2vpn-evpn address family is what we will use for our EVPN routes that are learned/generated.
set protocols bgp neighbor 10.1.2.1 peer-group ESI_PEERS
Finally, we enable the BGP peering between our peers.
Let’s verify BGP came up:
vyos@PE1:~$ show bgp l2vpn evpn summary
BGP router identifier 10.1.2.1, local AS number 65000 VRF default vrf-id 0
BGP table version 0
RIB entries 3, using 384 bytes of memory
Peers 1, using 24 KiB of memory
Peer groups 1, using 64 bytes of memory
vyos@PE1:~$ show bgp l2vpn evpn
BGP table version is 3, local router ID is 10.1.2.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10.1.2.1:2
 *>  [3]:[0]:[32]:[10.1.2.1]
                    10.1.2.1                           32768 i
                    ET:8 RT:65000:100
Route Distinguisher: 10.1.2.2:2
 *>i [3]:[0]:[32]:[10.1.2.2]
                    10.1.2.2                      100      0 i
                    RT:65000:100 ET:8
Displayed 2 out of 2 total prefixes
We have the IMET (type-3) routes from both PEs, which is all we should expect to have since we haven’t learned any L2 info yet. Let’s try pinging over the VxLAN interface again:
vyos@PE1:~$ ping 10.0.0.3 count 1
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=4.35 ms
--- 10.0.0.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.352/4.352/4.352/0.000 ms
That verifies that our fabric is working for basic VxLAN traffic. Now we need to configure our bond interface (port-channel) that will allow our downstream switches to see us as a stacked device.
Client:
set interfaces bonding bond0 address '10.0.0.10/24'
set interfaces bonding bond0 lacp-rate 'fast'
set interfaces bonding bond0 member interface 'eth1'
set interfaces bonding bond0 member interface 'eth0'
set protocols static route 0.0.0.0/0 next-hop 10.0.0.1
Let’s break that down a little bit:
set interfaces bonding bond0 evpn es-id '100'
set interfaces bonding bond0 evpn es-sys-mac 'aa:bb:cc:dd:ee:f0'
These define the values that will create our Type-4 EVPN route. These values should match on any PE that you want to participate in this ESI.
set interfaces bonding bond0 evpn uplink
This defines this port as an interface where traffic enters/exits our EVPN fabric.
set interfaces bonding bond0 lacp-rate 'fast'
set interfaces bonding bond0 member interface 'eth2'
set interfaces bonding bond0 mode '802.3ad'
These commands enable LACP and add our interface towards the switch to the bond. We also configure the rate as 'fast', which makes convergence quicker when interfaces in the bond become unavailable.
set interfaces bonding bond0 system-mac 'aa:bb:cc:dd:ee:f0'
Our downstream switches need to see our PEs as a single device, so we need to make sure the LACP system-mac matches from each PE. If you don’t do this, one interface will be suspended or independent on the switches.
set interfaces bonding bond0 evpn es-df-pref '1000'
ESI uses a concept of a designated forwarder for BUM traffic to prevent loops. We set PE1 as the preferred DF (higher value). We’ll go into this in more depth later.
set interfaces bridge br0 member interface bond0
Finally, we add the bond interface to our bridge we configured earlier.
Let’s verify our bond interfaces:
PE1:
vyos@PE1:~$ show interfaces bonding bond0 lacp detail
Interface    Members    Mode    Rate    System-MAC         Hash
-----------  ---------  ------  ------  -----------------  ------
bond0        eth2       active  fast    aa:bb:cc:dd:ee:f0  layer2
vyos@PE1:~$ show interfaces bonding bond0 lacp neighbors
Interface    Member    Local ID           Remote ID
-----------  --------  -----------------  -----------------
bond0        eth2      aa:bb:cc:dd:ee:f0  00:23:04:ee:be:01
PE2:
vyos@PE2:~$ show interfaces bonding bond0 lacp detail
Interface    Members    Mode    Rate    System-MAC         Hash
-----------  ---------  ------  ------  -----------------  ------
bond0        eth2       active  fast    aa:bb:cc:dd:ee:f0  layer2
vyos@PE2:~$ show interfaces bonding bond0 lacp neighbors
Interface    Member    Local ID           Remote ID
-----------  --------  -----------------  -----------------
bond0        eth2      aa:bb:cc:dd:ee:f0  00:23:04:ee:be:01
SW1:
SW1# show port-channel summary interface po10
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        b - BFD Session Wait
        S - Switched    R - Routed
        U - Up (port-channel)
        p - Up in delay-lacp mode (member)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
10    Po10(SU)    Eth      LACP      Eth1/1(P)
SW1# show lacp neighbor interface po10
Flags:  S - Device is sending Slow LACPDUs F - Device is sending Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode
port-channel10 neighbors
Partner's information
            Partner                  Partner                     Partner
Port        System ID                Port Number     Age         Flags
Eth1/1      65535,aa-bb-cc-dd-ee-f0  0x1             324         FA
            LACP Partner             Partner                     Partner
            Port Priority            Oper Key                    Port State
            255                      0x9                         0x3f
SW2:
SW2# show port-channel summary interface po10
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        b - BFD Session Wait
        S - Switched    R - Routed
        U - Up (port-channel)
        p - Up in delay-lacp mode (member)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
10    Po10(SU)    Eth      LACP      Eth1/1(P)
SW2# show lacp neighbor interface po10
Flags:  S - Device is sending Slow LACPDUs F - Device is sending Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode
port-channel10 neighbors
Partner's information
            Partner                  Partner                     Partner
Port        System ID                Port Number     Age         Flags
Eth1/1      65535,aa-bb-cc-dd-ee-f0  0x1             464         FA
            LACP Partner             Partner                     Partner
            Port Priority            Oper Key                    Port State
            255                      0x9                         0x3f
You’ll notice from the PEs that both of the switches claim to have the same system-mac (remember PE1 connects to SW1, and PE2 to SW2), even though they’re different switches. That is vPC at work: both members of a vPC domain present a shared LACP system MAC.
You’ll also notice that the 2 switches see each PE as the same system-mac, since we configured that in the bond.
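The reason the matching system-mac matters can be sketched in a few lines of Python. LACP only bundles links whose partner reports the same system ID and key, so the sketch below groups links by that pair. It is a simplified model, not how NX-OS or the Linux bonding driver is implemented; the function name `group_links_by_partner` and the mismatched MAC in the second example are hypothetical, while the matching values are taken from the `show lacp neighbor` output above.

```python
from collections import defaultdict


def group_links_by_partner(links):
    """Group links by (partner system MAC, partner key).

    links: iterable of (local_port, partner_system_mac, partner_oper_key).
    Links that end up in the same group are candidates for one port-channel.
    """
    groups = defaultdict(list)
    for port, mac, key in links:
        # Normalise 'aa-bb-cc-dd-ee-f0' and 'AA:BB:...' to one notation.
        normalised = mac.lower().replace("-", ":")
        groups[(normalised, key)].append(port)
    return dict(groups)


if __name__ == "__main__":
    # What the vPC pair sees with the shared system-mac configured on both PEs.
    matching = [
        ("SW1 Eth1/1", "aa-bb-cc-dd-ee-f0", 0x9),
        ("SW2 Eth1/1", "aa-bb-cc-dd-ee-f0", 0x9),
    ]
    print(group_links_by_partner(matching))

    # Hypothetical: PE2 left with a different system MAC -> two separate groups.
    mismatched = [
        ("SW1 Eth1/1", "aa-bb-cc-dd-ee-f0", 0x9),
        ("SW2 Eth1/1", "aa-bb-cc-dd-ee-f1", 0x9),
    ]
    print(group_links_by_partner(mismatched))
```

With the shared MAC there is a single group containing both links; with the mismatch there are two groups, which corresponds to one member being suspended or individual on the switches.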
Let’s check if we have the ESI routes in BGP on PE1:
vyos@PE1:~$ show bgp l2vpn evpn route type 4
BGP table version is 2, local router ID is 10.1.2.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
   Network          Next Hop            Metric LocPrf Weight Path
                    Extended Community
Route Distinguisher: 10.1.2.1:3
 *>  [4]:[03:aa:bb:cc:dd:ee:f0:00:00:64]:[32]:[10.1.2.1]
                    10.1.2.1                           32768 i
                    ET:8 ES-Import-Rt:aa:bb:cc:dd:ee:f0 DF: (alg: 2, pref: 1000)
Route Distinguisher: 10.1.2.2:3
 *>i [4]:[03:aa:bb:cc:dd:ee:f0:00:00:64]:[32]:[10.1.2.2]
                    10.1.2.2                      100      0 i
                    ET:8 ES-Import-Rt:aa:bb:cc:dd:ee:f0 DF: (alg: 2, pref: 900)
You can see we have both the local type-4 route, as well as the route from PE2. This is critical, because this is how the 2 PEs know they are part of the same ESI.
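It helps to see where the ESI `03:aa:bb:cc:dd:ee:f0:00:00:64` in that output comes from. A type-3 ESI (RFC 7432) is 10 bytes: the type byte `0x03`, the 6-byte system MAC, and a 3-byte local discriminator. The Python sketch below rebuilds it from the `es-sys-mac` and `es-id` values we configured; the function names are my own and only illustrative.

```python
def build_type3_esi(sys_mac: str, es_id: int) -> str:
    """Build a type-3 ESI string from a system MAC and a local discriminator."""
    mac_bytes = bytes(int(octet, 16) for octet in sys_mac.split(":"))
    if len(mac_bytes) != 6:
        raise ValueError("system MAC must have 6 octets")
    if not 0 <= es_id < 2 ** 24:
        raise ValueError("local discriminator must fit in 3 bytes")
    # type byte (0x03) + 6-byte system MAC + 3-byte local discriminator
    esi = bytes([0x03]) + mac_bytes + es_id.to_bytes(3, "big")
    return ":".join(f"{b:02x}" for b in esi)


def es_import_rt(esi: str) -> str:
    """The ES-Import route target is taken from the 6 bytes after the type byte."""
    return ":".join(esi.split(":")[1:7])


if __name__ == "__main__":
    esi = build_type3_esi("aa:bb:cc:dd:ee:f0", 100)
    print(esi)                # 03:aa:bb:cc:dd:ee:f0:00:00:64
    print(es_import_rt(esi))  # aa:bb:cc:dd:ee:f0
```

The es-id of 100 shows up as `00:00:64` (100 in hex), and the ES-Import-Rt in the type-4 route is simply the system MAC. This is why both values must match on every PE that should join the same Ethernet Segment.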
Let’s try to ping the bridge IPs from our client:
vyos@Client:~$ ping 10.0.0.2 count 1
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
From 10.0.0.10 icmp_seq=1 Destination Host Unreachable
That doesn’t look good, let’s investigate what’s happening.
vyos@Client:~$ show arp interface bond0
Address    Interface    Link layer address    State
---------  -----------  --------------------  -------
10.0.0.2   bond0                              FAILED
We can see that ARP isn’t resolving. Remember when I said we would talk about the Designated Forwarder more? Let’s get into it.
The Designated Forwarder (DF)
In normal layer 2 environments, the well known mechanism to prevent loops is Spanning Tree, but in an ESI environment, we use a designated forwarder (DF). In fact, all of the other solutions like vPC and MLAG use the same concept, though the presence of the DF may not be shown in any command output. Whatever device is the primary is the DF.
The DF is the only PE that is allowed to forward Broadcast, Unknown Unicast, and Multicast (BUM) traffic into the downstream switch environment. A DF is elected amongst all PEs for a given ESI. In our config, we set the DF preference of PE1 to be higher than PE2, making it the DF. Let’s verify that:
vyos@PE1:~$ show evpn es detail
ESI: 03:aa:bb:cc:dd:ee:f0:00:00:64
 Type: Local,Remote
 Interface: bond0
 State: up
 Bridge port: yes
 Ready for BGP: yes
 VNI Count: 1
 MAC Count: 4
 DF status: df
 DF preference: 1000
 Nexthop group: 536870913
 VTEPs:
     10.1.2.2 df_alg: preference df_pref: 900 nh: 268435458
We can see that the “DF status” shows the device as the DF. If it were not the DF, it would say ‘non-df’.
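The preference-based election can be sketched as follows. This is a simplified model rather than FRR's actual code: the highest preference wins, and I am assuming ties are broken in favour of the lowest VTEP IP. The function name `elect_df` is hypothetical; the preferences are the ones from our lab.

```python
from ipaddress import IPv4Address


def elect_df(candidates: dict[str, int]) -> str:
    """Return the VTEP IP of the designated forwarder.

    candidates maps VTEP IP -> DF preference for one Ethernet Segment.
    Highest preference wins; on a tie the lowest IP wins (assumption).
    """
    if not candidates:
        raise ValueError("no PEs attached to this Ethernet Segment")
    return min(candidates, key=lambda ip: (-candidates[ip], IPv4Address(ip)))


if __name__ == "__main__":
    # Our lab: PE1 (10.1.2.1) has preference 1000, PE2 (10.1.2.2) has 900.
    print(elect_df({"10.1.2.1": 1000, "10.1.2.2": 900}))  # 10.1.2.1
    # If PE1 withdraws its type-4 route, PE2 is the only candidate left.
    print(elect_df({"10.1.2.2": 900}))                    # 10.1.2.2
```

Every PE runs the same calculation over the same set of type-4 routes, which is how they agree on a DF without any extra negotiation.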
Why is this important? Because the PEs will view spanning tree from the switches as BUM traffic, which PE2 will forward to PE1 (the DF), who in turn forwards it to SW1. This makes the 2 switches think they’re connected with 2 interfaces; the direct peer-link interface, and the L2VPN link through the PEs, using the VxLAN interface. Let’s verify that on the switches:
SW1# show spanning-tree vlan 10
MST0001
  Spanning tree enabled protocol mstp
  Root ID    Priority    1
             Address     0ca8.0000.1b08
             This bridge is the root
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    1      (priority 0 sys-id-ext 1)
             Address     0ca8.0000.1b08
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
We can see that ‘Po10’, which is the interface going to the PE bond interface is in a blocking state.
So how do we solve that? There are 2 options:
1. Block the spanning tree frames across the VxLAN interface with the VyOS firewall.
2. Enable spanning tree on the bridge interface within VyOS.
I’m going to do option 2, and enable STP on the VyOS bridge. This will prevent VyOS from forwarding the STP frame to the DF, since it won’t treat it as BUM traffic.
PE1 and PE2:
set interfaces bridge br0 stp
If we check spanning-tree again, we can see it’s no longer in a blocking state:
SW1# show spanning-tree vlan 10
MST0001
  Spanning tree enabled protocol mstp
  Root ID    Priority    1
             Address     0ca8.0000.1b08
             This bridge is the root
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    1      (priority 0 sys-id-ext 1)
             Address     0ca8.0000.1b08
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
vyos@Client:~$ ping 10.0.0.2 count 1
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=8.46 ms
--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.456/8.456/8.456/0.000 ms
vyos@Client:~$ ping 10.0.0.3 count 1
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=7.10 ms
--- 10.0.0.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.096/7.096/7.096/0.000 ms
That looks a lot better, let’s check out our EVPN routes again:
vyos@PE1:~$ show bgp l2vpn evpn route type 2
BGP table version is 22, local router ID is 10.1.2.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
   Network          Next Hop            Metric LocPrf Weight Path
                    Extended Community
Route Distinguisher: 10.1.2.1:2
 *>  [2]:[0]:[48]:[0c:3e:de:5c:00:01]
                    10.1.2.1                           32768 i
                    ESI:03:aa:bb:cc:dd:ee:f0:00:00:64 ET:8 RT:65000:100
 *>  [2]:[0]:[48]:[0c:3e:de:5c:00:01]:[32]:[10.0.0.10]
                    10.1.2.1                           32768 i
                    ESI:03:aa:bb:cc:dd:ee:f0:00:00:64 ET:8 RT:65000:100 ND:Proxy
Route Distinguisher: 10.1.2.2:2
 *>i [2]:[0]:[48]:[0c:3e:de:5c:00:01]
                    10.1.2.2                      100      0 i
                    ESI:03:aa:bb:cc:dd:ee:f0:00:00:64 RT:65000:100 ET:8
 *>i [2]:[0]:[48]:[0c:3e:de:5c:00:01]:[32]:[10.0.0.10]
                    10.1.2.2                      100      0 i
                    ESI:03:aa:bb:cc:dd:ee:f0:00:00:64 RT:65000:100 ET:8 ND:Proxy
 *>i [2]:[0]:[48]:[0c:a8:00:00:01:01]
                    10.1.2.2                      100      0 i
                    ESI:03:aa:bb:cc:dd:ee:f0:00:00:64 RT:65000:100 ET:8 ND:Proxy
Here we can see that the client’s MAC and its ARP (MAC/IP) entry have been learned.
The end goal of our lab is to allow the client to access the internet through either PE, with the benefit that the traffic can be load-balanced across both PEs, instead of being active-backup like with something like VRRP. Let’s configure a gateway for the client to use.
Configuring an Anycast Gateway
We want each PE to be able to respond as the gateway, so we need to give each device the same IP and MAC address to prevent MAC flaps from occurring.
PE1 and PE2:
set interfaces bridge br0 address '10.0.0.1/24'
set interfaces bridge br0 mac 'aa:bb:cc:dd:ee:f0'
Let’s see if we can ping the internet now:
vyos@Client:~$ ping 1.1.1.1 count 1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=9.18 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 9.180/9.180/9.180/0.000 ms
vyos@Client:~$ traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  10.0.0.2 (10.0.0.2)  7.859 ms  8.429 ms  10.152 ms
.... hops omitted
Pings work, and we can see that we’re routing through PE1 (PE1 owns 10.0.0.2) to get to the internet.
Now it’s time to test failover. Let’s disable the interface between SW1 and PE1, as well as SW2 to Client, which should force internet bound traffic through PE2 (Client->SW1->SW2->PE2->Internet):
vyos@Client:~$ ping 1.1.1.1 count 1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=9.67 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 9.671/9.671/9.671/0.000 ms
vyos@Client:~$ traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  10.0.0.3 (10.0.0.3)  6.040 ms  5.833 ms  5.706 ms
.... hops omitted
Now we’re routing to the internet through PE2. Let’s go deeper into failure scenarios. This is the scenario that you actually want ESI to solve. We’re going to disable the connection to the internet on PE2. This forces traffic to traverse Client->SW1->SW2->PE2->PE1->Internet. If you remember, we configured default-originate between our PEs, so we should have 2 default routes on each router. Without ESI, we wouldn’t be able to forward layer 2 around the failure. Let’s first look at PE2:
vyos@PE2:~$ show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure
S>* 0.0.0.0/0 [150/0] via 10.0.101.1, eth0, weight 1, 00:00:17
B   0.0.0.0/0 [200/0] via 10.1.2.1, eth1, weight 1, 01:07:48
The local default has a preference (AD) of 150, making it preferred. Let’s disable the internet interface on PE2, which will leave only the single default route through PE1:
vyos@PE2:~$ show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure
B>* 0.0.0.0/0 [200/0] via 10.1.2.1, eth1, weight 1, 01:12:01
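The behaviour in those two routing tables boils down to "lowest administrative distance wins". Here is a minimal Python sketch of that selection, using the two default routes PE2 holds; the `Route` class and `select_best` function are hypothetical names for illustration, not anything from FRR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    protocol: str
    distance: int   # administrative distance, the first number in [150/0]
    next_hop: str


def select_best(routes):
    """Pick the route with the lowest administrative distance."""
    if not routes:
        raise ValueError("no route to destination")
    return min(routes, key=lambda r: r.distance)


if __name__ == "__main__":
    static_default = Route("0.0.0.0/0", "static", 150, "10.0.101.1")
    bgp_default = Route("0.0.0.0/0", "bgp", 200, "10.1.2.1")

    # Internet link up: the DHCP-learned default (distance 150) is selected.
    print(select_best([static_default, bgp_default]).next_hop)  # 10.0.101.1
    # Internet link down: only the iBGP default via PE1 remains.
    print(select_best([bgp_default]).next_hop)                  # 10.1.2.1
```

This is also why we set `default-route-distance '150'` on eth0: it keeps the local default ahead of the iBGP one (200) while still leaving the iBGP route as a fallback.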
vyos@Client:~$ traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  10.0.0.3 (10.0.0.3)  5.790 ms  6.082 ms  5.932 ms
 2  * * *
.... more failed hops omitted
Always 1 step forward, 1 step back, isn’t it? If we look on PE1, we can see what is going on:
vyos@PE1:~$ show arp
Address       Interface    Link layer address    State
------------  -----------  --------------------  -------
10.0.0.10     br0                                FAILED
PE1 no longer knows the MAC of 10.0.0.10 (the client). You might ask what is causing that. In this specific failure scenario, it’s our anycast MAC address. When PE1 ARPs for 10.0.0.10, it puts the anycast IP and MAC in the request as the sender. The client’s reply is addressed to that anycast MAC, and PE2 receives it first. PE2 also owns that IP and MAC, so it’ll drop the response from the client, since it has no idea that the ARP request was ever sent from PE1. It just thinks it’s junk traffic from the client.
The solution is to make sure the bridge interface has a unique MAC address between the PEs, which defeats our ability to use the bridge interface as the anycast gateway (since we can’t define multiple MAC addresses on the bridge interface). But we can move the anycast gateway to a MACVLAN interface, and add that to the bridge. Let’s go ahead and configure that.
Configure a MACVLAN interface for the Anycast Gateway
We first need to delete the anycast gateway config from the bridge.
NOTE: You’ll need to reboot VyOS to have it generate a new random MAC address. Don’t forget to save the config before rebooting.
After you’ve rebooted, let’s configure the MACVLAN interface, which VyOS calls a pseudo-ethernet interface.
PE1 and PE2:
set interfaces pseudo-ethernet peth0 address '10.0.0.1/32'
set interfaces pseudo-ethernet peth0 mac 'aa:bb:cc:dd:ee:f0'
set interfaces pseudo-ethernet peth0 source-interface 'br0'
We can’t add peth0 directly to the bridge; instead, we set br0 as the source-interface of peth0. Let’s test pinging the internet from the client again:
vyos@Client:~$ ping 1.1.1.1 count 1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=64.3 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 64.286/64.286/64.286/0.000 ms
vyos@Client:~$ traceroute 1.1.1.1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  10.0.0.3 (10.0.0.3)  6.518 ms  9.130 ms  8.962 ms
 2  10.1.2.1 (10.1.2.1)  8.630 ms  9.715 ms  9.550 ms
.... hops omitted
You can see traffic hits PE2, and then goes across the interface between the PEs to PE1 before heading towards the internet. So are we fully good then since we’re routing around the failure? Well, kind of. We introduced a new issue which we can see by looking at ARP on the client:
vyos@Client:~$ show arp
Address    Interface    Link layer address    State
---------  -----------  --------------------  -------
10.0.0.1   bond0        ce:74:b3:e6:a7:a9     DELAY
We are not using the correct MAC address for the anycast gateway. This is because, by default, a Linux bridge will not respond with the MAC of the interface behind the bridge; it’ll respond using the bridge’s own MAC address. We need the client to receive the anycast MAC in the ARP reply so that it can load-balance using anycast. To fix that, we need to configure the following on both of the PEs. It tells the bridge to ignore ARP requests for addresses that are not configured on the bridge itself (like the address on peth0), leaving peth0 to answer with the anycast MAC. More info: https://sysctl-explorer.net/net/ipv4/arp_ignore/
set interfaces bridge br0 ip enable-arp-ignore
Let’s try to ping the internet again:
vyos@Client:~$ ping 1.1.1.1 count 1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
From 10.0.0.10 icmp_seq=1 Destination Host Unreachable
vyos@Client:~$ ping 1.1.1.1 count 1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=13.6 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 13.623/13.623/13.623/0.000 ms
vyos@Client:~$ show arp
Address    Interface    Link layer address    State
---------  -----------  --------------------  ---------
10.0.0.1   bond0        aa:bb:cc:dd:ee:f0     REACHABLE
And finally, we are routing around the failure, and we have the MAC address we expect to have in the ARP resolution. So now we must be done, right? Well, not quite... I swear we’re close, but there’s one more issue we need to resolve. To see the issue, let’s ping the gateway from the client (more than once).
NOTE: This would also happen if pinging the internet, and the internet connection from PE2 was available. In our failure scenario, PE2’s internet connection is down, so we need to ping the gateway to see the issue.
vyos@Client:~$ ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=5.81 ms
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=6.17 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=5.19 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=6.80 ms (DUP!)
We’re duplicating packets in this failure scenario. It’s important to note, assuming traffic hits PE1 first, it wouldn’t duplicate without this failure scenario. Why? What is the issue? It’s that pesky designated forwarder again.
When traffic hits PE2, it responds to the ping since it owns the address. But remember how we couldn't add the MACVLAN interface directly to the bridge? That means no port in the bridge is associated with the anycast MAC, so the bridge doesn't know which interface to forward the frame out of, which makes the traffic unknown unicast. PE2 responds to the ping, but also forwards the frame to the DF, which responds to the ping as well.
We can see that in the bridge fdb. br0 owns that MAC, but it’s just a floating MAC on the bridge. The MAC is actually owned by peth0:
vyos@PE2:~$ sudo bridge fdb show | grep aa:bb
aa:bb:cc:dd:ee:f0 dev br0 self permanent
We can solve this with MAC anchoring: we anchor the MAC address to an interface that won't respond to ARP, and then add that interface to the bridge. We can use a dummy interface for this (don't add the anycast IP address, just the MAC):
PE1 and PE2:
set interfaces bridge br0 member interface dum100
set interfaces dummy dum100 mac 'aa:bb:cc:dd:ee:f0'
Now let's look at the bridge fdb again. We can see that an interface in the bridge now owns the MAC, making frames sent to it known unicast rather than BUM traffic:
vyos@PE2:~$ sudo bridge fdb show | grep aa:bb
aa:bb:cc:dd:ee:f0 dev br0 self permanent
aa:bb:cc:dd:ee:f0 dev dum100 vlan 1 master br0 permanent
aa:bb:cc:dd:ee:f0 dev dum100 master br0 permanent
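Since the config is meant to be templated across PEs, the same fdb check can be scripted. The helper below is a rough sketch, not part of the original setup: fdb_anchor_ports is a hypothetical name, and it assumes the `bridge fdb show` line layout seen in the output above (MAC first, then dev and master keywords each followed by a name).

```shell
# List bridge member ports that hold the given MAC in the given bridge.
# Input: `bridge fdb show` output on stdin.
# An entry like "dev br0 self permanent" has no "master" keyword, so it is
# not counted as an anchor.
fdb_anchor_ports() {
    awk -v mac="$1" -v br="$2" '
        $1 == mac {
            dev = ""; master = ""
            for (i = 2; i < NF; i++) {
                if ($i == "dev") dev = $(i + 1)
                if ($i == "master") master = $(i + 1)
            }
            # Print each anchoring port once, even if it has several entries.
            if (master == br && !(dev in seen)) { seen[dev] = 1; print dev }
        }'
}
```

Run as `sudo bridge fdb show | fdb_anchor_ports aa:bb:cc:dd:ee:f0 br0`. Empty output would mean the MAC is still floating on the bridge.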
And finally, let’s see the pings again to see if we’re duplicating traffic or not:
vyos@Client:~$ ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=5.43 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=6.02 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=6.56 ms
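Three clean replies is a small sample, so for a longer run it may be easier to count the duplicates than to read every line. A minimal sketch, assuming ping marks duplicate replies with the (DUP!) suffix as in the earlier output (count_dup_replies is a hypothetical helper name):

```shell
# Count duplicate echo replies in ping output read from stdin.
count_dup_replies() {
    # grep -c still prints 0 when nothing matches, but exits non-zero,
    # hence the "|| true" to keep the helper's exit status clean.
    grep -cF '(DUP!)' || true
}
```

Pipe a longer ping run against 10.0.0.1 into it. With the MAC anchored on both PEs, the count should stay at 0.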
But are we actually done? I’m kidding, yes, we’ve finally reached the finish line.
Conclusion
While there's a lot involved in getting this up and running, the config will be largely identical from PE to PE, so it's highly template-able. Expanding it is actually quite simple: we can just configure more bonds and bridges or, even better, make the design VLAN aware, which we'll be doing in part 2 of this series.
2 responses to “ESI-Multihoming on VyOS: Part 1 – Basic Setup”
Anupam Murikurthy
Hello! This is a great write-up! I love your blog, so much awesome VYOS content!
However, I wanted to point out that EVPN multihoming and split horizon filters have not been implemented yet in FRR. This is why you had to enable STP on the VTEPs to consume the NXOS BPDUs and not forward them back into the ES and have STP block ports. This is not normal behavior and there is a true problem. Broadcast traffic will flood from non-DF to the DF and then flood into the same ES and create MAC flapping in downstream switch CAM tables.
There is an issue open on the FRR GitHub for this exact problem. Please help us get more visibility on the issue so the maintainers can fix it.
Yeah, there are a few things that are not 100%. Another one is the ability to protodown interfaces so traffic isn't blackholed when a PE loses access to its fabric connections.
As far as the split-horizon rules go, I don't know that FRR ever really intended for that to be something FRR does, since FRR is effectively just a control plane. They do implement a hook (on_rib_process_dplane_results) that can be used to configure a system. So in this case, VyOS would listen at that hook and either disable ucast/mcast/bcast flooding on the non-DFs towards the DF (easiest, but kind of brute force), or simply mark BUM traffic with nftables and block it towards the ESI ports.