
EVPN-VXLAN Underlay and Overlay Design for Enterprise SONiC Switches: A Practical Architecture Guide

A practical architecture guide to designing EVPN-VXLAN fabrics on Enterprise SONiC switches, covering underlay routing, overlay control plane, multi-tenancy, and operational considerations for Australian data center and

By xSONiC Team · SONiC, open networking, data center, AI fabric, Ethernet, automation

Why EVPN-VXLAN on SONiC Matters Now

Enterprise network teams in Australia face a familiar tension. Data center fabrics need to scale across multiple sites, support tenant isolation for hybrid cloud workloads, and handle the east-west traffic patterns that AI and distributed applications demand. At the same time, proprietary NOS platforms lock buyers into single-vendor upgrade cycles and opaque licensing structures.

SONiC (Software for Open Networking in the Cloud) addresses this by offering a production-hardened, open-source network operating system that supports EVPN-VXLAN natively. Built on a modular, container-based architecture where each network function runs in its own Docker container, SONiC provides the fault isolation and simplified upgrade paths that monolithic switch software cannot match.

For Australian enterprises evaluating a campus refresh or data center fabric build, the question is no longer whether SONiC can run EVPN-VXLAN. The question is how to design the underlay and overlay correctly for your environment.

This guide walks through the practical architecture decisions.


EVPN-VXLAN Architecture Fundamentals

Before diving into SONiC-specific design, it helps to establish the two-layer model that EVPN-VXLAN uses.

The Underlay: IP Reachability

The underlay is a pure Layer 3 IP fabric that provides reachability between VTEP (VXLAN Tunnel Endpoint) addresses. It carries encapsulated VXLAN traffic but has no knowledge of tenant or VLAN state.

Design choices for the underlay include:

  • Routing protocol: BGP (eBGP or iBGP), OSPF, or IS-IS. BGP is the most common choice in SONiC deployments because SONiC has mature BGP support and its architecture was originally designed for large-scale BGP fabrics in cloud provider environments.
  • Addressing: Typically loopback addresses assigned to each VTEP. IPv4 is standard, but IPv6 underlay support is available.
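  • Topology: Spine-leaf (Clos) is the recommended topology. Every leaf switch connects to every spine switch, creating a predictable fabric with equal-cost paths between any two leaves and well-understood failure domains. Whether the fabric is non-blocking depends on the oversubscription ratio you design for.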

The Overlay: Tenant and Service Delivery

The overlay carries the EVPN control plane and VXLAN data plane. EVPN uses MP-BGP to distribute MAC/IP binding information, while VXLAN encapsulates Layer 2 frames in UDP packets for transport across the Layer 3 underlay.
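To make the data-plane side concrete, here is a minimal sketch of the 8-byte VXLAN header that sits between the outer UDP header and the encapsulated Layer 2 frame, following the layout in RFC 7348. The function names and the example VNI are illustrative only and are not part of any SONiC API.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (1 byte) + reserved (3 bytes).
    # Word 2: VNI (3 bytes) + reserved (1 byte).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = build_vxlan_header(10100)  # hypothetical tenant VNI
print(header.hex(), parse_vni(header))
```

The 24-bit VNI field is what allows roughly 16 million segments, compared with 4094 usable VLAN IDs.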

Key overlay design decisions:

  • EVPN route types: Type 2 (MAC/IP advertisement), Type 3 (inclusive multicast Ethernet tag, used to build the flood list for broadcast, unknown-unicast, and multicast traffic), Type 5 (IP prefix advertisement for inter-subnet routing).
  • Multi-tenancy model: VRF-based tenant isolation, with EVPN Type 5 routes carrying each tenant's IP prefixes between VTEPs and inter-VRF communication handled by a centralized or distributed gateway.
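  • Symmetric vs. asymmetric IRB: Symmetric IRB is generally preferred for production fabrics. Combined with a distributed anycast gateway, it avoids the trombone routing problem. It also scales better with the number of subnets, because each leaf needs only the VNIs for its locally attached subnets plus one L3 VNI per VRF, whereas asymmetric IRB requires every subnet to be present on every leaf.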

Underlay Design Choices on SONiC

BGP as the Preferred Underlay Protocol

SONiC was designed from the ground up for BGP-centric networking. The FRRouting (FRR) suite runs inside a dedicated container, providing a full BGP implementation that supports:

  • eBGP underlay with unique AS per leaf and per spine
  • iBGP with route reflectors for larger fabrics
  • BGP unnumbered for simplified addressing on point-to-point leaf-spine links

For Australian enterprise deployments, eBGP with unique AS per device is the recommended starting point. It simplifies troubleshooting because AS path analysis immediately shows the path through the fabric, and it avoids the complexity of route reflector placement.
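As a rough illustration of the unique-AS-per-device approach, the sketch below assigns one 4-byte private ASN (the RFC 6996 range) to each switch and then maps an AS path back to device names, which is the troubleshooting benefit described above. The device names, the base ASN, and both helper functions are hypothetical.

```python
PRIVATE_ASN_32_START = 4200000000  # first 4-byte private ASN (RFC 6996)
PRIVATE_ASN_32_END = 4294967294    # last 4-byte private ASN (RFC 6996)


def assign_asns(spines, leaves, base=PRIVATE_ASN_32_START):
    """Give every spine and leaf its own private ASN, spines first."""
    devices = list(spines) + list(leaves)
    last = base + len(devices) - 1
    if base < PRIVATE_ASN_32_START or last > PRIVATE_ASN_32_END:
        raise ValueError("ASN block falls outside the 4-byte private range")
    return {name: base + offset for offset, name in enumerate(devices)}


def fabric_path(as_path, asn_by_device):
    """Translate an AS path into the device names it traversed."""
    device_by_asn = {asn: name for name, asn in asn_by_device.items()}
    return [device_by_asn.get(asn, f"AS{asn}") for asn in as_path]


asns = assign_asns(["spine1", "spine2"], ["leaf1", "leaf2"])
# AS path of a leaf2 prefix as it would appear on leaf1 when learned via spine1.
print(fabric_path([asns["spine1"], asns["leaf2"]], asns))
```

Because every hop adds a distinct ASN, the path itself identifies which spine a route was learned through.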

Spine-Leaf Topology Sizing

A typical enterprise spine-leaf fabric on SONiC-based switches might look like this:

| Component | Small (PoC/Lab) | Medium (Single DC) | Large (Multi-Site) |
|---|---|---|---|
| Spine switches | 2 | 4 | 4-8 per site |
| Leaf switches | 4 | 16-32 | 32-64 per site |
| Uplink speed | 100G | 100G/400G | 400G/800G |
| Server-facing ports | 25G/100G | 25G/100G | 100G/200G |
| VTEP loopback pool | /32 per device | /32 per device | /32 per device |
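A quick way to sanity-check a sizing choice is to compute the number of fabric links, the equal-cost paths between leaves, and the per-leaf oversubscription ratio. The sketch below assumes one uplink from each leaf to each spine; the 48 server ports per leaf in the example is a hypothetical figure, not one taken from the table.

```python
def fabric_summary(spines, leaves, uplink_gbps,
                   server_ports_per_leaf, server_port_gbps):
    """Summarize a spine-leaf design with one uplink per leaf-spine pair."""
    uplink_capacity = spines * uplink_gbps                      # per leaf
    downlink_capacity = server_ports_per_leaf * server_port_gbps  # per leaf
    return {
        "fabric_links": spines * leaves,   # every leaf connects to every spine
        "ecmp_paths": spines,              # leaf-to-leaf paths, one per spine
        "oversubscription": downlink_capacity / uplink_capacity,
    }


# Medium tier: 4 spines, 16 leaves, 100G uplinks, 48 x 25G server ports (assumed).
print(fabric_summary(4, 16, 100, 48, 25))
```

A ratio of 1.0 or lower means the leaf can carry all server-facing traffic to the spines at line rate. Higher values are common and acceptable when east-west demand is below port capacity.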

Underlay IP Addressing

Keep the underlay addressing simple:

  • Assign a /32 loopback address to each switch for its VTEP source IP.
  • Use /31 point-to-point links between leaf and spine switches.
  • Reserve a contiguous block for future fabric expansion.
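The addressing rules above can be sketched with Python's standard `ipaddress` module. The pools `10.0.0.0/24` and `10.1.0.0/24` and the device names are placeholders chosen for illustration. A real plan would reserve more space for growth.

```python
import ipaddress


def plan_underlay(spines, leaves, loopback_pool, p2p_pool):
    """Allocate /32 loopbacks per device and a /31 per leaf-spine link."""
    loopback_iter = ipaddress.ip_network(loopback_pool).subnets(new_prefix=32)
    p2p_iter = ipaddress.ip_network(p2p_pool).subnets(new_prefix=31)

    # One /32 per switch, spines first, taken in order from the pool.
    loopbacks = {name: next(loopback_iter) for name in [*spines, *leaves]}

    # One /31 per link: the leaf gets the even address, the spine the odd one.
    links = {}
    for leaf in leaves:
        for spine in spines:
            subnet = next(p2p_iter)  # raises StopIteration if the pool runs out
            links[(leaf, spine)] = (subnet[0], subnet[1])
    return loopbacks, links


loopbacks, links = plan_underlay(
    ["spine1", "spine2"], ["leaf1", "leaf2"], "10.0.0.0/24", "10.1.0.0/24"
)
for device, address in loopbacks.items():
    print(device, address)
for (leaf, spine), (leaf_ip, spine_ip) in links.items():
    print(f"{leaf} {leaf_ip} <-> {spine_ip} {spine}")
```

Generating the plan from code rather than a spreadsheet keeps allocations deterministic, which makes the result easy to review in version control.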

Overlay Design: EVPN Control Plane on SONiC

EVPN Multi-Homing

SONiC supports EVPN multi-homing (ESI-based), which allows a server or downstream switch to connect to multiple leaf switches with active-active forwarding. This eliminates the need for legacy MC-LAG configurations in the data center and improves utilization of all uplinks.

Key considerations:

  • All-active multi-homing: Preferred for most deployments. Both leaf switches forward traffic for the multi-homed host.
  • Split-horizon filtering: SONiC handles this through EVPN ESI split-horizon rules, preventing loops and duplicate traffic.
  • DF (Designated Forwarder) election: SONiC supports the standard DF election algorithm for multi-homed Ethernet segments.
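For intuition on DF election, the sketch below implements the default modulo-based procedure from RFC 7432 (often called service carving): the leaves attached to an Ethernet segment are ordered by IP address, and the leaf at position `V mod N` is the designated forwarder for Ethernet tag `V`. This assumes the RFC 7432 default. Whether a given SONiC release uses this or another election method is something to verify for your build. The addresses are hypothetical.

```python
import ipaddress


def elect_df(pe_addresses, ethernet_tag):
    """Return the DF for an Ethernet tag using RFC 7432 modulo carving."""
    # Order the candidate PEs by increasing numeric IP value.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    # The PE whose ordinal equals (tag mod N) forwards BUM traffic for the tag.
    return ordered[ethernet_tag % len(ordered)]


leaves_on_segment = ["10.0.0.12", "10.0.0.11"]  # VTEP loopbacks of two leaves
for vlan in (100, 101):
    print(vlan, elect_df(leaves_on_segment, vlan))
```

With two leaves, even-numbered tags land on the lower address and odd-numbered tags on the higher one, which spreads flooded traffic across both switches.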

Inter-VLAN Routing with Symmetric IRB

For tenant workloads that span multiple subnets, symmetric Integrated Routing and Bridging (IRB) is the recommended approach:

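  1. The source VTEP routes the packet out of the source subnet into the tenant VRF's L3 VNI and tunnels it toward the destination VTEP.
  2. The destination VTEP performs a second routing lookup in the same VRF and bridges the packet onto the destination subnet for the final hop.
  3. Both directions use the same L3 VNI, so no traffic trombone occurs, even when source and destination hosts are on different subnets.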

SONiC implements symmetric IRB through its VXLAN and EVPN modules. The configuration involves defining VLAN-VNI mappings, enabling IRB interfaces, mapping each tenant VRF to an L3 VNI, and advertising prefixes with EVPN Type 5 routes.
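The sketch below generates a configuration fragment for one tenant to show how these pieces relate. The table and field names (`VXLAN_TUNNEL`, `VXLAN_TUNNEL_MAP`, `VXLAN_EVPN_NVO`, `VRF`, `VLAN_INTERFACE`) are modeled on the community SONiC ConfigDB schema and should be treated as assumptions to verify against your SONiC release. The VRF name, VLAN IDs, and VNIs are hypothetical. Anycast gateway addresses and the BGP side of the configuration are omitted.

```python
import json


def symmetric_irb_fragment(vtep_ip, vrf, l3_vni, l3_vlan, vlan_vni):
    """Build a ConfigDB-style fragment; vlan_vni maps VLAN ID -> L2 VNI."""
    fragment = {
        "VXLAN_TUNNEL": {"vtep1": {"src_ip": vtep_ip}},
        "VXLAN_EVPN_NVO": {"nvo1": {"source_vtep": "vtep1"}},
        "VRF": {vrf: {"vni": str(l3_vni)}},  # ties the tenant VRF to its L3 VNI
        "VLAN": {},
        "VLAN_INTERFACE": {},
        "VXLAN_TUNNEL_MAP": {},
    }
    # Tenant VLANs plus the dedicated VLAN that backs the L3 VNI.
    for vlan_id, vni in {**vlan_vni, l3_vlan: l3_vni}.items():
        vlan = f"Vlan{vlan_id}"
        fragment["VLAN"][vlan] = {"vlanid": str(vlan_id)}
        fragment["VLAN_INTERFACE"][vlan] = {"vrf_name": vrf}  # IRB interface
        fragment["VXLAN_TUNNEL_MAP"][f"vtep1|map_{vni}_{vlan}"] = {
            "vlan": vlan,
            "vni": str(vni),
        }
    return fragment


fragment = symmetric_irb_fragment(
    "10.0.0.2", "Vrf_tenant1", 50001, 3001, {100: 10100, 200: 10200}
)
print(json.dumps(fragment, indent=2))
```

Generating the fragment from a small per-tenant data model keeps VLAN-to-VNI mappings consistent across every leaf that hosts the tenant.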

Multi-Tenancy with VRFs

Each tenant gets its own VRF instance on SONiC. Inter-tenant communication (when required) flows through a controlled routing policy, typically via a centralized firewall or gateway.

| Tenant Isolation Method | Use Case | SONiC Support |
|---|---|---|
| VRF-lite per tenant | Simple multi-tenancy | Native |
| EVPN Type 5 prefix routes | Inter-subnet routing across VTEPs | Native |
| Route-target filtering | Controlling which tenants see which routes | Via FRR BGP config |
| ACL-based micro-segmentation | Per-flow or per-application policy | Via SAI/ASIC ACLs |
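Route-target filtering is easiest to reason about as set intersection: a VRF imports a route only if the route carries at least one route target that the VRF is configured to import. The sketch below models that rule in plain Python. It is a conceptual model, not FRR's implementation, and the route-target values and prefixes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    prefix: str
    route_targets: frozenset  # route targets attached when the route is exported


def imported_routes(routes, import_rts):
    """Return the routes whose route targets overlap the VRF's import list."""
    wanted = set(import_rts)
    return [route for route in routes if route.route_targets & wanted]


routes = [
    EvpnRoute("10.10.0.0/24", frozenset({"65001:50001"})),   # tenant A
    EvpnRoute("10.20.0.0/24", frozenset({"65001:50002"})),   # tenant B
    EvpnRoute("10.99.0.0/24", frozenset({"65001:50099"})),   # shared services
]

# Tenant A imports its own routes plus the shared-services routes.
tenant_a = imported_routes(routes, {"65001:50001", "65001:50099"})
print([route.prefix for route in tenant_a])
```

Tenant B's prefix never appears in tenant A's table because no route target matches, which is the isolation property the table above relies on.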

Operational Considerations for Australian Enterprises

SONiC Configuration and Automation

SONiC uses JSON-based configuration (config_db.json) as its primary configuration model. For enterprise teams accustomed to CLI-driven workflows, this requires an adjustment, but it brings significant advantages:

  • Version control: Configuration files integrate naturally with Git-based workflows.
  • Idempotent automation: Tools like Ansible, Puppet, or custom Python scripts can manage SONiC configurations predictably.
  • NETCONF/YANG: SONiC has growing support for NETCONF and YANG models, enabling integration with existing network management platforms.
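One way to picture idempotent automation against a JSON configuration model is a diff between the running configuration and the desired state held in Git: if the two match, the tool has nothing to do. The sketch below compares two nested dictionaries shaped like config_db.json (table, then key, then fields). The `config_diff` helper and the sample tables are illustrative and are not part of any SONiC tooling.

```python
def config_diff(running, desired):
    """Compare two {table: {key: fields}} configs and list needed changes."""
    changes = {"add": {}, "remove": {}, "update": {}}
    for table in sorted(set(running) | set(desired)):
        old = running.get(table, {})
        new = desired.get(table, {})
        for key in sorted(set(old) | set(new)):
            if key not in old:
                changes["add"][(table, key)] = new[key]
            elif key not in new:
                changes["remove"][(table, key)] = old[key]
            elif old[key] != new[key]:
                changes["update"][(table, key)] = new[key]
    return changes


running = {"VLAN": {"Vlan100": {"vlanid": "100"}}}
desired = {"VLAN": {"Vlan100": {"vlanid": "100"}, "Vlan200": {"vlanid": "200"}}}

print(config_diff(running, desired))   # one entry to add
print(config_diff(desired, desired))   # nothing to do on a second run
```

Running the comparison a second time after applying the change yields empty change sets, which is the property that makes repeated automation runs safe.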

Monitoring and Visibility

SONiC provides several built-in monitoring capabilities:

  • show CLI commands for real-time state inspection
  • Streaming telemetry support for integration with monitoring platforms like Prometheus, Grafana, or ELK
  • LLDP for neighbor discovery and topology mapping
  • BGP session monitoring through FRR commands
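As a small example of BGP session monitoring, the sketch below scans a JSON summary for sessions that are not established. The input shape (address family, then `peers`, then a `state` field per peer) is modeled on FRR's JSON summary output but is an assumption here. The embedded sample is hand-written for illustration and is not captured from a real switch.

```python
import json

# Hand-written sample; verify the real structure on your FRR version.
SAMPLE_SUMMARY = """
{
  "ipv4Unicast": {"peers": {"10.1.0.1": {"state": "Established"},
                            "10.1.0.3": {"state": "Active"}}},
  "l2VpnEvpn":   {"peers": {"10.0.0.0": {"state": "Established"}}}
}
"""


def down_sessions(summary_json):
    """Return (address family, peer, state) for every non-established peer."""
    summary = json.loads(summary_json)
    down = []
    for family, data in summary.items():
        if not isinstance(data, dict):
            continue  # skip anything that is not a per-family object
        for peer, info in data.get("peers", {}).items():
            if info.get("state") != "Established":
                down.append((family, peer, info.get("state")))
    return sorted(down)


for family, peer, state in down_sessions(SAMPLE_SUMMARY):
    print(f"{family}: peer {peer} is {state}")
```

A check like this can run on a schedule and feed an alerting pipeline, complementing streaming telemetry rather than replacing it.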

For Australian enterprises running hybrid monitoring stacks (on-premises SIEM with cloud-based analytics), SONiC’s streaming telemetry exports integrate with standard collection pipelines.

Hardware Selection: ASIC Considerations

SONiC runs on switches from multiple vendors and ASICs, as documented by the SONiC Foundation. The Switch Abstraction Interface (SAI) decouples the NOS from the silicon, giving buyers flexibility in hardware selection.

When selecting hardware for an EVPN-VXLAN fabric, evaluate:

  • VXLAN VTEP hardware offload: Not all ASICs support hardware VTEP termination. Confirm that the switch ASIC handles VXLAN encapsulation and decapsulation in hardware.
  • VRF scale: Check the maximum number of VRFs the ASIC supports.
  • ACL and TCAM capacity: Multi-tenant EVPN fabrics with micro-segmentation require adequate TCAM resources.
  • Table sizes: Route scale, MAC address table size, and ECMP group size should align with your fabric design.

NVIDIA Spectrum switches, for example, support SONiC and offer hardware VXLAN offload with documented scale characteristics up to 512K IPv4 routes on current-generation silicon.


Design Checklist: EVPN-VXLAN Fabric on SONiC

Use this checklist when planning a new EVPN-VXLAN deployment on SONiC-based switches:

  • Define fabric scale: number of spines, leaves, and future growth targets
  • Select underlay addressing scheme (loopback /32s, /31 point-to-point links)
  • Choose underlay routing protocol (eBGP with unique AS per device recommended)
  • Define VTEP loopback address pool
  • Plan tenant VRF instances and route-target assignments
  • Choose IRB mode (symmetric IRB recommended)
  • Confirm ASIC support for hardware VXLAN offload on selected switches
  • Define VLAN-to-VNI mapping for each tenant
  • Plan inter-VRF routing policy (centralized gateway or distributed)
  • Set up configuration management (Git + Ansible, or equivalent)
  • Design monitoring and telemetry pipeline
  • Validate with a PoC before production rollout

Linking EVPN-VXLAN to Your Broader Fabric Strategy

EVPN-VXLAN does not exist in isolation. For many Australian enterprises, the EVPN fabric is the foundation layer that supports:

  • AI/ML cluster backends: VXLAN segments isolate GPU traffic and RDMA flows. See the xSONiC AI Fabric solution for how EVPN-VXLAN connects to high-performance compute fabrics.
  • Campus-to-DC extension: EVPN-VXLAN can extend from the data center leaf into the campus aggregation layer, enabling consistent policy across sites.
  • Multi-site fabrics: EVPN-VXLAN with Type 5 routes enables inter-site connectivity without MPLS, which is relevant for enterprises with geographically distributed operations across Australian states.

The open nature of SONiC means you can start with a single-vendor hardware platform and expand to multi-vendor as your fabric grows, without redesigning the overlay.


What This Means for Your Next Refresh

If your current data center fabric relies on a proprietary NOS with EVPN-VXLAN support, the path to SONiC is shorter than many teams expect. The protocol behavior is standards-based. The configuration model is documented. The community is active, with the SONiC Foundation operating under the Linux Foundation with contributions from major networking silicon and equipment vendors.

The risk is not in the protocol. It is in the operational transition. Teams need to plan for:

  • Training on SONiC’s configuration model (JSON-based, CLI supplementary)
  • Integration with existing monitoring and automation toolchains
  • Hardware selection that confirms VXLAN and EVPN feature support on the target ASIC

For Australian enterprises, where data sovereignty and operational simplicity are priorities, SONiC-based EVPN-VXLAN fabrics offer a credible alternative to proprietary stacks with the added benefit of community-driven feature development and no per-switch NOS licensing costs.
