Why This Decision Matters for Australian Private AI Builds
Enterprise teams across Australia are standing up private AI infrastructure for large language model inference, RAG pipelines, and multimodal services. The network fabric connecting GPU servers is not a secondary concern. It is the difference between a cluster that trains and infers at scale, and one that stalls on congestion, packet loss, or unpredictable tail latency.
Two dominant fabric technologies compete for this role: InfiniBand and Ethernet. InfiniBand has a long track record in HPC and large-scale training. Ethernet is the incumbent in nearly every enterprise data center and campus, and modern high-speed Ethernet with RoCE v2, DCBX, and congestion notification mechanisms has narrowed the performance gap for many AI workloads.
For Australian buyers, the decision is shaped by factors beyond raw throughput: local vendor and integrator availability, support ecosystem depth, operational team skills, multi-tenancy requirements, and the cost and complexity of maintaining two separate network domains. This guide provides a structured framework for that decision, anchored to xSONiC’s open Ethernet AI fabric approach.
InfiniBand and Ethernet: Architecture Fundamentals
Understanding the structural differences between InfiniBand and Ethernet is a prerequisite for any informed fabric decision.
InfiniBand architecture
InfiniBand is a switched fabric interconnect designed from the ground up for low-latency, high-bandwidth server-to-server communication. Key characteristics include:
- Lossless transport: InfiniBand uses credit-based flow control at the link level, so a sender transmits only after the receiver has advertised free buffer space (credits). This prevents buffer-overflow packet loss under normal operation.
- RDMA-native: Remote Direct Memory Access (RDMA) is a core protocol, not an overlay. Applications can read and write remote memory without involving the operating system kernel.
- Subnet manager model: The fabric is managed by a centralized subnet manager that computes and distributes routing tables. This simplifies path computation but creates a single management domain.
- Separate physical domain: InfiniBand requires its own adapters (HCAs), switches, and cables. It does not share infrastructure with standard Ethernet LANs.
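The credit-based flow control described above can be illustrated with a toy model. This is a minimal sketch, not InfiniBand's actual wire protocol: the `CreditLink` class, its method names, and the two-slot buffer are hypothetical, chosen only to show that a sender holding no credits waits instead of dropping.

```python
from collections import deque


class CreditLink:
    """Toy model of one link with receiver-advertised buffer credits."""

    def __init__(self, receiver_buffer_slots: int) -> None:
        self.credits = receiver_buffer_slots  # credits the sender currently holds
        self.receiver_queue = deque()         # packets buffered at the receiver
        self.dropped = 0                      # never incremented: no overrun is possible

    def try_send(self, packet: str) -> bool:
        """Transmit only if a credit is available; otherwise the sender must wait."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.receiver_queue.append(packet)
        return True

    def receiver_drain(self) -> None:
        """Receiver frees one buffer slot and returns one credit to the sender."""
        if self.receiver_queue:
            self.receiver_queue.popleft()
            self.credits += 1


link = CreditLink(receiver_buffer_slots=2)
results = [link.try_send(f"pkt{i}") for i in range(3)]  # third attempt finds no credit
link.receiver_drain()                                   # one slot freed, one credit returned
retry = link.try_send("pkt2")
print(results, retry, link.dropped)
```

The point of the sketch is that backpressure replaces packet loss: the third packet is held at the sender until a credit returns, and the drop counter never moves.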
Ethernet architecture
Ethernet is a family of standards (IEEE 802.3) originally designed for general-purpose LAN connectivity and later extended to data center, metro, and WAN links. For AI workloads, Ethernet has evolved significantly:
- Lossless Ethernet via PFC and ECN: Priority Flow Control (PFC, IEEE 802.1Qbb) and Explicit Congestion Notification (ECN, RFC 3168) allow Ethernet to emulate lossless behavior for RDMA traffic classes, though configuration complexity is higher than InfiniBand’s native approach.
- RoCE v2 (RDMA over Converged Ethernet v2): This protocol carries RDMA operations over standard UDP/IP Ethernet. It is supported by NVIDIA ConnectX NICs, Broadcom, and other vendors.
- Standard management tooling: Ethernet networks use familiar protocols (BGP, EVPN-VXLAN, SNMP, gNMI, NETCONF/YANG) and are managed with the same teams and tools that run the rest of the data center.
- Multi-purpose fabric: The same physical infrastructure can carry AI backend traffic, management traffic, storage, and east-west application traffic, though careful QoS and VLAN/priority planning is required.
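How ECN and PFC interact on a lossless queue can be sketched as follows. This assumes the WRED-style marking curve commonly used for RoCE v2 congestion control (no marking below a minimum threshold, a linear ramp up to a maximum threshold, always marking above it) and assumes the ECN thresholds sit below the PFC pause (XOFF) threshold so that ECN reacts first. The function names and all threshold values are illustrative, not vendor recommendations.

```python
def ecn_mark_probability(queue_kb: float, k_min_kb: float,
                         k_max_kb: float, p_max: float) -> float:
    """Probability that a packet is ECN-marked at a given queue depth."""
    if not 0 <= k_min_kb < k_max_kb or not 0 < p_max <= 1:
        raise ValueError("need 0 <= k_min < k_max and 0 < p_max <= 1")
    if queue_kb <= k_min_kb:
        return 0.0
    if queue_kb >= k_max_kb:
        return 1.0
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


def queue_action(queue_kb: float, k_min_kb: float,
                 k_max_kb: float, pfc_xoff_kb: float) -> str:
    """Classify what the egress queue does to traffic at this depth."""
    if not k_max_kb < pfc_xoff_kb:
        raise ValueError("ECN k_max should sit below the PFC XOFF threshold")
    if queue_kb >= pfc_xoff_kb:
        return "pfc-pause"   # last resort: pause the upstream sender
    if queue_kb > k_min_kb:
        return "ecn-mark"    # early signal: ask senders to slow down
    return "forward"


# Hypothetical thresholds in kilobytes, for illustration only.
K_MIN, K_MAX, P_MAX, XOFF = 100, 400, 0.2, 800
for depth in (50, 250, 500, 900):
    print(depth, queue_action(depth, K_MIN, K_MAX, XOFF),
          ecn_mark_probability(depth, K_MIN, K_MAX, P_MAX))
```

Ordering the thresholds this way lets end-to-end rate reduction absorb most congestion, leaving PFC as a safety net instead of the primary control loop.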
Key architectural distinction: InfiniBand is a purpose-built AI/HPC fabric that requires a dedicated physical domain. Ethernet is a general-purpose fabric that can be hardened for AI workloads through protocol additions (RoCE v2, DCBX, PFC, ECN, Fast CNP). Both can deliver high throughput and low latency at 400G and 800G, but they differ in operational model, ecosystem, and cost structure.
Performance Comparison: Latency, Throughput, and Congestion Behavior
Raw bandwidth is converging. Both InfiniBand and Ethernet support 400 Gb/s today, with 800 Gb/s products available or announced from major vendors. The performance differences lie in latency consistency, congestion handling, and tail latency behavior under load.
Latency
- InfiniBand typically delivers lower single-hop latency (sub-microsecond for switch traversal) due to its simpler, cut-through forwarding pipeline. Credit-based flow control also avoids the retransmission delays that follow dropped packets.
- Ethernet with RoCE v2 at 400G delivers switch traversal latencies in the low microsecond range. Modern ASICs (such as those used in NVIDIA Spectrum-4 switches and equivalent Broadcom/Marvell silicon) have significantly reduced Ethernet forwarding latency.
- For large-scale training jobs running all-reduce or all-to-all collectives across hundreds of GPUs, the cumulative effect of per-hop latency differences can be meaningful. For inference workloads with smaller message sizes and fewer hops, the difference is often negligible.
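The cumulative effect of per-hop latency can be estimated with the standard latency-bandwidth cost model for a ring all-reduce: 2(N-1) steps, each paying a fixed latency plus the time to move one 1/N-sized chunk. The sketch below uses that model only; the per-step latency values, GPU count, and message sizes are hypothetical inputs, not measurements of either fabric.

```python
def ring_allreduce_seconds(num_gpus: int, message_bytes: float,
                           link_gbps: float, step_latency_us: float) -> float:
    """Estimated ring all-reduce time under the latency-bandwidth model."""
    if num_gpus < 2:
        raise ValueError("ring all-reduce needs at least two participants")
    steps = 2 * (num_gpus - 1)                 # reduce-scatter plus all-gather
    chunk_bits = 8 * message_bytes / num_gpus  # each step moves one chunk
    transfer_s = chunk_bits / (link_gbps * 1e9)
    return steps * (step_latency_us * 1e-6 + transfer_s)


def latency_share(num_gpus: int, message_bytes: float,
                  link_gbps: float, step_latency_us: float) -> float:
    """Fraction of the estimated time spent on fixed per-step latency."""
    total = ring_allreduce_seconds(num_gpus, message_bytes, link_gbps, step_latency_us)
    latency_only = 2 * (num_gpus - 1) * step_latency_us * 1e-6
    return latency_only / total


for size in (1 << 20, 1 << 30):        # 1 MiB and 1 GiB messages
    for latency_us in (2.0, 5.0):      # hypothetical per-step latencies
        t = ring_allreduce_seconds(256, size, 400, latency_us)
        share = latency_share(256, size, 400, latency_us)
        print(f"{size:>12} B  {latency_us} us/step  {t * 1e3:.3f} ms  latency share {share:.1%}")
```

Under this model, fixed latency dominates small collectives and the bandwidth term dominates large ones, which is why the fabric latency gap matters more for some workloads than others.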
Congestion handling
- InfiniBand’s credit-based flow control is inherently lossless. Per-virtual-lane credits limit, though do not entirely remove, the head-of-line blocking and congestion spreading that PFC-based Ethernet is prone to.
- Ethernet PFC (Priority Flow Control) can create PFC storms if misconfigured, where congestion propagates backward across multiple hops. Proper DCBX configuration, ECN marking, and congestion notification (including Fast CNP mechanisms) mitigate this risk, but require careful engineering.
Throughput at scale
- Both technologies can saturate 400G links. Ethernet requires careful traffic engineering (ECMP load balancing, adaptive routing) to avoid elephant flow collisions on oversubscribed links.
- InfiniBand uses adaptive routing natively in many vendor implementations, distributing traffic across equal-cost paths dynamically.
- For AI backend fabrics where every GPU server has a dedicated high-speed uplink (non-oversubscribed leaf-spine), both technologies deliver near-identical aggregate throughput.
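The elephant flow collision problem with static ECMP can be shown with a small sketch. It assumes hash-based path selection on the flow 5-tuple; SHA-256 stands in for the hash function here (switch ASICs use their own, typically CRC-based, hashes), and the flow addresses and source ports are hypothetical. UDP destination port 4791 is the registered RoCE v2 port.

```python
import hashlib
from collections import Counter


def ecmp_uplink(flow: tuple, num_uplinks: int) -> int:
    """Pick an uplink by hashing the 5-tuple (static ECMP, no load awareness)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_uplinks


# Hypothetical long-lived RoCE v2 flows: (src_ip, dst_ip, protocol, src_port, dst_port)
flows = [(f"10.0.0.{i}", f"10.0.1.{i}", "udp", 49152 + i, 4791) for i in range(8)]

NUM_UPLINKS = 8
load = Counter(ecmp_uplink(flow, NUM_UPLINKS) for flow in flows)

for uplink in range(NUM_UPLINKS):
    print(f"uplink {uplink}: {load.get(uplink, 0)} flow(s)")
print("busiest uplink carries", max(load.values()), "flow(s)")
```

With eight flows hashed uniformly onto eight uplinks, the chance that every flow lands on its own uplink is 8!/8^8, roughly 0.24%, so some uplinks typically carry several elephant flows while others sit idle. Adaptive routing and flow-aware load balancing exist to address that imbalance.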
Bottom line for Australian buyers: For GPU clusters under approximately 128 nodes with non-oversubscribed leaf-spine topology, well-configured Ethernet with RoCE v2 delivers performance within a few percentage points of InfiniBand for most AI workloads. Above that scale, or for latency-critical HPC workloads, InfiniBand retains an advantage that buyers should evaluate against their specific workload profiles.
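To relate a node count to a non-oversubscribed leaf-spine design, a rough port-count check helps. The sketch assumes a two-tier fabric built from identical switches, with half of each leaf's ports facing servers, half facing spines, and one uplink from every leaf to every spine. The 64-port radix and eight fabric NICs per GPU server are assumptions for illustration.

```python
def two_tier_nonblocking_capacity(switch_radix: int) -> dict:
    """Port budget for a non-oversubscribed two-tier leaf-spine fabric."""
    if switch_radix < 2 or switch_radix % 2:
        raise ValueError("switch_radix must be an even number of ports")
    downlinks_per_leaf = switch_radix // 2  # server-facing ports
    spines = switch_radix // 2              # one uplink per leaf to each spine
    max_leaves = switch_radix               # each spine port serves one leaf
    return {
        "downlinks_per_leaf": downlinks_per_leaf,
        "spines": spines,
        "max_leaves": max_leaves,
        "max_endpoints": max_leaves * downlinks_per_leaf,
    }


def fits_two_tier(gpu_nodes: int, nics_per_node: int, switch_radix: int) -> bool:
    """True if every fabric NIC can get a dedicated, non-oversubscribed port."""
    needed_ports = gpu_nodes * nics_per_node
    return needed_ports <= two_tier_nonblocking_capacity(switch_radix)["max_endpoints"]


capacity = two_tier_nonblocking_capacity(64)   # hypothetical 64-port switches
print(capacity)
print("128 nodes x 8 NICs fits:", fits_two_tier(128, 8, 64))
```

Designs that exceed the two-tier budget need a third switching tier or higher-radix switches, which adds hops and makes congestion behavior harder to reason about.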
Decision Criteria Framework: Matching Fabric to Your AI Infrastructure
Use this framework to evaluate which fabric best fits your private AI deployment requirements.
| Criterion | Favors InfiniBand | Favors Ethernet | Notes |
|---|---|---|---|
| Cluster size (GPU nodes) | >128 nodes, hyperscale training | Up to ~128 nodes for most workloads | Larger clusters amplify latency and congestion management differences |
| Workload type | Large-scale distributed training (all-reduce, all-to-all) | Inference, RAG, fine-tuning, mixed AI+enterprise | Inference workloads are less sensitive to fabric latency |
| Existing network team skills | Dedicated HPC/networking team available | Enterprise network team with Ethernet experience | Training staff in InfiniBand operations is a real cost |
| Multi-tenancy requirement | Dedicated AI cluster, single purpose | Shared fabric for AI, storage, management, enterprise | Ethernet supports VRF, EVPN-VXLAN multi-tenancy natively |
| Vendor flexibility | Accept single-vendor or limited vendor set | Require multi-vendor NOS, open switching, supply chain diversity | SONiC and open Ethernet allow NOS portability |
| Budget and TCO | Capex budget for dedicated HPC fabric | Opex-sensitive, want to extend existing Ethernet investments | A separate InfiniBand domain adds a second set of cabling, optics, and management tooling |
| Operational tooling | UFM or vendor-specific fabric manager acceptable | Standard telemetry: gNMI, SNMP, streaming telemetry, sFlow | Ethernet tooling aligns with broader data center observability stacks |
| Future-proofing | Commitment to InfiniBand roadmap (NDR, XDR) | 400G/800G Ethernet roadmap, silicon photonics convergence | Ethernet silicon photonics is a convergence vector to watch |
Scoring approach: Assign each criterion a weight (1-5) based on your organization’s priorities. Score each fabric 1-5 per criterion. Multiply each score by its criterion weight and sum the results per fabric. The higher total indicates the stronger fit. Document your scoring and assumptions for stakeholder review.
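The scoring approach can be worked through in a few lines. The criterion keys mirror the table above, but every weight and score below is a hypothetical placeholder to be replaced with your own assessment; the numbers are not a recommendation for either fabric.

```python
def weighted_score(weights: dict, scores: dict) -> int:
    """Sum of weight x score over all criteria, each on a 1-5 scale."""
    if weights.keys() != scores.keys():
        raise ValueError("weights and scores must cover the same criteria")
    for name in weights:
        if not 1 <= weights[name] <= 5 or not 1 <= scores[name] <= 5:
            raise ValueError(f"weight and score for {name!r} must be between 1 and 5")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical example values only.
weights = {
    "cluster_size": 3, "workload_type": 5, "team_skills": 4, "multi_tenancy": 4,
    "vendor_flexibility": 3, "budget_tco": 5, "operational_tooling": 3, "future_proofing": 2,
}
infiniband = {
    "cluster_size": 4, "workload_type": 3, "team_skills": 2, "multi_tenancy": 2,
    "vendor_flexibility": 2, "budget_tco": 2, "operational_tooling": 3, "future_proofing": 3,
}
ethernet = {
    "cluster_size": 3, "workload_type": 4, "team_skills": 5, "multi_tenancy": 5,
    "vendor_flexibility": 5, "budget_tco": 4, "operational_tooling": 4, "future_proofing": 4,
}

totals = {
    "InfiniBand": weighted_score(weights, infiniband),
    "Ethernet": weighted_score(weights, ethernet),
}
for fabric, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{fabric}: {total}")
```

Keeping the weights in one place makes it easy to rerun the comparison when stakeholders disagree about priorities, which is often more informative than the totals themselves.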
Checklist: Pre-Deployment Fabric Evaluation for Private AI
Use this checklist before committing to a fabric technology or vendor.
Business and procurement
- Documented AI workload requirements (training vs inference, cluster size, model scale)
- TCO model comparing InfiniBand and Ethernet over 3-5 years, including cabling, optics, switches, NICs, support, and operational staff time
- Vendor and distributor shortlist confirmed for Australian market (with lead times)
- Board or steering committee approval for dedicated AI fabric vs shared Ethernet investment
Technical architecture
- Network topology designed (leaf-spine recommended for both fabrics)
- For Ethernet: RoCE v2, PFC, ECN, DCBX, and congestion notification configuration documented
- For Ethernet: QoS policy defining RDMA traffic class, lossless priority, and best-effort traffic
- For InfiniBand: Subnet manager HA plan, routing algorithm selection, partition key design
- Cable and optics plan for required speeds (100G/200G/400G/800G) with bill of materials
- NIC selection confirmed (for example, NVIDIA ConnectX for either InfiniBand or RoCE v2) and firmware compatibility verified
- GPU server NIC-to-switch port mapping and ECMP/adaptive routing plan
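The Ethernet QoS items in this checklist can be captured as data and sanity-checked before anything is pushed to switches. The sketch below is a hypothetical policy model, not a SONiC or vendor configuration schema: the `TrafficClass` fields, the rules in `validate_qos_policy`, and the use of priority 3 for RDMA are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficClass:
    name: str
    priority: int   # 802.1p priority, 0-7
    lossless: bool  # PFC enabled for this priority
    ecn: bool       # ECN marking enabled on the queue


def validate_qos_policy(classes: list) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    priorities = [c.priority for c in classes]
    if len(set(priorities)) != len(priorities):
        problems.append("two traffic classes share a priority")
    for c in classes:
        if not 0 <= c.priority <= 7:
            problems.append(f"{c.name}: priority {c.priority} is outside 0-7")
        if c.lossless and not c.ecn:
            problems.append(f"{c.name}: lossless class without ECN relies on PFC alone")
    if not any(c.lossless for c in classes):
        problems.append("no lossless class defined for RDMA traffic")
    if all(c.lossless for c in classes):
        problems.append("no best-effort class defined")
    return problems


policy = [
    TrafficClass("rdma", priority=3, lossless=True, ecn=True),
    TrafficClass("management", priority=6, lossless=False, ecn=False),
    TrafficClass("best-effort", priority=0, lossless=False, ecn=False),
]
print(validate_qos_policy(policy) or "policy looks consistent")
```

A check like this fits naturally in a CI step that runs before configuration is rendered, so documentation and deployed policy are validated from the same source.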
Operations and monitoring
- Telemetry pipeline defined (streaming telemetry, gNMI, sFlow, SNMP trap routing)
- Fabric health dashboard designed (link utilization, PFC pause frames, ECN marks, packet drops, RDMA retry counters)
- Fault domain isolation plan (blast radius per leaf, per rack, per spine)
- Runbook for fabric failure scenarios (link failure, switch failure, congestion event)
- For Ethernet with SONiC: NOS upgrade and rollback procedure documented
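The dashboard counters listed above are cumulative, so alerting usually works on rates between two polls. The following sketch shows that calculation under stated assumptions: the counter names and thresholds are hypothetical, and counter wraparound and device resets are not handled.

```python
def counter_rates(previous: dict, current: dict, interval_s: float) -> dict:
    """Per-second rate for every counter present in both snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (current[name] - previous[name]) / interval_s
        for name in current
        if name in previous
    }


def health_alerts(rates: dict, thresholds: dict) -> list:
    """Names of counters whose rate exceeds its threshold, sorted for stable output."""
    return sorted(name for name, limit in thresholds.items()
                  if rates.get(name, 0.0) > limit)


# Hypothetical snapshots taken 60 seconds apart on one switch port.
previous = {"pfc_pause_rx": 100, "ecn_marked": 1000, "packet_drops": 0, "rdma_retries": 4}
current = {"pfc_pause_rx": 700, "ecn_marked": 1600, "packet_drops": 0, "rdma_retries": 4}

# Hypothetical per-second limits; tune these against your own baseline.
thresholds = {"pfc_pause_rx": 5.0, "ecn_marked": 50.0, "packet_drops": 0.0, "rdma_retries": 0.0}

rates = counter_rates(previous, current, interval_s=60)
alerts = health_alerts(rates, thresholds)
print(rates)
print("alerts:", alerts)
```

Alerting on a sustained pause-frame rate while ECN marks stay low is one way to spot a fabric where PFC is doing the work ECN was meant to do.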