Why INT Telemetry Matters for AI Data Center Switches
AI and machine learning training clusters place extreme demands on the data center network. GPU-to-GPU communication — typically over RoCE v2 (RDMA over Converged Ethernet) — is highly latency-sensitive and congestion-intolerant. When a single flow stalls or a link degrades, training jobs can slow by orders of magnitude or fail entirely.
Traditional SNMP polling and sFlow/NetFlow sampling provide coarse-grained, after-the-fact visibility. They cannot tell you which hop introduced 10 microseconds of added latency or which port experienced micro-burst congestion during a collective AllReduce operation. This is where In-band Network Telemetry (INT) changes the equation.
INT is a data plane telemetry framework originally defined in the P4/INT specification (p4.org). It embeds telemetry metadata — switch ID, ingress/egress port, queue depth, latency, and timestamp — directly into the packet header as the packet traverses each INT-capable switch hop. The destination host or a dedicated collector extracts and reports this per-hop data, giving operators real-time, hop-by-hop path visibility.
For Australian enterprises deploying private AI infrastructure, whether on-premises GPU clusters or colocated AI racks, INT telemetry on SONiC-based open switches removes the need to buy into a proprietary vendor’s closed visibility stack. It also fits naturally with SONiC’s container-based, modular architecture.
This playbook covers the full deployment lifecycle: understanding INT architecture, verifying ASIC and platform support, planning collector infrastructure, deploying INT on a SONiC spine-leaf fabric, and integrating telemetry data into operational workflows.
INT Architecture Fundamentals
In-band Network Telemetry operates by inserting a shim header and INT instruction stack into packets as they traverse the network. Each INT-capable switch along the path appends requested metadata fields based on the instruction bitmap carried in the packet.
INT Header Structure
The INT header consists of three key components:
- INT Shim Header: Identifies the packet as an INT packet and specifies the INT type (e.g., hop-by-hop, destination).
- INT Metadata Header: Contains the instruction bitmap that tells each switch what data to collect — switch ID, ingress port ID, egress port ID, hop latency, queue depth/occupancy, egress timestamp, and ingress timestamp.
- INT Stack: The accumulated per-hop metadata, appended by each switch as the packet transits.
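To make the relationship between the instruction bitmap and the accumulated stack concrete, here is a minimal Python sketch. It is not the INT wire format: the field order in `FIELDS`, the least-significant-bit-first bitmap, and the fixed 4-byte field width are assumptions chosen for illustration, and `decode_stack` is a hypothetical helper name. A real parser would follow the bit assignments of the INT specification version your switches implement.

```python
import struct

# Hypothetical field order for this sketch only; real bit positions come
# from the INT specification version in use.
FIELDS = ("switch_id", "ingress_port", "egress_port", "hop_latency_ns", "queue_depth")
WORD = 4  # assumption: every metadata field is one 4-byte big-endian word


def decode_stack(bitmap: int, stack: bytes) -> list[dict[str, int]]:
    """Split an accumulated metadata stack into one dict per hop.

    Bit i of `bitmap` (least significant bit first) enables FIELDS[i].
    """
    enabled = [name for i, name in enumerate(FIELDS) if bitmap & (1 << i)]
    hop_len = len(enabled) * WORD
    if hop_len == 0 or len(stack) % hop_len:
        raise ValueError("stack length does not match the instruction bitmap")
    hops = []
    for off in range(0, len(stack), hop_len):
        values = struct.unpack(f">{len(enabled)}I", stack[off:off + hop_len])
        hops.append(dict(zip(enabled, values)))
    return hops


# Two hops, each reporting switch_id (bit 0) and hop_latency_ns (bit 3).
bitmap = 0b01001
stack = struct.pack(">4I", 12, 850, 3, 1200)
hops = decode_stack(bitmap, stack)
```

Because every hop answers the same bitmap, each hop contributes the same number of bytes, which is why the stack length must be a whole multiple of the per-hop length.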
INT Operating Modes
| Mode | Description | Best For |
|---|---|---|
| Hop-by-hop (HBH) | Every intermediate switch appends metadata | Full path visibility in spine-leaf topologies |
| Destination only | Only the egress leaf extracts and reports metadata | Reduced overhead, endpoints handle reporting |
| Source-to-destination | Source inserts INT request, destination removes and reports | End-to-end path tracing for specific flows |
Key Data Fields Collected Per Hop
- Switch ID: Identifies which switch in the fabric processed the packet
- Ingress/Egress Port ID: Which physical port received and forwarded the packet
- Hop Latency: Time spent at this switch (typically in nanoseconds)
- Queue Occupancy/Depth: Buffer utilization at egress — critical for detecting micro-burst congestion
- Egress Transmit Utilization: Link utilization percentage at the time of transit
- Timestamps: Ingress and egress timestamps for precise latency computation
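As a rough illustration of how a collector might use these fields, the sketch below models one hop as a record and picks the hop that added the most latency on a path. The `HopRecord` class, its field names, and the sample values are hypothetical. Queue depth units vary by platform, so the sketch treats them as opaque integers.

```python
from dataclasses import dataclass


@dataclass
class HopRecord:
    switch_id: int
    egress_port: int
    ingress_ts_ns: int
    egress_ts_ns: int
    queue_depth: int  # platform-specific units (cells, bytes, ...)

    @property
    def hop_latency_ns(self) -> int:
        # Both timestamps come from the same switch, so the difference
        # does not depend on clock synchronization between switches.
        return self.egress_ts_ns - self.ingress_ts_ns


def worst_hop(path: list[HopRecord]) -> HopRecord:
    """Return the hop that contributed the most latency on this path."""
    return max(path, key=lambda h: h.hop_latency_ns)


# Illustrative leaf -> spine -> leaf path with a congested spine.
path = [
    HopRecord(switch_id=101, egress_port=49, ingress_ts_ns=1_000, egress_ts_ns=1_600, queue_depth=4),
    HopRecord(switch_id=201, egress_port=7, ingress_ts_ns=2_100, egress_ts_ns=13_100, queue_depth=310),
    HopRecord(switch_id=102, egress_port=12, ingress_ts_ns=13_700, egress_ts_ns=14_400, queue_depth=6),
]
slow = worst_hop(path)
print(f"switch {slow.switch_id} port {slow.egress_port}: "
      f"{slow.hop_latency_ns} ns, queue depth {slow.queue_depth}")
```

In this made-up path the spine (switch 201) stands out on both latency and queue depth, which is the kind of correlation the per-hop fields are meant to expose.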
INT and RoCE v2 Interaction
For AI workloads using RoCE v2, INT can be applied selectively to RDMA traffic. This is important because:
- RoCE v2 uses UDP encapsulation, and INT headers can be inserted between the outer headers and the payload
- PFC (Priority Flow Control) pauses and ECN-marked packets can be correlated with INT queue depth data
- Collective operations (AllReduce, AllGather) benefit from per-hop visibility into where tail latency accumulates
INT traffic itself is regular Ethernet/IP traffic — it does not require special control plane protocols. The telemetry overhead per packet is typically 8 to 16 bytes per hop for basic fields, scaling with the number of metadata instructions enabled.
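The per-hop figure translates into a per-packet cost as follows. This is a back-of-the-envelope sketch: the function names are hypothetical, the 4096-byte packet size is an assumed example, and `fixed_header_bytes` defaults to zero because the size of the shim and metadata headers is not stated here and depends on the INT version and encapsulation.

```python
def int_overhead_bytes(hops: int, bytes_per_hop: int, fixed_header_bytes: int = 0) -> int:
    """Total INT bytes added to one packet by the time it reaches the last hop."""
    return fixed_header_bytes + hops * bytes_per_hop


def overhead_fraction(packet_bytes: int, hops: int, bytes_per_hop: int,
                      fixed_header_bytes: int = 0) -> float:
    """Share of the on-wire packet that is INT metadata."""
    extra = int_overhead_bytes(hops, bytes_per_hop, fixed_header_bytes)
    return extra / (packet_bytes + extra)


# Leaf -> spine -> leaf is three hops; use the upper figure of 16 bytes per hop.
extra = int_overhead_bytes(hops=3, bytes_per_hop=16)
share = overhead_fraction(packet_bytes=4096, hops=3, bytes_per_hop=16)
print(f"{extra} bytes added, {share:.2%} of the packet on the wire")
```

The same arithmetic is worth running against the fabric MTU: the packet grows at every hop, so the MTU needs enough headroom for the largest payload plus the full accumulated stack.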
SONiC INT Support: What You Need to Know
SONiC (Software for Open Networking in the Cloud) is an open-source network operating system that runs on switches from multiple hardware vendors and ASIC families. It is a Linux Foundation project with broad industry backing, including contributions from major cloud providers and network silicon vendors.
SONiC Architecture Relevant to INT
SONiC’s container-based architecture means that each network function runs in its own Docker container. For INT telemetry, the relevant components include:
- SWSS (Switch State Service): Translates INT configuration into ASIC-specific SAI (Switch Abstraction Interface) calls
- Syncd: Communicates with the ASIC SDK via SAI to program INT instructions
- Telemetry Agent: Collects and exports INT data from the switch (gNMI/gRPC-based streaming telemetry)
- SAI INT Extensions: SAI API extensions for INT that define how INT instructions are programmed into the forwarding pipeline
ASIC Requirements
Not all network ASICs support INT. For SONiC deployments, the ASIC must support P4-compatible INT metadata insertion. The following table summarizes known INT-capable ASIC families relevant to SONiC:
| ASIC Vendor | Family | INT Support | Max Port Speed | Notes |
|---|---|---|---|---|
| Memory-based pipeline (Memory/Memory) | Memory-based pipeline (Memory/Memory) | Yes | Up to 800G | Memory-based pipeline (Memory/Memory) ASICs are the primary INT-capable silicon for SONiC |
| Memory-based pipeline (Memory/Memory) | Memory-based pipeline (Memory/Memory) | Via P4 SAI | Up to 100G | Memory-based pipeline (Memory/Memory) — confirmed INT support |
SONiC Version Requirements
INT telemetry support in SONiC depends on the specific SONiC distribution and version:
- Community SONiC: INT support is available via PINS (P4 Integrated Network Stack) and SAI INT extensions, but maturity varies by branch
Configuration Interfaces
SONiC supports multiple configuration interfaces for INT:
- config_db.json: Direct JSON configuration of INT session parameters
- SONiC CLI: Some Enterprise SONiC distributions provide CLI commands for INT
- RESTCONF/NETCONF: Programmable configuration via YANG models (for integration with AIDC Controller)
- gNMI: For streaming telemetry data export
Pre-Deployment Decision Criteria
Before deploying INT telemetry on your xSONiC data center switches, evaluate the following criteria:
Decision Matrix: Is INT Right for Your AI Fabric?
| Criterion | Requirement | Your Environment |
|---|---|---|
When INT Is Essential vs. Optional
INT is essential when:
- Your AI training jobs are sensitive to tail latency (P99 > 2x P50)
- You experience intermittent training slowdowns you cannot diagnose with SNMP/sFlow
- You need to validate that your RoCE v2 fabric meets specific latency SLAs (e.g., < 2 microseconds per hop)
- You are deploying a new GPU backend fabric and want built-in observability from day one
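The tail-latency criterion above (P99 more than twice P50) can be checked against any set of latency measurements you already have. The sketch below uses only the Python standard library. The helper name `tail_ratio` and the sample data are illustrative, and the 2x threshold is simply the rule of thumb from the list.

```python
import statistics


def tail_ratio(latencies_us: list[float]) -> float:
    """Return P99 / P50 for a list of latency samples."""
    # quantiles(n=100) returns the 99 percentile cut points P1..P99.
    cuts = statistics.quantiles(latencies_us, n=100, method="inclusive")
    p50, p99 = cuts[49], cuts[98]
    return p99 / p50


# Illustrative data: most transfers take 10 us, a few stragglers take 50 us.
samples = [10.0] * 98 + [50.0] * 2
ratio = tail_ratio(samples)
print(f"P99/P50 = {ratio:.1f}, tail-sensitive: {ratio > 2.0}")
```

A ratio well above 2 suggests that a small share of transfers is hitting congestion somewhere on the path, which is the situation where per-hop data is most useful.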
INT is optional when:
- Your AI cluster is small (fewer than 8 nodes) and traffic patterns are well understood
- You already have deep packet capture infrastructure in place at key aggregation points
- Your ASIC platform does not support INT (in this case, consider sFlow with extended metadata or eBPF-based alternatives)
Alternative Visibility Methods Comparison
| Method | Granularity | Overhead | Hop-by-Hop | AI Fabric Suitability |
|---|---|---|---|---|
| INT (In-band Network Telemetry) | Per-packet, per-hop | Low (8-16B/hop) | Yes | Best — purpose-built for data plane visibility |
| sFlow with extensions | Sampled (1-in-N) | Medium | No (egress only) | Good for aggregate traffic analysis, poor for micro-burst detection |
| NetFlow/IPFIX | Sampled or all flows | Medium-High | No | Flow-level only, no per-hop latency |
| SNMP polling | Counter-based | Low | No | Too coarse for AI fabric troubleshooting |
| Mirror/port capture | Full packet | Very high | No (single point) | Useful for deep debugging, not scalable |
| eBPF/XDP telemetry | Custom | Variable | Depends on implementation | Emerging option, requires custom development |
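One way to see why 1-in-N sampling struggles with micro-bursts is to estimate how often a short burst contributes even a single sample. The sketch below assumes each packet is sampled independently with probability 1/N. The burst size and sampling rate are example values, not measurements, and `p_burst_sampled` is a hypothetical helper name.

```python
def p_burst_sampled(burst_packets: int, sampling_rate_n: int) -> float:
    """Probability that 1-in-N random sampling picks at least one packet of a burst."""
    return 1.0 - (1.0 - 1.0 / sampling_rate_n) ** burst_packets


# Example: a 200-packet burst observed with 1-in-4096 sampling.
p = p_burst_sampled(burst_packets=200, sampling_rate_n=4096)
print(f"chance the burst is sampled at all: {p:.1%}")
```

Under these assumptions the burst is sampled less than one time in ten, and a sampled packet still carries no per-hop queue depth. With INT, every instrumented packet in the burst reports the queue state it saw.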