INT Telemetry and Path Visibility for AI Data Center Switches: An xSONIC Deployment Playbook

A deep practical guide for Australian enterprise and data center buyers deploying In-band Network Telemetry (INT) on SONiC-based switches to achieve hop-by-hop path visibility in AI/ML GPU clusters. Covers INT

By xSONiC Team · SONiC, open networking, data center, AI fabric, Ethernet, automation

Why INT Telemetry Matters for AI Data Center Switches

AI and machine learning training clusters place extreme demands on the data center network. GPU-to-GPU communication — typically over RoCE v2 (RDMA over Converged Ethernet) — is highly latency-sensitive and congestion-intolerant. When a single flow stalls or a link degrades, training jobs can slow by orders of magnitude or fail entirely.

Traditional SNMP polling and sFlow/NetFlow sampling provide coarse-grained, after-the-fact visibility. They cannot tell you which hop introduced 10 microseconds of added latency or which port experienced micro-burst congestion during a collective AllReduce operation. This is where In-band Network Telemetry (INT) changes the equation.

INT is a data plane telemetry framework originally defined in the P4/INT specification (p4.org). It embeds telemetry metadata — switch ID, ingress/egress port, queue depth, latency, and timestamp — directly into the packet header as the packet traverses each INT-capable switch hop. The destination host or a dedicated collector extracts and reports this per-hop data, giving operators real-time, hop-by-hop path visibility.

For Australian enterprises deploying private AI infrastructure, whether on-premises GPU clusters or colocated AI racks, INT telemetry on SONiC-based open switches eliminates the need to buy into a proprietary vendor’s closed visibility stack. It also integrates naturally with SONiC’s container-based, modular architecture.

This playbook covers the full deployment lifecycle: understanding INT architecture, verifying ASIC and platform support, planning collector infrastructure, deploying INT on a SONiC spine-leaf fabric, and integrating telemetry data into operational workflows.

INT Architecture Fundamentals

In-band Network Telemetry operates by inserting a shim header and INT instruction stack into packets as they traverse the network. Each INT-capable switch along the path appends requested metadata fields based on the instruction bitmap carried in the packet.

INT Header Structure

The INT header consists of three key components:

  • INT Shim Header: Identifies the packet as an INT packet and specifies the INT type (e.g., hop-by-hop, destination).
  • INT Metadata Header: Contains the instruction bitmap that tells each switch what data to collect — switch ID, ingress port ID, egress port ID, hop latency, queue depth/occupancy, egress timestamp, and ingress timestamp.
  • INT Stack: The accumulated per-hop metadata, appended by each switch as the packet transits.
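
To make the relationship between the instruction bitmap and the INT stack concrete, here is a minimal Python sketch. The bit positions, field names, and the `append_hop` helper are illustrative assumptions, not the wire layout from the INT specification or any SONiC API; the point is only that each hop appends the fields the bitmap asks for and nothing else.

```python
from enum import IntFlag


class IntInstruction(IntFlag):
    """Illustrative instruction bits (assumed positions, not the spec's layout)."""
    SWITCH_ID = 1 << 0
    INGRESS_PORT_ID = 1 << 1
    EGRESS_PORT_ID = 1 << 2
    HOP_LATENCY = 1 << 3
    QUEUE_OCCUPANCY = 1 << 4


def requested_fields(bitmap):
    """Return the lower-case names of the fields requested by the bitmap."""
    return [flag.name.lower() for flag in IntInstruction if bitmap & flag]


def append_hop(stack, bitmap, local_state):
    """Return a new INT stack with this hop's requested fields appended.

    local_state is a hypothetical dict of everything the switch could report.
    """
    entry = {name: local_state[name] for name in requested_fields(bitmap)}
    return stack + [entry]


# Example: the source asks only for switch ID and hop latency.
bitmap = IntInstruction.SWITCH_ID | IntInstruction.HOP_LATENCY
leaf = {"switch_id": 101, "ingress_port_id": 1, "egress_port_id": 49,
        "hop_latency": 850, "queue_occupancy": 12}
spine = {"switch_id": 201, "ingress_port_id": 3, "egress_port_id": 7,
         "hop_latency": 2300, "queue_occupancy": 96}
stack = append_hop(append_hop([], bitmap, leaf), bitmap, spine)
```

After two hops, `stack` holds one entry per switch, each containing only `switch_id` and `hop_latency`. Enabling more instruction bits grows every entry, which is why the per-hop overhead scales with the bitmap.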

INT Operating Modes

Mode | Description | Best For
Hop-by-hop (HBH) | Every intermediate switch appends metadata | Full path visibility in spine-leaf topologies
Destination only | Only the egress leaf extracts and reports metadata | Reduced overhead, endpoints handle reporting
Source-to-destination | Source inserts INT request, destination removes and reports | End-to-end path tracing for specific flows

Key Data Fields Collected Per Hop

  • Switch ID: Identifies which switch in the fabric processed the packet
  • Ingress/Egress Port ID: Which physical port received and forwarded the packet
  • Hop Latency: Time spent at this switch (typically in nanoseconds)
  • Queue Occupancy/Depth: Buffer utilization at egress — critical for detecting micro-burst congestion
  • Egress Transmit Utilization: Link utilization percentage at the time of transit
  • Timestamps: Ingress and egress timestamps for precise latency computation
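
Once a collector has decoded these fields, the most common first question is "which hop was slow?". The sketch below shows one way to model a per-hop record and answer that question. The `HopRecord` type, its field names, and the sample values are hypothetical, and queue occupancy units are assumed to be platform-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopRecord:
    """One hop's worth of decoded INT metadata (hypothetical collector model)."""
    switch_id: int
    ingress_port: int
    egress_port: int
    hop_latency_ns: int
    queue_occupancy: int  # units vary by platform (cells or bytes)


def path_latency_ns(hops):
    """Total switch-resident latency along the path, in nanoseconds."""
    return sum(h.hop_latency_ns for h in hops)


def worst_hop(hops):
    """Return the hop that contributed the most latency."""
    return max(hops, key=lambda h: h.hop_latency_ns)


# Made-up leaf -> spine -> leaf path for illustration.
path = [
    HopRecord(101, 1, 49, 800, 10),
    HopRecord(201, 3, 7, 2400, 180),
    HopRecord(102, 50, 2, 900, 14),
]
total = path_latency_ns(path)
slowest = worst_hop(path)
```

In this made-up path the spine (switch 201) accounts for most of the latency and also shows the deepest queue, which is the kind of correlation INT is meant to expose.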

INT and RoCE v2 Interaction

For AI workloads using RoCE v2, INT can be applied selectively to RDMA traffic. This is important because:

  • RoCE v2 uses UDP encapsulation, and INT headers can be inserted between the outer headers and the payload
  • PFC (Priority Flow Control) pauses and ECN-marked packets can be correlated with INT queue depth data
  • Collective operations (AllReduce, AllGather) benefit from per-hop visibility into where tail latency accumulates

INT traffic itself is regular Ethernet/IP traffic — it does not require special control plane protocols. The telemetry overhead per packet is typically 8 to 16 bytes per hop for basic fields, scaling with the number of metadata instructions enabled.
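
The overhead arithmetic is simple enough to check before deployment. The sketch below assumes a hypothetical three-hop leaf-spine-leaf path and a 4096-byte RDMA payload; the function names are illustrative, and the fixed shim and metadata header size is left as a parameter because it depends on the INT version in use.

```python
def int_overhead_bytes(hops, bytes_per_hop, fixed_header_bytes=0):
    """Bytes of INT data carried by a packet after traversing `hops` switches."""
    return fixed_header_bytes + hops * bytes_per_hop


def overhead_fraction(payload_bytes, hops, bytes_per_hop, fixed_header_bytes=0):
    """INT overhead as a fraction of the payload size."""
    return int_overhead_bytes(hops, bytes_per_hop, fixed_header_bytes) / payload_bytes


# Three hops at the upper end of the 8 to 16 byte range.
per_packet = int_overhead_bytes(hops=3, bytes_per_hop=16)
fraction = overhead_fraction(payload_bytes=4096, hops=3, bytes_per_hop=16)
```

Under these assumptions the stack adds 48 bytes, a little over 1% of a 4096-byte payload. The same calculation also tells you how much MTU headroom to reserve so that INT-instrumented packets are not dropped or fragmented.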

SONiC INT Support: What You Need to Know

SONiC (Software for Open Networking in the Cloud) is an open-source network operating system that runs on switches from multiple hardware vendors and ASIC families. It is a Linux Foundation project with broad industry backing, including contributions from major cloud providers and network silicon vendors.

SONiC Architecture Relevant to INT

SONiC’s container-based architecture means that each network function runs in its own Docker container. For INT telemetry, the relevant components include:

  • SWSS (Switch State Service): Translates INT configuration into ASIC-specific SAI (Switch Abstraction Interface) calls
  • Syncd: Communicates with the ASIC SDK via SAI to program INT instructions
  • Telemetry Agent: Collects and exports INT data from the switch (gNMI/gRPC-based streaming telemetry)
  • SAI INT Extensions: SAI API extensions for INT that define how INT instructions are programmed into the forwarding pipeline

ASIC Requirements

Not all network ASICs support INT. For SONiC deployments, the ASIC must support P4-compatible INT metadata insertion. The following table summarizes known INT-capable ASIC families relevant to SONiC:

ASIC Vendor | Family | INT Support | Max Port Speed | Notes
Memory-based pipeline (Memory/Memory) | Memory-based pipeline (Memory/Memory) | Yes | Up to 800G | Memory-based pipeline (Memory/Memory) ASICs are the primary INT-capable silicon for SONiC
Memory-based pipeline (Memory/Memory) | Memory-based pipeline (Memory/Memory) | Via P4 SAI | Up to 100G | Memory-based pipeline (Memory/Memory), confirmed INT support

SONiC Version Requirements

INT telemetry support in SONiC depends on the specific SONiC distribution and version:

  • Community SONiC: INT support is available via PINS (P4 Integrated Network Stack) and SAI INT extensions, but maturity varies by branch

Configuration Interfaces

SONiC supports multiple configuration interfaces for INT:

  • config_db.json: Direct JSON configuration of INT session parameters
  • SONiC CLI: Some Enterprise SONiC distributions provide CLI commands for INT
  • RESTCONF/NETCONF: Programmable configuration via YANG models (for integration with AIDC Controller)
  • gNMI: For streaming telemetry data export

Pre-Deployment Decision Criteria

Before deploying INT telemetry on your xSONIC data center switches, evaluate the following decision criteria:

Decision Matrix: Is INT Right for Your AI Fabric?

Criterion | Requirement | Your Environment

When INT Is Essential vs. Optional

INT is essential when:

  • Your AI training jobs are sensitive to tail latency (P99 > 2x P50)
  • You experience intermittent training slowdowns you cannot diagnose with SNMP/sFlow
  • You need to validate that your RoCE v2 fabric meets specific latency SLAs (e.g., < 2 microseconds per hop)
  • You are deploying a new GPU backend fabric and want built-in observability from day one
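
The tail-latency criterion above (P99 greater than 2x P50) can be checked against whatever latency samples you already collect. This is a minimal sketch using a nearest-rank percentile; the function names and the sample data are hypothetical, and the 2.0 threshold is simply the ratio quoted in the list.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def is_tail_sensitive(samples, ratio_threshold=2.0):
    """True when P99 exceeds ratio_threshold times P50."""
    return percentile(samples, 99) > ratio_threshold * percentile(samples, 50)


# Made-up latencies in microseconds: mostly 10, with two slow outliers.
latencies_us = [10] * 98 + [50, 60]
needs_int = is_tail_sensitive(latencies_us)
```

A steady workload where every sample sits near the median would fail this test, which suggests coarser tools may be enough; a workload with a long tail, like the one above, is where per-hop data pays off.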

INT is optional when:

  • Your AI cluster is small (fewer than 8 nodes) and traffic patterns are well understood
  • You already have deep packet capture infrastructure in place at key aggregation points
  • Your ASIC platform does not support INT (in this case, consider sFlow with extended metadata or eBPF-based alternatives)

Alternative Visibility Methods Comparison

Method | Granularity | Overhead | Hop-by-Hop | AI Fabric Suitability
INT (In-band Network Telemetry) | Per-packet, per-hop | Low (8-16B/hop) | Yes | Best: purpose-built for data plane visibility
sFlow with extensions | Sampled (1-in-N) | Medium | No (egress only) | Good for aggregate traffic analysis, poor for micro-burst detection
NetFlow/IPFIX | Sampled or all flows | Medium-High | No | Flow-level only, no per-hop latency
SNMP polling | Counter-based | Low | No | Too coarse for AI fabric troubleshooting
Mirror/port capture | Full packet | Very high | No (single point) | Useful for deep debugging, not scalable
eBPF/XDP telemetry | Custom | Variable | Depends on implementation | Emerging option, requires custom development
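
To see why 1-in-N sampling struggles with micro-bursts, consider the chance that a sampler sees even one packet of a short burst. The sketch below assumes each packet is sampled independently with probability 1/N, which is a simplification of how real sFlow agents behave; the sampling rate and burst size are illustrative.

```python
def capture_probability(sampling_rate_n, burst_packets):
    """Probability that 1-in-N sampling captures at least one packet of a burst."""
    miss_one = 1.0 - 1.0 / sampling_rate_n
    return 1.0 - miss_one ** burst_packets


# A 200-packet micro-burst under 1-in-4096 sampling.
p_seen = capture_probability(sampling_rate_n=4096, burst_packets=200)
```

Under these assumptions the burst is sampled just under 5% of the time, and even a sampled packet carries no per-hop queue depth. INT avoids the problem because every instrumented packet records the state of each hop it crosses.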

Sources Reviewed