Data Center Solution

IPTPath Telemetry Guide

Follow traffic quality hop by hop through the fabric.


Overview

IPTPath telemetry is a path-oriented In-band Network Telemetry (INT) technique for monitoring traffic quality across an AI or data center fabric. It follows selected traffic through ingress, transit, and egress nodes, then exports per-hop path information to a collector for analysis.

The goal is not just to know that a fabric is busy. The goal is to know which path, node, queue, or link is creating latency, loss risk, or unstable behavior.

IPT Compared With Other INT Modes

Dimension | BDC | HDC | IPTPath
Main trigger | Drop or buffer overflow. | Forwarding delay threshold. | Path telemetry selection or sampling.
Main signal | Queue occupancy and drop context. | Delay and queue context. | Per-hop path quality and queue/delay metrics.
Best use | Loss diagnosis. | Latency diagnosis. | End-to-end path monitoring and localization.
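
The trigger row of the comparison can be read as three independent conditions. The following is a minimal sketch in plain Python, not an xSONiC API: the function name, its parameters, and the choice of "greater than or equal" for the delay threshold are all assumptions made for illustration.

```python
def triggered_modes(dropped, delay_us, delay_threshold_us, sampled):
    """Return the INT modes whose trigger condition a packet event meets.

    Hypothetical model of the comparison table:
      BDC     fires on a drop or buffer overflow,
      HDC     fires when forwarding delay reaches a threshold,
      IPTPath fires when the flow is selected or sampled.
    """
    modes = []
    if dropped:
        modes.append("BDC")
    if delay_us >= delay_threshold_us:  # assumed comparison; real devices may differ
        modes.append("HDC")
    if sampled:
        modes.append("IPTPath")
    return modes


# A sampled packet that was forwarded slowly but not dropped.
print(triggered_modes(dropped=False, delay_us=80.0,
                      delay_threshold_us=50.0, sampled=True))
```

The point of the sketch is that IPTPath does not wait for a fault: it reports on selected traffic whether or not a drop or delay threshold event occurs.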

Telemetry Domain

An IPTPath deployment defines a telemetry domain. Switches in that domain take different roles depending on where the monitored traffic enters, travels, and exits.

Role | Responsibility | Example
Ingress node | Identifies selected traffic and starts telemetry processing. | Server-facing leaf switch.
Transit node | Adds or updates path observations as traffic crosses the fabric. | Spine or intermediate leaf.
Egress node | Finalizes telemetry information and sends records to collector. | Destination leaf or boundary switch.
Collector | Stores, correlates, and presents telemetry data. | Operations platform or telemetry pipeline.
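
Because roles depend on where traffic enters and exits, the same switch can be an ingress node for one flow and an egress node for another. A minimal sketch of that idea follows; `assign_roles` and the switch names are hypothetical and exist only for illustration.

```python
def assign_roles(path):
    """Map each node on a monitored path to its telemetry role.

    The first node is the ingress, the last is the egress, and every
    node in between is a transit node. Names are illustrative only.
    """
    if len(path) < 2:
        raise ValueError("a monitored path needs at least an ingress and an egress node")
    roles = {node: "transit" for node in path[1:-1]}
    roles[path[0]] = "ingress"
    roles[path[-1]] = "egress"
    return roles


forward = assign_roles(["leaf1", "spine1", "leaf4"])
reverse = assign_roles(["leaf4", "spine1", "leaf1"])

# leaf1 is the ingress node for the forward flow and the egress node for the reverse flow.
print(forward["leaf1"], reverse["leaf1"])
```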

Packet Processing View

Selected production flow
      |
      v
Ingress xSONiC switch marks or samples the flow
      |
      v
Transit switches add path and queue observations
      |
      v
Egress switch exports telemetry record
      |
      v
Collector builds path-quality view
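
The flow above can be modeled as a record that grows by one observation per hop and is handed to the collector at the egress node. This is a sketch under stated assumptions, not the on-wire format: the class names, field names, units, and the list standing in for a collector are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HopObservation:
    node: str         # switch name (hypothetical)
    role: str         # "ingress", "transit", or "egress"
    delay_us: float   # forwarding delay at this hop, assumed microseconds
    queue_depth: int  # queue occupancy at this hop, unit is an assumption


@dataclass
class TelemetryRecord:
    flow_id: str
    hops: list = field(default_factory=list)


def ingress(flow_id, node, delay_us, queue_depth):
    """Start a telemetry record for a selected flow."""
    record = TelemetryRecord(flow_id)
    record.hops.append(HopObservation(node, "ingress", delay_us, queue_depth))
    return record


def transit(record, node, delay_us, queue_depth):
    """Append this hop's observation as the packet crosses the fabric."""
    record.hops.append(HopObservation(node, "transit", delay_us, queue_depth))
    return record


def egress(record, node, delay_us, queue_depth, collector):
    """Finalize the record and export it to the collector."""
    record.hops.append(HopObservation(node, "egress", delay_us, queue_depth))
    collector.append(record)
    return record


collector = []  # stands in for a real telemetry pipeline
rec = ingress("flow-1", "leaf1", 2.0, 10)
rec = transit(rec, "spine1", 15.0, 120)
egress(rec, "leaf4", 3.0, 12, collector)

path = [hop.node for hop in collector[0].hops]
print(path)
```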

What To Monitor

Metric | Why It Matters | Operator Question
Per-hop forwarding delay | Reveals where packets are waiting. | Which switch or queue is causing latency?
Queue depth | Indicates congestion pressure. | Is a lossless class close to pause or overflow?
Path identity | Shows the route traffic actually took. | Did traffic follow the intended ECMP path?
Sampling rate | Controls data volume and visibility. | Are we seeing enough without overwhelming collectors?
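
The first three operator questions can be answered directly from one exported record. The sketch below assumes a record is a list of per-hop dictionaries; the field names, the queue limit of 1000, and the 80% warning margin are illustrative assumptions, not platform defaults.

```python
def worst_hop(hops):
    """Return the hop with the largest forwarding delay."""
    return max(hops, key=lambda hop: hop["delay_us"])


def near_queue_limit(hops, limit, margin=0.8):
    """Return nodes whose queue depth is at or above margin * limit."""
    return [hop["node"] for hop in hops if hop["queue_depth"] >= margin * limit]


def path_identity(hops):
    """Return the ordered tuple of nodes the traffic actually crossed."""
    return tuple(hop["node"] for hop in hops)


# Hypothetical record for one sampled packet.
hops = [
    {"node": "leaf1", "delay_us": 2.0, "queue_depth": 10},
    {"node": "spine2", "delay_us": 48.5, "queue_depth": 900},
    {"node": "leaf4", "delay_us": 3.1, "queue_depth": 40},
]
intended = ("leaf1", "spine1", "leaf4")

slowest = worst_hop(hops)["node"]
pressured = near_queue_limit(hops, limit=1000)
on_intended_path = path_identity(hops) == intended

print(slowest, pressured, on_intended_path)
```

In this made-up record, the same spine is both the slowest hop and the one under queue pressure, and the traffic did not take the intended path, which is the kind of localization the table describes.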

AI Fabric Use Cases

  • Identify slow links or hot queues during large distributed training jobs.
  • Compare path quality before and after ECMP, QoS, or routing changes.
  • Troubleshoot intermittent storage or checkpointing latency.
  • Validate whether backend GPU traffic stays on the intended fabric layer.
  • Correlate queue pressure with application phase timing.
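
For the before-and-after comparison in particular, one simple approach is to summarize end-to-end delay per path and look at the change. The following sketch uses the median as the summary statistic; that choice, the record shape, and the sample values are assumptions made for illustration.

```python
from statistics import median


def path_delay_summary(records):
    """Return the median end-to-end delay per path.

    Each record is a (path, hop_delays) pair, where path is a tuple of
    node names and hop_delays lists the per-hop delays in microseconds.
    """
    by_path = {}
    for path, hop_delays in records:
        by_path.setdefault(path, []).append(sum(hop_delays))
    return {path: median(samples) for path, samples in by_path.items()}


def compare(before, after):
    """Return the change in median delay for paths seen in both windows."""
    b = path_delay_summary(before)
    a = path_delay_summary(after)
    return {path: a[path] - b[path] for path in b.keys() & a.keys()}


p1 = ("leaf1", "spine1", "leaf4")
p2 = ("leaf1", "spine2", "leaf4")

# Hypothetical samples taken before and after a QoS change on spine1.
before = [(p1, [2.0, 40.0, 3.0]), (p1, [2.0, 44.0, 3.0]), (p2, [2.0, 5.0, 3.0])]
after = [(p1, [2.0, 6.0, 3.0]), (p1, [2.0, 8.0, 3.0]), (p2, [2.0, 5.0, 3.0])]

delta = compare(before, after)
print(delta[p1], delta[p2])
```

A negative value means the path became faster after the change. In practice a tail percentile may be more informative than the median for latency-sensitive workloads.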

Deployment Guidance

  1. Define the telemetry domain and select ingress, transit, and egress roles.
  2. Choose monitored traffic classes or flow selectors.
  3. Set sampling and export policy based on collector capacity.
  4. Correlate telemetry records with topology, workload timing, and queue policy.
  5. Review path quality during both steady-state and failure scenarios.
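
Step 3 can be approached as a simple budget calculation. The sketch below assumes one exported record per sampled packet and reserves part of the collector's capacity as headroom; the function name, parameters, and traffic figures are hypothetical and should be replaced with measured values.

```python
import math


def sampling_interval(flow_pps, monitored_flows, collector_rps, headroom=0.5):
    """Return N so that sampling 1-in-N packets stays within the collector budget.

    flow_pps        packets per second per monitored flow (assumed uniform)
    monitored_flows number of flows selected for telemetry
    collector_rps   records per second the collector can ingest
    headroom        fraction of collector capacity to actually use
    """
    budget = collector_rps * headroom
    offered = flow_pps * monitored_flows
    if offered <= budget:
        return 1  # every packet can be reported
    return math.ceil(offered / budget)


# 200 flows at 1 Mpps each against a collector that ingests 500k records/s.
n = sampling_interval(flow_pps=1_000_000, monitored_flows=200, collector_rps=500_000)
print(n)
```

A larger N lowers collector load but reduces the chance of catching short congestion events, so the result is a starting point to tune against observed visibility.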

xSONiC Platform Fit

IPTPath telemetry fits xSONiC AI and data center fabrics where operators need more than device-level health. It is especially useful for large 400G and 800G fabrics where many equal-cost paths exist and path-level visibility is required to localize performance problems.

Related Products

Products commonly paired with this solution.

Use these related platforms as a starting point for sizing, comparison, and follow-up discussion.


XS-DC-64X200-LS-G1

Data Center AI

64-port 200G leaf/spine switch for high-bandwidth storage, compute, and scale-out data center fabrics.

12.8Tbps
19,040Mpps

XS-DC-64X800-AI-G1

Data Center AI

64-port 800G AI fabric switch for large-scale GPU clusters, HPC backbones, and ultra-high-throughput data center networks.

51.2Tbps
42,000Mpps
Next Step

Move from the IPTPath Telemetry Guide into implementation.

Use the related products above to continue comparing platforms, or open a conversation if you need help mapping the solution to your environment.