XS-DC-32X400-SP-G2
Data Center AI
32-port 400G spine/core switch for high-capacity data center fabrics and AI-ready backbones.
- 12.8Tbps
- 5,600Mpps
Data Center Solution
Build predictable lossless Ethernet for RDMA and GPU workloads.
RoCEv2 carries RDMA over routed Ethernet and is widely used in AI, HPC, and high-performance storage networks. It gives applications low-latency remote memory access while preserving the operational flexibility of Ethernet.
RoCEv2 is also unforgiving. Packet loss, congestion, incorrect priority mapping, or inconsistent lossless policy can quickly reduce job performance. An xSONiC RoCEv2 design should therefore treat QoS, routing, telemetry, and failure testing as one system.
| Layer | Design Decision | Validation |
|---|---|---|
| Application | Identify RDMA workloads and traffic phases. | Test all-reduce, storage, and failure behavior. |
| Host NIC | Configure priorities, DSCP, ECN, and PFC expectations. | Confirm NIC counters and congestion response. |
| xSONiC leaf | Map traffic classes to queues and lossless priorities. | Check PFC, ECN, ETS, and DCBX state. |
| Fabric | Provide predictable ECMP, bandwidth, and convergence. | Validate path diversity and failure recovery. |
| Telemetry | Monitor queue depth, drops, pause frames, and latency. | Correlate network state with workload timing. |
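The path-diversity check in the Fabric row can be approximated offline before touching hardware. The sketch below hashes a set of RoCEv2 flows onto uplinks and reports how uneven the result is. It is only an illustration: `zlib.crc32` stands in for the switch ASIC's ECMP hash (which will differ), and the function names, addresses, and port range are hypothetical. The one fixed value is UDP destination port 4791, which RoCEv2 uses, so path entropy comes mostly from the UDP source port.

```python
import zlib
from collections import Counter

ROCEV2_UDP_PORT = 4791  # UDP destination port used by RoCEv2


def pick_uplink(src_ip: str, dst_ip: str, src_port: int, n_uplinks: int) -> int:
    """Map a flow to an uplink index. CRC32 is a stand-in for the real ECMP hash."""
    key = f"{src_ip}|{dst_ip}|17|{src_port}|{ROCEV2_UDP_PORT}".encode()
    return zlib.crc32(key) % n_uplinks


def uplink_load(flows, n_uplinks: int) -> list:
    """Count how many flows land on each uplink."""
    counts = Counter(pick_uplink(s, d, p, n_uplinks) for s, d, p in flows)
    return [counts.get(i, 0) for i in range(n_uplinks)]


def imbalance(load) -> float:
    """Ratio of the busiest uplink to the mean; 1.0 means perfectly even."""
    mean = sum(load) / len(load)
    return max(load) / mean if mean else 0.0


if __name__ == "__main__":
    # Hypothetical flow set: one host pair, 64 queue pairs with distinct source ports.
    flows = [("10.0.0.1", "10.0.1.1", 49152 + i) for i in range(64)]
    load = uplink_load(flows, n_uplinks=4)
    print("flows per uplink:", load)
    print("imbalance ratio:", round(imbalance(load), 2))
```

Running the same flow list with fewer distinct source ports shows how low entropy concentrates traffic on a few paths, which is the condition the validation step is meant to catch.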
RoCEv2 deployments should keep RDMA traffic classification explicit. Operators usually define a DSCP or priority value for RDMA, map that value to a queue, and then apply PFC only where lossless behavior is required.
Application traffic
|
v
Host NIC marks DSCP / priority
|
v
xSONiC leaf maps priority to queue
|
v
PFC / ECN / ETS policy applies to selected traffic class
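The classification chain above can be modeled as two lookup tables plus a set of lossless priorities. The sketch below uses plain Python dictionaries, not an xSONiC configuration schema, and every value in it (DSCP 26 for RDMA, DSCP 48 for CNP, priority 3 as the lossless class) is an assumption chosen for illustration. It also treats traffic class and PFC priority as the same number, which is a simplification.

```python
# Hypothetical policy values; real deployments choose their own.
DSCP_TO_TC = {26: 3, 48: 6}        # DSCP -> traffic class (26 = RDMA, 48 = CNP, assumed)
TC_TO_QUEUE = {0: 0, 3: 3, 6: 6}   # traffic class -> egress queue
PFC_PRIORITIES = {3}               # priorities where PFC (lossless) is enabled
DEFAULT_TC = 0                     # everything unmarked stays best effort


def classify(dscp: int) -> dict:
    """Follow a packet's DSCP through the class and queue mappings."""
    tc = DSCP_TO_TC.get(dscp, DEFAULT_TC)
    return {"tc": tc, "queue": TC_TO_QUEUE[tc], "lossless": tc in PFC_PRIORITIES}


def check_policy() -> list:
    """Return human-readable problems; an empty list means the tables agree."""
    problems = []
    for tc in sorted(set(DSCP_TO_TC.values()) | {DEFAULT_TC}):
        if tc not in TC_TO_QUEUE:
            problems.append(f"traffic class {tc} has no queue")
    for prio in sorted(PFC_PRIORITIES):
        if prio not in DSCP_TO_TC.values():
            problems.append(f"PFC priority {prio} has no DSCP mapped to it")
    return problems


if __name__ == "__main__":
    print("RDMA packet:", classify(26))
    print("unmarked packet:", classify(0))
    print("policy problems:", check_policy())
```

Keeping `PFC_PRIORITIES` as a small explicit set reflects the point that lossless behavior is applied only to the selected class, while unmarked traffic falls through to a lossy default.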
| Mechanism | Purpose | Design Warning |
|---|---|---|
| PFC | Prevents packet loss for selected priorities. | Overuse can spread pause behavior across the fabric. |
| ECN | Marks congestion before queue overflow. | Thresholds must match buffer and workload behavior. |
| CNP | Tells senders to reduce rate after congestion feedback. | Feedback path delay matters during incast. |
| Fast CNP | Shortens sender notification in supported designs. | Requires flow awareness and careful validation. |
| ETS | Balances bandwidth among traffic classes. | Avoid starving non-RDMA operational traffic. |
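To make the ECN row concrete, the sketch below models a WRED-style marking curve: no marking below a minimum queue depth, a linear ramp up to a configured maximum probability, and marking of every ECN-capable packet at or above the maximum threshold. This is a common textbook model, not a statement of how any specific xSONiC platform marks. The function name and the threshold values in the usage example are hypothetical.

```python
def ecn_mark_probability(queue_bytes: int, min_th: int, max_th: int, max_prob: float) -> float:
    """Probability that an ECN-capable packet is marked at a given queue depth."""
    if not 0 <= min_th < max_th:
        raise ValueError("thresholds must satisfy 0 <= min_th < max_th")
    if not 0.0 <= max_prob <= 1.0:
        raise ValueError("max_prob must be between 0 and 1")
    if queue_bytes <= min_th:
        return 0.0                      # queue is shallow: never mark
    if queue_bytes >= max_th:
        return 1.0                      # queue is deep: mark everything
    # Linear ramp between the two thresholds, capped at max_prob.
    return max_prob * (queue_bytes - min_th) / (max_th - min_th)


if __name__ == "__main__":
    # Hypothetical thresholds: 200 KB minimum, 1 MB maximum, 10% peak probability.
    for depth_kb in (100, 400, 800, 1200):
        p = ecn_mark_probability(depth_kb * 1024, 200 * 1024, 1024 * 1024, 0.10)
        print(f"{depth_kb} KB queue -> mark probability {p:.3f}")
```

Plotting this curve against observed queue depth during incast is one way to judge whether marking starts early enough for senders to slow down before PFC has to engage.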
GPU / storage servers
|
v
100G / 200G / 400G / 800G xSONiC leaves
|
v
High-radix xSONiC spines
|
v
Peer pods, storage, or backend GPU domains
Large AI clusters often separate backend GPU traffic, storage traffic, and frontend service traffic. Smaller deployments may share layers, but the QoS policy should still keep traffic classes explicit.
| Symptom | Likely Cause | Investigation |
|---|---|---|
| Training step time spikes | Queue buildup or path imbalance. | Inspect queue delay, ECN marks, and ECMP path distribution. |
| PFC pause storms | Lossless class under sustained pressure. | Check thresholds, traffic mix, and priority mapping. |
| RDMA retransmission | Lossless policy not consistent end to end. | Compare host NIC and switch QoS state. |
| Good average utilization but poor job performance | Microbursts or tail latency. | Use INT/IPT-style telemetry and workload phase correlation. |
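The "compare host NIC and switch QoS state" step can be reduced to a field-by-field diff once both sides are exported into a common shape. The sketch below assumes that export has already happened. The dictionary keys (`rdma_dscp`, `pfc_priorities`, `ecn_enabled`) and their values are hypothetical and do not correspond to a specific NIC or xSONiC command output.

```python
def diff_qos(host: dict, switch: dict) -> dict:
    """Return {setting: (host_value, switch_value)} for every setting that differs."""
    mismatches = {}
    for key in sorted(set(host) | set(switch)):
        host_value, switch_value = host.get(key), switch.get(key)
        if host_value != switch_value:
            mismatches[key] = (host_value, switch_value)
    return mismatches


if __name__ == "__main__":
    # Hypothetical snapshots gathered from a host NIC and its leaf port.
    host_nic = {"rdma_dscp": 26, "pfc_priorities": [3], "ecn_enabled": True}
    leaf_port = {"rdma_dscp": 26, "pfc_priorities": [3, 4], "ecn_enabled": True}

    for setting, (host_value, switch_value) in diff_qos(host_nic, leaf_port).items():
        print(f"{setting}: host={host_value} switch={switch_value}")
```

A setting missing on one side shows up as `None`, which makes it easy to spot a policy that was applied on the switch but never configured on the host, or the reverse.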
xSONiC 400G and 800G switches fit backend AI fabrics where east-west bandwidth dominates. 100G and 200G systems are useful for storage, frontend, and migration layers where operational stability matters as much as raw port speed.
Related Products
Use these related platforms as a starting point for sizing, comparison, and follow-up discussion.
- 32-port 400G spine/core switch for high-capacity data center fabrics and AI-ready backbones.
- 64-port 800G AI fabric switch for large-scale GPU clusters, HPC backbones, and ultra-high-throughput data center networks.
- 64-port 200G leaf/spine switch for high-bandwidth storage, compute, and scale-out data center fabrics.
- 32-port 100G leaf/spine switch for VXLAN fabrics, RoCE-ready workloads, and tenant-scale routing.