AI Infrastructure

xSONiC AI Inference Server

High-density AMD Instinct inference platform for private AI services.

An 8-GPU inference platform based on AMD Instinct MI355X or MI300X accelerators, built for private LLM, RAG, multimodal, and enterprise AI services.

  • 8 AMD Instinct OAM GPUs
  • Up to 2.3 TB HBM3E with MI355X
  • AMD Infinity Fabric scale-up
  • PCIe Gen5 host I/O

[Image: xSONiC AI Inference Server front chassis view]

Specification Overview

Technical specifications.

Use this table as the fast path for platform sizing, port planning, and software compatibility checks.

Category: AI Infrastructure
Rack Units: 8U
Ports: Platform-dependent PCIe Gen5 host I/O and high-speed network integration
Switching Capacity: Integrated through xSONiC AI fabric and switching design
Forwarding Rate: Platform-dependent
OS Version: xSONiC validated platform software with ROCm ecosystem support
Protocols: PCIe Gen5, AMD Infinity Fabric, Ethernet, RoCE, Kubernetes-ready service integration
Management: BMC, CLI/API, telemetry, deployment and lifecycle service options

Deployment Context

Where this platform fits.

Review positioning, capability notes, and deployment guidance for this xSONiC platform.

Overview

The xSONiC AI Inference Server pairs either AMD Instinct MI355X or MI300X GPUs with high-capacity HBM, PCIe Gen5 host connectivity, an AMD Infinity Fabric GPU-to-GPU interconnect, and deployment services. It is designed for organizations running private assistants, enterprise search, document intelligence, coding support, and multimodal workflows where data locality, predictable throughput, and operational ownership matter.

Platform Options

Area | MI355X Platform Option | MI300X Platform Option
GPU configuration | 8 AMD Instinct MI355X OAM GPUs on UBB 2.0 module | 8 AMD Instinct MI300X OAM GPUs on UBB 2.0 module
GPU memory | 288 GB HBM3E per GPU, approx. 2.304 TB total | 192 GB HBM3 per GPU, 1.5 TB total
Memory bandwidth | Up to 8 TB/s per GPU | Up to 5.3 TB/s per GPU
GPU-to-GPU fabric | 7 bidirectional AMD Infinity Fabric links per GPU at 153.6 GB/s | 7 bidirectional AMD Infinity Fabric links per GPU at 128 GB/s
Host I/O | 8 PCIe Gen5 x16 connections to host CPU | 8 PCIe Gen5 x16 connections with 128 GB/s per GPU scale-out network bandwidth
Precision support | FP16, BF16, FP8, MXFP6, MXFP4 | FP32/FP64 for HPC plus FP16/BF16/FP8/INT8 for AI
Power and cooling | Direct liquid-cooled option with up to 1400 W module TBP | 750 W maximum TBP per GPU in the platform specification
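
The platform-level totals follow from the per-GPU figures by simple multiplication. The Python sketch below is illustrative only: the `PlatformOption` class and its method names are hypothetical, and the per-link Infinity Fabric rate is assumed to apply to each of the 7 links. It reproduces the total HBM capacity quoted in the table and derives the aggregate HBM bandwidth and per-GPU fabric bandwidth, treated as peak figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformOption:
    """Per-GPU figures copied from the Platform Options table (hypothetical helper)."""
    name: str
    gpus: int
    hbm_gb_per_gpu: int
    hbm_tb_per_s_per_gpu: float   # peak memory bandwidth per GPU, TB/s
    fabric_links_per_gpu: int
    fabric_gb_per_s_per_link: float  # assumed to be the rate of each link, GB/s

    def total_hbm_tb(self) -> float:
        # Decimal terabytes (1 TB = 1000 GB), matching the table's totals.
        return self.gpus * self.hbm_gb_per_gpu / 1000

    def aggregate_hbm_bandwidth_tb_per_s(self) -> float:
        # Sum of peak per-GPU memory bandwidth across the platform.
        return self.gpus * self.hbm_tb_per_s_per_gpu

    def fabric_gb_per_s_per_gpu(self) -> float:
        # Peak GPU-to-GPU bandwidth available to one GPU over all its links.
        return self.fabric_links_per_gpu * self.fabric_gb_per_s_per_link


MI355X = PlatformOption("MI355X", 8, 288, 8.0, 7, 153.6)
MI300X = PlatformOption("MI300X", 8, 192, 5.3, 7, 128.0)

if __name__ == "__main__":
    for option in (MI355X, MI300X):
        print(
            f"{option.name}: "
            f"{option.total_hbm_tb():.3f} TB HBM, "
            f"{option.aggregate_hbm_bandwidth_tb_per_s():.1f} TB/s aggregate HBM bandwidth, "
            f"{option.fabric_gb_per_s_per_gpu():.1f} GB/s fabric per GPU"
        )
```

Under these assumptions the totals come to 2.304 TB for the MI355X option and 1.536 TB for the MI300X option, which the table rounds to 1.5 TB. Sustained bandwidth in a real deployment will be lower than these peak values.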

Workload Fit

  • Private LLM and assistant service deployment
  • Retrieval-augmented generation and enterprise knowledge search
  • Document intelligence and review workflows
  • Multimodal AI service integration
  • Multi-user inference that relies on request batching and high accelerator memory bandwidth
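
For a first-pass check of whether a workload fits, GPU memory capacity is usually the first constraint. The sketch below is a rough estimate, not a sizing tool: the function name `min_gpus_for_weights`, the example parameter counts, and the 30% reserve for KV cache, activations, and runtime overhead are all assumptions chosen for illustration. It estimates how many GPUs are needed just to hold a model's weights at a given precision, using the per-GPU HBM capacities of the two platform options.

```python
import math


def min_gpus_for_weights(
    params_billion: float,
    bytes_per_param: float,
    hbm_gb_per_gpu: float,
    reserve_fraction: float = 0.3,  # assumed headroom for KV cache and activations
) -> int:
    """Estimate the minimum GPU count needed to hold model weights only."""
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    # 1 billion parameters at 1 byte each is 1 GB (decimal).
    weights_gb = params_billion * bytes_per_param
    usable_gb_per_gpu = hbm_gb_per_gpu * (1 - reserve_fraction)
    return math.ceil(weights_gb / usable_gb_per_gpu)


if __name__ == "__main__":
    # Hypothetical models: 70B parameters at FP16 (2 bytes), 405B at FP8 (1 byte).
    for label, params, nbytes in (("70B FP16", 70, 2), ("405B FP8", 405, 1)):
        for gpu, hbm_gb in (("MI355X", 288), ("MI300X", 192)):
            count = min_gpus_for_weights(params, nbytes, hbm_gb)
            print(f"{label} on {gpu}: at least {count} GPU(s) for weights")
```

Actual requirements depend on context length, batch size, the serving framework, and the parallelism strategy, so treat the result as a lower bound to refine during a technical consultation.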

Datasheet

Download the xSONiC AI Inference Server datasheet

Typical deployment scenarios

  • Private LLM serving
  • Retrieval-augmented generation
  • Enterprise search
  • Document intelligence
  • Coding assistants
  • Multimodal AI

Technical Consultation

Request Quote for xSONiC AI Inference Server

Share your AI Infrastructure requirements, target topology, and rollout timing. xSONiC will help scope the right fit.