xSONiC AI Inference Server

Brand: xSONiC
Availability: In stock

High-density AMD Instinct inference platform for private AI services.

An 8-GPU inference platform based on AMD Instinct MI355X or MI300X GPUs, built for private LLM, RAG, multimodal, and enterprise AI services.

8 AMD Instinct OAM GPUs
Up to 2.3 TB HBM3E with MI355X
AMD Infinity Fabric scale-up
PCIe Gen5 host I/O

Specification Overview

Technical specifications.

Use this table as the fast path for platform sizing, port planning, and software compatibility checks.

Category	AI Infrastructure
Rack Units	8U
Ports	Platform-dependent PCIe Gen5 host I/O / High-speed network integration
Switching Capacity	Integrated through xSONiC AI fabric and switching design
Forwarding Rate	Platform dependent
OS Version	xSONiC validated platform software with ROCm ecosystem support
Protocols	PCIe Gen5, AMD Infinity Fabric, Ethernet, RoCE, Kubernetes-ready service integration
Management	BMC, CLI/API, Telemetry, Deployment and lifecycle service options

Deployment Context

Where this platform fits.

Review positioning, capability notes, and deployment guidance for this xSONiC platform.

Overview

xSONiC AI Inference Server pairs eight AMD Instinct MI355X or MI300X GPUs with high-capacity HBM, PCIe Gen5 host connectivity, an AMD Infinity Fabric GPU-to-GPU interconnect, and deployment services. It is designed for organizations running private assistants, enterprise search, document intelligence, coding support, and multimodal workflows where data locality, predictable throughput, and operational ownership matter.

Platform Options

Area	MI355X Platform Option	MI300X Platform Option
GPU configuration	8 AMD Instinct MI355X OAM GPUs on UBB 2.0 module	8 AMD Instinct MI300X OAM GPUs on UBB 2.0 module
GPU memory	288 GB HBM3E per GPU, 2.304 TB total	192 GB HBM3 per GPU, 1.536 TB total
Memory bandwidth	Up to 8 TB/s per GPU	Up to 5.3 TB/s per GPU
GPU-to-GPU fabric	7 bidirectional AMD Infinity Fabric links per GPU at 153.6 GB/s each	7 bidirectional AMD Infinity Fabric links per GPU at 128 GB/s each
Host I/O	8 PCIe Gen5 x16 connections to host CPU	8 PCIe Gen5 x16 connections with 128 GB/s per GPU scale-out network bandwidth
Precision support	FP16, BF16, FP8, MXFP6, MXFP4	FP32/FP64 for HPC plus FP16/BF16/FP8/INT8 for AI
Power and cooling	Direct liquid-cooled option with up to 1400 W module TBP	750 W maximum TBP per GPU in the platform specification
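
The platform-level totals follow from the per-GPU figures in the table above. The Python sketch below is illustrative only: the class and field names are hypothetical, it assumes decimal units (1 TB = 1000 GB), and it treats the quoted Infinity Fabric figure as a per-link rate.

```python
from dataclasses import dataclass

GPUS_PER_PLATFORM = 8  # both options are 8-GPU platforms


@dataclass(frozen=True)
class PlatformOption:
    """Per-GPU figures copied from the Platform Options table."""
    name: str
    hbm_gb_per_gpu: int
    hbm_tb_s_per_gpu: float
    fabric_links_per_gpu: int
    fabric_gb_s_per_link: float  # assumption: quoted rate is per link

    def total_hbm_gb(self) -> int:
        # Aggregate HBM capacity across all GPUs, in GB.
        return GPUS_PER_PLATFORM * self.hbm_gb_per_gpu

    def total_hbm_tb(self) -> float:
        # Same capacity in TB, assuming 1 TB = 1000 GB.
        return self.total_hbm_gb() / 1000

    def total_hbm_tb_s(self) -> float:
        # Sum of per-GPU peak memory bandwidth; a theoretical ceiling.
        return GPUS_PER_PLATFORM * self.hbm_tb_s_per_gpu

    def fabric_gb_s_per_gpu(self) -> float:
        # Peak Infinity Fabric bandwidth per GPU across all of its links.
        return self.fabric_links_per_gpu * self.fabric_gb_s_per_link


MI355X = PlatformOption("MI355X", 288, 8.0, 7, 153.6)
MI300X = PlatformOption("MI300X", 192, 5.3, 7, 128.0)

for option in (MI355X, MI300X):
    print(
        f"{option.name}: {option.total_hbm_tb():.3f} TB HBM, "
        f"{option.total_hbm_tb_s():.1f} TB/s aggregate memory bandwidth, "
        f"{option.fabric_gb_s_per_gpu():.1f} GB/s fabric per GPU"
    )
```

These are peak figures derived by multiplication; sustained throughput depends on the workload and software stack.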

Workload Fit

Private LLM and assistant service deployment
Retrieval-augmented generation and enterprise knowledge search
Document intelligence and review workflows
Multimodal AI service integration
Multi-user inference with batching and accelerator memory bandwidth
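
For the workloads listed above, a common first sizing question is whether a model's weights fit in the platform's aggregate HBM with room left for KV cache and activations. The sketch below is a back-of-envelope check, not an xSONiC sizing tool: the function names, the bytes-per-parameter table, and the 30% reserve are all assumptions chosen for illustration.

```python
# Approximate storage cost per parameter, in bytes (assumed values).
# The MXFP4 entry ignores the shared block-scale overhead of the format.
# Precision support differs by platform option; check the table above.
BYTES_PER_PARAM = {"FP16": 2.0, "BF16": 2.0, "FP8": 1.0, "MXFP4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB: 1e9 parameters at 1 byte each is 1 GB."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_in_hbm(
    params_billion: float,
    precision: str,
    total_hbm_gb: float,
    reserve_fraction: float = 0.3,  # hypothetical headroom for KV cache etc.
) -> bool:
    """Return True if the weights fit in HBM after reserving headroom."""
    usable_gb = total_hbm_gb * (1.0 - reserve_fraction)
    return weights_gb(params_billion, precision) <= usable_gb


# Aggregate HBM per platform option: 8 x 288 GB and 8 x 192 GB.
MI355X_TOTAL_GB = 8 * 288
MI300X_TOTAL_GB = 8 * 192

print(fits_in_hbm(405, "FP16", MI300X_TOTAL_GB))
print(fits_in_hbm(1000, "FP16", MI355X_TOTAL_GB))
print(fits_in_hbm(1000, "FP8", MI355X_TOTAL_GB))
```

The right reserve depends heavily on context length, batch size, and the serving framework, so treat the result as a starting point for a sizing conversation rather than a guarantee.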

Datasheet

Download the xSONiC AI Inference Server datasheet

Typical Deployment Scenarios

Private LLM serving
Retrieval-augmented generation
Enterprise search
Document intelligence
Coding assistants
Multimodal AI

Technical Consultation

Request Quote for xSONiC AI Inference Server

Share your AI infrastructure requirements, target topology, and rollout timing, and xSONiC will help scope the right configuration.
