Networking

Network performance, latency, and connectivity options including RDMA and high-speed interconnects.

Key Requirements

  • Out-of-the-box Slurm topology configuration
  • InfiniBand or RoCEv2 support
  • MPI distribution that uses the higher-performance HPC-X mpirun
  • NCCL configured in /etc/nccl.conf with NCCL_IB_GID_INDEX=3 set for RoCEv2
  • NCCL_MIN_NCHANNELS, NCCL_PROTO, and NCCL_ALGO NOT set, so that NCCL auto-configures them
  • SHARP support for enhanced performance
  • 4-node NCCL test within specification
  • PyTorch-layer network performance within specification
  • NCCL monitoring plugin availability
  • Network bandwidth and latency testing
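
The two NCCL configuration requirements above can be checked mechanically. The following is a minimal sketch, not a definitive validator. It assumes the file uses plain KEY=VALUE lines with # comments, and the names parse_nccl_conf and check_nccl_conf are hypothetical helpers introduced here for illustration. It inspects only the file contents, so variables exported in the job environment would need a separate check.

```python
from typing import Dict, List

# Values taken from the requirements list above.
REQUIRED_FOR_ROCE = {"NCCL_IB_GID_INDEX": "3"}
MUST_BE_UNSET = ("NCCL_MIN_NCHANNELS", "NCCL_PROTO", "NCCL_ALGO")


def parse_nccl_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_nccl_conf(text: str, roce: bool = True) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file passes."""
    settings = parse_nccl_conf(text)
    problems: List[str] = []
    if roce:
        # The GID index requirement applies to RoCEv2 fabrics only.
        for key, expected in REQUIRED_FOR_ROCE.items():
            actual = settings.get(key)
            if actual != expected:
                problems.append(f"{key} should be {expected}, found {actual}")
    for key in MUST_BE_UNSET:
        if key in settings:
            problems.append(f"{key} should not be set (found {settings[key]})")
    return problems


if __name__ == "__main__":
    # Hypothetical file contents used only to demonstrate the check.
    sample = "# tuning\nNCCL_IB_GID_INDEX=3\nNCCL_ALGO=Ring\n"
    for problem in check_nccl_conf(sample):
        print(problem)
```

On a real node, the text would come from reading /etc/nccl.conf, and roce=False would be passed on an InfiniBand fabric where the GID index requirement does not apply.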
