ClusterMAX 2.0 Silver

Together

Adequate offering with noticeable gaps compared to Gold or Platinum. Room for improvement.

By Jordan Nanos, Daniel Nishball, Dylan Patel

Together Quick Stats

ClusterMAX Tier: Silver (3 / 5)
Source Rating Cycle: ClusterMAX 2.0
GPUs Offered: Not detailed in review
Slurm Support: Discussed in review
Kubernetes Support: Discussed in review
SOC 2 Mentioned: Not flagged
NCCL Benchmarks: In review
Last Updated: Nov 06, 2025


Together is a strong provider with a robust cluster offering for both Slurm and Kubernetes, but it is held back from the Gold category due to reliability issues. When comparing offers, we hear from users that they generally expect a lower price per GPU-hr from Together to justify the trade-off in reliability. Together is among the few providers about which we hear the most reliability complaints from users operating clusters of 64 GPUs or more. We expect this is due to their use of a broad mix of datacenter partners, which creates a “roll-of-the-dice” dynamic for performance and stability. Unfortunately, Together also does not offer 1-week POCs to most customers, unlike other Silver, Gold, and Platinum tier providers, which makes it difficult for buyers to know what sort of experience they will have on the cluster before making a multi-million dollar commitment.

These are the reasons why TogetherAI went from a Gold tier provider to a Silver tier provider.

Together’s multi-datacenter strategy seems to be driven by necessity. They have significant compute needs due to their serverless inference endpoint business, which is growing steadily. During our research for this article, we spoke with multiple Neoclouds that claim Together is one of their biggest customers. Competing in the serverless inference endpoint business does provide two key benefits: it creates a sales funnel to cross-sell GPU clusters to inference customers, and it allows Together to absorb the cost of idle cluster compute by running inference workloads on it. It also gives Together an opportunity to enjoy the fruits of their kernel team’s labour. TKC is an exceptional feature, and the impact of Tri Dao’s FlashAttention cannot be overstated. During the research for this article, we got to hear directly from Dan Fu about the TKC roadmap. We suspect that Dan is the only person in the industry with the title of “VP, Kernels”, and for good reason. TKC is consistently impressive, and it helps both customers and Together’s serverless inference endpoint business achieve improved performance and efficiency. Together’s model of offsetting costs from idle compute by running public and private serverless endpoints is now being copied by the likes of Nebius. Why not make some extra money from idle compute?

During testing, we got access to a classic Together Slurm cluster, a TKE Kubernetes cluster, and a soon-to-be-released Instant Cluster in preview. For Slurm, the onboarding process was smooth: create an account on the console, upload SSH keys, and the Together engineering team sends you an onboarding document. One ssh command and the cluster works out of the box. Unfortunately, during testing we noticed that the cluster responded very slowly to terminal commands in a VSCode or Cursor remote SSH session. The standard terminal application was fine, and we could replicate the slowness from multiple locations, leading us to believe it was a problem with their datacenter provider.

The Kubernetes onboarding experience was less polished. Instead of providing a kubeconfig file to download, we were expected to log in and access the cluster via SSH. As mentioned previously, this is atypical for Kubernetes admins and users, who generally prefer to develop code locally and switch contexts on demand. In addition, we found that standard tools like Helm were not installed, and users do not get sudo permissions by default, requiring more manual setup. Together uses Rancher k3s to provide these clusters, which is strange considering how much of the serverless endpoint runs on Kubernetes. Together has several customers, including Hedra, Cartesia, and Krea, that are successfully running production inference on thousands of GPUs using these managed K8s clusters. However, at this time, Together does not have horizontal node autoscaling capabilities in these clusters. Whatever capacity you commit to is what you get. It is interesting to see the dynamic between the cluster business and the endpoint business in action: users can see it as Together competing against itself, or as providing end users with choice.

Source: Together. Trying to use our TKS cluster :(

“Instant Clusters” is Together’s newest offering, designed to be fully managed via API, CLI, and a Terraform provider. This product allows users to dynamically provision clusters and add or remove nodes on demand, making it suitable for handling burst capacity and autoscaling. The architecture for Instant Clusters provides strong tenant isolation using a multi-layered approach similar to Nebius. First, a base Kubernetes cluster uses KubeVirt to create dedicated Virtual Machines (VMs) for a customer. Second, these VMs are used to form an isolated Kubernetes cluster dedicated to that customer. Third, Slurm is installed into the customer’s dedicated K8s cluster using slurm-operator from Slinky. Overall, this architecture allows Together to offer flexible, on-demand Slurm environments on top of a modern, virtualized stack. Notably, in our testing, Together is the only provider to correctly configure Slinky out-of-the-box with sudo permissions, vim/nano, git, python, and other basic packages pre-installed. They clearly have already rolled out this offering to users, and we are excited for it to launch in full GA.

On these clusters, Together provides 24/7 support from an on-call SRE team that is primarily US-based. For networking, they work directly with customers to configure firewall rules at the datacenter level and provide IP addresses as needed, including 1:1 NAT and public IPs assignable through services like MetalLB.

The final, and most important, point of differentiation between Together and Gold tier providers is a proactive and automated approach to monitoring and reliability. This has been a weak point for Together, and is difficult for them to work around given the broad range of datacenter and GPU infrastructure partners they have contracts with.

During our review of their Grafana monitoring dashboard, we noted a bug that incorrectly reported InfiniBand bandwidth at a physically impossible 1.14 Tbit/s. To their credit, when we pointed this out, their team quickly identified the calculation error in their query and deployed a fix.
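A dashboard bug of this kind can be caught with a simple plausibility check against the link's line rate. The sketch below is illustrative only, not Together's query or fix: the 400 Gbit/s port speed is an assumption (the review does not state the link speed), and the function names are hypothetical. It relies on one property of the InfiniBand port data counters, which count in units of 4 octets rather than bytes or bits.

```python
# Assumption for illustration: a single 400 Gbit/s port. The review does not
# state the link speed of the cluster under test.
ASSUMED_LINK_RATE_BITS_PER_S = 400e9

# InfiniBand PortXmitData / PortRcvData counters are in units of 4 octets,
# so a raw counter delta must be scaled before it is shown as bandwidth.
OCTETS_PER_COUNTER_UNIT = 4
BITS_PER_OCTET = 8


def counter_delta_to_bits_per_s(delta_units: int, interval_s: float) -> float:
    """Convert a port data counter delta over an interval into bits per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return delta_units * OCTETS_PER_COUNTER_UNIT * BITS_PER_OCTET / interval_s


def is_plausible(
    bits_per_s: float,
    link_rate_bits_per_s: float = ASSUMED_LINK_RATE_BITS_PER_S,
    tolerance: float = 0.02,
) -> bool:
    """Return True if a reported rate does not exceed the line rate (plus slack)."""
    return 0 <= bits_per_s <= link_rate_bits_per_s * (1 + tolerance)


if __name__ == "__main__":
    reported = 1.14e12  # the value shown on the dashboard in the review
    print(f"1.14 Tbit/s plausible on the assumed link: {is_plausible(reported)}")
```

Running a check like this as an alert rule on the dashboard's own output would flag a unit or scaling mistake before a customer does.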

For passive health checks, we expect checks to run continuously in the background to detect failures on live nodes. This is where the gap between their current implementation and a fully automated system is most clear. Together has implemented detection for many critical issues, including GPUs falling off the bus, PCIe errors, InfiniBand link flaps, high GPU thermals, and high ECC memory error rates. A baseline Kubernetes node health check is also in place. However, the most critical missing piece is automated remediation. While they can detect most of the issues above, the logic to automatically drain a faulty node is still on the roadmap for everything except GPUs falling off the bus in Slurm. Other crucial features on the roadmap include detecting uncorrectable Nvidia XID errors, identifying stalled NCCL jobs, and implementing AI/ML-based predictive failure analysis.
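To make the gap between detection and remediation concrete, here is a minimal sketch of the missing step: mapping a detected issue to a drain action. The issue names and the drain policy are hypothetical and do not describe Together's system. The commands it builds (`scontrol update ... State=DRAIN` for Slurm, `kubectl cordon` and `kubectl drain` for Kubernetes) are standard, and the sketch only constructs them rather than executing anything.

```python
from enum import Enum


class Issue(Enum):
    """Issue types the review says Together can detect (names are hypothetical)."""
    GPU_FELL_OFF_BUS = "gpu_fell_off_bus"
    PCIE_ERROR = "pcie_error"
    IB_LINK_FLAP = "ib_link_flap"
    GPU_THERMAL = "gpu_thermal"
    ECC_ERROR_RATE = "ecc_error_rate"


# Hypothetical policy: which issues should take a node out of service.
# Everything else would only raise an alert.
DRAIN_ON = {Issue.GPU_FELL_OFF_BUS, Issue.PCIE_ERROR, Issue.ECC_ERROR_RATE}


def remediation_commands(node: str, issue: Issue, scheduler: str) -> list[list[str]]:
    """Return the argv lists that would drain `node`, or [] for alert-only issues."""
    if issue not in DRAIN_ON:
        return []
    if scheduler == "slurm":
        # Mark the node DRAIN so running jobs finish and no new jobs are placed.
        return [[
            "scontrol", "update",
            f"NodeName={node}", "State=DRAIN", f"Reason={issue.value}",
        ]]
    if scheduler == "kubernetes":
        # Stop new pods from scheduling, then evict the existing ones.
        return [
            ["kubectl", "cordon", node],
            ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
        ]
    raise ValueError(f"unknown scheduler: {scheduler}")


if __name__ == "__main__":
    for cmd in remediation_commands("node01", Issue.PCIE_ERROR, "kubernetes"):
        print(" ".join(cmd))
```

Keeping the policy as data (the `DRAIN_ON` set) makes it easy to start conservatively and widen automatic draining one issue type at a time.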

For active health checks, Together has currently implemented a comprehensive suite of tests for single-node validation. It includes Nvidia’s DCGM diagnostics (level 3), PCIe bandwidth tests, single-node NCCL and InfiniBand all-reduce tests to validate local interconnects, and GPU stress tests like GPUBurn. However, key multi-node and application-level tests are still on the roadmap. This includes pairwise ib_write tests to validate the InfiniBand fabric under load, hardware correctness validation with Nvidia’s TinyMeg2, and full-stack performance tests with models like Megatron to ensure TFLOPs and loss convergence match reference numbers. We have previously noted how important these tests are during burn-in and during cluster operation, as they stress both the GPUs and the interconnect at the same time, for an extended period of time, resulting in thermal expansion and contraction of the entire cluster, similar to normal operation. We encourage Together to prioritize implementing these active health checks, as we believe it will help them improve reliability, especially when working with datacenter partners that are not under their direct control.
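As an illustration of what a pairwise fabric test involves, the sketch below builds a schedule in which every pair of nodes is tested exactly once and no node appears twice in the same round, so all pairs in a round can run concurrently. It uses the standard round-robin (circle) method. The function name is hypothetical, and launching the actual test for each pair (for example with `ib_write_bw` from the perftest suite) is left out.

```python
from typing import Hashable, Optional, Sequence


def pairwise_rounds(nodes: Sequence[Hashable]) -> list[list[tuple]]:
    """Round-robin schedule: each unordered pair of nodes appears exactly once."""
    players: list[Optional[Hashable]] = list(nodes)
    if len(players) % 2:
        players.append(None)  # placeholder: the node paired with it sits out
    n = len(players)
    rounds = []
    for _ in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = players[i], players[n - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first entry fixed and rotate the rest by one position.
        players = [players[0], players[-1]] + players[1:-1]
    return rounds


if __name__ == "__main__":
    for i, pairs in enumerate(pairwise_rounds(["n1", "n2", "n3", "n4"]), start=1):
        print(f"round {i}: {pairs}")
```

For N nodes this produces N-1 rounds (N rounds when N is odd) instead of N(N-1)/2 sequential tests, which matters when the goal is to load the fabric across the whole cluster at once.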

In summary, Together continues to operate on a solid foundation for managed clusters. They have a large and growing customer base for both their cluster and serverless inference endpoint products. Their active, single-node health checks are strong. However, the system is not yet complete. We believe that the gap between passively detecting node failures and automatically remediating them is a key reason for the reliability issues users experience today.

Together GPU Cloud FAQ

What tier is Together in ClusterMAX?

Together is rated Silver tier in the ClusterMAX 2.0 GPU cloud rating system by SemiAnalysis (with the ClusterMAX 2.1 Update applied April 2026). Silver is a mid-tier rating in the ClusterMAX rating system: an adequate offering with noticeable gaps compared to Gold or Platinum, and room for improvement.

Is Together SOC 2 Type II certified?

Together's ClusterMAX review does not flag a SOC 2 Type II attestation as confirmed. SemiAnalysis treats SOC 2 Type II as a baseline expectation for any GPU cloud serving enterprise or regulated AI workloads — see the ClusterMAX criteria page for the full security baseline.

Does Together support Slurm?

Yes. The Together review on ClusterMAX covers their Slurm offering — including whether it is managed, self-managed, or runs as Slurm-on-Kubernetes (SUNK, Soperator, or Slinky). See the Orchestration section of the review for the specific Slurm flavor offered and SemiAnalysis' hands-on experience.

Does Together support Kubernetes?

Yes. The Together review on ClusterMAX covers their Kubernetes offering — whether managed Kubernetes is provided, what control plane is used, and how GPU operator, networking, and storage integrate. See the Orchestration and Storage sections of the review for details.

What GPUs does Together offer?

The Together ClusterMAX review covers their current GPU inventory and on-demand availability. SemiAnalysis tracks H100, H200, B200, GB200 NVL72, GB300 NVL72, and MI300X / MI355X availability across all 85 providers in the ClusterMAX 2.0 + 2.1 cohort.

What is the NCCL all-reduce performance on Together?

The Together review on ClusterMAX includes hands-on NCCL all-reduce results from SemiAnalysis testing. NCCL bandwidth (in GB/s) is one of the most important indicators of training cluster health — see the Networking section of the review for the specific numbers and how they compare to the ClusterMAX cohort.

How does Together compare to CoreWeave?

CoreWeave is the only ClusterMAX Platinum provider, while Together is rated Silver. The Together review documents the specific gaps versus CoreWeave across the 10 ClusterMAX criteria (Security, Lifecycle, Orchestration, Storage, Networking, Reliability, Monitoring, Pricing, Partnerships, Availability). See the Together review body and the ClusterMAX /criteria page for the full comparison framework.

Is Together recommended for LLM training?

Together is in a ClusterMAX tier that SemiAnalysis directly recommends for production GPU workloads (Platinum / Gold / Silver / Bronze). The Together review details which workload profiles fit best — large-scale pretraining, fine-tuning, on-demand experimentation, or inference — based on hands-on cluster testing.
