# Tensorwave (Silver) — ClusterMAX GPU Cloud Review

> Tensorwave earns a ClusterMAX 2.0 Silver rating from SemiAnalysis.

- **Provider**: Tensorwave
- **ClusterMAX Tier**: Silver
- **Tier definition**: Adequate offering with noticeable gaps compared to Gold or Platinum. Room for improvement.
- **Authors**: Jordan Nanos, Daniel Nishball, Dylan Patel (SemiAnalysis)
- **Published**: 2025-11-06 (Nov 06, 2025)
- **Last updated**: 2025-11-06 (Nov 06, 2025)
- **Source**: ClusterMAX 2.0
- **Canonical URL**: https://www.clustermax.ai/cloudreview/tensorwave
- **Source article**: https://newsletter.semianalysis.com/p/clustermax-20-the-industry-standard
- **Topics**: Tensorwave review, Tensorwave GPU cloud, Tensorwave ClusterMAX rating, Tensorwave Silver, Silver tier GPU cloud, GPU cloud review, neocloud review, Tensorwave MI325X, MI325X cloud, RoCE, Kubernetes, Slurm, DCGM, ClusterMAX 2.0, SemiAnalysis

---

Tensorwave is a provider that recently raised a $100M Series A from AMD Ventures. As a result, they focus exclusively on AMD hardware, including 8,192 MI325X GPUs in their Tucson, Arizona datacenter. Since we love all GPUs and love AMD, we have been working with Tensorwave for a long time, and they graciously provide us with GPU access for benchmarking that goes well beyond the scope of ClusterMAX. We are grateful for this support.

Our testing on Tensorwave’s SonK (Slurm on Kubernetes) platform has shown it to be largely unstable. The onboarding process is confusing, relying on Rancher’s RKE2 open-source Kubernetes distribution (formerly RKE Government), Longhorn for storage, and a modified version of Slinky for SonK (to get it to support AMD GPUs properly). To log in to the cluster, we initially had to escalate to sudo just to run basic kubectl commands and get a “slurm-login” convenience script working. It took a significant amount of back and forth with the Tensorwave team to get a working kubeconfig (notably, this is now easy to download from the console). We also ran into issues with permissions and user groups, which did not seem to be properly synchronized between the jump box and the Slurm login nodes. This issue has also been fixed since our testing period, but it is clear that the team has limited experience getting an RBAC-scoped cluster working with an external IAM provider. In addition, the Slurm login node was missing the (now classic) tools we expect: vim, nano, git, and sudo permissions to run apt install. However, in Tensorwave’s case, it only took a few hours for the team to modify the base container image to include these tools. We were impressed by this turnaround time.
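As an illustration of the group synchronization problem described above, the following is a minimal sketch of how group membership drift between two hosts could be detected. It assumes the standard `name:password:gid:members` text format printed by `getent group`; the function names `parse_getent_group` and `membership_drift` and the sample users are hypothetical, and this is not how Tensorwave or SemiAnalysis diagnosed the issue.

```python
from typing import Dict, List, Set, Tuple


def parse_getent_group(text: str) -> Dict[str, Set[str]]:
    """Parse `getent group` style output into {group_name: {members}}."""
    groups: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(":")
        if len(parts) < 4:
            continue  # skip malformed lines rather than guessing
        name, members = parts[0], parts[3]
        groups[name] = {m for m in members.split(",") if m}
    return groups


def membership_drift(
    host_a: Dict[str, Set[str]], host_b: Dict[str, Set[str]]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {group: (members only on A, members only on B)} for mismatches."""
    drift: Dict[str, Tuple[List[str], List[str]]] = {}
    for name in sorted(set(host_a) | set(host_b)):
        only_a = host_a.get(name, set()) - host_b.get(name, set())
        only_b = host_b.get(name, set()) - host_a.get(name, set())
        if only_a or only_b:
            drift[name] = (sorted(only_a), sorted(only_b))
    return drift


if __name__ == "__main__":
    # Hypothetical sample data standing in for output captured on each host.
    jump_box = "ml:x:2001:alice,bob\nadmins:x:2002:carol\n"
    login_node = "ml:x:2001:alice\n"
    report = membership_drift(
        parse_getent_group(jump_box), parse_getent_group(login_node)
    )
    for group, (only_jump, only_login) in report.items():
        print(group, "jump-only:", only_jump, "login-only:", only_login)
```

Note that `getent group` lists only supplementary members, so users whose primary group matches would not appear here; a fuller check would also compare the output of `getent passwd`.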

[Image](https://substackcdn.com/image/fetch/$s_!bUi8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8feae912-04d3-41c6-bae6-0f0236850397_936x561.png)

Source: our Tensorwave Console

Beyond the access issues, there was no topology-aware scheduling in place, health checks were not integrated with Slurm to automatically drain nodes that fail a check, and the monitoring dashboard was missing critical information about GPU and system health that is unique to AMD’s [RDC package](https://github.com/ROCm/rocm-systems/tree/develop/projects/rdc). While NVIDIA providers get a simpler foundation by building on DCGM, Tensorwave has had to build a lot of this from scratch, since they are AMD exclusive. The most important issue, however, is reliability. During our testing, we experienced a number of reliability issues, including some outages that stretched over multiple hours or days. In a two-month period, we experienced 7 distinct interruptions: hardware and firmware issues on GPU nodes, a redeployment of Kubernetes, SonK/slurm-login connection issues, maintenance on Weka storage, maintenance on switches and routers, and even a power outage. Notably, none of these issues are directly related to AMD GPUs; they stem from the rest of the cluster and the facilities around the GPUs.
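To make the missing health check integration concrete, here is a minimal sketch of turning failed node checks into Slurm drain requests. It relies on the real `scontrol update NodeName=... State=DRAIN Reason=...` command, but the `HealthResult` type, the check names, and the node names are hypothetical, and how the checks themselves are collected (for example from RDC) is left out. The sketch only builds and prints the commands as a dry run; it is not Tensorwave's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HealthResult:
    """Outcome of one check on one node (hypothetical structure)."""
    node: str
    check: str
    passed: bool
    detail: str = ""


def drain_command(node: str, reason: str) -> List[str]:
    """Build the argv for draining a node with a recorded reason."""
    return ["scontrol", "update", f"NodeName={node}", "State=DRAIN", f"Reason={reason}"]


def plan_drains(results: List[HealthResult]) -> List[List[str]]:
    """Group failures per node and return one drain command per failing node."""
    failures: Dict[str, List[str]] = {}
    for r in results:
        if not r.passed:
            label = f"{r.check}: {r.detail}" if r.detail else r.check
            failures.setdefault(r.node, []).append(label)
    return [
        drain_command(node, "healthcheck failed: " + "; ".join(reasons))
        for node, reasons in sorted(failures.items())
    ]


if __name__ == "__main__":
    # Hypothetical results; a real system would gather these on each node.
    sample = [
        HealthResult("gpu-node-01", "gpu_count", True),
        HealthResult("gpu-node-02", "gpu_count", False, "7 of 8 visible"),
        HealthResult("gpu-node-02", "nic_link", False),
    ]
    for cmd in plan_drains(sample):
        # Dry run: print instead of executing with subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```

Passing the command as an argument list, rather than a shell string, means the reason text needs no extra quoting if the commands are later executed with `subprocess.run`. Slurm also offers a `HealthCheckProgram` setting in `slurm.conf` that can run a node-local script periodically, which is another place this kind of logic could live.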

To their credit, the Tensorwave team is always very responsive to our feedback and quick to address issues we raise. We have also seen a general trend of reliability improving over time. Overall, the fact that we have to provide guidance on proper Slurm setup, monitoring, and health checks points to a general lack of experience running multi-tenant clusters at the scale of 8,192 MI325X GPUs or larger. We look forward to collaborating more with Tensorwave over time as they build out more AMD GPU capacity.

---

Other Silver tier providers:
- Together: https://www.clustermax.ai/cloudreview/together
- Lambda: https://www.clustermax.ai/cloudreview/lambda
- Google Cloud (GCP): https://www.clustermax.ai/cloudreview/googlecloud
- Amazon Web Services (AWS): https://www.clustermax.ai/cloudreview/amazonwebservices
- Scaleway: https://www.clustermax.ai/cloudreview/scaleway
- Cirrascale: https://www.clustermax.ai/cloudreview/cirrascale
- GCORE: https://www.clustermax.ai/cloudreview/gcore
- Firmus / Sustainable Metal Cloud (SMC): https://www.clustermax.ai/cloudreview/firmussustainablemetalcloud
- GMO Cloud: https://www.clustermax.ai/cloudreview/gmocloud
- Vultr: https://www.clustermax.ai/cloudreview/vultr
- Voltage Park: https://www.clustermax.ai/cloudreview/voltagepark

Full ClusterMAX 2.0 + 2.1 index: https://www.clustermax.ai/cloudreview
Full LLM dump of all reviews: https://www.clustermax.ai/llms-full.txt