# Firmus / Sustainable Metal Cloud (SMC) (Silver) — ClusterMAX GPU Cloud Review

> Firmus / Sustainable Metal Cloud (SMC) earns a ClusterMAX 2.0 Silver rating from SemiAnalysis.

Firmus is an Australian company that was recently backed by a strategic investment from Nvidia at a $1.9B valuation. Their current ambition is to build a “Stargate for the southern hemisphere,” with a specific focus on next-generation rack-scale systems like the GB300 NVL72 and VR.

Though we believe that the bulk of Firmus’s experience with immersion cooling is misguided, and now wasted, we also believe that this team is one of the few in the industry that has the engineering chops to monitor and maintain the physical layer of these DLC systems effectively. Our review of their current telemetry and failure prediction system for their immersion deployments demonstrates significant attention to detail, and a deep understanding of the physical stack, down to the signal quality and light levels in custom transceivers and optical cables. However, this experience at the lowest physical level can be undermined by a higher-level UX that feels out of touch with customer requirements.

Our testing began with a difficult wrinkle: cluster access is gated behind a mandatory VPN. This is a significant operational bottleneck for teams accustomed to standard cloud workflows with public IPs or streamlined SSH wrappers. While some security-conscious customers (such as international federal agencies for defense, intelligence, and research) may find this acceptable and even prefer isolation at Layer 2, 3, 5, or 7, the general public does not operate this way. The fact that Firmus had no alternative access method prepared was telling for us.

Once connected, our Slurm environment also had some configuration issues. The standard topology.conf file was not configured for topology-aware scheduling, and a simple “srun -N1 --gpus-per-node=8 --pty bash” command took over a minute to execute due to an exceptionally long prolog.
It seems that the Firmus team took some of our previous feedback around health checks to an extreme, filling up the prolog with unnecessary DCGM level 3 checks when level 1 or 2 checks, or just an epilog with HealthCheckProgram configured, would suffice. To their credit, a pre-staged nccl-test script was provided and ran at expected bandwidth.

As mentioned previously, the Firmus monitoring stack is unique, going beyond standard DCGM metrics and feeding ML models to predict component failures before they occur. A “link flap” is formally defined as five events in one hour, which triggers automated diagnostics. Their internal validation suite is exhaustive, running regression tests on spare nodes that include P2P bandwidth tests, GDR copies, small-scale Llama training runs, and NCCL tests to proactively identify GPUs, NVLink, or InfiniBand interconnects that are approaching failure.

[](https://substackcdn.com/image/fetch/$s_!ECPy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11a1344a-e46a-49c1-aeca-da6c023cb9f7_894x392.png)
Source: Firmus Custom Monitoring Dashboard for Immersion Tanks

[](https://substackcdn.com/image/fetch/$s_!Hz_s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff610ff4e-5907-4f66-aaa4-104c67fe15a0_937x454.png)
Source: Firmus Customized Grafana Dashboard, showing relevant GPU Utilization Metrics during a training run

This level of investment in monitoring at the physical layer is how Firmus plans to back up an aggressive “99.94% SLA”, aiming to differentiate itself from competitors by ensuring maximum goodput, something that we have also heard from top-tier providers like CoreWeave and Nebius. Their business model mirrors other major Nvidia clouds, with attractive prospective pricing for their upcoming rack-scale deployments, much of which is made possible by a low power cost in their massive expansion into Tasmania.
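
The “five events in one hour” link-flap rule mentioned above can be read as a sliding-window counter. The sketch below is only an illustration of that reading, not Firmus’s implementation: the class name `LinkFlapDetector`, the link identifier `"ib0"`, and the choice of a sliding (rather than fixed, clock-aligned) window are all assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class LinkFlapDetector:
    """Hypothetical sliding-window detector: flags a link once
    `threshold` events fall within `window` of the newest event."""

    def __init__(self, threshold=5, window=timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        # One queue of event timestamps per link (assumed to arrive in order).
        self._events = defaultdict(deque)

    def record(self, link_id, ts):
        """Record one link event; return True if the link now counts as flapping."""
        events = self._events[link_id]
        events.append(ts)
        # Discard events older than the window relative to the newest event.
        while ts - events[0] > self.window:
            events.popleft()
        return len(events) >= self.threshold


# Usage: five events ten minutes apart cross the threshold on the fifth event.
detector = LinkFlapDetector()
start = datetime(2025, 1, 1)
for i in range(5):
    flapping = detector.record("ib0", start + timedelta(minutes=10 * i))
    print(i + 1, flapping)
```

In a real pipeline the `True` result would be the point at which automated diagnostics are kicked off; how Firmus wires that step is not described in this review.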
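
To put the “99.94% SLA” figure in context, it helps to convert the percentage into a downtime budget. The review does not say how Firmus measures the SLA (per node, per cluster, or over what billing period), so the 30-day and 365-day periods below are assumptions chosen purely for illustration, as is the helper name `downtime_budget_minutes`.

```python
def downtime_budget_minutes(availability_pct, period_hours):
    """Minutes of downtime permitted over a period at a given availability."""
    return (1 - availability_pct / 100) * period_hours * 60


# Assumed measurement periods: a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.94, 30 * 24)
yearly = downtime_budget_minutes(99.94, 365 * 24)

print(f"{monthly:.1f} minutes per 30-day month")  # 25.9 minutes per 30-day month
print(f"{yearly / 60:.1f} hours per year")        # 5.3 hours per year
```

Under these assumptions the budget is roughly 26 minutes per month, which is why early prediction of component failures matters more than fast repair after the fact.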

We encourage Firmus to double down on their focus on operational excellence from the physical layer to the orchestration layer (i.e., properly configured Slurm and Kubernetes clusters) without getting distracted by fancy PaaS and SaaS applications that the vendor-du-jour is pitching.

---

Other Silver tier providers:

- Together: https://www.clustermax.ai/cloudreview/together
- Lambda: https://www.clustermax.ai/cloudreview/lambda
- Google Cloud (GCP): https://www.clustermax.ai/cloudreview/googlecloud
- Amazon Web Services (AWS): https://www.clustermax.ai/cloudreview/amazonwebservices
- Scaleway: https://www.clustermax.ai/cloudreview/scaleway
- Cirrascale: https://www.clustermax.ai/cloudreview/cirrascale
- GCORE: https://www.clustermax.ai/cloudreview/gcore
- GMO Cloud: https://www.clustermax.ai/cloudreview/gmocloud
- Vultr: https://www.clustermax.ai/cloudreview/vultr
- Voltage Park: https://www.clustermax.ai/cloudreview/voltagepark
- Tensorwave: https://www.clustermax.ai/cloudreview/tensorwave

Full ClusterMAX 2.0 + 2.1 index: https://www.clustermax.ai/cloudreview

Full LLM dump of all reviews: https://www.clustermax.ai/llms-full.txt