Since March, Crusoe has been hard at work expanding their datacenter footprint, while trying to keep the Neocloud business alive.
Crusoe has announced:
- A partnership with Oracle to develop Abilene, the flagship Stargate project for OpenAI, at over 1.2GW, worth $15B in joint venture funding.
- An order of 29 LM2500XPRESS aeroderivative gas turbine packages from GE Vernova, enough for over 1GW of power.
- A deployment of AMD MI355X GPUs, despite counting Nvidia as a key investor in their $600M Series D fundraise.
- A 1.8GW datacenter in Wyoming, with a design to scale up to 10GW: https://www.crusoe.ai/resources/newsroom/crusoe-and-tallgrass-announce-ai-data-center-in-wyoming.
- Expansion of their Iceland facility with atNorth, and a $175M credit facility for this project.
- A smaller, 12MW facility in Norway with Polar, and planned expansion to 52MW.
- A $750M credit facility from Brookfield.
- A $225M credit facility from Upper90.
- “Prometheus,” a 150MW facility located in the Permian Basin in West Texas.
Interestingly, Prometheus is the first public debut of Crusoe’s Digital Flare Mitigation (DFM) technology. During oil extraction at sites like the Permian Basin, natural gas is a waste product that is typically flared. With DFM, Crusoe installs mobile datacenter units onsite and diverts that waste gas to generators that power them. These DFM units have been announced as “Crusoe Spark”, and now include all the infrastructure required to host B200s.
Source: Crusoe Spark launches (via Crusoe on YouTube)
After all these announcements, Crusoe is left with a claimed 3.4GW of datacenter footprint, some of which is already generating revenue.
So yeah, lots going on.
As for the actual customer experience on Crusoe: around six months ago, when we started testing Slurm on Crusoe, they had just launched their fully managed Slurm solution called “Auto Clusters”. That offering has since come and gone, and the focus is now a Slurm-on-Kubernetes experience. Unfortunately, the new Slurm-on-Kubernetes experience is in its early days and is not usable out of the box.
Starting up a cluster via the Crusoe CLI is simple, avoiding complicated Terraform scripts and some of the complexity of a web UI.
However, when a simple CLI approach is used, we expect reasonable defaults. Crusoe claims to have developed their Slurm-on-Kubernetes offering in-house, while taking inspiration from Slinky. Unfortunately, the login pod was missing vim, nano, git, python, and sudo permissions. We gave some recommendations on how to take less inspiration from open-source Slinky and make the cluster usable out of the box. The Slurm-on-Kubernetes (SonK) offering also doesn’t support partitions, RBAC, or SSO integration, making it basically unusable for a research lab beyond the scale of about 10 researchers.
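A quick way to run this kind of out-of-the-box check on any login node is to probe the PATH for the tools a researcher expects. The following is a minimal sketch, not part of any Crusoe tooling: the function name `missing_tools` is ours, and the tool list simply mirrors the items we found missing.

```python
import shutil

# Tools we found missing on the login pod; adjust the list as needed.
EXPECTED_TOOLS = ["vim", "nano", "git", "python", "sudo"]


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing from PATH:", ", ".join(missing))
    else:
        print("all expected tools found")
```

Note that finding a `sudo` binary on PATH says nothing about whether the user actually has sudo permissions, so that part still needs a manual check.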
In addition, when provisioning a Kubernetes cluster without Slurm for our testing, we had a lot of extras to set up. A Crusoe CMK cluster does not include a default ReadWriteMany StorageClass, so workloads with a persistent volume claim cannot be deployed out of the box. We had to go through many extra configuration steps in the console to figure out how to configure this storage class.
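To make the storage gap concrete, here is a rough sketch of how one might check for a default StorageClass and build a ReadWriteMany claim. It works on plain dictionaries shaped like the output of `kubectl get storageclass -o json`, and relies on the standard Kubernetes annotation `storageclass.kubernetes.io/is-default-class`. The class name `example-sc`, the claim name, and the size are hypothetical, and whether a given class can actually serve ReadWriteMany depends on its provisioner, which this sketch does not verify.

```python
import json

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(sc_list):
    """Names of StorageClasses marked default in a `kubectl get sc -o json` dump."""
    names = []
    for item in sc_list.get("items", []):
        annotations = item.get("metadata", {}).get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            names.append(item["metadata"]["name"])
    return names


def rwx_claim(name, size, storage_class=None):
    """Build a PersistentVolumeClaim manifest that requests ReadWriteMany."""
    spec = {
        "accessModes": ["ReadWriteMany"],
        "resources": {"requests": {"storage": size}},
    }
    # Without a default class, the claim must name a class explicitly.
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


# Hypothetical cluster dump: one class exists, but none is marked default.
example = {"items": [{"metadata": {"name": "example-sc", "annotations": {}}}]}
print(default_storage_classes(example))
print(json.dumps(rwx_claim("scratch", "100Gi", "example-sc"), indent=2))
```

When the first function returns an empty list, any claim that omits `storageClassName` will sit unbound, which is the behavior a user hits when no default is configured.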
During our testing, we also encountered several performance and reliability issues on Slurm, on Kubernetes, and on a standalone machine. We repeatedly saw NVML driver mismatch errors inside individual Docker containers, indicating potential image or driver management instability. We suspect this is due to Crusoe’s use of cloud-hypervisor, and their insistence on building all their infrastructure, including GB200 NVL72, with VMs.
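A mismatch of this kind usually means the NVIDIA kernel module on the host and the user-space NVML library visible inside the container are at different versions. The sketch below shows one way to compare the two. It assumes the typical Linux layout, where `/proc/driver/nvidia/version` contains a "Kernel Module" line and the library is installed as `libnvidia-ml.so.<version>`; the function names and the sample line are ours, for illustration only.

```python
import re

# Driver versions look like 535.104.05 (sometimes only two components).
VERSION = r"(\d+\.\d+(?:\.\d+)?)"


def kernel_module_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version text."""
    match = re.search(r"Kernel Module.*?\s" + VERSION + r"\s", proc_text)
    return match.group(1) if match else None


def library_version(filename):
    """Extract the version suffix from a libnvidia-ml.so.<version> filename."""
    match = re.search(r"libnvidia-ml\.so\." + VERSION + r"$", filename)
    return match.group(1) if match else None


def versions_match(proc_text, lib_filename):
    """True only if both versions were found and are identical."""
    kernel = kernel_module_version(proc_text)
    library = library_version(lib_filename)
    return kernel is not None and kernel == library


# Illustrative sample line, not captured from a Crusoe machine.
sample = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.104.05  "
          "Sat Aug 19 01:15:15 UTC 2023")
print(versions_match(sample, "libnvidia-ml.so.535.104.05"))
print(versions_match(sample, "libnvidia-ml.so.550.54.14"))
```

Running a comparison like this inside each container at startup is a cheap way to tell a stale image apart from a host driver that changed underneath it.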
On the networking side, while PKeys for InfiniBand partitioning were integrated, using them through the console was not intuitive. We have also had challenges with shared filesystems randomly unmounting, and with requirements to deploy OS drives and configure RAID settings manually (with the requisite footguns).
In conversations with Crusoe users about reliability at scale, the experience has been hit-or-miss. Some have had good experiences, but everyone who tested clusters in Crusoe’s Iceland facility prior to March 2025 seems to have had a common experience: lots of link flaps and random filesystem unmounts. Crusoe ended up having to use a “clicker” to clean 20,000 fiber ends that were full of dust and other debris. Some people have said that the debris was volcanic ash.
We found that in November 2023, atNorth’s ICE02 datacenter in Iceland started publishing status updates regarding increased seismic activity and volcanic uplift near Mt. Þorbjörn on the Reykjanes Peninsula. The datacenter is about 35km away from this volcano.
Source: https://status.edis.global/notices/y2d8vjath5kvmq8t-iceland-potential-volcanic-eruption
Source: Google Maps. Checking in to see how long it would take an atNorth datacenter technician to visit an active volcano on their lunch break
It is our understanding that this datacenter Crusoe now calls home has continued to experience significant seismic activity and air quality concerns, leading to some more hits on YouTube videos like this one from Iceland.
Overall, Crusoe is clearly executing on an ambitious strategy, securing massive power capacity and datacenter real estate. They have already pivoted once from crypto mining to AI cloud, and seem to be in the process of another pivot from cloud provider to datacenter infrastructure provider.
However, Crusoe is at risk of being downgraded to ClusterMAX Silver because many of their top individual contributor engineers are quitting, and the culture in their cloud division is beginning to resemble big tech. There are too many middle managers across the organization, especially in engineering. This has caused incredibly slow-moving releases, such as their Auto Clusters feature, leaving us with concerns about the future of Crusoe’s public cloud offerings. Chase needs to do a rapid course correction if he doesn’t want to lose all of his 10x engineers and eventually the Neocloud business along with them.