CUDO Compute was founded in 2017 and, like many others on our list, began its journey as a crypto miner, albeit at a modest scale. CUDO now operates a global partner network of datacenters and recently announced a partnership with CanopyCloud.io to expand that network globally.
Our hands-on experience started with their web console, which offers a highly configurable, project-based approach to organizing resources across global datacenters. Today, interconnected nodes are available only in Dallas, while 8-way bare metal servers are available in Paris, Stockholm, or Kristiansand, Norway. In total, GPU VMs are available in 6 of 10 global datacenters, with the other 4 providing CPU-only VMs. We decided to grab our first-ever African GPU VM via CUDO’s datacenter in Centurion, South Africa.
Spinning up a VM on CUDO Compute
Spinning up a virtual machine was straightforward, with the provisioning process taking under 4 minutes and the console providing easy SSH key management. Usefully, we could configure a shared disk in the datacenter location we were using, so local data can be reused across VM spin-up and spin-down cycles. However, the 200GB disk we deployed is not a filesystem volume, and it is neither mounted nor visible to the OS image by default. We would prefer a shared filesystem volume that could be mounted to multiple machines, which would require similar underlying functionality on the server side to deliver. We also found it unfortunate that we were logging into the VM as a shared root user, instead of passing RBAC-enforced auth credentials from the console to the underlying VMs.
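To make the disk usable, it has to be given a filesystem once and then mounted on each boot. The following is a minimal sketch of that step, not CUDO's documented procedure. It assumes the disk appears in the VM as an unformatted block device; the device path /dev/vdb, the mount point /mnt/shared, and the ext4 choice are all hypothetical placeholders. Because the disk is meant to persist across VM cycles, the sketch formats it only when blkid reports no existing filesystem.

```python
import subprocess


def has_filesystem(device: str) -> bool:
    """Return True if blkid reports a filesystem type on the device."""
    result = subprocess.run(
        ["blkid", "-o", "value", "-s", "TYPE", device],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and bool(result.stdout.strip())


def mount_plan(device: str, mount_point: str, formatted: bool,
               fs_type: str = "ext4") -> list[list[str]]:
    """Build the commands needed to make the disk visible to the OS.

    mkfs is included only for an unformatted disk, so data written
    during an earlier VM cycle is never wiped.
    """
    plan = []
    if not formatted:
        plan.append(["mkfs", "-t", fs_type, device])
    plan.append(["mkdir", "-p", mount_point])
    plan.append(["mount", device, mount_point])
    return plan


def attach_disk(device: str, mount_point: str) -> None:
    """Run the plan; requires root, which is the default login here."""
    for cmd in mount_plan(device, mount_point, has_filesystem(device)):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical device path and mount point: confirm with lsblk first.
    for cmd in mount_plan("/dev/vdb", "/mnt/shared", formatted=False):
        print(" ".join(cmd))
```

Run as written, the script only prints the planned commands. Calling attach_disk("/dev/vdb", "/mnt/shared") would execute them, so the device name should be verified with lsblk beforehand.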
Furthermore, the base Ubuntu image was not AI-ready out of the box. The driver version and NVIDIA Container Toolkit version provided were significantly out of date, and therefore a security concern. The OS image was also missing pip/pip3, and the python3 binary was not aliased to python, requiring extra steps to set up a basic virtual environment for development. Crucially, CUDO Compute does maintain ISO 27001 compliance with underlying datacenters, a key security attestation that many similar providers lack.
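The extra Python setup steps can be checked in one pass. The sketch below is our own illustration, not something CUDO provides: it reports which Ubuntu packages would close the gaps we hit. The mapping to python3-pip, python3-venv, and python-is-python3 assumes a stock Ubuntu image, and it treats a missing ensurepip module as a sign that python3-venv is not installed. It needs only python3, which the image does include.

```python
import importlib.util
import shutil


def missing_apt_packages(which=shutil.which, has_ensurepip=None) -> list[str]:
    """List Ubuntu packages needed before a virtual environment works.

    `which` and `has_ensurepip` can be injected so the check can be
    exercised without touching the real system.
    """
    if has_ensurepip is None:
        has_ensurepip = importlib.util.find_spec("ensurepip") is not None

    needed = []
    if which("pip3") is None:
        needed.append("python3-pip")        # provides pip/pip3
    if not has_ensurepip:
        needed.append("python3-venv")       # lets `python3 -m venv` add pip
    if which("python") is None:
        needed.append("python-is-python3")  # aliases python to python3
    return needed


if __name__ == "__main__":
    packages = missing_apt_packages()
    if packages:
        print("apt-get install -y " + " ".join(packages))
    else:
        print("Python tooling looks complete.")
```

Once the reported packages are installed, python3 -m venv .venv creates the environment in the usual way.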
Overall, CUDO Compute has a promising foundation with a flexible, easy-to-use console and global reach. However, the platform is not ready for large-scale training and inference because it lacks managed Slurm or Kubernetes services, shared file storage, monitoring dashboards, health checks, and proactive enterprise support options. We recommend that CUDO focus on refining their base machine images for ease of use, consider deploying shared file storage, and continue building experience at the orchestration layer for Slurm and Kubernetes clusters in the future.