ClusterMAX 2.0: Silver

Lambda

Adequate offering with noticeable gaps compared to Gold or Platinum. Room for improvement.

By Jordan Nanos, Daniel Nishball, Dylan Patel

Lambda Quick Stats

ClusterMAX Tier: Silver (3 / 5)
Source Rating Cycle: ClusterMAX 2.0
GPUs Offered: Not detailed in review
Slurm Support: Discussed in review
Kubernetes Support: Discussed in review
SOC 2 Mentioned: Yes
NCCL Benchmarks: In review
Last Updated: Nov 06, 2025


Lambda is another gold-tier candidate that unfortunately comes in at #1 in the wrong category: customer complaints. Lambda started out in 2012 building facial recognition software, then pivoted to reselling Supermicro GPU workstations and servers, and eventually on-premises clusters. Today, they appear to be 100% focused on their “Superintelligence Cloud” and are working to shed their legacy on-prem server and workstation business. Their recent announcement with Microsoft suggests they will provide multiple billions of dollars’ worth of capacity across tens of thousands of Nvidia GPUs.

A recurring theme when we talk to users is that Lambda unfortunately seems to be trying to do everything for everyone. While the company has deep experience in building dedicated HPC clusters, this has not yet translated into a polished, user-friendly cloud console or cluster monitoring experience. Their product portfolio feels fragmented, spanning new mslurm, old mslurm, new mk8s, old mk8s, private cloud, 1-Click Clusters, and on-demand instances.

Notably, 1-Click Clusters aren’t really one click, as you need to wait for approval. It’s more of a 1-Click-if-approved-and-paid-for-then-you-can-have-it-Cluster.

Source: Lambda Labs

For users who want on-demand machines instantly, Lambda is generally considered the top-tier on-demand provider, with the largest fleet of GPUs available. However, in our recent experience, Lambda is suffering from its own on-demand success. We are generally met with greyed-out screens showing that capacity is sold out:

Source: Lambda Labs: trying to get an on-demand GPU instance from Lambda

Also, for a hot minute, Lambda appeared to be getting into the serverless inference API endpoint business, which would have put them in direct competition with some of their largest customers. But that is no longer the case:

Source: Lambda Labs

Overall, we like the focus. Lambda has pivoted to concentrate on its 1-Click Cluster (1CC) business and on “big game hunting” for large customers.

During our testing, we evaluated both their new (self-managed) and old (Rancher-based) Kubernetes offerings, as well as their newly available Slurm offering. None of these is UI- or CLI-driven; each requires a Lambda engineer to set up the cluster for you.

Lambda’s Kubernetes product feels like an early-stage offering, marked by technical debt and a challenging user experience. While the current product does not use Rancher, the public documentation still references it, causing initial confusion. The user experience for inference workloads is particularly lacking. Clusters do not come with a default public IP solution (like MetalLB or an external LoadBalancer), and setting up public-facing inference services is complex, poorly documented, and requires significant manual configuration. This reflects a platform built for training workloads, not inference. While documentation exists for a simple, single-GPU vLLM deployment, there are no examples for multi-GPU, multi-node, or autoscaling inference workloads.
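Because clusters ship without a default public IP solution, one common pattern on bare-metal Kubernetes is to install MetalLB and expose the inference server through a Service of type LoadBalancer. The sketch below builds those manifests as Python dicts and prints them as a JSON `List`, which `kubectl apply -f -` accepts. It assumes MetalLB v0.13 or later (which configures addresses through the `IPAddressPool` and `L2Advertisement` resources) is already installed and that your provider actually routes the chosen addresses to your nodes. The address range, names, namespace, and the `app=vllm` pod label are hypothetical placeholders, not Lambda defaults.

```python
import json

# Hypothetical placeholders: use an address range your provider actually routes to you.
POOL_NAME = "inference-pool"
ADDRESS_RANGE = "203.0.113.10-203.0.113.20"  # TEST-NET-3 documentation range
METALLB_NAMESPACE = "metallb-system"


def metallb_manifests(pool_name: str, address_range: str) -> list[dict]:
    """MetalLB (v0.13+) resources: an address pool plus a layer-2 advertisement for it."""
    pool = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": pool_name, "namespace": METALLB_NAMESPACE},
        "spec": {"addresses": [address_range]},
    }
    advert = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": f"{pool_name}-l2", "namespace": METALLB_NAMESPACE},
        "spec": {"ipAddressPools": [pool_name]},
    }
    return [pool, advert]


def vllm_service(name: str, namespace: str, port: int = 8000) -> dict:
    """LoadBalancer Service in front of pods labelled app=<name> (hypothetical label).

    8000 is the default port of vLLM's OpenAI-compatible server.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},
            "ports": [{"name": "http", "port": 80, "targetPort": port, "protocol": "TCP"}],
        },
    }


def manifest_list(items: list[dict]) -> dict:
    """Wrap several objects in a v1 List so a single `kubectl apply -f -` applies them all."""
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    objects = metallb_manifests(POOL_NAME, ADDRESS_RANGE) + [vllm_service("vllm", "default")]
    print(json.dumps(manifest_list(objects), indent=2))
```

Usage would be something like `python3 make_manifests.py | kubectl apply -f -` (the filename is hypothetical). Multi-node or autoscaled inference would still need more than this, which is the gap described above.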

For monitoring, Lambda uses a mix of open-source tools, including LeptonAI’s gpud for GPU device management and node-problem-detector for health checks, but these are not seamlessly integrated into the monitoring dashboards for either the new or old mk8s product. The dashboards are easy to access, but they show no metrics until you install an agent that is undocumented and, upon further inspection, still in development.

Lambda’s Slurm offering is a more recent addition, and onboarding was fraught with issues. The initial setup was cumbersome: SSH keys were not correctly provisioned on the cluster, and the default home directory was not shared across nodes, so data had to be moved manually. Creating new user accounts was a headache, requiring workarounds such as unsetting an environment variable (XDG_DATA_HOME) to function correctly.
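One way to catch an unshared home directory before moving data around is to check, on every node, which filesystem the home directory lives on. The sketch below parses the Linux-only `/proc/mounts` format and reports whether home sits on a network filesystem. The set of "shared" filesystem types is our assumption, and a network mount alone does not prove that every node mounts the same export, so compare the printed mount points across nodes as well.

```python
import os

# Filesystem types that usually mean a directory is shared across nodes.
# Assumption: illustrative, not exhaustive; VAST exports are commonly mounted over NFS.
SHARED_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs", "beegfs", "cifs"}


def mount_for(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) for the longest mount point that contains `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point, fs type, options, ...
    Octal escapes for spaces in mount points are not decoded in this sketch.
    """
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # "/" contains every absolute path; otherwise require a path-component boundary
        # so that /home does not match /homework.
        contains = mount_point == "/" or path == mount_point or path.startswith(mount_point + "/")
        # ">=" lets later entries win, matching how an over-mount hides an earlier mount.
        if contains and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


def home_is_shared(mounts_text: str, home: str) -> bool:
    """True if `home` lives on one of the filesystem types in SHARED_FS_TYPES."""
    return mount_for(home, mounts_text)[1] in SHARED_FS_TYPES


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    home = os.path.realpath(os.path.expanduser("~"))
    with open("/proc/mounts") as f:
        mounts = f.read()
    mount_point, fs_type = mount_for(home, mounts)
    shared = home_is_shared(mounts, home)
    print(f"{os.uname().nodename}: {home} on {mount_point} ({fs_type}), shared={shared}")
```

To run it once per node, something like `srun --nodes=2 --ntasks-per-node=1 python3 check_home.py` should work, where `check_home.py` is a hypothetical filename.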

To their credit, once these initial hurdles were overcome, the cluster’s performance was strong. We observed the expected allreduce, allgather, and alltoall bandwidth in nccl-tests and achieved the expected MFU on an example torchtitan training workload. Lambda also provides some useful, albeit hard-to-find, tooling. For example, a welcome message (which was invisible in some SSH clients, such as Cursor or VS Code) contained custom instructions for a grafana-access command to quickly view performance metrics.
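For readers reproducing these checks, two pieces of arithmetic sit behind “expected bandwidth” and MFU. nccl-tests reports an algorithm bandwidth (bytes divided by time) and a bus bandwidth that rescales it per collective so that results are comparable with link speed. MFU compares achieved model FLOPs with hardware peak. The sketch below uses the correction factors described in the nccl-tests performance notes, plus the common approximation of 6 × parameters FLOPs per training token for dense transformers. The review does not say how its MFU was computed, and torchtitan may count FLOPs differently (for example, by including attention), so treat this as an approximation.

```python
def bus_bw_factor(collective: str, n_ranks: int) -> float:
    """Factor that converts nccl-tests algbw into busbw for `n_ranks` participants."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be >= 1")
    if collective == "allreduce":
        return 2 * (n_ranks - 1) / n_ranks
    if collective in ("allgather", "reducescatter", "alltoall"):
        return (n_ranks - 1) / n_ranks
    if collective in ("broadcast", "reduce"):
        return 1.0
    raise ValueError(f"unknown collective: {collective}")


def bus_bw_gbps(collective: str, size_bytes: float, time_s: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s (decimal gigabytes, as nccl-tests reports)."""
    alg_bw = size_bytes / time_s / 1e9
    return alg_bw * bus_bw_factor(collective, n_ranks)


def mfu(params: float, tokens_per_s: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization under the 6 * params FLOPs-per-token approximation.

    Covers forward plus backward for a dense transformer; attention FLOPs are ignored.
    `peak_flops_per_gpu` should be the dense peak for the precision actually used.
    """
    achieved_flops = 6 * params * tokens_per_s
    return achieved_flops / (n_gpus * peak_flops_per_gpu)
```

For example, an 8 GB allreduce across 8 ranks that takes 50 ms corresponds to 160 GB/s algbw and 280 GB/s busbw. The same busbw figure is what you would compare against the per-GPU network or NVLink bandwidth.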

Lambda’s approach to reliability on the Slurm cluster included a custom dcgm-status script, which can be run on demand:

The script is also scheduled to run on a regular cadence in a low-priority, “preemptible” partition:

Source: our Lambda test cluster

Source: our Lambda test cluster
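The review does not reproduce the contents of Lambda’s dcgm-status script. As a hypothetical illustration of the same pattern (a GPU health check submitted as a low-priority job on every node), the sketch below builds one `sbatch` command per node that runs NVIDIA DCGM’s `dcgmi diag` diagnostic. The partition name, the node names, and the choice of `dcgmi diag` as the check are our assumptions, not Lambda’s implementation.

```python
import shlex


def health_check_cmd(node: str, partition: str = "preemptible", level: int = 1) -> list[str]:
    """Build an sbatch command that runs a DCGM diagnostic on a single node.

    `partition` is a hypothetical name standing in for a low-priority partition.
    `dcgmi diag -r` levels 1, 2 and 3 are the quick, medium and long runs.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3 in this sketch")
    return [
        "sbatch",
        f"--partition={partition}",
        f"--nodelist={node}",
        "--nodes=1",
        "--exclusive",                    # keep other jobs off the node while it is tested
        f"--job-name=dcgm-diag-{node}",
        f"--wrap=dcgmi diag -r {level}",  # sbatch wraps this string in a shell script
    ]


def sweep(nodes: list[str], **kwargs) -> list[str]:
    """One shell-quoted sbatch line per node, e.g. for the command of a cron or scrontab entry."""
    return [shlex.join(health_check_cmd(node, **kwargs)) for node in nodes]


if __name__ == "__main__":
    # Hypothetical node names; in practice, expand them with `scontrol show hostnames`.
    for line in sweep(["gpu-001", "gpu-002"]):
        print(line)
```

Submitting to a preemptible partition means that production jobs can evict the check. The trade-off is that nodes get tested when they would otherwise sit idle, instead of being pulled out of service.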

We were impressed by Lambda’s commitment to developing comprehensive active and passive health checks, and we believe they are well on their way to addressing their reliability challenges and earning the battle scars necessary to run NVL72 rack-scale systems at scale.

With that said, some of the access issues we encountered point to broader operational challenges at Lambda. Their cloud console (though not our cluster) experienced outages during our brief testing window.

Source: Lambda Labs

Source: Lambda Labs

Internally, there appears to be a general degree of disorganization. When asked about a true “cloud console” experience, Lambda acknowledged that the team’s background is primarily in traditional HPC cluster deployment, not in building scalable, self-service cloud infrastructure. We encourage Lambda to prioritize the cloud experience as they simplify their portfolio around their mslurm and mk8s offerings.

On the positive side, Lambda is actively working on improving its platform based on our feedback. They have a compliance team addressing SOC 2 Type II requirements for individual sites, and are working to implement both SHARP and InfiniBand security keys for multi-tenant isolation, following recent Nvidia recommendations (and, likely, the onboarding of Nvidia as a customer with a $1.5B contract). Their storage offerings primarily focus on VAST, with future S3-compatible offerings currently in development.

Overall, Lambda is a strong provider with deep hardware expertise, massive capacity, and big plans for the future. However, their public cloud product feels immature, and engaging with the team feels chaotic. We encourage Lambda to continue to work on translating their HPC hardware prowess into a stable, easy-to-use, and reliable cloud service.

Lambda GPU Cloud FAQ

What tier is Lambda in ClusterMAX?

Lambda is rated Silver tier in the ClusterMAX 2.0 GPU cloud rating system by SemiAnalysis (with the ClusterMAX 2.1 Update applied April 2026). Silver is a mid-tier rating in the ClusterMAX rating system, describing an adequate offering with noticeable gaps compared to Gold or Platinum and room for improvement.

Is Lambda SOC 2 Type II certified?

Lambda's review on ClusterMAX explicitly discusses SOC 2 posture: it notes that Lambda has a compliance team addressing SOC 2 Type II requirements for individual sites. See the Security section of the Lambda review for the current SOC 2 status, scope of the report, and any related attestations (ISO 27001, HIPAA) tracked by SemiAnalysis.

Does Lambda support Slurm?

Yes. Lambda offers Slurm (mslurm, in both new and old versions), set up for you by a Lambda engineer rather than through a UI or CLI. In SemiAnalysis' hands-on testing, onboarding was cumbersome (SSH keys were not provisioned correctly and home directories were not shared across nodes), but cluster performance was strong once those issues were resolved. See the Orchestration section of the review for details.

Does Lambda support Kubernetes?

Yes. Lambda offers Kubernetes (mk8s) in a new self-managed version and an older Rancher-based version. Both require a Lambda engineer to set up the cluster, and clusters do not ship with a default public IP solution such as MetalLB, which makes inference deployments harder. See the Orchestration and Storage sections of the review for details.

What GPUs does Lambda offer?

The Lambda ClusterMAX review does not detail Lambda's GPU inventory, but notes that on-demand capacity is frequently sold out. SemiAnalysis tracks H100, H200, B200, GB200 NVL72, GB300 NVL72, and MI300X / MI355X availability across all 85 providers in the ClusterMAX 2.0 + 2.1 cohort.

What is the NCCL all-reduce performance on Lambda?

In SemiAnalysis' hands-on testing, Lambda's allreduce, allgather, and alltoall bandwidth in nccl-tests met expectations. NCCL bandwidth (in GB/s) is one of the most important indicators of training cluster health; see the Networking section of the review for how Lambda compares to the ClusterMAX cohort.

How does Lambda compare to CoreWeave?

CoreWeave is the only ClusterMAX Platinum provider, while Lambda is rated Silver. The Lambda review documents the specific gaps versus CoreWeave across the 10 ClusterMAX criteria (Security, Lifecycle, Orchestration, Storage, Networking, Reliability, Monitoring, Pricing, Partnerships, Availability). See the Lambda review body and the ClusterMAX /criteria page for the full comparison framework.

Is Lambda recommended for LLM training?

Lambda is in a ClusterMAX tier that SemiAnalysis directly recommends for production GPU workloads (Platinum / Gold / Silver / Bronze). Based on hands-on cluster testing, the review finds the platform is built primarily for training: collective bandwidth and MFU on a torchtitan workload met expectations, while public-facing inference deployments require significant manual configuration.
