ClusterMAX 2.0 Bronze

Lightning.ai

Meets minimum criteria. Last category we directly recommend; often with inconsistent support or networking gaps.

By Jordan Nanos, Daniel Nishball, Dylan Patel
Published

Lightning.ai Quick Stats

ClusterMAX Tier: Bronze (2 / 5)
Source Rating Cycle: ClusterMAX 2.0
GPUs Offered: H200
Slurm Support: Discussed in review
Kubernetes Support: Discussed in review
SOC 2 Mentioned: Not flagged
NCCL Benchmarks: Not in review
Last Updated: Nov 06, 2025


Lightning.ai (aka Lightning Cloud) is a broker for GPU machines in neoclouds and hyperscalers that provides useful MLOps features on top. The founding story of Lightning Cloud begins with the development of PyTorch Lightning, an open-source framework that organizes and simplifies boilerplate PyTorch code such as the training loop, logging, checkpointing, and distributed training. The lightning git repo appears to be the main top-of-funnel sales channel for Lightning Cloud.

Fast forward to today, with LLMs on the rise, there is a split in the market. Older frameworks like NVIDIA NeMo use Lightning under the hood, while new frameworks that we use in our testing such as torchtitan, verifiers and Megatron-LM do not. The open source pytorch-lightning and lightning packages are still growing rapidly:

Source: Lightning.ai, data from pypi

Functionally, the Lightning Cloud product offers a simple way to track who’s using what across multiple clouds. We had a chance to test the Lightning Studio, which provides access to GPUs in a browser (VSCode, Jupyter notebook) or remote SSH (VSCode, Cursor, Windsurf, etc). Users can also submit batch jobs and “mmt” (multi-machine training) jobs to individual machines or clusters that they get access to on demand. Our testing of clusters is coming soon.

Source: our lightning.ai homepage

Notably, these multi-GPU studios, batch jobs, and mmt training jobs are restricted to users on a Pro, Teams or Enterprise Custom payment tier. Lightning is the only neocloud we have seen charging a per-seat price, and translating that into GPU-hrs behind the scenes on clusters that they manage for the customer.

Source: lightning.ai/pricing

Interestingly, there is an easy way to attach/detach GPUs to existing “studios” (i.e. notebooks or remote shells) and auto-sleep them if unused. This means that users only pay for what they use. Lightning also forecasts the wait times associated with spinning up a GPU from a given provider, such as AWS, Google, Lambda, Voltage Park, or Nebius. The worst wait time is for an 8x H200 machine in AWS, estimated at 3 hours. Unfortunately, despite what the website says, there are no GPUs available from NScale.
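To make the "only pay for what they use" point concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Lightning's billing logic: the hourly rate, the idle timeout, and the usage pattern are all hypothetical placeholders, and the model simply assumes each session bills its active time plus one idle timeout before the studio goes to sleep.

```python
# Hypothetical numbers throughout; none of these come from Lightning's pricing.
HOURLY_RATE_USD = 3.0        # assumed price of one GPU studio per hour
HOURS_PER_WEEK = 24 * 7


def billed_hours(active_hours: float, idle_timeout_hours: float, sessions: int) -> float:
    """Hours billed in a week when a studio auto-sleeps after an idle timeout.

    Assumption: every session is billed for its active time plus one idle
    timeout window before the auto-sleep kicks in.
    """
    return active_hours + sessions * idle_timeout_hours


# Studio left running all week with no auto-sleep.
always_on_cost = HOURS_PER_WEEK * HOURLY_RATE_USD

# Ten sessions totalling 20 active hours, sleeping after 15 idle minutes.
auto_sleep_cost = billed_hours(
    active_hours=20, idle_timeout_hours=0.25, sessions=10
) * HOURLY_RATE_USD

print(f"always on:  ${always_on_cost:.2f}")
print(f"auto-sleep: ${auto_sleep_cost:.2f}")
```

Under these assumed inputs the gap is large because the studio is idle for most of the week; the savings shrink as utilization approaches 100%.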

Using a VSCode notebook in Lightning.ai

Another piece that jumped out during testing is that notebooks have full CLI access including docker, meaning the notebook is running directly on a VM under the hood. This leaves users with full flexibility in the environment.

Overall we have our doubts about the utility of remote developer environments where cluster access is abstracted away from users, especially at the high end of the market. The largest buyers of GPU compute do not have a problem spinning up a notebook on kubernetes with a simple manifest.yaml, or accessing a single machine via srun -N1 --gpus-per-node=8 --pty bash in a slurm cluster.
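For readers less familiar with the two workflows mentioned above, the following Python sketch builds both: a minimal Pod manifest for a GPU notebook and the interactive srun command. The pod name, container image, and Jupyter launch command are hypothetical placeholders; the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed on the cluster. Treat it as an illustration of how little is involved, not as a production manifest.

```python
import json
import shlex
from typing import List


def gpu_notebook_pod(name: str, image: str, gpus: int = 8) -> dict:
    """Return a minimal Pod manifest requesting `gpus` GPUs on one node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "notebook",
                    "image": image,  # hypothetical image, supply your own
                    "command": ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser"],
                    "ports": [{"containerPort": 8888}],
                    # Resource name exposed by the NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


def srun_interactive(nodes: int = 1, gpus_per_node: int = 8) -> List[str]:
    """Return the argv for an interactive shell on GPU nodes under Slurm."""
    return ["srun", f"-N{nodes}", f"--gpus-per-node={gpus_per_node}", "--pty", "bash"]


manifest = gpu_notebook_pod("gpu-notebook", "example.com/jupyter-gpu:latest")
print(json.dumps(manifest, indent=2))
print(shlex.join(srun_interactive()))
```

The manifest is emitted as JSON rather than YAML to avoid a third-party dependency; kubectl accepts JSON manifests, so the output can be saved to a file and applied with kubectl apply -f.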

We find it hard to see a path forward for Lightning Cloud if the industry moves beyond the lightning framework and the GPU marketplace business continues to focus on taking a margin on top of expensive hyperscalers, with no third party compute. As for the ClusterMAX rating system, we look forward to testing Lightning Cloud’s mmt training and kubernetes in the future. We encourage Lightning to consider building a slurm offering, adding monitoring dashboards for underlying cluster health that integrate with job logs and performance profiling, adding integration with active/passive health checks on clusters, and offering customization options for high performance storage and networking.

Lightning.ai GPU Cloud FAQ

What tier is Lightning.ai in ClusterMAX?

Lightning.ai is rated Bronze tier in the ClusterMAX 2.0 GPU cloud rating system by SemiAnalysis (with the ClusterMAX 2.1 Update applied April 2026). Bronze is the lowest tier ClusterMAX directly recommends: providers in it meet the minimum criteria, often with inconsistent support or networking gaps.

Is Lightning.ai SOC 2 Type II certified?

Lightning.ai's ClusterMAX review does not flag a SOC 2 Type II attestation as confirmed. SemiAnalysis treats SOC 2 Type II as a baseline expectation for any GPU cloud serving enterprise or regulated AI workloads — see the ClusterMAX criteria page for the full security baseline.

Does Lightning.ai support Slurm?

The review does not describe a Slurm offering from Lightning.ai. In its closing recommendations, SemiAnalysis encourages Lightning to consider building one, which suggests that no managed Slurm, self-managed Slurm, or Slurm-on-Kubernetes (SUNK, Soperator, or Slinky) option was available at the time of testing.

Does Lightning.ai support Kubernetes?

The review mentions Kubernetes only as something SemiAnalysis looks forward to testing on Lightning Cloud in the future. It does not yet describe whether managed Kubernetes is provided, what control plane is used, or how GPU operator, networking, and storage integrate.

What GPUs does Lightning.ai offer?

Based on the SemiAnalysis hands-on review, Lightning.ai offers (or has been publicly tied to) the following NVIDIA / AMD GPU SKUs: H200. Specific inventory, region availability, and on-demand vs reserved access are detailed in the Lightning.ai ClusterMAX review.

What is the NCCL all-reduce performance on Lightning.ai?

Lightning.ai's ClusterMAX review does not yet publish hands-on NCCL all-reduce results. NCCL all-reduce bandwidth is the standard SemiAnalysis benchmark for InfiniBand / RoCE health on GPU clusters — see the ClusterMAX /health-checks page for the full benchmark methodology.

How does Lightning.ai compare to CoreWeave?

CoreWeave is the only ClusterMAX Platinum provider, while Lightning.ai is rated Bronze. The Lightning.ai review documents the specific gaps versus CoreWeave across the 10 ClusterMAX criteria (Security, Lifecycle, Orchestration, Storage, Networking, Reliability, Monitoring, Pricing, Partnerships, Availability). See the Lightning.ai review body and the ClusterMAX /criteria page for the full comparison framework.

Is Lightning.ai recommended for LLM training?

Lightning.ai is in a ClusterMAX tier that SemiAnalysis directly recommends for production GPU workloads (Platinum / Gold / Silver / Bronze). The Lightning.ai review details which workload profiles fit best — large-scale pretraining, fine-tuning, on-demand experimentation, or inference — based on hands-on cluster testing.
