Neysa is an emerging provider operating in the Indian market. They recently signed an MoU with NTT Data and the Telangana government to build a 400MW, 25k-GPU facility in Hyderabad, and currently operate a fleet of H100 and H200 GPUs, with AMD MI300X GPUs coming soon. However, our testing revealed that their current platform has gaps in security and usability compared to international competitors.
The onboarding process raised security concerns for us. Access is managed via username- and password-based SSH, with manual IP address filtering and a fragmented user account system. We had no way to create accounts for other members of our team to test with, which suggests that supporting RBAC through an external IAM provider would be difficult.
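To make the password-based SSH concern concrete, here is a minimal sketch of the kind of check an operator could run against a login node's sshd_config. It is illustrative only: we did not see Neysa's actual SSH server configuration, the function name `sshd_findings` is hypothetical, and the parser deliberately ignores `Include` files and stops at the first `Match` block, so it only approximates the global settings.

```python
def sshd_findings(sshd_config: str) -> list[str]:
    """Flag global sshd_config settings that permit password-based logins.

    Simplified sketch: ignores Include directives and stops at the first
    Match block, so only global settings are considered.
    """
    effective: dict[str, str] = {}
    for raw in sshd_config.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip().lower()
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        # sshd uses the first value it obtains for each keyword
        effective.setdefault(key, value)

    findings = []
    # OpenSSH defaults PasswordAuthentication to "yes" when it is unset
    if effective.get("passwordauthentication", "yes") != "no":
        findings.append("password authentication is enabled")
    if effective.get("pubkeyauthentication", "yes") == "no":
        findings.append("public key authentication is disabled")
    return findings


if __name__ == "__main__":
    sample = "PasswordAuthentication yes\nPubkeyAuthentication yes\n"
    for finding in sshd_findings(sample):
        print(finding)
```

A key-only setup would return an empty list here. Centralized identity (for example, SSH certificates or an IAM-backed login flow) is a separate concern that this check does not cover.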
The Slurm environment itself also suffered from basic configuration errors. Jobs initially failed to run because no default partition was configured, so a partition had to be specified manually on every submission. In addition, no topology.conf was configured. If Neysa is going to run a 25k-GPU cluster in the future, topology-aware scheduling is going to be critical. Monitoring and health checks were also effectively non-existent. The provided Grafana dashboard was non-functional during our testing, and appeared to be missing some of the exporters that health checks and performance monitoring depend on. On a more positive note, the software stack for containerized workloads is modern. We found an up-to-date NVIDIA Container Toolkit, and both pyxis and enroot were installed.
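The two Slurm gaps noted above can be sketched in a few lines. The helper below checks slurm.conf text for a partition marked `Default=YES` and renders a two-level topology.conf for Slurm's tree topology plugin (enabled with `TopologyPlugin=topology/tree` in slurm.conf). The function names, switch names, and node ranges are hypothetical placeholders, not Neysa's actual layout.

```python
def default_partitions(slurm_conf: str) -> list[str]:
    """Return the names of partitions marked Default=YES in slurm.conf text."""
    names = []
    for raw in slurm_conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.lower().startswith("partitionname="):
            continue
        # slurm.conf keywords are case-insensitive, so normalize the keys
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value
        if fields.get("default", "NO").upper() == "YES":
            names.append(fields["partitionname"])
    return names


def render_topology(leaf_switches: dict[str, str], spine: str = "spine0") -> str:
    """Render topology.conf lines for a two-level tree.

    leaf_switches maps a leaf switch name to a Slurm node range expression;
    every leaf is attached to a single (hypothetical) spine switch.
    """
    lines = [
        f"SwitchName={name} Nodes={nodes}"
        for name, nodes in sorted(leaf_switches.items())
    ]
    lines.append(f"SwitchName={spine} Switches={','.join(sorted(leaf_switches))}")
    return "\n".join(lines)


if __name__ == "__main__":
    conf = "PartitionName=gpu Nodes=gpu[001-016] State=UP\n"
    if not default_partitions(conf):
        print("no default partition: every job must pass --partition")
    print(render_topology({"leaf1": "gpu[001-008]", "leaf2": "gpu[009-016]"}))
```

Adding `Default=YES` to one `PartitionName=` line removes the need to pass `--partition` on every submission. A real 25k-GPU fabric would need more than two switch levels, and the node-to-switch mapping would normally be generated from the fabric's own inventory rather than written by hand.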
At the time of testing, Neysa did not have a Kubernetes offering available for us to test. We look forward to testing it in the future. We expect Neysa to benefit from compliance with Indian regulations such as the DPDP, but we find it unlikely that they will be able to expand beyond their domestic market at this time. We encourage Neysa to improve their default experience: a better security posture, user management, a proactive support experience, default monitoring systems, and health checks.