Run GPU workloads without managing infrastructure. Scale to zero, scale to thousands. Pay only for what you use.
1. Push your container or Python function. We handle the rest.
2. Set min/max replicas, concurrency limits, and scaling triggers.
3. Send requests via API. We scale automatically based on load.
Pre-warmed GPU instances ready to serve requests instantly. No waiting for containers to spin up.
Automatically scale from zero to thousands of GPUs based on demand. Pay only for active compute.
Granular billing with no minimum commitments. Perfect for variable or unpredictable workloads.
No servers to manage. We handle provisioning, monitoring, and maintenance for you.
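The scale-to-zero behavior described above reduces to simple replica arithmetic. The sketch below is illustrative only — `desired_replicas` and its parameters are our own stand-ins, not part of any Lumin API: it clamps the replica count implied by the current load between the configured minimum and maximum.

```python
import math

def desired_replicas(in_flight_requests: int, concurrency_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 1000) -> int:
    """Replicas needed to serve the current load, clamped to [min, max].

    With min_replicas=0 the service scales all the way to zero when idle.
    """
    needed = math.ceil(in_flight_requests / concurrency_per_replica)
    return max(min_replicas, min(needed, max_replicas))

print(desired_replicas(0, 8))      # idle -> 0 replicas (scale to zero)
print(desired_replicas(100, 8))    # 100 in-flight, 8 per replica -> 13
print(desired_replicas(10**6, 8))  # huge spike -> capped at 1000
```

Setting `min_replicas` above zero keeps pre-warmed instances around and trades idle cost for cold-start latency.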
Deploy ML models as APIs with automatic scaling. Handle millions of requests with low latency.
Process large datasets in parallel. Scale up for heavy workloads, scale down when done.
Run distributed training jobs on-demand. No need to reserve capacity ahead of time.
Trigger AI workloads from events, webhooks, or schedules. Perfect for async processing.
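The schedule-driven pattern above can be simulated locally with Python's standard `sched` module. This is only a local sketch of the idea — `run_inference_job` is a hypothetical stand-in for a GPU workload, not a Lumin API:

```python
import sched
import time

results = []

def run_inference_job(name: str) -> None:
    # Stand-in for an async GPU workload kicked off by a trigger.
    results.append(f"ran {name}")

scheduler = sched.scheduler(time.monotonic, time.sleep)
# Queue two jobs: one immediately, one shortly after.
scheduler.enter(0.0, 1, run_inference_job, argument=("nightly-batch",))
scheduler.enter(0.1, 1, run_inference_job, argument=("webhook-event",))
scheduler.run()  # blocks until both jobs have fired

print(results)  # ['ran nightly-batch', 'ran webhook-event']
```

In production the platform, not your process, would hold the schedule and fan events out to scaled-out workers.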
Choose the right GPU for your workload. All GPUs available on-demand.
| VRAM | Price | Best For |
|---|---|---|
| 24GB | $0.20/hr | Inference & Fine-tuning |
| 80GB | $1.84/hr | Large Model Training |
| 141GB | $2.28/hr | LLM & Distributed Training |
| 180GB | $3.38/hr | Frontier Model Training |
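A quick way to pick a tier is to estimate weight memory. The rule of thumb below is our own rough heuristic, not an official sizing guide: roughly 2 bytes per parameter in fp16, plus ~20% headroom for activations and KV cache.

```python
def min_vram_gb(params_billion: float, bytes_per_param: int = 2,
                overhead: float = 1.2) -> float:
    # Weights-only estimate with ~20% headroom; real needs vary with
    # batch size, sequence length, and quantization.
    return params_billion * bytes_per_param * overhead

print(round(min_vram_gb(7), 1))   # ~16.8 GB -> fits the 24GB tier
print(round(min_vram_gb(70), 1))  # ~168 GB -> needs the 180GB tier
```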
| Feature | Serverless GPU | Traditional |
|---|---|---|
| Cold Start Time | < 1 second | 30-120 seconds |
| Scaling | Automatic | Manual configuration |
| Minimum Cost | $0 | $100+/month |
| Management | Fully managed | Self-managed |
| Billing | Per-second | Per-hour |
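Per-second billing matters most for short, bursty jobs. An illustrative calculation using the 80GB tier's $1.84/hr rate from the pricing table (the per-hour column assumes a traditional provider that rounds usage up to whole hours):

```python
import math

RATE_PER_HOUR = 1.84  # 80GB tier, $/hr

def per_second_cost(seconds: float, rate_per_hour: float) -> float:
    # Bill exactly for the seconds used.
    return rate_per_hour / 3600 * seconds

def per_hour_cost(seconds: float, rate_per_hour: float) -> float:
    # Traditional billing: round up to the next whole hour.
    return rate_per_hour * math.ceil(seconds / 3600)

job = 90  # a 90-second inference burst
print(f"per-second: ${per_second_cost(job, RATE_PER_HOUR):.3f}")
print(f"per-hour:   ${per_hour_cost(job, RATE_PER_HOUR):.2f}")
```

For this 90-second job, per-second billing charges about $0.046 versus a full $1.84 hour, a ~40x difference.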
Deploy and invoke GPU functions with just a few lines of code
```python
from lumin import serverless

@serverless.function(gpu="rtx-4090")
def predict(image_url: str):
    # Your inference code here
    model = load_model()
    result = model.predict(image_url)
    return result

# Deploy with one command:
# lumin deploy predict.py
```

Get $100 free credits. No credit card required.