New: Serverless GPU now supports H200 and B200

Serverless GPU

Run GPU workloads without managing infrastructure. Scale to zero, scale to thousands. Pay only for what you use.

How It Works

1. Deploy Your Code: Push your container or Python function. We handle the rest.

2. Configure Scaling: Set min/max replicas, concurrency limits, and scaling triggers.

3. Start Processing: Send requests via API. We scale automatically based on load.
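The scaling behavior step 2 configures can be sketched as a simple rule: run enough replicas to keep each one under its concurrency limit, clamped between the min and max you set. The function below is illustrative only; the platform's actual autoscaler and its parameter names are internal.

```python
import math

def desired_replicas(in_flight: int, concurrency: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Illustrative autoscaling rule: enough replicas so each stays
    under its concurrency limit, clamped to the configured bounds."""
    needed = math.ceil(in_flight / concurrency) if in_flight else 0
    return max(min_replicas, min(max_replicas, needed))

# Scale-to-zero: no traffic and min_replicas=0 means no running GPUs.
print(desired_replicas(0, 8, 0, 100))     # 0
print(desired_replicas(50, 8, 0, 100))    # 7
print(desired_replicas(5000, 8, 0, 100))  # capped at 100
```

With `min_replicas=0`, idle services cost nothing; raising the minimum trades idle cost for guaranteed warm capacity.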

Why Serverless GPU?

Near-Zero Cold Starts

Pre-warmed GPU instances serve requests in under a second. No waiting for containers to spin up.

Auto-Scaling

Automatically scale from zero to thousands of GPUs based on demand. Pay only for active compute.

Pay Per Second

Granular billing with no minimum commitments. Perfect for variable or unpredictable workloads.
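Per-second billing from an hourly rate is simple arithmetic. A quick sketch, using the H100 rate listed further down this page:

```python
def cost(hourly_rate: float, seconds: float) -> float:
    """Per-second billing: hourly rate divided by 3600, times runtime."""
    return hourly_rate / 3600 * seconds

# A 90-second burst of inference on an H100 at $1.84/hr:
print(round(cost(1.84, 90), 4))  # 0.046
```

A short burst costs cents, not a full billed hour, which is the point of per-second granularity for spiky workloads.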

Managed Infrastructure

No servers to manage. We handle provisioning, monitoring, and maintenance for you.

Use Cases

Model Inference

Deploy ML models as APIs with automatic scaling. Handle millions of requests with low latency.

Batch Processing

Process large datasets in parallel. Scale up for heavy workloads, scale down when done.

Training Jobs

Run distributed training jobs on-demand. No need to reserve capacity ahead of time.

Event-Driven AI

Trigger AI workloads from events, webhooks, or schedules. Perfect for async processing.
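The batch-processing use case above is essentially a parallel map. Here is a local stand-in using only the Python standard library; on the platform, each worker would be a GPU replica rather than a thread, and the platform's actual batch API is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

def process(item: int) -> int:
    # Stand-in for per-item GPU work (e.g. embedding one document).
    return item * item

items = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, items))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The same fan-out/fan-in shape applies whether the workers are local threads or remote GPU replicas; only the transport changes.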

Available GPUs

Choose the right GPU for your workload. All GPUs available on-demand.

GPU      | VRAM   | Price    | Best For
RTX 4090 | 24 GB  | $0.20/hr | Inference & Fine-tuning
H100     | 80 GB  | $1.84/hr | Large Model Training
H200     | 141 GB | $2.28/hr | LLM & Distributed Training
B200     | 180 GB | $3.38/hr | Frontier Model Training
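Choosing from the table above can be automated: pick the cheapest GPU whose VRAM fits your model. A sketch using the prices listed on this page (prices may change, so treat the numbers as illustrative):

```python
# (name, VRAM in GB, $/hr) from the table on this page.
GPUS = [
    ("RTX 4090", 24, 0.20),
    ("H100", 80, 1.84),
    ("H200", 141, 2.28),
    ("B200", 180, 3.38),
]

def cheapest_fit(vram_needed_gb: float) -> str:
    """Return the cheapest GPU with at least the requested VRAM."""
    candidates = [g for g in GPUS if g[1] >= vram_needed_gb]
    if not candidates:
        raise ValueError("no single GPU fits; shard across replicas")
    return min(candidates, key=lambda g: g[2])[0]

print(cheapest_fit(16))   # RTX 4090
print(cheapest_fit(100))  # H200
```

A rough rule of thumb: a model needs about 2 GB of VRAM per billion parameters at fp16 for inference, before activation and batch overhead.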

Serverless vs Traditional

Feature         | Serverless GPU | Traditional
Cold Start Time | < 1 second     | 30-120 seconds
Scaling         | Automatic      | Manual configuration
Minimum Cost    | $0             | $100+/month
Management      | Fully managed  | Self-managed
Billing         | Per-second     | Per-hour
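The "Minimum Cost" row implies a break-even point: below some number of busy hours per month, per-second billing beats a reserved instance. A sketch of that arithmetic, assuming the H100 rate above and treating $100/month as the traditional floor:

```python
def serverless_monthly_cost(hourly_rate: float, busy_hours: float) -> float:
    # Pay only for active compute; idle time costs $0.
    return hourly_rate * busy_hours

H100_RATE = 1.84           # $/hr, from the GPU table on this page
TRADITIONAL_FLOOR = 100.0  # $/month minimum, from the comparison table

break_even_hours = TRADITIONAL_FLOOR / H100_RATE
print(round(break_even_hours, 1))  # ~54.3 busy hours/month
```

If your H100 is busy fewer than roughly 54 hours a month, serverless is cheaper even before counting the traditional option's management overhead.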

Simple API

Deploy and invoke GPU functions with just a few lines of code

from lumin import serverless

@serverless.function(gpu="rtx-4090")
def predict(image_url: str):
    # Your inference code here
    model = load_model()
    result = model.predict(image_url)
    return result

# Deploy with one command
# lumin deploy predict.py
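For readers curious what the decorator pattern above does mechanically, here is a hypothetical local stand-in, not the real lumin SDK: decorators like this typically tag the function with its resource requirement and leave it callable, so a deploy step can read the metadata later.

```python
from functools import wraps

def function(gpu: str):
    """Hypothetical stand-in for a @serverless.function-style decorator:
    records the GPU requirement and returns the function unchanged."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.gpu = gpu  # metadata a deploy step could read
        return wrapper
    return decorator

@function(gpu="rtx-4090")
def predict(image_url: str) -> str:
    return f"prediction for {image_url}"

print(predict.gpu)               # rtx-4090
print(predict("https://x/img"))  # prediction for https://x/img
```

The decorated function still runs locally, which is what makes this style of SDK easy to test before deploying.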

Start building with Serverless GPU

Get $100 free credits. No credit card required.