Serverless Inference with LLM
What You Will Build
Prerequisites
Architecture Overview
Your Client
        │
        │  POST /v2/serverless/{id}/tasks
        ▼
YottaLabs Platform ──────────────────────────────────────────────────┐
        │                                                            │
        │  Routes request to an available worker                     │
        ▼                                                            │
Worker (RTX 5090 GPU)                                                │
        │                                                            │
        │  vLLM serving Qwen2.5-7B-Instruct                          │
        │  OpenAI-compatible API on port 8000                        │
        │                                                            │
        └────────────────────────────────────────────────────────────┘

Managing Your Endpoint
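Once the endpoint is live, you can submit tasks to it. Since the worker exposes an OpenAI-compatible chat completions API, the task body follows the OpenAI chat schema. The sketch below builds such a request; the platform host, authorization scheme, and exact Hugging Face model identifier are assumptions to adapt to your own endpoint:

```python
import json

# Hypothetical values -- substitute your real endpoint ID, API key,
# and the platform host from your dashboard.
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = "YOUR_API_KEY"
URL = f"https://PLATFORM_HOST/v2/serverless/{ENDPOINT_ID}/tasks"

def build_chat_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat completions body for the vLLM worker."""
    return {
        # Assumed Hugging Face model ID for Qwen2.5-7B-Instruct.
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_chat_payload("Explain serverless inference in one sentence.")
print(json.dumps(payload, indent=2))

# To actually submit the task (requires the `requests` package and a
# live endpoint; auth header format is an assumption):
# import requests
# resp = requests.post(URL, json=payload,
#                      headers={"Authorization": f"Bearer {API_KEY}"})
# print(resp.json())
```

Because the worker speaks the OpenAI chat schema, the same payload works whether you call the serverless tasks route or the worker's port-8000 API directly.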
Common Issues and Fixes
GPU and Model Selection Guide
Model | VRAM Required | Recommended GPU
Next Steps