Elastic Deployment
Elastic Endpoints
Create Endpoint
POST /v2/serverless

```json
{
  "name": "llama-inference",
  "image": "vllm/vllm-openai:latest",
  "containerRegistryAuthId": 123,
  "resources": [
    {
      "region": "us-east-1",
      "gpuType": "NVIDIA_RTX_4090_24G",
      "gpuCount": 1
    }
  ],
  "workers": 2,
  "containerVolumeInGb": 120,
  "environmentVars": [
    { "key": "MODEL", "value": "llama-3-8b" }
  ],
  "expose": {
    "port": 8000,
    "protocol": "http"
  },
  "serviceMode": "QUEUE",
  "webhook": "https://webhook.example.com/status"
}
```

Get Endpoint by ID
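As a sketch, the Create Endpoint request body above can be assembled programmatically before sending it to `POST /v2/serverless`. The helper below only builds and prints the JSON payload; the function name is hypothetical, and the API base URL and authentication scheme are assumptions not documented on this page, so the actual HTTP call is left out.

```python
import json

# Hypothetical base URL -- replace with the platform's real API host.
API_BASE = "https://api.example.com"

def build_create_endpoint_payload(name: str, image: str, registry_auth_id: int) -> dict:
    """Assemble a Create Endpoint request body matching the example above."""
    return {
        "name": name,
        "image": image,
        "containerRegistryAuthId": registry_auth_id,
        "resources": [
            # One resource spec per region/GPU combination.
            {"region": "us-east-1", "gpuType": "NVIDIA_RTX_4090_24G", "gpuCount": 1}
        ],
        "workers": 2,
        "containerVolumeInGb": 120,
        "environmentVars": [{"key": "MODEL", "value": "llama-3-8b"}],
        "expose": {"port": 8000, "protocol": "http"},
        "serviceMode": "QUEUE",  # QUEUE mode enables the Tasks API below
        "webhook": "https://webhook.example.com/status",
    }

payload = build_create_endpoint_payload(
    "llama-inference", "vllm/vllm-openai:latest", 123
)
print(json.dumps(payload, indent=2))
```

The payload would then be sent as the JSON body of a `POST` to `/v2/serverless`, with whatever authorization header the platform requires.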
List Endpoints
Update Endpoint
Endpoint Actions
Scale Workers
List Workers
Tasks API (QUEUE mode)