Serverless
Elastic Endpoints
Create Endpoint
POST /v2/serverless

{
  "name": "llama-inference",
  "image": "vllm/vllm-openai:latest",
  "containerRegistryAuthId": 123,
  "resources": [
    {
      "region": "us-east-1",
      "gpuType": "NVIDIA_RTX_4090_24G",
      "gpuCount": 1
    }
  ],
  "workers": 2,
  "containerVolumeInGb": 120,
  "environmentVars": [
    { "key": "MODEL", "value": "llama-3-8b" }
  ],
  "expose": {
    "port": 8000,
    "protocol": "http"
  },
  "serviceMode": "QUEUE",
  "webhook": "https://webhook.example.com/status"
}
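The request body above can be sent with a short script. This is a minimal sketch: the base URL and the Bearer authentication scheme are assumptions (this page does not state them), so substitute the values from your account settings.

```python
import json
import urllib.request

# Request body mirroring the Create Endpoint example above.
payload = {
    "name": "llama-inference",
    "image": "vllm/vllm-openai:latest",
    "containerRegistryAuthId": 123,
    "resources": [
        {"region": "us-east-1", "gpuType": "NVIDIA_RTX_4090_24G", "gpuCount": 1}
    ],
    "workers": 2,
    "containerVolumeInGb": 120,
    "environmentVars": [{"key": "MODEL", "value": "llama-3-8b"}],
    "expose": {"port": 8000, "protocol": "http"},
    "serviceMode": "QUEUE",
    "webhook": "https://webhook.example.com/status",
}


def create_endpoint(api_key: str, base_url: str = "https://api.example.com"):
    # base_url and the Bearer auth header are placeholders, not from this page.
    req = urllib.request.Request(
        f"{base_url}/v2/serverless",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Since `serviceMode` is `QUEUE`, the created endpoint accepts work through the Tasks API described below rather than serving requests directly.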
Get Endpoint by ID
List Endpoints
Update Endpoint
Endpoint Actions
Scale Workers
List Workers
Tasks API (QUEUE mode)
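In QUEUE mode, the platform POSTs task status updates to the `webhook` URL supplied at endpoint creation. A minimal receiver sketch follows; the delivery payload's field names (`id`, `status`) are assumptions, so inspect a real delivery before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_event(body: bytes):
    """Decode a webhook delivery body; the field names are assumptions."""
    event = json.loads(body or b"{}")
    return event.get("id"), event.get("status")


class StatusWebhook(BaseHTTPRequestHandler):
    # Handles status POSTs sent to the configured webhook URL.
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task_id, status = parse_event(self.rfile.read(length))
        print(f"task {task_id} is now {status}")
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


if __name__ == "__main__":
    # Listen on the port your webhook URL points at.
    HTTPServer(("", 8000), StatusWebhook).serve_forever()
```

Respond quickly with a 2xx status and do any heavy processing asynchronously, since webhook senders typically retry on slow or failed deliveries.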