Elastic Deployment

Create Endpoint

post

Create a new endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Body

Elastic Endpoint Create Request v2

namestring · min: 1Required

Deployment name

Pattern: ^(?=[A-Za-z])[A-Za-z0-9@._-]{1,20}$
imageRegistrystring · max: 255Optional

Docker registry URL

imagestring · min: 1 · max: 255Required

Docker image name

Example: vllm/vllm-openai:latest
minSingleCardVramInGbinteger · int32 · max: 1536Optional

Minimum GPU single card VRAM in GB

minSingleCardVcpuinteger · int32Optional

Minimum GPU single card vCPU count

minSingleCardRamInGbinteger · int32 · max: 1536Optional

Minimum GPU single card RAM in GB

workersinteger · int32 · min: 1Required

Number of workers

credentialIdinteger · int64Optional

Credential ID

containerVolumeInGbinteger · int32 · min: 20Required

Container volume in GB

initializationCommandstringOptional

Initialization command

serviceModestring · min: 1Required

Service mode: ALB, QUEUE, CUSTOM

Example: QUEUE
webhookstring · max: 512Optional

Webhook URL for receiving task results

Pattern: ^https?://.*
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

post
/v2/serverless

Submit Task

post

Submit a task to a QUEUE-mode endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Body

Elastic Endpoint Submit Task Request v2

taskIdstring · max: 255Optional

User-defined task ID. Auto-generated UUID if omitted

Pattern: ^[A-Za-z0-9_]*$
inputanyRequired

Task input data

workerPortinteger · int32 · min: 1 · max: 65535Required

Worker port (1-65535)

Example: 8000
processUristring · max: 255Required

Process URI on the worker

Example: /v1/chat/completions
webhookstring · max: 512Optional

Webhook URL for result delivery

Pattern: ^https?://.*
webhookAuthKeystring · max: 255Optional

Webhook authentication key

Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

post
/v2/serverless/{id}/tasks

Start Endpoint

post

Start or resume a stopped endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

post
/v2/serverless/{id}/start

Stop Endpoint

post

Stop a running endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

post
/v2/serverless/{id}/stop

Scale Workers

put

Scale the number of workers for an endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Query parameters
countinteger · int32Required
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

put
/v2/serverless/{id}/workers

Update Endpoint

patch

Update a specific endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Body

Elastic Endpoint Update Request v2

namestring · min: 1Required

Deployment name

Pattern: ^(?=[A-Za-z])[A-Za-z0-9@._-]{1,20}$
minSingleCardVramInGbinteger · int32Optional

Minimum GPU single card VRAM in GB

minSingleCardVcpuinteger · int32Optional

Minimum GPU single card vCPU count

minSingleCardRamInGbinteger · int32Optional

Minimum GPU single card RAM in GB

workersinteger · int32 · min: 1Required

Number of workers

credentialIdinteger · int64Optional

Credential ID

containerVolumeInGbinteger · int32 · min: 20Required

Container volume in GB

initializationCommandstringOptional

Initialization command

webhookstring · max: 512Optional

Webhook URL for receiving task results

Pattern: ^https?://.*
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

patch
/v2/serverless/{id}

Delete Endpoint

delete

Delete an endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

delete
/v2/serverless/{id}

List Endpoints

get

Get all elastic endpoints

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Query parameters
statusListstring[]Optional
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

get
/v2/serverless

Get Endpoint

get

Get details of a specific endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

get
/v2/serverless/{id}

Get Task

get

Get details of a specific task

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
taskIdstringRequired
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

get
/v2/serverless/{id}/tasks/{taskId}

Get Task Count

get

Get task statistics grouped by status

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

get
/v2/serverless/{id}/tasks/count

List Workers

get

Get all workers of an endpoint

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Path parameters
idinteger · int64Required
Query parameters
statusListstring[]Optional
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

dataanyOptional

data

get
/v2/serverless/{id}/workers

Get GPU resource list V2

post

Get GPU resource list with supply capacity

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Body

ResourceGpuListSearchRequest

queryTypeinteger · int32Optional

Query Type (1:Single,2:Multiple)

Default: 2:Multiple
regionsstring[]Optional

Regions

gpuTypesstring[]Optional

GPU Types

minSingleCardVraminteger · int32 · max: 1536Required

Min single card vram unit: GB

maxSingleCardVraminteger · int32 · max: 1536Required

Max single card vram unit: GB

minSingleCardRaminteger · int32 · max: 1536Optional

Min single card ram unit: GB

maxSingleCardRaminteger · int32 · max: 1536Optional

Max single card ram unit: GB

podModelTypeinteger · int32Optional
Responses
chevron-right
200

OK

*/*
messagestringOptional

message

codeinteger · int32Optional

code

post
/api/v2/resource/gpu/list

Last updated

Was this helpful?