Elastic Deployment

Elastic Endpoints

Create Endpoint

POST /v2/serverless

Request Body:

{
  "name": "llama-inference",
  "image": "vllm/vllm-openai:latest",
  "containerRegistryAuthId": 123,
  "resources": [
    {
      "region": "us-east-1",
      "gpuType": "NVIDIA_RTX_4090_24G",
      "gpuCount": 1
    }
  ],
  "workers": 2,
  "containerVolumeInGb": 120,
  "environmentVars": [
    { "key": "MODEL", "value": "llama-3-8b" }
  ],
  "expose": {
    "port": 8000,
    "protocol": "http"
  },
  "serviceMode": "QUEUE",
  "webhook": "https://webhook.example.com/status"
}

| FIELD | TYPE | REQUIRED | DESCRIPTION |
| --- | --- | --- | --- |
| name | string | Yes | Endpoint name (max 20 chars, letters first) |
| imageRegistry | string | No | Docker registry URL (default: Docker Hub) |
| image | string | Yes | Docker image |
| containerRegistryAuthId | long | No | Container registry credential ID (for private images) |
| resources | array | Yes | GPU resources |
| workers | integer | Yes | Number of workers |
| containerVolumeInGb | integer | Yes | Min 20 GB |
| environmentVars | array | No | Environment variables |
| expose | object | No | Port exposure (see below) |
| expose.port | integer | Yes* | Container port (1-65535) |
| expose.protocol | string | Yes* | Protocol type (e.g., "http", "tcp") |
| serviceMode | string | Yes | ALB, QUEUE, or CUSTOM |
| webhook | string | No | Webhook URL for worker status notifications (max 512 chars) |

* Required when expose is provided.
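
The constraints listed above can be checked on the client before the request is sent. The following is a minimal sketch under stated assumptions, not part of the API: `validate_create_request` and `SERVICE_MODES` are hypothetical names, and "letters first" is read here as "the name must start with a letter".

```python
import re

# Allowed values for serviceMode, as listed in the field table.
SERVICE_MODES = {"ALB", "QUEUE", "CUSTOM"}

REQUIRED_FIELDS = ("name", "image", "resources", "workers",
                   "containerVolumeInGb", "serviceMode")


def validate_create_request(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in body]

    name = body.get("name")
    # Assumption: "letters first" means the first character must be a letter.
    if isinstance(name, str) and not re.fullmatch(r"[A-Za-z].{0,19}", name):
        errors.append("name must start with a letter and be at most 20 characters")

    volume = body.get("containerVolumeInGb")
    if isinstance(volume, int) and volume < 20:
        errors.append("containerVolumeInGb must be at least 20")

    mode = body.get("serviceMode")
    if mode is not None and mode not in SERVICE_MODES:
        errors.append("serviceMode must be ALB, QUEUE, or CUSTOM")

    expose = body.get("expose")
    if expose is not None:
        # port and protocol become required once expose is present.
        port = expose.get("port")
        if not isinstance(port, int) or not 1 <= port <= 65535:
            errors.append("expose.port must be an integer in 1-65535")
        if not expose.get("protocol"):
            errors.append("expose.protocol is required when expose is provided")

    webhook = body.get("webhook")
    if isinstance(webhook, str) and len(webhook) > 512:
        errors.append("webhook must be at most 512 characters")

    return errors
```

Passing the example request body shown earlier should yield an empty list. The base URL and authentication scheme are not covered in this section, so the sketch stops short of sending the request.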

Response:

Get Endpoint by ID

Response: Same as create

List Endpoints

Query Parameters:

  • statusList - Filter by status (comma-separated)

Update Endpoint

Request Body:

| FIELD | TYPE | REQUIRED | DESCRIPTION |
| --- | --- | --- | --- |
| name | string | Yes | Endpoint name (max 20 chars, letters first) |
| resources | array | Yes | GPU resources |
| workers | integer | Yes | Number of workers (min 1) |
| containerVolumeInGb | integer | Yes | Min 20 GB |
| minSingleCardVramInGb | integer | No | Minimum GPU single card VRAM in GB |
| minSingleCardVcpu | integer | No | Minimum GPU single card vCPU count |
| minSingleCardRamInGb | integer | No | Minimum GPU single card RAM in GB |
| credentialId | integer | No | Container registry credential ID |
| initializationCommand | string | No | Initialization command |
| environmentVars | array | No | Environment variables |
| expose | object | No | Port exposure (see expose.port and expose.protocol under Create Endpoint) |
| webhook | string | No | Webhook URL for worker status notifications (max 512 chars) |
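
The update body accepts a different field set from the create body: `image`, `imageRegistry`, and `serviceMode` are not listed, and the registry credential appears as `credentialId` rather than `containerRegistryAuthId`. The sketch below derives an update body from a create body. It is illustrative only: `to_update_body` and `UPDATE_FIELDS` are hypothetical names, and treating the two credential fields as the same value is an assumption based on their matching descriptions.

```python
# Fields accepted by Update Endpoint, per the table above.
UPDATE_FIELDS = (
    "name", "resources", "workers", "containerVolumeInGb",
    "minSingleCardVramInGb", "minSingleCardVcpu", "minSingleCardRamInGb",
    "credentialId", "initializationCommand", "environmentVars",
    "expose", "webhook",
)


def to_update_body(create_body: dict, **changes) -> dict:
    """Build an update body from a create body plus keyword overrides."""
    merged = {**create_body, **changes}

    # Assumption: containerRegistryAuthId and credentialId name the same credential.
    if "containerRegistryAuthId" in merged and "credentialId" not in merged:
        merged["credentialId"] = merged["containerRegistryAuthId"]

    # Keep only the fields the update table lists; drop everything else.
    body = {key: merged[key] for key in UPDATE_FIELDS if key in merged}

    if body.get("workers", 1) < 1:
        raise ValueError("workers must be at least 1")
    return body
```

For example, `to_update_body(create_body, workers=3)` keeps the original name, resources, and volume size while changing only the worker count.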

Endpoint Actions

Scale Workers

List Workers

Query Parameters:

  • statusList - Filter by worker status (comma-separated)

Tasks API (QUEUE mode)

The Tasks API enables QUEUE-mode endpoints to process asynchronous workloads. Tasks are submitted to an endpoint and processed by available workers.

Submit Task

Submit a task to a QUEUE-mode endpoint. The task will be queued and picked up by an available worker.

Request Body:

| FIELD | TYPE | REQUIRED | DESCRIPTION |
| --- | --- | --- | --- |
| taskId | string | No | User-defined task ID (alphanumeric + underscore, max 255). Auto-generated UUID if omitted |
| input | object | Yes | Task input data (any JSON structure) |
| workerPort | integer | Yes | Worker port to forward the task to (1-65535) |
| processUri | string | Yes | Process URI on the worker (max 255 chars) |
| webhook | string | No | Webhook URL for async result delivery (max 512 chars) |
| webhookAuthKey | string | No | Webhook authentication key (max 255 chars) |
| headers | map | No | Headers to forward with the task request |

Response:

Note: Only QUEUE-mode endpoints accept task submission. Submitting to a non-QUEUE endpoint returns a parameter error.
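
A task request body can be assembled and checked against the limits above before submission. This is a minimal sketch: `build_task_request` and `TASK_ID_PATTERN` are hypothetical names, and the `/v1/completions` value used in the usage note is only an illustration of a `processUri`.

```python
import re

# taskId: alphanumeric characters and underscores, at most 255 characters.
TASK_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,255}")


def build_task_request(input_data, worker_port, process_uri, task_id=None,
                       webhook=None, webhook_auth_key=None, headers=None) -> dict:
    """Return a Submit Task body, raising ValueError on a violated limit."""
    if not isinstance(worker_port, int) or not 1 <= worker_port <= 65535:
        raise ValueError("workerPort must be an integer in 1-65535")
    if not process_uri or len(process_uri) > 255:
        raise ValueError("processUri is required and limited to 255 characters")
    if task_id is not None and not TASK_ID_PATTERN.fullmatch(task_id):
        raise ValueError("taskId allows letters, digits, underscore; max 255 characters")
    if webhook is not None and len(webhook) > 512:
        raise ValueError("webhook is limited to 512 characters")
    if webhook_auth_key is not None and len(webhook_auth_key) > 255:
        raise ValueError("webhookAuthKey is limited to 255 characters")

    body = {"input": input_data, "workerPort": worker_port, "processUri": process_uri}
    optional = {"taskId": task_id, "webhook": webhook,
                "webhookAuthKey": webhook_auth_key, "headers": headers}
    # Omit unset optional fields; an omitted taskId is generated by the service.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Calling `build_task_request({"prompt": "hi"}, 8000, "/v1/completions")` produces a body with only the three required fields.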

Get Task by ID

Retrieve full details of a specific task, including input/output data and headers.

Response:

List Tasks

Query Parameters:

  • status - Filter by task status: PROCESSING, DELIVERED, SUCCESS, FAILED

  • pageNumber - Page number (default: 1)

  • pageSize - Items per page (default: 10)
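
The query parameters above can be encoded with the standard library. In this sketch, `task_list_query` is a hypothetical helper, and rejecting page numbers below 1 is an assumption drawn from the default of 1.

```python
from urllib.parse import urlencode

# Status values accepted by the status filter.
TASK_STATUSES = {"PROCESSING", "DELIVERED", "SUCCESS", "FAILED"}


def task_list_query(status=None, page_number=1, page_size=10) -> str:
    """Return the URL query string for a List Tasks request."""
    if status is not None and status not in TASK_STATUSES:
        raise ValueError(f"unknown task status: {status}")
    # Assumption: pages are 1-based and sizes must be positive.
    if page_number < 1 or page_size < 1:
        raise ValueError("pageNumber and pageSize must be positive")

    params = {"pageNumber": page_number, "pageSize": page_size}
    if status is not None:
        params["status"] = status
    return urlencode(params)
```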

Response:

Get Task Count

Response:

Note: Currently only processing count is available. Additional status counts (total, delivered, success, failed) will be added in a future release.

Task Status Values:

| STATUS | DESCRIPTION |
| --- | --- |
| PROCESSING | Task is being executed |
| DELIVERED | Result delivered to webhook |
| SUCCESS | Task completed successfully |
| FAILED | Task execution failed |

Delivery Status Values:

| STATUS | DESCRIPTION |
| --- | --- |
| INIT | Not yet sent |
| SUCCESS | Webhook delivered successfully |
| FAILED | Webhook delivery failed |
| MAX_RETRIES_EXCEEDED | Exceeded maximum retry attempts |
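
SUCCESS and FAILED appear in both status sets with different meanings, so client code may benefit from keeping the two as separate types. The following sketch shows one way to do that. The class and function names are hypothetical, and the names of the response fields that carry these values are not specified in this section.

```python
from enum import Enum


class TaskStatus(str, Enum):
    """Execution state of a task."""
    PROCESSING = "PROCESSING"
    DELIVERED = "DELIVERED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


class DeliveryStatus(str, Enum):
    """State of the webhook delivery for a task result."""
    INIT = "INIT"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    MAX_RETRIES_EXCEEDED = "MAX_RETRIES_EXCEEDED"


def delivery_problem(delivery_status: str) -> bool:
    """True when the webhook delivery failed or ran out of retries."""
    status = DeliveryStatus(delivery_status)  # raises ValueError on unknown values
    return status in (DeliveryStatus.FAILED, DeliveryStatus.MAX_RETRIES_EXCEEDED)
```

With separate enums, `TaskStatus.FAILED is DeliveryStatus.FAILED` is false, which prevents a failed webhook delivery from being mistaken for a failed task.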

Worker Logs API (Yotta Extension)

Retrieves logs from a specific worker (pod) belonging to an endpoint.

Query Parameters:

| PARAMETER | TYPE | DEFAULT | MAX | DESCRIPTION |
| --- | --- | --- | --- | --- |
| pageSize | integer | 100 | 1000 | Number of log entries to return |
| keyword | string | - | - | Filter logs by keyword |
| startTime | string | - | - | Start time (epoch ms or ISO 8601) |
| endTime | string | - | - | End time (epoch ms or ISO 8601) |
| searchAfterTime | string | - | - | Pagination token (timestamp) |
| searchAfterOffset | long | - | - | Pagination token (offset) |
| direction | string | Forward | - | "Forward" or "Backward" |

Response:

Notes:

  • This is a Yotta-specific extension for log retrieval

  • Supports pagination with search_after tokens for efficient log traversal

  • Use direction=Backward with nextSearchAfter* tokens to page through older logs

  • Use direction=Forward with nextSearchAfter* tokens to page through newer logs

  • Timestamps support both epoch milliseconds and ISO 8601 format
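
The pagination notes above can be sketched as a generator that feeds each page's tokens into the next request. Several names here are assumptions: the response fields `nextSearchAfterTime` and `nextSearchAfterOffset` are inferred from the `nextSearchAfter*` wording, `logs` is a hypothetical name for the entry list, and `fetch_page` stands in for whatever function performs the HTTP call and returns the parsed response.

```python
from typing import Callable, Iterator


def iter_worker_logs(fetch_page: Callable[[dict], dict],
                     direction: str = "Forward",
                     page_size: int = 100,
                     max_pages: int = 50) -> Iterator[dict]:
    """Yield log entries page by page, following the search-after tokens."""
    if direction not in ("Forward", "Backward"):
        raise ValueError('direction must be "Forward" or "Backward"')
    if not 1 <= page_size <= 1000:
        raise ValueError("pageSize must be in 1-1000")

    params = {"direction": direction, "pageSize": page_size}
    for _ in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        page = fetch_page(dict(params))  # pass a copy so callers can keep it
        entries = page.get("logs", [])  # hypothetical field name
        yield from entries

        next_time = page.get("nextSearchAfterTime")      # assumed field name
        next_offset = page.get("nextSearchAfterOffset")  # assumed field name
        if not entries or next_time is None or next_offset is None:
            break
        # Both tokens travel together into the next request.
        params["searchAfterTime"] = next_time
        params["searchAfterOffset"] = next_offset
```

Taking `fetch_page` as a parameter keeps the traversal logic independent of the HTTP client and makes it easy to exercise with canned pages.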
