Elastic Deployment

Elastic Endpoints

Create Endpoint

POST /v2/serverless

Request Body:

{
  "name": "llama-inference",
  "image": "vllm/vllm-openai:latest",
  "containerRegistryAuthId": 123,
  "resources": [
    {
      "region": "us-east-1",
      "gpuType": "NVIDIA_RTX_4090_24G",
      "gpuCount": 1
    }
  ],
  "workers": 2,
  "containerVolumeInGb": 120,
  "environmentVars": [
    { "key": "MODEL", "value": "llama-3-8b" }
  ],
  "expose": {
    "port": 8000,
    "protocol": "http"
  },
  "serviceMode": "QUEUE",
  "webhook": "https://webhook.example.com/status"
}

| FIELD | TYPE | REQUIRED | DESCRIPTION |
| --- | --- | --- | --- |
| name | string | Yes | Endpoint name (max 20 chars, letters first) |
| imageRegistry | string | No | Docker registry URL (default: Docker Hub) |
| image | string | Yes | Docker image |
| containerRegistryAuthId | long | No | Container registry credential ID (for private images) |
| resources | array | Yes | GPU resources |
| workers | integer | Yes | Number of workers |
| containerVolumeInGb | integer | Yes | Min 20 GB |
| environmentVars | array | No | Environment variables |
| expose | object | No | Port exposure (see below) |
| expose.port | integer | Yes* | Container port (1-65535) |
| expose.protocol | string | Yes* | Protocol type (e.g., "http", "tcp") |
| serviceMode | string | Yes | ALB, QUEUE, or CUSTOM |
| webhook | string | No | Webhook URL for worker status notifications (max 512 chars) |

* Required when expose is provided.
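The constraints in the field table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `validate_create_payload` is hypothetical, and "letters first" is read here as "the first character must be a letter".

```python
SERVICE_MODES = {"ALB", "QUEUE", "CUSTOM"}
REQUIRED_FIELDS = (
    "name", "image", "resources", "workers", "containerVolumeInGb", "serviceMode",
)


def validate_create_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a Create Endpoint body (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in payload:
            problems.append(f"missing required field: {field}")

    name = payload.get("name", "")
    # Assumption: "letters first" means the first character is a letter.
    if name and (len(name) > 20 or not name[0].isalpha()):
        problems.append("name must be at most 20 characters and start with a letter")

    volume = payload.get("containerVolumeInGb")
    if volume is not None and volume < 20:
        problems.append("containerVolumeInGb must be at least 20")

    mode = payload.get("serviceMode")
    if mode is not None and mode not in SERVICE_MODES:
        problems.append("serviceMode must be ALB, QUEUE, or CUSTOM")

    expose = payload.get("expose")
    if expose is not None:
        # port and protocol become required once expose is present.
        port = expose.get("port")
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append("expose.port must be an integer in 1-65535")
        if not expose.get("protocol"):
            problems.append("expose.protocol is required when expose is provided")

    webhook = payload.get("webhook")
    if webhook is not None and len(webhook) > 512:
        problems.append("webhook must be at most 512 characters")
    return problems
```

The server remains the authority on validation; a check like this only catches obvious mistakes early.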

Response:

{
  "message": "success",
  "code": 10000,
  "data": {
    "id": "378888638324150969",
    "name": "llama-inference",
    "image": "vllm/vllm-openai:latest",
    "status": "RUNNING",
    "totalWorkers": 2,
    "runningWorkers": 2,
    "cost": 1.5,
    "serviceMode": "QUEUE",
    "webhook": null
  }
}
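As an illustration of the request and response shapes above, here is a sketch that builds the POST request with Python's standard library and reads `data.id` from a response. The base URL and the `Authorization` header are placeholders chosen for the example, not values documented here, and treating any `code` other than 10000 as a failure is also an assumption.

```python
import json
import urllib.request

# Assumptions: placeholder base URL and auth header; substitute real values.
BASE_URL = "https://api.example.com"
AUTH_HEADERS = {"Authorization": "Bearer <your-api-key>"}


def build_create_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/serverless request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/serverless",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **AUTH_HEADERS},
        method="POST",
    )


def endpoint_id_from_response(raw: bytes) -> str:
    """Extract data.id from a response shaped like the example above."""
    body = json.loads(raw)
    # Assumption: code 10000 signals success, as in the example response.
    if body.get("code") != 10000:
        raise RuntimeError(f"request failed: {body.get('message')}")
    return body["data"]["id"]
```

To send the request, pass the result to `urllib.request.urlopen` and feed the bytes it returns to `endpoint_id_from_response`.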

Get Endpoint by ID

GET /v2/serverless/{id}

Response: Same structure as the Create Endpoint response.

List Endpoints

GET /v2/serverless

Query Parameters:

statusList - Filter by status (comma-separated)
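A comma-separated `statusList` can be assembled as in the sketch below. The helper name `status_filter_path` is hypothetical. `RUNNING` appears in the Create Endpoint response; `STOPPED` in the usage line is only an illustrative value and may not be a real status.

```python
from urllib.parse import urlencode


def status_filter_path(base_path: str, statuses=()) -> str:
    """Append a comma-separated statusList query to a list path."""
    if not statuses:
        return base_path
    # safe="," keeps the commas literal instead of encoding them as %2C.
    return base_path + "?" + urlencode({"statusList": ",".join(statuses)}, safe=",")
```

For example, `status_filter_path("/v2/serverless", ["RUNNING", "STOPPED"])` yields `/v2/serverless?statusList=RUNNING,STOPPED`. The same helper works for any list path that takes a `statusList` parameter.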

Update Endpoint

PATCH /v2/serverless/{id}

Request Body:

{
  "name": "updated-name",
  "resources": [
    {
      "region": "us-east-1",
      "gpuType": "NVIDIA_RTX_4090_24G",
      "gpuCount": 1
    }
  ],
  "workers": 4,
  "containerVolumeInGb": 120,
  "environmentVars": [
    { "key": "MODEL", "value": "llama-3-70b" }
  ]
}

| FIELD | TYPE | REQUIRED | DESCRIPTION |
| --- | --- | --- | --- |
| name | string | Yes | Endpoint name (max 20 chars, letters first) |
| resources | array | Yes | GPU resources |
| workers | integer | Yes | Number of workers (min 1) |
| containerVolumeInGb | integer | Yes | Min 20 GB |
| minSingleCardVramInGb | integer | No | Minimum VRAM per GPU card, in GB |
| minSingleCardVcpu | integer | No | Minimum vCPU count per GPU card |
| minSingleCardRamInGb | integer | No | Minimum RAM per GPU card, in GB |
| credentialId | integer | No | Container registry credential ID |
| initializationCommand | string | No | Initialization command |
| environmentVars | array | No | Environment variables |
| expose | object | No | Port exposure (see Create Endpoint) |
| webhook | string | No | Webhook URL for worker status notifications (max 512 chars) |
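Because the table marks `name`, `resources`, `workers`, and `containerVolumeInGb` as required on update, a change to a single field still needs the other three in the body. A minimal sketch of that merge follows. The function `build_update_body` is hypothetical, and `current` is assumed to be configuration the caller already holds, such as the body used at creation.

```python
UPDATE_REQUIRED = ("name", "resources", "workers", "containerVolumeInGb")


def build_update_body(current: dict, changes: dict) -> dict:
    """Merge changes over the required fields of the current endpoint config."""
    # Carry over only the required fields; optional ones are sent when changed.
    body = {field: current[field] for field in UPDATE_REQUIRED if field in current}
    body.update(changes)

    missing = [field for field in UPDATE_REQUIRED if field not in body]
    if missing:
        raise ValueError(f"update body is missing required fields: {missing}")
    if body["workers"] < 1:
        raise ValueError("workers must be at least 1")
    if body["containerVolumeInGb"] < 20:
        raise ValueError("containerVolumeInGb must be at least 20")
    return body
```

Calling `build_update_body(current, {"workers": 4})` returns a body with the new worker count plus the unchanged required fields.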

Endpoint Actions

POST /v2/serverless/{id}/stop
POST /v2/serverless/{id}/start
DELETE /v2/serverless/{id}

Scale Workers

PUT /v2/serverless/{id}/workers?count=4
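The action and scaling routes differ only in HTTP method and path, so a client can keep them in one small lookup. This sketch uses hypothetical helper names and returns a `(method, path)` pair rather than sending anything.

```python
ACTIONS = {
    "stop": ("POST", "/v2/serverless/{id}/stop"),
    "start": ("POST", "/v2/serverless/{id}/start"),
    "delete": ("DELETE", "/v2/serverless/{id}"),
}


def action_request(endpoint_id: str, action: str) -> tuple[str, str]:
    """Return (method, path) for a stop, start, or delete action."""
    method, template = ACTIONS[action]
    return method, template.format(id=endpoint_id)


def scale_request(endpoint_id: str, count: int) -> tuple[str, str]:
    """Return (method, path) for scaling an endpoint to `count` workers."""
    return "PUT", f"/v2/serverless/{endpoint_id}/workers?count={int(count)}"
```

For instance, `scale_request("378888638324150969", 4)` gives `("PUT", "/v2/serverless/378888638324150969/workers?count=4")`.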

List Workers

GET /v2/serverless/{id}/workers

Query Parameters:

statusList - Filter by worker status (comma-separated)

Tasks API (QUEUE mode)

The Tasks API enables QUEUE-mode endpoints to process asynchronous workloads. Tasks are submitted to an endpoint and processed by available workers.

Submit Task

POST /v2/serverless/{id}/tasks

Submit a task to a QUEUE-mode endpoint. The task will be queued and picked up by an available worker.

Request Body:

{
  "taskId": "my_task_001",
  "input": { "prompt": "Hello, world!" },
  "workerPort": 8000,
  "processUri": "/v1/chat/completions",
  "webhook": "https://webhook.example.com/callback",
  "webhookAuthKey": "my-secret-key",
  "headers": {
    "Authorization": "Bearer token123"
  }
}

| FIELD | TYPE | REQUIRED | DESCRIPTION |
| --- | --- | --- | --- |
| taskId | string | No | User-defined task ID (alphanumeric + underscore, max 255). Auto-generated UUID if omitted |
| input | object | Yes | Task input data (any JSON structure) |
| workerPort | integer | Yes | Worker port to forward the task to (1-65535) |
| processUri | string | Yes | Process URI on the worker (max 255 chars) |
| webhook | string | No | Webhook URL for async result delivery (max 512 chars) |
| webhookAuthKey | string | No | Webhook authentication key (max 255 chars) |
| headers | map | No | Headers to forward with the task request |

Response:

{
  "message": "success",
  "code": 10000,
  "data": {
    "taskId": "my_task_001"
  }
}

Note: Only QUEUE-mode endpoints accept task submission. Submitting to a non-QUEUE endpoint returns a parameter error.
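A task body can be assembled and checked against the documented limits before submission. The sketch below is illustrative only. `build_task` is a hypothetical helper, and optional fields are omitted from the body when not supplied so the server can apply its defaults (for example, the auto-generated `taskId`).

```python
import re

# taskId: alphanumeric characters and underscores, at most 255 characters.
TASK_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,255}")


def build_task(input_data, worker_port, process_uri, task_id=None,
               webhook=None, webhook_auth_key=None, headers=None) -> dict:
    """Build a Submit Task body, enforcing the documented field limits."""
    if task_id is not None and not TASK_ID_PATTERN.fullmatch(task_id):
        raise ValueError("taskId must be alphanumeric/underscore, max 255 chars")
    if not 1 <= worker_port <= 65535:
        raise ValueError("workerPort must be in 1-65535")
    if len(process_uri) > 255:
        raise ValueError("processUri must be at most 255 characters")
    if webhook is not None and len(webhook) > 512:
        raise ValueError("webhook must be at most 512 characters")
    if webhook_auth_key is not None and len(webhook_auth_key) > 255:
        raise ValueError("webhookAuthKey must be at most 255 characters")

    body = {"input": input_data, "workerPort": worker_port, "processUri": process_uri}
    optional = {
        "taskId": task_id,
        "webhook": webhook,
        "webhookAuthKey": webhook_auth_key,
        "headers": headers,
    }
    # Include only the optional fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```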

Get Task by ID

GET /v2/serverless/{id}/tasks/{taskId}

Retrieve full details of a specific task, including input/output data and headers.

Response:

{
  "message": "success",
  "code": 10000,
  "data": {
    "taskId": "my_task_001",
    "endpointId": 456,
    "endpointName": "llama-inference",
    "status": "SUCCESS",
    "workerUrl": "https://worker.example.com",
    "webhook": "https://webhook.example.com/callback",
    "deliveryStatus": "SUCCESS",
    "deliveryAttempts": 1,
    "error": null,
    "input": { "prompt": "Hello, world!" },
    "output": { "response": "Hi there!" },
    "headers": { "Authorization": "Bearer token123" },
    "createdAt": 1705306200000,
    "updatedAt": 1705306260000,
    "deliveredAt": 1705306260000
  }
}
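The `createdAt`, `updatedAt`, and `deliveredAt` values in the example are 13-digit integers, which suggests epoch milliseconds. Assuming that reading is correct, they can be converted and compared as in this sketch (helper names are hypothetical):

```python
from datetime import datetime, timezone


def ms_to_utc(epoch_ms: int) -> datetime:
    """Convert an epoch-milliseconds timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def seconds_between_create_and_update(task: dict) -> float:
    """Elapsed seconds between createdAt and updatedAt of a task record."""
    return (task["updatedAt"] - task["createdAt"]) / 1000
```

With the example values, `ms_to_utc(1705306200000)` is 2024-01-15 08:10:00 UTC and the elapsed time is 60 seconds.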

List Tasks

GET /v2/serverless/{id}/tasks

Query Parameters:

status - Filter by task status: PROCESSING, DELIVERED, SUCCESS, FAILED
pageNumber - Page number (default: 1)
pageSize - Items per page (default: 10)

Response:

{
  "message": "success",
  "code": 10000,
  "data": {
    "items": [
      {
        "taskId": "my_task_001",
        "endpointId": 456,
        "endpointName": "llama-inference",
        "status": "SUCCESS",
        "workerUrl": "https://worker.example.com",
        "webhook": "https://webhook.example.com/callback",
        "deliveryStatus": "SUCCESS",
        "deliveryAttempts": 1,
        "error": null,
        "createdAt": 1705306200000,
        "deliveredAt": 1705306260000,
        "updatedAt": 1705306260000
      }
    ],
    "page": 1,
    "size": 10,
    "total": 42,
    "pages": 5
  }
}
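The `page`, `size`, `total`, and `pages` fields make it straightforward to walk every page. The sketch below assumes a caller-supplied `fetch_page(page_number, page_size)` function (hypothetical) that issues the GET request with `pageNumber` and `pageSize` and returns the parsed JSON response.

```python
from typing import Callable, Iterator


def iter_tasks(fetch_page: Callable[[int, int], dict],
               page_size: int = 10) -> Iterator[dict]:
    """Yield every task by walking pageNumber from 1 up to data.pages."""
    page_number = 1
    while True:
        data = fetch_page(page_number, page_size)["data"]
        yield from data["items"]
        # Stop once the last page reported by the server has been read.
        if page_number >= data["pages"]:
            break
        page_number += 1
```

Keeping the HTTP call outside the generator makes the paging logic easy to test with a stub.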

Get Task Count

GET /v2/serverless/{id}/tasks/count

Response:

{
  "message": "success",
  "code": 10000,
  "data": {
    "processing": 10
  }
}

Note: Currently only processing count is available. Additional status counts (total, delivered, success, failed) will be added in a future release.

Task Status Values:

| STATUS | DESCRIPTION |
| --- | --- |
| PROCESSING | Task is being executed |
| DELIVERED | Result delivered to webhook |
| SUCCESS | Task completed successfully |
| FAILED | Task execution failed |

Delivery Status Values:

| STATUS | DESCRIPTION |
| --- | --- |
| INIT | Not yet sent |
| SUCCESS | Webhook delivered successfully |
| FAILED | Webhook delivery failed |
| MAX_RETRIES_EXCEEDED | Exceeded maximum retry attempts |
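For clients that poll Get Task by ID instead of using a webhook, the status values above can drive a simple wait loop. This is a sketch under one stated assumption: every task status other than PROCESSING is treated as final. The `get_task` callable is hypothetical and should return the parsed Get Task by ID response.

```python
import time
from typing import Callable

# Assumption: any status other than PROCESSING means the task is finished.
TERMINAL_TASK_STATUSES = {"DELIVERED", "SUCCESS", "FAILED"}


def wait_for_task(get_task: Callable[[], dict], interval_s: float = 2.0,
                  max_polls: int = 30, sleep=time.sleep) -> dict:
    """Poll until the task leaves PROCESSING, then return its data object."""
    for attempt in range(max_polls):
        task = get_task()["data"]
        if task["status"] in TERMINAL_TASK_STATUSES:
            return task
        # Skip the final sleep: there is no further poll after it.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"task still PROCESSING after {max_polls} polls")
```

The `sleep` parameter exists so tests can replace the real delay with a no-op.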

Worker Logs API (Yotta Extension)

GET /v2/serverless/{id}/workers/{workerId}/logs

Retrieves logs from a specific worker (pod) belonging to an endpoint.

Query Parameters:

| PARAMETER | TYPE | DEFAULT | MAX | DESCRIPTION |
| --- | --- | --- | --- | --- |
| pageSize | integer | 100 | 1000 | Number of log entries to return |
| keyword | string | - | - | Filter logs by keyword |
| startTime | string | - | - | Start time (epoch ms or ISO 8601) |
| endTime | string | - | - | End time (epoch ms or ISO 8601) |
| searchAfterTime | string | - | - | Pagination token (timestamp) |
| searchAfterOffset | long | - | - | Pagination token (offset) |
| direction | string | Forward | - | "Forward" or "Backward" |

Response:

{
  "message": "success",
  "code": 10000,
  "data": {
    "logs": [
      {
        "timestamp": "2024-01-15T10:30:15.123Z",
        "log": "Starting inference service...",
        "offset": 12345
      }
    ],
    "hasMore": true,
    "nextSearchAfterTime": "1705306215123",
    "nextSearchAfterOffset": "12400"
  }
}

Notes:

This is a Yotta-specific extension for log retrieval.
Pagination is token-based: pass the nextSearchAfterTime and nextSearchAfterOffset values from a response as searchAfterTime and searchAfterOffset in the next request.
Use direction=Backward with these tokens to page through older logs.
Use direction=Forward with these tokens to page through newer logs.
startTime and endTime accept either epoch milliseconds or ISO 8601 strings.
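The pagination notes above can be sketched as a generator that feeds each response's `nextSearchAfterTime` and `nextSearchAfterOffset` back as `searchAfterTime` and `searchAfterOffset`. The `fetch_logs(params)` callable is hypothetical. It is assumed to issue the GET request with the given query parameters and return the parsed JSON. The `max_pages` cap is an added safety limit, not an API feature.

```python
from typing import Callable, Iterator


def iter_worker_logs(fetch_logs: Callable[[dict], dict], direction: str = "Backward",
                     page_size: int = 100, max_pages: int = 50) -> Iterator[dict]:
    """Yield log entries page by page until hasMore is false."""
    if not 1 <= page_size <= 1000:
        raise ValueError("pageSize must be in 1-1000")
    params = {"direction": direction, "pageSize": page_size}
    for _ in range(max_pages):
        data = fetch_logs(params)["data"]
        yield from data["logs"]
        if not data.get("hasMore"):
            break
        # Carry the tokens from this response into the next request.
        params = {
            "direction": direction,
            "pageSize": page_size,
            "searchAfterTime": data["nextSearchAfterTime"],
            "searchAfterOffset": data["nextSearchAfterOffset"],
        }
```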

