Elastic Deployment

Elastic Deployment is a flexible orchestration feature provided by the Yotta SaaS platform that enables users to quickly create, scale, and manage GPU-powered workloads based on custom images.

API List

Create Elastic Deployment

Endpoint: /openapi/v1/elastic/deploy/create

Method: POST

Description: Creates a new Elastic Deployment on the platform.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Request Body (JSON)

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | required | Elastic Deployment name |
| image | string | required | Docker image |
| imageRegistry | string | optional | Image registry URL; defaults to Docker Hub. Example: https://registry.example.com |
| resources | Resource[] | required | GPU-related resources. Workers are deployed on instances that satisfy the specified resources. Multiple resource configurations can be specified. Note: every Resource must specify the same gpuCount. |
| minSingleCardVramInGb | number | optional | Minimum VRAM required per single card (GB) |
| minSingleCardVcpu | number | optional | Minimum vCPU count required per single card |
| minSingleCardRamInGb | number | optional | Minimum RAM required per single card (GB) |
| containerVolumeInGb | number | required | Container ephemeral volume size (GB) |
| credentialId | string | optional | Required if using a private image. To create a credential, refer to [Credential Module -> Create Credential]. |
| workers | number | required | Number of workers to deploy |
| serviceMode | string | required | One of three service modes. ALB: provides service via HTTP requests. QUEUE: tasks are submitted to the Yotta Queue service for asynchronous processing. CUSTOM: neither HTTP requests nor the Queue service are required. |
| initializationCommand | string | required | Initialization command executed upon worker startup |
| environmentVars | EnvironmentVar[] | optional | Environment variables required for worker startup |
| expose | Expose | optional | Port information exposed by the worker. If serviceMode=ALB, this is mandatory and specifies your service's HTTP port. If serviceMode=QUEUE or CUSTOM, this is optional and specifies your service's health check port. |

Example — curl

curl --request POST \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/create' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>' \
--data '{
    "name": "Llama-3.2-3B",
    "image": "vllm/vllm-openai:latest",
    "resources": [
        {
            "region": "us-west-1",
            "gpuType": "NVIDIA_L4_24G",
            "gpuCount": 1
        },
        {
            "region": "us-central-3",
            "gpuType": "NVIDIA_GeForce_RTX_5090_32G",
            "gpuCount": 1
        }
    ],
    "minSingleCardVramInGb": 32,
    "minSingleCardVcpu": 15,
    "minSingleCardRamInGb": 124,
    "containerVolumeInGb": 120,
    "workers": 1,
    "serviceMode": "ALB",
    "initializationCommand": "vllm serve meta-llama/Llama-3.2-3B-Instruct --served-model-name meta-llama/Llama-3.2-3B-Instruct --max-model-len 20480 --port 8000 --dtype half --gpu-memory-utilization 0.90 --tensor-parallel-size 1 --chat-template /vllm-workspace/examples/tool_chat_template_llama3.2_json.jinja",
    "environmentVars": [
        {
            "key": "HUGGING_FACE_HUB_TOKEN",
            "value": "<Your-Hugging_Face_Hub_Token>"
        },
        {
            "key": "VLLM_PORT",
            "value": "8000"
        },
        {
            "key": "CUDA_VISIBLE_DEVICES",
            "value": "0"
        }
    ],
    "expose": {
        "port": 8000,
        "protocol": "http"
    }
}'

Response

Returns an object containing data of type ElasticDeploymentDetail. (See Common Models below)
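
The constraints in the request body table can be checked locally before a request is sent. The following is a minimal Python sketch, not an official client: the function name `build_create_request` is hypothetical, the host is taken from the curl example, and rejecting an empty `resources` list is an assumption. It only builds the request object and performs no network call.

```python
import json
import urllib.request

BASE_URL = "https://api.yottalabs.ai"  # host as used in the curl examples


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Validate a create payload locally and wrap it in a POST request (not sent)."""
    resources = payload.get("resources") or []
    if not resources:
        # Assumption: an empty list is treated as invalid because resources is required.
        raise ValueError("resources must contain at least one Resource")
    # The reference requires every Resource to specify the same gpuCount.
    if len({res["gpuCount"] for res in resources}) != 1:
        raise ValueError("every Resource must specify the same gpuCount")
    # An ALB deployment must expose its HTTP port.
    if payload.get("serviceMode") == "ALB" and "expose" not in payload:
        raise ValueError("expose is mandatory when serviceMode is ALB")
    return urllib.request.Request(
        url=f"{BASE_URL}/openapi/v1/elastic/deploy/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to actually submit the deployment.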

List Elastic Deployments

Endpoint: /openapi/v1/elastic/deploy/list

Method: GET

Description: Lists all Elastic Deployments under the current organization.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Query Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| statusList | string[] | optional | Filter by ElasticDeploymentStatus. Multiple values are allowed by repeating the parameter, for example statusList=INITIALIZING&statusList=RUNNING&statusList=STOPPED |

Example — curl

curl --request GET \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/list?statusList=INITIALIZING&statusList=RUNNING&statusList=STOPPED' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>'

Response

Returns an object containing data as a list of ElasticDeploymentDetail. (See Common Models below)
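
Because `statusList` is passed as a repeated query parameter, it helps to let a library build the query string. A small Python sketch follows; the helper name `build_list_url` is hypothetical, and the host is the one used in the curl example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.yottalabs.ai"  # host as used in the curl examples


def build_list_url(status_list=()):
    """Return the list endpoint URL, repeating statusList once per value."""
    url = f"{BASE_URL}/openapi/v1/elastic/deploy/list"
    if not status_list:
        return url
    # doseq=True expands a list into statusList=A&statusList=B&...
    query = urlencode({"statusList": list(status_list)}, doseq=True)
    return f"{url}?{query}"
```

Calling `build_list_url(["RUNNING", "STOPPED"])` yields a URL with two `statusList` entries, in the same form as the curl example.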

Get Elastic Deployment Details

Endpoint: /openapi/v1/elastic/deploy/{id}

Method: GET

Description: Retrieves details of a specific Elastic Deployment.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Example — curl

curl --request GET \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/384881505694897047' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>'

Response

Returns an object containing data of type ElasticDeploymentDetail. (See Common Models below)
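
A common use of this endpoint is polling until a deployment reaches a given status. The sketch below shows one way to do that in Python under stated assumptions: `fetch_status` is a hypothetical callable that you supply (for example, one that calls this endpoint and returns the `status` field), and the timeout and interval defaults are arbitrary choices, not platform recommendations.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until it returns target or FAILED; raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        # Stop early on FAILED so callers do not wait out the full timeout.
        if status == target or status == "FAILED":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

The caller should check the returned value, since the function returns `FAILED` as well as the requested target.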

Update Elastic Deployment

Endpoint: /openapi/v1/elastic/deploy/{id}/update

Method: POST

Description: Updates a specific Elastic Deployment. Currently, only Elastic Deployments in the STOPPED state can be updated.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Request Body (JSON)

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | required | Elastic Deployment name |
| resources | Resource[] | required | GPU-related resources. Workers are deployed on instances that satisfy the specified resources. Note: every Resource must specify the same gpuCount. |
| minSingleCardVramInGb | number | optional | Minimum VRAM required per single card (GB) |
| minSingleCardVcpu | number | optional | Minimum vCPU count required per single card |
| minSingleCardRamInGb | number | optional | Minimum RAM required per single card (GB) |
| containerVolumeInGb | number | required | Container ephemeral volume size (GB) |
| credentialId | string | optional | Required if using a private image. |
| workers | number | required | Number of workers to deploy |
| initializationCommand | string | optional | Initialization command executed upon worker startup |
| environmentVars | EnvironmentVar[] | optional | Environment variables required for worker startup |
| expose | Expose | optional | Port information exposed by the worker |

Example — curl

curl --request POST \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/386179372792811712/update' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>' \
--data '{
    "name": "Llama-3.2-3B",
    "resources": [
        {
            "region": "us-central-3",
            "gpuType": "NVIDIA_GeForce_RTX_5090_32G",
            "gpuCount": 1
        }
    ],
    "workers": 1,
    "minSingleCardRamInGb": 43,
    "containerVolumeInGb": 20,
    "initializationCommand": "vllm serve...",
    "environmentVars": [
        { "key": "HUGGING_FACE_HUB_TOKEN", "value": "hf_***" }
    ],
    "expose": {
        "port": 8000,
        "protocol": "http"
    }
}'

Response

Returns an object containing data of type ElasticDeploymentDetail. (See Common Models below)
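
Since the update body repeats several required fields, one option is to derive it from the current ElasticDeploymentDetail and override only what changes. The Python sketch below illustrates this; `build_update_body` is a hypothetical helper, and mapping `totalWorkers` in the detail to `workers` in the request is an assumption that the reference does not confirm.

```python
def build_update_body(detail: dict, **overrides) -> dict:
    """Derive an update request body from an ElasticDeploymentDetail dict."""
    # The reference states that only STOPPED deployments can be updated.
    if detail.get("status") != "STOPPED":
        raise RuntimeError(
            f"cannot update while status is {detail.get('status')!r}; stop the deployment first"
        )
    body = {
        "name": detail["name"],
        # Keep only the request-side Resource fields; drop display names and sizes.
        "resources": [
            {key: res[key] for key in ("region", "gpuType", "gpuCount")}
            for res in detail["resources"]
        ],
        "containerVolumeInGb": detail["containerVolumeInGb"],
        # Assumption: totalWorkers in the detail corresponds to workers in the request.
        "workers": detail["totalWorkers"],
    }
    body.update(overrides)
    return body
```

For example, `build_update_body(detail, containerVolumeInGb=40)` keeps the existing name, resources, and worker count while changing only the volume size.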

Stop Elastic Deployment

Endpoint: /openapi/v1/elastic/deploy/{id}/stop

Method: POST

Description: Stops the Elastic Deployment with the specified ID.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Example — curl

curl --request POST \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/384812561523806429/stop' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>'

Response

Returns data: null on success.

Start Elastic Deployment

Endpoint: /openapi/v1/elastic/deploy/{id}/start

Method: POST

Description: Starts the Elastic Deployment with the specified ID.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Example — curl

curl --request POST \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/384812561523806429/start' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>'

Response

Returns data: null on success.

Delete Elastic Deployment

Endpoint: /openapi/v1/elastic/deploy/{id}

Method: DELETE

Description: Deletes the Elastic Deployment with the specified ID. Note: The deployment must be in a STOPPED state to be deleted.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Example — curl

curl --request DELETE \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/384812561523806429' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>'

Response

Returns data: null on success. If deletion is not allowed (e.g., status is not STOPPED), returns error code 24002.

Adjust Elastic Deployment Worker Count

Endpoint: /openapi/v1/elastic/deploy/{id}/workers

Method: POST

Description: Adjusts the number of Workers for the specified Elastic Deployment.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Request Body (JSON)

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| workers | number | required | Target worker count |

Example — curl

curl --request POST \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/384812561523806429/workers' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>' \
--data '{
    "workers": 2
}'

Response

Returns data: null on success.

List Elastic Deployment Workers

Endpoint: /openapi/v1/elastic/deploy/{id}/workers

Method: GET

Description: Retrieves the list of Workers for the specified Elastic Deployment.

Authorizations

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| X-API-Key | header | string | required | Your API Key |

Path Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | required | Elastic Deployment ID |

Query Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| statusList | string[] | optional | Filter by WorkerStatus. Multiple values are allowed by repeating the parameter, for example statusList=INITIALIZE&statusList=RUNNING |

Example — curl

curl --request GET \
--url 'https://api.yottalabs.ai/openapi/v1/elastic/deploy/384812561523806429/workers?statusList=INITIALIZE&statusList=RUNNING' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: <YOUR_API_KEY>'

Response

Returns an object containing data as a list of ElasticDeploymentWorkerDetail. (See Common Models below)
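
Once the worker list has been retrieved, it can be summarized by status and cost. The Python sketch below is illustrative only: `summarize_workers` is a hypothetical helper, and it assumes that the `cost` string of each ElasticDeploymentWorkerDetail is a plain decimal number, which the reference does not specify.

```python
from collections import Counter
from decimal import Decimal


def summarize_workers(workers: list) -> dict:
    """Count workers per WorkerStatus and total their accumulated cost."""
    by_status = Counter(worker["status"] for worker in workers)
    # Assumption: cost is a decimal string such as "0.25"; Decimal avoids float rounding.
    total_cost = sum((Decimal(worker["cost"]) for worker in workers), Decimal("0"))
    return {"by_status": dict(by_status), "total_cost": total_cost}
```

Note that the initializing state of a worker is spelled `INITIALIZE`, whereas the deployment-level status uses `INITIALIZING`.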

Common Models

Resource

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| region | string | required | The region where the worker is deployed |
| gpuType | string | required | The type of GPU required by the worker |
| gpuCount | number | required | The number of GPU cards per worker |

ElasticDeploymentDetail

| Name | Type | Description |
| --- | --- | --- |
| id | string | Elastic Deployment ID (e.g., "378888638324150969") |
| name | string | Elastic Deployment name (e.g., "Llama-3.2-3B") |
| creator | string | Creator's email |
| domain | string | ALB gateway domain; present only for deployments with serviceMode=ALB |
| imageRegistry | string | Image registry URL; defaults to Docker Hub |
| image | string | Image name |
| resources | ResourceResponse[] | Region and GPU resource configuration |
| minSingleCardVramInGb | number | Minimum VRAM required per single card (GB) |
| minSingleCardVcpu | number | Minimum vCPU count required per single card |
| minSingleCardRamInGb | number | Minimum RAM required per single card (GB) |
| containerVolumeInGb | number | Container ephemeral volume size (GB) |
| credentialId | string | Credential ID |
| initializationCommand | string | Service startup command |
| environmentVars | EnvironmentVar[] | List of environment variables |
| expose | Expose | Port exposure configuration |
| totalWorkers | number | Total number of workers |
| runningWorkers | number | Number of currently running workers |
| cost | string | Accumulated cost |
| perSecondPrice | string | Price per second |
| perHourPrice | string | Price per hour |
| serviceMode | string | One of ALB, QUEUE, or CUSTOM (see ServiceMode) |
| status | string | One of INITIALIZING, RUNNING, STOPPING, STOPPED, FAILED (see ElasticDeploymentStatus) |

ResourceResponse

| Name | Type | Description |
| --- | --- | --- |
| region | string | Worker's region |
| regionDisplayName | string | Worker's region display name |
| gpuType | string | GPU type |
| gpuDisplayName | string | GPU type display name |
| gpuCount | number | GPU card count |
| singleCardVramInGb | number | VRAM per single GPU card (GB) |
| singleCardVcpu | number | vCPU count per single GPU card |
| singleCardRamInGb | number | RAM per single GPU card (GB) |

EnvironmentVar

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | required | Name of the environment variable |
| value | string | required | Value of the environment variable |

Expose

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| port | number | required | Port number in the range 1 to 65535 |
| protocol | string | required | Protocol type. Supported: HTTP |

ElasticDeploymentWorkerDetail

| Name | Type | Description |
| --- | --- | --- |
| id | string | Worker ID |
| region | string | Worker's region |
| regionDisplayName | string | Worker's region display name |
| gpuType | string | GPU type |
| gpuDisplayName | string | GPU type display name |
| gpuCount | number | GPU card count |
| singleCardVramInGb | number | VRAM per single GPU card (GB) |
| singleCardVcpu | number | vCPU count per single GPU card |
| singleCardRamInGb | number | RAM per single GPU card (GB) |
| uptime | string | Worker uptime (milliseconds) |
| cost | string | Cost incurred up to the current time |
| status | string | One of INITIALIZE, RUNNING, TERMINATING, TERMINATED, FAILED (see WorkerStatus) |

Enums

ServiceMode

| Name | Description |
| --- | --- |
| ALB | Service provided via HTTP requests; includes load balancing capabilities. |
| QUEUE | Tasks are submitted to the Yotta Queue service for asynchronous processing. |
| CUSTOM | Neither HTTP requests nor the Queue service are required. |

ElasticDeploymentStatus

| Name | Description |
| --- | --- |
| INITIALIZING | Initializing, starting up. |
| RUNNING | Running, service is operating normally. |
| STOPPING | Stopping. |
| STOPPED | Stopped. |
| FAILED | Failed. |

WorkerStatus

| Name | Description |
| --- | --- |
| INITIALIZE | Initializing, starting up. |
| RUNNING | Running, service is operating normally. |
| TERMINATING | Terminating. |
| TERMINATED | Terminated. |
| FAILED | Failed. |
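
Client code can mirror these enums so that misspelled values fail early. The following Python sketch simply transcribes the three tables above; the class names follow the enum names in this reference, and nothing here is an official SDK.

```python
from enum import Enum


class ServiceMode(str, Enum):
    ALB = "ALB"
    QUEUE = "QUEUE"
    CUSTOM = "CUSTOM"


class ElasticDeploymentStatus(str, Enum):
    INITIALIZING = "INITIALIZING"
    RUNNING = "RUNNING"
    STOPPING = "STOPPING"
    STOPPED = "STOPPED"
    FAILED = "FAILED"


class WorkerStatus(str, Enum):
    # Workers use INITIALIZE, not INITIALIZING as deployments do.
    INITIALIZE = "INITIALIZE"
    RUNNING = "RUNNING"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"
    FAILED = "FAILED"
```

Parsing a response value with `WorkerStatus(value)` raises `ValueError` for any string that is not in the table, which makes the INITIALIZE versus INITIALIZING difference hard to miss.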

Response Codes

| Code | Message | Description |
| --- | --- | --- |
| 10000 | success | Request successful |
| 10001 | parameter error | Invalid parameter |
| 24001 | Elastic does not exist | Elastic Deployment does not exist |
| 24002 | Elastic action not allowed | The state of the Elastic Deployment does not allow this operation |
| 24003 | You’ve reached the limit of %s deployment | Organization deployment limit reached |
| 24009 | Insufficient GPUs available... | Insufficient GPUs available to start the requested number of workers. |
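
The codes above can be turned into exceptions on the client side. The Python sketch below assumes that a response body is a JSON object with `code`, `message`, and `data` keys; only `data` is named in this reference, so the other two key names, as well as `YottaApiError` and `unwrap`, are hypothetical.

```python
SUCCESS_CODE = 10000

# Descriptions copied from the response code table.
KNOWN_ERRORS = {
    10001: "Invalid parameter",
    24001: "Elastic Deployment does not exist",
    24002: "The state of the Elastic Deployment does not allow this operation",
    24003: "Organization deployment limit reached",
    24009: "Insufficient GPUs available to start the requested number of workers.",
}


class YottaApiError(Exception):
    """Raised when a response carries a non-success code."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(envelope: dict):
    """Return envelope['data'] on success, otherwise raise YottaApiError."""
    code = int(envelope["code"])  # assumed key name; accepts 10000 or "10000"
    if code == SUCCESS_CODE:
        return envelope.get("data")
    message = envelope.get("message") or KNOWN_ERRORS.get(code, "unknown error")
    raise YottaApiError(code, message)
```

A caller deleting a deployment could, for instance, catch `YottaApiError`, check for `code == 24002`, and stop the deployment before retrying.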

