Elastic Deployment
Elastic Deployment is a flexible orchestration feature provided by the Yotta SaaS platform that enables users to quickly create, scale, and manage GPU-powered workloads based on custom images.
API List
Create Elastic Deployment
Endpoint: /openapi/v1/elastic/deploy/create
Method: POST
Description: Creates a new Elastic Deployment on the platform.
Authorizations
X-API-Key
header
string
required
Your API Key
Request Body (JSON)
name
string
required
Elastic Deployment name
image
string
required
Docker Image
imageRegistry
string
optional
Image registry URL; defaults to Docker Hub.
Example: https://registry.example.com
resources
Resource[]
required
GPU-related resources. Workers will be deployed on instances satisfying the specified resources. Multiple resource configurations can be specified.
Note: Each Resource must specify the same gpuCount.
minSingleCardVramInGb
number
optional
Minimum VRAM required per single card (GB)
minSingleCardVcpu
number
optional
Minimum VCPU required per single card
minSingleCardRamInGb
number
optional
Minimum RAM required per single card (GB)
containerVolumeInGb
number
required
Container ephemeral volume size (GB)
credentialId
string
optional
Required if using a private image. To create a credential, refer to [Credential Module -> Create Credential].
workers
number
required
Number of workers to deploy
serviceMode
string
required
Supports three Service Modes:
ALB: Service provided via HTTP requests.
QUEUE: Tasks are submitted to the Yotta Queue service for asynchronous processing.
CUSTOM: Neither HTTP requests nor the Queue service are required.
initializationCommand
string
required
Initialization command executed upon worker startup
environmentVars
EnvironmentVar[]
optional
Environment variables required for worker startup.
expose
Expose
optional
Port information exposed by the worker.
If serviceMode=ALB, the service's HTTP port is mandatory.
If serviceMode=ALB or CUSTOM, a health check port may optionally be specified.
Example — curl
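A minimal request sketch. The host api.example.com is a placeholder for your actual API endpoint, and the image, region, GPU type, port, and other field values are illustrative only:

# api.example.com is a placeholder host; substitute your endpoint and API key
curl -X POST "https://api.example.com/openapi/v1/elastic/deploy/create" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "llama-demo",
    "image": "myorg/llama-server:latest",
    "resources": [
      { "region": "us-east-1", "gpuType": "NVIDIA-A100", "gpuCount": 1 }
    ],
    "containerVolumeInGb": 50,
    "workers": 2,
    "serviceMode": "ALB",
    "initializationCommand": "python server.py",
    "environmentVars": [ { "key": "MODEL_NAME", "value": "llama-3.2-3b" } ],
    "expose": { "port": 8000, "protocol": "HTTP" }
  }'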
Response
Returns an object containing data of type ElasticDeploymentDetail. (See Common Models below)
List Elastic Deployments
Endpoint: /openapi/v1/elastic/deploy/list
Method: GET
Description: Lists all Elastic Deployments under the current organization.
Authorizations
X-API-Key
header
string
required
Your API Key
Query Parameters
statusList
string[]
optional
Filter by ElasticDeploymentStatus. Multiple values allowed:
statusList=INITIALIZING&statusList=RUNNING&statusList=STOPPED
Example — curl
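A request sketch, assuming api.example.com stands in for your actual API host; the statusList values are optional filters:

curl -G "https://api.example.com/openapi/v1/elastic/deploy/list" \
  -H "X-API-Key: YOUR_API_KEY" \
  --data-urlencode "statusList=RUNNING" \
  --data-urlencode "statusList=STOPPED"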
Response
Returns an object containing data as a list of ElasticDeploymentDetail. (See Common Models below)
Get Elastic Deployment Details
Endpoint: /openapi/v1/elastic/deploy/{id}
Method: GET
Description: Retrieves details of a specific Elastic Deployment.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Example — curl
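A request sketch, assuming api.example.com stands in for your actual API host and using the sample deployment ID from the Common Models section below:

curl "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969" \
  -H "X-API-Key: YOUR_API_KEY"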
Response
Returns an object containing data of type ElasticDeploymentDetail. (See Common Models below)
Update Elastic Deployment
Endpoint: /openapi/v1/elastic/deploy/{id}/update
Method: POST
Description: Updates a specific Elastic Deployment. Currently, only Elastic Deployments in the STOPPED state can be updated.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Request Body (JSON)
name
string
required
Elastic Deployment name
resources
Resource[]
required
GPU-related resources. Workers will be deployed on instances satisfying the specified resources.
Note: Each Resource must specify the same gpuCount.
minSingleCardVramInGb
number
optional
Minimum VRAM required per single card (GB)
minSingleCardVcpu
number
optional
Minimum VCPU required per single card
minSingleCardRamInGb
number
optional
Minimum RAM required per single card (GB)
containerVolumeInGb
number
required
Container ephemeral volume size (GB)
credentialId
string
optional
Required if using a private image.
workers
number
required
Number of workers to deploy
initializationCommand
string
optional
Initialization command executed upon worker startup
environmentVars
EnvironmentVar[]
optional
Environment variables required for worker startup.
expose
Expose
optional
Port information exposed by the worker.
Example — curl
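A minimal request sketch (the deployment must be in the STOPPED state), with api.example.com as a placeholder host and illustrative field values:

curl -X POST "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969/update" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Llama-3.2-3B",
    "resources": [
      { "region": "us-east-1", "gpuType": "NVIDIA-A100", "gpuCount": 1 }
    ],
    "containerVolumeInGb": 50,
    "workers": 3
  }'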
Response
Returns an object containing data of type ElasticDeploymentDetail. (See Common Models below)
Stop Elastic Deployment
Endpoint: /openapi/v1/elastic/deploy/{id}/stop
Method: POST
Description: Stops the Elastic Deployment with the specified ID.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Example — curl
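A request sketch, with api.example.com as a placeholder for your actual API host:

curl -X POST "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969/stop" \
  -H "X-API-Key: YOUR_API_KEY"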
Response
Returns data: null on success.
Start Elastic Deployment
Endpoint: /openapi/v1/elastic/deploy/{id}/start
Method: POST
Description: Starts the Elastic Deployment with the specified ID.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Example — curl
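A request sketch, with api.example.com as a placeholder for your actual API host:

curl -X POST "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969/start" \
  -H "X-API-Key: YOUR_API_KEY"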
Response
Returns data: null on success.
Delete Elastic Deployment
Endpoint: /openapi/v1/elastic/deploy/{id}
Method: DELETE
Description: Deletes the Elastic Deployment with the specified ID. Note: The deployment must be in a STOPPED state to be deleted.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Example — curl
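A request sketch, with api.example.com as a placeholder for your actual API host; the deployment must be STOPPED:

curl -X DELETE "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969" \
  -H "X-API-Key: YOUR_API_KEY"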
Response
Returns data: null on success. If deletion is not allowed (e.g., status is not STOPPED), returns error code 24002.
Adjust Elastic Deployment Worker Count
Endpoint: /openapi/v1/elastic/deploy/{id}/workers
Method: POST
Description: Adjusts the number of Workers for the specified Elastic Deployment.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Request Body (JSON)
workers
number
required
Target worker count
Example — curl
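A request sketch, with api.example.com as a placeholder host and an illustrative target worker count:

curl -X POST "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969/workers" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "workers": 5 }'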
Response
Returns data: null on success.
List Elastic Deployment Workers
Endpoint: /openapi/v1/elastic/deploy/{id}/workers
Method: GET
Description: Retrieves the list of Workers for the specified Elastic Deployment.
Authorizations
X-API-Key
header
string
required
Your API Key
Path Parameters
id
string
required
Elastic Deployment ID
Query Parameters
statusList
string[]
optional
Filter by WorkerStatus. Multiple values allowed:
statusList=INITIALIZING&statusList=RUNNING
Example — curl
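A request sketch, with api.example.com as a placeholder for your actual API host; statusList is an optional filter:

curl -G "https://api.example.com/openapi/v1/elastic/deploy/378888638324150969/workers" \
  -H "X-API-Key: YOUR_API_KEY" \
  --data-urlencode "statusList=RUNNING"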
Response
Returns an object containing data as a list of ElasticDeploymentWorkerDetail. (See Common Models below)
Common Models
Resource
region
string
required
The region where the worker is deployed.
gpuType
string
required
The type of GPU required by the worker.
gpuCount
number
required
The number of GPU cards per worker.
ElasticDeploymentDetail
id
string
Elastic Deployment ID (e.g., "378888638324150969")
name
string
Elastic Deployment Name (e.g., "Llama-3.2-3B")
creator
string
Creator's email
domain
string
ALB Gateway Domain; only exists for Deployments with ServiceMode=ALB.
imageRegistry
string
Image Registry URL; default is Docker Hub.
image
string
Image Name
resources
ResourceResponse[]
Region and GPU resource configuration.
minSingleCardVramInGb
number
Minimum VRAM required per single card (GB)
minSingleCardVcpu
number
Minimum VCPU required per single card
minSingleCardRamInGb
number
Minimum RAM required per single card (GB)
containerVolumeInGb
number
Container ephemeral volume size (GB)
credentialId
string
Credential ID
initializationCommand
string
Service startup command
environmentVars
EnvironmentVar[]
List of environment variables.
expose
Expose
Port exposure configuration.
totalWorkers
number
Total number of Workers
runningWorkers
number
Number of currently running Workers
cost
string
Accumulated cost
perSecondPrice
string
Price per second
perHourPrice
string
Price per hour
serviceMode
string
ALB, QUEUE, or CUSTOM
status
string
INITIALIZING, RUNNING, STOPPING, STOPPED, FAILED
ResourceResponse
region
string
Worker's Region
regionDisplayName
string
Worker's Region Display Name
gpuType
string
GPU Type
gpuDisplayName
string
GPU Type Display Name
gpuCount
number
GPU Card Count
singleCardVramInGb
number
VRAM per single GPU card (GB)
singleCardVcpu
number
VCPU count per single GPU card
singleCardRamInGb
number
RAM per single GPU card (GB)
EnvironmentVar
key
string
required
Name of the environment variable
value
string
required
Value of the environment variable
Expose
port
number
required
Port number, range: 1 ~ 65535
protocol
string
required
Protocol type, supports: HTTP
ElasticDeploymentWorkerDetail
id
string
Worker ID
region
string
Worker's Region
regionDisplayName
string
Worker's Region Display Name
gpuType
string
GPU Type
gpuDisplayName
string
GPU Type Display Name
gpuCount
number
GPU Card Count
singleCardVramInGb
number
VRAM per single GPU card (GB)
singleCardVcpu
number
VCPU count per single GPU card
singleCardRamInGb
number
RAM per single GPU card (GB)
uptime
string
Worker uptime (milliseconds)
cost
string
Cost incurred up to the current time
status
string
INITIALIZE, RUNNING, TERMINATING, TERMINATED, FAILED
Enums
ServiceMode
ALB
Service provided via HTTP requests, includes load balancing capabilities.
QUEUE
Tasks are submitted to the Yotta Queue service for asynchronous processing.
CUSTOM
Neither HTTP requests nor the Queue service are required.
ElasticDeploymentStatus
INITIALIZING
Initializing, starting up.
RUNNING
Running, service is operating normally.
STOPPING
Stopping.
STOPPED
Stopped.
FAILED
Failed.
WorkerStatus
INITIALIZE
Initializing, starting up.
RUNNING
Running, service is operating normally.
TERMINATING
Terminating.
TERMINATED
Terminated.
FAILED
Failed.
Response Codes
10000
success
Request successful
10001
parameter error
Invalid parameter
24001
Elastic does not exist
Elastic Deployment does not exist
24002
Elastic action not allowed
The state of the Elastic Deployment does not allow this operation
24003
You’ve reached the limit of %s deployment
Organization deployment limit reached
24009
Insufficient GPUs available...
Insufficient GPUs available to start the requested number of workers.