Elastic Deployment
Overview
Elastic Deployment is a flexible orchestration feature provided by the Yotta SaaS platform that enables users to quickly create, scale, and manage GPU-powered workloads based on custom images. It is designed to deliver elastic scaling, multi-region deployment, and disaster recovery, ensuring that your applications remain highly available, fault-tolerant, and performant.
Key Features
🧩 Custom Image Deployment: Launch Pods directly from your own container images to instantly start the computing environment you need.
⚙️ Multi-Worker Deployment: Create multiple Worker Instances with a single click to handle high-concurrency or large-scale computational workloads.
🌍 Multi-Region Scheduling: Pods are automatically distributed across different Regions, enabling cross-region deployment and improving overall system stability and fault tolerance.
📈 Elastic Scaling: Adjust the number of Workers at any time based on workload demand; scale up for high-load periods and scale down to reduce cost.
Typical Use Cases
AI Training & Inference: Deploy multiple workers across regions to accelerate distributed AI workloads and improve throughput.
High-Availability Service Deployment: Distribute services across multiple regions to eliminate single points of failure.
Data Processing & Computation: Dynamically expand worker nodes to support distributed processing and failover resilience.
Core Advantages
🌍 Cross-Region Elastic Distribution: Supports multi-region resource scheduling for high availability and global workload balancing.
🛡 Built-In Disaster Recovery: Automatically fails over to available regions during a regional outage, ensuring continuous business operations.
Pricing and Billing
Elastic Deployment charges are calculated hourly, based on the actual compute and storage resources consumed.
Billing Formula
Each hour’s cost includes:
GPU usage cost
Storage cost
Note: There is no additional charge for network bandwidth.
Example
If a user deploys the following resources:
2 Workers, each configured with:
GPU: H100 × 2
Disk: 100 GB
H100 GPU unit price: $1.85 / GPU / hour
Disk unit price: $0.001 / GB / hour
Total hourly cost = (2 × 2 × $1.85) + (2 × 100 × $0.001) = $7.40 + $0.20 = $7.60 / hour
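The arithmetic above can be checked with a small shell snippet (awk handles the floating-point math; the worker counts and unit prices are the ones from this example, so substitute your own configuration as needed):

```shell
# Recompute the example bill: 2 workers, each with 2x H100 and 100 GB disk.
WORKERS=2; GPUS_PER_WORKER=2; GPU_PRICE=1.85
DISK_GB=100; DISK_PRICE=0.001

awk -v w="$WORKERS" -v g="$GPUS_PER_WORKER" -v gp="$GPU_PRICE" \
    -v d="$DISK_GB" -v dp="$DISK_PRICE" 'BEGIN {
  gpu  = w * g * gp    # 2 workers x 2 GPUs x $1.85   = $7.40
  disk = w * d * dp    # 2 workers x 100 GB x $0.001  = $0.20
  printf "$%.2f / hour\n", gpu + disk
}'
# prints "$7.60 / hour"
```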
Billing Characteristics
Pay-as-you-go: Billing stops immediately once resources are released.
Unified multi-region billing: The system aggregates resource usage across all regions.
Transparent reporting: Detailed per-worker billing breakdowns are available in the Details page of each Elastic Deployment.
Deductions and Balance Policy
Charges are deducted as soon as an Elastic Deployment starts running.
When your account balance approaches $0, all running Elastic Deployments will be automatically terminated.
Managing Elastic Deployments in the Console
Open the Yotta Console and navigate to: Compute → Elastic Deployment
The page lists all of your Elastic Deployments along with their current status:
Initializing: Resources are being provisioned
Running: Deployment is active and accessible via its endpoint
Stopping: Resources are being reclaimed
Stopped: Deployment is paused
You can search deployments by name (fuzzy search supported).

Click a deployment card to view its details.

Launching an Elastic Deployment
Navigate to Compute → Elastic Deployment in the left menu.

Click the Deploy button in the top-right corner to start the deployment wizard.

In the configuration form:
Fill in all required fields (marked with *)
Optionally set additional parameters for custom needs
Click Add GPU to select the GPU type for your workload

Recommendation: choose GPUs from multiple regions to maximize redundancy and disaster-recovery resilience.

Click Deploy to launch your Elastic Deployment.

Accessing a Running Deployment
Click a card in the Running state to open the Deployment Details page and find the endpoint URL.

Example: Access via curl
Each request must include the header: Authorization: Bearer <YOUR_API_KEY>
API keys can be found in Settings → Access Keys.
curl --location 'https://2t5y1srid9mo.yottadeos.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
    "temperature": 0.5,
    "top_p": 0.9,
    "max_tokens": 256,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.2,
    "repetition_penalty": 1.2,
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "Will the Federal Reserve continue cutting rates in Q4?"
        }
    ],
    "stream": false
}'

Editing an Elastic Deployment
Only deployments in the Stopped state can be edited.
To edit:
Click the ⋮ (Menu) on the top-right corner of the card.
Select Edit to open the configuration modal.
Modify parameters as needed and click Save.
Restart the deployment via the Run button to apply the changes.

Scaling Workers Quickly
There are two ways to adjust the number of workers to scale your deployment in or out:
From the card’s top-right menu: select Scale Workers. Adjust the number of workers on the detail page and save.

From the Details page’s top-right menu: Click the ⋮ (Menu) and select Scale Workers. Then, adjust the number of workers on the detail page and save.
Scaling operations apply immediately to running deployments.
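To estimate how a scaling operation affects your bill, the sketch below reuses the unit prices from the Pricing section ($1.85 per H100 per hour, $0.001 per GB per hour) and assumes the example configuration of 2 GPUs and 100 GB of disk per worker; your actual per-worker cost will depend on your own configuration:

```shell
# Estimated hourly cost at different worker counts.
# Per-worker cost here: 2 GPUs x $1.85 + 100 GB x $0.001 = $3.80 / hour.
for n in 1 2 4 8; do
  awk -v w="$n" 'BEGIN {
    printf "%d worker(s): $%.2f / hour\n", w, w * (2 * 1.85 + 100 * 0.001)
  }'
done
```

Because billing is hourly and pay-as-you-go, scaling in works the same way in reverse: released workers stop accruing charges as soon as their resources are reclaimed.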
Viewing Billing Details
Navigate to Billing from the left menu.
The Billing page displays all Elastic Deployment usage and cost details, including per-worker hourly charges and historical summaries.

Summary
Elastic Deployment provides Yotta users with a powerful, flexible way to launch, scale, and manage distributed GPU workloads across multiple regions. Whether for AI model training, inference services, or large-scale data processing, Elastic Deployment ensures performance, reliability, and efficiency at global scale.