Elastic Deployment

Overview

Elastic Deployment is a flexible orchestration feature provided by the Yotta SaaS platform that enables users to quickly create, scale, and manage GPU-powered workloads based on custom images. It is designed to deliver elastic scaling, multi-region deployment, and disaster recovery, ensuring that your applications remain highly available, fault-tolerant, and performant.

Key Features

  • 🧩 Custom Image Deployment: Launch Pods directly from your own container images to instantly start the computing environment you need.

  • ⚙️ Multi-Worker Deployment: Create multiple Worker Instances with a single click to handle high-concurrency or large-scale computational workloads.

  • 🌍 Multi-Region Scheduling: Pods can automatically distribute across different Regions, enabling cross-region deployment and improving overall system stability and fault tolerance.

  • 📈 Elastic Scaling: Adjust the number of Workers at any time based on workload demand. Scale up for high-load periods and scale down to reduce cost.

Typical Use Cases

  • AI Training & Inference: Deploy multiple workers across regions to accelerate distributed AI workloads and improve throughput.

  • High-Availability Service Deployment: Distribute services across multiple regions to eliminate single points of failure.

  • Data Processing & Computation: Dynamically expand worker nodes to support distributed processing and failover resilience.

Core Advantages

  • 🌍 Cross-Region Elastic Distribution: Supports multi-region resource scheduling for high availability and global workload balancing.

  • 🛡 Built-In Disaster Recovery: Automatically fails over to available regions in case of a regional outage, ensuring continuous business operations.

Pricing and Billing

Elastic Deployment charges are calculated hourly, based on the actual compute and storage resources consumed.

$$Total\ Cost = \sum \left( GPU_{rate} \times GPU_{count} + Storage_{rate} \times Storage_{size} \right)$$

Billing Formula

Each hour’s cost includes:

  • GPU usage cost

  • Storage cost

Note: There is no additional charge for network bandwidth.

Example

If a user deploys the following resources:

  • 2 Workers, each configured with:

    • GPU: H100 × 2

    • Disk: 100 GB

  • H100 GPU unit price: $1.85 / GPU / hour

  • Disk unit price: $0.001 / GB / hour

Total hourly cost = (2 × 2 × $1.85) + (2 × 100 × $0.001) = $7.40 + $0.20 = $7.60 / hour
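The arithmetic above can be sketched as a small helper. This is an illustration of the billing formula only; the rates come from the example, not from an official rate card:

```python
# Sketch of the hourly-cost formula from the example above.
# All rates are illustrative; check your actual rate card in the Console.

def hourly_cost(workers: int, gpus_per_worker: int, gpu_rate: float,
                disk_gb_per_worker: float, storage_rate: float) -> float:
    """Sum GPU and storage charges across all workers for one hour."""
    gpu_cost = workers * gpus_per_worker * gpu_rate
    storage_cost = workers * disk_gb_per_worker * storage_rate
    return round(gpu_cost + storage_cost, 2)

# 2 workers, each with 2x H100 at $1.85/GPU/hour and 100 GB at $0.001/GB/hour
print(hourly_cost(workers=2, gpus_per_worker=2, gpu_rate=1.85,
                  disk_gb_per_worker=100, storage_rate=0.001))  # 7.6
```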

Billing Characteristics

  • Pay-as-you-go: Billing stops immediately once resources are released.

  • Unified multi-region billing: The system aggregates resource usage across all regions.

  • Transparent reporting: Detailed per-worker billing breakdowns are available in the Details page of each Elastic Deployment.

Deductions and Balance Policy

  • Charges are deducted as soon as an Elastic Deployment starts running.

  • When your account balance approaches $0, all running Elastic Deployments will be automatically terminated.
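Because running deployments are terminated when the balance approaches $0, it can be useful to estimate how long a given balance will last at your current hourly rate. This is simple arithmetic, not a platform API:

```python
# Estimate how many whole hours a balance can sustain a deployment
# before it risks automatic termination at $0. Simple arithmetic,
# not a platform API; hourly_cost comes from your own billing details.

def runway_hours(balance: float, hourly_cost: float) -> int:
    """Number of whole hours the balance covers at the given hourly rate."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return int(balance // hourly_cost)

# e.g. a $100 balance at the example rate of $7.60/hour
print(runway_hours(100.0, 7.6))  # 13
```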

Managing Elastic Deployments in the Console

  • Open the Yotta Console and navigate to: Compute → Elastic Deployment

  • The page lists all of your Elastic Deployments, each with one of the following statuses:

    | Status | Description |
    | --- | --- |
    | Initializing | Resources are being provisioned |
    | Running | Deployment is active and accessible via its endpoint |
    | Stopping | Resources are being reclaimed |
    | Stopped | Deployment is paused |

  • You can search deployments by name (fuzzy search supported).

  • Click a deployment card to view its details.

Launching an Elastic Deployment

  • Navigate to Compute → Elastic Deployment in the left menu.

  • Click the Deploy button in the top-right corner to start the deployment wizard.

  • In the configuration form:

    • Fill in all required fields (marked with *)

    • Optionally set additional parameters for custom needs

    • Click Add GPU to select the GPU type for your workload

  • Recommendation: choose GPUs from multiple regions to maximize redundancy and disaster recovery

  • Click Deploy to launch your Elastic Deployment.

Accessing a Running Deployment

Only deployments in the Running state can be accessed.

  • Click the Running card to open the Deployment Details page and find the endpoint URL.

Example: Access via curl

curl --location 'https://2t5y1srid9mo.yottadeos.com/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
  "temperature": 0.5,
  "top_p": 0.9,
  "max_tokens": 256,
  "frequency_penalty": 0.3,
  "presence_penalty": 0.2,
  "repetition_penalty": 1.2,
  "model": "meta-llama/Llama-3.2-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": "Will the Federal Reserve continue cutting rates in Q4?"
    }
  ],
  "stream": false
}'

The base URL is unique to your deployment (e.g. https://2t5y1srid9mo.yottadeos.com). To build the full service URL, take the URL of your locally deployed service and replace localhost with this base URL; in the example above, the result is https://2t5y1srid9mo.yottadeos.com/v1/chat/completions.
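The same request can be issued from Python using only the standard library. A minimal sketch; the base URL and model name are taken from the curl example above, and the API key is a placeholder you must replace with your own:

```python
# Sketch of calling a Running deployment's endpoint from Python.
# BASE_URL and the API key are placeholders; use the endpoint shown
# on your own Deployment Details page.
import json
from urllib import request

BASE_URL = "https://2t5y1srid9mo.yottadeos.com"  # unique per deployment

def build_chat_request(api_key: str, prompt: str) -> request.Request:
    """Assemble the same /v1/chat/completions call as the curl example."""
    body = {
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "temperature": 0.5,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return request.Request(
        url=BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_chat_request("<YOUR_API_KEY>",
                             "Will the Federal Reserve continue cutting rates in Q4?")
    with request.urlopen(req) as resp:  # performs the actual HTTP call
        print(json.load(resp))
```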

Editing an Elastic Deployment

  • To edit:

    • Click the ⋮ (Menu) on the top-right corner of the card.

    • Select Edit to open the configuration modal.

    • Modify parameters as needed and click Save.

    • Restart the deployment via the Run button to apply the changes.

Scaling Workers Quickly

There are two ways to adjust the number of workers to scale your deployment in or out. All scaling changes are applied immediately to running deployments.

  • From the card’s top-right menu: select Scale Workers. Adjust the number of workers on the detail page and save.

  • From the Details page’s top-right menu: Click the ⋮ (Menu) and select Scale Workers. Then, adjust the number of workers on the detail page and save.

Viewing Billing Details

  • Navigate to Billing from the left menu.

  • The Billing page displays all Elastic Deployment usage and cost details, including per-worker hourly charges and historical summaries.

Summary

Elastic Deployment provides Yotta users with a powerful, flexible way to launch, scale, and manage distributed GPU workloads across multiple regions. Whether for AI model training, inference services, or large-scale data processing, Elastic Deployment ensures performance, reliability, and efficiency at global scale.
