GPU Pods

Pods are the core compute units on Yotta Platform. They let you deploy, manage, and connect to isolated GPU workloads across a range of hardware, through either the Console or the API.

💻 Managing Pods via Console

Pod Page Overview

When entering the Pods page:

  • By default, the page shows Pods that are in an In Progress state, which includes:

    • Initializing – Resources are being allocated; the Pod is deploying.

    • Running – The Pod is running normally.

    • Terminating – The Pod is being terminated; resources are being reclaimed.

  • The following buttons are available at the bottom of each Pod:

    • 🔗 Connect – Connect your machine to specific ports, such as 8888 for Jupyter Notebook. Both SSH and HTTP ports are currently supported.

    • 🗒️ Log – Check the container logs to see the Pod's current status and identify errors or issues.

    • 📈 Metrics – Monitor GPU, CPU, memory, and storage usage in real time to track system performance and resource utilization.

  • Click the History tab to view Pods that ended within the last 24 hours. These are in one of the following states:

    • Terminated – The Pod has been deleted.

    • Failed – Deployment failed. Common causes:

      • Insufficient system resources

      • Invalid image configuration

  • Use the search bar to find a Pod by name (fuzzy search is supported). You can also filter Pods by Pod Status or GPU Type.

⚙️ Deploying a Pod

Step-by-Step Guide

  • Navigate to: Compute → Pods

  • Click Deploy (top right). You’ll enter the GPU Selection page.

  • Select GPU Type

    • Choose a GPU model suitable for your workload.

  • Configure Pod

    • Fill in required parameters (fields marked with * are mandatory).

Image Requirements

Click Edit next to the image name to configure your image further. Yotta Labs provides a list of official images, and you can also select a custom image, either a Public Image or a Private Image.

A custom image must meet the following requirements:

  • Must be compiled for x86 architecture

  • Must be Debian/Ubuntu
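The two requirements above can be checked from inside a candidate image before you deploy it. The following is a minimal sketch, not an official Yotta tool: the function names are hypothetical, it assumes "x86" means 64-bit x86 (reported as `x86_64` or `AMD64`), and it identifies the distribution from the `ID` and `ID_LIKE` fields of `/etc/os-release`.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted, e.g. NAME="Ubuntu".
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def meets_image_requirements(os_release_text: str, machine: str) -> bool:
    """Return True for a Debian/Ubuntu-family OS on a 64-bit x86 machine.

    Assumption: "x86" in the requirements means x86_64 / AMD64.
    """
    fields = parse_os_release(os_release_text)
    family = {fields.get("ID", "").lower()}
    family.update(fields.get("ID_LIKE", "").lower().split())
    is_debian_family = bool(family & {"debian", "ubuntu"})
    is_x86 = machine.lower() in {"x86_64", "amd64"}
    return is_debian_family and is_x86
```

Inside the image, you could pass the contents of `/etc/os-release` together with the value of `platform.machine()` from the standard library.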

  • Deploy

    • Click Deploy to complete the process.

💾 System Volume

The System Volume automatically mounts a set of system directories on the created Pod. Software, configurations, and data stored in these directories persist even if the Pod is edited or restarted.

Supported Directories

Read and write operations to the following directories will be persistent:

Directory    Brief Description
/home        User home directories; user-level configs and data.
/root        Root user home directory; scripts and temp data.
/var         Variable files (logs, caches, runtime data).
/run         Runtime status files (PIDs, sockets).
/etc         System and service configuration files.
/usr         System-level apps, libraries, and runtime components.
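To decide whether a file written by your workload will survive an edit or restart, you can compare its path against the directories listed above. This is a small illustrative sketch; `PERSISTENT_DIRS` and `is_persistent` are hypothetical names, and the check is purely lexical.

```python
from pathlib import PurePosixPath

# Directories this page lists as persistent on the System Volume.
PERSISTENT_DIRS = ("/home", "/root", "/var", "/run", "/etc", "/usr")


def is_persistent(path: str) -> bool:
    """Return True if an absolute POSIX path lies in a persistent directory."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    return any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents
        for d in PERSISTENT_DIRS
    )
```

Because the comparison does not resolve symlinks or `..` segments, resolve the path first if either may be present.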

Size Requirements

To ensure that the Pod can launch and run smoothly, we recommend sizing your system volume with the following rule:

System Volume Size ≥ Image Size × 3

Example:

  • Image: PyTorch base image (10 GiB)

  • Recommended System Volume Size: At least 30 GiB
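The sizing rule is simple arithmetic, sketched below so it can be reused in deployment scripts. The function name is hypothetical, and rounding up to a whole GiB is an assumption of this sketch, not a stated platform requirement.

```python
import math

SIZE_FACTOR = 3  # from the rule above: system volume >= image size x 3


def recommended_system_volume_gib(image_size_gib: float) -> int:
    """Return the minimum recommended system volume size in whole GiB."""
    if image_size_gib <= 0:
        raise ValueError("image size must be positive")
    # Round up so a fractional image size never yields an undersized volume.
    return math.ceil(image_size_gib * SIZE_FACTOR)
```

For the 10 GiB PyTorch image in the example, `recommended_system_volume_gib(10)` returns 30.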


The "For Development" button is turned on automatically when you create a Pod.

You can find it, and change the setting, beside the Pod name bar.

The system volume is set to 100 GB by default.

Typical uses of the System Volume:

  • Preserving Environments: Retaining toolchains or dependencies (e.g., pip packages) after a Pod rebuild.

  • Persisting Configurations: Saving changes made in /etc.

  • Retaining Logs: Keeping logs in /var for a specific period.

🔌 Connecting to Your Pod

Once the Pod is launched:

  • Click the Connect button on the Pod card to view exposed services.

  • Availability depends on the port configuration defined at deployment.

  • When the container port is Ready, the status will update automatically.
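If you script against a Pod, it can help to confirm that an exposed port accepts connections before using it. The sketch below uses only the Python standard library; the host and port you pass are whatever the Connect dialog shows for your Pod, and `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A successful TCP connection only shows that something is listening; it does not confirm that the service behind the port (for example Jupyter on 8888) has finished starting.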

📜 Viewing Logs

  • Click Logs on the Pod card to view both:

    • System Logs (platform-level)

    • Container Logs (application-level)

This helps with debugging deployment or runtime issues.

🧊 Pausing or Terminating Pods

🔸 Pause

If you only need to suspend temporarily:

  • Click Pause on the Pod card.

  • Only Volume storage will continue to incur charges.

  • Click Run to restart the Pod at any time.

  • Pods can be edited while paused.

🔸 Terminate

If you want to remove the Pod completely:

  • Click the “...” on the Pod card → choose Terminate.

  • The Pod will be permanently deleted and no longer billed.

  • Terminated Pods cannot be edited or restarted.

✏️ Editing a Pod

  1. Go to Compute → Pods and locate the Pod.

  2. Click Pause and wait until the Pod enters the Stopped state.

  3. Click “...” → Edit, modify configurations, and save.

  4. Click Run to restart the Pod with the new settings.
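When the steps above are automated, step 2 becomes a polling loop: keep checking the Pod status until it reads Stopped. The helper below is a generic sketch and does not call any Yotta API. `get_status` is a hypothetical callable that you supply (for example, a wrapper around whatever status endpoint the API reference documents), and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    failure_states: tuple[str, ...] = ("Failed",),
) -> None:
    """Poll get_status() until it returns target.

    Raises RuntimeError if a failure state is seen, and TimeoutError if the
    target is not reached within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return
        if status in failure_states:
            raise RuntimeError(f"Pod entered {status!r} while waiting for {target!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Pod still {status!r} after {timeout_s} s")
        time.sleep(interval_s)
```

A script would call `wait_for_status(fetch_status, "Stopped")` after requesting a pause, then apply the edit and start the Pod again.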

📈 Pod Status Reference

Status         Description
Initializing   Resource allocation in progress; Pod is deploying
Running        Pod is running
Stopping       Pausing in progress; resources are being reclaimed
Stopped        Pod is paused
Terminating    Termination in progress; resources are being reclaimed
Terminated     Pod fully terminated
Failed         Deployment failed (insufficient resources / invalid image)
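In scripts, the statuses in this reference can be modeled as an enumeration so that typos are caught early. This is an illustrative sketch: the string values mirror the labels used on this page (with "Initializing" as shown on the Pods page), the exact strings an API returns are not specified here, and treating Failed as a final state is an assumption.

```python
from enum import Enum


class PodStatus(str, Enum):
    """Pod statuses as labeled on this page (API strings may differ)."""

    INITIALIZING = "Initializing"
    RUNNING = "Running"
    STOPPING = "Stopping"
    STOPPED = "Stopped"
    TERMINATING = "Terminating"
    TERMINATED = "Terminated"
    FAILED = "Failed"


# Statuses described as "in progress": the Pod is moving to another state.
TRANSITIONAL = frozenset(
    {PodStatus.INITIALIZING, PodStatus.STOPPING, PodStatus.TERMINATING}
)

# Statuses shown under History. Assumption: a Failed Pod does not recover.
FINAL = frozenset({PodStatus.TERMINATED, PodStatus.FAILED})


def is_transitional(status: PodStatus) -> bool:
    """Return True while the Pod is between two stable states."""
    return status in TRANSITIONAL


def is_final(status: PodStatus) -> bool:
    """Return True once the Pod can no longer be edited or restarted."""
    return status in FINAL
```

`PodStatus("Stopped")` converts a label into the enum member and raises `ValueError` for an unknown label.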

💰 Pricing & Billing

Formula

Deduction Rules

  • Billing starts once the Pod is Running.

  • When balance nears $0, all active Pods will be terminated automatically.

  • To reduce or stop charges:

    • Use Pause to suspend the Pod temporarily (persistent volumes are still charged).

    • Use Terminate to stop billing completely.

Action       Billing Behavior
Pause        Charges continue for Volumes (Stopped state)
Terminate    No charges (Terminated state)
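Because active Pods are terminated automatically when the balance nears $0, a rough runway estimate can tell you when to top up. The sketch below assumes charges accrue at a constant hourly rate, which this page does not confirm; the rates you pass in are your own figures, and the function name is hypothetical.

```python
def hours_until_depleted(balance_usd: float, hourly_rates_usd: list[float]) -> float:
    """Estimate hours until the balance reaches zero.

    Assumes a constant combined hourly rate. Returns infinity when nothing
    is being charged.
    """
    if balance_usd < 0:
        raise ValueError("balance must not be negative")
    if any(rate < 0 for rate in hourly_rates_usd):
        raise ValueError("hourly rates must not be negative")
    total_rate = sum(hourly_rates_usd)
    if total_rate == 0:
        return float("inf")
    return balance_usd / total_rate
```

Treat the result as an upper bound: termination happens as the balance nears $0, and the exact threshold is not stated here.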


🧩 Managing Pods via OpenAPI

You can also manage Pods programmatically via Yotta Labs’ OpenAPI.

API Reference

Tip: Always review the API documentation before calling endpoints to avoid common request errors (invalid parameters, insufficient balance, etc.).

🧱 Example Use Cases

  • Automated Pod Deployment via Python SDK

  • Monitoring Pod Logs using API polling

  • Scaling Workloads across multiple GPU types

  • Integrating with CI/CD to trigger training jobs automatically


🪄 Best Practices

  • Use Pause instead of Terminate for short-term downtime.

  • Monitor balance regularly to prevent auto-termination.

  • Always verify image compatibility (x86, Debian/Ubuntu-based).

  • For debugging, prefer checking container logs first.

