door-closedAI Gateway

If you've ever wished you could try out the latest AI models — text generation, image synthesis, video creation — without having to juggle five different API keys, five different docs, and five differ

AI Gateway on Yottalabs is a unified API aggregator that brings together models from Google DeepMind, ByteDance, Z.AI, and more under one roof.

The category tabs at the top of the model list let you quickly filter models by type — choose from All, LLM, Text-To-Image, Text-To-Video, Image-To-Video, or browse by Publishers using the dropdown.

Simply click any tab to instantly narrow down the model catalog to what's relevant for your use case.

Nano Banana Pro

Google DeepMind Text-To-Image

Nano Banana Pro is Google DeepMind's text-to-image model built for precision and versatility. Where a lot of image models struggle with complex, multi-element prompts or lose consistency across styles, Nano Banana Pro holds its ground — structural details stay sharp, fine visual elements render cleanly, and it handles everything from cinematic portraits to marketing assets well.


🍌Playground

The Playground is your zero-setup sandbox. No API key configuration, no environment setup — just type a prompt and run it!

  • Prompt: Write in natural language. The more context you give it, the more controlled the output.

    • Template 1 — Portrait / Character

      Example:

      A cinematic photograph of a young black woman wearing a casual t-shirt and shorts, standing still with a colorful beach cocktail in hand, grinning broadly at the camera. The background features a sun-drenched Hawaiian beach scattered with tropical flowers and coconuts.

      Lighting: bright, warm natural sunlight, creating an upbeat and joyful atmosphere.

      Lens: wide-angle, capturing the full environment around the subject to amplify the sense of openness and surprise.

      Color grade: oversaturated, vivid tropical palette — punchy greens, electric blues, warm yellows. High energy, vacation editorial style.

    • Template 2 — Scene / Environment

      Example:

      A photorealistic render of an abandoned greenhouse interior, during golden hour after light rain. Overgrown vines crawl across rusted iron frames, and puddles reflect warm light from broken roof panels. The atmosphere is quiet and melancholic. Color palette: amber, moss green, pale rust. Style reference: Gregory Crewdson.

    • Template 3 — Product / Marketing Visual

      Example:

      A clean overhead shot of a matte black coffee cup placed on a white marble surface. A single sprig of dried lavender rests beside it. Lighting: soft diffused natural light from the left. The overall tone is minimal and premium. No text, no watermark.

      Quick Reference — Power Words by Category

      Category
      Options

      Photo style

      cinematic, photorealistic, editorial, documentary, long-exposure

      Lighting

      golden hour, blue hour, hard rim light, soft diffused, neon-lit, candlelit

      Mood

      ethereal, gritty, melancholic, energetic, sterile, nostalgic

      Color grade

      desaturated, warm analog, high contrast B&W, teal & orange, pastel washed

      Composition

      wide establishing shot, tight close-up, bird's eye, Dutch angle, symmetrical

  • Advanced Settings: Expand this section to adjust parameters like output dimensions, sampling steps, or guidance scale, depending on what the provider exposes. Useful when you want to push quality or constrain style.

    • Aspect Ratio defines the shape of your image. 1:1 for social square posts, 9:16 for mobile/Stories, 16:9 for presentations and banners, 2:3 for portrait editorial, 21:9 for cinematic widescreen, and several others in between. The selected ratio is highlighted in black — default is 1:1.

    • Resolution sets the output quality: 1K, 2K, or 4K. Higher resolution means more detail and larger file size. For quick prototyping, 1K is fine. For anything going into production — print, large-format display, or high-DPI screens — go 2K or 4K.

    • Output Format is straightforward: PNG for lossless quality with transparency support, JPEG for smaller file sizes when you don't need a transparent background. When in doubt, PNG is the safer default.

  • Run: Hit Run to generate.

  • Output & Download: Generated images render directly in the panel. Use the download icon in the top-right corner of the output to save your result locally.

  • Pricing: Cost is shown transparently beneath the output — currently $0.14 per image. What you see is what you pay.


🍌Providers

Different providers may vary in latency, throughput, or price. You don't need to manage any of this manually — Yotta Labs automatically routes your requests to the most suitable provider based on your prompt and parameters.


🍌API

If you're integrating Nano Banana Pro into your own application, the API tab is your starting point. Authentication is handled via an X-API-KEY header — grab your key from your account settings and you're good to go.

1

Copy the SDK Code

Head to the API tab on the Nano Banana Pro model page. Copy the full Python SDK code provided.

2

Save It Locally

Open any text editor (Notepad, VS Code, or anything you have on hand). Paste the code in, then make two edits before saving:

  • Replace "MY API Key" with your actual API key from the Dashboard. See our official doc for API key herearrow-up-right

  • Replace the default prompt with your own

Save the file as run.py in a folder of your choice, for example:

3

Run It from the Command Line

Open Command Prompt (search "cmd" in the Windows Start menu). Navigate to the folder where you saved the file, then run it:

You'll see status updates printed in the terminal as the job processes:

4

Copy the Output URL and Save Image

Once the job completes, the terminal prints a URL starting with https://. Select and copy the full URL.

Paste the URL into your browser and hit Enter. The image will load directly in the browser.

🤖 Claude Sonnet

Anthropic LLM

Claude Sonnet is Anthropic's balanced large language model — sharp enough for nuanced reasoning and long-document analysis, fast enough for real-time applications. It excels at structured outputs, multi-step instruction following, code generation, and conversational tasks where both quality and latency matter. Via AI Gateway, it's accessible through the same unified API surface as every other model, with no per-provider credential setup required.


🍌 Playground

The Playground lets you send chat messages and inspect Claude Sonnet's responses in real time — no API key, no environment setup.

Key input fields:

Field
Description

System Prompt

Set the model's role, behavior, and tone. Example: "You are a senior product manager. Be concise and use bullet points." Leave blank to use Claude's default assistant persona.

User Message

Your query or instruction. Multi-turn conversation is supported — previous turns are preserved in context automatically.

Max Tokens

Controls maximum response length. 256–512 for quick answers; 2048+ for long-form drafts or analysis.

Temperature

0 for deterministic, fact-grounded responses. 0.7–1.0 for more varied, creative outputs. Default: 1.0.

Prompt templates:

Example: Analyze the following product spec and return your response in the format above. Focus on technical feasibility and missing user stories. [Paste spec here]

Tip: For long document tasks, paste the full content into the User Message field. Claude Sonnet supports a 200K token context window — enough for most large files.


🍌 Providers

Requests to Claude Sonnet are routed through Anthropic's inference endpoints. AI Gateway handles authentication and rate limit management automatically — no Anthropic API key is required on your end.


🍌 API

Quickstart:

  1. Get your X-API-KEY from the Dashboard

  2. Copy the SDK code from the API tab on the model page

  3. Set your system prompt and user message

  4. Run and parse the response

Authentication uses the same X-API-KEY header pattern as all other Gateway models. Set base_url to route through Yottalabs instead of Anthropic directly.


🎬 Kling v3 Standard

KlingAI Text-To-Video

Kling v3 Standard is KlingAI's text-to-video model designed for efficient, scalable video generation. It produces high-quality clips with stable motion and solid prompt adherence, while keeping generation speed and cost practical. The model supports common aspect ratios and flexible duration settings, making it well-suited for social media content, marketing materials, and everyday creative production.


🍌 Playground

Key settings:

Setting
Description

Multi Shot

Toggle on to segment your prompt into multiple consecutive shots. Best for narrative or multi-scene descriptions. Toggle off for a single continuous clip.

Shot Type — Intelligence

The model interprets your prompt and autonomously assigns shot pacing, framing, and transitions. Recommended for most users.

Shot Type — Customize

Manually define shot type, camera movement, and duration per segment. For users who need precise directorial control.

Prompt

Describe scene, characters, action, camera movement, and dialogue. The more specific, the closer the output matches your intent.

Aspect Ratio

16:9 for landscape/YouTube, 9:16 for Reels/TikTok, 1:1 for square posts.

Duration

Output length in seconds. Cost scales directly with duration.

Prompt structure:

Example: European villa outdoor terrace scene. A dining table covered with a blue-and-white checkered tablecloth. A young white woman sits barefoot beside the table, wearing a blue-and-white striped short-sleeve shirt, khaki shorts, and a brown belt. Opposite her sits a young white man in a white T-shirt. The camera slowly pushes in. The woman gently swirls a glass of juice, gazing toward the distant woods, and says, "These trees will turn yellow in a month, won't they?"

Pricing:

Mode
Standard rate
Gateway rate

No audio

$0.084 / s

$0.0714 / s

With audio

$0.126 / s

$0.1071 / s


🍌 Providers

Kling v3 Standard is served through KlingAI's inference endpoints. AI Gateway routes requests automatically and handles authentication on your behalf — no KlingAI account or API key is needed.


🍌 API

Submit your prompt and parameters → receive a job ID → poll the status endpoint → retrieve the completed video URL.

Video generation is asynchronous. The API returns a job ID on submission; poll the status endpoint until the state is completed, then retrieve the video URL from the response.


🐴 HappyHorse-1.0

Alibaba Image-To-Video

HappyHorse-1.0 in Image-to-Video mode animates a single still image according to your text prompt. Upload a photo — product, portrait, scene — and describe the motion you want: camera pull-back, subject gesture, environmental movement. The model preserves visual consistency from the source image while introducing controlled, natural motion. Ideal for bringing static product shots, editorial photos, or concept renders to life.


🍌 Playground

Key settings:

Setting
Description

Source Image (required)

Upload the still image you want to animate. Supports JPG and PNG. The model uses this as the visual anchor — output frames will closely match the source in color, composition, and subject identity.

Prompt

Describe the motion and camera behavior to apply. Focus on what should move and how — avoid re-describing what's already visible in the image.

Duration

Output clip length in seconds. 3–5s works best for subtle animations; longer clips benefit from more explicit motion descriptions.

Aspect Ratio

Defaults to match source image dimensions. Can be overridden in Advanced Settings if cropping is acceptable.

Prompt structure:

Example (product): The camera slowly orbits clockwise around the perfume bottle. Soft light catches the glass facets as they rotate. The background bokeh shifts gently. Elegant, luxury brand feel.

Example (portrait): Slight breeze lifts the subject's hair. Her gaze shifts subtly left. The camera holds steady with a very slow zoom in. Cinematic, golden hour warmth.

Tip: Do not describe the subject's appearance in the prompt — that's already defined by the source image. Use the prompt exclusively for motion direction and camera behavior.


🍌 Providers

HappyHorse inference is routed automatically through AI Gateway. No separate credentials required. The Providers tab shows available routing options and their current status.


🍌 API

The Image-to-Video API accepts a base64-encoded image or a publicly accessible image URL alongside the text prompt. The response is asynchronous — poll the job status endpoint to retrieve the final video URL once generation is complete.


✂️ WAN

Alibaba Video Edit

WAN is a video editing model that applies text-directed modifications to an existing video clip. Rather than generating from scratch, it takes your source footage and transforms specific visual elements based on your prompt — style transfer, subject re-clothing, background replacement, atmospheric changes, or motion enhancement. Source footage structure and timing are preserved; only the targeted visual elements are altered.


🍌 Playground

Key settings:

Setting
Description

Source Video (required)

Upload the video clip you want to edit. The model preserves original motion, timing, and scene structure. Recommended: clean, well-lit footage with minimal camera shake for best edit fidelity.

Edit Prompt

Describe what you want changed, not what should stay the same. Be specific about the target element and the desired transformation.

Strength / Intensity

Controls how aggressively the edit is applied. Lower values make subtle adjustments; higher values apply stronger transformations that may diverge more from the source.

Preserve Motion

When enabled, edits are constrained to visual appearance only — subject motion trajectory is not altered. Recommended on for most editing tasks.

Prompt structure:

Example (style transfer): Re-render this clip in the style of a 1970s film — add grain, warm color cast, slight vignette, and soft focus on edges. Keep all motion and subject positioning unchanged.

Example (wardrobe edit): Change the subject's outfit to a tailored navy blue blazer and white dress shirt. Keep the background, lighting, and all motion exactly as in the original.

Example (background replacement): Replace the background with a snowy mountain landscape at dusk. Maintain the original foreground subject and all motion. Lighting on the subject should reflect the new environment.

Tip: WAN works best on clips under 30 seconds with a single dominant subject. For multi-scene videos, split into segments and process each separately for more consistent results.


🍌 Providers

WAN is served through its dedicated inference infrastructure, accessed via AI Gateway. No separate account or API key is required. Gateway automatically manages routing, retries, and load balancing.


🍌 API

The Video Edit API accepts the source video as a base64-encoded payload or a pre-signed URL. Submit the edit prompt and parameters; generation is asynchronous. Retrieve the edited video URL from the completion callback or by polling the job status endpoint.


🎯 HappyHorse-1.0

Alibaba Reference-To-Video

HappyHorse-1.0 in Reference-to-Video mode generates a new video from scratch using multiple reference images as visual anchors. Unlike Image-to-Video (which animates a single source), Reference-to-Video synthesizes an entirely new scene while maintaining identity and visual consistency across your uploaded references — matching subject appearance, garment details, accessory design, and stylistic elements simultaneously. Best suited for fashion, e-commerce, and branded content production.


🍌 Playground

Key settings:

Setting
Description

Reference Images

Upload up to 3 reference images (Image1, Image2, Image3). Recommended: Image1 → primary subject or garment, Image2 → accessory or prop, Image3 → pose, gesture, or detail reference.

Prompt (with image tags)

Write a scene description and embed image tags inline to specify which reference corresponds to which element. Example: "A woman in a red cheongsam, [Image1]. Her tassel earrings, [Image3], sway as she unfolds a fan, [Image2]."

Image URL input

Reference images can also be supplied via public URLs instead of direct uploads — useful for assets already hosted in a CDN or library.

Duration & Aspect Ratio

Set in Advanced Settings. The generated video is a new synthesis — not constrained to the dimensions of any input image.

Prompt structure:

Example: A woman in a red cheongsam, [Image1]. The camera begins with a side medium shot, outlining the fitted tailoring and her elegant S-shaped silhouette. It then cuts to a low-angle shot, capturing the delicate movement of her tassel earrings, [Image3], swaying gently as she raises her hand and unfolds a folding fan, [Image2]. Finally, the camera pushes into a close-up of her face, lingering on the subtle charm in her fingertips lightly touching the fan ribs and the graceful flow of her gaze. Multiple perspectives comprehensively showcase the refined elegance and timeless oriental beauty.

Image tag placement guide:

Tag
Recommended role
Where to place in prompt

Image1

Primary subject / garment

After first subject mention

Image2

Prop / accessory

When the prop is first described in action

Image3

Detail / gesture / pose

At the detail shot description

Important: Tag order in the prompt must match upload order in the Media panel. Mismatched ordering (e.g. Image1 tag referring to the Image3 upload) will produce inconsistent results. Always verify tag-to-image correspondence before running.

Pricing: Billed per request. A 5-second output costs approximately $1.08. The exact cost is shown beneath the output panel before and after generation.


🍌 Providers

HappyHorse Reference-to-Video requests are routed through AI Gateway's managed inference layer. No separate HappyHorse credentials needed. The Providers tab shows real-time routing status and available fallback endpoints.


🍌 API

Quickstart:

  1. Get your X-API-KEY from the Dashboard

  2. Upload reference images or prepare public image URLs

  3. Submit prompt string with [Image1], [Image2], [Image3] tags embedded

  4. Poll the returned job ID for completion status

  5. Retrieve and download the video URL from the completed job response

The API accepts reference images as an array of base64 strings or public URLs, paired with the prompt. Generation is asynchronous — use the job ID returned on submission to poll for completion.

Last updated

Was this helpful?