AI Gateway
If you've ever wished you could try out the latest AI models — text generation, image synthesis, video creation — without having to juggle five different API keys, five different docs, and five differ
AI Gateway on Yottalabs is a unified API aggregator that brings together models from Google DeepMind, ByteDance, Z.AI, and more under one roof.

The category tabs at the top of the model list let you quickly filter models by type — choose from All, LLM, Text-To-Image, Text-To-Video, Image-To-Video, or browse by Publishers using the dropdown.


Simply click any tab to instantly narrow down the model catalog to what's relevant for your use case.
Nano Banana Pro
Google DeepMind Text-To-Image
Nano Banana Pro is Google DeepMind's text-to-image model built for precision and versatility. Where a lot of image models struggle with complex, multi-element prompts or lose consistency across styles, Nano Banana Pro holds its ground — structural details stay sharp, fine visual elements render cleanly, and it handles everything from cinematic portraits to marketing assets well.
🍌Playground

The Playground is your zero-setup sandbox. No API key configuration, no environment setup — just type a prompt and run it!
Prompt: Write in natural language. The more context you give it, the more controlled the output.
Template 1 — Portrait / Character
Example:
A cinematic photograph of a young black woman wearing a casual t-shirt and shorts, standing still with a colorful beach cocktail in hand, grinning broadly at the camera. The background features a sun-drenched Hawaiian beach scattered with tropical flowers and coconuts.
Lighting: bright, warm natural sunlight, creating an upbeat and joyful atmosphere.
Lens: wide-angle, capturing the full environment around the subject to amplify the sense of openness and surprise.
Color grade: oversaturated, vivid tropical palette — punchy greens, electric blues, warm yellows. High energy, vacation editorial style.

Template 2 — Scene / Environment
Example:
A photorealistic render of an abandoned greenhouse interior, during golden hour after light rain. Overgrown vines crawl across rusted iron frames, and puddles reflect warm light from broken roof panels. The atmosphere is quiet and melancholic. Color palette: amber, moss green, pale rust. Style reference: Gregory Crewdson.

Template 3 — Product / Marketing Visual
Example:
A clean overhead shot of a matte black coffee cup placed on a white marble surface. A single sprig of dried lavender rests beside it. Lighting: soft diffused natural light from the left. The overall tone is minimal and premium. No text, no watermark.

Quick Reference — Power Words by Category
CategoryOptionsPhoto style
cinematic, photorealistic, editorial, documentary, long-exposure
Lighting
golden hour, blue hour, hard rim light, soft diffused, neon-lit, candlelit
Mood
ethereal, gritty, melancholic, energetic, sterile, nostalgic
Color grade
desaturated, warm analog, high contrast B&W, teal & orange, pastel washed
Composition
wide establishing shot, tight close-up, bird's eye, Dutch angle, symmetrical
Advanced Settings: Expand this section to adjust parameters like output dimensions, sampling steps, or guidance scale, depending on what the provider exposes. Useful when you want to push quality or constrain style.
Aspect Ratio defines the shape of your image.
1:1for social square posts,9:16for mobile/Stories,16:9for presentations and banners,2:3for portrait editorial,21:9for cinematic widescreen, and several others in between. The selected ratio is highlighted in black — default is1:1.Resolution sets the output quality:
1K,2K, or4K. Higher resolution means more detail and larger file size. For quick prototyping, 1K is fine. For anything going into production — print, large-format display, or high-DPI screens — go 2K or 4K.Output Format is straightforward:
PNGfor lossless quality with transparency support,JPEGfor smaller file sizes when you don't need a transparent background. When in doubt, PNG is the safer default.

Run: Hit Run to generate.
Output & Download: Generated images render directly in the panel. Use the download icon in the top-right corner of the output to save your result locally.
Pricing: Cost is shown transparently beneath the output — currently $0.14 per image. What you see is what you pay.
🍌Providers

Different providers may vary in latency, throughput, or price. You don't need to manage any of this manually — Yotta Labs automatically routes your requests to the most suitable provider based on your prompt and parameters.
🍌API
If you're integrating Nano Banana Pro into your own application, the API tab is your starting point. Authentication is handled via an X-API-KEY header — grab your key from your account settings and you're good to go.
Copy the SDK Code
Head to the API tab on the Nano Banana Pro model page. Copy the full Python SDK code provided.
Save It Locally
Open any text editor (Notepad, VS Code, or anything you have on hand). Paste the code in, then make two edits before saving:
Replace
"MY API Key"with your actual API key from the Dashboard. See our official doc for API key hereReplace the default prompt with your own
Save the file as run.py in a folder of your choice, for example:
Run It from the Command Line
Open Command Prompt (search "cmd" in the Windows Start menu). Navigate to the folder where you saved the file, then run it:
You'll see status updates printed in the terminal as the job processes:

Copy the Output URL and Save Image
Once the job completes, the terminal prints a URL starting with https://. Select and copy the full URL.
Paste the URL into your browser and hit Enter. The image will load directly in the browser.

🤖 Claude Sonnet
Anthropic LLM
Claude Sonnet is Anthropic's balanced large language model — sharp enough for nuanced reasoning and long-document analysis, fast enough for real-time applications. It excels at structured outputs, multi-step instruction following, code generation, and conversational tasks where both quality and latency matter. Via AI Gateway, it's accessible through the same unified API surface as every other model, with no per-provider credential setup required.
🍌 Playground
The Playground lets you send chat messages and inspect Claude Sonnet's responses in real time — no API key, no environment setup.
Key input fields:
System Prompt
Set the model's role, behavior, and tone. Example: "You are a senior product manager. Be concise and use bullet points." Leave blank to use Claude's default assistant persona.
User Message
Your query or instruction. Multi-turn conversation is supported — previous turns are preserved in context automatically.
Max Tokens
Controls maximum response length. 256–512 for quick answers; 2048+ for long-form drafts or analysis.
Temperature
0 for deterministic, fact-grounded responses. 0.7–1.0 for more varied, creative outputs. Default: 1.0.
Prompt templates:
Example: Analyze the following product spec and return your response in the format above. Focus on technical feasibility and missing user stories. [Paste spec here]
Tip: For long document tasks, paste the full content into the User Message field. Claude Sonnet supports a 200K token context window — enough for most large files.
🍌 Providers
Requests to Claude Sonnet are routed through Anthropic's inference endpoints. AI Gateway handles authentication and rate limit management automatically — no Anthropic API key is required on your end.

🍌 API
Quickstart:
Get your
X-API-KEYfrom the DashboardCopy the SDK code from the API tab on the model page
Set your system prompt and user message
Run and parse the response
Authentication uses the same
X-API-KEYheader pattern as all other Gateway models. Setbase_urlto route through Yottalabs instead of Anthropic directly.
🎬 Kling v3 Standard
KlingAI Text-To-Video
Kling v3 Standard is KlingAI's text-to-video model designed for efficient, scalable video generation. It produces high-quality clips with stable motion and solid prompt adherence, while keeping generation speed and cost practical. The model supports common aspect ratios and flexible duration settings, making it well-suited for social media content, marketing materials, and everyday creative production.
🍌 Playground
Key settings:
Multi Shot
Toggle on to segment your prompt into multiple consecutive shots. Best for narrative or multi-scene descriptions. Toggle off for a single continuous clip.
Shot Type — Intelligence
The model interprets your prompt and autonomously assigns shot pacing, framing, and transitions. Recommended for most users.
Shot Type — Customize
Manually define shot type, camera movement, and duration per segment. For users who need precise directorial control.
Prompt
Describe scene, characters, action, camera movement, and dialogue. The more specific, the closer the output matches your intent.
Aspect Ratio
16:9 for landscape/YouTube, 9:16 for Reels/TikTok, 1:1 for square posts.
Duration
Output length in seconds. Cost scales directly with duration.
Prompt structure:
Example: European villa outdoor terrace scene. A dining table covered with a blue-and-white checkered tablecloth. A young white woman sits barefoot beside the table, wearing a blue-and-white striped short-sleeve shirt, khaki shorts, and a brown belt. Opposite her sits a young white man in a white T-shirt. The camera slowly pushes in. The woman gently swirls a glass of juice, gazing toward the distant woods, and says, "These trees will turn yellow in a month, won't they?"

Pricing:
No audio
$0.084 / s
$0.0714 / s
With audio
$0.126 / s
$0.1071 / s
🍌 Providers
Kling v3 Standard is served through KlingAI's inference endpoints. AI Gateway routes requests automatically and handles authentication on your behalf — no KlingAI account or API key is needed.
🍌 API
Submit your prompt and parameters → receive a job ID → poll the status endpoint → retrieve the completed video URL.
Video generation is asynchronous. The API returns a job ID on submission; poll the status endpoint until the state is
completed, then retrieve the video URL from the response.
🐴 HappyHorse-1.0
Alibaba Image-To-Video
HappyHorse-1.0 in Image-to-Video mode animates a single still image according to your text prompt. Upload a photo — product, portrait, scene — and describe the motion you want: camera pull-back, subject gesture, environmental movement. The model preserves visual consistency from the source image while introducing controlled, natural motion. Ideal for bringing static product shots, editorial photos, or concept renders to life.
🍌 Playground
Key settings:
Source Image (required)
Upload the still image you want to animate. Supports JPG and PNG. The model uses this as the visual anchor — output frames will closely match the source in color, composition, and subject identity.
Prompt
Describe the motion and camera behavior to apply. Focus on what should move and how — avoid re-describing what's already visible in the image.
Duration
Output clip length in seconds. 3–5s works best for subtle animations; longer clips benefit from more explicit motion descriptions.
Aspect Ratio
Defaults to match source image dimensions. Can be overridden in Advanced Settings if cropping is acceptable.
Prompt structure:
Example (product): The camera slowly orbits clockwise around the perfume bottle. Soft light catches the glass facets as they rotate. The background bokeh shifts gently. Elegant, luxury brand feel.
Example (portrait): Slight breeze lifts the subject's hair. Her gaze shifts subtly left. The camera holds steady with a very slow zoom in. Cinematic, golden hour warmth.

Tip: Do not describe the subject's appearance in the prompt — that's already defined by the source image. Use the prompt exclusively for motion direction and camera behavior.
🍌 Providers
HappyHorse inference is routed automatically through AI Gateway. No separate credentials required. The Providers tab shows available routing options and their current status.
🍌 API
The Image-to-Video API accepts a base64-encoded image or a publicly accessible image URL alongside the text prompt. The response is asynchronous — poll the job status endpoint to retrieve the final video URL once generation is complete.
✂️ WAN
Alibaba Video Edit
WAN is a video editing model that applies text-directed modifications to an existing video clip. Rather than generating from scratch, it takes your source footage and transforms specific visual elements based on your prompt — style transfer, subject re-clothing, background replacement, atmospheric changes, or motion enhancement. Source footage structure and timing are preserved; only the targeted visual elements are altered.
🍌 Playground
Key settings:
Source Video (required)
Upload the video clip you want to edit. The model preserves original motion, timing, and scene structure. Recommended: clean, well-lit footage with minimal camera shake for best edit fidelity.
Edit Prompt
Describe what you want changed, not what should stay the same. Be specific about the target element and the desired transformation.
Strength / Intensity
Controls how aggressively the edit is applied. Lower values make subtle adjustments; higher values apply stronger transformations that may diverge more from the source.
Preserve Motion
When enabled, edits are constrained to visual appearance only — subject motion trajectory is not altered. Recommended on for most editing tasks.
Prompt structure:
Example (style transfer): Re-render this clip in the style of a 1970s film — add grain, warm color cast, slight vignette, and soft focus on edges. Keep all motion and subject positioning unchanged.
Example (wardrobe edit): Change the subject's outfit to a tailored navy blue blazer and white dress shirt. Keep the background, lighting, and all motion exactly as in the original.
Example (background replacement): Replace the background with a snowy mountain landscape at dusk. Maintain the original foreground subject and all motion. Lighting on the subject should reflect the new environment.
Tip: WAN works best on clips under 30 seconds with a single dominant subject. For multi-scene videos, split into segments and process each separately for more consistent results.
🍌 Providers
WAN is served through its dedicated inference infrastructure, accessed via AI Gateway. No separate account or API key is required. Gateway automatically manages routing, retries, and load balancing.
🍌 API
The Video Edit API accepts the source video as a base64-encoded payload or a pre-signed URL. Submit the edit prompt and parameters; generation is asynchronous. Retrieve the edited video URL from the completion callback or by polling the job status endpoint.
🎯 HappyHorse-1.0
Alibaba Reference-To-Video
HappyHorse-1.0 in Reference-to-Video mode generates a new video from scratch using multiple reference images as visual anchors. Unlike Image-to-Video (which animates a single source), Reference-to-Video synthesizes an entirely new scene while maintaining identity and visual consistency across your uploaded references — matching subject appearance, garment details, accessory design, and stylistic elements simultaneously. Best suited for fashion, e-commerce, and branded content production.
🍌 Playground
Key settings:
Reference Images
Upload up to 3 reference images (Image1, Image2, Image3). Recommended: Image1 → primary subject or garment, Image2 → accessory or prop, Image3 → pose, gesture, or detail reference.
Prompt (with image tags)
Write a scene description and embed image tags inline to specify which reference corresponds to which element. Example: "A woman in a red cheongsam, [Image1]. Her tassel earrings, [Image3], sway as she unfolds a fan, [Image2]."
Image URL input
Reference images can also be supplied via public URLs instead of direct uploads — useful for assets already hosted in a CDN or library.
Duration & Aspect Ratio
Set in Advanced Settings. The generated video is a new synthesis — not constrained to the dimensions of any input image.
Prompt structure:
Example: A woman in a red cheongsam, [Image1]. The camera begins with a side medium shot, outlining the fitted tailoring and her elegant S-shaped silhouette. It then cuts to a low-angle shot, capturing the delicate movement of her tassel earrings, [Image3], swaying gently as she raises her hand and unfolds a folding fan, [Image2]. Finally, the camera pushes into a close-up of her face, lingering on the subtle charm in her fingertips lightly touching the fan ribs and the graceful flow of her gaze. Multiple perspectives comprehensively showcase the refined elegance and timeless oriental beauty.
Image tag placement guide:
Image1
Primary subject / garment
After first subject mention
Image2
Prop / accessory
When the prop is first described in action
Image3
Detail / gesture / pose
At the detail shot description
Important: Tag order in the prompt must match upload order in the Media panel. Mismatched ordering (e.g. Image1 tag referring to the Image3 upload) will produce inconsistent results. Always verify tag-to-image correspondence before running.
Pricing: Billed per request. A 5-second output costs approximately $1.08. The exact cost is shown beneath the output panel before and after generation.
🍌 Providers
HappyHorse Reference-to-Video requests are routed through AI Gateway's managed inference layer. No separate HappyHorse credentials needed. The Providers tab shows real-time routing status and available fallback endpoints.
🍌 API
Quickstart:
Get your
X-API-KEYfrom the DashboardUpload reference images or prepare public image URLs
Submit prompt string with
[Image1],[Image2],[Image3]tags embeddedPoll the returned job ID for completion status
Retrieve and download the video URL from the completed job response
The API accepts reference images as an array of base64 strings or public URLs, paired with the prompt. Generation is asynchronous — use the job ID returned on submission to poll for completion.
Last updated
Was this helpful?