Generating Images with Z-Image Model Quickstart

Prerequisite:

Create a pod with our PyTorch 2.9.0 template.

1

Install Dependencies

Make sure your Jupyter notebook is located in /workspace.

!git clone https://github.com/Tongyi-MAI/Z-Image.git
%pip install -e ./Z-Image
!pwd  # This should print something like: /workspace
2

See an example of inference code

You will find an example file at /workspace/Z-Image/inference.py

Let's break it down and find out what it's doing!🧐

import torch
from diffusers import ZImagePipeline

These imports bring in PyTorch and the ZImagePipeline class from the Diffusers library, the tools used to run AI image generation.

print("Loading Z-Image-Turbo model...")
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")
print("Model loaded successfully!")
  • ZImagePipeline.from_pretrained() - Loads the pre-trained model weights from the Hugging Face repository

    • "Tongyi-MAI/Z-Image-Turbo" - The model identifier/checkpoint location

    • torch_dtype=torch.bfloat16 - Uses bfloat16 precision for reduced memory footprint while maintaining numerical stability

    • low_cpu_mem_usage=False - Disables CPU memory optimization, prioritizing loading speed

  • pipe.to("cuda") - Transfers the model to GPU memory for hardware-accelerated inference

Basically, this part loads the model architecture and weights, then deploys the model to the GPU for efficient processing.
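If you are running on a machine without an NVIDIA GPU, pipe.to("cuda") will fail. A common pattern (a sketch, not part of the original script) is to fall back gracefully:

```python
import torch

# Prefer the GPU, but fall back to CPU if CUDA is unavailable
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe.to(device)
```

Note that CPU inference will be dramatically slower and is only practical for testing.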


prompt = "Young Chinese woman in red Hanfu, intricate embroidery..."

This prompt tells the model what kind of image we want to generate.

print("Generating image...")
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
  • pipe() - Executes the diffusion process with specified parameters

    • prompt=prompt - Text conditioning input

    • height=1024, width=1024 - Output resolution (how big the generated image will be)

    • num_inference_steps=9 - Number of denoising iterations

    • guidance_scale=0.0 - Classifier-free guidance weight; set to 0 for distilled Turbo models

    • generator=torch.Generator("cuda").manual_seed(42) - Ensures reproducibility by controlling the random number generation with a fixed seed of 42

  • .images[0] - Extracts the first (and only) generated image from the batch


Z-Image Turbo models are optimized to generate images in as few as 8 denoising steps 🎉

This part runs the generative model through its diffusion process to synthesize an image matching your prompt.


image.save("example.png")
print("Image saved as 'example.png'!")

Writes the generated image to disk as a PNG file.

3

Run Inference

In our notebook, run:

!python Z-Image/inference.py

Now we have successfully generated an image named example.png with Z-Image!

4

More to explore: batch inference generation

If you look closer, you may find another file named batch_inference.py in the Z-Image repository.

Let's break down this batch inference script that generates multiple images from a list of prompts! 🎨

import os
from pathlib import Path
import time
import torch
from inference import ensure_weights
from utils import AttentionBackend, load_from_local_dir, set_attention_backend
from zimage import generate
  • os and Path - For file and directory operations

  • time - To measure how long each image takes to generate

  • torch - PyTorch for tensor operations and device management

  • Custom imports from the Z-Image project:

    • ensure_weights - Downloads model weights if needed

    • AttentionBackend and set_attention_backend - Select among different attention computation methods

    • load_from_local_dir - Loads model components from a local checkpoint directory

    • generate - The core image generation function

def read_prompts(path: str) -> list[str]:
    """Read prompts from a text file (one per line, empty lines skipped)."""
    prompt_path = Path(path)
    if not prompt_path.exists():
        raise FileNotFoundError(f"Prompt file not found: {prompt_path}")
    
    with prompt_path.open("r", encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]
    
    if not prompts:
        raise ValueError(f"No prompts found in {prompt_path}")
    
    return prompts

PROMPTS = read_prompts(os.environ.get("PROMPTS_FILE", "prompts/prompt1.txt"))
  • read_prompts() - Reads a text file where each line is a prompt

    • Skips empty lines

    • Strips whitespace from each line

    • Validates that the file exists and contains prompts

  • PROMPTS - Loads prompts from a file specified by the PROMPTS_FILE environment variable, defaulting to "prompts/prompt1.txt"

Example prompt file (e.g. prompts/prompt1.txt):

A serene mountain landscape at sunset
Futuristic city with flying cars
Portrait of a wise old wizard
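To see read_prompts in action, here is a minimal self-contained check (the helper is copied from the script above; the temp file stands in for a real prompt file):

```python
import tempfile
from pathlib import Path

def read_prompts(path: str) -> list[str]:
    """Read prompts from a text file (one per line, empty lines skipped)."""
    prompt_path = Path(path)
    if not prompt_path.exists():
        raise FileNotFoundError(f"Prompt file not found: {prompt_path}")
    with prompt_path.open("r", encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]
    if not prompts:
        raise ValueError(f"No prompts found in {prompt_path}")
    return prompts

# Write a prompt file containing a blank line and stray whitespace,
# then confirm both are cleaned up on read.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("A serene mountain landscape at sunset\n\n  Futuristic city with flying cars  \n")
    path = f.name

print(read_prompts(path))
# ['A serene mountain landscape at sunset', 'Futuristic city with flying cars']
```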

def slugify(text: str, max_len: int = 60) -> str:
    """Create a filesystem-safe slug from the prompt."""
    slug = "".join(ch.lower() if ch.isalnum() else "-" for ch in text)
    slug = "-".join(part for part in slug.split("-") if part)
    return slug[:max_len].rstrip("-") or "prompt"
  • Converts prompts into safe filenames by:

    • Converting to lowercase

    • Replacing non-alphanumeric characters with hyphens

    • Removing consecutive hyphens

    • Limiting to 60 characters

Example:

  • Input: "A serene mountain landscape at sunset!"

  • Output: "a-serene-mountain-landscape-at-sunset"
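You can verify this behavior directly. The sketch below copies slugify from the script and also shows the prompt-NN-slug.png filename pattern that main() uses later:

```python
def slugify(text: str, max_len: int = 60) -> str:
    """Create a filesystem-safe slug from the prompt."""
    slug = "".join(ch.lower() if ch.isalnum() else "-" for ch in text)
    slug = "-".join(part for part in slug.split("-") if part)
    return slug[:max_len].rstrip("-") or "prompt"

print(slugify("A serene mountain landscape at sunset!"))
# a-serene-mountain-landscape-at-sunset

# A prompt with no alphanumeric characters falls back to "prompt"
print(slugify("!!!"))
# prompt

# The batch script names output files like this:
print(f"prompt-{1:02d}-{slugify('Futuristic city with flying cars')}.png")
# prompt-01-futuristic-city-with-flying-cars.png
```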


def select_device() -> str:
    """Choose the best available device without repeating detection logic."""
    if torch.cuda.is_available():
        print("Chosen device: cuda")
        return "cuda"
    
    try:
        import torch_xla.core.xla_model as xm
        device = xm.xla_device()
        print("Chosen device: tpu")
        return device
    except (ImportError, RuntimeError):
        if torch.backends.mps.is_available():
            print("Chosen device: mps")
            return "mps"
        
        print("Chosen device: cpu")
        return "cpu"
  • Automatically detects and selects the best available hardware in priority order:

    1. CUDA (NVIDIA GPU) - Fastest option

    2. TPU (Google Tensor Processing Unit) - For cloud environments

    3. MPS (Apple Metal Performance Shaders) - For Mac M1/M2/M3

    4. CPU - Fallback option (slowest)


def main():
    model_path = ensure_weights("ckpts/Z-Image-Turbo")
    dtype = torch.bfloat16
    compile = False
    height = 1024
    width = 1024
    num_inference_steps = 8
    guidance_scale = 0.0
    attn_backend = os.environ.get("ZIMAGE_ATTENTION", "_native_flash")
    output_dir = Path("outputs")
    output_dir.mkdir(exist_ok=True)
  • Model setup:

    • ensure_weights() - Downloads model if not present, returns path

    • dtype = torch.bfloat16 - Memory-efficient precision

    • compile = False - Disables torch.compile model compilation (can be enabled for speed)

  • Generation parameters:

    • height/width = 1024 - Square 1K resolution

    • num_inference_steps = 8 - Optimized for Turbo model

    • guidance_scale = 0.0 - Required for Turbo (guidance pre-baked)

  • Backend configuration:

    • attn_backend - Attention mechanism (Flash Attention by default)

    • output_dir - Creates "outputs" folder for saving images


    device = select_device()
    components = load_from_local_dir(model_path, device=device, dtype=dtype, compile=compile)
    
    AttentionBackend.print_available_backends()
    set_attention_backend(attn_backend)
    print(f"Chosen attention backend: {attn_backend}")
  • Selects the optimal device (GPU/TPU/CPU)

  • Loads all model components (transformer, VAE, text encoder, etc.)

  • Configures attention backend for performance optimization

  • Flash Attention is faster and more memory-efficient than standard attention


    for idx, prompt in enumerate(PROMPTS, start=1):
        output_path = output_dir / f"prompt-{idx:02d}-{slugify(prompt)}.png"
        seed = 42 + idx - 1
        generator = torch.Generator(device).manual_seed(seed)
        
        start_time = time.time()
        images = generate(
            prompt=prompt,
            **components,
            height=height,
            width=width,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        )
        elapsed = time.time() - start_time
        
        images[0].save(output_path)
        print(f"[{idx}/{len(PROMPTS)}] Saved {output_path} in {elapsed:.2f} seconds")
    
    print("Done.")
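Note the per-prompt seed: seed = 42 + idx - 1 gives each prompt its own deterministic seed, so re-running the script reproduces the same set of images. A quick check of the sequence:

```python
PROMPTS = ["mountain", "city", "wizard"]  # stand-ins for the real prompts

# Reproduce the seed derivation from the loop above
seeds = [42 + idx - 1 for idx, _ in enumerate(PROMPTS, start=1)]
print(seeds)
# [42, 43, 44]
```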
5

Run the batch inference script

!python Z-Image/batch_inference.py
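Since the script reads PROMPTS_FILE and ZIMAGE_ATTENTION from the environment, you can point it at a different prompt file or attention backend without editing the code (the values below are just the script's own defaults, shown as a template):

```shell
!PROMPTS_FILE=prompts/prompt1.txt ZIMAGE_ATTENTION=_native_flash python Z-Image/batch_inference.py
```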

🎨 Customizing Your Generation

Change Image Size

image = pipe(
    prompt=prompt,
    height=768,   # Adjust height
    width=768,    # Adjust width
    ...
).images[0]

Adjust Quality vs Speed

# Faster generation (lower quality)
num_inference_steps=5

# Higher quality (slower generation)
num_inference_steps=15

Use Different Seeds
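Changing the seed changes the image you get for the same prompt, while reusing a seed reproduces the image exactly. This fragment mirrors the earlier pipe() call with a different seed (123 is arbitrary; any integer works):

```python
# Same prompt, different seed -> a different image.
# Reusing a seed reproduces the same image exactly.
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(123),  # try other seeds
).images[0]
```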
