# Generating Images with Z-Image Model Quickstart

### Prerequisite:

Create a pod with our PyTorch 2.9.0 template.

{% stepper %}
{% step %}

#### Install Dependencies

Make sure your Jupyter notebook is located in `/workspace`.

```bash
!git clone https://github.com/Tongyi-MAI/Z-Image.git
%pip install -e ./Z-Image
!pwd #This should give an outcome like: /workspace
```
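
To confirm the install worked, you can run a quick import check in a new cell (a minimal sanity sketch; it assumes the editable install above completed without errors and that your Diffusers build exposes `ZImagePipeline`, as the example script below expects):

```python
# Both imports should succeed after the install above
import torch
from diffusers import ZImagePipeline

print(torch.__version__)          # e.g. 2.9.0
print(torch.cuda.is_available())  # should print True on a GPU pod
```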

{% endstep %}

{% step %}

#### See an example of inference code

You will find an example file at `/workspace/Z-Image/inference.py`

Let's break it down and find out what it's doing! :face\_with\_monocle:

```python
import torch
from diffusers import ZImagePipeline
```

These imports bring in PyTorch and the `ZImagePipeline` class from the Diffusers library, which together handle loading the model and generating images.

```python
print("Loading Z-Image-Turbo model...")
pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")
print("Model loaded successfully!")
```

* `ZImagePipeline.from_pretrained()` - Loads the pre-trained model weights from the Hugging Face repository
  * `"Tongyi-MAI/Z-Image-Turbo"` - The model identifier/checkpoint location
  * `torch_dtype=torch.bfloat16` - Uses bfloat16 precision for reduced memory footprint while maintaining numerical stability
  * `low_cpu_mem_usage=False` - Disables CPU memory optimization, prioritizing loading speed
* `pipe.to("cuda")` - Transfers the model to GPU memory for hardware-accelerated inference

Basically, this part loads the model architecture and weights, then deploys them to the GPU for efficient processing.
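
If you run into GPU out-of-memory errors, Diffusers pipelines generally support CPU offloading as an alternative to `pipe.to("cuda")` (a sketch, assuming `ZImagePipeline` inherits the standard Diffusers offloading hooks):

```python
# Instead of pipe.to("cuda"): keep weights in CPU RAM and move each
# sub-module to the GPU only while it runs (lower VRAM use, slower inference)
pipe.enable_model_cpu_offload()
```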

***

```python
prompt = "Young Chinese woman in red Hanfu, intricate embroidery..."
```

This prompt tells the model what kind of image we want to generate.

```python
print("Generating image...")
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
```

* `pipe()` - Executes the diffusion process with specified parameters
  * `prompt=prompt` - Text conditioning input
  * `height=1024, width=1024` - Output resolution (how big the generated image will be)
  * `num_inference_steps=9` - Number of denoising iterations
  * `guidance_scale=0.0` - Classifier-free guidance weight; set to 0 for distilled Turbo models
  * `generator=torch.Generator("cuda").manual_seed(42)` - Ensures reproducibility by controlling the random number generation with a fixed seed of 42
* `.images[0]` - Extracts the first (and only) generated image from the batch

{% hint style="info" %}
Z-Image-Turbo models are optimized to produce good results with as few as 8 denoising steps :tada:
{% endhint %}

This part runs the generative model through its diffusion process to synthesize an image matching your prompt.
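
If you want several variations of the same prompt in one call, most Diffusers pipelines accept a `num_images_per_prompt` argument (an assumption here that `ZImagePipeline` follows this convention); `.images` then contains one entry per variation:

```python
# Hypothetical batched call: 4 candidate images for a single prompt
images = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    num_images_per_prompt=4,  # assumed standard Diffusers argument
    generator=torch.Generator("cuda").manual_seed(42),
).images  # list of 4 PIL images

for i, img in enumerate(images):
    img.save(f"variation-{i}.png")
```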

***

```python
image.save("example.png")
print("Image saved as 'example.png'!")
```

Writes the generated image to disk as a PNG file.
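
Since the result is a standard PIL image, you can also preview it inline in the notebook instead of (or before) saving it:

```python
# In a Jupyter cell, display() renders the PIL image inline
from IPython.display import display
display(image)
```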
{% endstep %}

{% step %}

#### Run Inference

In our notebook, run:

```bash
!python Z-Image/inference.py
```

You should now have an image named `example.png` generated by Z-Image!
{% endstep %}

{% step %}

#### More to explore: batch inference

If you look closer, you will find another file named `batch_inference.py` in the Z-Image repository.

Let's break down this batch inference script that generates multiple images from **a list of prompts**! 🎨

```python
import os
from pathlib import Path
import time
import torch
from inference import ensure_weights
from utils import AttentionBackend, load_from_local_dir, set_attention_backend
from zimage import generate
```

* `os` and `Path` - For file and directory operations
* `time` - To measure how long each image takes to generate
* `torch` - PyTorch for tensor operations and device management
* Custom imports from the Z-Image project:
  * `ensure_weights` - Downloads the model weights if they are not already present
  * `load_from_local_dir` - Loads the model components from a local checkpoint directory
  * `AttentionBackend` and `set_attention_backend` - Manage which attention implementation is used
  * `generate` - The core image generation function

```python
def read_prompts(path: str) -> list[str]:
    """Read prompts from a text file (one per line, empty lines skipped)."""
    prompt_path = Path(path)
    if not prompt_path.exists():
        raise FileNotFoundError(f"Prompt file not found: {prompt_path}")
    
    with prompt_path.open("r", encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]
    
    if not prompts:
        raise ValueError(f"No prompts found in {prompt_path}")
    
    return prompts

PROMPTS = read_prompts(os.environ.get("PROMPTS_FILE", "prompts/prompt1.txt"))
```

* `read_prompts()` - Reads a text file where each line is a prompt
  * Skips empty lines
  * Strips whitespace from each line
  * Validates that the file exists and contains prompts
* `PROMPTS` - Loads prompts from a file specified by the `PROMPTS_FILE` environment variable, defaulting to `"prompts/prompt1.txt"`

**Example prompt file (`prompts/prompt1.txt`):**

```
A serene mountain landscape at sunset
Futuristic city with flying cars
Portrait of a wise old wizard
```
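
To point the script at your own prompt file, set the `PROMPTS_FILE` environment variable it reads (the path below is a placeholder; make it relative to the directory you run from):

```bash
!PROMPTS_FILE=prompts/my_prompts.txt python Z-Image/batch_inference.py
```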

***

```python
def slugify(text: str, max_len: int = 60) -> str:
    """Create a filesystem-safe slug from the prompt."""
    slug = "".join(ch.lower() if ch.isalnum() else "-" for ch in text)
    slug = "-".join(part for part in slug.split("-") if part)
    return slug[:max_len].rstrip("-") or "prompt"
```

* Converts prompts into safe filenames by:
  * Converting to lowercase
  * Replacing non-alphanumeric characters with hyphens
  * Removing consecutive hyphens
  * Limiting to 60 characters

**Example:**

* Input: `"A serene mountain landscape at sunset!"`
* Output: `"a-serene-mountain-landscape-at-sunset"`

***

```python
def select_device() -> str:
    """Choose the best available device without repeating detection logic."""
    if torch.cuda.is_available():
        print("Chosen device: cuda")
        return "cuda"
    
    try:
        import torch_xla.core.xla_model as xm
        device = xm.xla_device()
        print("Chosen device: tpu")
        return device
    except (ImportError, RuntimeError):
        if torch.backends.mps.is_available():
            print("Chosen device: mps")
            return "mps"
        
        print("Chosen device: cpu")
        return "cpu"
```

* Automatically detects and selects the best available hardware in priority order:
  1. **CUDA** (NVIDIA GPU) - Fastest option
  2. **TPU** (Google Tensor Processing Unit) - For cloud environments
  3. **MPS** (Apple Metal Performance Shaders) - For Mac M1/M2/M3
  4. **CPU** - Fallback option (slowest)

***

```python
def main():
    model_path = ensure_weights("ckpts/Z-Image-Turbo")
    dtype = torch.bfloat16
    compile = False
    height = 1024
    width = 1024
    num_inference_steps = 8
    guidance_scale = 0.0
    attn_backend = os.environ.get("ZIMAGE_ATTENTION", "_native_flash")
    output_dir = Path("outputs")
    output_dir.mkdir(exist_ok=True)
```

* **Model setup:**
  * `ensure_weights()` - Downloads model if not present, returns path
  * `dtype = torch.bfloat16` - Memory-efficient precision
  * `compile = False` - Disables PyTorch 2.0 compilation (can enable for speed)
* **Generation parameters:**
  * `height/width = 1024` - Square 1K resolution
  * `num_inference_steps = 8` - Optimized for Turbo model
  * `guidance_scale = 0.0` - Required for Turbo (guidance pre-baked)
* **Backend configuration:**
  * `attn_backend` - Attention mechanism (Flash Attention by default)
  * `output_dir` - Creates "outputs" folder for saving images
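
Because `attn_backend` comes from the environment, you can switch backends without editing the script. The value below is just the default it already uses; check the output of `AttentionBackend.print_available_backends()` (called by the script) to see what your pod supports:

```bash
!ZIMAGE_ATTENTION=_native_flash python Z-Image/batch_inference.py
```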

***

```python
    device = select_device()
    components = load_from_local_dir(model_path, device=device, dtype=dtype, compile=compile)
    
    AttentionBackend.print_available_backends()
    set_attention_backend(attn_backend)
    print(f"Chosen attention backend: {attn_backend}")
```

* Selects the optimal device (GPU/TPU/CPU)
* Loads all model components (transformer, VAE, text encoder, etc.)
* Configures attention backend for performance optimization
* Flash Attention is faster and more memory-efficient than standard attention

***

```python
    for idx, prompt in enumerate(PROMPTS, start=1):
        output_path = output_dir / f"prompt-{idx:02d}-{slugify(prompt)}.png"
        seed = 42 + idx - 1
        generator = torch.Generator(device).manual_seed(seed)
        
        start_time = time.time()
        images = generate(
            prompt=prompt,
            **components,
            height=height,
            width=width,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        )
        elapsed = time.time() - start_time
        
        images[0].save(output_path)
        print(f"[{idx}/{len(PROMPTS)}] Saved {output_path} in {elapsed:.2f} seconds")
    
    print("Done.")

```
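
* Loops over every prompt, generating and saving one image each
  * `seed = 42 + idx - 1` - Gives each prompt its own deterministic seed (42, 43, 44, ...), so re-running the script reproduces the same set of images
  * `slugify(prompt)` - Builds a readable, filesystem-safe filename like `prompt-01-a-serene-mountain-landscape-at-sunset.png`
  * `time.time()` bookkeeping - Prints how long each image took to generate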

{% endstep %}

{% step %}

#### Run the batch inference script

```bash
!python Z-Image/batch_inference.py
```
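
Based on the `print` calls in the script, you should see one line per prompt as images land in `outputs/` (the timings below are illustrative, not measured):

```
[1/3] Saved outputs/prompt-01-a-serene-mountain-landscape-at-sunset.png in 3.42 seconds
[2/3] Saved outputs/prompt-02-futuristic-city-with-flying-cars.png in 3.38 seconds
[3/3] Saved outputs/prompt-03-portrait-of-a-wise-old-wizard.png in 3.40 seconds
Done.
```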

{% endstep %}
{% endstepper %}

***

### 🎨 Customizing Your Generation

#### Change Image Size

```python
image = pipe(
    prompt=prompt,
    height=768,   # Adjust height
    width=768,    # Adjust width
    ...
).images[0]
```
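
Non-square aspect ratios work the same way (a sketch; as with most latent diffusion models, it is safest to keep both dimensions near the training resolution and divisible by the model's latent patch size, which is an assumption here):

```python
# Hypothetical portrait-orientation generation
image = pipe(
    prompt=prompt,
    height=1024,  # taller than wide
    width=768,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
```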

#### Adjust Quality vs Speed

```python
# Faster generation (lower quality)
num_inference_steps=5

# Higher quality (slower generation)
num_inference_steps=15
```

#### Use Different Seeds

```python
# For reproducible results
generator=torch.Generator("cuda").manual_seed(42)

# For random results
generator=torch.Generator("cuda").manual_seed(torch.randint(0, 1000000, (1,)).item())
```
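
Alternatively, you can simply omit the `generator` argument; Diffusers pipelines then draw a fresh seed on every call (assuming `ZImagePipeline` follows this standard behavior):

```python
# No generator -> a different image on every run
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
).images[0]
```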

