Sparse-VideoGen on NVIDIA H200

Based on Sparse VideoGen2 (Xi et al., NeurIPS 2025 Spotlight).

Sparse VideoGen2 (SVG2) is a training-free inference acceleration framework for video diffusion transformers. Rather than changing model weights, it exploits the inherent sparsity of 3D full attention: semantic-aware sparse attention and flash k-means clustering identify which tokens actually matter, delivering roughly 2× end-to-end speedup with minimal loss of visual quality. It supports HunyuanVideo and Wan 2.1 (T2V and I2V), all of which fit on an H200 GPU.


Deploy a Pod

Log in to the Yotta Labs Console, go to Compute → Pods, and deploy a new Pod with the following settings:

Setting          Value
GPU              H200
Template         pytorch (CUDA 12.8)
System Volume    150 GB minimum

Once the Pod is Running, click Connect and open a terminal via File → New → Terminal in JupyterLab.


Environment Setup

Clone the repo with GIT_LFS_SKIP_SMUDGE=1 to skip the large demo assets:

cd ~
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/svg-project/Sparse-VideoGen.git
cd ~/Sparse-VideoGen

Create the conda environment and install the base package. SVG2's pyproject.toml uses hatchling with editable installs, so editables needs to be present before anything else:

conda create -n SVG python=3.12.9 -y
conda activate SVG

python -m ensurepip --upgrade
python -m pip install --upgrade pip

pip install editables hatchling
pip install -e .
pip install flash-attn --no-build-isolation

Install the pinned versions of diffusers and transformers. SVG2 requires diffusers==0.34.0, and that version is only compatible with transformers<5.0 — specifically transformers==4.49.0. Installing either package at the wrong version will cause import errors at runtime:
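The two pins named above can go into a single pip invocation, so the resolver checks them against each other. A minimal sketch, assuming the SVG environment is active; `install_pins` is a hypothetical wrapper introduced here, not something the repo provides:

```shell
# Versions taken from the paragraph above.
DIFFUSERS_PIN="diffusers==0.34.0"
TRANSFORMERS_PIN="transformers==4.49.0"

# Hypothetical helper: install both pins in one pip call so the resolver
# checks them together instead of upgrading one behind the other.
install_pins() {
    python -m pip install "$DIFFUSERS_PIN" "$TRANSFORMERS_PIN"
}

# Run inside the activated SVG environment:
#   install_pins
```

Afterwards, `python -m pip show diffusers transformers` prints the installed versions, which is a quick way to confirm that nothing was upgraded later by another dependency.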

Install the remaining runtime dependencies:

Pull down the git submodules. Cutlass is large so this may take a few minutes:
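Fetching submodules uses the standard `git submodule update --init --recursive` command. The sketch below wraps it in a hypothetical `fetch_submodules` helper that refuses to run outside a git work tree, which guards against launching it from the wrong directory:

```shell
# Hypothetical helper: fetch every submodule, including nested ones, for the
# repository at the given path (defaults to the current directory).
fetch_submodules() (
    cd "${1:-.}" || exit 1
    # Stop early when the directory is not a git work tree.
    if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "not a git repository: $(pwd)" >&2
        exit 1
    fi
    git submodule update --init --recursive
)

# Usage, from the clone created earlier:
#   fetch_submodules ~/Sparse-VideoGen
```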

Build the customized attention kernels. The conda environment injects its own C++ compiler via NVCC_PREPEND_FLAGS, which breaks CUDA header detection — unset it and point cmake explicitly at the system CUDA installation before building:
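The environment preparation described above might look like the following. This is a sketch under one loud assumption: the system CUDA toolkit lives at /usr/local/cuda. If `which nvcc` reports a different location on your Pod, set CUDA_HOME to that instead. `CUDACXX` and `CUDAToolkit_ROOT` are the environment variables CMake consults when it looks for the CUDA compiler and toolkit.

```shell
# Drop the flags conda injects so nvcc falls back to its default host compiler.
unset NVCC_PREPEND_FLAGS

# Assumed location of the system CUDA toolkit; adjust if yours differs.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
export PATH="$CUDA_HOME/bin:$PATH"

# CMake reads these when it detects the CUDA compiler and toolkit.
export CUDACXX="$CUDA_HOME/bin/nvcc"
export CUDAToolkit_ROOT="$CUDA_HOME"
```

Run these in the same shell session that will launch the kernel build, since exported variables do not carry over to a new terminal.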

Install FlashInfer from the submodule. The correct path is svg/kernels/3rdparty/flashinfer — not 3rdparty/flashinfer as listed in the upstream README:
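Because the path is easy to get wrong, it can help to check that the submodule directory exists before installing. In the sketch below, `install_flashinfer` is a hypothetical helper, and the pip flags are an assumption about what the pinned FlashInfer commit needs; adjust them if the build asks for something else:

```shell
# Path from the paragraph above, relative to the repository root.
FLASHINFER_DIR="svg/kernels/3rdparty/flashinfer"

# Hypothetical helper: install FlashInfer from the submodule of the repo at $1.
install_flashinfer() (
    cd "${1:-.}" || exit 1
    if [ ! -d "$FLASHINFER_DIR" ]; then
        echo "missing $FLASHINFER_DIR; were the submodules fetched?" >&2
        exit 1
    fi
    cd "$FLASHINFER_DIR" || exit 1
    # Assumption: a source install without build isolation works for this commit.
    python -m pip install --no-build-isolation --verbose .
)

# Usage:
#   install_flashinfer ~/Sparse-VideoGen
```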

Finally, install cuVS:

You may see dependency conflict warnings about cuda-toolkit and nvidia-nvjitlink-cu12 from libcuvs. These are warnings, not errors, and do not affect inference.


Download Model Checkpoints

Set your Hugging Face token:
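One common way to do this is through the `HF_TOKEN` environment variable, which the Hugging Face Hub tooling reads automatically. The value below is a placeholder, and the prefix check assumes the usual `hf_` form of user access tokens:

```shell
# Placeholder only: paste your own access token between the quotes.
export HF_TOKEN="hf_your_token_here"

# Light sanity check before starting a multi-gigabyte download.
case "$HF_TOKEN" in
    hf_*) echo "HF_TOKEN is set" ;;
    *)    echo "HF_TOKEN is missing or malformed" >&2 ;;
esac
```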

Wan 2.1 T2V (~54 GB). Use the official Wan-AI repo — other forks may have incompatible checkpoint formats:
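A download can be sketched with the Hugging Face CLI. `download_model` is a hypothetical helper, the repo ID is deliberately left as an argument because this guide does not spell it out, and the target directory in the usage line is only an example. It assumes `huggingface-cli` from the huggingface_hub package is installed in the environment.

```shell
# Hypothetical helper: download a Hub repository into a local directory.
download_model() {
    if [ "$#" -ne 2 ]; then
        echo "usage: download_model <repo_id> <local_dir>" >&2
        return 2
    fi
    mkdir -p "$2" && huggingface-cli download "$1" --local-dir "$2"
}

# Usage (substitute the official Wan-AI repo ID and a path on your volume):
#   download_model "Wan-AI/<repo-name>" ~/models/wan-t2v
```

The same helper covers the other checkpoints; only the repo ID and the destination change.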

Verify the transformer directory contains a shard index file:
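A small check for this might look like the following. `has_shard_index` is a hypothetical helper; it assumes the index follows the usual sharded-checkpoint naming and therefore ends in `.index.json`:

```shell
# Hypothetical helper: succeed when <model_dir>/transformer contains at least
# one shard index file (*.index.json), fail otherwise.
has_shard_index() {
    find "$1/transformer" -maxdepth 1 -type f -name '*.index.json' 2>/dev/null | grep -q .
}

# Usage (the model path is an example):
#   has_shard_index ~/models/wan-t2v && echo "index found" || echo "index missing"
```

A missing index usually means the download was interrupted or the checkpoint comes from a repo with a different layout.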

Wan 2.1 I2V:

HunyuanVideo:

The inference scripts default to loading models directly from HuggingFace by repo ID. Update them to use your local paths instead:

Verify:


Run Video Generation

The default prompt is loaded from examples/1/prompt.txt. Edit it to change the prompt, or change prompt_id in the script to use a different example. Always run from the project root.
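Since the scripts resolve examples/ relative to the working directory, one way to avoid launching from the wrong place is a small wrapper. `run_from_root` is a hypothetical helper: it searches the repository for a script by file name (the exact subdirectory is not assumed here) and runs it with the repository root as the working directory.

```shell
# Hypothetical helper: locate an inference script by file name anywhere in the
# repository, then launch it with the repository root as working directory.
run_from_root() (
    root="$1"
    name="$2"
    cd "$root" || exit 1
    script=$(find . -path ./.git -prune -o -type f -name "$name" -print | head -n 1)
    if [ -z "$script" ]; then
        echo "script not found under $root: $name" >&2
        exit 1
    fi
    bash "$script"
)

# Script names as listed in this guide:
#   run_from_root ~/Sparse-VideoGen wan_t2v_720p_sap.sh
#   run_from_root ~/Sparse-VideoGen wan_i2v_720p_sap.sh
#   run_from_root ~/Sparse-VideoGen hyvideo_t2v_720p_sap.sh
```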

Wan 2.1 Text-to-Video:

Wan 2.1 Image-to-Video:

HunyuanVideo Text-to-Video:

When the run is working correctly, the log shows "Attention processors replaced with SAP pattern." followed by "Centroids initialized at layer N" as the k-means attention warms up. Output videos are saved under result/ in a nested directory structure that encodes the generation config.
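Because the nesting depends on the generation config, searching for the files is easier than guessing the path. A minimal sketch, with `list_videos` as a hypothetical helper name:

```shell
# Hypothetical helper: print every .mp4 below the given directory
# (defaults to result/ in the current directory), sorted by path.
list_videos() {
    find "${1:-result}" -type f -name '*.mp4' | sort
}

# Usage, from the project root:
#   list_videos
```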


View Output Videos

To play them inline in a JupyterLab notebook cell:

If the cell hangs on large files, interrupt the kernel and open the video directly from the JupyterLab file browser instead.

Want to Use Your Own Prompt/Image?

Each inference script loads its prompt from a text file under examples/. The file used is determined by the prompt_id variable at the top of the script:

Script                     Default prompt_id   Prompt file             Image file
wan_t2v_720p_sap.sh        1                   examples/1/prompt.txt   -
wan_i2v_720p_sap.sh        1                   examples/1/prompt.txt   examples/1/image.jpg
hyvideo_t2v_720p_sap.sh    7                   examples/7/prompt.txt   -

You can also confirm which prompt is in use at runtime: it is printed to the terminal at the start of each run.

To use your own prompt, create a new example directory and write your prompt into it:
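For instance, run the following from the project root. The ID 8 matches the example used in this guide, and the prompt text is purely illustrative:

```shell
# Example ID used in this guide; prompt_id in the script must match it.
PROMPT_ID=8
mkdir -p "examples/$PROMPT_ID"

# Illustrative prompt text; replace it with your own.
printf '%s\n' "A red fox trotting through fresh snow at sunrise" > "examples/$PROMPT_ID/prompt.txt"
```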

For I2V, also provide an input image:
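The default I2V example keeps its image at examples/1/image.jpg, so the sketch below assumes a custom example should use the same file name. `add_example_image` is a hypothetical helper, and the source path in the usage line is a placeholder:

```shell
# Hypothetical helper: copy an input image into examples/<id>/image.jpg,
# relative to the current directory (run it from the project root).
add_example_image() (
    src="$1"
    id="$2"
    if [ ! -f "$src" ]; then
        echo "no such image: $src" >&2
        exit 1
    fi
    mkdir -p "examples/$id" && cp "$src" "examples/$id/image.jpg"
)

# Usage:
#   add_example_image /path/to/your_image.jpg 8
```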

Then open the script and change prompt_id to match your new directory:
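Editing by hand works fine; for repeated runs the change can also be scripted. The sketch below assumes the script sets the variable on its own line starting with `prompt_id=`, which may not match the exact formatting in the repo. `set_prompt_id` is a hypothetical helper.

```shell
# Hypothetical helper: rewrite the prompt_id= line of a script in place.
# Intended for simple numeric IDs; the ID is inserted into a sed expression as-is.
set_prompt_id() (
    script="$1"
    id="$2"
    if ! grep -q '^prompt_id=' "$script"; then
        echo "no prompt_id= line in $script" >&2
        exit 1
    fi
    tmp=$(mktemp) || exit 1
    # Write to a temp file first, then copy back so file permissions are kept.
    sed "s/^prompt_id=.*/prompt_id=$id/" "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Usage (substitute the real path of the script inside the repo):
#   set_prompt_id path/to/wan_t2v_720p_sap.sh 8
```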

Run the script as usual; the output filename reflects your prompt_id, e.g. 8-0.mp4.

