Sparse-VideoGen on NVIDIA H200
Based on Sparse VideoGen2 (Xi et al., NeurIPS 2025 Spotlight).
Sparse VideoGen2 (SVG2) is a training-free inference acceleration framework for video diffusion transformers. Rather than changing model weights, it exploits the inherent sparsity of 3D full attention: semantic-aware sparse attention, backed by flash k-means clustering, identifies which tokens actually matter, which yields roughly 2× end-to-end speedup with minimal loss of visual quality. It supports HunyuanVideo and Wan 2.1 (T2V and I2V), all of which fit on an H200 GPU.
Deploy a Pod
Log in to the Yotta Labs Console, go to Compute → Pods, and deploy a new Pod with the following settings:
GPU: H200
Template: pytorch (CUDA 12.8)
System Volume: 150 GB minimum
Once the Pod is Running, click Connect and open a terminal via File → New → Terminal in JupyterLab.
Environment Setup
Clone the repo with GIT_LFS_SKIP_SMUDGE=1 to skip the large demo assets:
cd ~
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/svg-project/Sparse-VideoGen.git
cd ~/Sparse-VideoGen
Create the conda environment and install the base package. SVG2's pyproject.toml uses hatchling with editable installs, so editables needs to be present before anything else:
conda create -n SVG python=3.12.9 -y
conda activate SVG
python -m ensurepip --upgrade
python -m pip install --upgrade pip
pip install editables hatchling
pip install -e .
pip install flash-attn --no-build-isolation
Install the pinned versions of diffusers and transformers. SVG2 requires diffusers==0.34.0, and that version is only compatible with transformers<5.0, specifically transformers==4.49.0. Installing either package at the wrong version will cause import errors at runtime:
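A minimal sketch of the pinning step, assuming pip is used as in the earlier commands. The direct form would be `pip install diffusers==0.34.0 transformers==4.49.0`. The variant below records both pins in a constraints file (the name `svg2-constraints.txt` is an arbitrary choice, not something the repository ships) and exports pip's `PIP_CONSTRAINT` variable, so that later installs in the same shell cannot silently upgrade either package:

```shell
# Record the two pins named in this guide (the file name is an arbitrary choice).
cat > svg2-constraints.txt <<'EOF'
diffusers==0.34.0
transformers==4.49.0
EOF

# pip reads PIP_CONSTRAINT as the default for --constraint, so every later
# pip call in this shell is held to the pins above.
export PIP_CONSTRAINT="$PWD/svg2-constraints.txt"
```

With the variable exported, `pip install diffusers transformers` should resolve to the pinned versions, and a later package that demands a different version fails loudly instead of replacing them.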
Install the remaining runtime dependencies:
Pull down the git submodules. Cutlass is large so this may take a few minutes:
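The standard git command for this is `git submodule update --init --recursive`. The sketch below wraps it in two small helpers (the function names are hypothetical) so that the fetch can be followed by a quick check that nothing was left uninitialized:

```shell
# Fetch every submodule of the repository at $1, including nested ones.
sync_submodules() {
  git -C "$1" submodule update --init --recursive
}

# Count submodules that are still uninitialized: `git submodule status`
# prefixes those lines with "-". Prints 0 when everything is checked out.
pending_submodules() {
  git -C "$1" submodule status --recursive | grep -c '^-' || true
}
```

For example, run `sync_submodules ~/Sparse-VideoGen` and then `pending_submodules ~/Sparse-VideoGen`; a result other than 0 suggests the fetch was interrupted and should be repeated.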
Build the customized attention kernels. The conda environment injects its own C++ compiler via NVCC_PREPEND_FLAGS, which breaks CUDA header detection — unset it and point cmake explicitly at the system CUDA installation before building:
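The environment preparation described here can be sketched as follows. The path `/usr/local/cuda` is an assumption about where the Pod image installs the toolkit; `CUDACXX` is the environment variable CMake consults when choosing a CUDA compiler. The build command itself depends on the repository's kernel build script and is deliberately left out:

```shell
# Drop the compiler flags injected by the conda environment.
unset NVCC_PREPEND_FLAGS

# Point the build at the system toolkit. /usr/local/cuda is an assumed
# location; adjust it if the Pod image installs CUDA elsewhere.
export CUDA_HOME=/usr/local/cuda
export CUDACXX="$CUDA_HOME/bin/nvcc"   # read by CMake to select the CUDA compiler
export PATH="$CUDA_HOME/bin:$PATH"     # so `nvcc` resolves to the system copy
```

If the build invokes cmake directly, the same choice can be passed explicitly as `-DCMAKE_CUDA_COMPILER="$CUDACXX"`.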
Install FlashInfer from the submodule. The correct path is svg/kernels/3rdparty/flashinfer — not 3rdparty/flashinfer as listed in the upstream README:
Finally, install cuVS:
You may see dependency conflict warnings about cuda-toolkit and nvidia-nvjitlink-cu12 from libcuvs. These are warnings, not errors, and do not affect inference.
Download Model Checkpoints
Set your Hugging Face token:
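One common way to do this is through the `HF_TOKEN` environment variable, which the huggingface_hub library and its command-line tool read for authentication. The value below is a placeholder, not a working token:

```shell
# Placeholder value: substitute a token from your Hugging Face account settings.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxx"
```

The export lasts only for the current terminal session, so repeat it after reconnecting to the Pod.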
Wan 2.1 T2V (~54 GB). Use the official Wan-AI repo — other forks may have incompatible checkpoint formats:
Verify the transformer directory contains a shard index file:
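A small check along these lines, assuming the checkpoint uses the Diffusers layout, where sharded weights are accompanied by an index file whose name typically ends in `.index.json`. The helper name and the `MODEL_DIR` variable in the usage note are hypothetical:

```shell
# Succeeds if directory $1 holds at least one *.index.json file.
has_shard_index() {
  for f in "$1"/*.index.json; do
    # When the glob matches nothing it stays literal and this test fails.
    [ -f "$f" ] && return 0
  done
  return 1
}
```

With `MODEL_DIR` set to wherever you downloaded the checkpoint, `has_shard_index "$MODEL_DIR/transformer" && echo "index found"` confirms the file is present; no output suggests an incomplete download or an incompatible checkpoint format.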
Wan 2.1 I2V:
HunyuanVideo:
The inference scripts default to loading models directly from HuggingFace by repo ID. Update them to use your local paths instead:
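One way to make that substitution is with sed. The helper below is a sketch under the assumption that the repo ID appears verbatim in the script; the function name is hypothetical, and the script path, repo ID, and local directory are left for you to supply:

```shell
# Replace every occurrence of a Hub repo ID ($2) in a script ($1) with a
# local path ($3). A .bak copy of the original script is kept next to it.
use_local_model() {
  # Escape dots so the repo ID is matched literally, not as a regex wildcard.
  pattern=$(printf '%s' "$2" | sed 's/\./\\./g')
  # "|" is the sed delimiter because both values contain "/".
  sed -i.bak "s|$pattern|$3|g" "$1"
}
```

Call it as `use_local_model <script> <repo_id> <local_dir>` once per inference script. The local path must not contain `|` or `&`, which sed would interpret specially.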
Verify:
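A simple way to check is to search each script for any remaining Hub reference. The sketch below assumes a fixed string such as the `Wan-AI/` organization prefix is enough to spot one; the helper name is hypothetical:

```shell
# Print (with line numbers) every line of script $1 that still contains the
# fixed string $2, or "none left" when no line matches.
remaining_hub_refs() {
  [ -f "$1" ] || { echo "no such script: $1" >&2; return 2; }
  grep -n -F -- "$2" "$1" || echo "none left"
}
```

For example, `remaining_hub_refs <script> "Wan-AI/"` should print `none left` once the script points at your local checkpoint.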
Run Video Generation
The default prompt is loaded from examples/1/prompt.txt. Edit it to change the prompt, or change prompt_id in the script to use a different example. Always run from the project root.
Wan 2.1 Text-to-Video:
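This guide names the inference scripts but not the directory that holds them, so the sketch below locates a script by file name under the project root and runs it with bash. The helper name is hypothetical, and running the script this way rather than by its full path is a convenience, not the upstream instruction:

```shell
# Find an inference script by file name under the current directory
# (expected to be the project root) and run it with bash from there.
run_inference() {
  script=$(find . -type f -name "$1" | head -n 1)
  if [ -z "$script" ]; then
    echo "script not found: $1" >&2
    return 1
  fi
  bash "$script"
}
```

From `~/Sparse-VideoGen`, `run_inference wan_t2v_720p_sap.sh` starts the text-to-video run. The same helper works for `wan_i2v_720p_sap.sh` and `hyvideo_t2v_720p_sap.sh`.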
Wan 2.1 Image-to-Video:
HunyuanVideo Text-to-Video:
When running correctly, you should see Attention processors replaced with SAP pattern. followed by Centroids initialized at layer N as the k-means attention warms up. Output videos are saved under result/ with a nested directory structure that encodes the generation config.
View Output Videos
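Because the videos sit several directories deep under result/, it helps to list them first. A minimal sketch, assuming the outputs carry the .mp4 extension; the helper name is hypothetical:

```shell
# List generated videos under result/ (or under $1 if given), sorted by path.
list_videos() {
  find "${1:-result}" -type f -name '*.mp4' | sort
}
```

Run `list_videos` from the project root; each printed path can be opened from the JupyterLab file browser.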
To play them inline in a JupyterLab notebook cell:
If the cell hangs on large files, interrupt the kernel and open the video directly from the JupyterLab file browser instead.

Want to Use Your Own Prompt/Image?
Each inference script loads its prompt from a text file under examples/. The file used is determined by the prompt_id variable at the top of the script:
Script | Default prompt_id | Prompt file | Image file
wan_t2v_720p_sap.sh | 1 | examples/1/prompt.txt | (none)
wan_i2v_720p_sap.sh | 1 | examples/1/prompt.txt | examples/1/image.jpg
hyvideo_t2v_720p_sap.sh | 7 | examples/7/prompt.txt | (none)
You can also confirm which prompt is being used at runtime — it is printed to the terminal at the start of each run.
To use your own prompt, create a new example directory and write your prompt into it:
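For instance, from the project root, assuming the number 8 is not already taken under examples/ (the prompt text here is only an illustration):

```shell
# Create example directory 8 and write the prompt as a single line.
mkdir -p examples/8
printf '%s\n' "A red fox trotting through fresh snow at sunrise" > examples/8/prompt.txt

# Show what was written.
cat examples/8/prompt.txt
```

Any unused number works, as long as the same value is later used for prompt_id.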
For I2V, also provide an input image:
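A sketch of that step, assuming the image should be named image.jpg inside the example directory, as in the table of defaults. The helper name is hypothetical:

```shell
# Copy an input image ($1) into examples/<id>/ ($2) under the file name
# image.jpg, creating the directory if needed.
add_example_image() {
  [ -f "$1" ] || { echo "image not found: $1" >&2; return 1; }
  mkdir -p "examples/$2"
  cp "$1" "examples/$2/image.jpg"
}
```

For example, `add_example_image <path_to_your_image> 8`, run from the project root, places the image next to the prompt for example 8.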
Then open the script and change prompt_id to match your new directory:
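If you prefer not to edit by hand, the change can be scripted with sed. This sketch assumes the script sets the variable on a line that begins with `prompt_id=`; check that against the actual file first. The helper name is hypothetical:

```shell
# Rewrite the prompt_id assignment in script $1 to the value $2, keeping a
# .bak copy, then print the resulting line for confirmation.
set_prompt_id() {
  sed -i.bak "s/^prompt_id=.*/prompt_id=$2/" "$1"
  grep -n '^prompt_id=' "$1"
}
```

Calling `set_prompt_id <script> 8` prints the updated line with its line number. If nothing is printed, the assignment is written differently in that script and needs a manual edit.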
Run as usual and the output filename will reflect your prompt_id, e.g. 8-0.mp4.