How to Run Wan 2.2 Animate Locally vs Online: Complete Comparison Guide


If you’re an AI video creator who wants to run Wan 2.2 Animate locally—or you just want to get results fast without setting up a GPU—this guide walks you through both paths. We cover the official instructions on the Hugging Face model page, how to run it locally with pre‑ and post‑processing, and a simple online alternative you can use today.

We’ll keep the tone practical and step‑by‑step so you can decide the fastest, most reliable way to create “animate” and “replacement” videos using Wan 2.2 Animate—either on your own machine or instantly in the cloud.


Quick Answer: Pick Your Path

  • Want zero setup and the fastest time‑to‑result? Try an online SaaS that runs the same Wan 2.2 model—no install required.
  • Want full control or need to tune everything? Follow the official local workflow from the Hugging Face model page.
  • Prefer a no‑install middle ground? Use Hugging Face Spaces to run the official Animate demo online (subject to available compute).

What is Wan 2.2 Animate?

Wan 2.2 Animate takes two inputs (a character image and a driving video) and works in two modes:

  • Animate: Generate a video where your character image mimics the human motion in an input video.
  • Replacement: Replace the character in a video with your chosen character image.

The official Animate model (Wan 2.2 Animate‑14B) is published on Hugging Face and includes:

  • The latest model weights and inference code.
  • Pre‑ and post‑processing scripts for both “animate” and “replacement” modes.
  • Guidance on single‑GPU and multi‑GPU inference (FSDP + DeepSpeed Ulysses).
  • Hardware performance benchmarks across common GPUs.

You can read the full model card for details and the latest instructions.

What You Need (At a Glance)

  • Python 3.9–3.11 recommended; torch >= 2.4.0.
  • GPU VRAM guidance (from the official model page benchmarks):
    • 4090: ~28 GB peak for some workloads; capable of single‑GPU “fast” 720p runs.
    • 3090: Typically < 40 GB for 14B variants using recommended flags; may fit single‑GPU with offloading/conversion.
    • A100 40 GB: Excellent fit; suitable for 480p/720p 24fps runs.
    • Consumer GPUs with lower VRAM: Consider offload flags or a 5B variant (TI2V‑5B), or try the online demo first.
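
Before you install anything, it helps to confirm your Python version, torch build, and available VRAM. The short script below is an optional sketch (it is not part of the official repo) and only prints what it finds:

    # check_env.py - optional sanity check before installing the Wan 2.2 repo
    import sys

    print(f"Python: {sys.version.split()[0]}")  # aim for 3.9-3.11

    try:
        import torch
        print(f"torch: {torch.__version__}")  # the repo expects torch >= 2.4.0
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                props = torch.cuda.get_device_properties(i)
                vram_gb = props.total_memory / 1024**3
                print(f"GPU {i}: {props.name}, {vram_gb:.1f} GB VRAM")
        else:
            print("No CUDA GPU detected - consider the online demo or a cloud option.")
    except ImportError:
        print("torch is not installed yet - install it before the repo requirements.")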

How to Run Wan 2.2 Animate Locally (Hugging Face Way)

Follow these steps directly from the official model page. They’re tested and documented end‑to‑end for Animate‑14B.

1) Install the Environment

  • Clone the official repo:

    git clone https://github.com/Wan-Video/Wan2.2.git
    cd Wan2.2
    
  • Install dependencies (ensure torch >= 2.4.0). If flash_attn fails, install other packages first, then flash_attn last:

    pip install -r requirements.txt
    
  • Optional (for speech‑to‑video experiments with S2V):

    pip install -r requirements_s2v.txt
    

2) Download the Model

  • Using Hugging Face Hub CLI:

    pip install "huggingface_hub[cli]"
    huggingface-cli download Wan-AI/Wan2.2-Animate-14B --local-dir ./Wan2.2-Animate-14B
    
  • Using ModelScope:

    pip install modelscope
    modelscope download Wan-AI/Wan2.2-Animate-14B --local_dir ./Wan2.2-Animate-14B
    

You should now have the checkpoint directory and any required assets.
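
If you'd rather script the download from Python than use the CLI, huggingface_hub also exposes snapshot_download. This is an equivalent alternative, not an extra requirement:

    # download_model.py - Python alternative to the huggingface-cli command above
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="Wan-AI/Wan2.2-Animate-14B",
        local_dir="./Wan2.2-Animate-14B",  # same target directory as the CLI example
    )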

3) Preprocess Inputs

Run preprocessing before inference. Pick the command matching your mode.

  • For “animate” mode:

    python ./wan/modules/animate/preprocess/preprocess_data.py \
        --ckpt_path ./Wan2.2-Animate-14B/process_checkpoint \
        --video_path ./examples/wan_animate/animate/video.mp4 \
        --refer_path ./examples/wan_animate/animate/image.jpeg \
        --save_path ./examples/wan_animate/animate/process_results \
        --resolution_area 1280 720 \
        --retarget_flag \
        --use_flux
    
  • For “replacement” mode:

    python ./wan/modules/animate/preprocess/preprocess_data.py \
        --ckpt_path ./Wan2.2-Animate-14B/process_checkpoint \
        --video_path ./examples/wan_animate/replace/video.mp4 \
        --refer_path ./examples/wan_animate/replace/image.jpeg \
        --save_path ./examples/wan_animate/replace/process_results \
        --resolution_area 1280 720 \
        --iterations 3 \
        --k 7 \
        --w_len 1 \
        --h_len 1 \
        --replace_flag
    

Preprocessing creates the artifacts you’ll pass to the generator (pose, flows, and other conditioning materials). The exact parameters can be tuned later for your specific sources.
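
If you have several clips to prepare, a thin wrapper around the preprocessing script keeps the flags consistent. The sketch below simply shells out to the same preprocess_data.py shown above; the input paths (my_clips, my_character.jpeg) are placeholders for your own project layout:

    # preprocess_batch.py - illustrative wrapper around the official preprocessing script
    import subprocess
    from pathlib import Path

    CKPT = "./Wan2.2-Animate-14B/process_checkpoint"

    def preprocess(video: str, refer: str, save: str, mode: str = "animate") -> None:
        """Run preprocess_data.py for one clip in 'animate' or 'replace' mode."""
        cmd = [
            "python", "./wan/modules/animate/preprocess/preprocess_data.py",
            "--ckpt_path", CKPT,
            "--video_path", video,
            "--refer_path", refer,
            "--save_path", save,
            "--resolution_area", "1280", "720",
        ]
        if mode == "animate":
            cmd += ["--retarget_flag", "--use_flux"]
        else:  # replacement-mode flags from the example above
            cmd += ["--iterations", "3", "--k", "7", "--w_len", "1", "--h_len", "1", "--replace_flag"]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        # Hypothetical layout: one reference image reused across several clips.
        for clip in Path("./my_clips").glob("*.mp4"):
            preprocess(str(clip), "./my_character.jpeg", f"./process_results/{clip.stem}")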

4) Run Inference

Use the official generate.py with --task animate-14B. The examples below show typical flags from the model page.

  • Animate mode (single GPU):

    python generate.py --task animate-14B --ckpt_dir ./Wan2.2-Animate-14B/ --src_root_path ./examples/wan_animate/animate/process_results/ --refert_num 1
    
  • Animate mode (multi‑GPU via FSDP + DeepSpeed Ulysses):

    python -m torch.distributed.run --nnodes 1 --nproc_per_node 8 generate.py \
        --task animate-14B --ckpt_dir ./Wan2.2-Animate-14B/ \
        --src_root_path ./examples/wan_animate/animate/process_results/ \
        --refert_num 1 --dit_fsdp --t5_fsdp --ulysses_size 8
    
  • Replacement mode (single GPU):

    python generate.py --task animate-14B --ckpt_dir ./Wan2.2-Animate-14B/ \
        --src_root_path ./examples/wan_animate/replace/process_results/ \
        --refert_num 1 --replace_flag --use_relighting_lora
    
  • Replacement mode (multi‑GPU via FSDP + DeepSpeed Ulysses):

    python -m torch.distributed.run --nnodes 1 --nproc_per_node 8 generate.py \
        --task animate-14B --ckpt_dir ./Wan2.2-Animate-14B/ \
        --src_root_path ./examples/wan_animate/replace/process_results/ \
        --refert_num 1 --replace_flag --use_relighting_lora \
        --dit_fsdp --t5_fsdp --ulysses_size 8
    

Key flags you may want:

  • --offload_model True
  • --convert_model_dtype
  • --t5_cpu (for the 5B TI2V variant)
  • Avoid --use_prompt_extend unless you explicitly need it.
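
If you switch between animate and replacement runs often, you can assemble the generate.py command from a couple of options instead of editing long shell lines by hand. This is an optional convenience sketch that mirrors the flags shown above, not something shipped with the repo:

    # run_generate.py - optional helper that builds the generate.py commands used above
    import subprocess

    def run_generate(mode: str, src_root: str, gpus: int = 1, offload: bool = False) -> None:
        """mode: 'animate' or 'replace'; gpus > 1 adds the FSDP + Ulysses flags."""
        base = [
            "generate.py",
            "--task", "animate-14B",
            "--ckpt_dir", "./Wan2.2-Animate-14B/",
            "--src_root_path", src_root,
            "--refert_num", "1",
        ]
        if mode == "replace":
            base += ["--replace_flag", "--use_relighting_lora"]
        if offload:
            base += ["--offload_model", "True", "--convert_model_dtype"]
        if gpus > 1:
            cmd = ["python", "-m", "torch.distributed.run",
                   "--nnodes", "1", "--nproc_per_node", str(gpus)] + base
            cmd += ["--dit_fsdp", "--t5_fsdp", "--ulysses_size", str(gpus)]
        else:
            cmd = ["python"] + base
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        run_generate("animate", "./examples/wan_animate/animate/process_results/", gpus=1, offload=True)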

Note: The official page explicitly recommends not using LoRA models trained on Wan 2.2, as weight changes during training can lead to unexpected behavior.

5) Hardware Tips and Performance

  • The official model card shows benchmark numbers in the format Total time (s) / peak GPU memory (GB). Expect results like ~28 GB peak on a 4090 for certain runs, and <40 GB on a 3090 when using the recommended flags (offload_model, convert_model_dtype, etc.). If your VRAM is tight, try enabling offload and dtype conversion, or use fewer frames or a shorter clip.
  • If you don’t have enough VRAM, either:
    • Use offload and conversion flags to shrink memory footprint.
    • Switch to the 5B TI2V variant (supported via the same repo).
    • Or run the online demo instead of local setup.
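
To see how close a run gets to your VRAM ceiling, you can poll nvidia-smi from a second terminal while generate.py is running. A minimal sketch, assuming an NVIDIA GPU with nvidia-smi on your PATH:

    # vram_watch.py - poll GPU memory usage while a render runs in another terminal
    import subprocess
    import time

    peak_mb = 0
    try:
        while True:
            out = subprocess.check_output(
                ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
                text=True,
            )
            used_mb = max(int(line) for line in out.splitlines() if line.strip())
            peak_mb = max(peak_mb, used_mb)
            print(f"current: {used_mb} MiB, peak: {peak_mb} MiB", end="\r")
            time.sleep(2)
    except KeyboardInterrupt:
        print(f"\npeak GPU memory observed: {peak_mb / 1024:.1f} GiB")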

How to Run Wan 2.2 Animate Online (Hugging Face Spaces)

If you prefer to avoid local setup altogether, try the official Hugging Face Space for Wan 2.2 Animate. You can:

  • Upload a reference image and a template video.
  • Choose “Animate” to drive the character in your image with the video’s motion, or “Replace” to swap the character in the video with your image.
  • Download the generated result.

Pros:

  • No local environment to install or debug.
  • Uses the official implementation.

Cons:

  • Limited compute availability.
  • Wait times during peak hours.
  • File size and duration limits may apply.

This option is ideal if you want to test the model quickly and validate your inputs before committing to longer renders.


Local vs HuggingFace vs Wan‑Animate SaaS: Head‑to‑Head Comparison

Choose the column that best fits your team’s skills, timeline, and budget.

Dimension | Local (DIY) | Hugging Face Spaces | Wan‑Animate SaaS
--- | --- | --- | ---
Technical setup | High (GPU drivers, torch, repo, deps) | None | None
Time to first result | Medium–High | Low | Lowest
Compute control | Full (flags, params, debugging) | Limited (shared resources) | Managed (cloud GPU)
Stability/Queue | Config‑dependent; no queues | Subject to shared‑compute waits | SaaS uptime
Output control | Very high | Moderate | High (480p/720p, Animate/Replace)
Cost model | Hardware capex, electricity | Free compute limits | Transparent pricing (packs or subscriptions)
Best for | Engineers who need full control | Quick testing without install | Fast production with zero setup

If you mainly want reliable results and minimal friction, the SaaS path is built for that exact use case.


FAQs (Based on Real Community Questions)

  • Do I need to install anything to try the model? No. You can run the official demo in Hugging Face Spaces or use a SaaS. Local install is optional.

  • What GPU do I need? From the official model page, 14B inference can run on a single 4090 for some workloads (~28 GB peak). A 3090 can work with <40 GB peak when using the offload/conversion flags. The 5B TI2V variant is friendlier to smaller cards. If you don’t have a compatible GPU, use the online demo.

  • Why might I wait on Hugging Face Spaces? Popular Spaces share GPU time. Expect queueing during peaks.

  • Is it okay to use LoRA models on top of Wan 2.2? The official guidance is to avoid LoRA models trained on Wan 2.2, as they can cause unexpected behavior.

  • What are the output formats? You’ll download MP4 files. Frame rate depends on the source and configuration; 480p and 720p are the most common output resolutions.
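
If you want to double-check the resolution and frame rate of a finished clip, ffprobe (bundled with FFmpeg) can report both. A small sketch, assuming ffprobe is installed; replace output.mp4 with your own file:

    # probe_output.py - report resolution and frame rate of a generated MP4 via ffprobe
    import subprocess

    def probe(path: str) -> None:
        out = subprocess.check_output(
            ["ffprobe", "-v", "error", "-select_streams", "v:0",
             "-show_entries", "stream=width,height,r_frame_rate",
             "-of", "csv=p=0", path],
            text=True,
        )
        width, height, rate = out.strip().split(",")
        num, den = rate.split("/")
        print(f"{path}: {width}x{height} @ {float(num) / float(den):.2f} fps")

    probe("./output.mp4")  # replace with your generated file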


Bottom Line

  • If you’re technical and want maximum control, follow the official local workflow: install deps, download the model, preprocess, then run generate.py in animate or replacement mode.
  • If you want to test the model with minimal effort, use the Hugging Face Spaces demo.
  • If you want fast, reliable results without any setup or GPU management, use a SaaS that runs the same Wan 2.2 model with simple upload and download.

Ready to try the fast path? You can run Wan 2.2 Animate today without installing anything—just upload your files and download your video.


Author: Wan-Animate Team