Wan 2.2 Animate Motion Types & Style Consistency: Full Body Movement Guide
Quick Navigation
- Understanding the I2V Pipeline: Reference vs. Driving Video
- Motion Types: What Drives What
- Getting Natural Body Movement
- Style Consistency Across Video Clips
- The Sequential Frame Reference Trick
- Style Transfer Mode
- 30-Frame vs. 60-Frame Driving Clips
- Common Failure Modes and Fixes
- FAQ
Understanding the I2V Pipeline: Reference vs. Driving Video
Before diving into motion types, let's clarify the two core inputs that power every Wan 2.2 Animate generation:
Reference Image (Character Reference)
The reference image defines who appears in the video — your character's face, body structure, clothing, and overall visual appearance. Wan 2.2 Animate extracts identity features from this image and preserves them throughout the generated clip. Think of it as the "identity anchor."
Driving Video
The driving video is the motion source — it provides the movement that gets transferred onto your character. The model reads poses, gestures, and motion patterns from the driving video and applies them to the reference character's body. This is the "motion anchor."
The key insight: The reference image does NOT control motion, and the driving video does NOT control appearance. You control both independently. This separation is what makes Wan 2.2 Animate so powerful — you can take a professional dancer's movement and apply it to your custom character.
Motion Types: What Drives What
Wan 2.2 Animate's I2V pipeline breaks motion down into several controllable dimensions. Understanding which input drives which dimension helps you plan your generation strategy.
Full-Body Movement
What it is: Walking, running, sitting, standing, dancing, jumping — the overall body pose and locomotion through space.
What drives it: The driving video's skeletal pose data. If your driving video shows someone walking forward, Wan 2.2 will transfer that walking gait onto your reference character.
Best practice: Use driving videos with clear, full-body visibility. Avoid clips where the person is partially obscured or heavily cropped. The model needs visible joints and limb positions to anchor the motion.
Facial Expression
What it is: Smiling, frowning, talking, blinking, eyebrow movement — any emotion or expression conveyed through the face.
What drives it: The driving video's facial landmark data. This is extracted separately from body pose, meaning a driving video's facial expression transfers independently of the body motion.
Tip: For more expressive results, use driving videos with exaggerated, clear facial expressions. Subtle micro-expressions can sometimes get lost in the generation process. If you need precise lip-sync control, pair Wan 2.2 Animate with a dedicated lip-sync tool in post.
Hand Gestures
What it is: Pointing, waving, gesturing, holding objects, finger movement.
What drives it: The driving video's hand landmark data. This is often the weakest area of pose transfer — hands are complex and frequently misgenerated.
Common challenge: Fingers tend to blur, merge, or morph. Wan 2.2 performs better with gestures that are slightly distant from the camera (where fine details matter less) and struggles with close-up hand shots.
Workaround: Use driving videos where hands are at mid-distance, clearly visible but not overly zoomed in. Avoid gestures where fingers are splayed or in complex configurations.
Camera Movement
What it is: Pan, tilt, zoom, dolly, tracking — the cinematic camera motion within the frame.
What drives it: The driving video's global motion flow. Wan 2.2 can interpret camera movement from the driving video and apply it to your scene.
Note: Camera movement transfer is less precise than character motion transfer. For controlled camera moves, consider using a dedicated camera driving video (even a simple tripod pan) separate from your character action video.
Object Interaction
What it is: Character touching, holding, or interacting with objects in the scene.
What drives it: A combination of the driving video's pose data and the prompt. The model attempts to render object interaction based on the pose data, but results vary.
Tip: Be specific in your prompt about object interactions. "Person holding a coffee cup" is more likely to succeed than expecting the model to infer complex object handling from pose alone.
Getting Natural Body Movement
Natural-looking movement comes from matching your driving video's motion characteristics to the style you want in the output. Here are the patterns that work best:
Walking and Running
What works:
- Driving videos with a full stride visible (ideally 3+ steps)
- Consistent pace — avoid driving videos where someone starts slow and speeds up
- Clear ground plane visibility (helps the model understand spatial context)
What to avoid:
- Walking in circles (causes disorienting rotational artifacts)
- Walking toward or directly away from the camera (foreshortening is hard to transfer)
- Very short clips (2-3 frames of walking don't give enough data)
Optimal driving clip length: 3-8 seconds of consistent walking at a steady pace gives the best locomotion transfer.
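If your source footage starts slow or includes extra movement, trimming it down to the steady section is a quick pre-processing step. Here is a minimal Python sketch using ffmpeg (assumed to be installed and on your PATH; file names and timestamps are placeholders):

```python
# Trim a driving video to its steady-paced section before uploading it.
import subprocess

def trim_driving_clip(src: str, dst: str, start: float, duration: float) -> None:
    """Cut `duration` seconds out of `src`, starting at `start` seconds."""
    subprocess.run(
        [
            "ffmpeg", "-y",            # overwrite output if it exists
            "-ss", str(start),         # seek past the slow lead-in
            "-i", src,
            "-t", str(duration),       # keep 3-8 s of steady motion
            "-c:v", "libx264", "-an",  # re-encode video, drop audio
            dst,
        ],
        check=True,
    )

# Keep seconds 2.0-7.0 of the take, skipping the slow start.
trim_driving_clip("walk_take.mp4", "walk_driving.mp4", start=2.0, duration=5.0)
```

Re-encoding with libx264 keeps the cut frame-accurate; a stream-copy trim can snap to the nearest keyframe instead.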
Sitting and Rising
What works:
- Driving videos showing the full transition (standing → sitting or sitting → standing)
- Clear chair/scene context in the driving video
- Slower, deliberate movements
Challenge: Wan 2.2 can struggle with the weight and momentum of sitting. The character may "float" into the seated position rather than dropping. Prompt explicitly: "sitting down on chair with full body weight."
Running and Athletic Movement
What works:
- Running in a straight line or gentle curve
- Driving videos from a slightly elevated angle (not ground level)
- Clear arm swing and leg stride cycle
Challenge: High-intensity motion can cause temporal artifacts — flickering, morphing, or inconsistent character appearance between frames. Keep driving clips shorter (3-5 seconds) for fast motion.
Dancing and Choreographed Movement
What works:
- Slow to medium tempo choreography (≤120 BPM movement)
- Full-body visibility throughout
- Driving videos with the dancer facing the camera at a 3/4 angle
Challenge: Complex, fast choreography can exceed the model's motion transfer fidelity. Break long dance sequences into shorter clips and string them together with careful frame reference management.
Style Consistency Across Video Clips
Generating a single clip is one thing. Generating a series of clips that look like they belong together — same character, same lighting, same art style — is the real challenge. Here's how to nail it:
The Core Problem
Without careful management, Wan 2.2 can produce subtle but noticeable inconsistencies between clips:
- Slightly different face shape or eye color
- Clothing color or pattern shifts
- Lighting temperature and direction changes
- Background style drift
Strategy 1: Fixed Reference Image
Use the exact same reference image for every clip in your sequence. Wan 2.2 uses this as the identity anchor — inconsistency in your reference is the #1 cause of character drift across clips.
Pro tip: Generate your reference image at higher resolution than you need for the final output, then downscale. The extra detail provides more stable identity signals during generation.
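As a quick example, here is a small Pillow sketch of that downscale step (the file names and the 1024 px target width are placeholders; any high-quality resampling filter works):

```python
# Downscale a high-resolution reference image before upload.
from PIL import Image

ref = Image.open("character_ref_4k.png")
target_w = 1024
target_h = round(ref.height * target_w / ref.width)  # preserve aspect ratio
# LANCZOS resampling keeps edges and fine detail crisp when downscaling.
ref.resize((target_w, target_h), Image.LANCZOS).save("character_ref_1024.png")
```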
Strategy 2: Consistent Prompt Engineering
Include consistent descriptive elements in every prompt:
- Same lighting descriptors ("warm golden hour lighting," "soft studio lighting")
- Same art style keywords if using stylization ("anime cel shading," "photorealistic," "cinematic")
- Same environment descriptors ("modern office interior," "outdoor city street")
This gives the model consistent "style signals" across generations.
Strategy 3: Unified Background Context
If your scene has a consistent background, include a reference frame of the environment in your prompt or use a consistent background description across all clips. The model can use this to maintain spatial and lighting consistency.
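If your environment already exists as footage, you can pull a single clean frame to reuse as that reference. A one-step sketch with ffmpeg (assumed on PATH; the timestamp and file names are examples):

```python
# Extract the frame at t=1.5 s as a still image of the environment.
import subprocess

subprocess.run(
    ["ffmpeg", "-y", "-ss", "1.5", "-i", "scene_master.mp4",
     "-frames:v", "1", "background_ref.png"],
    check=True,
)
```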
Strategy 4: Color Grading in Post
Accept that AI generation has inherent variability. Plan to apply consistent color grading (brightness, contrast, color temperature, saturation) across all output clips in your video editor. This is faster than iterating on generations and gives you full creative control.
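A minimal sketch of this idea with OpenCV: one fixed grade function applied to every clip so the whole sequence matches. The contrast, brightness, and warmth values below are arbitrary starting points and the file names are placeholders; a dedicated editor's color tools will give finer control.

```python
# Apply one fixed grade (contrast, brightness, warmth) to every clip.
import cv2
import numpy as np

def grade_frame(frame, contrast=1.05, brightness=6, warmth=8):
    # Contrast/brightness in one pass, then bias red and blue channels
    # to shift color temperature (OpenCV frames are in BGR order).
    out = cv2.convertScaleAbs(frame, alpha=contrast, beta=brightness)
    out = out.astype(np.int16)
    out[..., 2] += warmth  # push reds warmer
    out[..., 0] -= warmth  # pull blues back
    return np.clip(out, 0, 255).astype(np.uint8)

def grade_clip(src, dst):
    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(grade_frame(frame))
    cap.release()
    writer.release()

for i in range(1, 4):  # clip_1.mp4 ... clip_3.mp4, hypothetical names
    grade_clip(f"clip_{i}.mp4", f"clip_{i}_graded.mp4")
```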
Strategy 5: Sequential Frame References
For complex sequences, instead of relying on a single still image as your reference, use a sequential frame reference — a short video of your character performing a representative pose. This provides more robust identity data than a single frame, helping the model maintain consistent appearance throughout a clip.
The Sequential Frame Reference Trick
This is one of the most powerful techniques for maintaining character consistency across long clips or multi-clip sequences.
What It Is
Instead of a single reference image, you provide a short video clip (5-10 frames) of your character in a stable, neutral pose. The model uses the temporal sequence to extract a more robust identity representation.
When to Use It
- Long clips (60+ frames) where single-image identity anchoring degrades over time
- Multi-clip sequences where you want rock-solid character consistency
- Scenes with significant pose variation (where a single reference pose may not cover all needed angles)
- When you're getting character drift or morphing mid-clip
How to Create a Good Sequential Frame Reference
- Pick a neutral pose — Standing, facing camera at 3/4 angle, arms relaxed at sides
- Use 5-10 clean frames — More frames can help, but quality matters more than quantity
- Match the lighting — If your driving video is in outdoor daylight, your frame reference should also be in daylight
- Match the clothing — The frame reference should show the character in the same outfit you want in the final video
- Avoid motion blur — Static, sharp frames work better than blurry ones (the sketch below automates this filtering)
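One way to handle that last point automatically: score each frame's sharpness with the variance of the Laplacian (a standard blur heuristic) and keep only the sharpest frames. A sketch with OpenCV, where the clip name and frame count are placeholders:

```python
# Pull the sharpest 8 frames from a short neutral-pose clip to use
# as a sequential frame reference.
import cv2

def sharpest_frames(video_path, keep=8):
    cap = cv2.VideoCapture(video_path)
    scored = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Variance of the Laplacian: low values indicate motion blur.
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        scored.append((sharpness, idx, frame))
        idx += 1
    cap.release()
    scored.sort(key=lambda t: t[0], reverse=True)
    # Return the sharpest frames in their original temporal order.
    best = sorted(scored[:keep], key=lambda t: t[1])
    return [frame for _, _, frame in best]

for i, frame in enumerate(sharpest_frames("neutral_pose.mp4")):
    cv2.imwrite(f"frame_ref_{i:02d}.png", frame)
```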
The Full-Body Character Sheet Alternative
For maximum consistency, some creators prepare a full-body character sheet — a high-quality illustration or carefully generated image showing the character in multiple views (front, side, 3/4). This provides the model with comprehensive identity data across different viewing angles, reducing the chance of inconsistent appearance from different camera perspectives.
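If you already have the individual views, assembling the sheet is straightforward. A Pillow sketch (file names are assumptions; views are normalized to a common height and tiled side by side):

```python
# Stitch front / side / three-quarter renders into one character sheet.
from PIL import Image

views = [Image.open(p) for p in ("front.png", "side.png", "three_quarter.png")]
height = min(v.height for v in views)
# Scale every view to the same height, preserving aspect ratio.
views = [v.resize((round(v.width * height / v.height), height)) for v in views]

sheet = Image.new("RGB", (sum(v.width for v in views), height), "white")
x = 0
for v in views:
    sheet.paste(v, (x, 0))
    x += v.width
sheet.save("character_sheet.png")
```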
Style Transfer Mode
Wan 2.2 Animate includes a style transfer mode that adjusts the aesthetic of your output based on a reference image or style description. This is separate from identity control — it changes the look while preserving the motion.
What Style Transfer Does
- Adjusts color palette, texture, and visual aesthetic to match a reference
- Can apply artistic styles (watercolor, anime, oil painting, sketch)
- Preserves the motion from your driving video
- Identity (character face/body) is preserved through the reference image, but the "render style" comes from the style reference
How to Use It
- In the generation interface, look for the "Style Reference" or "Style Transfer" input
- Upload an image representing the visual style you want
- Set the style strength (typically 0-100%)
- Generate — the motion transfers, but the look matches your style reference
Best Practices for Style Transfer
- Use high-quality style references with a clear, consistent aesthetic
- Start with lower style strength (30-50%) and increase as needed
- Very high style strength can cause identity leakage (your character starts looking like the style reference)
- Pair style transfer with careful prompt engineering for best results
30-Frame vs. 60-Frame Driving Clips
The length of your driving video significantly impacts results:
30-Frame Driving Clips (~1 second at 30fps)
Pros:
- Faster generation time
- More consistent motion (less time for drift or artifacts to accumulate)
- Better for quick, punchy motions (gestures, turns, reactions)
- Easier to keep motion coherent
Cons:
- Short actions may feel abrupt
- Limited ability to capture full motion cycles (a full walking stride may not fit)
- Harder to maintain natural pacing for slower actions
Best for: Facial expressions, hand gestures, quick reactions, short dialogue scenes
60-Frame Driving Clips (~2 seconds at 30fps)
Pros:
- Captures complete motion cycles (full walking stride, full turn)
- Better for continuous actions and longer scenes
- More natural pacing for slower movements
Cons:
- More susceptible to character drift/morphing over time
- Longer generation time
- More opportunity for artifacts to accumulate
- May introduce background inconsistencies
Best for: Walking sequences, athletic movement, dance choreography, sustained actions
Recommendation
Start with 30 frames. It's the safer default. Use 60 frames only when:
- Your action genuinely requires more frames (a complete walking stride)
- You've verified your reference image is stable enough to hold identity for that duration
- You're willing to iterate if you see character drift
For complex multi-step actions, consider multiple short clips chained together rather than one long clip. This gives you more control and reduces the drift risk.
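Cutting one long driving video into consecutive fixed-length segments makes this chaining workflow repeatable. A sketch using ffmpeg (assumed on PATH; clip names, fps, and segment length are illustrative):

```python
# Cut a driving video into consecutive 30-frame segments so a long
# action can be generated as chained short clips.
import subprocess

def cut_segment(src, dst, start_frame, n_frames=30, fps=30):
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start_frame / fps),  # frame index -> seconds
            "-i", src,
            "-frames:v", str(n_frames),     # keep exactly n_frames frames
            "-c:v", "libx264", "-an",
            dst,
        ],
        check=True,
    )

# A 90-frame walk becomes three 30-frame driving clips.
for i in range(3):
    cut_segment("walk_90f.mp4", f"walk_part{i}.mp4", start_frame=i * 30)
```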
Common Failure Modes and Fixes
Character Morphing Mid-Video
Problem: The character's face, body shape, or clothing changes subtly (or dramatically) as the video progresses.
Causes:
- Reference image identity signal is too weak
- Driving video motion is too complex or too long
- Inconsistent reference images across clips in a sequence
Fixes:
- Use a higher-quality, more detailed reference image
- Switch to a sequential frame reference (5-10 stable frames, as described in the section above)
- Shorten the driving clip to reduce drift opportunity
- Add more descriptive identity details in your prompt ("red hair in ponytail, blue denim jacket, pale skin")
Floating or Unnatural Movement
Problem: The character moves without proper weight or grounding — floating, sliding, or disconnected from the environment.
Causes:
- Driving video lacks clear ground plane context
- Motion too fast for the model to render weight
- Insufficient prompt guidance about physicality
Fixes:
- Use driving videos with visible ground/surface
- Add explicit grounding cues to your prompt: "walking on concrete," "feet firmly planted," "heavy footsteps"
- For sitting/rising: include "sits down with full body weight" or "rises from seated position"
- Consider adding subtle post-processing effects (shadow, dust particles) to anchor the character
Hand and Finger Deformation
Problem: Fingers merge, blur, or become malformed during generation.
Causes:
- Hands are inherently difficult for AI models to render
- Driving video hand gestures are too complex
- Close-up camera angle on hands
Fixes:
- Use driving videos where hands are at mid-distance, not close-up
- Avoid complex finger configurations (spread fingers, intricate gestures)
- Choose gestures that are more forgiving: fist bumps, pointing at a distance, holding objects at arm's length
- If you need precise hand control, consider compositing hands in post from separate renders
- Use prompt guidance: "clear hands, visible fingers, clean gesture"
Inconsistent Lighting
Problem: The lighting on the character shifts between frames or across clips.
Causes:
- Driving video has inconsistent lighting
- Reference image lighting doesn't match driving video environment
- Prompt lacks consistent lighting descriptors
Fixes:
- Match the lighting in your reference image to your driving video environment
- Include consistent lighting descriptors in every prompt in your sequence
- For multi-clip sequences, apply uniform color grading in post-production
- Use style transfer mode to apply consistent lighting from a reference
Background Artifacts
Problem: Background elements flicker, shift, or introduce unwanted objects.
Fixes:
- Use driving videos with simple, clean backgrounds when possible
- Include background descriptions in your prompt for consistency
- For critical projects, consider compositing your character over a clean background plate in post
- Avoid driving videos with busy, moving backgrounds — the model may interpret background motion as part of the character motion
FAQ
Q1: Can I use different driving videos for body motion and facial expressions?
As of the current version, Wan 2.2 Animate uses a single driving video that controls both body and face simultaneously. You cannot yet specify separate videos for each. However, you can influence the balance through prompt engineering — explicitly describing the facial expression you want helps the model prioritize expression transfer. For advanced workflows, composite the face from a separate generation in post-production.
Q2: Why does my character's appearance change when they turn away from the camera?
This is one of the most common challenges in I2V generation. When a character turns significantly (especially showing their back), the model has less identity data from your reference image to work with, leading to drift. Fixes: Keep profile and back-facing angles brief in your driving video. Use the sequential frame reference technique for long sequences with significant camera movement. Add explicit prompt descriptors: "back view maintains same blue jacket and dark hair."
Q3: How do I maintain character consistency across a 10+ clip video sequence?
The key is treating your multi-clip project as a single system:
- Create one high-quality reference image (or sequential frame reference) that represents your character
- Use this exact same reference for every clip — never swap in a different image mid-sequence
- Write consistent prompts with matching style descriptors for every clip
- Apply uniform color grading across all output clips in post
- Use cross-dissolve transitions in your editor rather than hard cuts — this masks any micro-inconsistencies
Q4: Can I control the intensity of the motion transfer?
Yes — most interfaces allow you to tune the motion strength or influence. Lower values (60-80%) give the model more freedom to adapt the driving motion naturally to your character. Higher values (90-100%) transfer motion more literally but may introduce artifacts if your character's proportions differ significantly from the driving video subject. For mismatched body types, start at 70-80% and adjust based on results.
Q5: What driving video quality is needed for best results?
Optimal driving videos have: clear full-body visibility, consistent lighting, minimal camera movement, a clean background, and steady, consistent motion (no sudden jerks). Resolution of 720p or higher is recommended. Avoid driving videos with heavy motion blur, extreme camera angles, or partially obscured subjects. For best results, record your own driving video in a controlled environment rather than using found footage with different lighting or quality than your reference image.
Ready to Master Wan 2.2 Animate?
Understanding motion types and style consistency is the foundation for professional-quality AI video. Now that you know how to control what drives what, you're ready to build more sophisticated workflows.
For deeper control over your prompts and getting the exact actions and expressions you want, check out our Prompt Engineering Guide for Wan 2.2 Animate.
For keeping your character looking the same across every clip and project, don't miss How to Maintain Character Consistency in AI Video — it covers advanced reference image techniques, character sheets, and post-production workflows.
Want to see these techniques in action? Head to the Wan 2.2 Animate Features page to explore more guides and start generating.
