Wan2.2 Overview and How to Prompt It



Overview

The Wan2.2 model, specifically the ComfyUI version used for video generation, has a prompt limit of 256 input tokens. Extending prompts can effectively enrich the details in the generated videos and further improve video quality, so enabling prompt extension is recommended (by default, a Qwen model performs the extension).
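Before sending a prompt (extended or not), it can help to check it roughly against the 256-token limit quoted above. The sketch below is only an approximation under a stated assumption: it counts words and punctuation marks as separate tokens and does not load the text encoder's real tokenizer, which often splits words into more pieces. `TOKEN_BUDGET`, `rough_token_count`, and `fits_budget` are hypothetical names.

```python
import re

# Assumption: 256 is the limit quoted in this guide. The true count depends on
# the model's own tokenizer, which this sketch does NOT use.
TOKEN_BUDGET = 256


def rough_token_count(prompt: str) -> int:
    """Crude proxy: count each word and each punctuation mark as one token.

    Subword tokenizers often produce more tokens than this, so treat the
    result as an optimistic estimate, not an exact figure.
    """
    return len(re.findall(r"\w+|[^\w\s]", prompt))


def fits_budget(prompt: str, budget: int = TOKEN_BUDGET) -> bool:
    """Return True if the rough estimate stays within the budget."""
    return rough_token_count(prompt) <= budget


if __name__ == "__main__":
    p = "The girl slowly lifts her right hand to adjust her sunglasses."
    print(rough_token_count(p), fits_budget(p))
```

Because the estimate is optimistic, leave some headroom below the budget rather than aiming for exactly 256.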

According to the official paper, you should build your LLM profile (the system prompt for the rewriting model) around the following points:

  1. Instruct the LLM to add details to prompts without altering their original meaning, enhancing the completeness and visual appeal of the generated scenes.

  2. Have the rewritten prompts incorporate natural motion attributes: add appropriate actions for the subject based on its category to ensure smoother, more fluent motion in the generated videos.

  3. Structure the rewritten prompts like the post-training captions: start with the video style, follow with an abstract of the content, and end with a detailed description. This aligns prompts with the distribution of high-quality video captions.
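The three points can be turned into a system prompt for whichever LLM performs the extension, plus a helper that assembles the output in the style, abstract, details order. This is a minimal sketch: the wording of `SYSTEM_PROMPT` is a hypothetical paraphrase of the points above, not the official Wan prompt-extension template, and `assemble_prompt` is an illustrative name.

```python
# Hypothetical paraphrase of the three guidelines; NOT the official template.
SYSTEM_PROMPT = (
    "You rewrite short video prompts.\n"
    "1. Add visual details without changing the original meaning.\n"
    "2. Add a natural, fitting motion for the subject based on its category.\n"
    "3. Output the video style first, then a one-sentence abstract of the "
    "content, then a detailed description."
)


def assemble_prompt(style: str, abstract: str, details: str) -> str:
    """Join the three parts in style -> abstract -> details order.

    Trailing periods are stripped from each part so the joined text has
    exactly one period between parts.
    """
    parts = [p.strip().rstrip(".") for p in (style, abstract, details)]
    return ". ".join(parts) + "."


if __name__ == "__main__":
    print(assemble_prompt(
        "Cinematic close-up",
        "A woman adjusts her sunglasses.",
        "She slowly lifts her right hand to the frame of her sunglasses.",
    ))
```

Pass `SYSTEM_PROMPT` as the system message to your extension model, and use `assemble_prompt` when you write the structured prompt by hand.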

Main Feature: Cinematic-Level Aesthetics

Among its innovations, Wan2.2 includes:

  • 👍 Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more.

⚠️ Important: Always use Image-to-Video (I2V) mode with a reference image. Without visual guidance, the model does not handle abstract concepts well. If you need fixed poses or longer, controlled shots, switch to First-Last Frame to Video (FLF2V) mode instead.
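The mode advice can be expressed as a small decision helper. This is a minimal sketch: `Mode`, `choose_mode`, and its parameters are hypothetical names, not part of ComfyUI or the Wan codebase.

```python
from enum import Enum


class Mode(Enum):
    I2V = "Image-to-Video"
    FLF2V = "First-Last Frame to Video"


def choose_mode(has_reference_image: bool, needs_control: bool) -> Mode:
    """Pick a generation mode following the advice above (hypothetical helper).

    needs_control: True when you need fixed poses or a longer, controlled shot.
    """
    if not has_reference_image:
        # The guide advises against relying on text alone.
        raise ValueError("Provide a reference image; text-only prompts are discouraged.")
    return Mode.FLF2V if needs_control else Mode.I2V


if __name__ == "__main__":
    print(choose_mode(True, False).value)
    print(choose_mode(True, True).value)
```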

General Guidelines

Use natural language to describe specific actions. Avoid abstract concepts and the keyword-stuffed phrases typical of traditional AI image generation.

  • ✅ "The girl slowly lifts her right hand to adjust her sunglasses."

  • ❌ "Girl, hand, sunglasses, cool pose, stylish."

  • ❌ "Doing something cool with her hands."
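One way to catch keyword-stuffed prompts before generating is a simple heuristic: many comma-separated fragments of only one or two words usually indicate a tag list rather than a sentence. This is a minimal sketch with hypothetical thresholds; it flags only the keyword-list pattern, not abstract phrasing such as "doing something cool".

```python
def looks_keyword_stuffed(prompt: str, max_words: int = 2, min_fragments: int = 3) -> bool:
    """Flag comma-separated tag lists such as "Girl, hand, sunglasses".

    Heuristic only: at least `min_fragments` fragments, of which at least
    `min_fragments` have `max_words` words or fewer.
    """
    fragments = [f.strip(" .!") for f in prompt.split(",") if f.strip(" .!")]
    short = [f for f in fragments if len(f.split()) <= max_words]
    return len(fragments) >= min_fragments and len(short) >= min_fragments


if __name__ == "__main__":
    for p in (
        "The girl slowly lifts her right hand to adjust her sunglasses.",
        "Girl, hand, sunglasses, cool pose, stylish.",
    ):
        print(looks_keyword_stuffed(p), p)
```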

Focus on one action per shot. Don't describe multiple actions in a single prompt, or the model may not be able to complete them within 5 seconds.

  • ✅ "The man waves his hand and smiles at the camera."

  • ❌ "The man walks into the room, picks up a book, reads a bit, then waves."

  • ❌ "She turns, waves, and jumps in excitement."
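The one-action rule can be approximated by looking for sequencing words and chains of comma-separated clauses. This is a minimal sketch under a hypothetical rule: flag any "then", or two or more commas; a single "and" (as in the good example) passes. Descriptive clauses can also contain commas, so expect some false positives.

```python
import re


def too_many_actions(prompt: str) -> bool:
    """Heuristic check for stacked actions in a single prompt.

    Flags prompts that sequence steps with the word "then" or chain three or
    more clauses with commas. A single "and" is allowed.
    """
    has_then = re.search(r"\bthen\b", prompt, flags=re.IGNORECASE) is not None
    return has_then or prompt.count(",") >= 2


if __name__ == "__main__":
    for p in (
        "The man waves his hand and smiles at the camera.",
        "The man walks into the room, picks up a book, reads a bit, then waves.",
        "She turns, waves, and jumps in excitement.",
    ):
        print(too_many_actions(p), p)
```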

Be specific about which body parts are involved, what kind of clothing is worn, and what exactly is happening — avoid vague terms like 'clothes'.

  • ✅ "She pulls down the hood of her gray hoodie with both hands."

  • ❌ "She fixes her clothes."

  • ❌ "The character interacts with her outfit."
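Vague wording can be flagged with a small list of terms to avoid. This is a minimal sketch; `VAGUE_TERMS` is a hypothetical starting list drawn from the bad examples above, and you would extend it with terms that show up in your own prompts.

```python
import re

# Hypothetical starting list, taken from the bad examples in this guide.
VAGUE_TERMS = ("clothes", "outfit", "something", "interacts with")


def vague_terms_in(prompt: str) -> list[str]:
    """Return the vague terms found in the prompt (case-insensitive, whole words)."""
    lowered = prompt.lower()
    return [t for t in VAGUE_TERMS if re.search(rf"\b{re.escape(t)}\b", lowered)]


if __name__ == "__main__":
    for p in (
        "She pulls down the hood of her gray hoodie with both hands.",
        "She fixes her clothes.",
        "The character interacts with her outfit.",
    ):
        print(vague_terms_in(p), p)
```

An empty list means none of the listed terms appear; it does not guarantee the prompt names body parts and garments specifically enough.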

Avoid overly complex or large movements for now, such as bending over or twisting the torso, as they may cause glitches or fail to render properly.

  • ✅ "He raises his left arm and points forward."

  • ❌ "She bends down to tie her shoes."

  • ❌ "The dancer spins twice and stretches backward."
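Risky movements can be caught the same way, by matching the word forms this section warns about. This is a minimal sketch; `RISKY_WORDS` and `RISKY_PHRASES` are hypothetical lists covering bending, twisting, spinning, and stretching backward, and they match whole words only, so unlisted forms (for example "bent") are not caught.

```python
import re

# Hypothetical lists based on the movements this section warns about.
RISKY_WORDS = ("bend", "bends", "bending", "twist", "twists", "twisting",
               "spin", "spins", "spinning")
RISKY_PHRASES = ("stretches backward",)


def risky_movements(prompt: str) -> list[str]:
    """Return risky movement words or phrases found in the prompt."""
    lowered = prompt.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    hits = [w for w in RISKY_WORDS if w in words]
    hits += [p for p in RISKY_PHRASES if p in lowered]
    return hits


if __name__ == "__main__":
    for p in (
        "He raises his left arm and points forward.",
        "She bends down to tie her shoes.",
        "The dancer spins twice and stretches backward.",
    ):
        print(risky_movements(p), p)
```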

Examples

🟩 Clear, specific action with natural phrasing: "The woman winks at the camera while gently adjusting her hair."

🟡 Action is completed but feels vague or disconnected: "A young man raises his arm and looks down at his wristwatch."

🟥 Too many actions stacked; difficult to complete in time: "The woman kneels down, opens a backpack, takes out a book, and waves."

🟥 Multiple steps; only the first action may be completed: "The child bounces a ball and then tries to jump and spin."

🟥 Too slow; the action doesn't finish in 5 seconds: "The man slowly opens a wrapped gift box, carefully lifting the lid and removing the ribbon."

🟥 Too vague; lacks body part or clothing details: "Subject adjusts their outfit and interacts with it."
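The 5-second budget in these examples comes from the clip's frame count divided by its frame rate. A minimal sketch, assuming 81 frames at 16 fps purely as an illustrative setting (check the frame count and frame rate in your own workflow, which may differ); `clip_seconds` and `action_fits` are hypothetical names, and the action length is your own estimate.

```python
def clip_seconds(num_frames: int, fps: float) -> float:
    """Clip duration in seconds for a given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def action_fits(action_seconds: float, num_frames: int = 81, fps: float = 16.0) -> bool:
    """True if an action of the estimated length fits inside the clip.

    The 81-frame / 16-fps defaults are an illustrative assumption.
    """
    return action_seconds <= clip_seconds(num_frames, fps)


if __name__ == "__main__":
    print(clip_seconds(81, 16))  # about 5 seconds
    print(action_fits(3.0), action_fits(8.0))
```

A slow, multi-step action such as unwrapping a gift easily exceeds this budget, which is why it fails to finish.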

License Agreement

The Wan2.2 models are licensed under the Apache 2.0 License. Wan claims no rights over your generated content, granting you the freedom to use it as long as your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, refer to the full text of the license.
