How to Use WAN 2.2
Discover the open-source AI video generator that’s taking the creative world by storm. I’ll show you exactly how to use it, whether you have a powerful PC or just a web browser.
Have you seen those stunning AI-generated videos popping up on social media lately? You know the ones. They feature neon-soaked cityscapes, dream sequences, or characters moving with startling realism. Chances are, you’ve been watching creations from Wan 2.2.
What Exactly Is Wan 2.2?
Before we get our hands dirty, let’s understand what we’re working with. Put simply, Wan 2.2 is an advanced AI model that generates videos from text descriptions or images. Think of it as having a miniature film studio inside your computer.
The magic happens through something called a Mixture-of-Experts architecture. Without getting too technical, imagine having two specialized artists working on your video.
Firstly, a high-noise expert handles the early stages. It focuses on overall layout, composition, and basic motion patterns. Then, a low-noise expert takes over later. It refines details like lighting, texture, and cinematic elements.
Key Capabilities
Wan 2.2 comes in several flavors. Each one is designed for different creative needs.
The T2V-A14B model handles text-to-video generation. It’s best for creating videos purely from written descriptions. The I2V-A14B model manages image-to-video generation. It excels at animating static photos or artwork. The TI2V-5B model offers hybrid text-image-to-video capabilities in a lightweight setup that runs on mid-range hardware.
The results speak for themselves: Wan 2.2 produces sharp 720p and even 1080p videos.
Method 1: The Easy Route – Using Wan 2.2 Online
Let’s be honest. Not everyone wants to dive into technical installations. Maybe you don’t have a powerful GPU. Perhaps you want to test the waters before committing. Whatever your reason, online platforms offer a friction-free way to experience Wan 2.2.
MyEdit: Character Motion Swap
One of the most accessible entry points is MyEdit. It’s a web-based tool that integrates Wan 2.2’s motion animation capabilities. Here’s how it works.
First, open MyEdit in your web browser. Then, navigate to Character Motion Swap. Next, upload a photo of a person or character you want to animate. After that, add a reference video showing the motion you want to apply, or choose from built-in templates. Subsequently, select your background preference: you can keep your original photo background or use the video’s background. Finally, click Generate and wait 5 to 15 minutes while the AI works its magic.
Other Online Options
Several other platforms have begun integrating Wan 2.2 capabilities.
For instance, Replicate hosts the Wan 2.2 I2V-A14B model. Consequently, you can run generations through their API or web interface. Similarly, Vast.ai offers the model in its library with flexible GPU infrastructure. Additionally, Google Colab provides a notebook template that automates the entire setup process.
The Colab option deserves special mention. Someone created a ready-to-use notebook that handles everything. Specifically, it clones repositories, installs dependencies, downloads models, and sets up the generation pipeline. The models require about 60GB of space. You upload an image, configure a few settings, and run the cells. On an A100 GPU, expect 5 to 10 minutes per video. On a T4, expect 15 to 25 minutes.
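If you’d rather script Replicate generations than click through the web interface, the call boils down to building an input dict and handing it to the client’s `replicate.run`. Here is a minimal sketch in Python; the model slug and the field names (`image`, `num_frames`, `frames_per_second`) are illustrative assumptions, so check the model page on Replicate for the real schema.

```python
# Sketch of driving Wan 2.2 image-to-video via Replicate's Python client.
# Field names and the model slug below are assumptions for illustration.

def build_i2v_input(image_url: str, prompt: str, num_frames: int = 81,
                    fps: int = 24) -> dict:
    """Collect generation settings into the input dict the client expects."""
    return {
        "image": image_url,
        "prompt": prompt,
        "num_frames": num_frames,
        "frames_per_second": fps,
    }

payload = build_i2v_input(
    "https://example.com/portrait.png",
    "Slowly waving hello, gentle smile, static shot",
)

# With the real client installed, you would then run something like:
#   import replicate
#   output = replicate.run("<wan-2.2-i2v model slug>", input=payload)
print(sorted(payload))
```

The same payload shape works whether you call the hosted API or a Colab notebook cell, which is part of why the cloud options feel interchangeable.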
Method 2: The Power Route – Running Wan 2.2 Locally
Are you ready to unlock Wan 2.2’s full potential? Running it locally gives you unlimited generations. Moreover, you get complete privacy. Additionally, you have the freedom to experiment without restrictions. However, it requires some preparation.
System Requirements
Wan 2.2 demands respectable hardware. Fortunately, thanks to multiple workflow options, it can run on various systems.
At minimum, you need an NVIDIA GPU with 8GB VRAM, like an RTX 3060. You also need 16GB of RAM and 20GB of free storage. Your operating system should be Windows 10, Windows 11, or Linux. Finally, you need the latest NVIDIA drivers with CUDA support.
For recommended performance, aim for 12GB or more VRAM. An RTX 4090 is ideal. You’ll also want 32GB of RAM and 50 to 100GB of fast SSD storage. The operating system and driver requirements remain the same.
Don’t panic if your GPU falls short. For cards with 4 to 6GB VRAM, use lighter workflows like TI2V-5B. For 12GB cards like the RTX 3060, GGUF quantized models make 14B operation possible.
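The hardware guidance above condenses into a quick rule of thumb. This little picker mirrors the thresholds described in this article; they’re guidelines, not an official compatibility matrix.

```python
# Pick a Wan 2.2 workflow from available VRAM, following this article's
# recommendations (not an official compatibility matrix).

def pick_workflow(vram_gb: float) -> str:
    if vram_gb >= 16:
        return "T2V-A14B / I2V-A14B (full quality)"
    if vram_gb >= 12:
        return "14B with GGUF Q4 quantization"
    if vram_gb >= 8:
        return "TI2V-5B"
    if vram_gb >= 4:
        return "TI2V-5B (lighter settings) or online platforms"
    return "online platforms (MyEdit, Replicate, Colab)"

print(pick_workflow(12))  # a 12GB card can run the quantized 14B models
```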
Step 1: Install ComfyUI
Wan 2.2 isn’t a standalone application. Instead, it runs inside ComfyUI. Think of ComfyUI as your cockpit. It’s a modular graphical interface for AI workflows.
First, download ComfyUI from its GitHub repository or official website. Then, extract and install it on your system. Finally, launch ComfyUI in GPU mode. CPU mode works, but runs painfully slow.
When ComfyUI opens, you’ll see a drag-and-drop workspace. This is where Wan 2.2 workflows live.
Step 2: Choose Your Workflow
Inside ComfyUI, navigate to Workflows. Then go to Browse Templates. Finally, select Video. You’ll find multiple Wan 2.2 options.
Firstly, T2V-A14B handles text-to-video. It offers the highest quality but requires serious VRAM. Secondly, I2V-A14B manages image-to-video. It’s great for animating photos or stills. Thirdly, TI2V-5B provides a lightweight hybrid option designed specifically for mid-range GPUs.
Start with TI2V-5B if your graphics card isn’t top-tier. It’s much friendlier to limited VRAM. Moreover, it still delivers impressive results.
Step 3: Download Model Weights
Workflows are just blueprints. They need model weights. These are the massive trained files that power Wan 2.2.
First, download the correct weights for your chosen workflow; you can find them on Hugging Face or ModelScope. Next, place these files in the models folder inside ComfyUI. Then relaunch ComfyUI if needed, and load your workflow again.
For the I2V-A14B model, you need both a high-noise model and a low-noise model. Place them in the ComfyUI models unet folder.
Pro tip for 12GB cards: use GGUF format models, because regular formats just won’t fit in limited VRAM. Look for Q4 quantized versions, which are about 8.5GB each for the high-noise and low-noise models. Additionally, you’ll need a UMT5 text encoder in GGUF Q5, which is about 4GB.
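Before relaunching, you can sanity-check the layout with a small script that verifies each expected folder contains at least one GGUF file. The folder names follow the placement described above; the demo is just an empty temporary tree, and any real filenames will vary by quantization.

```python
# Check that the ComfyUI model folders described above each hold at least
# one .gguf file. Folder names follow this article's placement instructions.

import tempfile
from pathlib import Path

def missing_model_folders(comfyui_root: str) -> list[str]:
    """Return the model subfolders that contain no .gguf files yet."""
    root = Path(comfyui_root)
    missing = []
    for sub in ("models/unet", "models/text_encoders"):
        folder = root / sub
        if not folder.is_dir() or not any(folder.glob("*.gguf")):
            missing.append(sub)
    return missing

# Demo on an empty temporary directory: everything is reported missing.
with tempfile.TemporaryDirectory() as demo_root:
    print(missing_model_folders(demo_root))
```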
Step 4: Prompt, Tweak, and Run
Now the creative fun begins.
Firstly, enter your text prompt in the workflow. Be descriptive. Really descriptive.
Secondly, use this basic formula: Subject + Scene + Motion.
For richer results, use the advanced formula: Subject (with description) + Scene (with description) + Motion (with description) + Aesthetic Control + Stylization.
Here’s an example: “A black-haired Miao girl wearing ethnic minority clothing, standing in a misty forest at dawn, slowly turning her head to smile, cinematic lighting, shallow depth of field.”
Next, adjust your settings. Start with a resolution of 512 by 512 or 720p. For frame count, shorter clips are easier on memory. For duration and frame rate, try 3 seconds at 24 frames per second.
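The duration and frame-rate settings determine the frame count, which is the number that actually drives memory use. A quick sketch of the arithmetic:

```python
# Frame count follows directly from duration and frame rate. Memory use
# grows roughly with frame count, which is why shorter clips are easier
# on VRAM.

def frame_count(seconds: float, fps: int = 24) -> int:
    return round(seconds * fps)

print(frame_count(3))      # the 3-second example above: 72 frames
print(frame_count(5, 24))  # Wan 2.2's typical 5-second clip: 120 frames
```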
Then, hit Run and watch the logs. You’ll see ComfyUI processing each frame.
Generation can take minutes or longer. It all depends on your hardware and settings. However, seeing your words transform into motion never gets old.
Once rendering finishes, preview the frames. Then export them as a video file. Congratulations. You’ve just created an AI-generated video entirely on your own machine.
Crafting the Perfect Prompt
Your prompts make or break your results. Wan 2.2 responds beautifully to detailed, cinematic descriptions. Here’s a structured approach.
Basic Formula
For beginners or when seeking creative inspiration, use this simple structure. Start with your subject. Then describe the scene. Finally, specify the motion.
For example, try something like this. A dragon soaring through clouds, wings spread wide.
Advanced Formula
For richer, more vivid results, expand your approach. Begin with your subject and add detailed descriptions. Then describe the scene with rich environmental details. Next, specify the motion clearly. After that, add aesthetic controls. Finally, include stylization preferences.
Image-to-Video Formula
Since your image already establishes subject and scene, focus on motion. Describe the movement you want. Then specify camera movement if desired.
For example, try something simple like this. Slowly waving hello, gentle smile, static shot.
Aesthetic Controls to Explore
Light source makes a huge difference. You can specify natural light, golden hour, neon lighting, or dramatic side lighting.
Shot size changes the feel completely. Consider a close-up, medium shot, or wide establishing shot.
Camera movement adds dynamism. Try dolly in, pan left, tracking shot, or static camera.
Troubleshooting Common Issues
Even experienced users hit roadblocks. Here are solutions to frequent problems.
CUDA errors or driver mismatches happen often. The fix is simple. Update your GPU drivers. Additionally, ensure your CUDA toolkit matches your PyTorch build.
Out-of-memory errors can stop you cold. Lower your resolution. Alternatively, reduce your frame count. You could also switch to the TI2V-5B workflow. For 12GB cards, definitely use GGUF quantized models.
Extremely slow rendering frustrates everyone. Use SSD storage, close background apps, and lower your parameters. Also check your system RAM speed; for instance, upgrading from DDR4-2666 to DDR4-3200 cut one user’s render times by 30 percent.
FAQ: Everything You Need to Know
Is Wan 2.2 really free?
Yes, absolutely. Wan 2.2 is completely open-source. Moreover, it’s free to download and use under the Apache 2.0 license. The only costs come from hardware, electricity, or cloud GPU usage if you choose that route.
What are the minimum system requirements?
You need an NVIDIA GPU with at least 8GB VRAM. An RTX 3060 or higher is recommended. Additionally, you need 16GB RAM minimum. However, 32GB is preferred. Finally, you need 20 to 100GB of free storage.
Can I run it on a laptop?
It depends on your laptop’s GPU. Many gaming laptops with RTX graphics can run lighter workflows like TI2V-5B. For thin-and-light laptops with integrated graphics, stick to online options.
How long does generation take?
On high-end hardware like an RTX 4090, expect 5 to 10 minutes for short clips. On mid-range cards with optimized settings, expect 15 to 25 minutes. Google Colab with T4 GPU runs 15 to 25 minutes per video.
Do I need to be a programmer?
For online platforms like MyEdit, no programming is required. You simply upload and click. For local installation, you’ll need basic comfort with the command line and file management. The Google Colab option sits in the middle. You follow step-by-step instructions without coding.
What’s the difference between 14B and 5B models?
The 14B models use the Mixture-of-Experts architecture. Consequently, they offer higher quality but require more VRAM. The 5B model is a dense model optimized for consumer GPUs. Specifically, it generates 720p video in under 9 minutes on a single GPU.
Can I create long videos?
Wan 2.2 excels at short clips. Typically, it produces 5 seconds at 24 frames per second. For longer content, you’d need to generate multiple clips. Then you’d edit them together.
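One common way to stitch several short clips together is ffmpeg’s concat demuxer, which reads a plain-text list of inputs. This sketch only writes that list file; running ffmpeg itself is left to you, and the filenames are placeholders.

```python
# Write the input list for ffmpeg's concat demuxer. You would then run:
#   ffmpeg -f concat -safe 0 -i clips.txt -c copy full_video.mp4
# The clip filenames below are placeholders.

def write_concat_list(clip_paths: list[str], list_path: str = "clips.txt") -> str:
    with open(list_path, "w") as f:
        for path in clip_paths:
            f.write(f"file '{path}'\n")
    return list_path

write_concat_list(["scene1.mp4", "scene2.mp4", "scene3.mp4"])
```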
Is there a version with audio?
The newer Wan 2.6 introduces audio synchronization capabilities. For Wan 2.2, focus on visual generation.
Which Path Should You Choose?
Still wondering whether to go online or local? Here’s my honest advice.
Choose online platforms if you want to test Wan 2.2 without commitment. Also, choose them if your computer lacks a powerful GPU. Choose them if the technical setup sounds overwhelming. Finally, choose them if you need quick results for a specific project.
And if you’re somewhere in the middle? Try Google Colab first. It gives you a taste of local-style control. Moreover, it requires no permanent installation.
The Future Is Open
Wan 2.2 represents something special in the AI video landscape. While giants like OpenAI and Google keep their most powerful models behind closed doors, Alibaba chose to share Wan with the world. This openness lets creators, researchers, and hobbyists experiment freely and push the technology in new directions.
Your first generated clip may be short and imperfect. However, it’s yours. It was born from your words. It was generated by your machine or chosen platform. Most importantly, it’s limited only by your imagination.
So what will you create? A neon-soaked cyberpunk trailer? A fantasy creature brought to life? A family photo that suddenly waves back?
The tools are ready. The knowledge is yours. Now make something amazing.
Have you tried Wan 2.2? Share your creations and questions in the comments below!
