We explore the Reward Backpropagation technique [1][2] to optimize the videos generated by Wan2.1-Fun for better alignment with human preferences. We provide the following pre-trained models (i.e., LoRAs) along with the training script. You can use these LoRAs as plug-ins to enhance the corresponding base model, or train your own reward LoRA.
For more details, please refer to our GitHub repo.
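
To illustrate what using a LoRA "as a plug-in" means in general, here is a minimal sketch of the standard LoRA merge, where a low-rank update is added to a base weight: `W' = W + scale * (up @ down)`. This is a toy example in plain Python, not the loading code from our repo; the names `matmul`, `merge_lora`, `lora_down`, `lora_up`, and `scale` are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_down, lora_up, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down) without mutating the inputs.

    weight:    (out, in) base weight matrix
    lora_down: (rank, in) low-rank projection
    lora_up:   (out, rank) low-rank projection
    scale:     strength of the LoRA; 0.0 leaves the base weight unchanged
    """
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 update (assumed values, for illustration only).
base = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 2.0]]      # shape (rank=1, in=2)
up = [[0.5], [0.25]]     # shape (out=2, rank=1)

merged = merge_lora(base, down, up, scale=0.5)
print(merged)
```

Because the update is purely additive, the base model is left untouched and the strength of the LoRA can be adjusted, or set to zero to disable it, through `scale`.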