Swap quality always depends heavily on the quality of your input images. Larger, clean images with minimal noise and compression artifacts generally produce the best results. Keep in mind that the model always follows the quality of the body image, since it becomes the final rendered frame—so even if the face source is high-quality, a low-resolution or noisy body image will limit the outcome.
Most of the images I generate are created without the LightX2V lightning LoRA, since I noticed that enabling it tends to make the skin appear more plastic-like and reddish, and finding the right balance requires extra tuning that I didn’t focus on. If anyone has discovered good configurations, feel free to share them in the comments of this template.
In short, using LightX2V makes the model less versatile because it operates with a fixed CFG value of 1.0. So before assuming it “didn’t work,” I recommend first testing the workflow I published without LightX2V to compare the results.
If you’re getting results with too much contrast, overly strong colors, or plastic-like textures while using LightX2V’s lightning models, try reducing the number of inference steps. For example, if you’re using the Qwen Image Edit 2509 Lightning (8 steps) model, try running it with 4 steps instead. The excessive contrast often comes from running too many steps while CFG remains fixed at 1.0.
If you encounter similar issues without the lightning LoRA, try lowering the steps as well—e.g., from 20 down to around 16 or fewer—and reduce CFG to values like 1.2 or 1.5, which can help produce smoother, more natural results.
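The tuning advice above can be condensed into a tiny helper. This is just an illustrative sketch (the function name is hypothetical, not part of the workflow); the step and CFG values come straight from the recommendations above:

```python
def reduced_settings(use_lightning: bool) -> dict:
    """Suggest lowered sampler settings when results look
    over-contrasted or plastic-like (illustrative only)."""
    if use_lightning:
        # Lightning LoRAs run at a fixed CFG of 1.0; halve the steps
        # (e.g. the Qwen Image Edit 2509 Lightning 8-step model -> 4).
        return {"steps": 4, "cfg": 1.0}
    # Without the lightning LoRA: drop from ~20 to ~16 steps
    # and lower CFG toward 1.2-1.5 for smoother skin.
    return {"steps": 16, "cfg": 1.2}

print(reduced_settings(use_lightning=True))   # {'steps': 4, 'cfg': 1.0}
print(reduced_settings(use_lightning=False))  # {'steps': 16, 'cfg': 1.2}
```

Treat these as starting points for your own comparison runs, not fixed rules.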
Another important detail: in images where the body is positioned farther from the camera, the face region becomes smaller, which can reduce swap accuracy and overall quality. This happens because the model has less pixel information to work with in that small facial area. To handle these cases, you can use my older workflow, which automatically crops the face region from the body image and performs an inpainting-like process to improve results in distant or small-face compositions.
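The crop-and-inpaint idea behind that older workflow boils down to simple box arithmetic: expand the detected face box into a padded square, clamped to the image bounds, so the model gets a larger pixel region to work with before the result is pasted back. The sketch below is a hypothetical standalone helper, not the actual workflow nodes:

```python
def padded_face_crop(face_box, img_w, img_h, pad=0.5):
    """Expand an (x, y, w, h) face box into a padded square crop
    region, clamped to the image, for an inpainting-style face pass."""
    x, y, w, h = face_box
    side = int(max(w, h) * (1 + pad))           # square side with padding
    cx, cy = x + w // 2, y + h // 2             # face centre
    x0 = max(0, min(cx - side // 2, img_w - side))
    y0 = max(0, min(cy - side // 2, img_h - side))
    side = min(side, img_w - x0, img_h - y0)    # guard for tiny images
    return x0, y0, side, side

# Small, distant face in a 1920x1080 frame:
print(padded_face_crop((900, 300, 80, 100), 1920, 1080))  # (865, 275, 150, 150)
```

After the swap runs on this enlarged crop, the result is scaled back down and composited into the original body image, which is why small or distant faces benefit from this extra pass.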
Finally, if you notice loss of similarity between faces or poses—especially when the reference and target images differ significantly in aesthetics or angles—try increasing the strength of your head swap LoRA slightly (for instance, to 1.2 or 1.3) to restore consistency.
BFS — “Focus Head”
The “Focus Head” version was trained as a continuation of Focus Face, extending the dataset and shifting focus toward full head swaps.
It was trained on an NVIDIA RTX 6000 PRO, rank 32, for 12,000 steps, using 628 image pairs (face, body, target, and sometimes pose maps generated via MediaPipe).
🔹 Training Phases
Standard Face Swap – same as Focus Face, focusing on facial identity.
Pose-Conditioned Face Swap – added pose maps to align gaze and head angle.
Full Head Swap – replaced the entire head (including hair) for stronger identity control.
After ~2000 steps, the focus moved toward head swap refinement. At ~4000 steps, the dataset was narrowed to perfect skin-tone matches, and by the end of training, the dataset evolved from 628 → 138 → 76 high-quality samples for final fine-tuning.
⚠️ Note: While Focus Head can still perform standard face swaps, it’s more naturally inclined toward full head swaps due to its data balance. This was intentional in part, but also a side-effect of dataset distribution and mixed conditioning.
⚠️ Important Notice
Do not share results involving real people, celebrities, or public figures. Civitai’s moderation may disable posts that violate likeness or consent rules. This model is intended only for artistic and fictional characters, educational use, and AI experimentation.
I take no responsibility for any misuse of this model. Please use it responsibly and respect all likeness rights.