You can also use this method with black-and-white manga as your training dataset.
As long as the tags are properly applied, it works just fine.
You can manually erase unwanted elements like text, speech bubbles, panel borders, or any distracting lines—this helps reduce visual noise in the dataset.
That said, if you have any colored images available, be sure to include them too.
That way, the model can learn the correct color mappings from the color images, and apply them to the grayscale ones.
Otherwise, the AI will guess the color information from its own understanding of the style, and it’ll just end up picking something more or less at random.
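If you’re not sure how much color reference your dataset actually contains, a quick check helps before you start training. Here’s a minimal sketch using Pillow and NumPy; the folder name and the channel-difference tolerance are just assumptions, so adjust them to your own setup.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def is_greyscale(path, tolerance=8):
    """Treat an image as greyscale if its RGB channels barely differ."""
    arr = np.asarray(Image.open(path).convert("RGB"), dtype=np.int16)
    return (np.abs(arr[..., 0] - arr[..., 1]).max() <= tolerance
            and np.abs(arr[..., 1] - arr[..., 2]).max() <= tolerance)

dataset_dir = Path("train/10_mycharacter")  # hypothetical kohya-style dataset folder
images = [p for p in dataset_dir.iterdir()
          if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}]
grey = sum(is_greyscale(p) for p in images)
print(f"{grey}/{len(images)} images are greyscale; the rest supply color reference.")
```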

When training on black-and-white manga, tagging is absolutely essential.
At a minimum, make sure you include:
“greyscale,” “monochrome,” “comic,” “halftone,” etc.—these tags tell the AI what kind of image it’s looking at.
When using an auto-tagging tool, try lowering the confidence threshold to around 0.1 so it picks up lower-confidence visual elements that might still be important.
Also, manga-specific drawing techniques and elements, like “halftone” or “speech bubble,” should be explicitly tagged whenever possible.
You can use the Danbooru tag index to look up the correct vocabulary for these features.
Even for things like hair and eye color in B&W manga, it’s totally okay to use normal tags like “brown hair” or “brown eyes.”
As long as the image is also tagged with “greyscale” or “comic,” more capable community-trained checkpoints can account for that and avoid misinterpreting the color information.
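To make sure every caption actually carries those baseline tags after auto-tagging, a small script can patch them in. This is a rough sketch, assuming the usual kohya-style layout where each image has a same-named .txt caption containing comma-separated tags; the folder path and the tag list are placeholders.

```python
from pathlib import Path

REQUIRED_TAGS = ["greyscale", "monochrome", "comic"]  # add "halftone" etc. as needed
dataset_dir = Path("train/10_mycharacter")            # hypothetical dataset folder

for caption_file in dataset_dir.glob("*.txt"):
    tags = [t.strip() for t in caption_file.read_text(encoding="utf-8").split(",") if t.strip()]
    missing = [t for t in REQUIRED_TAGS if t not in tags]
    if missing:
        # Prepend whatever the auto-tagger missed, keeping the original tag order otherwise.
        caption_file.write_text(", ".join(missing + tags), encoding="utf-8")
```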

And if you find that your results still aren’t coming out right, you can always tweak things using positive/negative prompt tokens, regenerate some images, and retrain using those images.
Next, let’s talk about how many images you actually need in the training set when using this method.
In single-character LoRAs, it's ok to make the dataset as diverse and rich as possible, since you can always fix minor issues during image generation.
But for 2boys LoRAs, it’s a different story.
You really need to carefully pick only the best-quality images (skip the blurry ones) and put both your original source material and any AI-generated images you plan to reuse in the same folder.
If you want the LoRA to lean more toward the original art style, you can increase the ratio of raw, unmodified source images.
And don’t worry about file names: even if you duplicate the best images and their corresponding .txt tag files, your OS (Windows 11, for example) will auto-rename the duplicates, and LoRA training tools handle UTF-8 filenames just fine. You don’t need to rename everything into English.
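If you’d rather duplicate your favorite originals with a script than by hand, copying each image together with its caption keeps the pair matched. This is only a sketch of that idea; the folder, file names, and copy count are placeholders.

```python
import shutil
from pathlib import Path

dataset_dir = Path("train/10_mycharacter")   # hypothetical dataset folder
best = ["panel_012.png", "panel_034.png"]    # your hand-picked, highest-quality originals
copies = 2                                   # extra copies to add per image

for name in best:
    img = dataset_dir / name
    txt = img.with_suffix(".txt")
    for i in range(1, copies + 1):
        # Copy the image and its caption under a new stem so the pair stays together.
        shutil.copy2(img, dataset_dir / f"{img.stem}_dup{i}{img.suffix}")
        shutil.copy2(txt, dataset_dir / f"{txt.stem}_dup{i}.txt")
```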
Now, let’s talk about image types.
Close-up images almost always look better than full-body ones.
That’s just how AI generation works: detail quality falls off fast once the subject only takes up a small part of the frame.
Even tools like adetailer may struggle to recover facial features at full-body scale.
However, full-body images are crucial for teaching the AI correct proportions.
So here’s a trick I discovered: when generating full-body samples, you can deliberately suppress certain features to prevent the AI from learning bad versions of them.
For example:
Use prompt tokens like “closed eyes,” “faceless,” or even “bald” to make the AI leave out those features entirely.
That way, you get a clean full-body reference without noisy detail on the face or hair.
Don’t worry: hair and eye features will still be learned from the other images in your set.
And because you explicitly tagged those “blank” images as “faceless” etc., they won’t bleed into the generation unless you use those tags later.
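For a concrete picture of how such a suppressed full-body sample might be generated, here’s a rough sketch against the AUTOMATIC1111 web UI API (started with --api); the prompt tokens, resolution, and step count are illustrative, not a recipe. Whatever you keep from a run like this still needs “faceless” and “bald” in its caption file, as described above.

```python
import base64

import requests  # assumes the AUTOMATIC1111 web UI is running locally with --api

payload = {
    # Full-body reference with face and hair detail deliberately suppressed.
    "prompt": "1boy, full body, standing, faceless, bald, greyscale, monochrome",
    "negative_prompt": "close-up, portrait, blurry",
    "width": 512,
    "height": 768,
    "steps": 28,
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()

# The API returns base64-encoded PNGs; save them for review and captioning.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"fullbody_ref_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```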

But if your setup can already generate proper full-body detail using hires.fix or adetailer, don’t bother with the faceless trick.
In my testing, 99% of the time the AI respects the faceless tag, but there’s always that one weird generation where you get something uncanny. It can be spooky.
The same logic applies to hands and feet: you can use poses that hide them, or tags like “gloves,” “socks,” or “out of frame,” to suppress poorly drawn hands, or even generate dedicated hand references separately.
The key point here is: this trick isn’t about increasing generation quality—it’s about making sure the AI doesn’t learn bad patterns from messy source data.
If you’ve ever used character LoRAs that produce blurry or distorted hands, this is one of the reasons.
Yes, some of it comes from checkpoint randomness—but more often it’s because anime source material barely bothers drawing hands, and the LoRA ended up overfitting on lazy animation frames.