(4) My Second Note on Experiences with "2boys" (4)


Updated:

You can also use this method with black-and-white manga as your training dataset.

As long as the tags are properly applied, it works just fine.

 

You can manually erase unwanted elements like text, speech bubbles, panel borders, or any distracting lines—this helps reduce visual noise in the dataset.

 

That said, if you have any color images available, be sure to include them too.

That way, the model can learn the correct color mappings from the color images, and apply them to the grayscale ones.

Otherwise, the AI will guess the color information from its own priors, and it'll just end up picking something arbitrary that matches the style.

 

 

When training on black-and-white manga, tagging is absolutely essential.

At a minimum, make sure you include:

“greyscale,” “monochrome,” “comic,” “halftone,” etc.—these tags tell the AI what kind of image it’s looking at.

When using an auto-tagging tool, try lowering the confidence threshold to 0.1—this helps catch lower-confidence visual elements that might still be important.
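
As a rough illustration (just a sketch of my own, with an example path and tag list, not from any specific tool), you can batch-check your caption files after auto-tagging to make sure every image carries those baseline tags:

```python
# Sketch only: prepend any missing baseline B&W-manga tags to every
# caption (.txt) file that the auto-tagger produced.
from pathlib import Path

REQUIRED_TAGS = ["greyscale", "monochrome", "comic", "halftone"]

def ensure_baseline_tags(dataset_dir: str) -> None:
    for txt in Path(dataset_dir).rglob("*.txt"):
        tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",") if t.strip()]
        missing = [t for t in REQUIRED_TAGS if t not in tags]
        if missing:
            txt.write_text(", ".join(missing + tags), encoding="utf-8")

ensure_baseline_tags("dataset")  # example path
```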

 

Also, manga-specific drawing techniques—like "halftone" and "speech bubble"—should be explicitly tagged if possible.

You can use the Danbooru tag index to look up the correct vocabulary for these features.

 

Even for things like hair and eye color in B&W manga, it’s totally okay to use normal tags like “brown hair” or “brown eyes.”

As long as the image is also tagged with “greyscale” or “comic,” then more advanced community-trained checkpoints will be able to account for that and avoid misinterpreting the color info.

 

And if you find that your results still aren’t coming out right, you can always tweak things using positive/negative prompt tokens, regenerate some images, and retrain using those images.

Next, let’s talk about how many images you actually need in the training set when using this method.

 

For single-character LoRAs, it's fine to make the dataset as diverse and rich as possible, since you can always fix minor issues during image generation.

 

But for 2boys LoRAs, it’s a different story.

You really need to carefully select only the best-quality images—skip the blurry ones—and put both your original source material and any AI-generated images you plan to reuse in the same folder.

If you want the LoRA to lean more toward the original art style, you can increase the ratio of raw, unmodified source images.

 

And don’t worry about file names—even if you duplicate the best images and their corresponding .txt tag files, your OS (like Windows 11) will auto-rename the duplicates, and LoRA training tools handle UTF-8 filenames just fine. You don’t need to rename everything into English.

 

Now, let’s talk about image types.

 

Close-up images almost always look better than full-body ones.

That’s just how AI generation works—detail quality falls off fast once the subject shrinks in the frame.

Even tools like adetailer may struggle to recover facial features at full-body scale.

 

However, full-body images are crucial for teaching the AI correct proportions.

So here’s a trick I discovered: when generating full-body samples, you can deliberately suppress certain features to prevent the AI from learning bad versions of them.

For example:

 Use prompt tokens like “closed eyes,” “faceless,” or even “bald” to make the AI leave out those features entirely.

 That way, you get a clean full-body reference without noisy detail on the face or hair.

 Don’t worry—hair and eye features will still be learned from other images in your set.

 

And because you explicitly tagged those “blank” images as “faceless” etc., they won’t bleed into the generation unless you use those tags later.
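
To make the trick concrete, here's a rough diffusers sketch of generating one of these "blank" full-body references. The checkpoint id and prompts are placeholders, and your own setup (WebUI, ComfyUI, etc.) will look different:

```python
# Sketch only: generate a full-body reference with face/hair detail
# deliberately suppressed, so the trainer can't learn a mushy face.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "your/anime-checkpoint",  # placeholder: use your own base model
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="1boy, full body, standing, closed eyes, faceless, bald, simple background",
    negative_prompt="lowres, blurry, bad anatomy",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]

# Tag the saved image with "faceless" etc. in its .txt file so these
# blanks don't bleed into normal generations later.
image.save("fullbody_blank_01.png")
```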

 

But, if your setup can generate proper full-body detail using hires.fix or adetailer, don’t bother with the faceless trick.

In my testing, 99% of the time the AI respects the faceless tag, but there’s always that one weird generation where you get something uncanny. It can be spooky.

 

The same logic applies to hands and feet—you can use poses, gloves, socks, or "out of frame" to suppress poorly drawn hands, or even generate dedicated hand references separately.

 

The key point here is: this trick isn’t about increasing generation quality—it’s about making sure the AI doesn’t learn bad patterns from messy source data.

 

If you’ve ever used character LoRAs that produce blurry or distorted hands, this is one of the reasons.

Yes, some of it comes from checkpoint randomness—but more often it’s because anime source material barely bothers drawing hands, and the LoRA ended up overfitting on lazy animation frames.

 

So that’s the general idea behind building your dataset—

but how many images do you actually need, and how should you set your training parameters?

Here’s what I found after testing dozens of trial LoRAs based on this method:

 

At minimum, you need:

 20 solo images per character (A and B)

 15 dual-character images (AB)

 And in general:

 If you keep the step count the same, having more images with lower repeat values gives better results than fewer images with high repeat values.

 Sure, you can take shortcuts by duplicating images to pad the count, but it’s still best to exceed those minimums above:

 At least 20 solo images for A, 20 for B, and 15 for AB.
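
If you want to sanity-check this automatically, a few lines of Python will do. The folder names below are just examples of one way to lay out the dataset:

```python
# Sketch only: count images per folder against the minimums above.
from pathlib import Path

MINIMUMS = {"A_solo": 20, "B_solo": 20, "AB_duo": 15}  # example folder names
EXTS = {".png", ".jpg", ".jpeg", ".webp"}

for folder, minimum in MINIMUMS.items():
    d = Path("dataset") / folder
    count = sum(1 for p in d.iterdir() if p.suffix.lower() in EXTS) if d.is_dir() else 0
    status = "OK" if count >= minimum else f"LOW (need {minimum - count} more)"
    print(f"{folder}: {count} images -> {status}")
```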

 

From my testing, for any 2boys LoRA, more images is never a problem,

but dropping below that threshold greatly increases the risk of feature blending or of a character failing to train at all.

 

How to Calculate Step Count

Let’s get into the math.

 Assuming batch size = 1,

your effective training steps per folder are:

 Number of images × repeat × epoch

 Let’s say:

 A and B both have 60 images,

 repeat = 2,

 epoch = 10.

 Then:

 A: 60 × 2 × 10 = 1200 steps

 B: same, 1200 steps

 Now, for AB (dual-character) images:

Try to keep total training steps at 20%–40% of the solo step count.

 

For example:

 If A and B are both trained with 1200 steps,

 Then AB should use: 1200 × 0.4 = 480 steps

 Assuming epoch = 10:

 That’s 480 ÷ 10 = 48 steps per epoch

 So your AB folder should have:

 48 images × 1 repeat

 or 24 images × 2 repeat

 or 15 images × 3 repeat (45 per epoch, close enough)

(any of these combos works)
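
Here's the same arithmetic as a tiny Python sketch, in case you want to plug in your own numbers (batch size = 1):

```python
# steps per folder = images * repeat * epochs (batch size = 1)
def folder_steps(images: int, repeat: int, epochs: int) -> int:
    return images * repeat * epochs

solo = folder_steps(60, 2, 10)   # A and B: 1200 steps each
ab_target = int(solo * 0.4)      # AB target: 480 steps
per_epoch = ab_target // 10      # 48 image passes per epoch

# Any (images, repeat) pair whose product lands near 48 works:
for images, repeat in [(48, 1), (24, 2), (15, 3)]:
    print(images, repeat, folder_steps(images, repeat, 10))
# -> 480, 480, 450 steps: all inside the 20%-40% band
```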

 

Why this matters:

As I explained earlier, the model doesn’t actually “understand” that A and B are separate characters.

Instead, “A + B” gets merged into a conjoined concept like a two-headed chimera.

 

Training on “AB” is essentially learning a completely different concept than A or B solo.

But because they all share overlapping tokens, they affect each other.

 

So when you prompt “A + B” during generation, the model is actually stacking:

 1 of “A,” 1 of “B,”

 and 1 hidden “A” and 1 hidden “B” lurking underneath it all.

 

The more your training steps for AB approach those of A and B, the more this overlapping weight stacking leads to feature confusion.

 

Now, here’s another issue:

 

Each character learns at a different pace.

Let’s say A gets learned completely by step 600, but B still lacks features at that point.

If you continue to step 800, A becomes overfit, and B is only just reaching the ideal point.

At 1000 steps, A is a mess and B is only just starting to overfit.

 

This mismatch increases the chance that their traits will blend together.

 

One workaround is to:

 

Make sure AB has more than 15 images, and

 Give B’s solo folder a few more images than A’s, while keeping the repeat value the same between them.

 

What About Larger Batch Sizes?

If your goal is to stabilize character clothing or reduce merging,

you can try using a higher batch size for intentional overfitting.

 

Still keep the solo step count between 800–1200, and recalculate repeats.

 

Here’s an example:

 A and B each have 60 images

 Target: 1000 steps

 batch size = 4

 epoch = 10

 Then:

 1000 ÷ 10 × 4 = 400 image passes per epoch (each optimizer step consumes 4 images)

400 ÷ 60 ≈ 6.66 → round down to 6 repeats

 So:

 60 × 6 ÷ 4 × 10 = 900 steps

 Now for AB:

 Say you have 30 images

 900 × 0.4 = 360

 360 ÷ 10 × 4 = 144 image passes per epoch

 144 ÷ 30 ≈ 4.8 → round to 5 repeats

 So:

 30 × 5 ÷ 4 × 10 = 375 steps

375 ÷ 900 ≈ 0.417 → right around the 40% upper bound of the ideal range.
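
And here's the batch-size version in Python, with the rounding choices matching the walkthrough above:

```python
# Sketch only: optimizer steps = images * repeat / batch_size * epochs.
def actual_steps(images: int, repeat: int, epochs: int, batch_size: int) -> float:
    return images * repeat / batch_size * epochs

epochs, batch = 10, 4

# Solo folders: 60 images each, aiming for ~1000 steps.
passes = 1000 / epochs * batch                       # 400 image passes per epoch
repeat_solo = int(passes // 60)                      # 6.66 -> round down to 6
solo = actual_steps(60, repeat_solo, epochs, batch)  # 900 steps

# AB folder: 30 images, aiming for 40% of the solo steps.
ab_passes = solo * 0.4 / epochs * batch              # 144 image passes per epoch
repeat_ab = round(ab_passes / 30)                    # 4.8 -> 5
ab = actual_steps(30, repeat_ab, epochs, batch)      # 375 steps

print(solo, ab, round(ab / solo, 3))                 # 900.0 375.0 0.417
```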
