1 https://tensor.art/articles/893578426995184161
2 https://tensor.art/articles/893801244529812650
4 https://tensor.art/articles/893820315258335399
5 https://tensor.art/articles/893848146646446019
6 https://tensor.art/articles/894184208509998968
7 https://tensor.art/articles/894196515738788839
8 https://tensor.art/articles/894202977517102361
Let’s rewind to the beginning—back when I started working on my Lagoon Engine LoRA.
Step one, of course, was building the training dataset.
My first instinct was: Well, it’s a 2boys LoRA, so I need images with both boys together.
Even though the available materials were limited, the good thing was that the two brothers almost always appear together in the source material—solo shots of either of them are actually pretty rare.
So I collected some dual-character images, then cropped them manually to extract individual character shots. I put each set of solo images into its own folder and made sure they had the same number of repeats during training.
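If it helps to see that concretely, here is roughly what such a layout could look like. This is only a sketch, assuming a kohya_ss-style trainer where the number prefix on each folder name sets the repeat count; the file names are placeholders.

```
dataset/
├── 10_A/        # solo crops of character A (each image with a matching .txt caption)
│   ├── A_001.png
│   ├── A_001.txt
│   └── ...
└── 10_B/        # solo crops of character B, same repeat count (10)
    ├── B_001.png
    ├── B_001.txt
    └── ...
```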
From the very first alpha version, all the way through various micro-adjustments and incremental releases up to beta, I stuck to that same core idea.
And when the beta version finally came out and the results looked pretty good, I was over the moon.
Naturally, I wanted to replicate that setup for another 2boys LoRA.
But… total failure.
I made several more 2boys LoRAs with different characters, and every single one of them had serious problems.
Either the features got horribly blended, or the LoRA straight-up failed to learn the characters at all.
It was super frustrating.
I couldn’t figure it out.
Was it really just luck that the first one worked? Did I get lucky with the way the alpha and beta versions happened to progress, avoiding the worst-case scenarios?
I didn’t want to believe that.
So I went back and did a series of controlled variable tests, trying to isolate what might be causing the difference. I made a whole bunch of test LoRAs just to look for clues.
That process was full of messy trial-and-error, so I won’t write it all out here.
Let’s skip to the conclusion:
Making a truly stable and controllable 2boys LoRA is almost impossible.
Most of what you’re doing is just trying to stack the odds in your favor—doing whatever you can to make sure the correct information is actually learned and embedded into the LoRA, so that it at least has the chance to generate something accurate.
Let me try to explain, at least in a very basic and intuitive way, how this works—both from what I’ve felt in practice and from a surface-level understanding of the actual mechanics.
Training a LoRA is kind of like doing a conceptual replacement.
Say you have this tag combo in your dataset:
“1boy, A, black hair, red eyes.”
Let’s say “A” is your character’s trigger token.
Inside the LoRA, those tags don’t really exist independently. The model ends up treating them like a single, bundled concept:
“1boy = A = black hair = red eyes.”
That means when you use these tags during generation, the LoRA will override whatever the base checkpoint originally had for those tags—and generate “A.”
Even if you remove some of the tags (like “A” or “black hair”) and only keep “1boy,” you’ll still get something that resembles A, because the LoRA associates all of those traits together.
Now let’s add a second character and look at what happens with this:
“2boys, A, black hair, red eyes, B, blue hair, grey eyes.”
The AI doesn’t actually understand that these are two separate boys.
Instead, it just sees a big lump of tags that it treats as a single concept again—this time, the whole block becomes something like:
“2boys + A + black hair + red eyes + B + blue hair + grey eyes.”
So if your dataset only contains pictures with AB, it won’t be able to generate A or B separately—because A and B are always bundled with each other’s features.
If you try generating “1boy, A,” it won’t really give you A—it’ll give you a blend of A and B, since A’s identity has been polluted with B’s features in the training data.
On the flip side, if your dataset only contains solo images of A and B—no dual-character pictures at all—it’s basically the same as training two separate LoRAs and loading them together. The features will mix horribly.
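To make the "bundling" idea concrete, here is what the caption files for the three cases might look like, reusing the exact tag combos from above (file names are just placeholders):

```
a_solo_001.txt:   1boy, A, black hair, red eyes
b_solo_001.txt:   1boy, B, blue hair, grey eyes
ab_duo_001.txt:   2boys, A, black hair, red eyes, B, blue hair, grey eyes
```

With only the third kind of file in the set, A and B are always seen glued together; with only the first two, the model never learns what "2boys" means for this particular pair.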
Even as I’m writing this explanation in my native language, I’m tripping over the logic a little—so translating it into English might not fully capture the idea I’m trying to get across.
Apologies in advance if anything here seems off or confusing.
If you find any parts that sound wrong or unclear, I’d really appreciate any feedback or corrections.
And that brings us to an important question:
If we want a 2boys LoRA to be able to generate both 1boy and 2boys images, does that mean we need both solo and dual images in the training set?
Yes.
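As a rough illustration (again assuming a kohya_ss-style trainer, with placeholder folder names and repeat counts), the final training set for a pairing like this could look something like:

```
dataset/
├── 10_A_solo/   # solo images (or crops) of A
├── 10_B_solo/   # solo images (or crops) of B
└── 10_AB/       # images with both boys together
```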
Like I mentioned earlier, for popular characters, you don’t even need a LoRA—the checkpoint itself can usually generate them just fine.
But when it comes to more obscure ships, or really niche character pairings, there just aren’t many usable images out there.
You might not even have enough high-quality dual shots of them together—let alone clean solo images.
And for older anime series, image quality is often poor, which directly affects your final LoRA performance.
So I had to find a workaround for this data scarcity problem.
I came up with an idea, tested it—and surprisingly, it worked.
If you’ve seen any of the LoRAs I’ve uploaded, you’ll notice that they all come from source material with very limited visual assets. And yet, they’re all capable of generating multi-character results.
So let me explain how this approach works.
Fair warning: this part might get a little wordy and logically tangled.
I’m not that great at explaining complex processes in a concise way.
So translating this into English might only make it more confusing, not less.
Please bear with me!
First, you can follow the method from my first article to train two separate single-character LoRAs.
If you’ve got plenty of high-quality materials, one round of training is usually enough.
But if the source is old, low-res, or limited in quantity, you can use the LoRA itself to generate better-quality solo images, then retrain on those.
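Here is a minimal sketch of that regeneration step using the diffusers library, assuming an SDXL-style anime checkpoint and a first-round LoRA file; the model path, LoRA file name, and trigger token "A" are placeholders, and prompts/settings are just example values.

```python
import os
import torch
from diffusers import StableDiffusionXLPipeline

# Load the base checkpoint and the first-round single-character LoRA.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "your/anime-checkpoint", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("loras", weight_name="A_v1.safetensors")

prompt = "1boy, A, black hair, red eyes, solo, upper body, simple background"
negative = "lowres, blurry, bad anatomy"

os.makedirs("regen_A", exist_ok=True)
for i in range(20):
    # Generate cleaner, higher-resolution solo images to use as second-round data.
    image = pipe(prompt, negative_prompt=negative,
                 num_inference_steps=28, guidance_scale=6.0).images[0]
    image.save(f"regen_A/{i:03d}.png")
```

You would then hand-pick the best outputs, caption them, and run a second round of training on that cleaner set.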

Next, take those generated solo images and combine them into dual-character compositions.
Yes, I’m talking about literally splicing two single-character images together.
This kind of "composite 2boys" image even has a proper tag on danbooru—so don't worry about the AI getting confused.
In fact, based on my tests, these handmade 2boys images are actually easier for the model to distinguish than anime screenshots or official illustrations.
Let me break that down a bit:
In anime screenshots, characters are often drawn at different depths—one in front, one behind, etc. That makes it harder for the AI to learn accurate relative body proportions.
If the shot is medium or long distance, facial detail is usually poor.
If it’s a close-up or fanart-style composition, the characters tend to be more physically close, which makes it easier for the AI to confuse features between them during training.
By contrast, the composite images you make using generated solo pics tend to have clear spacing and symmetrical framing—making it easier for the AI to learn who’s who.
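A very simple way to build these composites is to paste two generated solo images side by side on a plain canvas, with a clear gap between the characters. The sketch below uses Pillow and placeholder file names; it is just one way to do the splicing, not the only one.

```python
import os
from PIL import Image

# Load one generated solo image of each character.
a = Image.open("regen_A/000.png").convert("RGB")
b = Image.open("regen_B/000.png").convert("RGB")

# Resize both to the same height so the framing is symmetrical.
target_h = 1024
a = a.resize((int(a.width * target_h / a.height), target_h))
b = b.resize((int(b.width * target_h / b.height), target_h))

# Leave an empty strip between the two so their features stay clearly separated.
gap = 64
canvas = Image.new("RGB", (a.width + b.width + gap, target_h), "white")
canvas.paste(a, (0, 0))
canvas.paste(b, (a.width + gap, 0))

os.makedirs("composites", exist_ok=True)
canvas.save("composites/AB_000.png")
```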
Of course, this whole process is more work.
But for obscure character pairs with almost no usable material, this was the most effective method I’ve found for improving training quality.
If the characters you want to train already appear together all the time, and you can easily collect 100+ dual-character images, then you don’t need to bother with this method at all.
Just add a few solo images and train like normal.
The more material you have, the better the AI learns to distinguish them across different contexts—and the fewer errors you’ll get.
This whole process I’m describing is really just for when you have two characters with little or no decent solo or dual images available.
It’s a workaround, not the ideal path.


As shown above, you can even manually adjust the scale between the characters in your composite image to increase the chance of getting accurate proportions.
However, my testing suggests this only really helps when the characters have a clearly noticeable size difference, or when the shorter one's head still reaches roughly the taller one's forehead. When you scale a character all the way down so he only reaches the other's chest or neck height, it often doesn't work very well.
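For the scaling itself, the earlier compositing sketch only needs a small change: resize one character by a height ratio and bottom-align both on the canvas. The 0.9 ratio below is just an example value, and the comment about feet lining up assumes full-body shots.

```python
import os
from PIL import Image

a = Image.open("regen_A/000.png").convert("RGB")
b = Image.open("regen_B/000.png").convert("RGB")

base_h = 1024
ratio = 0.9  # example: B ends up about 90% of A's height
a = a.resize((int(a.width * base_h / a.height), base_h))
b_h = int(base_h * ratio)
b = b.resize((int(b.width * b_h / b.height), b_h))

gap = 64
canvas = Image.new("RGB", (a.width + b.width + gap, base_h), "white")
canvas.paste(a, (0, 0))
canvas.paste(b, (a.width + gap, base_h - b_h))  # bottom-aligned so the feet line up

os.makedirs("composites", exist_ok=True)
canvas.save("composites/AB_scaled_000.png")
```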
Also, even though this method can boost generation quality, it also increases the risk of overfitting—especially with facial expressions and poses, which may come out looking stiff.
To fix this, you can create multiple scenes with different angles and actions, and include both the original and the composite versions in your training set.
That way, the model learns more variety and avoids becoming too rigid.