(5) My Second Note on Experiences, About "2boys" (5)
Now let's go over the training parameters. Once your images are ready, I recommend cropping and resizing them manually. Then enable two training options, "Enable arb bucket" and "Do not upscale in arb bucket", and set the resolution to 2048×2048. This way, none of the images in your dataset will be automatically scaled or cropped. Otherwise, any image smaller than the set resolution gets upscaled automatically, and you won't know what effect that upscaling had; any image larger than the set resolution gets cropped, and you won't know which part was cut off.

Please note that resolution directly affects VRAM usage. Here is a rough explanation of what drives VRAM consumption. As long as you don't enable "lowram" (which loads the U-Net, text encoder, and VAE directly into VRAM) and you do enable "cache latents to disk" and similar options, VRAM usage is mostly determined by the maximum resolution in your training set and your network_dim. It has nothing to do with the number of images, and batch size has only a minor impact. Running out of VRAM won't stop training entirely; it will just make it much slower. Of course, if you're using an online training platform like Tensor, you don't need to worry about this at all.

At dim = 32, a resolution of 1024×1024 uses roughly 8GB of VRAM. I assume that if you're training locally, you've probably already moved past 8GB GPUs. At dim = 64, anything above 1024×1024, up to 1536×1536, will pretty much fill a 16GB GPU. At dim = 128, VRAM usage increases drastically: any resolution will exceed 16GB, even 1024×1024 may go over 20GB, and 1536×1536 comes close to 40GB, far beyond what consumer gaming GPUs can handle.

To put it simply, the dim value determines the level of detail a LoRA learns: the higher it is, the more detail it can capture, and lower dims learn less. Higher resolution and more content require higher dim values. For single-character 2D anime LoRAs, 32 is usually enough. For dual-character LoRAs, I recommend 32 or 64. For more than two characters, things get tricky; you can try 64 or 128, depending on your actual hardware.

As for resolution, things get a little more complicated. In theory, the higher the resolution, the better the result. But in practice, I found that training at 2048×2048 doesn't actually improve image quality when generating at 768×1152, 1024×1536, and so on. It also doesn't really change the body-shape differences that show up at various generation resolutions. That said, since most checkpoints don't handle high resolutions very well, generating directly at 2048×2048, 1536×2048, etc. often leads to distortions. However, those high-res images do have significantly better detail than low-res ones, and not all of them are distorted. At certain resolutions, like 1200×1600 or 1280×1920, the generations come out correct and stunningly good, far better than low-res plus hires fix. But training at 2048×2048 comes at a huge cost.

So here is the trade-off I recommend: use dim = 64 and set the resolution to 1280×1280 or 1344×1344. This gives you a balanced result on a 16GB GPU. I recommend 1344×1344, with the ARB bucket upper limit set to 1536. With auto-upscaling enabled, images with a 2:3 aspect ratio will be upscaled to 1024×1536, and images with a 3:4 ratio will become 1152×1536. This covers the most common aspect ratios, making it very convenient for both image generation and screenshot usage.
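For reference, here is a minimal sketch of how these options might look as command-line flags, assuming kohya sd-scripts naming. As I'll mention below, I actually trained through a third-party GUI, so treat the flag names as an assumption about the equivalent command line; the values are the trade-off recommended above.

```python
# Rough sketch of the settings above as kohya sd-scripts (train_network.py)
# style flags. Flag names are assumed equivalents of the GUI options; the
# values follow this note's recommendation (dim 64, 1344x1344, bucket cap 1536).
args = [
    "--enable_bucket",               # "Enable arb bucket"
    "--resolution", "1344,1344",     # balanced choice for a 16GB GPU
    "--max_bucket_reso", "1536",     # ARB bucket upper limit
    "--network_dim", "64",           # dim = 64 for a dual-character LoRA
    "--cache_latents",
    "--cache_latents_to_disk",       # keeps VRAM tied mainly to resolution and dim
]
# For the strict "nothing gets scaled or cropped" setup described first, you
# would instead use "--resolution", "2048,2048" plus "--bucket_no_upscale".
print(" ".join(args))
```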
Just a quick note here: since I was never able to get Kohya to run properly in my setup, I ended up using a third-party GUI instead. The settings are pretty much the same overall; I've compared them with Tensor's online interface, and they're just simplified a bit, but all the important options are still there.

As for the optimizer: I've tested a lot of them, and under my specific workflow, based on my setup and training set, I found that the default settings of "Prodigy" worked best overall. But that's not a universal rule, so you should definitely adjust it based on your own experience.

I recommend always enabling gradient checkpointing. Gradient accumulation steps, on the other hand, are usually unnecessary. If your resolution and dim don't exceed your VRAM limit, there's really no need to turn it on, and even if you do exceed the limit, turning it on doesn't help much. If you do enable it, enter your intended batch size here (for example "4") and set the actual batch size to 1. Put simply, gradient accumulation is just a simulation of larger batch sizes; the actual effect is determined by the product of accumulation steps × batch size.

Here's a quick summary of my "2boys LoRA" workflow:
1、Prepare source material to train two individual LoRAs, and use them to generate high-quality solo character images.
2、Stitch the solo images together to create dual-character compositions. If the original solo source images are already high quality, you can use those directly for stitching too.
3、Sort the images into three folders: "A", "B", and "AB". Based on the number of images and your target steps (usually 800–1200), calculate the repeat value. The repeat for "A" and "B" should be the same. For "AB", aim for 20%–40% of the solo steps and adjust its repeat accordingly. (A small calculation sketch is at the end of this note.)
4、Tag the images. Pay attention to tag order, and for stitched images, make sure to include specific tags like "split screen", "collage", "two-tone background", etc. (More on this part later; there is also a caption sketch at the end of this note.)
5、Set up your training parameters. An epoch count of 10–16 is ideal; usually 10 is more than enough. Start training, and monitor the loss curve and preview outputs as it runs.

I have to throw some cold water on things first: even after completing all the steps above, you might get a LoRA that works decently, but more often than not, it won't turn out to be a satisfying "2boys LoRA". What follows might get a bit long-winded, but I think these explanations are necessary for better understanding and comparison, so I'll try my best to keep the logic clear.

As mentioned earlier, generating with a dual-character LoRA tends to run into a lot of common errors. So here I'll try to analyze, from my limited understanding, the possible causes, some countermeasures, and also the unsolvable issues. Like I said before, the AI's learning mechanism when training a LoRA doesn't treat something like "A, black hair, red eyes" as an actual boy. Instead, it treats "1boy = A = black hair = red eyes" as a single, bundled concept. When all those tags show up together, the AI will generate that concept completely. But when only some of the tags are present, because they are so highly correlated as a group, especially with "1boy", you still end up getting partial features of the full set even though not all the tags are written. This is why a LoRA trained only on A and B single-character images can't generate a correct two-boy image: "1boy" includes both A's and B's traits.
Similarly, if your dataset includes only "AB" (two-character) images, then "2boys" becomes something like "A + black hair + red eyes + B + blue hair + grey eyes". In this case, "A" or "B" is no longer a clean standalone identity, because each of them is tied to part of the "2boys" concept. Notice that I used "+" signs here, instead of the "=" we used with single-character images. That's because when you train "1boy", there are fewer tags involved, so all those traits get lumped together neatly. But for "2boys", the AI doesn't understand that there are two separate boys. It's more like it takes all those features, "A, black hair, red eyes, B, blue hair, grey eyes", and smears them onto a blank sheet of paper that becomes two humanoid outlines, randomly painting the traits onto the shapes.

Even though most checkpoints don't natively support "2boys" well, when you load a LoRA, the checkpoint's weights still dominate over the LoRA. In my testing, whether using solo or duo images, if you don't set up the trigger words A and B to help the model associate certain features (like facial structure or skin tone) with specific characters, then the base model's original understanding of "1boy" and "2boys" will interfere heavily, and the model will simply fail to learn correctly. So for a "2boys LoRA", it's essential to define character trigger tags.

Here's where things get paradoxical: in single-character images, the tag "A" equals all of A's traits. But in two-character images, "A" doesn't just equal A. Instead, "A" is associated with the entire bundle of "2boys + A + black hair + red eyes + B + blue hair + grey eyes". So when generating "2boys", using the triggers "A" + "B" actually becomes a critical hit, a double trigger: you're stacking one "definite A" plus one "fuzzy A", and one "definite B" plus one "fuzzy B". That's why using a 2boys LoRA often leads to trait confusion or extra characters appearing. It's not just because the checkpoint itself lacks native support for dual characters.
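To make the tagging side of this concrete, here is the caption sketch promised in step 4 above. It is purely hypothetical: "A" and "B" stand in for your actual trigger tags, the trait tags reuse this note's black-hair/red-eyes and blue-hair/grey-eyes example, the trigger-first ordering is just one common convention (my own ordering advice comes later in the series), and the folder names omit any repeat prefix your trainer may expect.

```python
# Hypothetical caption files for the "A", "B", and "AB" folders.
# Trigger tags and trait tags are placeholders from this note's example.
from pathlib import Path

captions = {
    "A/solo_01.txt":  "A, 1boy, solo, black hair, red eyes, looking at viewer",
    "B/solo_01.txt":  "B, 1boy, solo, blue hair, grey eyes, looking at viewer",
    # Stitched image: both triggers, "2boys", and the stitching tags from step 4.
    "AB/pair_01.txt": "A, B, 2boys, black hair, red eyes, blue hair, grey eyes, "
                      "split screen, collage, two-tone background",
}

for rel_path, text in captions.items():
    path = Path("dataset") / rel_path            # assumed dataset root
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```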
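Finally, here is the repeat-calculation sketch promised in step 3 of the workflow summary. It follows my reading of that step: a folder's step contribution is roughly images × repeats × epochs ÷ batch size, the 800–1200 target applies to each solo character, and "AB" aims for 20%–40% of that. The image counts below are made up for illustration.

```python
# Illustrative repeat arithmetic for the "A", "B", and "AB" folders.
# Assumption: steps contributed by a folder ~= images * repeats * epochs / batch_size.
epochs      = 10
batch_size  = 1
target_solo = 1000           # target steps per solo character (800-1200 range)
ab_ratio    = 0.3            # "AB" steps at 20%-40% of the solo steps

num_a, num_b, num_ab = 40, 40, 25   # example image counts

def repeats_for(target_steps: int, num_images: int) -> int:
    """Nearest whole repeat count that lands near the target step count."""
    return max(1, round(target_steps * batch_size / (num_images * epochs)))

rep_a  = repeats_for(target_solo, num_a)
rep_b  = repeats_for(target_solo, num_b)            # keep A and B repeats the same
rep_ab = repeats_for(int(target_solo * ab_ratio), num_ab)

for name, n, rep in [("A", num_a, rep_a), ("B", num_b, rep_b), ("AB", num_ab, rep_ab)]:
    steps = n * rep * epochs // batch_size
    print(f"{name}: {rep} repeats -> about {steps} steps")
```

With these numbers it prints 2 repeats (about 800 steps) for each solo folder and 1 repeat (about 250 steps) for "AB", which sits in the 20%–40% band; adjust the counts and targets to your own dataset.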