kitakaze

純愛戦士 pixiv: users/2139800

Articles

( 5 ) My Second Note on Experiences, About "2boys" ( 5 )

Now let’s go over the training parameters. Once your images are ready, it’s recommended to manually crop and resize them. Then enable the training options “Enable arb bucket” and “Do not upscale in arb bucket”, and set the resolution to 2048×2048. This way, none of the images in your dataset will be automatically scaled or cropped. Otherwise, if an image is smaller than the set resolution, it will be upscaled automatically—but you won’t know what effect that upscaling has. If it’s larger than the set resolution, you won’t know which part got cropped.

Please note that resolution directly affects VRAM usage. Here’s a rough explanation of what impacts VRAM consumption: in the training settings, as long as you don’t enable “lowram”—meaning the U-Net, text encoder, and VAE are loaded directly into VRAM—and you enable “cache latents to disk” and similar options, VRAM usage is mostly determined by the maximum resolution in your training set and your network_dim. It has nothing to do with the number of images, and batch size has only a minor impact. Running out of VRAM won’t stop training entirely—it’ll just make it much slower. Of course, if you’re using an online training platform like Tensor, you don’t need to worry about this at all.

At dim = 32, a resolution of 1024×1024 uses roughly 8GB of VRAM. I assume that if you’re training locally, you’ve probably already moved past 8GB GPUs. At dim = 64, anything above 1024×1024, up to 1536×1536, will pretty much fill up a 16GB GPU. At dim = 128, VRAM usage increases drastically: any resolution will exceed 16GB, even 1024×1024 might go over 20GB, and 1536×1536 will come close to 40GB—far beyond what consumer gaming GPUs can handle.

To put it simply: the dim value determines the level of detail a LoRA learns—the higher it is, the more detail it can capture; lower dims learn less. Higher resolution and more content require higher dim values. For single-character 2D anime LoRAs, 32 is usually enough. For dual-character LoRAs, I recommend 32 or 64. For more than two characters, things get tricky—you can try 64 or 128, depending on your actual hardware.

As for resolution, things get a little more complicated. In theory, the higher the resolution, the better the result. But in practice, I found that training at 2048×2048 doesn’t actually improve image quality when generating at 768×1152, 1024×1536, and so on, and it doesn’t really affect the body-shape differences that show up at various resolutions. That said, since most checkpoints don’t handle high resolutions very well, generating directly at 2048×2048, 1536×2048, etc. often leads to distortions. However, those high-res images do have significantly better detail than low-res ones, and not all of them are distorted, either. At certain resolutions—like 1200×1600 or 1280×1920—the generations come out correct and stunningly good, far better than low-res + hires fix. But training at 2048×2048 comes at a huge cost.

So here’s the trade-off I recommend: use dim = 64 and set the resolution to 1280×1280 or 1344×1344. This gives you a balanced result on a 16GB GPU. I recommend 1344×1344, with the ARB bucket upper limit set to 1536. With auto-upscaling enabled, images with a 2:3 aspect ratio will be upscaled to 1024×1536, and images with a 3:4 ratio will become 1152×1536. This covers the most common aspect ratios, making it convenient for both image generation and screenshot usage.
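To keep those recommendations in one place, here is a minimal sketch of the settings above written as a plain Python dict. The key names mirror common kohya-style options, but the author used a third-party GUI, so your trainer may label them differently; treat this as a checklist rather than a ready-to-run config.

```python
# A rough checklist of the settings discussed above, written as a plain Python dict.
# Key names mirror common kohya-style options; adjust them to whatever your GUI calls them.
recommended_settings = {
    "enable_bucket": True,        # "Enable arb bucket"
    "bucket_no_upscale": False,   # the final recommendation keeps auto-upscaling on,
                                  # so 2:3 images become 1024x1536 and 3:4 become 1152x1536
    "resolution": "1344,1344",    # 1280x1280 also works; a balanced choice for a 16 GB GPU
    "max_bucket_reso": 1536,      # ARB bucket upper limit
    "network_dim": 64,            # 32 is usually enough for a single-character LoRA
    "train_batch_size": 1,        # batch size has only a minor effect on VRAM
}
```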
Just a quick note here: since I was never able to get Kohya to run properly in my setup, I ended up using a third-party GUI instead. The settings are pretty much the same overall—I’ve compared them with Tensor’s online interface—just simplified a bit, but all the important options are still there.

As for the optimizer: I’ve tested a lot of them, and under my specific workflow—my setup, my training set—I’ve found that the default settings of “Prodigy” worked best overall. But that’s not a universal rule, so you should definitely adjust based on your own experience.

I recommend always enabling gradient checkpointing. Gradient accumulation steps, on the other hand, are usually unnecessary. If your resolution and dim don’t exceed your VRAM limits, there’s really no need to turn it on, and even if you do exceed the limit, turning it on doesn’t help much. If you do enable it, enter the number you would otherwise have used as your batch size (like “4”) and set the actual batch size to 1. Put simply, gradient accumulation is just a simulation of a larger batch size; the actual effect is determined by the product of accumulation steps × batch size.

Here’s a quick summary of my “2boys LoRA” workflow:

1. Prepare source material to train two individual LoRAs, and use them to generate high-quality solo character images.
2. Stitch the solo images together to create dual-character compositions. If the original solo source images are already high quality, you can use those directly for stitching too.
3. Sort images into three folders: “A”, “B”, and “AB”. Based on the number of images and your target steps (usually 800–1200), calculate the repeat value. The repeat for “A” and “B” should be the same. For “AB”, calculate its steps as 20%–40% of the solo steps and adjust its repeat accordingly.
4. Tag the images. Pay attention to tag order, and for stitched images make sure to include specific tags like "split screen", "collage", "two-tone background", etc. (More on this part later.)
5. Set up your training parameters. An epoch count of 10–16 is ideal; usually 10 is more than enough. Start training, and monitor the loss curve and preview outputs as it runs.

I have to throw some cold water on things first: even after completing all the steps above, you might get a LoRA that works decently, but more often than not it won’t turn out to be a satisfying “2boys LoRA.” What follows might get a bit long-winded, but I think these explanations are necessary for better understanding and comparison, so I’ll try my best to keep the logic clear.

As mentioned earlier, generating with a dual-character LoRA tends to run into a lot of common errors. So here I’ll try to analyze, from my limited understanding, the possible causes, some countermeasures, and also the unsolvable issues.

Like I said before, the AI’s learning mechanism when training a LoRA doesn’t treat something like “A, black hair, red eyes” as an actual boy. Instead, it treats “1boy = A = black hair = red eyes” as a single, bundled concept. When all those tags show up together, the AI will generate that concept completely. But when only some of the tags are present, because they are so highly correlated as a group—especially with “1boy”—you still end up getting partial features of the full set, even if not all the tags are written. This is why a LoRA trained only on A and B single-character images can’t generate a correct two-boy image: “1boy” includes both A’s and B’s traits.
Similarly, if your dataset includes only “AB” (two-character) images, then “2boys” becomes something like “A + black hair + red eyes + B + blue hair + grey eyes.” In this case, “A” or “B” is no longer a clean standalone identity, because each of them is tied to part of the “2boys” concept.

Notice how I used “+” signs here, instead of the “=” we used with single-character images. That’s because when you train “1boy,” there are fewer tags involved, so all those traits get lumped together neatly. But for “2boys,” the AI doesn’t understand that there are two separate boys. It’s more like it takes all those features—“A, black hair, red eyes, B, blue hair, grey eyes”—and smears them onto a blank sheet of paper that happens to contain two humanoid outlines, painting the traits onto the shapes at random.

Even though most checkpoints don’t natively support “2boys” well, when you load a LoRA, the checkpoint’s weights still dominate over the LoRA. In my testing, whether you use solo or duo images, if you don’t set up the trigger words A and B to help the model associate certain features (like facial structure or skin tone) with specific characters, then the base model’s original understanding of “1boy” and “2boys” will interfere heavily, and the model will simply fail to learn correctly. So for a “2boys LoRA,” it’s essential to define character trigger tags.

Here’s where things get paradoxical: in single-character images, the tag “A” equals all of A’s traits. But in two-character images, “A” doesn’t just equal A. Instead, “A” is associated with the entire bundle of “2boys + A + black hair + red eyes + B + blue hair + grey eyes.” So when generating “2boys,” using the triggers “A” + “B” actually becomes a double trigger: you’re stacking one “definite A” + one “fuzzy A,” plus one “definite B” + one “fuzzy B.” That’s why using a 2boys LoRA often leads to trait confusion or extra characters appearing—it’s not just because the checkpoint itself lacks native support for dual characters.

( 4 ) My Second Note on Experiences, About "2boys" ( 4 )

You can also use this method with black-and-white manga as your training dataset. As long as the tags are properly applied, it works just fine. You can manually erase unwanted elements like text, speech bubbles, panel borders, or any distracting lines—this helps reduce visual noise in the dataset.

That said, if you have any colored images available, be sure to include them too. That way, the model can learn the correct color mappings from the color images and apply them to the grayscale ones. Otherwise, the AI will try to guess the colors based on its own understanding—and it’ll just end up picking something random based on the style.

When training on black-and-white manga, tagging is absolutely essential. At a minimum, make sure you include “greyscale,” “monochrome,” “comic,” “halftone,” etc.—these tags tell the AI what kind of image it’s looking at. When using an auto-tagging tool, try adjusting the confidence threshold to 0.1—this helps detect lower-confidence visual elements that might still be important. Also, manga-specific drawing techniques—like "halftone" or "speech bubble"—should be explicitly tagged if possible. You can use the Danbooru tag index to look up the correct vocabulary for these features.

Even for things like hair and eye color in B&W manga, it’s totally okay to use normal tags like “brown hair” or “brown eyes.” As long as the image is also tagged with “greyscale” or “comic,” the more advanced community-trained checkpoints will be able to account for that and avoid misinterpreting the color info. And if you find that your results still aren’t coming out right, you can always tweak things with positive/negative prompt tokens, regenerate some images, and retrain on those.

Next, let’s talk about how many images you actually need in the training set when using this method. For single-character LoRAs, it’s fine to make the dataset as diverse and rich as possible, since you can always fix minor issues during image generation. But for 2boys LoRAs, it’s a different story. You really need to carefully filter out the best-quality images—skip the blurry ones—and put both your original source material and any AI-generated images you plan to reuse in the same folder. If you want the LoRA to lean more toward the original art style, you can increase the ratio of raw, unmodified source images. And don’t worry about file names—even if you duplicate the best images and their corresponding .txt tag files, your OS (like Windows 11) will auto-rename the duplicates, and LoRA training tools handle UTF-8 filenames just fine. You don’t need to rename everything into English.

Now, let’s talk about image types. Close-ups almost always look better than full-body shots. That’s just how AI generation works—detail quality falls off fast at long distances. Even tools like adetailer may struggle to recover facial features at full-body scale. However, full-body images are crucial for teaching the AI correct proportions. So here’s a trick I discovered: when generating full-body samples, you can deliberately suppress certain features to prevent the AI from learning bad versions of them. For example, use prompt tokens like “closed eyes,” “faceless,” or even “bald” to make the AI leave out those features entirely. That way, you get a clean full-body reference without noisy detail on the face or hair. Don’t worry—hair and eye features will still be learned from the other images in your set.
And because you explicitly tagged those “blank” images as “faceless” etc., they won’t bleed into your generations unless you use those tags later. That said, if your setup can already produce proper full-body detail using hires.fix or adetailer, don’t bother with the faceless trick. In my testing, 99% of the time the AI respects the faceless tag, but there’s always that one weird generation where you get something uncanny. It can be spooky.

The same logic applies to hands and feet—you can use poses, gloves, socks, or “out of frame” to suppress poorly drawn hands, or even generate dedicated hand references separately. The key point here is that this trick isn’t about increasing generation quality—it’s about making sure the AI doesn’t learn bad patterns from messy source data. If you’ve ever used character LoRAs that produce blurry or distorted hands, this is one of the reasons. Yes, some of it comes from checkpoint randomness—but more often it’s because anime source material barely bothers drawing hands, and the LoRA ended up overfitting on lazy animation frames.

So that’s the general idea behind building your dataset—but how many images do you actually need, and how should you set your training parameters? Here’s what I found after testing dozens of trial LoRAs based on this method.

At minimum, you need:

20 solo images per character (A and B)
15 dual-character images (AB)

And in general, if you keep the step count the same, having more images with lower repeat values gives better results than fewer images with high repeat values. Sure, you can take shortcuts by duplicating images to pad the count, but it’s still best to exceed those minimums: at least 20 solo images for A, 20 for B, and 15 for AB. From my testing, for any 2boys LoRA more images is never a problem, but going below that threshold greatly increases the risk of feature blending or characters failing to train.

How to Calculate the Step Count

Let’s get into the math. Assuming batch size = 1, your effective training steps per folder are:

number of images × repeat × epoch

Let’s say A and B both have 60 images, repeat = 2, epoch = 10. Then:

A: 60 × 2 × 10 = 1200 steps
B: same, 1200 steps

Now, for the AB (dual-character) images, try to keep the total training steps at 20%–40% of the solo step count. For example, if A and B are both trained for 1200 steps, then AB should use 1200 × 0.4 = 480 steps. Assuming epoch = 10, that’s 480 ÷ 10 = 48 steps per epoch, so your AB folder could have:

48 images × 1 repeat, or
24 images × 2 repeats, or
15 images × 3 repeats (= 45, close enough—any of these combos works)

Why this matters: as I explained earlier, the model doesn’t actually “understand” that A and B are separate characters. Instead, “A + B” gets merged into a conjoined concept, like a two-headed chimera. Training on “AB” is essentially learning a completely different concept than A or B solo, but because they all share overlapping tokens, they affect each other. So when you prompt “A + B” during generation, the model is actually stacking one “A,” one “B,” and one hidden “A” and one hidden “B” lurking underneath it all. The closer your AB training steps get to those of A and B, the more this overlapping weight stacking leads to feature confusion.

Now, here’s another issue: each character learns at a different pace. Let’s say A gets learned completely by step 600, but B still lacks features at that point. If you continue to step 800, A becomes overfit while B is only just reaching the ideal point. At 1000 steps, A is a mess and B might only just be starting to overfit.
This mismatch increases the chance that their traits will blend together. One workaround is to make sure AB has more than 15 images, and to give B’s solo folder a few more images than A’s while keeping the repeat value the same between them.

What About Larger Batch Sizes?

If your goal is to stabilize character clothing or reduce merging, you can try using a higher batch size for intentional overfitting. Still keep the solo step count between 800–1200, and recalculate the repeats. Here’s an example:

A and B each have 60 images
Target: ~1000 steps
batch size = 4
epoch = 10

Then: 1000 ÷ 10 × 4 = 400 image-repeats per epoch, and 400 ÷ 60 ≈ 6.66, rounded down to 6 repeats. So: 60 × 6 ÷ 4 × 10 = 900 steps.

Now for AB, say you have 30 images: 900 × 0.4 = 360, then 360 ÷ 10 × 4 = 144, and 144 ÷ 30 ≈ 4.8, rounded to 5 repeats. So: 30 × 5 ÷ 4 × 10 = 375 steps, and 375 ÷ 900 ≈ 0.417, right at the upper end of the ideal 20%–40% range.
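Since this arithmetic is easy to fumble by hand, here is a small Python sketch that reproduces the examples above. It assumes steps are counted as images × repeat ÷ batch size × epochs, which matches the numbers in this article; verify how your own trainer counts steps before relying on it.

```python
# Helpers that reproduce the step math above. They assume the convention
# total steps = images * repeat / batch_size * epochs; check your own trainer.

def total_steps(images: int, repeat: int, epochs: int, batch_size: int = 1) -> int:
    return images * repeat // batch_size * epochs

def repeats_for_target(images: int, target_steps: int, epochs: int, batch_size: int = 1) -> int:
    # Round down; nudge the result up or down by hand if you prefer.
    return max(1, int(target_steps / epochs * batch_size / images))

# Solo folders A and B: 60 images each, ~1000 target steps, batch size 4, 10 epochs.
r_solo = repeats_for_target(60, 1000, 10, batch_size=4)          # 6  (6.66 rounded down)
print(total_steps(60, r_solo, 10, batch_size=4))                 # 900

# AB folder: 30 images, targeting ~40% of those 900 solo steps.
r_ab = repeats_for_target(30, int(900 * 0.4), 10, batch_size=4)  # 4  (the example above rounds 4.8 up to 5)
print(total_steps(30, r_ab, 10, batch_size=4))                   # 300, i.e. ~33% of the solo steps
```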

( 3 ) My Second Note on Experiences, About "2boys" ( 3 )

Let’s rewind to the beginning—back when I started working on my Lagoon Engine LoRA. Step one, of course, was building the training dataset. My first instinct was: well, it’s a 2boys LoRA, so I need images with both boys together. Even though the available material was limited, the good thing was that the two brothers almost always appear together in the source—solo shots of either of them are actually pretty rare. So I collected some dual-character images, then cropped them manually to extract individual character shots. I put each set of solo images into its own folder and made sure they had the same number of repeats during training.

From the very first alpha version, all the way through various micro-adjustments and incremental releases up to beta, I stuck to that same core idea. And when the beta version finally came out and the results looked pretty good, I was over the moon. Naturally, I wanted to replicate that setup for another 2boys LoRA.

But... total failure. I made several more 2boys LoRAs with different characters, and every single one of them had serious problems. Either the features got horribly blended, or the LoRA straight-up failed to learn the characters at all. It was super frustrating. I couldn’t figure it out. Was it really just luck that the first one worked? Did I get lucky with the way the alpha and beta versions happened to progress, avoiding the worst-case scenarios? I didn’t want to believe that. So I went back and did a series of controlled-variable tests, trying to isolate what might be causing the difference. I made a whole bunch of test LoRAs just to look for clues. That process was full of messy trial and error, so I won’t write it all out here.

Let’s skip to the conclusion: making a truly stable and controllable 2boys LoRA is almost impossible. Most of what you’re doing is just trying to stack the odds in your favor—doing whatever you can to make sure the correct information is actually learned and embedded into the LoRA, so that it at least has a chance of generating something accurate.

Let me try to explain, at least in a very basic and intuitive way, how this works—both from what I’ve felt in practice and from a surface-level understanding of the actual mechanics. Training a LoRA is kind of like doing a conceptual replacement. Say you have this tag combo in your dataset: “1boy, A, black hair, red eyes,” where “A” is your character’s trigger token. Inside the LoRA, those tags don’t really exist independently. The model ends up treating them like a single, bundled concept: “1boy = A = black hair = red eyes.” That means when you use these tags during generation, the LoRA will override whatever the base checkpoint originally had for those tags—and generate “A.” Even if you remove some of the tags (like “A” or “black hair”) and only keep “1boy,” you’ll still get something that resembles A, because the LoRA associates all of those traits together.
Now let’s add a second character and look at what happens with this: “2boys, A, black hair, red eyes, B, blue hair, grey eyes.” The AI doesn’t actually understand that these are two separate boys. Instead, it just sees a big lump of tags that it treats as a single concept again—this time the whole block becomes something like “2boys + A + black hair + red eyes + B + blue hair + grey eyes.”

So if your dataset only contains pictures with AB, it won’t be able to generate A or B separately—because A and B are always bundled with each other’s features. If you try generating “1boy, A,” it won’t really give you A—it’ll give you a blend of A and B, since A’s identity has been polluted with B’s features in the training data. On the flip side, if your dataset only contains solo images of A and B—no dual-character pictures at all—it’s basically the same as training two separate LoRAs and loading them together. The features will mix horribly.

Even as I’m writing this explanation in my native language, I’m tripping over the logic a little—so translating it into English might not fully capture the idea I’m trying to get across. Apologies in advance if anything here seems off or confusing. If you find any parts that sound wrong or unclear, I’d really appreciate any feedback or corrections.

And that brings us to an important question: if we want a 2boys LoRA to be able to generate both 1boy and 2boys images, does that mean we need both solo and dual images in the training set? Yes. Like I mentioned earlier, for popular characters you don’t even need a LoRA—the checkpoint itself can usually generate them just fine. But when it comes to more obscure ships, or really niche character pairings, there just aren’t many usable images out there. You might not even have enough high-quality dual shots of them together—let alone clean solo images. And for older anime series, image quality is often poor, which directly affects your final LoRA’s performance. So I had to find a workaround for this data-scarcity problem.

I came up with an idea, tested it—and surprisingly, it worked. If you’ve seen any of the LoRAs I’ve uploaded, you’ll notice that they all come from source material with very limited visual assets. And yet, they’re all capable of generating multi-character results. So let me explain how this approach works. Fair warning: this part might get a little wordy and logically tangled. I’m not that great at explaining complex processes in a concise way, so translating this into English might only make it more confusing, not less. Please bear with me!

First, you can follow the method from my first article to train two separate single-character LoRAs. If you’ve got plenty of high-quality material, one round of training is usually enough. But if the source is old, low-res, or limited in quantity, you can use the LoRA itself to generate better-quality solo images, then retrain on those.

Next, take those generated solo images and combine them into dual-character compositions. Yes, I’m talking about literally splicing two single-character images together. This kind of "composite 2boys" image even has a proper tag on Danbooru—so don’t worry about the AI getting confused. In fact, based on my tests, these handmade 2boys images are actually easier for the model to distinguish than anime screenshots or official illustrations.

Let me break that down a bit: in anime screenshots, characters are often drawn at different depths—one in front, one behind, and so on.
That makes it harder for the AI to learn accurate relative body proportions. If the shot is medium or long distance, facial detail is usually poor. And if it’s a close-up or fanart-style composition, the characters tend to be physically close together, which makes it easier for the AI to confuse their features during training. By contrast, the composite images you make from generated solo pics tend to have clear spacing and symmetrical framing—making it much easier for the AI to learn who’s who.

Of course, this whole process is more work. But for obscure character pairs with almost no usable material, this was the most effective method I’ve found for improving training quality. If the characters you want to train already appear together all the time, and you can easily collect 100+ dual-character images, then you don’t need to bother with this method at all. Just add a few solo images and train like normal. The more material you have, the better the AI learns to distinguish them across different contexts—and the fewer errors you’ll get. The process I’m describing is really just for when you have two characters with little or no decent solo or dual material available. It’s a workaround, not the ideal path.

As shown above, you can even manually adjust the scale between the characters in your composite image to increase the chance of getting accurate proportions. However, testing shows this only really helps when the characters have a noticeable size difference, or when one character’s height reaches all the way up to the other’s forehead; when you scale a character to chest or neck height, it often doesn’t work very well. Also, even though this method can boost generation quality, it increases the risk of overfitting—especially in facial expressions and poses, which may come out looking stiff. To fix this, you can create multiple scenes with different angles and actions, and include both the original and the composite versions in your training set. That way, the model learns more variety and avoids becoming too rigid.
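For what it’s worth, the splicing itself needs no special tooling. Below is a minimal Python sketch of the composite idea using Pillow: two generated solo images pasted side by side on a plain background, with room to scale one character if you want to suggest a height difference. The file names are placeholders, and you still need to tag the result ("2boys", "split screen" / "collage", the background tags, and so on) as described earlier.

```python
# Minimal "composite 2boys" sketch: paste two solo renders side by side on one canvas.
# Requires Pillow (pip install pillow); paths and sizes below are just examples.
from PIL import Image

def make_composite(path_a: str, path_b: str, out_path: str,
                   canvas_size=(2048, 1536), bg_color=(255, 255, 255),
                   scale_b: float = 1.0):
    canvas = Image.new("RGB", canvas_size, bg_color)
    half_w, h = canvas_size[0] // 2, canvas_size[1]
    for i, (path, scale) in enumerate([(path_a, 1.0), (path_b, scale_b)]):
        img = Image.open(path).convert("RGB")
        # Fit the image inside its half of the canvas, keeping the aspect ratio,
        # then optionally shrink it to exaggerate a height difference.
        img.thumbnail((half_w, h))
        if scale != 1.0:
            img = img.resize((int(img.width * scale), int(img.height * scale)))
        x = i * half_w + (half_w - img.width) // 2   # center horizontally in that half
        y = h - img.height                           # align both characters to the bottom
        canvas.paste(img, (x, y))
    canvas.save(out_path)

# Hypothetical file names; scale_b=0.9 makes character B slightly shorter.
make_composite("solo_A.png", "solo_B.png", "AB_composite.png", scale_b=0.9)
```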

( 2 ) My Second Note on Experiences, About "2boys" ( 2 )

So why not just use NovelAI, since it’s so powerful? Because it’s closed-source and you can’t use LoRAs with it. And honestly, that whole universe of LoRAs—hundreds, thousands of them—is just too tempting to walk away from. Sure, NovelAI includes a lot of character data, but it’s still “limited.” And there was a point where it couldn’t generate NSFW content, depending on version and timing. Can we be sure the same won’t happen again, especially considering how things are going on platforms like Tensor?

If the pairing you want to generate isn’t super obscure—say, something like Killua x Gon, or a crossover like Ash x Taichi—then yeah, NovelAI is highly recommended. It’s easy to use, has tons of art styles and themes, and honestly just blows open-source models out of the water in many cases. Even with open-source checkpoints, it’s not like you can’t generate those pairings at all. But keep in mind—results vary a lot depending on the checkpoint. Sometimes, using high-quality single-character LoRAs along with matching trigger words can help maintain character identity while improving the final quality.

Yeah, this pose is a joke—it’s based on recent events involving Tensor and other platforms. 😂 Anyway, as shown here, if the checkpoint already contains character information, then adding a LoRA doesn’t usually create too much chaos. For popular characters from visually polished, high-profile anime—like Tanjiro, for example—you really don’t even need a LoRA. The base checkpoint already does a pretty good job of recreating them.

Just to emphasize it again: if you’re trying to generate fanart for “2boys” pairings that aren’t super obscure, or for groups of more than two characters, I strongly recommend using NovelAI first. Alternatively, try generating them directly with a capable checkpoint—you might be surprised how many characters don’t actually need a LoRA to be recognizable.

Here’s another example: say you’re trying to generate Edward Elric and Alphonse Elric from Fullmetal Alchemist. Both can be generated directly with certain checkpoints. Alphonse, when prompted alone, will show up in human boy form—not the armor—so that’s great. But as soon as you try to generate both together, the model pulls up the most common associations for the two brothers—which usually means Ed + armored Al—so the human form of Al becomes almost impossible to get. If one character’s info is richly embedded in the checkpoint and the other’s isn’t, you’ll get poor results. Even when you pair this kind of checkpoint with a LoRA to help boost the weaker character, features still end up getting mixed, and the overall art style may also be altered. To the untrained eye, this kind of fusion might not stand out—but once you know what to look for, it becomes obvious. And if you try to generate a character that isn’t in the checkpoint at all, relying only on a LoRA, then character features almost always get fused together. (There are tons of examples of this online, so I won’t bother including another one.)

This is a NovelAI-generated image, and it didn’t use any positive or negative prompt tuning at all. You’ll need to experiment and tweak things more if you want highly detailed or highly stylized results. But even in this basic output, Leonhardt’s outfit is reproduced way better than what most LoRAs could manage. And Aoto, who’s a really niche character, still comes out surprisingly accurate. These two characters have nothing to do with each other, but NovelAI lets you combine them however you want.
That said, NovelAI is a paid, closed-source commercial model, so every bit of GPU time costs money—you can’t afford to waste resources on endless trial and error.

Now let’s talk about img2img and regional-control workflows. There’s one big downside when characters are physically close to each other—like hugging or holding hands: the area where they make contact tends to look blurry or smudged. And beyond that, the workflow itself is pretty tedious. Whether you’re setting it up or just trying to use it day to day, it’s a hassle. That’s why, if you look at some creators on social media—even ones who’ve clearly made good-looking dual-character images—you’ll notice that most of their posts are still focused on 1boy. Why? Because working with LoRA + regional-control setups is just that much trouble.

Also, think about those moments when inspiration strikes out of nowhere—say, you come up with a perfect prompt on the spot. If you’re using an online app, it’s easy to just type it in and get going. But with these regional workflows, you’ll need a decent local PC to run everything. Worse yet, each plugin and workflow often needs a specific environment or version, and things constantly break. Personally, I gave up right away after trying them out; the more I tried to fix things after setup, the more bugs I ran into. So yeah, img2img and regional control can give decent results if you’re patient and willing to learn, but the learning curve is real—and the payoff isn’t always worth it. Still, if you’ve got the time and hardware for it, it might be worth a try.

Also, let’s be real—there are some creators out there posting what look like flawless 2boys or multi-character images, and they don’t have that telltale NovelAI look either. The quality is way beyond what you can get with just LoRA stacking or inpainting. So don’t even doubt it—some of those are hand-drawn. Yes, people who know how to draw can absolutely also use AI. The two aren’t mutually exclusive. In fact, combining both skillsets might just be the ultimate power move.

So, with all that said, I decided that making a dedicated 2boys LoRA for each of my favorite ships was the better path forward. It can be any pairing, any art style. You don’t need a crazy workflow—just the LoRA and the right prompt.

The basic “principle” of a 2boys LoRA is actually pretty simple: as long as you have enough images where the two characters appear together, the LoRA will start learning from them. The hard part isn’t the theory—it’s building a dataset and training process that results in a LoRA that’s actually stable and usable. If you’ve ever trained or used a dual- or multi-character LoRA, then you already know the common headaches:

1. Character features still get mixed up or swapped, and a fully correct output is just a matter of luck.
2. Body proportions between the two characters become completely random—unless their original designs have super obvious size differences (like Chilchuck and Laios). Even if their builds are similar, you’ll end up with one shrinking or growing unpredictably. You’ll instinctively start adding prompts like “same size,” but... yeah, that doesn’t help at all.
3. Overall image quality can be pretty low, so you have to pair it with style/detail LoRAs and usually upscale with hires.fix, adetailer, or something similar.
4. And all the usual LoRA flaws still apply—overfitting, distorted hands, and so on.

In the first article, I already talked a bit about the emotional rollercoaster of making my very first LoRA—which, by the way, was a 2boys LoRA. But I didn’t go into much technical detail at the time. So this time, let’s really dive in.

But first—sorry in advance. As far as I can tell, there’s basically zero public discussion online about how to train LoRAs for 2boys. Maybe there are some small, private Discord servers where people have shared their methods, but there’s no way for us to know. My approach here is based entirely on personal experience using the few existing multi-character LoRAs that are out there, and on the training parameters made public by those creators. From there, I did a bunch of experiments and exploration. So there’s always the chance that I took a completely wrong path right from the beginning. Even if the results I get now seem “okay,” I have no idea if I’m doing things the “right” way. At best, consider this a reference method. If you’re interested in making your own 2boys LoRA and don’t know where to start, maybe my process can serve as one way to approach it—or at least something to compare against.

( 1 ) My Second Note on Experiences, About "2boys" ( 1 )

I'm currently testing how to split the article into sections to fit the word count; the full version will be finished in the coming days.

A quick recap: in a previous article, I talked about how I first fell into the world of AI-generated images and how I started trying out LoRA training, and I shared some of my thoughts and experiences along the way. (Here’s the link: https://tensor.art/articles/868883505357024765) Looking at that article now, I have to say some of the opinions in it feel a bit outdated at this point, though some parts are still valid and usable. I’ll be referencing a few of those ideas later in this article, so feel free to check the first one for additional context. I also made a promise in that post—if I ever learned more, I’d come back and share the updates. So here we are. This time, I’m going to talk about something I’ve been completely hooked on for a while now: making 2boys fanart and training LoRAs for "2boys".

Before we jump in, let me say this clearly: English isn’t my native language, and honestly, I’m not very good at it either. So, like last time, this article was written in Chinese first and then translated into English with the help of AI tools. I’ll also be uploading the original Chinese version in case anyone prefers to read it directly or wants to translate it themselves. (As for the first article, I won’t be uploading a Chinese version—for that one I actually edited and tweaked the English draft a lot after translating, and never went back to rewrite a proper Chinese version.)

As you already know, when it comes to AI-generated images, randomness is the absolute king. Everyone has their own go-to settings, and the results vary wildly depending on the model, the prompts, and personal aesthetic preferences. So this article is just me sharing my own experience—or maybe “impressions” is a better word—and the techniques I’ve picked up along the way. This is not a serious technical guide or a step-by-step how-to. Any theory I mention is only a surface-level take based on what I’ve felt from experimenting, and both the image generation and LoRA training discussed here are focused entirely on ACG-style boy characters—specifically, pre-existing characters from anime, manga, and games. That means no AI-randomly-generated “boys,” and definitely no real people involved. All of the image generation in this article is based on NoobAI and Illustrious derivative models. As for LoRA training, I only use the official Illustrious 2.0 model—unless otherwise stated later on, you can assume everything is built on top of that.

Over the past few months, I’ve noticed that platforms like X, pixiv, and Civitai have been flooded with more and more 2boys content—and even multi-boy group images. Of course, most of it is NSFW, and a lot of it is the kind of thing you swipe past in a second: instant gratification, forgettable the next moment. But there are also some incredibly polished, undeniably well-made images that make you stop and wonder—how did they even make this?

Back then, I was still a total beginner, so my very first thought was: they must have used two separate character LoRAs, right? And obviously, I had to try it myself. It just so happened that I found LoRAs for two of my favorite boys—both from a super obscure anime, with a ship that literally no one cared about (Maki x Arashi)—and somehow, it... kinda worked? Yep, it was an R-18 image, and I was way too excited to think clearly. I didn’t even care about the generation quality.
The characters looked more or less correct, so I assumed I’d done it right. Full of confidence, I made another one with a different ship from the same series (Tsubasa x Shingo). Again, it kinda worked—yes, the characters would sometimes get blended together, but not always. So I doubled down on the misunderstanding that “2boys” images could be done just by loading two LoRAs together.

Then came a total shocker: a well-known (actually, legendary) creator in the community released a dual-character LoRA for Yuta x Yomogi, and I was blown away. So this is also possible? Just one LoRA, and both characters in the same image? I posted a whole bunch of images using it (they’ve since been hidden on Tensor), and I misunderstood again—thinking, oh wow, making a 2boys LoRA must be super easy! I got so hyped, thinking there’d be tons of dual-character LoRAs coming out soon, and that making fanart for my favorite ships would become a total breeze.

By now, I bet you’re already laughing. Yep, it was a huge misunderstanding. If you’ve ever tried this yourself, you’ll know exactly what I’m talking about: with boys LoRAs, the characters’ individual features almost always end up getting blended together. Whether you’re loading two separate LoRAs or using one of those rare dual-character LoRAs, it’s incredibly hard to get a clean, correct image. Dual-character LoRAs are slightly more manageable, but trying to load multiple LoRAs together? That’s not just “difficult,” it’s borderline impossible. You can mess with the prompt order, use "BREAK" and “different boys,” tweak the LoRA weights all you want—the characters’ features still end up all mixed up. Looking back now at the images I made—and a lot of the ones I’ve seen on social media—they were mostly just wrong; I was simply too hyped at the time to notice. Sure, there are a few images that look “perfect,” but I never dared ask the creators how they did it. And to make things worse, there’s barely any discussion online about how these images are actually made. You’ll find some info on mixed-gender or girl-girl LoRA combos, but boy-boy? None.

So let’s skip the unnecessary storytelling and jump straight to the point. After a lot of experimenting (and failing), here are the main approaches I’ve figured out so far:

1. The checkpoint already “knows” a lot of characters. Sometimes you can just write their names directly in the prompt. By checking the model’s README or checkpoint notes, you can find out which characters are already embedded in the model. For those that are, you can generate “2boys” images simply by using their recognizable names. In fact, many popular or classic characters work perfectly without any LoRA at all, and you can even mix them across different series. This should be the easiest method, but the catch is that the pairings people want to draw usually aren’t in the checkpoint. Even when the characters are there, their fidelity varies a lot. Some characters are only partially recognizable, and you’ll often need to add extra tags like eye color, hair color, hairstyle, and other traits to boost accuracy. But that only really works for 1boy images—if you try to do 2boys, those extra tags end up interfering with each other.

2. Img2img or inpainting. A common and pretty straightforward method. You can load two LoRAs and generate a base image, then selectively redo the mixed-up parts by enabling just one LoRA at a time and using inpainting.
This is just one version of the process—there are many ways to implement the idea.

3. Regional control. This requires plugins and complex workflows in WebUI or ComfyUI. The idea is to divide the canvas into different regions and assign different LoRA weights to each region. I won’t name specific extensions here—you can look them up yourself. Sometimes this works surprisingly well, but I’ll explain later why I eventually gave up on this method.

4. Dual- or multi-character LoRAs. Although there aren’t many of these around right now, most of the ones that do exist work decently. Some are a bit unstable, but others are quite effective. I’ll talk later about how I create 2boys LoRAs, so I won’t go into more detail just yet.

5. And this one’s huge: NovelAI. Still the most powerful commercial anime image model to this day. It supports an insanely large number of characters—far more than any open-source model. While most open models only include the main characters from a handful of franchises, NovelAI can generate even obscure side characters, and it adds new character data pretty fast. It also supports built-in regional control, so not just two boys—you can specify position and action for even more characters. It’s incredibly powerful, though yeah, the subscription isn’t cheap. A lot of those beautifully done crossover images you see on social media are actually NovelAI creations. You can usually spot them, too: the art style is super consistent across characters, and clearly distinct from open-source models.

6. Other new methods—like Flux Kontext. This might be one of the biggest breakthroughs of the year in image generation models. I haven’t had time to explore it deeply yet, but it is capable of generating fanart. If you haven’t tried it, I recommend giving it a shot. People often talk about how Flux is #1 for photorealism, but it’s also surprisingly strong with anime-style content. The downside is the high cost and complexity—especially compared to SDXL-based workflows for LoRA training.

So yeah, those are the main approaches I’ve explored so far. I’m sure there are other methods I haven’t discovered yet. But out of all of these, I eventually chose to focus on making "2boys" LoRAs.

Possible Future Doujin LoRA Production Plans (Ongoing)

A list of LoRAs I'm currently working on or would like to work on—in no particular order. It might take a year, maybe two? Anyway, it's just a reminder to myself not to forget or get too lazy. If you happen to see one you're interested in, please don't get your hopes up—truth is, I really don't have much free time... (T_T)

Also, I’m calling these the doujin versions because I’ve chosen to sacrifice fidelity in order to improve usability and reduce the chance of errors. Each LoRA will only include character information—no original outfits, accessories, or even things like band-aids. Since anime adaptations often mess with the original designs, the general rule is: if there's a manga, it takes priority; if not, then novel illustrations. Anime is used only as a color reference. Coloring and redrawing black-and-white manga takes a lot of time to select and redo, and if necessary some additional art-style LoRAs are needed to assist. The more steps involved, the more the similarity may drop. But through repeated refinement, I can strip out almost 100% of the unnecessary info, so that only the simplest prompt words are needed when using it—no extra messy quality tags or negative prompts—letting the checkpoint model perform at its best. It'll also be easier to combine with other effect or art-style LoRAs.

*******, & *** – that's a secret.
轟駆流 & 軍司壮太
宿海仁太 & 本間聡志
Mikami riku & hidaka yukio
Takaki & Aston, ride & mikazuki
"long-term plans" 星合の空 – arashi, maki, toma, yuta, tsubasa, shingo, shinjirou
****,**、、 – another secret
Tenkai knights – toxsa, chooki, guren, ceylan
complete Motomiya daisuke – da02 movie
complete 急襲戦隊ダンジジャー – kosuke, midori, kouji
刀剣乱舞 – aizen & atsushi
Shinra, Shou, Arthur
銀河へキックオフ!! – shou, aoto, tagi, ouzou, kota
shinkalion – tsuranuki, hayato, ryuji, tatsumi, gin, jou, ryota, taisei, ten
****, & **** – another secret, and with only the manga, it feels super difficult
Vanguard – kamui & kurono
Black☆star & soul
Subaru & Garfiel
Shirou & kogiru
complete アライブ - 最終進化的少年 (Alive: The Final Evolution) – 叶太輔 & 瀧沢勇太
touma & tsuchimikado
yuta & yomogi
Etc.

A bit of my experience with making AI-generated images and LoRAs ( 5 )

https://tensor.art/articles/868883505357024765 ( 1 )
https://tensor.art/articles/868883998204559176 ( 2 )
https://tensor.art/articles/868884792773445944 ( 3 )
https://tensor.art/articles/868885754846123117 ( 4 )

Extract the character from the image and place them onto a true white background. You might lose a bit of the original coloring or brushstroke texture, but compared to the convenience it brings, that's a minor issue. Don't be too naive, though—things like expression, pose, clothing, and camera angle still need to be described properly. Doing so helps the AI learn accurate depth of field, which in turn helps it learn correct body proportions. After that, even if you don't include camera-related prompts when using the LoRA, it'll still consistently output correct body shapes. A lot of people use cutout characters for their training data, but their tags miss things like camera info. So you might get a buff adult under the “upper body” prompt, and a toddler under “full body.” By now, you should have a solid understanding of how to prepare your training dataset.

5. Parameter Settings

This part is quite abstract and highly variable. There are already many tutorials, articles, and videos online that go into detail about training parameters and their effects. Based on those resources, I arranged several parameter sets and ran exhaustive tests on them. Since I'm not particularly bright and tend to approach things a bit clumsily, brute-force testing has always been the most effective method for me. However, given the limits of my personal time and energy, my sample size is still too small to really compare the pros and cons of different parameter sets.

That said, one thing is certain: do not use a derivative checkpoint as your base model. Stick with foundational models like Illustrious or NoobAI. Using a derived checkpoint will make the LoRA work only with that one checkpoint. Another helpful trick for learning from others: when you deploy a LoRA locally, you can view its training metadata directly in your SD WebUI. I'll also include the main training parameters in the descriptions of any LoRAs I upload in the future, for reference.

In the following section, I'll use the long and drawn-out process of creating the LoRA for Ragun Kyoudai as an example and give a simple explanation of what I learned through it. But before we get into that case study, let's quickly summarize the basic LoRA training workflow:

1. The real first step is always your passion.
2. Prepare your dataset.
3. Tag your dataset thoroughly and accurately.
4. Set your training parameters and begin training. Wait—and enjoy the surprise you'll get at the end.

As mentioned earlier, the first LoRA I made for Ragun Kyoudai was somewhat disappointing. The generated images had blurry eyes and distorted bodies. I chalked it up to poor dataset quality—after all, the anatomy and details in the original artwork weren't realistic to begin with. I thought it was a lost cause. I searched through all kinds of LoRA training tutorials, tips, articles, and videos in hopes of salvaging it. And surprisingly, I stumbled upon something that felt like a breakthrough: it turns out you can train a LoRA using just a single image of the character.

The method is pretty simple. Use one clear image of the character's face, and then include several images of unrelated full-body figures. Use the same trigger words across all of them, describing shared features like hair color, eye color, and so on.
Then adjust the repeat values so that the face image and the body images get the same total weight during training (see the small sketch at the end of this section). When you use this LoRA, you just trigger it with the facial feature tags from the original image, and it swaps in a consistent body from the other images. The resemblance outside the face isn't great, but it dramatically reduces distortion.

This inspired me—what if the “swapped-in” body could actually come from the original character, especially when working with manga? That way, I could use this method to supplement missing information. I went back through the manga and pulled references of side profiles, back views, full-body shots, and various camera angles that weren't available in the color illustrations. I tagged these grayscale images carefully with tags like greyscale, monochrome, comic, halftone, etc., to make sure the AI learned only the body shape, hairstyle, and other physical features, without picking up unwanted stylistic elements. This approach did help. But problems still lingered—blurry eyes, malformed hands, and so on.

So I pushed the idea further: I used the trained LoRA to generate high-quality character portraits using detail-focused LoRAs, specific checkpoints, and adetailer. Those results then became new training data. In parallel, I used other checkpoints to generate bodies alone, adjusting prompt weights like shota:0.8 or toned:0.5 to guide the results closer to the target physique, or at least to my own expectations. The idea was that the AI could “fit” these newly generated samples to the rest of the dataset during training. And it worked. This is how the Lagoon Engine beta version came to be.

At this point, I could completely ditch the low-resolution color and manga images from the training dataset and just use AI-generated images. I used prompts like simple background + white background to create portrait and upper-body images with only the character. To avoid blurry eyes and inconsistent facial features in full-body shots, I used the faceless tag or even manually painted over the heads to prevent the AI from learning them—allowing it to focus solely on body proportions. That said, white background tends to be too bright and can wash out details, while darker backgrounds can cause excessive contrast or artifacts around the character's edges. The most effective backgrounds, in my experience, are grey or pink.

During this time, I also experimented with making a LoRA from just one single character portrait—again from Lagoon Engine. It was just one full-color image with a clear, unobstructed view. But when I applied the same method and added new characters to create a LoRA with four characters, I hit a wall. The characters started blending together—something I'd never encountered before. With En & Jin, mixing was incredibly rare and negligible, but with four characters it became a real problem. I adjusted parameters based on references from other multi-character LoRAs, but nothing worked. I'm still testing—trying to find out whether the problem lies in the parameters, the need for group images, or specific prompt settings.

Although the four-character LoRA was a failure, one great takeaway was this: black-and-white manga can be used to make LoRAs. With current AI redrawing tools, you can generate training data using AI itself. Compared to LoRAs based on rich animation material, using black-and-white manga is much more difficult and time-consuming. But since it's viable, even the most obscure series have a shot at making a comeback.
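As a footnote to the face-plus-unrelated-bodies trick at the start of this section: balancing the repeats just means matching images × repeat between the two folders. A tiny Python sketch, assuming the usual folder-repeat convention; the counts below are only an example.

```python
# Give a single "face" image the same total weight as a larger "body" folder by
# matching images * repeat. Numbers here are only an example.
def balance_repeats(face_images: int, body_images: int, body_repeat: int) -> int:
    body_weight = body_images * body_repeat
    return max(1, round(body_weight / face_images))

face_repeat = balance_repeats(face_images=1, body_images=12, body_repeat=2)
print(face_repeat)  # 24  ->  1 * 24 == 12 * 2
```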
To summarize, creating multiple LoRAs for the same target is a process of progressive refinement, like building a drink from a shot of espresso. The first dataset with detailed tags is your espresso—it can be consumed as-is or mixed however you like.

This method also works surprisingly well when creating LoRAs for original characters (OCs). Since OCs can have more complex features, you can start by generating a base image using a fixed seed with just the hair/eye color and style. Train a first LoRA on that, then gradually add more features—like dyed highlights or complex hairstyles—during image generation. If the added features aren't stable, remove some and train another LoRA. Repeat this layering process until your OC's full complexity is captured. This approach is far more stable than trying to generate all the features in one go: even with a fixed seed, randomness makes it hard to maintain complex character traits across different angles without breaking consistency.

One more note, regarding character blending when using multiple LoRAs: there seems to be no foolproof way to prevent it. Even adding regularization sets during training doesn't completely avoid it. As of now, the lowest error rate I've seen is with characters from the same series, trained with the same parameters, by the same author, and ideally in the same training batch.

And with that, we've reached the end—for now. I'll continue to share new insights as I gain more experience. See you next time~

A bit of my experience with making AI-generated images and LoRAs ( 4 )

https://tensor.art/articles/868883505357024765 ( 1 )
https://tensor.art/articles/868883998204559176 ( 2 )
https://tensor.art/articles/868884792773445944 ( 3 )
https://tensor.art/articles/868890182957418586 ( 5 )

When it comes to training LoRAs, trying to fix all the bugs at the source is seriously exhausting. Unless you're doing LoRA training full-time, who really has the time and energy to spend so much of their free time on just one LoRA? Even if you are full-time, chances are you'd still prioritize efficiency over perfection. And even after going through all the trouble of eliminating those bugs, the result might only improve the “purity” from 60% to 80%—just a guess. After all, AI is still a game of randomness. The final training parameters, repeats, epochs, learning rate, optimizer, and so on will all influence the outcome. You'll never “purify” it to 100%. And really, even 60% can already be impressive enough.

So—worth it? My personal take: absolutely. If a certain character—or your OC—is someone you've loved since childhood, someone who's part of your emotional support, someone who represents a small dream in your life, then why not? They'll always be worth it.

I've only made a handful of LoRAs so far, each with a bit of thought and some controlled variables. I've never repeated the same workflow twice, and each result more or less met the expectations I had at the beginning. Still, the sample size is way too small—I don't think my experiences are close to being truly reliable yet. If you notice anything wrong, please don't hesitate to point it out—thank you so much. And if you think there's value in these thoughts, why not give it a try yourself?

Oh, right—another disclaimer: due to the limitations of my PC setup, I have no idea what effect larger parameter values would have. All of this is based on training character LoRAs with the Illustrious model. Also, a very important note: this is not a LoRA training tutorial or a definitive guide. If you've never made a LoRA yourself but are interested in doing so, try searching around online and go ahead and make your first one. The quality doesn't matter; just get familiar with the process and experience firsthand the mix of joy and frustration it brings. That said, I'll still try to lay out the logic clearly and help you get a sense of the steps involved.

0. Prepare your training set. This usually comes from anime screenshots or other material featuring the character you love. A lot of tutorials treat this as the most crucial step, but I won't go into it here—you'll understand why after reading the rest.

1. Get the tools ready. You'll need a computer, and you'll need to download a local LoRA trainer or a tagging tool of some kind. Tools like Tensor can sometimes have unstable network connections, but they're very convenient. If your internet is reliable, feel free to use Tensor; otherwise, I recommend doing everything on your PC.

2. If you've never written prompts using Danbooru-style tags before, go read the tag wiki on Danbooru. Get familiar with the categories, what each one means, and look at the images they link to. This is super important—you'll need to use those tags accurately on your training images.

3. Do the auto-tagging. Tagging tools detect the elements in your image and generate tags for them. On Tensor, just use the default model wd-v1-4-vit-tagger-v2—it's fine, since Tensor doesn't support many models anyway and you can't adjust the threshold. On a PC, you can experiment with different tagger models.
4. Now comes the most critical step, the one that takes up 99% of the entire training workload. Once tagging is complete, look hard at the first image in your dataset: just how many different elements does it contain?

Just as the order of prompts affects the output during image generation, the prompts used in training follow a similar rule. So don't enable the "shuffle tokens" option. Put the most important tokens first: the character's name and "1boy."

For the character's traits, I suggest including only two, and eye color is one of them. Avoid obscure color names; simple ones like "red" or "blue" are more than enough. You don't need to describe the hairstyle or hair color in detail; delete all the automatically generated hair tags. Double-check the eye color too. Sometimes the tagger outputs several colors, like "red" and "orange" together, so delete the extras.

As for hair, my experience is: if the color is complex, just write the hairstyle (e.g., "short hair"); if the hairstyle is complex, just write the color. Actually, if the training is done properly you don't even need these; the character name alone is enough. But in case you use this LoRA together with others that might be overfitted, including them is a safety measure.

Tags about things like teeth, tattoos, and so on should be removed completely. If they show up in the auto-tags, delete them. The same goes for tags describing age or body type, such as "muscular," "toned," "young," "child male," or "dark-skinned male." And if there are nude images in your dataset, and you like the body type and want future generations to match it, do not include tags like "abs" or "pectorals."

You may have realized by now: it's precisely because those tags weren't removed that they were explicitly flagged, so the AI treats them as interchangeable. That's why body shape, age, and proportions can vary wildly in your outputs. Sometimes the figure looks like a sheet of paper, because "abs" and "pectorals" were in your tags and you never realized they had become trigger prompts.

If you don't actively remove or add tags, you won't know which ones carry enough weight to act as triggers; they all blend into the chaos. If you don't call them, they won't appear. If you do, even unintentionally, they will show up, and they might bring total chaos with them.

Once you're done with all that, your character's description should contain only eye color and hair.

For the character name used as a trigger word, don't format it the way Danbooru or e621 does. Illustrious and Noobai models already recognize a lot of characters, and if your base model already knows yours, a repeated or overly formal name will only confuse it. Whatever nickname you normally use for the character, just go with that.

See how tedious this process is, even just for setting up tags? It's far more involved than auto-tagging everything, batch-adding names, and picking out the high-frequency tags.

Remember the task at the start of this section, identifying every element in the first image? You've now covered the character's features. Next, the clothing. Say the boy in the image is wearing a white hoodie with blue sleeves, a tiger graphic on the front, and a chest pocket.
Now you face a decision: do you want him to always wear this exact outfit, or do you want him to have a new outfit every day?

Auto-taggers don't always tag clothing completely. If you want him to wear different clothes all the time, break the outfit down and tag each part with Danbooru-style tags. If you want him to always wear the same thing, just use a single tag like "white hoodie," or even give the outfit a custom name.

There's more to say about clothing, but I'll save it for the section about OCs. I already feel this part is long-winded, but it's so tightly connected and info-heavy that I don't know how to express it clearly without rambling a bit.

Next, observe the character's expression and pose and describe them clearly with Danbooru-style tags. I won't repeat this later; just remember that tags should align with Danbooru as closely as possible. Eye direction, facial expression, hand position, arm movement, leaning forward or back, the angle of the knees and legs. Is the character running, fighting, lying down, sitting? Describe every detail you can.

Then observe the background. Sky, interiors, buildings, trees; there's a lot. Even a single wall, the objects on it, the floor material indoors, items on the floor, or whatever the character is holding. As mentioned earlier, if you don't tag these things explicitly, they're likely to show up later alongside any chaotic high-weight tags you forgot to remove, appearing out of the ether.

Are there other characters in the scene? If so, describe them clearly using the same process, though I recommend avoiding such images altogether. Many LoRA datasets include them: a girl standing next to the boy, a mecha, a robot. You need to "disassemble" these extra elements, or they'll linger like ghosts and randomly interfere with your generations.

Also, when tagging anime screenshots, the tool often adds "white background" by default, which makes it one of the most common carriers of chaos.

At this point you might already be feeling frustrated. The good news is that plenty of tools now support automatic background removal, such as the latest versions of Photoshop, some ComfyUI workflows, and various online services. Some can even isolate just the clothes or other specific objects.
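Since almost all of this work is editing the per-image caption files, the mechanical half of it can be scripted. Below is a minimal sketch that forces the trigger word and "1boy" to the front and drops tags you never want treated as swappable traits. The trigger word and the banned-tag list are hypothetical examples; the judgment calls about eye color, hair, clothing, pose, and background still have to be made by hand, image by image.

```python
from pathlib import Path

dataset_dir = Path("dataset/my_character")  # hypothetical folder
trigger_word = "my_oc_nickname"             # hypothetical trigger word

# Example tags you never want learned as swappable traits.
banned_tags = {
    "abs", "pectorals", "muscular", "toned", "teeth",
    "young", "child male", "dark-skinned male",
    "black hair", "brown hair", "short hair",  # drop auto-generated hair tags
}

# Note: this overwrites the caption files in place, so back up the folder first.
for caption_file in dataset_dir.glob("*.txt"):
    tags = [t.strip() for t in caption_file.read_text(encoding="utf-8").split(",") if t.strip()]
    # Remove banned tags and any duplicates of the trigger word / 1boy.
    kept = [t for t in tags if t not in banned_tags and t not in (trigger_word, "1boy")]
    # Put the most important tokens first, as described above.
    new_tags = [trigger_word, "1boy"] + kept
    caption_file.write_text(", ".join(new_tags), encoding="utf-8")
```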
A bit of my experience with making AI-generated images and LoRAs ( 3 )


https://tensor.art/articles/868883505357024765 ( 1 )
https://tensor.art/articles/868883998204559176 ( 2 )
https://tensor.art/articles/868885754846123117 ( 4 )
https://tensor.art/articles/868890182957418586 ( 5 )

Alright, let's talk about LoRA. Many things in AI image generation really have to be discussed around it. But before that, I suppose it's time for a bit of preamble again.

LoRA, in my view, is the most captivating technology in AI image generation. Those styles, whether imitations or memes. Those characters: one girl in a hundred different outfits, or the body of that boy you're madly in love with. A large part of the copyright debate around AI actually stems from LoRA, though people unfamiliar with AI might not realize it. It has genuinely hurt many people, but it has also captured many hearts. When you suddenly see an image of a boy that no one on any social platform, in any language, is talking about, don't you feel a sense of wonder? And when you find out the image was made with a LoRA, doesn't your heart skip a beat?

By the time you read this, my first LoRA for Ragun Kyoudai will already be released. From the moment I had even the slightest thought of making a LoRA, I was determined that they had to be the first, the absolute first. But it wasn't easy. The full-color illustrations I saved of them as a kid? Gone, thanks to broken hard drives and lost phones. The images you can find online now are barely 200x300 in resolution, and there are painfully few of them. I still remember the composition and pose of every color illustration from 20 years ago, but in the internet of 2024 they have completely disappeared. All I had left were the manga and its covers, CDs, and cards.

Could it be done? While searching for LoRA training tutorials and preparing the dataset, more and more doubts formed in my mind. Because of the art style, the images didn't contain accurate anatomy. There were no multi-angle views, especially not from behind. Compared with datasets sourced from anime, mine felt pitifully incomplete. Still, I nervously gave it a first try. The result was surprising: the AI reproduced the characters' facial features quite well. But it was basically limited to close-ups. On the base model used for training, the generated images were completely unrecognizable outside the face; on other derivative models, the characters no longer resembled themselves at all.

So was it that AI couldn't do it, or that I couldn't? Or was it simply impossible to make a LoRA from such a flawed dataset? I set it aside for the time being, since with my limited experience it was hard to make a solid judgment.

Later, while generating images, I began using LoRAs made by various creators. I wanted to know what differences existed between LoRAs beyond the characters themselves. I didn't find many differences, but I did notice a lot of recurring bugs. That's when I realized I had a lead: maybe understanding the causes of these bugs is the key to improving LoRA training.

So let's talk about it. What are these bugs? What do I think causes them? How can we minimize them during image generation? How can we reverse-engineer them to improve LoRA training? Just to clarify: as you know, these experiences are based only on LoRAs of boy characters. Not girls, and not the overly bara-styled characters either.

1. Overexposure.
2. Feminization.
3. On the base model the LoRA was trained on (e.g., Pony, Illustrious), it doesn't work properly: prompts struggle to change the character's pose or expression; multi-angle images like side or front views are impossible; eyes stay blurry even in close-ups; body shapes are deformed; figures become flat like paper; body proportions fluctuate uncontrollably.
4. Because of the above, many LoRAs only work on very specific checkpoints.
5. Even on various derivative checkpoints, key features like the eyes are still off; the character doesn't look right, appears more feminine, and traits come and go; regardless of the clothing prompt, features of the original costume always show up.
6. Character blending: when using two character LoRAs, it's hard to tell them apart, let alone with more than two.
7. Artifacts: most notably, using a white background often produces messy, chaotic backgrounds, strange character silhouettes, and even random monsters from who knows where.
8. Sweat, and lots of it.
9. I haven't thought of the rest yet; I'll add more as I write.

All of these issues stem from one core cause: the training datasets used for LoRAs are almost never manually tagged. Selecting and cropping the images may take only 1% of the time. Setting the training parameters and clicking "train" is barely worth mentioning. The remaining 99% of the effort should go into manually tagging each and every image.

In reality, most people only run an auto-tagger over the images, then bulk-edit to add trigger words or delete a few unnecessary ones. Very few go in and fix each tag by hand. Even fewer take the time to add detailed, specific tags to each image.

The AI tries to identify and learn every element in every image. When certain visual elements aren't tagged, there's a chance the AI will associate them with the tagged elements and blend them together. The most severe case of this kind of contamination happens with white backgrounds.

You spent so much effort capturing, cropping, cleaning, and processing animation frames or generating OC images. When you finally finish training a LoRA and it works, you're overjoyed, and those "small bugs" don't seem to matter. But as you keep using it, they bother you more and more. So you go back and build a larger dataset, set repeats to 20, raise epochs to 30, hoping the AI will learn the character more thoroughly. But is the result really what you wanted? After pouring in so much effort and time, you might have no choice but to tell yourself, "This is the result I was aiming for." Yet the overexposure is worse. The feminization is worse. There are more artifacts.
The characters resemble themselves even less. Why? Because the untagged elements in the training images become even more deeply ingrained in the model through overfitting.

So now it makes sense:

Why there's always overexposure: modern anime tends to overuse highlights, and your dataset probably contains no tags about lighting.
Why multi-angle shots are so hard to generate, and why character size fluctuates wildly: your dataset lacks tags for camera position and angle.
Why the character becomes more feminine: your tags may have inadvertently included terms like "1girl" or ambiguous gender.
Why certain actions or poses can't be generated: tags describing body movement are missing, and the few that exist are overfitted and rigid.

In short: elements that are tagged get learned as swappable; elements that are untagged get learned as fixed. That may sound counterintuitive, even contrary to common sense, but it's the truth.

This also explains why two character LoRAs used together often blend. Traits like eye color, hair color, hairstyle, even tiny details like streaks, bangs, a short ponytail, facial scars, or shark teeth are all written out in detail, and the more detailed the tags, the more the LoRAs influence each other, because the AI learns those traits as swappable rather than inherent to the character.

And no matter what clothing prompts you use, the same patterns from the original outfit keep showing up, because those patterns were learned under the clothes tag, which the AI treats as separate and constant. Overfitted LoRAs also tend to compete with each other over the same trigger words, fighting for influence.

So, from a usage perspective, some of these bugs can be minimized. Overexposure, feminization, sweat: if you don't want them, put them in your negative prompts. For elements like lighting, camera type, and viewing angle, think carefully about your composition, refer to Danbooru-style tags, and describe them clearly in your positive prompts. Also make sure to use the more effective samplers mentioned earlier. Use LoRAs that enhance detail without interfering with style, such as NoobAI-XL Detailer. Hand-fixing LoRAs aren't always effective, and it's best not to stack too many together. One final reminder: you usually don't need quality-related prompts; just follow the guidance on the checkpoint's official page.
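To make that usage advice concrete, here is a small illustration of how the positive and negative prompts could be organized. It is only a sketch; the trigger word and the specific tags are hypothetical examples of the kinds of things described above, not values taken from any particular LoRA or checkpoint.

```python
# Hypothetical prompt layout following the advice above:
# trigger word first, then explicit lighting/camera/pose/background tags,
# with the unwanted traits pushed into the negative prompt.
positive_prompt = ", ".join([
    "my_oc_nickname", "1boy",            # trigger word and subject first
    "blue eyes",
    "from side", "upper body",           # camera position and framing
    "soft lighting",                     # lighting described explicitly
    "standing", "hands in pockets",      # pose tags
    "outdoors", "night", "city lights",  # background described explicitly
])

negative_prompt = ", ".join([
    "1girl", "sweat", "abs", "pectorals",  # traits you don't want leaking in
    "backlighting", "blurry",
])

print(positive_prompt)
print(negative_prompt)
```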
A bit of my experience with making AI-generated images and LoRAs ( 2 )


https://tensor.art/articles/868883505357024765 ( 1 )
https://tensor.art/articles/868884792773445944 ( 3 )
https://tensor.art/articles/868885754846123117 ( 4 )
https://tensor.art/articles/868890182957418586 ( 5 )

Second, the prompts are always the most critical part. Many people don't realize, and haven't read the instructions for the checkpoints they use, that the number of prompt tokens has an upper limit, and that prompts are read in order from first to last, so don't let quality prompts take up too much space. "score_9_up," "score_8_up," and the like are used by the Pony model; Illustrious and Noobai models don't need them at all. So whichever base model you're using, just follow the instructions written on its page. Whether you write a hundred "perfect hands" in the positive prompt or add six-fingered and seven-fingered hands to the negative prompt, it won't make hand generation stable. I used to think it helped, but faced with plenty of evidence, it's just a placebo. Excessive quality prompts make the image worse, not better. The order of the quality prompts does have an effect, but it can generally be ignored.

The most important factor is the order of your prompts. Although generation is largely random, the order and adjacency of tokens do have an impact: tokens placed earlier tend to produce better results than those placed later, and neighboring tokens tend to interact with each other. So if you want the image to match your imagination, it's best to conceive and write the elements of the picture in order.

There's a tool called BREAK, which restarts the token count. One of its effects is that it tries to cut off the influence between adjacent prompts. For example, writing an artist name at the beginning and again after a BREAK at the end produces a much stronger style than writing the trigger word in the middle. Likewise, placing a BREAK between different characters will likely keep the characters more separate. Another tool is the | symbol, which strengthens the connection between two adjacent prompts and tries to merge their effects. Experiment with both and use them flexibly.

Because of the tag-based training of Illustrious and Noobai, it's best to use prompts that align exactly with the tags found on Danbooru. When you think of an action or an object, check Danbooru for the corresponding tag. You can also consult Danbooru's tag wiki or one of the many online tag-assistance sites to make your prompts more precise. Elements like lighting and camera angle can be researched and incorporated the same way. E621 tags only apply to Noobai, while Danbooru tags are universal.

Although natural language isn't well supported by Illustrious and Noobai, it can still be useful as a supplement. Be sure to start with a capital letter and end with a period. For example, if you want a blue-eyed cat, writing "cat, blue eyes" might give you several cats with the boy's eyes, but writing "A cat with blue eyes." will make sure the cat's eyes are blue. You can also use this method to add extra details after describing a character's actions with regular tags. Additionally, you can describe a scene and let AI tools like Gemini or GPT generate natural language prompts for you.

Prompt weights can also be assigned, most commonly with ( ) or a value like :1.5. This makes the weighted prompt appear more often, or have a stronger or weaker effect.
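Here is a small illustration of how the ordering advice, BREAK, and weights might come together in one prompt. It's only a sketch; the artist name, trigger word, and tags are hypothetical placeholders, and the exact effect will vary between checkpoints.

```python
# Hypothetical prompt assembled in the order the image is conceived:
# style first, then subject, pose, background, and a weighted detail.
prompt = ", ".join([
    "artist_name_here",                  # style token placed first
    "my_oc_nickname", "1boy", "blue eyes",
    "sitting", "looking at viewer",
    "classroom", "window", "sunlight",
    "(smile:1.2)",                       # weighted tag for a stronger effect
    "BREAK",                             # restart the token count
    "artist_name_here",                  # repeat the style token after BREAK
])
print(prompt)
```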
Fixing the random seed and assigning different weights to prompts is a very useful technique for fine-tuning an image. For example, if you generate an image with the right action but the character looks too muscular, you can recreate the image, find the seed parameter, fix it, and then adjust with something like "skinny:1.2" or "skinny:0.8" to tweak the character's appearance. This usually won't change the original composition. As for syntax like (promptA:0.5, promptB:1.3, promptC:0.8), I couldn't find any consistent pattern in it, so treat it as just another source of randomness.

The prompt advice above may not beat plain good luck. Sometimes emptying your mind and writing at random leads to unexpected results, so don't get too caught up in it. If you can't achieve the effect you want, let it go and change your approach. As for the images I've posted on Tensor, aside from the first few, all of the prompts were tested with the same checkpoint in my local ComfyUI. Even though the LoRAs and parameters may differ, generating a correct image doesn't require many retries, unless the LoRA was full of bugs when it was released. There are still things I haven't thought of; I'll add them when I write the LoRA section later.

Third, parameter settings such as the sampler, scheduler, steps, and CFG. The principles behind these are technical and hard to grasp, but simple trial and error combined with other people's test results will get you to good settings.

It's worth pointing out that a lot of people have never touched these settings; the options only show up when you switch Tensor to advanced mode, and free users on Civitai only get a few default choices, nowhere near as rich as what Tensor offers. The default sampler, Euler with the normal scheduler, generally performs quite poorly. If you haven't tried other samplers, you might not realize how much hidden potential your slightly underwhelming LoRA actually has.

Below are the settings I use most often; the names are long, so I'll abbreviate: dpm++2s_a, beta, CFG 4, 40 steps. If you're using ComfyUI, switching the sampler to "res_m_a", "seeds_2", or "seeds_3" can yield surprisingly good results. The default descriptions of these parameters on Tensor and other sites don't fully explain their real effects, and many people never change them. In fact they're constantly evolving: the most commonly recommended samplers for most checkpoints are "euler_a" and "dpm++2m", but "normal" and "karras" don't perform well in practice. In my experience, whichever sampler you use, pairing it with "beta" gives the best results. If your checkpoint has bugs with "beta", try "exponential"; these two are consistently the best, though also the slowest. Don't mind the time; an extra 10 or 20 seconds is worth it. "dpm++2s_a" is also the best choice in most cases, with more detail and stronger stylization; only switch to something else if bugs persist no matter how you modify the prompts. Next, "euler_dy" and "euler_smea_dy", which Tensor supports, offer a level of detail between "euler_a" and "dpm++2s_a" while being more stable and less buggy than "dpm++2m". Only fall back to the classic "dpm++2m" with "karras" if the checkpoint can't handle the settings above, and only in the most extreme cases resort to "euler_a" with "normal", because that combination gives images with poor detail but fewer bugs.
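As a concrete reference, here is roughly how those favorite settings would look as ComfyUI KSampler fields. The exact option names vary between UIs and versions, and "res_m_a" and the "seeds" samplers come from custom nodes, so treat the strings below as assumptions to double-check in your own UI rather than exact identifiers.

```python
# Rough mapping of the settings discussed above to KSampler-style fields.
# Names follow ComfyUI conventions as I remember them; verify in your UI.
favorite_settings = {
    "sampler_name": "dpmpp_2s_ancestral",  # written as dpm++2s_a above
    "scheduler": "beta",                   # fall back to "exponential" if buggy
    "cfg": 4.0,
    "steps": 40,
}

fallback_settings = {
    "sampler_name": "euler_ancestral",     # euler_a: fewer bugs, less detail
    "scheduler": "normal",
    "cfg": 4.0,
    "steps": 30,
}

print(favorite_settings)
print(fallback_settings)
```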
As for the number of steps, I personally like 30 and 40, but steps aren't crucial. More steps don't always mean better results; sometimes, for a single-character image, 20 steps is more than enough, and 40 might introduce a lot of bugs. The real purpose of steps is to randomly land on a composition you're happy with, and if there are small bugs, fixing the seed and adjusting the steps can sometimes eliminate them.

CFG has a fairly big impact on the result. The default explanation on the site doesn't really match how it feels in practice, and with so many combinations of checkpoints and LoRAs there's no one-size-fits-all reference; you just have to experiment. From what I've seen, the lower the CFG, the more conservative the composition tends to be, and the higher it is, the more exaggerated or dramatic it gets.

Fourth, resolution. Each checkpoint clearly specifies its recommended resolutions. Tensor's default resolution is well supported across various checkpoints with relatively few bugs, but it's quite small. You can use an upscaler to increase the image resolution, but many checkpoints can generate larger resolutions directly, as long as the width and height keep the same ratio as a recommended resolution and are multiples of 64. One thing to note: compared with the default resolution, larger resolutions tend to give you more background area, while smaller resolutions make the character fill more of the frame. Changing the resolution, even with the same parameters and seed, will still produce a different image. That's an interesting property in itself, so feel free to experiment with it.
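Since "same ratio, multiples of 64" is easy to get wrong by hand, here is a tiny helper that scales a base resolution and snaps both sides to multiples of 64 (which can shift the ratio very slightly). The 832x1216 base is just an example of a common 2:3 recommendation; use whatever your checkpoint's page suggests.

```python
def scaled_resolution(base_w: int, base_h: int, scale: float) -> tuple[int, int]:
    """Scale a recommended resolution and snap both sides to multiples of 64."""
    def snap(x: float) -> int:
        return max(64, int(round(x / 64)) * 64)
    return snap(base_w * scale), snap(base_h * scale)

# Example: a 2:3 base resolution of 832x1216, scaled up by 1.25x.
print(scaled_resolution(832, 1216, 1.25))  # -> (1024, 1536)
```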
A bit of my experience with making AI-generated images and LoRAs ( 1 )


At the beginning, please allow me to apologize: English isn't my native language, and this article was written with the help of AI translation. There may be grammatical errors or inaccurate technical terms that cause misunderstandings.

https://tensor.art/articles/868883998204559176 ( 2 )
https://tensor.art/articles/868884792773445944 ( 3 )
https://tensor.art/articles/868885754846123117 ( 4 )
https://tensor.art/articles/868890182957418586 ( 5 )

This article is not a guide or tutorial. If you have never used AI image generation, you may not understand it; if you have made pictures or LoRAs, it will be easier to spot possible misunderstandings and errors in the article. The variables and approaches I mention are based on personal experience, and given how much randomness AI involves, they may not be universally applicable; your own experiments might yield totally different results. You can use the experience in this article as a comparison, or even as potential "wrong answers" to rule out in your own workflow. Some of my friends have just started AI painting or are preparing to start, and I simply hope this article will be of some help to them.

Like many people, when I first heard about AI painting I thought it was a joke. It wasn't until more and more AI pictures of popular characters appeared on social media that I gradually changed my mind: so this was another way to make doujin. Still, the carnival around those star boys, and the even grander one around girls who outnumbered those boys hundreds or thousands of times over, didn't make me interested in AI painting. Then one day, on my Twitter homepage, I saw a post featuring an extremely obscure boy, the kind that barely anyone knows about. Why? How? At that moment my mind was full of excitement as well as questions.

After that, I searched Twitter and pixiv every day for the names that have always lingered in my heart, hoping for a miracle to appear. And then it really did. So I kept waiting for other people's results as if longing for a miracle, still not considering whether I should try it myself. I didn't even know about websites like Civitai or Tensor at the time; only when more and more people started posting AI pictures did I learn of these sites from their links.

These online AIs became a place for daily prayers. I never even clicked the start button; I just indulged in the joy of winning the lottery. Those pioneers shared new pictures, and new things called LoRA, every day. One day I saw a LoRA of the boy I was most fascinated with. Finally, I couldn't help it. I clicked the button, figured out how to start, copied other people's prompts, and swapped in my favorite boy. That's how I began making pictures myself: pictures full of bugs. The excitement gradually faded. So AI couldn't do it, or I couldn't. Questions replaced the excitement and occupied my mind. I kept copying and pasting, copying and pasting. Why were other people's results so good while mine were always so disappointing? At the time, I thought people didn't need to understand the principle of fire as long as they could use it.
I just tried repeatedly without thinking. Around that time, a phenomenon was becoming more and more common online: someone hears about AI drawing one second, starts copying and pasting the next, and the second after that opens sponsorships to sell packs of error-filled pictures, when the creators of the checkpoints and LoRAs never agreed to any of it. Worse, some steal those picture packs and resell them, to say nothing of those who steal the pictures people released for free and sell those too. The copyright status of AI was controversial to begin with, and I had my own doubts, but these thieves were too mercenary and too despicable. Why? How? In my anger, I no longer had any doubts. I had to think about how AI images actually come about; I had to know that fire needs air to burn, and how to put it out. I had to recover my original motivation for using the internet back when I was still a chuuni boy: sharing the unknown, sharing the joy, at least with my friends. I have no power to fight those shameless people, but they can never invade my heart. I'm sorry for writing so much nonsense. Let me write down my thoughts and experiences from the past year in a way that's easier to understand.

These observations are only about making character doujin and don't apply to more creative uses. They are mainly based on Illustrious and Noobai and their derivative models, and on using only prompts with basic workflows. More complex workflows, such as ControlNet, are not discussed; they can indeed produce more polished pictures, but they're still too cumbersome for spare-time use. Writing prompts alone is enough to generate eye-catching pictures in a basic workflow.

First of all, the most important conclusion: AI image generation is still, at present, a game of probability. All we can do is intervene in that probability as much as possible, to increase the chance of the result we expect. So how do we shift that probability to improve quality?

We need to understand one concept first. The order in which AI draws and the order in which a person draws (digitally, not in traditional media) are completely opposite. A person first conceives the complete scene, then fills in details, line art, and color, and then zooms in on a particular area, such as the eyes, to refine it. At that point the resolution is very high, and even when you zoom back out to the full picture, the details are preserved. AI works the other way around: it first generates small details, such as the eyes, and then gradually zooms out to depict the full picture. So, with the resolution unchanged through several of these "zooms," those initial details can end up blurred. This is why AI draws a headshot much better than a full-body picture, and why the easiest way to improve quality is to stick to the upper body or close-ups.
