UrangDiffusion | an AingDiffusion XL sequel

UrangDiffusion (oo-raw-ng Diffusion) is a sequel to AingDiffusion. This checkpoint is fully trained, unlike its predecessor.

The name “Urang” comes from Sundanese, meaning “We/Our/I.” The idea behind the name is that the model should suit not only me but many other people as well. Another reason is that I rely on many resources (training scripts, dataset collection scripts, etc.) from other people, so it would be unfair to claim this model as “my **** work.”

The model went through two stages of training: pretraining and finetuning. Pretraining teaches the model new things, while finetuning ensures the images it produces are decent (i.e., have a standard style) without any style being mentioned in the prompt.

Standard Prompting Guidelines

The model is finetuned from Animagine XL 3.1. However, I didn’t finetune the aesthetic tags trained with 3.1 due to some considerations. Therefore, the default prompt uses 3.0’s default prompting format:

  • Default prompt: 1girl/1boy, character name, from what series, everything else in any order, masterpiece, best quality

  • Default negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name

  • Default configuration: Euler a with around 25-30 steps, CFG 5-7, and ENSD set to 31337.
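The prompt order above can be sketched as a small helper. This is only an illustration, not part of the model or of any UI: `build_prompt`, `DEFAULT_SETTINGS`, and the setting key names are hypothetical, and the step count and CFG value are simply picks from within the recommended ranges.

```python
def build_prompt(subject, character=None, series=None, extra_tags=()):
    """Join tags in the recommended order:
    subject, character name, series, everything else, then quality tags."""
    quality_tags = ["masterpiece", "best quality"]
    parts = [subject, character, series, *extra_tags, *quality_tags]
    # Skip any part that was left empty (e.g. an original character with no series).
    return ", ".join(p for p in parts if p)


# Default negative prompt, copied from the guidelines above.
DEFAULT_NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry, artist name"
)

# Hypothetical settings container; the key names are illustrative only.
DEFAULT_SETTINGS = {
    "sampler": "Euler a",
    "steps": 28,       # any value in the 25-30 range
    "cfg_scale": 6.0,  # any value in the 5-7 range
    "ensd": 31337,
}

if __name__ == "__main__":
    print(build_prompt("1girl", "character name", "series name", ["smile", "outdoors"]))
```

Replace the placeholder character and series strings with real tags; the quality tags always go last.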

Training Configurations

Finetuned from: Animagine XL 3.1

Pretraining:

  • Dataset size: ~17,200 images

  • GPU: 1xA100

  • Optimizer: AdaFactor

  • Unet Learning Rate: 2.5e-6

  • Text Encoder Learning Rate: 1.25e-6

  • Batch Size: 48

  • Gradient Accumulation: 1

  • Epoch: 10 (epoch 8 is used)

Finetuning:

  • Dataset size: ~1,300 images

  • GPU: 1xA100

  • Optimizer: AdaFactor

  • Unet Learning Rate: 2e-6

  • Text Encoder Learning Rate: - (Train TE set to False)

  • Batch Size: 48

  • Gradient Accumulation: 1

  • Epoch: 10 (epoch 8 is used)
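For a rough sense of scale, the configurations above can be turned into approximate optimizer step counts. This is a back-of-the-envelope sketch, not a record of the actual runs: it assumes the dataset sizes are exact (they are listed as approximate), that each image is seen once per epoch, and that the last partial batch is kept. The function name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(dataset_size, batch_size, grad_accum, epochs):
    """Approximate number of optimizer updates after `epochs` epochs."""
    # Images consumed per optimizer update.
    effective_batch = batch_size * grad_accum
    # Assumption: the final partial batch of each epoch still counts as a step.
    steps_per_epoch = math.ceil(dataset_size / effective_batch)
    return steps_per_epoch * epochs


# Pretraining: ~17,200 images, batch size 48, gradient accumulation 1.
pretrain_total = optimizer_steps(17_200, 48, 1, 10)  # all 10 epochs
pretrain_used = optimizer_steps(17_200, 48, 1, 8)    # epoch 8 checkpoint

# Finetuning: ~1,300 images, batch size 48, gradient accumulation 1.
finetune_total = optimizer_steps(1_300, 48, 1, 10)
finetune_used = optimizer_steps(1_300, 48, 1, 8)

if __name__ == "__main__":
    print(pretrain_total, pretrain_used)  # 3590 2872
    print(finetune_total, finetune_used)  # 280 224
```

Under these assumptions the epoch 8 checkpoints correspond to roughly 2,900 pretraining updates and a little over 200 finetuning updates.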

Added Series

Wuthering Waves and hololiveEN Justice have been added to the model. Be warned that the dataset for them is very small, so the model still struggles to generate the added characters accurately. You can generate them in alternate costumes, but if you're trying to reproduce the official art, you will struggle a lot.

Special Thanks

  • My co-workers(?) at CagliostroLab for the insights and feedback.

  • Nur Hikari and Vanilla Latte for quality control.

  • Linaqruf, my tutor and role model in AI-generated images.

License

UrangDiffusion is released under the Fair AI Public License 1.0-SD.

Version Detail

SDXL 1.0
UrangDiffusion XL v3.1 is fine-tuned from Animagine XL 4.0 Base (not Zero). This 4.0 Base model serves as the base model pre-trained for the final release of Animagine XL 4.0 (not the Opt version). I have received permission from the team to fine-tune the base model using my own method and release it under the UrangDiffusion series.

Base model: Animagine XL 4.0 Base

Fine-tuning details:

  • Dataset size: ~1,600 images

  • GPU: 1× A100 80GB

  • Optimizer: AdaFactor

  • UNet learning rate: 1.25e-6

  • Text encoder learning rate: N/A (disabled)

  • Batch size: 48

  • Gradient accumulation: 1

  • Warmup steps: 5%

  • Minimum SNR: 5

Due to some quirks of the model, please keep the following in mind:

  • v3.0 may perform better with anatomy

  • v3.1 may perform better with more fluid poses

  • If you encounter anatomical issues at 28 steps, try lowering to 27 or increasing to 29. If it improves but isn’t perfect, continue adjusting slightly up or down. If the result worsens, the previous step count was likely the optimal one.
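The step-count advice for v3.1 amounts to a small local search around 28 steps. Here is a minimal sketch of that procedure, assuming you can rate each result with some score where higher is better. In practice that rating is your own judgment of images generated with the same prompt and seed. The `score` callable, the function name `tune_steps`, and the `low`/`high` bounds are all hypothetical.

```python
def tune_steps(score, start=28, low=20, high=40):
    """Nudge the step count up or down by one while the result keeps improving.

    `score` is a hypothetical callable: it takes a step count and returns a
    rating of the resulting image (higher is better).
    """
    best, best_score = start, score(start)
    improved = True
    while improved:
        improved = False
        # Try one step lower, then one step higher.
        for candidate in (best - 1, best + 1):
            if low <= candidate <= high:
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    # Better result: move there and keep adjusting.
                    best, best_score = candidate, candidate_score
                    improved = True
                    break
        # If neither neighbour improved, the current count is kept.
    return best


if __name__ == "__main__":
    # Toy rating that pretends 30 steps gives the cleanest anatomy.
    print(tune_steps(lambda steps: -abs(steps - 30)))
```

The search stops as soon as both neighbouring step counts look worse, which matches the rule that the previous step count was likely the optimal one.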

Project Permissions

Model reprinted from: https://civitai.com/models/537384?modelVersionId=597401

Reprinted models are for communication and learning purposes only, not for commercial use. Original authors can contact us through our Discord channel, #claim-models, to have their models transferred.
