RouWei


creator: Minthybasis

(PLEASE VISIT THE CIVITAI PAGE FOR FULL DESCRIPTION OF THE MODEL, I COULDN'T FIT IT ALL IN HERE)

An in-depth retraining of Illustrious aimed at best-in-class prompt adherence, knowledge and state-of-the-art performance.

A large-scale finetune trained on a GPU cluster with a dataset of ~13M pictures (~4M with natural-text captions).

  • New knowledge for characters, concepts, styles

  • The best prompt adherence among SDXL anime models at the time of release

  • Solves the main problems with tag bleeding and biases common to Illustrious, NoobAI and other checkpoints

  • Excellent aesthetics and knowledge across a wide range of styles (over 50,000 artists, including hundreds of unique cherry-picked datasets from private galleries, some received from the artists themselves)

  • High flexibility and variety without a stability tradeoff

  • No more annoying watermarks for popular styles, thanks to a clean dataset

  • Vibrant colors and smooth gradients without a trace of burning; full range even with the epsilon version

  • Pure training from Illustrious v0.1 without involving third-party checkpoints

There are also some issues and changes compared to the previous version; please RTFM.

The vpred version for v0.8 is still baking and will be released soon.

Dataset cut-off: end of April 2025.

FEATURES AND PROMPTING:

Important change:

When prompting artist styles, especially when mixing several, their tags MUST BE in a separate CLIP chunk. Just add BREAK after them (for A1111 and derivatives), use a conditioning-concat node (for Comfy), or at least put them at the very end of the prompt. Otherwise, significant degradation of results is likely.
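For example, a hypothetical A1111-style prompt with the artist tags isolated in their own chunk might look like this (the artist names are placeholders):

    by artist_a, by artist_b BREAK masterpiece, best quality, 1girl, fox ears, night city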

BASIC:

The checkpoint works with both short, simple prompts and long, complex ones. However, contradictory or weird things will not be ignored the way other models ignore them; they will affect the output. No guard-rails, no safeguards, no lobotomy.

Just prompt what you want to see and don't prompt what shouldn't be in the picture. If you want a view from above, don't put "ceiling" in the positive prompt; if you want a cropped view with the head out of frame, don't describe the character's facial features in detail; and so on. Pretty simple, but people sometimes miss it.

Version 0.8 comes with an advanced understanding of natural-text prompts. That doesn't mean you are obligated to use it; tags only is completely fine, especially since the understanding of tag combinations has also improved.

Do not expect it to perform like Flux or other models based on T5 or LLM text encoders. An entire SDXL checkpoint is smaller than such a text encoder alone; in addition, illustrious-v0.1, which is used as the base, completely forgot a lot of general things from vanilla sdxl-base.

However, even in its current state it works much better, allows new things that are usually impossible without external guidance, and makes manual editing, inpainting, etc. more convenient.

To achieve the best performance you should keep track of CLIP chunks. In SDXL the prompt is split into chunks of 75 tokens (77 including BOS and EOS) that are processed by CLIP separately; only then are the results concatenated and passed to the UNet as conditioning.

If you want to specify features for a character/object and separate them from other parts of the prompt, make sure those features are in the same chunk, optionally setting it apart with BREAK. This will not completely solve the problem of traits mixing, but it can reduce it and improve overall understanding, since the text encoders in RouWei process the whole sequence, not just individual concepts, better than others do.
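For the curious, here is a minimal Python sketch of that chunking scheme using the Hugging Face transformers CLIP classes. It shows a single text encoder only (SDXL actually uses two), and the model name, padding and chunk handling are illustrative assumptions, not RouWei's actual inference code:

    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    prompt = "by artist_a, by artist_b BREAK masterpiece, best quality, 1girl, fox ears"

    # BREAK forces a chunk boundary; otherwise split every 75 tokens.
    ids_per_part = [tokenizer(p.strip(), add_special_tokens=False).input_ids
                    for p in prompt.split("BREAK")]
    chunks = [ids[i:i + 75] for ids in ids_per_part
              for i in range(0, len(ids), 75)]

    embeddings = []
    for chunk in chunks:
        # Re-add BOS/EOS so each chunk is a full 77-token CLIP sequence.
        padded = [tokenizer.bos_token_id] + chunk + [tokenizer.eos_token_id]
        padded += [tokenizer.pad_token_id] * (77 - len(padded))
        with torch.no_grad():
            out = text_encoder(torch.tensor([padded]))
        embeddings.append(out.last_hidden_state)

    # The chunk embeddings are concatenated along the sequence axis and
    # passed to the UNet as cross-attention conditioning.
    cond = torch.cat(embeddings, dim=1)  # shape: (1, 77 * n_chunks, 768)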

The dataset contains only booru-style tags and natural-text expressions. Although it includes a share of furry art, real-life photos, Western media, etc., all captions were converted to classic booru style to avoid the problems that come from mixing different tagging systems. So e621 tags won't be understood properly.

Sampling parameters:

  • ~1 megapixel for txt2img, any aspect ratio with resolutions that are multiples of 32 (1024x1024, 1056x, 1152x, 1216x832, ...). Euler_a, 20..28 steps.

  • CFG: 4..9 for the epsilon version (7 is best); 3..5 for the vpred version

  • Sigmas multiply may improve results a bit; CFG++ samplers work fine. LCM/PCM/DMD/... and exotic samplers are untested.

  • Some schedulers don't work well.

  • Highres fix: x1.5 latent upscale + denoise 0.6, or any GAN upscaler + denoise 0.3..0.55.
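Roughly, those settings map onto a diffusers call like the sketch below. The checkpoint path is a placeholder and the epsilon version is assumed; this is an illustration, not an official pipeline:

    import torch
    from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionXLPipeline.from_single_file(
        "rouwei_v0.8_eps.safetensors",  # placeholder path
        torch_dtype=torch.float16,
    ).to("cuda")
    # Euler_a, as recommended above.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        prompt="masterpiece, best quality, 1girl, fox ears, smile",
        negative_prompt="worst quality, low quality, watermark",
        width=1216, height=832,   # ~1 MP, multiples of 32
        num_inference_steps=24,   # 20..28 recommended
        guidance_scale=7.0,       # 4..9 for epsilon, 7 is best
    ).images[0]
    image.save("out.png")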

Quality classification:

Only 4 quality tags:

masterpiece, best quality (for positives)

low quality, worst quality (for negatives)

Nothing else. You can even omit the positive tags and reduce the negative to low quality only, since quality tags can affect the basic style and composition.

Meta tags like lowres have been removed and don't work; better not to use them. Low-resolution images were either removed or upscaled and cleaned with DAT, depending on their importance.

Negative prompt:

worst quality, low quality, watermark

That's all; there is no need for "rusty trombone", "farting on prey" and the rest. Do not put tags like greyscale or monochrome in the negative unless you understand what you are doing: it will lead to burning and over-saturation. Colors are fine out of the box.
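Putting the quality and negative recommendations together, a minimal prompt pair might look like this (the content tags are just placeholders):

    Positive: masterpiece, best quality, 1girl, fox ears, smile
    Negative: worst quality, low quality, watermark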

Artist styles:

Grids with examples and the artist list are available (they can also be found in "training data").

Use them with the "by " prefix; it's mandatory. Mixing multiple artists gives very interesting results and can be controlled with prompt weights.
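For instance, a weighted two-artist mix in A1111 syntax could look like this (artist names are placeholders; note the separate chunk, as described above):

    by artist_a, (by artist_b:0.7) BREAK 1girl, portrait, upper body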

General styles:

2.5d, anime screencap, bold line, sketch, cgi, digital painting, flat colors, smooth shading, minimalistic, ink style, oil style, pastel style

Booru tag styles:

1950s (style), 1960s (style), 1970s (style), 1980s (style), 1990s (style), animification, art nouveau, pinup (style), toon (style), western comics (style), nihonga, shikishi, minimalism, fine art parody

and everything from this group.

They can be used in combinations (with artists too), with weights, in both positive and negative prompts.

Characters:

Use the full-name booru tag with proper formatting, e.g. "karin_(blue_archive)" -> "karin \(blue_archive\)", and use skin tags for better reproduction, e.g. "karin \(bunny\) \(blue_archive\)". An autocomplete extension can be very useful.

Natural text:

Use it in combination with booru tags (works great), use only natural text after the style and quality tags, or use just booru tags and forget about it; it's all up to you.

The dataset contains over 800k pictures with hybrid natural-text captions made by Opus-Vision, GPT-4o and ToriiGate.
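A hybrid prompt mixing the two caption styles might look like this (purely an illustration):

    masterpiece, best quality, 1girl, silver hair, fox ears, a girl in a red kimono stands under a torii gate at dusk, looking back over her shoulder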

Lots of Tail/Ears-related concepts:

tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, tail insertion, tail masturbation, holding with tail, ...

(booru meaning, not e621) and many more via natural text. The majority work perfectly; some require rerolling.

Brightness/colors/contrast:

You can use extra meta tags to control them:

low brightness, high brightness, low gamma, high gamma, sharp colors, soft colors, hdr, sdr, limited range


They work in both the epsilon and vpred versions, and they work really well.

Unfortunately there is an issue: the model relies on them too much. Without low brightness or low gamma, or limited range in the negative, it might be difficult to achieve a true (0, 0, 0) black; the same is often true for white.
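For example, to push toward true black you could try something like the pair below (an illustrative combination, not a guaranteed recipe):

    Positive: 1girl, dark room, night, low brightness
    Negative: worst quality, low quality, watermark, limited range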

Both the epsilon and vpred versions have effectively true ZTSNR: a full range of colors and brightness without the common flaws. But they behave differently, so just try them.

Known issues:

Of course there are:

  • As mentioned, the model relies too much on the brightness meta tags, so you'll have to use them to get full performance

  • The vpred version has problems with chunk padding or something similar; solved with BREAK

  • Inferior to NoobAI in furry-related knowledge

  • Some cherry-picked character datasets have prompting issues: Yozora and a few cute fox-girls are not consistent

  • A small finetune or LoRA to polish fine details would be nice; that's up to the community

  • To be discovered

Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly, is inaccurate, or has a strong watermark, please report it and they will be added explicitly. Follow for new versions.

JOIN THE DISCORD SERVER

License:

Same as Illustrious. Feel free to use it in your merges, finetunes, etc.; just please leave a link.

How it's made

I'll consider writing a report or something like it later.

In short, 98% of the work went into dataset preparation. Instead of blindly relying on tag-frequency loss weighting from the NAI paper, a custom guided loss-weighting implementation was used, along with an asynchronous collator for balancing. ZTSNR (or close to it) with epsilon prediction was achieved using noise-scheduler augmentation.
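The actual implementation isn't published, but purely as a toy illustration of the frequency-based idea it departs from, per-sample weighting might look something like this (the function, the log form and the clamp range are all assumptions):

    import math
    from collections import Counter

    def sample_weights(dataset_tags, w_min=0.5, w_max=3.0):
        """Weight each sample by the rarity of its rarest tag."""
        freq = Counter(t for tags in dataset_tags for t in tags)
        n = len(dataset_tags)
        weights = []
        for tags in dataset_tags:
            rarest = min((freq[t] for t in tags), default=n)
            w = math.log1p(n / rarest)                 # rarer tag -> larger weight
            weights.append(min(max(w, w_min), w_max))  # clamp for training stability
        return weights

    # The weight then scales the per-sample diffusion loss, e.g.:
    #   loss = (w * mse(eps_pred, eps_target)).mean()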

Thanks:

First of all I'd like to acknowledge everyone who supports open source and develops and improves code. Thanks to the authors of Illustrious for releasing the model, and thanks to the NoobAI team for being pioneers in open finetuning at this scale, for sharing their experience, and for raising and solving issues that previously went unnoticed.

Personal:

Artists who wish to remain anonymous - sharing private works; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; T., [] - datasets, testing, advice; rred, dga, Fi., ello - donations; and other fellow brothers who helped. Love you so much ❤️.

And of course everyone who gave feedback and made requests; it's really valuable.

If I forgot to mention anyone, please let me know.

Donations

If you want to support me: share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support the original artists.

AI is my hobby; I spend money on it and am not begging for donations. However, it has turned into a large-scale and expensive undertaking. Consider supporting it to accelerate new training and research.

(Just keep in mind that I can waste it on alcohol or cosplay girls)

BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

If you can offer GPU time (A100+), PM me.

Version Detail

Illustrious

Project Permissions

Model reprinted from : https://civitai.com/models/950531

Reprinted models are for communication and learning purposes only, not for commercial use. Original authors can contact us to transfer the models through our Discord channel, #claim-models.
