Prompt size - things to be aware of

Pony models use a score system

A quirk in training means you have to preface the prompt with 'score_9 score_8 score_7 ' to reach the model's training data. This may be fixed in the latest Pony model; check its docs.

The prompt does not have to be verbatim. Just a kind of 'vibe' is enough.

Illustrious models do not have this issue.
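
A minimal sketch of the Pony score prefix described above, in Python. The helper name with_pony_prefix is made up for illustration, and it assumes the tags are simply prepended as plain text:

```python
# Tag string taken from the text above; check the model docs before relying on it.
PONY_SCORE_PREFIX = "score_9 score_8 score_7 "


def with_pony_prefix(prompt: str) -> str:
    """Prepend the Pony score tags unless the prompt already starts with them.

    Hypothetical helper, not part of any UI or library.
    """
    stripped = prompt.lstrip()
    if stripped.startswith("score_9"):
        return stripped
    return PONY_SCORE_PREFIX + stripped
```

For Illustrious models you would skip this step and pass the prompt through unchanged.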

See 'prompt as soundwaves' article: https://tensor.art/articles/913912699476253681

To mix or enforce concepts, it's best to repeat them at different points in the prompt, with a good stretch of other kinds of text in between.
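
One way to picture the repetition advice above is a small Python sketch. The function name spread_concept and the gap parameter are assumptions made for this example; how much spacing works best is something to find out by experiment:

```python
def spread_concept(fragments, concept, gap=3):
    """Insert `concept` before every `gap`-th fragment and join with commas.

    Hypothetical helper: `gap` controls how much other text sits
    between two repetitions of the concept.
    """
    out = []
    for i, fragment in enumerate(fragments):
        if i % gap == 0:
            out.append(concept)
        out.append(fragment)
    return ", ".join(out)
```

For example, spread_concept(["a", "b", "c", "d", "e", "f"], "red") gives "red, a, b, c, red, d, e, f".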

CLIP has a context size of 75 tokens. You can count tokens here (see the cover photo for this article): sd-tokenizer dot rocker dot boo (Google it)

Exceeding the 75-token limit while staying under 150 tokens creates two encoding vectors, A and B, and the average (A+B)/2 is used as input.

You can also blend within each encoding, as long as there is a decent amount of separation. See the article for further details on why that works.
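
The chunk-and-average behaviour described above can be sketched in Python. This is only an illustration of the arithmetic as this article describes it: plain lists stand in for real token IDs and encoder outputs, and the function names are made up here, not taken from any UI's source code:

```python
CHUNK_SIZE = 75  # usable tokens per CLIP chunk, as stated in the text


def split_into_chunks(token_ids, chunk_size=CHUNK_SIZE):
    """Split a list of token IDs into consecutive chunks of at most chunk_size."""
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]


def average_encodings(a, b):
    """Element-wise (A+B)/2 for two equally sized encoding vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]
```

A 100-token prompt would split into one chunk of 75 tokens and one of 25, and each chunk gets its own encoding before the two are averaged.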

//--//

For SDXL especially, avoid prompt lengths slightly above 75 tokens or slightly above 150 tokens. The few overflow tokens create a nearly empty encoding vector B next to the actual 75-token prompt A. The final encoding (A+B)/2 then just adds A to an almost empty vector B, which is essentially the same as prompting at half strength (A:0.5) when the prompt only slightly exceeds 75 tokens.
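
A rough Python sketch of the half-strength effect and a length check for it. The margin of 10 tokens is an arbitrary assumption for what counts as "slightly above", and both function names are hypothetical:

```python
CHUNK_SIZE = 75  # usable tokens per CLIP chunk, as stated in the text


def is_risky_length(n_tokens, margin=10):
    """True if the prompt spills only a few tokens into a new chunk.

    `margin` is an assumed threshold, not a documented value.
    """
    if n_tokens <= CHUNK_SIZE:
        return False
    overflow = n_tokens % CHUNK_SIZE
    return 0 < overflow <= margin


def average_with_empty(a):
    """(A+B)/2 where B is an all-zero stand-in for a nearly empty chunk."""
    b = [0.0] * len(a)
    return [(x + y) / 2 for x, y in zip(a, b)]
```

With these definitions, is_risky_length(78) is True while is_risky_length(140) is False, and average_with_empty([1.0, -2.0]) gives [0.5, -1.0], which is A at half strength.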

//--//

To fix this, repeat the prompt so that its size is slightly below 150 tokens.

Or reduce the word count so the prompt is below 75 tokens.
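
The repeat fix above could look something like this in Python. The function repeat_to_fill is a made-up helper, and count_tokens is whatever token counter you have at hand; a real CLIP tokenizer is the better choice, since a plain word count only approximates the token count:

```python
def repeat_to_fill(prompt, count_tokens, target=150, separator=", "):
    """Repeat `prompt` for as long as the result stays below `target` tokens.

    `count_tokens` is a caller-supplied function mapping a string to a
    token count (an assumption of this sketch).
    """
    if count_tokens(prompt) == 0:
        return prompt  # nothing to repeat
    result = prompt
    while True:
        candidate = result + separator + prompt
        if count_tokens(candidate) >= target:
            return result
        result = candidate


def rough_word_count(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())
```

If even one extra repetition would not fit below the target, the prompt comes back unchanged, and cutting it down to below 75 tokens is the remaining option.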

Capital letters A-Z and symbols are counted as single tokens.

You can browse the full token vocabulary here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/tokenizer/vocab.json