SDXL 1.0 (Distilled/Predictive)
Full FP32 Model
CLIP-G and CLIP-L are both trained and distilled.
Distilled: The CLIP models have been distilled from larger text/token projections to teach the smaller model the latent shape of the larger one.
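As a rough illustration of the distillation idea, here is a minimal sketch that assumes the larger (teacher) embedding is mapped into the smaller (student) space by a linear projection and compared with a mean-squared-error loss. The function names, the projection matrix, and the choice of MSE are all assumptions made for the example; the actual training objective used for this model is not stated here.

```python
from typing import List, Sequence


def project(vec: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Apply a linear projection; weights has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_embedding: Sequence[float],
    teacher_embedding: Sequence[float],
    projection: Sequence[Sequence[float]],
) -> float:
    """Hypothetical loss: pull the student toward the projected teacher."""
    # Assumption: the teacher is wider than the student, so it is projected
    # down to the student's dimensionality before the comparison.
    target = project(teacher_embedding, projection)
    return mse(student_embedding, target)


# Toy 4-dim teacher and 2-dim student; the projection keeps the first two dims.
teacher = [1.0, 2.0, 3.0, 4.0]
keep_first_two = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
loss_matched = distillation_loss([1.0, 2.0], teacher, keep_first_two)
loss_off = distillation_loss([0.0, 2.0], teacher, keep_first_two)
```

A student that already matches the projected teacher gives zero loss, and any deviation is penalized quadratically. A real setup would compute this over batches of tensors with a learned projection.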
Predictive: The CLIP model had separate training in which the padding token was used as a mask, allowing the model to predict some additional context when the 75-token limit is not fully used.
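To make the padding behaviour concrete, the sketch below pads a short prompt out to the full sequence length and marks which positions hold real tokens and which hold padding. It is only an illustration under stated assumptions: the start/end token ids follow the usual CLIP vocabulary, the padding id is a placeholder, and `pad_with_mask` is a hypothetical helper. How the mask was actually consumed during the separate training pass is not described here.

```python
from typing import List, Sequence, Tuple

MAX_PROMPT_TOKENS = 75   # prompt-token limit mentioned in the text
BOS_ID = 49406           # assumed start-of-text id (standard CLIP vocabulary)
EOS_ID = 49407           # assumed end-of-text id (standard CLIP vocabulary)
PAD_ID = 0               # placeholder padding id, chosen for illustration only


def pad_with_mask(
    prompt_ids: Sequence[int], pad_id: int = PAD_ID
) -> Tuple[List[int], List[int]]:
    """Return (token_ids, mask), where mask is 1 for real tokens and 0 for padding."""
    # Truncate prompts that exceed the limit, then wrap them in BOS/EOS.
    ids = list(prompt_ids)[:MAX_PROMPT_TOKENS]
    sequence = [BOS_ID] + ids + [EOS_ID]

    # Fill the unused positions with the padding token.
    total_length = MAX_PROMPT_TOKENS + 2
    n_real = len(sequence)
    n_pad = total_length - n_real
    sequence += [pad_id] * n_pad

    # Positions flagged 0 are the padded slots that the text describes
    # as being used as a mask during training.
    mask = [1] * n_real + [0] * n_pad
    return sequence, mask


token_ids, mask = pad_with_mask([10, 11, 12])
unused_slots = mask.count(0)
```

For a three-token prompt, most of the sequence is padding. Those are the slots in which, per the description above, the model can supply extra context.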