Chroma is a finetune of FLUX Schnell created by Lodestones.
HF Repo: https://huggingface.co/lodestones/Chroma
Allegedly, a technical report on Chroma may be released in the future.
Don't hold your breath though. In my own experience and that of others, Lodestones is not keen on explaining exactly what he is doing or what he is planning to do with the Chroma models.
TL;DR: There is no documentation for Chroma. We just have to figure it out ourselves.


I'm writing this guide despite having nothing close to factual information regarding the exact training data, recommended use, and background of the Chroma models.
The total lack of documentation aside, the Chroma models are an excellent upgrade to the base FLUX model, and Lodestones deserves full credit for his efforts.
The GPU cost for training the Chroma models is allegedly (at present) over 200K USD.
Model Architecture for Chroma
Key feature: this model has been pruned from FLUX Schnell, i.e. the architecture is different.
The keys of the .safetensors files for FLUX Dev fp8 (B), FLUX KREA (K), and FLUX Chroma (C):

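If you want to compare the key layouts yourself and see how Chroma's pruned architecture differs from FLUX Dev's, you can list a checkpoint's tensor keys without loading any weights. A minimal stdlib-only sketch; the demo file and key name below are illustrative stand-ins, not actual Chroma keys:

```python
import json
import struct

def safetensors_keys(path):
    """List tensor keys from a .safetensors file without loading weights.
    The format starts with an 8-byte little-endian header length, followed
    by a JSON header mapping each tensor name to its metadata."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sorted(k for k in header if k != "__metadata__")

# Build a tiny stand-in file so the snippet is self-contained.
# The key name is illustrative only, not a real Chroma key.
header = {"double_blocks.0.img_mod.lin.weight":
          {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}
payload = json.dumps(header).encode("utf-8")
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(payload)))
    f.write(payload)
    f.write(b"\x00\x00\x00\x00")  # 4 bytes of dummy tensor data

print(safetensors_keys("demo.safetensors"))
# -> ['double_blocks.0.img_mod.lin.weight']
```

Running this on two real checkpoints and diffing the key lists is a quick way to confirm the architectures differ.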
As such, don't expect good results from running a FLUX Dev-trained LoRa on Chroma.
Another minor change in architecture is the removal of the CLIP_L encoder. Chroma relies solely on the T5 encoder.
Architecture (Sub-models)
The Chroma models come in several versions:
Chroma V49 - The latest trained Chroma checkpoint. Like V50 (and unlike V48), V49 is assumed to have undergone 'hi-res training', but this is unconfirmed due to the lack of documentation. https://tensor.art/models/895059076168345887

Chroma V50 Annealed - A merge of the last ten Chroma checkpoints (V39-V49), which then underwent 'hi-res training'. https://tensor.art/models/895041239169116458
'Annealed', I have been told on Discord, means the model underwent one final pass over all 5 million images in the training data at a very low learning rate.
The plan is to make V50 Annealed the 'official' FLUX Chroma model under the name 'Chroma1-HD'.
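The annealing idea described above can be sketched roughly like this; all numbers are illustrative assumptions, not Lodestones' actual settings:

```python
# Rough sketch of 'annealed' as described above: after normal training,
# one final pass over the full dataset at a very low learning rate.
anneal_lr = 1e-6        # "very low" final-pass learning rate (assumed)
final_pass_steps = 100  # stand-in for one epoch over ~5M images

def lr_at(step):
    # linearly decay the annealing LR toward zero over the final pass
    return anneal_lr * (1 - step / final_pass_steps)

print(lr_at(0), lr_at(50), lr_at(100))
# -> 1e-06 5e-07 0.0
```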

Chroma V50 - A throwaway checkpoint merge created to secure funding for training the other checkpoints. Don't use it.



Chroma V50 Heun - An 'accidental' checkpoint offshoot that arose while training the Chroma model. It works surprisingly well for photorealism with the 'Heun' or 'Euler' sampler and the 'Beta' scheduler at 10 steps, CFG 1, hence the model name. https://tensor.art/models/895078034153975868

Chroma V46 Flash - Another 'accidental' training offshoot, which boasts the most stable output of all the Chroma checkpoints. Try running it with the Euler sampler and the SGM Uniform scheduler at 10 steps, CFG 1. An excellent model! https://tensor.art/models/889032308265331973
What model should I use for LoRa training?
Either V49 or V50 Annealed is an excellent choice, in my opinion.
Both the V49 and V50 Annealed models can run at 10 steps with the Beta scheduler at CFG = 1 and Guidance Scale = 5, at a cost of 0.4 credits per image generation here on Tensor.
Training

The Chroma model can do anime, furry, and photorealistic content alike, including NSFW, using both natural-language captions and Danbooru tags.
The training data was captioned using the Google Gemma 12B model. A repo I assembled contains a collection of the text-image training pairs used to train Chroma, stored as parquet files accessible via a Jupyter Notebook in the same repo:
https://huggingface.co/datasets/codeShare/chroma_prompts/blob/main/parquet_explorer.ipynb
You'll need to download the parquet files to your Google Drive to read the prompts:


Example output from the E621 set

Lodestones' repos (⬆️ items from these sets are included in my chroma_prompts repo for ease of use):
https://huggingface.co/datasets/lodestones/pixelprose
https://huggingface.co/datasets/lodestones/e621-captions/tree/main
Tip: ask Grok on Twitter for Google Colab code to read items from these sets.
//---//
The Redcaps dataset

A peculiar thing is that Chroma is trained on the RedCaps dataset.
These are text-image pairs where the image is one found on Reddit and the text prompt is the title of the Reddit post!
If you want a fun time prompting Chroma, copy-paste a Reddit title, either straight off the page or from the chroma_prompts repo parquet files, and see for yourself.
Example of a redcaps prompt:
I found this blue thing in my backyard. Can someone tell me what it is?
The 'Aesthetic' tags
The pixelprose dataset used to train Chroma has an 'aesthetic' score assigned to each image as a float value.

This value has been rounded down to give the tags 'aesthetic 1', 'aesthetic 2', ..., 'aesthetic 10'.
Additionally, all AI-generated images used to train Chroma have been tagged 'aesthetic 11'.
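The mapping described above can be sketched like this; the exact tag formatting is an assumption based on this guide's description, not confirmed by any documentation:

```python
import math

def aesthetic_tag(score, is_ai_generated=False):
    """Map a pixelprose aesthetic float to the caption tag described above.
    Exact formatting is assumed, not documented."""
    if is_ai_generated:
        return "aesthetic 11"  # reserved for AI-generated training images
    return f"aesthetic {math.floor(score)}"  # round down to an integer tag

print(aesthetic_tag(7.8))                        # -> aesthetic 7
print(aesthetic_tag(5.0, is_ai_generated=True))  # -> aesthetic 11
```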
(more later)
Anime test

Prompt:
what is the aesthetic 0 style type of art? anime screencap with a title in red text Fox-like girl holding a wrench and a knife, dressed in futuristic armor, looking fierce with yellow eyes. Her outfit is a dark green cropped jacket and a skirt-like bottom. \: title the aesthetic 0 style poster "Aesthetic ZERO"
Captioning
The Gemma 12B model was used to caption the Chroma training data; however, unlike the well-established JoyCaption, this model does not run on the free-tier T4 Colab GPU.
To mitigate this, I'm training the Gemma 4B model to specialize in captioning images in the same format as the Chroma training data.
More info on the project here: https://huggingface.co/codeShare/flux_chroma_image_captioner
Finding Prompts
I recommend you visit the AI generator at perchance for Chroma prompts. They have had the Chroma model in their T2I generator for a while, and there are lots of users posting to the galleries.
It's hard to browse old posts on perchance, so it will serve you well to 'rescue' some prompts and post them here on Tensor Art.
Resolutions
Refer to the standard resolution values for Chroma and SDXL models:
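In case the reference chart doesn't render here, these are the commonly cited SDXL training buckets (about 1 megapixel each, all dimensions divisible by 64); FLUX-family models like Chroma are generally run at these or larger sizes:

```python
# Commonly cited SDXL resolution buckets (~1 megapixel each).
SDXL_BUCKETS = [
    (1024, 1024), (1152, 896), (896, 1152),
    (1216, 832), (832, 1216), (1344, 768),
    (768, 1344), (1536, 640), (640, 1536),
]

for w, h in SDXL_BUCKETS:
    print(f"{w}x{h}  (aspect {w / h:.2f}, {w * h} px)")
```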




