Photoreal in Chroma - Things you can do


Updated:

PixelProse is (likely) part of the Chroma photoreal training set: https://huggingface.co/datasets/lodestones/pixelprose

Since these prompts come straight from the training data, clear the negative prompt completely.

I'm using Chroma V49 with the Heun sampler at 10 steps and the Beta scheduler.

CC12M Dataset (Chroma Training Data)

Excerpts from CC12M, taken from the PixelProse 'vlm_caption' field (without negatives):


PROMPT:

 a group of people on a cruise ship. 
There are approximately 25 people in 
the image. They are all wearing 
casual clothes and are standing 
around the pool on the ship. 
There is one person in the center 
of the image who is dressed up in 
a costume. They are wearing a pink 
and green tutu, a lei, and a large 
pair of sunglasses. They are also 
holding a tambourine. All of the 
people in the image are smiling 
and appear to be enjoying themselves. 
The background of the image is a 
blue sky with white clouds. The 
floor is made of wood and there 
are several chairs and tables 
around the pool. The image is a 
photograph. It is taken from a 
low angle and the people in the 
image are all in focus. The 
colors in the image are vibrant 
and the lighting is bright. 

NEG: (none)

//----//


PROMPT:

 A post-apocalyptic woman holding a 
crossbow. She is crouched on a 
pile of rubble. She is wearing a 
tattered gray cloak and a pair of 
goggles. Her face is dirty and she 
has a scar on her left cheek. Her 
hair is long and white. She is 
holding the crossbow in her right 
hand and it is pointed at the viewer.
 She has a knife in her left hand. 
The knife has a long, curved blade.
 The background is a blur of gray 
rubble. The image is in a realistic 
style and the woman's expression is 
one of determination.

NEG: (none)

(For a photoreal result, one might add negatives to this prompt, or specify the style using the 'aesthetic' tag.)

Trying again (with fixes):


PROMPT:

 A post-apocalyptic real photo 
aesthetic woman holding a crossbow. 
She is crouched on a pile of rubble.
 She is wearing a tattered gray cloak 
and a pair of goggles. Her face is 
dirty and she has a scar on her left 
cheek. Her hair is long and white. 
She is holding the crossbow in her 
right hand and it is pointed at the 
viewer. She has a knife in her left 
hand. The knife has a long, curved 
blade. The background is a blur of 
gray rubble. The image is in a 
realistic style and the woman's 
expression is one of determination.

NEG:

fantasy_illustration gray_illustration 

(Negatives are tokenized one at a time, separated by whitespace, hence the underscore '_' joining the words of each concept.)
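As a minimal sketch of that convention, assuming negatives really are split on whitespace as described above, the helper below joins the words of each concept with underscores. The function names are hypothetical and not part of Chroma or any UI.

```python
def to_negative_tag(phrase: str) -> str:
    """Join the words of one concept with underscores so it contains no whitespace."""
    return "_".join(phrase.split())


def build_negative_prompt(concepts: list[str]) -> str:
    """Turn a list of multi-word concepts into one whitespace-separated negative prompt."""
    return " ".join(to_negative_tag(c) for c in concepts if c.strip())


print(build_negative_prompt(["fantasy illustration", "gray illustration"]))
# prints: fantasy_illustration gray_illustration
```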

//----//


PROMPT:

A scene from the movie Planet of 
the Apes, where a group of monkeys 
are driving cars on a bridge. In 
the foreground, a monkey is 
standing on the roof of a car, 
while another is sitting in the 
driver's seat. In the background,
several other monkeys are driving 
cars, and one is standing on the 
roof of a car, holding a gun. 
The background is a destroyed city. 

NEG: (none)

//----//


PROMPT:

A man and a woman walking and 
talking. The man is on the left side 
of the image, and the woman is on 
the right side. They are both 
smiling. The man is wearing a dark 
blue suit jacket, pants, and shoes. 
The woman is wearing a white dress 
and matching shoes with a red clutch 
in her right hand. They are walking 
on a stone path lined with trees and
 grass on either side. In the 
background, there is a building 
with large windows. The image is a 
photograph taken from a slightly 
elevated angle. 

NEG: (none)

//----//

Redcaps Dataset (Chroma Training Data)

A peculiar subset within PixelProse is the RedCaps set.

TL;DR: prompt like a Reddit title without negatives and get photoreal results.

Refer to for examples

Prompts from redcaps without negatives:


PROMPT:

 leaves in an alley 

NEG: (none)


PROMPT:

 i swear, his color just shines in the mornings. 

NEG: (none)


PROMPT:

 advice for a new owner? canon t7i, 24mm , f8./200s, 100 iso , r/beardeddragons , spiro the dragon 

NEG: (none)

The reason for including ` canon t7i, 24mm , f8./200s, 100 iso ` is that these are the kind of titles people actually use at r/amateurphotography (weirdos), which is part of the RedCaps set. That is why such seemingly nonsensical terminology can be useful in Chroma.
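A rough sketch of how such a Reddit-title prompt could be assembled from its parts. The function and parameter names are hypothetical, and the aperture and shutter values in the example are illustrative guesses rather than values confirmed by the prompt above.

```python
def reddit_style_prompt(title, *, camera=None, focal_length_mm=None,
                        aperture=None, shutter=None, iso=None,
                        subreddit=None, subject=None):
    """Build a prompt that reads like a Reddit post title with camera details."""
    details = []
    if camera:
        details.append(camera)
    if focal_length_mm is not None:
        details.append(f"{focal_length_mm}mm")
    if aperture:
        details.append(aperture)
    if shutter:
        details.append(shutter)
    if iso is not None:
        details.append(f"{iso} iso")
    if subreddit:
        details.append(f"r/{subreddit}")
    if subject:
        details.append(subject)
    title = title.strip()
    # The title comes first, followed by the comma-separated details.
    return f"{title} {', '.join(details)}" if details else title


print(reddit_style_prompt(
    "advice for a new owner?",
    camera="canon t7i", focal_length_mm=24,
    aperture="f8", shutter="1/200s", iso=100,  # illustrative values
    subreddit="beardeddragons", subject="spiro the dragon",
))
```

Leave the negative prompt empty when using the result, as with the other RedCaps examples.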

Finally photoreal NSFW:

We don't know what photoreal NSFW sets are used.

But writing prompts in the style of an r/gonewild post title works for photoreal.

PROMPT:

elf girl fundays. just got this high 
collared black bodysuit off amazon. 
gorgeous green background. Here is 
my white bed. real photo aesthetic. 
showing off my braids and nerd 
glasses. any love for an eighteen 
blonde elf ….🤔💕(f) ? 

NEG:

 onlyfans_footage casual_illustration 

Similarly, I reckon that writing prompts like porn video titles ought to work well for photorealistic NSFW.

Feel free to match the CC12M captions against the collection of NSFW story excerpts 1-30, with 1K paragraphs in each generator:

Batch encoding size for the T5 is 512 tokens. Verify the size here:

I'll leave that as something people can try for themselves, with the above tips as a guide.
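The 512-token limit mentioned above can be checked with a small helper. This is only a sketch: `fits_token_budget` and `whitespace_tokenize` are hypothetical names, and the whitespace tokenizer is a stand-in that will usually undercount compared with the real T5 tokenizer, which splits words into subword pieces. With the Hugging Face `transformers` library installed, one could pass something like `lambda s: tokenizer(s)["input_ids"]` instead.

```python
from typing import Callable, Sequence

T5_MAX_TOKENS = 512  # limit quoted in this article


def fits_token_budget(prompt: str,
                      tokenize: Callable[[str], Sequence],
                      max_tokens: int = T5_MAX_TOKENS) -> bool:
    """Return True if the tokenized prompt fits within max_tokens."""
    return len(tokenize(prompt)) <= max_tokens


def whitespace_tokenize(text: str) -> list[str]:
    """Crude stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


print(fits_token_budget("leaves in an alley", whitespace_tokenize))
```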

Getty Images

Getty Images hosts captions for its photos.

Copy and paste them for easy photoreal results.

PROMPT:

2012 Monaco Grand Prix - Saturday 
2012 Monaco Grand Prix - Saturday 
Monte Carlo, Monaco 26th May 2012 
Force India girls. Photo by Andrew 
Ferraro/LAT Images 

NEG:

television_screen plastic_wig 
gray_3D_blur 

PROMPT:

Mel C performs at the V99 festival 
in Chelmsford on August 21st 1999 
CHELMSFORD, ENGLAND - AUGUST 21: 
Former Spice Girl Melanie Chisholm 
performs her first major solo gig 
at the V99 festival in Chelmsford 
on August 21, 1999. (Photo by Dave 
Hogan/Getty Images) 

NEG:

television_screen plastic_wig 
gray_3D_blur 

Fangrowth Generator

For NSFW try this generator :

Works well in combination with :

For example: the tag 'Amateur' =>

I can’t think of a few things we 
could do to make this pool more fun 
I don’t even know why I put a 
bathing suit on ;) Everything about 
this moment felt right Jiggly in 
all the best places Now I’m a tanned 
milf lol 

//---//

Finally, the conclusion I draw from Gonkee's video on embeddings in SD models: https://youtu.be/sFztPP9qPRc?si=dckBPPpLeUMAoTnl

Repeating concepts at various places in the prompt is better than adding weights, as syntax like (blah blah:1.2) was never an intended use for the FLUX / Chroma models.
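As one possible way to act on this, the sketch below strips `(concept:weight)` syntax from a prompt and, for weights above 1.0, repeats the concept once more at the end. The function name and the choice to append a single repetition at the end are my own assumptions for illustration. Where and how often to repeat is something to experiment with.

```python
import re

# Matches "(some concept:1.2)", allowing optional spaces around the parts.
WEIGHTED = re.compile(r"\(\s*([^():]+?)\s*:\s*([0-9]*\.?[0-9]+)\s*\)")


def weights_to_repetition(prompt: str) -> str:
    """Replace weighted spans with plain text and repeat emphasized concepts."""
    emphasized = []

    def unwrap(match):
        concept = match.group(1)
        if float(match.group(2)) > 1.0:
            emphasized.append(concept)
        return concept

    plain = WEIGHTED.sub(unwrap, prompt).strip()
    if emphasized:
        # Append each emphasized concept again as its own short sentence.
        plain = plain.rstrip(". ") + ". " + ". ".join(emphasized) + "."
    return plain


print(weights_to_repetition("A woman holding a (crossbow:1.2) on a pile of rubble."))
# prints: A woman holding a crossbow on a pile of rubble. crossbow.
```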
